Monday, December 5, 2016

The Intelligence Framework Update

Note: This is a guest blog post by Jan Grashöfer; the original post may be found here.

Recently, Bro's intelligence framework was refactored and extended with a couple of new features. This post discusses the updates and tries to clarify some of the background that has turned out to be a common source of pitfalls in the past.

The Intelligence Framework Data Model


Understanding the intel framework's data model is key to exploiting its full potential, so let's take a closer look. The core of an intelligence datum is the indicator (also called indicator of compromise, IoC), e.g. an IP address, a hash or a domain name (for a list of available types see Bro's script reference). The indicator can be enriched by meta data of different kinds, e.g. a description, a URL or a severity level. The same indicator can be obtained from different intelligence sources, each providing different meta data. Thus, in Bro's intelligence framework, a single indicator can be described by multiple meta data records. A meta data record is uniquely identified by its source. Figure 1 illustrates this relation.

Figure 1: Data Model of Bro's Intelligence Framework

With this in mind, let's have a look at the intel files. Each line represents what is called an intelligence item (Intel::Item). An intelligence item consists of the indicator, the indicator's type and a meta data record (the fields prefixed with meta.), including the meta data source. The fields are separated by tabs. In terms of the data model, an item corresponds to the internal representation with n=1.

So what about the relations? The second thing to keep in mind regarding the data model is that intelligence files are only the supply mechanism used to feed intelligence data into Bro. The "database" Bro uses for matching is kept in memory and follows the model described above.

So how do the two interact? Bro uses the input framework to read intelligence files, and each line triggers a corresponding insert into the in-memory data structure. Imagine the same indicator was obtained from two different sources, each supplying different meta data, so that the indicator occurs multiple times (in a single file or in different files). When ingesting the files, Bro stores the indicator only once and associates both meta data records with it instead of duplicating it (see Figure 2).

#fields indicator indicator_type meta.source meta.desc meta.url
bro.org Intel::DOMAIN test-source1 domain for testing http://source1.com/id-x
bro.org Intel::DOMAIN test-source2 domain for testing http://source2.com/id-y

Figure 2: Internal representation of the example items
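To make the relation concrete, the following Python snippet models how the two example lines end up in memory. It is an illustrative sketch, not Bro's implementation: the store layout (a dict keyed by indicator and type, mapping each meta.source to its meta data record) and the names load_intel and EXAMPLE are assumptions made for this example, and the fields are assumed to be tab-separated as Bro's input framework expects.

```python
# Illustrative model of the in-memory intel "database" (not Bro's actual code):
# each (indicator, indicator_type) maps meta.source -> meta data record.
EXAMPLE = (
    "#fields\tindicator\tindicator_type\tmeta.source\tmeta.desc\tmeta.url\n"
    "bro.org\tIntel::DOMAIN\ttest-source1\tdomain for testing\thttp://source1.com/id-x\n"
    "bro.org\tIntel::DOMAIN\ttest-source2\tdomain for testing\thttp://source2.com/id-y\n"
)


def load_intel(text, store=None):
    """Insert every item line of a tab-separated intel file into the store."""
    store = {} if store is None else store
    fields = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names after the "#fields" tag
            continue
        if not line or line.startswith("#"):
            continue  # skip empty lines and other comment lines
        item = dict(zip(fields, line.split("\t")))
        key = (item["indicator"], item["indicator_type"])
        meta = {k[len("meta."):]: v for k, v in item.items() if k.startswith("meta.")}
        # The indicator is stored once; each source contributes one meta data record.
        store.setdefault(key, {})[meta["source"]] = meta
    return store


store = load_intel(EXAMPLE)
print(len(store))  # 1
print(sorted(store[("bro.org", "Intel::DOMAIN")]))  # ['test-source1', 'test-source2']
```

Both lines share the same key, so the indicator exists only once, with two meta data records distinguished by their source.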

When a file is changed (note: changes have to be atomic, e.g. by using mv), Bro rereads the whole file and updates the in-memory data structure. Changing meta data values causes a corresponding update. If the meta data source is changed, however, Bro assumes the indicator was obtained again from a new source and adds another meta data record to the given indicator. Likewise, Bro adds a new intelligence datum if the indicator or the indicator type was changed, while keeping the original item in the in-memory data structure. Accordingly, deleting a line from an intel file does not delete the corresponding intelligence item from Bro's database (see the next section on how to get rid of inserted intelligence data). In particular, this means that the intelligence files on disk do not necessarily reflect the actual database Bro uses for matching.
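The rereading behavior can be sketched with a similarly simplified Python model. This is an illustration under assumptions, not Bro code, and reread is a hypothetical helper. Each call stands for one complete, atomically replaced version of the file:

```python
# Illustrative model only: rereading a changed intel file reinserts every
# line, but never deletes anything from the in-memory store.
def reread(store, items):
    """Insert (indicator, type, source, desc) tuples; existing data is kept."""
    for indicator, indicator_type, source, desc in items:
        # (indicator, type) is the key; the source identifies the meta record.
        store.setdefault((indicator, indicator_type), {})[source] = {
            "source": source,
            "desc": desc,
        }


store = {}
reread(store, [("bro.org", "Intel::DOMAIN", "src1", "old desc")])
# Version 2 of the file: only the description changed -> record updated.
reread(store, [("bro.org", "Intel::DOMAIN", "src1", "new desc")])
# Version 3: the source was renamed -> a second meta data record is added.
reread(store, [("bro.org", "Intel::DOMAIN", "src2", "new desc")])
# Version 4: the line was deleted -> nothing is inserted, nothing is removed.
reread(store, [])
print(sorted(store[("bro.org", "Intel::DOMAIN")]))  # ['src1', 'src2']
```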

Now that we have the intelligence indicators at hand, let's take a quick look at how matching works. In theory, every piece of data that Bro's events make available can be used for matching. Once there is something that should be checked against the database of indicators, the datum is wrapped inside an Intel::Seen record and sent to the intel framework. The record contains the seen indicator, its type and additional information, e.g. where the indicator was seen. Bro comes with a set of policy scripts located in intel/seen/ that report indicators by evaluating well-known events. For example, connection_established provides IP addresses, and dns_request is used to extract domains. Figure 3 illustrates the data flow.

Figure 3: Data Flow of Bro's Intelligence Framework

Finally, there is one detail left that might not be that intuitive. When it comes to interacting with the intelligence framework, most of the functions, hooks and events use Intel::Item to pass information about indicators. If an indicator is associated with more than one meta data record, it is unrolled into a set of multiple items. For example, the items of the Intel::match event contain an Intel::Item record for every meta data record associated with the matched indicator. So we have been talking about a simple but still relational data model, and every time it is accessed it gets denormalized. Doesn't seem very smart, right? The reason is that both the input (intel files) and the output (Bro's log files) are based on CSV-like plain text files (although writing JSON is possible, nested structures are not supported). All in all, the intelligence framework's design provides an easy-to-use interface while offering as much flexibility as possible. Theory done. Time for some new features.
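The unrolling can be illustrated with a minimal Python model. The store layout and the seen helper are hypothetical. They mimic how a single matched indicator with two meta data records yields two items, as in the Intel::match event:

```python
# Illustrative model only: one indicator with two meta data records, as in
# the bro.org example. Keys are (indicator, indicator_type).
STORE = {
    ("bro.org", "Intel::DOMAIN"): {
        "test-source1": {"source": "test-source1", "url": "http://source1.com/id-x"},
        "test-source2": {"source": "test-source2", "url": "http://source2.com/id-y"},
    }
}


def seen(indicator, indicator_type):
    """Match a seen datum and return the unrolled items (one per meta record)."""
    metas = STORE.get((indicator, indicator_type), {})
    # Denormalize: copy the indicator and its type into every item.
    return [
        {"indicator": indicator, "indicator_type": indicator_type, "meta": meta}
        for meta in metas.values()
    ]


items = seen("bro.org", "Intel::DOMAIN")
print(len(items))  # 2: the single indicator is unrolled into two items
```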

Removing Intelligence Items


Prior to the refactoring, the only way to get rid of an intelligence item was to whitelist it using Seth Hall's intel-extensions (we will come back to that), and the only way to purge an item from the in-memory data structure was to restart Bro. For long-running live systems or frequently changing intelligence data, that was a major handicap. As part of the framework update, a new function has been added:

remove: function(item: Item, purge_indicator: bool &default = F);

The Intel::Item type represents a single line of an intelligence file and thus contains just a single meta data record. But keeping the data model in mind, there might be multiple meta data records associated with an indicator. In this case, only the meta data record matching the given meta data is deleted. As meta data records are identified by source, it is sufficient to specify only the source name inside the item that is passed to the function. If no meta data is left, the whole indicator is removed. If purge_indicator is set, the given meta data is ignored and the indicator is removed along with all associated meta data records.
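The removal rules described above can be condensed into a small Python model. This is a sketch of the documented semantics under the same hypothetical store layout (indicator and type mapping source to meta data), not the Bro function itself:

```python
# Illustrative model of the documented Intel::remove semantics (not Bro code).
def remove(store, item, purge_indicator=False):
    key = (item["indicator"], item["indicator_type"])
    if key not in store:
        return  # unknown indicator: nothing to do
    if purge_indicator:
        del store[key]  # the item's meta data is ignored
        return
    # Meta data records are identified by source, so the source alone suffices.
    store[key].pop(item["meta"]["source"], None)
    if not store[key]:
        del store[key]  # no meta data left: drop the indicator itself


def make_store():
    return {("bro.org", "Intel::DOMAIN"): {"src1": {}, "src2": {}}}


store = make_store()
remove(store, {"indicator": "bro.org", "indicator_type": "Intel::DOMAIN",
               "meta": {"source": "src1"}})
print(store)  # {('bro.org', 'Intel::DOMAIN'): {'src2': {}}}
```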

In principle, the new remove function allows any script to delete an intel item. Imagine you have accidentally added your web server's IP and alerts start flooding in. A small tool to remove that IP from your Bro instance without shutting Bro down would be great. These extensions contain a small Python script (utils/delete_intel.py) that connects to Bro using Broker and triggers item removal. The corresponding Bro script (scripts/remote_delete.bro) sets up Broker and handles incoming deletion requests (note: as Broker is under active development, the scripts may well not work with the current master by the time you read this):

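event Intel::remote_remove(indicator: string, indicator_type: string)
    {
    # Compose a minimal item from the values sent by the Python script.
    local item: Item = [
        $indicator = indicator,
        # Map the type name received as a string to the Intel::Type enum.
        $indicator_type = type_tbl[indicator_type],
        # The meta data is irrelevant here, as the indicator is purged.
        $meta = record($source = "")
    ];
    # purge_indicator = T: remove the indicator with all its meta data.
    remove(item, T);
    }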
The only thing done here is composing an Intel::Item record from the values sent by the Python script and calling the remove function with purge_indicator set (type_tbl is a string-indexed table that maps a string to the corresponding Intel::Type). Instead of this Broker-based solution, one could also write a script, analogous to the intelligence import, that reads files containing indicators to delete. While these possibilities are already quite useful, the following new feature provides another excellent use case. So let's continue.

Intelligence Expiration


Intelligence expiration is the new feature I like most. Imagine we are ingesting a large intelligence feed of probably malicious IPs into Bro. On the first day there is a hit indicating that some malware is calling home. On the second day there is nothing. But on the third day the owner of the IP has changed (think of agile cloud environments) and the system behind it now offers a legitimate service. As users start to use that service, false positives pop up. The bottom line is that most intelligence data has a natural half-life (hashes might be an exception here). So let's put an expiration date on it.

Bro's intelligence framework now allows configuring the Intel::item_expiration interval. Once an indicator expires, the intel framework executes the item_expired hook, passing the indicator, its type and the associated meta data as arguments. The hook can be used to handle expiration. By default, the intel framework does nothing except execute the hook, so we are free to use that mechanism for whatever we like. But if the hook chain is broken (see Bro's script reference for details about hooks), the expired indicator is removed automatically. Coming back to the IP feed example, all we need to do is configure the expiration interval and break the hook chain to remove expired indicators. As this is more or less the default case, Bro ships with a new policy script, do_expire.bro:

##! This script enables expiration for intelligence items.

@load base/frameworks/intel

module Intel;

redef Intel::item_expiration = 10min;

hook item_expired(indicator: string, indicator_type: Type,
    metas: set[MetaData]) &priority=-10
    {
    # Trigger removal of the expired item.
    break;
    }
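The interplay between the hook and automatic removal can be modeled in a few lines of Python. The sketch is hypothetical: a handler returning False stands in for Bro's break statement, and run_hook and expire are illustrative names, not Bro APIs:

```python
# Illustrative model of the item_expired hook chain (not Bro's implementation).
# A handler returning False stands in for `break` inside a Bro hook body.
def run_hook(handlers, *args):
    """Run (priority, handler) pairs, highest priority first; False if broken."""
    for _, handler in sorted(handlers, key=lambda h: h[0], reverse=True):
        if handler(*args) is False:
            return False
    return True


def expire(store, key, handlers):
    indicator, indicator_type = key
    # Only a broken hook chain causes the expired indicator to be removed.
    if not run_hook(handlers, indicator, indicator_type, store[key]):
        del store[key]


expired_log = []


def log_expiry(indicator, indicator_type, metas):
    expired_log.append(indicator)  # observes the expiration, does not break


def do_expire(indicator, indicator_type, metas):
    return False  # like the `break` in do_expire.bro (priority -10)


store = {("203.0.113.7", "Intel::ADDR"): {"feed": {}}}
expire(store, ("203.0.113.7", "Intel::ADDR"), [(0, log_expiry)])
print(len(store))  # 1: hook ran, chain not broken, indicator kept
expire(store, ("203.0.113.7", "Intel::ADDR"), [(0, log_expiry), (-10, do_expire)])
print(len(store))  # 0: chain broken, indicator removed
```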
So all that is left for us to do is load that script and adapt the expiration interval to our needs:

@load intel/do_expire
redef Intel::item_expiration = 2days;
Neat, isn't it? Something to keep in mind is that the expiration timer starts as soon as an item is inserted into Bro. If the item is "reinserted", the expiration timer is reset. Note: whenever an intelligence file changes, all items listed in the file are reinserted! Technically, this allows keeping the intel files and the intel database inside Bro in sync. For example, one could define an expiration interval of 1 hour plus a 30-minute buffer. Scheduling an update of the intel files every hour would then reset the expiration timer of all indicators corresponding to items contained in the files, while indicators of items that have been removed from the files expire within the given time frame.
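The synchronization scheme from the last paragraph can be checked with a toy Python model. It assumes, as described above, that every reinsertion resets the timer; times are plain minutes and all names are hypothetical:

```python
# Illustrative model (not Bro code): reinsertion resets an item's expiration
# timer. Times are in minutes; 90 = 1 hour update interval + 30 minutes buffer.
EXPIRATION = 90

last_insert = {}  # indicator -> time of last (re)insertion


def refresh(indicators, now):
    """Model an hourly intel file update: every listed item is reinserted."""
    for indicator in indicators:
        last_insert[indicator] = now


def expired(now):
    """Indicators whose timer has run out by `now`."""
    return sorted(i for i, t in last_insert.items() if now - t >= EXPIRATION)


refresh({"198.51.100.1", "198.51.100.2"}, now=0)
refresh({"198.51.100.1"}, now=60)  # .2 was dropped from the feed
refresh({"198.51.100.1"}, now=120)
print(expired(now=150))  # ['198.51.100.2']
```

Because the buffer exceeds the update interval, an indicator still listed in the files never reaches its expiration time, while a dropped one expires at most 90 minutes after its last reinsertion.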

Extending the Intelligence Framework


The next feature worth discussing is the new extension mechanism. To be precise, this feature is not completely new, as it is based on the intel extensions created by Seth Hall (see https://github.com/sethhall/intel-ext). The idea is to allow reacting to an intelligence match. As this turned out to be very useful, it was integrated into the intelligence framework. Now it is possible to influence the framework's matching behavior via the extend_match hook. The hook receives the info record to log, the seen record that was observed and the set of items that have been matched (remember, the set of items is the unrolled internal representation). A hook may change these values and can thus influence what is logged. Additionally, breaking the hook chain prevents the intelligence framework from logging the match at all. A good example of how to use the new mechanism is whitelisting indicators: the indicators are kept in memory for matching, but logging of their matches is prevented. Bro already ships with a policy script (whitelist.bro, see the Bro repository) that implements whitelisting. But as this was already part of Seth's intel extensions, let's discuss another thing that can be achieved using the new extension mechanism.
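Whitelisting through extend_match can be sketched in Python as follows. The whitelist flag on the meta data is an assumption for this illustration (the shipped whitelist.bro may differ in detail), and a handler returning False again stands in for break:

```python
# Illustrative model of extend_match-based whitelisting (not Bro's whitelist.bro).
# Assumption: a meta data record may carry a hypothetical "whitelist" flag.
def whitelist_hook(info, seen, items):
    if any(item["meta"].get("whitelist", False) for item in items):
        return False  # stands in for `break`: suppress logging of this match
    return True


def handle_match(info, seen, items, hooks, log):
    # The indicator still matched; only the logging step can be skipped.
    for hook in hooks:
        if hook(info, seen, items) is False:
            return
    log.append(info)


log = []
seen = {"indicator": "192.0.2.10", "indicator_type": "Intel::ADDR"}
items = [{"meta": {"source": "feed"}}, {"meta": {"source": "local", "whitelist": True}}]
handle_match({"seen": seen}, seen, items, [whitelist_hook], log)
print(log)  # []: the whitelisted indicator matched, but nothing was logged
```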

A desirable feature would be the ability to enrich an item's meta data with some extra information, aggregate this information in case of a match and extend the intel log accordingly. For example, one could add identifiers for the local Security Information and Event Management (SIEM) system. The following script does the job:

module Intel;

export {
    redef record MetaData += {
        ## My SIEM identifier.
        siem_id: string &optional;
    };

    redef record Info += {
        ## Set of SIEM IDs involved.
        siem_ids: set[string] &optional &log;
    };
}

hook extend_match(info: Info, s: Seen, items: set[Item])
    {
    info$siem_ids = set();
    for ( item in items )
        {
        if ( item$meta?$siem_id )
            add info$siem_ids[item$meta$siem_id];
        }
    }
First, the Intel::MetaData record is extended with a SIEM identifier. Then the Intel::Info record is extended to allow logging a set of SIEM identifiers (keep in mind that a single indicator could have been obtained from multiple sources, resulting in multiple associated meta data records). Note that the fields added to the Info record have to be optional or have a default value, as the record is created by the intel framework, which does not know about the new field. Finally, the hook implements the aggregation logic: it initializes the set, loops over the matched items and adds each SIEM identifier, if present, to the set. That's it.

Using this feature, the intelligence framework can be extended in multiple ways. Let me give just one last example. Let's assume we have a feed that publishes domains generated by Domain Generation Algorithms (DGAs). These domains are only valid for a certain time window, and that window might differ for each DGA. Just ingesting the feed would sooner or later exhaust memory and hurt performance. So what to do? We can combine the expiration feature and the extension feature to implement per-item expiration. The do_item_expire.bro script does exactly this, allowing individual expiration timeouts to be defined using the new meta data field expire.
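A rough Python sketch of the per-item idea follows. It assumes each meta data record stores its insertion time and its own expire interval. It is not the code of do_item_expire.bro, and all names are hypothetical:

```python
# Illustrative sketch of per-item expiration (not the actual do_item_expire.bro).
# Assumption: each meta data record carries its own "expire" interval (minutes),
# e.g. the validity window of the DGA that generated the domain.
store = {
    ("a1b2c3.example", "Intel::DOMAIN"): {"dga-feed": {"inserted": 0, "expire": 60}},
    ("x9y8z7.example", "Intel::DOMAIN"): {"dga-feed": {"inserted": 0, "expire": 1440}},
}


def expire_items(store, now):
    """Drop meta records whose own window elapsed; drop indicators left empty."""
    for key in list(store):
        metas = store[key]
        for source in [s for s, m in metas.items() if now - m["inserted"] >= m["expire"]]:
            del metas[source]
        if not metas:
            del store[key]


expire_items(store, now=120)
print(sorted(ind for ind, _ in store))  # ['x9y8z7.example']
```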

The Small Things


Last but not least, there are some minor improvements to the intelligence framework. Minor in terms of visible effect, that is: what is visible is definitely just the tip of the iceberg! We don't need to dive into the details, so let's stick with the good news: the intelligence framework supports subnets. The new type Intel::SUBNET can be used to ingest subnets in CIDR notation. The subnets are matched against seen addresses. Thus, a hit on a single IP could be triggered by an intelligence item describing the exact address, a subnet containing the address, or both. To distinguish these cases, the new field matched was added to the intel log. As the name suggests, matched is the set of intelligence types that triggered the hit. Furthermore, subnets might overlap. Assume Bro's intelligence database contains 192.168.23.0/26 and 192.168.0.0/16. In this case, seeing 192.168.23.42 would likewise trigger a single hit, caused by multiple indicators. At this point it is useful to recall the data model again: a single Intel::Seen record that is reported can trigger multiple indicators, and each indicator can have multiple meta data records attached, as the same indicator can be obtained from different sources. So in the case of addresses, there are two levels of indirection.
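The first level of indirection on the address side can be illustrated with Python's standard ipaddress module. The database contents and match_addr are hypothetical, and the sketch leaves out the meta data level:

```python
import ipaddress

# Illustrative model of address matching (not Bro's implementation): the
# database holds one exact address and two overlapping subnets.
ADDRS = {ipaddress.ip_address("192.168.23.42")}
SUBNETS = [ipaddress.ip_network("192.168.23.0/26"), ipaddress.ip_network("192.168.0.0/16")]


def match_addr(seen):
    """Return the matching indicators and the 'matched' set of intel types."""
    addr = ipaddress.ip_address(seen)
    indicators, matched = [], set()
    if addr in ADDRS:
        indicators.append(str(addr))
        matched.add("Intel::ADDR")
    for net in SUBNETS:
        if addr in net:  # a subnet indicator matches any contained address
            indicators.append(str(net))
            matched.add("Intel::SUBNET")
    return indicators, sorted(matched)


print(match_addr("192.168.23.42"))  # one hit: three indicators, both types
print(match_addr("192.168.100.1"))  # only the /16 subnet matches
```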

A couple of other small changes hide inside the do_notice.bro policy script. Notice emails now contain the service(s) inferred for the connection that triggered the hit, as well as the intel source of the matched indicator. Additionally, an identifier is added to the notice. Having an identifier allows notice suppression in the notice framework. To suppress intel notices for 12 hours, we just need a simple redef:

redef Notice::type_suppression_intervals += {
    [Intel::Notice] = 12hr,
};
The notice identifier is composed of the indicator and the originator's and responder's IPs, without considering the direction of the flow. Thus, all connections between two IPs regarding the given indicator are ignored for the defined suppression interval. Note that only the corresponding notices are suppressed; the intel log will still contain all hits.
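A direction-agnostic identifier combined with the suppression interval can be modeled in Python. Sorting the two addresses is an assumption about how direction independence might be achieved, not necessarily what do_notice.bro does. notice_identifier and Suppressor are hypothetical names:

```python
import ipaddress

# Illustrative model (not do_notice.bro itself): an identifier built from the
# indicator and both IPs, ordered so that flow direction does not matter.
SUPPRESSION = 12 * 3600  # seconds, matching the 12hr redef above


def notice_identifier(indicator, orig_h, resp_h):
    low, high = sorted((orig_h, resp_h), key=ipaddress.ip_address)
    return f"{indicator}|{low}|{high}"


class Suppressor:
    """Raise a notice at most once per identifier within SUPPRESSION seconds."""

    def __init__(self):
        self.last_raised = {}

    def should_raise(self, identifier, now):
        last = self.last_raised.get(identifier)
        if last is not None and now - last < SUPPRESSION:
            return False  # suppressed; the intel log would still record the hit
        self.last_raised[identifier] = now
        return True


s = Suppressor()
a = notice_identifier("evil.example", "10.0.0.5", "192.0.2.1")
b = notice_identifier("evil.example", "192.0.2.1", "10.0.0.5")  # reverse flow
print(a == b)                            # True
print(s.should_raise(a, now=0))          # True
print(s.should_raise(b, now=3600))       # False: same pair, within 12 hours
print(s.should_raise(b, now=13 * 3600))  # True: the interval has passed
```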

Summary


This blog post discussed the data model of Bro's intelligence framework and the new remove function. Furthermore, it explained the intelligence expiration and match extension mechanisms. Finally, it reviewed the new type for subnets and the changes to the do_notice.bro script. I hope this post sheds some light on the ideas behind Bro's intelligence framework. Have fun integrating the framework into your Bro deployment!
