Thursday, December 13, 2012

The Tree of Trust

As we mentioned in our preceding blog posting, ICSI has been harvesting details about SSL connections and their contained certificates since the beginning of this year.
We use the data to provide a notary service to the community, which can be used to retrieve information about individual certificates.

To enable a better understanding of the relationships between root and intermediate Certificate Authorities (CAs), we created the tree of trust, an interactive graph of these relationships.

In this graph, each node represents a CA, where red nodes correspond to root CAs and green nodes to intermediates. The node diameter scales logarithmically with the number of certificates signed by the node. Similarly, the color of the green nodes scales proportionally with the diameter.

Clicking on a CA reveals further information, including the exact number of certificates that have been signed by it, the full subject, and the validity periods. Moreover, the search bar allows for quick location of a CA by name.

We generated the graph by validating all currently existing certificates in our notary database. For all certificates that chained up to one of the roots in the current Mozilla root-store, we recorded the whole path. For the tree of trust, we merged all the paths together and summed up the number of certificates that each CA signed.

In the graph, the CA that signed the largest number of certificates is the Go Daddy Secure Certification Authority, an intermediate of GoDaddy. Our current dataset contains over 74,000 certificates that it signed.

The DFN-Verein CA has signed the largest number of intermediate CA certificates. The DFN provides certificates for German higher education institutions and also for many German research institutions. It creates a unique sub-CA for each institution for which it issues certificates. This is done for administrative reasons, and the DFN retains full control over all of its child CAs by not revealing the sub-CAs' private keys to the individual institutions.

Friday, November 2, 2012

Using the ICSI Certificate Notary

Today, we are happy to publicly announce the ICSI Certificate Notary. This service provides near real-time reputation information on a large number of TLS/SSL certificates seen in the wild, collected continuously by Bro at several partner network sites. The notary’s data includes the time when a certificate was first and last seen, and whether we can establish a valid chain to a root certificate from the Mozilla root store. You can use the service by sending a DNS request for an A or TXT record to:

<sha1>.notary.icsi.berkeley.edu

The token <sha1> represents the SHA1 digest of the certificate to query. For A record queries, the result comes back either as the address 127.0.0.1 to indicate that our data providers have seen the certificate, as 127.0.0.2 if we could recently validate the certificate against the Mozilla root store, or NXDOMAIN if we have not seen the certificate. For TXT record queries, the notary returns key-value pairs with more details. Here is an example lookup:

dig +short txt C1956DC8A7DFB2A5A56934DA09778E3A11023358.all.notary.icsi.berkeley.edu
"version=1 first_seen=15387 last_seen=15646 times_seen=260 validated=1"

Incidentally, Vlad Grigorescu recently taught Bro how to handle DNS TXT records, which now opens new possibilities in terms of real-time certificate analysis. If you do not remember how to perform DNS lookups from a Bro script, here is an example:
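
A minimal sketch (lookup_hostname is asynchronous and therefore has to be called from within a when statement):

event bro_init()
    {
    # Resolve a host name asynchronously; addrs is a set of addresses.
    when ( local addrs = lookup_hostname("www.bro-ids.org") )
        {
        for ( a in addrs )
            print a;
        }
    }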

Vlad’s additions now also enable TXT queries via the function lookup_hostname_txt. The snippet below asks our notary for details of each certificate in the network traffic:
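
A sketch of what such a snippet could look like, assuming the Bro 2.1 x509_certificate event signature and the sha1_hash built-in for computing the certificate digest:

event x509_certificate(c: connection, is_orig: bool, cert: X509, chain_idx: count, chain_len: count, der_cert: string)
    {
    # Only query for the server's end-entity certificate, not the rest of the chain.
    if ( is_orig || chain_idx != 0 )
        return;

    local digest = sha1_hash(der_cert);

    when ( local result = lookup_hostname_txt(fmt("%s.notary.icsi.berkeley.edu", digest)) )
        {
        # result holds the key-value pairs returned by the notary.
        print fmt("notary answer for %s: %s", digest, result);
        }
    }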

Please let us know if you have questions, find problems, or have feature requests.

Wednesday, August 29, 2012

Bro 2.1 Release

We are very excited to release Bro 2.1 today. See the download page for the source code; binary packages will come soon.

Bro 2.1 comes with extensive support for IPv6, tunnel decapsulation, a new input framework for integrating external information in real-time into the processing, support for load-balancing in BroControl, two new experimental log output formats (DataSeries, ElasticSearch), and many more improvements and fixes throughout the code base. See the NEWS for the release notes, and the CHANGES for the exhaustive commit list. BroControl has its own CHANGES.

Many thanks to everybody who helped us test the earlier beta version, much appreciated.

Wednesday, August 1, 2012

Bro 2.1 Public Beta

Just in time for the upcoming Bro Exchange, we are happy to announce a public beta of Bro 2.1. Head over to the download page to get the source.

Bro 2.1 comes with extensive support for IPv6, tunnel decapsulation, a new input framework for integrating external information in real-time into the processing, support for load-balancing in BroControl, two new experimental log output formats (DataSeries, ElasticSearch), and many more improvements and fixes throughout the code base. See the NEWS for the preliminary release notes, and the CHANGES for the exhaustive commit list. 

Feedback is welcome and best sent to the Bro mailing list. Please note, though, that we do not yet recommend this beta for production use.

Thursday, June 21, 2012

Bro Exchange 2012 Registration Form Posted

The Bro Exchange 2012 registration form has just been activated. This event is being organized on a short timeline, so please register soon and send your presentation proposals to us at info@bro-ids.org.


For more information about travel and lodging, please refer to our website about the Bro Exchange.

Monday, June 11, 2012

Bro Exchange 2012: Dates finalized

We have now finalized the dates for our upcoming Bro users meeting: the Bro Exchange 2012 will take place on August 7-8, 2012, at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado.

See this web page for more information about logistics.  Registration will open soon, but the page already has hotel information; make sure to reserve your room soon. We are also still looking for more presentation proposals.

Friday, June 1, 2012

Upcoming: Loading Data into Bro with the Input Framework

Bro now features a flexible input framework that allows users to import data into Bro. Data is either read into Bro tables or converted to events which can then be handled by scripts.

The input framework has been merged into the git master, and we give a short summary of how to use it below. The input framework is automatically compiled and installed together with Bro. The interface to it is exposed via the scripting layer.

This post gives the most common examples. For more complex scenarios it is worthwhile to take a look at the unit tests in testing/btest/scripts/base/frameworks/input/.

Reading Data into Tables

Probably the most interesting use-case of the input framework is to read data into a Bro table.

By default, the input framework reads the data in the same format as it is written by the logging framework in Bro - a tab-separated ASCII file.

We will show the ways to read files into Bro with a simple example. For this example we assume that we want to import data from a blacklist that contains server IP addresses as well as the timestamp and the reason for the block.

An example input file could look like this:

#fields ip timestamp reason
192.168.17.1 1333252748 Malware host
192.168.27.2 1330235733 Botnet server
192.168.250.3 1333145108 Virus detected

To read a file into a Bro table, two record types have to be defined. One contains the types and names of the columns that should constitute the table keys and the second contains the types and names of the columns that should constitute the table values.

In our case, we want to be able to look up IPs. Hence, our key record only contains the server IP. All other elements should be stored as the table content.

The two records are defined as:

type Idx: record {
        ip: addr;
};

type Val: record {
        timestamp: time;
        reason: string;
};

Note that the names of the fields in the record definitions have to correspond to the column names listed in the '#fields' line of the log file, in this case 'ip', 'timestamp', and 'reason'.

The log file is read into the table with a simple call to the Input::add_table function:

global blacklist: table[addr] of Val = table();

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist]);
Input::remove("blacklist");

With these three lines we first create an empty table that should contain the blacklist data and then instruct the Input framework to open an input stream named blacklist to read the data into the table. The third line removes the input stream again, because we do not need it any more after the data has been read.

Because some data files can - potentially - be rather big, the input framework works asynchronously. A new thread is created for each new input stream. This thread opens the input data file, converts the data into a Bro format and sends it back to the main Bro thread.

Because of this, the data is not immediately accessible. Depending on the size of the data source it might take from a few milliseconds up to a few seconds until all data is present in the table. Please note that this means that when Bro is running without an input source or on very short captured files, it might terminate before the data is present in the system (because Bro already handled all packets before the import thread finished).

Subsequent calls to an input source are queued until the previous action has been completed. Because of this, it is, for example, possible to call add_table and remove in two subsequent lines: the remove action will remain queued until the first read has been completed.

When the input framework finishes reading from a data source, it fires the update_finished event. Once this event has been received, all data from the input file is available in the table.

event Input::update_finished(name: string, source: string) {
        # now all data is in the table
        print blacklist;
}

The table can also already be used while the data is still being read - it just might not contain all lines of the input file before the event has fired. After it has been populated, it can be used like any other Bro table, and blacklist entries can easily be tested:

if ( 192.168.18.12 in blacklist )
        print "address is blacklisted";  # take action
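
For instance (a sketch), a handler could check each new connection against the table:

event connection_established(c: connection)
    {
    if ( c$id$resp_h in blacklist )
        print fmt("connection to blacklisted host %s", c$id$resp_h);
    }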

Re-reading and streaming data

For many data sources, such as blacklists, the source data is continually changing. For these cases, the Bro input framework supports several ways to deal with changing data files.

The first, very basic method is an explicit refresh of an input stream. When an input stream is open, the function force_update can be called. This will trigger a complete refresh of the table; any changed elements from the file will be updated. After the update is finished the update_finished event will be raised.

In our example the call would look like:

Input::force_update("blacklist");

The input framework also supports two automatic refresh modes. The first mode continually checks if a file has been changed. If the file has been changed, it is re-read and the data in the Bro table is updated to reflect the current state. Each time a change has been detected and all the new data has been read into the table, the update_finished event is raised.

The second mode is a streaming mode. This mode assumes that the source data file is an append-only file to which new data is continually appended. Bro continually checks for new data at the end of the file and will add the new data to the table. If newer lines in the file have the same index as previous lines, they will overwrite the values in the output table. Because of the nature of streaming reads (data is continually added to the table), the update_finished event is never raised when using streaming reads.

The reading mode can be selected by setting the mode option of the add_table call. Valid values are MANUAL (the default), REREAD and STREAM.

Hence, when adding $mode=Input::REREAD to the previous example, the blacklist table will always reflect the state of the blacklist input file.

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD]);
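
The streaming case looks analogous (a sketch):

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::STREAM]);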

Receiving change events

When re-reading files, it might be interesting to know exactly which lines in the source files have changed.

For this reason, the input framework can raise an event each time a data item is added to, removed from, or changed in a table.

The event definition looks like this:

event entry(description: Input::TableDescription, tpe: Input::Event, left: Idx, right: Val) {
        # act on values
}

The event has to be specified in $ev in the add_table call:

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD, $ev=entry]);

The description field of the event contains the arguments that were originally supplied to the add_table call. Hence, the name of the stream can, for example, be accessed with description$name. tpe is an enum containing the type of the change that occurred.

It will contain Input::EVENT_NEW when a line that was not previously present in the table has been added. In this case, left contains the index of the added table entry and right contains the values of the added entry.

If a table entry that was already present is altered during the re-reading or streaming read of a file, tpe will contain Input::EVENT_CHANGED. In this case, left contains the index of the changed table entry and right contains the values of the entry before the change. The reason for this is that the table has already been updated when the event is raised, so the current value can be obtained by looking it up in the table. Hence it is possible to compare the new and the old value.

tpe contains Input::EVENT_REMOVED when a table element is removed because it was no longer present during a re-read. In this case, left contains the index and right the values of the removed element.

Filtering data during import

The input framework also allows a user to filter the data during the import. To this end, predicate functions are used. A predicate function is called before a new element is added to, changed in, or removed from a table. The predicate can either accept or veto the change by returning true for an accepted change and false for a rejected change. Furthermore, it can alter the data before it is written to the table.

The following example predicate will reject adding entries to the table when they were generated more than a month ago. It will accept all changes and all removals of values that are already present in the table.

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $mode=Input::REREAD,
                $pred(typ: Input::Event, left: Idx, right: Val) = {
                        if ( typ != Input::EVENT_NEW ) {
                                return T;
                        }
                        return ( ( current_time() - right$timestamp ) < (30 day) );
                }]);

To change elements while they are being imported, the predicate function can manipulate left and right. Note that predicate functions are called before the change is committed to the table. Hence, when a table element is changed (tpe is Input::EVENT_CHANGED), left and right contain the new values, but the destination (blacklist in our example) still contains the old values. This allows predicate functions to examine the differences between the old and the new version before deciding if the change should be allowed.

Different readers

The input framework supports different kinds of readers for different kinds of source data files. The default reader reads ASCII files formatted in the Bro log-file format (tab-separated values). At the moment, Bro comes with two other readers. The RAW reader reads a file that is split by a specified record separator (usually newline). The contents are returned line-by-line as strings; it can, for example, be used to read configuration files and the like, and is probably only useful in the event mode and not for reading data into tables.
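
A rough sketch of the raw reader in event mode (event streams are covered below; this assumes the reader enum value Input::READER_RAW and a single string field per line):

type Line: record {
        s: string;
};

event raw_line(description: Input::EventDescription, tpe: Input::Event, s: string) {
        print fmt("read line: %s", s);
}

event bro_init() {
        Input::add_event([$source="/etc/motd", $name="motd", $reader=Input::READER_RAW, $fields=Line, $ev=raw_line]);
}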

Another included reader is the BENCHMARK reader, which is being used to optimize the speed of the input framework. It can generate arbitrary amounts of semi-random data in all Bro data types supported by the input framework.

In the future, the input framework will get support for new data sources like, for example, different databases.

Add_table options

This section lists all possible options that can be used for the add_table function and gives a short explanation of their use. Most of the options have already been discussed in the previous sections.

The possible fields that can be set for a table stream are:

source
A mandatory string identifying the source of the data. For the ASCII reader this is the filename.
name
A mandatory name for the filter that can later be used to manipulate it further.
idx
Record type that defines the index of the table.
val
Record type that defines the values of the table.
reader
The reader used for this stream. Default is READER_ASCII.
mode
The mode in which the stream is opened. Possible values are MANUAL, REREAD and STREAM. Default is MANUAL. MANUAL means that the file is not updated after it has been read; changes to the file will not be reflected in the data Bro knows about. REREAD means that the whole file is read again each time a change is found; this should be used for files that are mapped to a table where individual lines can change. STREAM means that the data from the file is streamed; events / table entries will be generated as new data is added to the file.
destination
The destination table
ev
Optional event that is raised when values are added to, changed in, or deleted from the table. Events are passed an Input::Event description as the first argument, the index record as the second argument, and the values as the third argument.
pred
Optional predicate that can prevent entries from being added to the table and events from being sent.
want_record
Boolean value that defines whether the event wants to receive the fields inside a single record value, or individually (default). This can be used if val is a record containing only one type. In this case, if want_record is set to false, the table will contain elements of the type contained in val (see the sketch after this list).
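
A small sketch of that case, using a single-field value record (hypothetical names):

type Idx: record {
        ip: addr;
};

type Val: record {
        reason: string;
};

# With $want_record=F, the table values are plain strings rather than single-field records.
global blacklist: table[addr] of string = table();

Input::add_table([$source="blacklist.file", $name="blacklist", $idx=Idx, $val=Val, $destination=blacklist, $want_record=F]);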

Reading data to events

The second supported mode of the input framework is reading data into Bro events instead of into a table, using event streams.

Event streams work very similarly to table streams that were already discussed in much detail. To read the blacklist of the previous example into an event stream, the following Bro code could be used:

type Val: record {
        ip: addr;
        timestamp: time;
        reason: string;
};

event blacklistentry(description: Input::EventDescription, tpe: Input::Event, ip: addr, timestamp: time, reason: string) {
        # work with event data
}

event bro_init() {
        Input::add_event([$source="blacklist.file", $name="blacklist", $fields=Val, $ev=blacklistentry]);
}

The main difference in the declaration of the event stream is that an event stream needs no separate index and value declarations -- instead, all source data types are provided in a single record definition.

Apart from this, event streams work exactly the same as table streams and support most of the options that are also supported for table streams.

The options that can be set when creating an event stream with add_event are:

source
A mandatory string identifying the source of the data. For the ASCII reader this is the filename.
name
A mandatory name for the stream that can later be used to remove it.
fields
Name of a record type containing the fields which should be retrieved from the input stream.
ev
The event which is fired after a line has been read from the input source. The first argument that is passed to the event is an Input::Event structure, followed by the data, either inside of a record (if want_record is set) or as individual fields. The Input::Event structure can indicate whether the received line is NEW, has been CHANGED or DELETED. Since the ASCII reader cannot track this information for event streams, the value is always NEW at the moment.
mode
The mode in which the stream is opened. Possible values are MANUAL, REREAD and STREAM. Default is MANUAL. MANUAL means that the file is not updated after it has been read; changes to the file will not be reflected in the data Bro knows about. REREAD means that the whole file is read again each time a change is found; this should be used for files that are mapped to a table where individual lines can change. STREAM means that the data from the file is streamed; events / table entries will be generated as new data is added to the file.
reader
The reader used for this stream. Default is READER_ASCII.
want_record
Boolean value that defines whether the event wants to receive the fields inside a single record value, or individually (default). If this is set to true, the event will receive a single record of the type provided in fields.

Friday, May 25, 2012

Upcoming: Binary Output with DataSeries

Bro's default ASCII log format is not exactly the most efficient way of storing and searching large volumes of data. As an alternative, Bro 2.1 will come with experimental support for DataSeries output, an efficient binary format for recording structured bulk data. DataSeries is developed and maintained at HP Labs.

The code is now merged into git, and we'll give a summary below on how to use it. At this time, we see the DataSeries support primarily as an experiment for understanding the utility of alternative output formats; feedback is appreciated. As usual, feel free to send questions to the mailing list and file tickets with our tracker for specific bugs and feature requests.

Bro's DataSeries module also constitutes a case study on writing output plugins using the new internal API we added in Bro 2.0. Adding new output formats is pretty straight-forward now, and we have a few more things in mind here for the future.

Installing DataSeries

To use DataSeries, its libraries must be available at compile-time, along with the supporting Lintel package. Generally, both are distributed on HP Labs' web site. Currently, however, you need to use recent development versions of both packages, which you can download from github like this:

git clone http://github.com/dataseries/Lintel
git clone http://github.com/dataseries/DataSeries

To build and install the two into <prefix>, do:

( cd Lintel     && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )
( cd DataSeries && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )

Please refer to the packages' documentation for more information about the installation process. In particular, there's more information on required and optional dependencies for Lintel and dependencies for DataSeries. For users on RedHat-style systems, you'll need the following:

yum install libxml2-devel boost-devel

Compiling Bro with DataSeries Support

Once you have installed DataSeries, Bro's configure should pick it up automatically as long as it finds it in a standard system location. Alternatively, you can specify the DataSeries installation prefix manually with --with-dataseries=<prefix>. Keep an eye on configure's summary output; if it looks like the following, Bro found DataSeries and will compile in the support:

# ./configure --with-dataseries=/usr/local
[...]
====================|  Bro Build Summary  |=====================
[...]
DataSeries:        true
[...]
================================================================

Activating DataSeries

The direct way to use DataSeries is to switch all log files over to the binary format. To do that, just add redef Log::default_writer=Log::WRITER_DATASERIES; to your local.bro. For testing, you can also just pass that on the command line:

bro -r trace.pcap Log::default_writer=Log::WRITER_DATASERIES

With that, Bro will now write all its output into DataSeries files *.ds. You can inspect these using DataSeries's set of command line tools, which its installation process installs into <prefix>/bin. For example, to convert a file back into an ASCII representation:

$ ds2txt conn.ds
[... We skip a bunch of meta data here ...]
ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes
1300475167.096535 CRCC5OdDlXe 141.142.220.202 5353 224.0.0.251 5353 udp dns 0.000000 0 0 S0 F 0 D 1 73 0 0
1300475167.097012 o7XBsfvo3U1 fe80::217:f2ff:fed7:cf65 5353 ff02::fb 5353 udp  0.000000 0 0 S0 F 0 D 1 199 0 0
1300475167.099816 pXPi1kPMgxb 141.142.220.50 5353 224.0.0.251 5353 udp  0.000000 0 0 S0 F 0 D 1 179 0 0
1300475168.853899 R7sOc16woCj 141.142.220.118 43927 141.142.2.2 53 udp dns 0.000435 38 89 SF F 0 Dd 1 66 1 117
1300475168.854378 Z6dfHVmt0X7 141.142.220.118 37676 141.142.2.2 53 udp dns 0.000420 52 99 SF F 0 Dd 1 80 1 127
1300475168.854837 k6T92WxgNAh 141.142.220.118 40526 141.142.2.2 53 udp dns 0.000392 38 183 SF F 0 Dd 1 66 1 211
[...]

(--skip-all suppresses the meta data.)

Note that the ASCII conversion is not equivalent to Bro's default output format.

You can also switch only individual files over to DataSeries by adding code like this to your local.bro:

event bro_init()
    {
    local f = Log::get_filter(Conn::LOG, "default"); # Get default filter for connection log.
    f$writer = Log::WRITER_DATASERIES;               # Change writer type.
    Log::add_filter(Conn::LOG, f);                   # Replace filter with adapted version.
    }

Bro's DataSeries writer comes with a few tuning options, see the script scripts/base/frameworks/logging/writers/dataseries.bro in the Bro distribution.

Working with DataSeries

Here are a few examples of using DataSeries' command line tools to work with the output files.

  • Printing CSV:

    $ ds2txt --csv conn.ds
    ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,duration,orig_bytes,resp_bytes,conn_state,local_orig,missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes
    1258790493.773208,ZTtgbHvf4s3,192.168.1.104,137,192.168.1.255,137,udp,dns,3.748891,350,0,S0,F,0,D,7,546,0,0
    1258790451.402091,pOY6Rw7lhUd,192.168.1.106,138,192.168.1.255,138,udp,,0.000000,0,0,S0,F,0,D,1,229,0,0
    1258790493.787448,pn5IiEslca9,192.168.1.104,138,192.168.1.255,138,udp,,2.243339,348,0,S0,F,0,D,2,404,0,0
    1258790615.268111,D9slyIu3hFj,192.168.1.106,137,192.168.1.255,137,udp,dns,3.764626,350,0,S0,F,0,D,7,546,0,0
    [...]
    

    Add --separator=X to set a different separator.

  • Extracting a subset of columns:

    $ ds2txt --select '*' ts,id.resp_h,id.resp_p --skip-all conn.ds
    1258790493.773208 192.168.1.255 137
    1258790451.402091 192.168.1.255 138
    1258790493.787448 192.168.1.255 138
    1258790615.268111 192.168.1.255 137
    1258790615.289842 192.168.1.255 138
    [...]
    
  • Filtering rows:

    $ ds2txt --where '*' 'duration > 5 && id.resp_p > 1024' --skip-all  conn.ds
    1258790631.532888 V8mV5WLITu5 192.168.1.105 55890 239.255.255.250 1900 udp  15.004568 798 0 S0 F 0 D 6 966 0 0
    1258792413.439596 tMcWVWQptvd 192.168.1.105 55890 239.255.255.250 1900 udp  15.004581 798 0 S0 F 0 D 6 966 0 0
    1258794195.346127 cQwQMRdBrKa 192.168.1.105 55890 239.255.255.250 1900 udp  15.005071 798 0 S0 F 0 D 6 966 0 0
    1258795977.253200 i8TEjhWd2W8 192.168.1.105 55890 239.255.255.250 1900 udp  15.004824 798 0 S0 F 0 D 6 966 0 0
    1258797759.160217 MsLsBA8Ia49 192.168.1.105 55890 239.255.255.250 1900 udp  15.005078 798 0 S0 F 0 D 6 966 0 0
    1258799541.068452 TsOxRWJRGwf 192.168.1.105 55890 239.255.255.250 1900 udp  15.004082 798 0 S0 F 0 D 6 966 0 0
    [...]
    
  • Calculate some statistics:

    Mean/stdev/min/max over a column:

    $ dsstatgroupby '*' basic duration from conn.ds
    # Begin DSStatGroupByModule
    # processed 2159 rows, where clause eliminated 0 rows
    # count(*), mean(duration), stddev, min, max
    2159, 42.7938, 1858.34, 0, 86370
    [...]
    

    Quantiles of total connection volume:

    > dsstatgroupby '*' quantile 'orig_bytes + resp_bytes' from conn.ds
    [...]
    2159 data points, mean 24616 +- 343295 [0,1.26615e+07]
    quantiles about every 216 data points:
    10%: 0, 124, 317, 348, 350, 350, 601, 798, 1469
    tails: 90%: 1469, 95%: 7302, 99%: 242629, 99.5%: 1226262
    [...]
    

The man pages for these tools show further options, and their -h option gives some more information (either can be a bit cryptic unfortunately though).

Deficiencies

Due to limitations of the DataSeries format, one unfortunately cannot inspect files before they have been fully written. In other words, when using DataSeries, it's currently not possible to inspect the live log files inside the spool directory before they are rotated to their final location. It seems that this could be fixed with some effort, and we plan to work with the DataSeries development team on that if the format gains traction among Bro users.

Likewise, we're considering writing custom command line tools for interacting with DataSeries files, making that a bit more convenient than what the standard utilities provide.

Tuesday, May 22, 2012

Announcing Bro Exchange 2012

UPDATE: We have finalized the dates: August 7-8, 2012. See the new Exchange 2012 web page for more information.

Due to overwhelming demand for a user meeting instead of a workshop this year, I’m pleased to announce that we are going to be holding an event that we are calling “Bro Exchange 2012”. The name derives from our desire to get a large number of Bro users together in the same room to exchange thoughts and talk about what they are doing with Bro. I believe that the community has grown to the point where this is needed; it’s time for people to stop operating in isolation.

We received a generous offer from the folks at the National Center for Atmospheric Research in Boulder, Colorado to use their facilities. Thank you NCAR! If you are curious about what goes on at NCAR, you can check out their website: http://ncar.ucar.edu/

What we need from you now is proposals! Get in touch with us at info@bro-ids.org if you have any ideas for talks, demos, interpretive dance, anything really. We are looking for people to use and abuse Bro in any way imaginable. You will get major consideration if you talk about operationally useful aspects of Bro, but personally I would like some completely fun presentations too.

Proposals don’t need to be concretely formed at this point either. If you have an idea, we are more than willing to work with you to see if we can find the nugget in your idea that could turn into a great presentation.

Here are some quick ideas for presentations:

  • Tell an appropriately anonymized story of a security incident and how Bro was or could have been used during the incident.
  • Demonstrate how you integrated Bro or Bro data with some external system.
  • Give a brain dump of feature requests or detections you’d like to see.
  • Write and perform a song. OpenBSD always has them, why not Bro?
  • Do a short tutorial of some small part of Bro akin to our workshop presentations.

I can’t wait to see what you all come up with! More information and a registration website will be coming soon.

Friday, May 18, 2012

Upcoming: Bro 2.1 IPv6 Support

The upcoming Bro 2.1 release includes major improvements to its IPv6 support. IPv6 is enabled by default and no longer needs any special configuration. IPv6 has been fully integrated into all parts of Bro including protocol analysis and the scripting language.

Some of the most significant enhancements include support for IPv6 fragment reassembly, support for following IPv6 extension header chains, and support for tunnel decapsulation (6to4 and Teredo). The DNS analyzer now handles AAAA records properly, and DNS lookups that Bro itself performs now include AAAA queries, so that, for example, the result returned by the "lookup_hostname" built-in function is a set that can contain both IPv4 and IPv6 addresses. Support for the most common ICMPv6 message types has been added. Also, the FTP EPSV and EPRT commands are now handled properly.

When building Bro from source, the "--enable-brov6" configure option has been removed because it is no longer relevant. The way IP addresses are stored internally has been improved, so Bro can handle both IPv4 and IPv6 by default without any special configuration.

There are a couple of changes to the Bro scripting language to better support IPv6. First, IPv6 literals appearing in a Bro script must now be enclosed in square brackets (for example, [fe80::db15]). For subnet literals, the slash "/" appears after the closing square bracket (for example, [fe80:1234::]/32). Second, when an IP address variable or IP address literal is enclosed in pipes (for example, |[fe80::db15]|) the result is now the size of the address in bits (32 for IPv4 and 128 for IPv6).
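
A quick sketch of the new syntax:

event bro_init()
    {
    local a: addr = [fe80::db15];
    local n: subnet = [fe80:1234::]/32;
    print a, n;
    print |a|;          # 128, the size of an IPv6 address in bits
    print |1.2.3.4|;    # 32 for an IPv4 address
    }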

There are several new built-in functions. The "is_v4_addr" and "is_v6_addr" built-in functions can be used to determine whether a given IP address is IPv4 or IPv6. The "to_subnet" built-in function can do conversions from a string representation of a subnet (such as "192.168.0.0/16" or "2607:f8b0::/32") to the corresponding value as a Bro "subnet" type. Similarly, "addr_to_counts" and "counts_to_addr" can do conversions between an IP address and a vector of counts (four elements if address is IPv6 and one if IPv4). Finally, "routing0_data_to_addrs" takes the "data" field of an IPv6 type 0 routing header and returns a vector of IP addresses contained in the routing header data.
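
And a sketch exercising a few of the new built-in functions, based on the descriptions above:

event bro_init()
    {
    print is_v6_addr([2607:f8b0::1]);       # T
    print is_v4_addr(192.168.0.1);          # T
    print to_subnet("2607:f8b0::/32");      # string converted to a subnet value
    print addr_to_counts([2607:f8b0::1]);   # vector of four counts for an IPv6 address
    }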

A couple built-in functions have been removed: "addr_to_count" (this only worked with IPv4 addresses; use "addr_to_counts" instead), and "bro_has_ipv6" (this is no longer needed, because Bro always supports IPv6 now).

There are some new events that improve support for IPv6 (although neither of these events is yet handled in any of the Bro scripts). The event "ipv6_ext_headers" is generated for any IPv6 packet containing extension headers. Another new event, "esp_packet", is generated for any packets using ESP (Encapsulating Security Payload).

There are some new events that are generated for specific ICMPv6 message types: "icmp_packet_too_big", "icmp_parameter_problem", "icmp_router_solicitation", "icmp_router_advertisement", "icmp_neighbor_solicitation", and "icmp_neighbor_advertisement". And there's a new event "icmp_error_message" that is generated if Bro sees an ICMPv6 error message for which there is no dedicated event. It should be noted that none of these new events are currently handled in any of the Bro scripts.

One other small change related to ICMP events is that the "icmp_redirect" event signature has changed (it now includes both the target and destination addresses).

Although not a new event, the "dns_AAAA_reply" event is now generated for DNS replies of type AAAA (previously, Bro would generate a "dns_A_reply" instead), and the event signature has changed slightly (the last parameter has been removed because it was unused). There is a new event "dns_A6_reply" that is generated for DNS replies of type A6.

There is a new experimental feature (to enable it, build Bro with the new configure option "--enable-mobile-ipv6") to analyze Mobile IPv6 (see RFC 6275). If enabled, there is a new event "mobile_ipv6_message" (although currently none of the scripts handle this event).

In addition to Bro itself, the other Bro components have also been made IPv6-aware by default. In particular, significant changes were made to trace-summary, PySubnetTree, and Broccoli to support IPv6.

There are a few API changes in PySubnetTree to support a new concept called binary lookup mode, which only affects IP address lookups (i.e., this feature does not affect how subnets are added to the SubnetTree data structure). There is a new method "set_binary_lookup_mode" which can be used to enable or disable binary lookup mode, and there's a new method "binary_lookup_mode" to check whether or not binary lookup mode is currently enabled. Finally, the SubnetTree constructor has a new optional argument which lets you choose whether or not to enable binary lookup mode immediately, but you can always use "set_binary_lookup_mode" at a later time.

There are a few API changes in Broccoli to support IPv6. First, there is a new type "BroAddr" which can store either an IPv4 or IPv6 address, and the "BroSubnet" type has been made larger to accommodate both IPv4 and IPv6. Also, there is a new function "bro_util_is_v4_addr" which can be used to check if an address is IPv4 or not. Finally, there is a new constant "BRO_IPV4_MAPPED_PREFIX" which is the first 12 bytes of a 16-byte IPv4-mapped IPv6 address (see RFC 4291).

Tuesday, May 15, 2012

Upcoming: Bro 2.1 Development Updates

We are getting close to finalizing the feature set for the upcoming Bro 2.1 release. To give you an idea what's in the queue, we will be doing a series of blog postings that focus on the main areas we have been working on since 2.0. Specifically, expect to see development updates on the following areas:

Extensive IPv6 Support
We are completely revamping Bro's IPv6 support. With Bro 2.1, IPv6 will be fully integrated into protocol analysis and scripting language (and no longer be the fragile, optional code that it used to be). In addition, we are adding support for many more IPv6 features, including ICMPv6 and tunnel decapsulation.
Binary Logging
Bro's default ASCII output is not ideal for handling large volumes of logs. In 2.1, we are adding experimental support for binary output using HP Lab's DataSeries. DataSeries is a format optimized for handling high-volume logs.
Input Framework
Bro 2.1 will come with a new framework for reading data into script-land at runtime, such as blacklists and other external context. Initially, we are focusing on reading ASCII files with a column-based structure similar to Bro's logs. But we designed the framework internals more generally, and new input formats can be added as plugins, similar to how the existing logging framework operates.
File Analysis Framework
We are unifying Bro's approach to inspecting file transfers it observes on the wire. In 2.1, a new framework will provide protocol-independent file reassembly and analysis, with extensive hooks to get access to their content.

The code for all these is either already merged into current git master or is currently waiting for final touches in a feature branch. Stay tuned for more information.

Tuesday, February 7, 2012

Internship Opening

The Bro Project has an opening for a three-month internship during the summer of 2012. If you are interested in helping us improve Bro and develop new functionality, please apply!

See here for more information.

Wednesday, February 1, 2012

Filtering Logs with Bro

One of the best new features of Bro 2.0 is the logging framework. It gives you structured logs which are easily parsed for simplified log analysis. It also provides a nice abstraction between writing something to a log and handling that data before it is written to disk. I'll provide a very brief overview of the logging framework and then go into some filters that I've been helping people with lately.

The logging framework in Bro 2.0 is based around sets of key-value pairs. This alone was a huge step for Bro and helps bring it into the modern day since Bro logs now conceptually map neatly into all table and document store databases. To take it further, we wanted to separate the actions of sending data off to be logged and handling how that data is written to a data store (e.g. text files on disk). When data for a log is ready to be written out, log records are written to "Logging Streams" which can then be filtered, modified, and redirected with "Logging Filters".

The need to apply a custom filter can arise from a number of functionality requirements:
  • Prevent logging of data that can't be logged for privacy reasons.
  • Pre-splitting logs to ease searching.
  • Splitting logs to direct some of them to external data stores. I'm not showing any examples of this though, since Bro 2.0 only supports textual logs.

Example 1

The first example is for a user that wants to split their HTTP logs into something that they can manage and search more easily. Initially they decide to just split logs into "inbound" requests and "outbound" requests. The following filter requires that the Site::local_nets variable is configured appropriately which it will be automatically if you run Bro with BroControl and have your local networks defined in <prefix>/etc/networks.cfg.

event bro_init()
        {
        # First remove the default filter.
        Log::remove_default_filter(HTTP::LOG);
        # Add the filter to direct logs to the appropriate file name.
        Log::add_filter(HTTP::LOG, [$name = "http-directions",
                                    $path_func(id: Log::ID, path: string, rec: HTTP::Info) = {
                                        return (Site::is_local_addr(rec$id$orig_h) ? "http_outbound" : "http_inbound");
                                    }]);
        }

With that code added to local.bro or another custom script, Bro will output two HTTP logs: http_inbound.log and http_outbound.log. The log files are created dynamically as they are needed so it's possible that you may not see them if there isn't appropriate traffic to create them.

Taking another step, that same user might also realize that anytime a Windows executable transits their monitoring point over HTTP, they want it written to a separate log file in addition to the inbound or outbound log. The file type detection is based on the contents of the HTTP response, too, so it won't be misled by 'Content-Type' headers or odd URLs.

The next block of code adds a second filter to the HTTP::LOG stream which is executed separately and therefore is able to duplicate logs.

event bro_init()
        {
        # First remove the default filter.
        Log::remove_default_filter(HTTP::LOG);
        # Add the filter to direct logs to the appropriate file name.
        Log::add_filter(HTTP::LOG, [$name = "http-directions",
                                    $path_func(id: Log::ID, path: string, rec: HTTP::Info) = {
                                        return (Site::is_local_addr(rec$id$orig_h) ? "http_outbound" : "http_inbound");
                                    }]);

        # Add a filter to pull Windows PE executables into a separate log.
        Log::add_filter(HTTP::LOG, [$name = "http-executables",
                                    $path = "http_exe",
                                    $pred(rec: HTTP::Info) = { return rec?$mime_type && rec$mime_type == "application/x-dosexec"; }]);
        }

With that, a Bro installation will end up with three log files for HTTP traffic (assuming the correct traffic is seen): http_inbound.log, http_outbound.log, and http_exe.log. The lines in http_exe.log will be duplicated in their appropriate "inbound" or "outbound" log.

There are a number of cases where sites either can't or won't log outbound requests, to avoid intruding on their users' privacy. You can accommodate that by applying a predicate ($pred) function to the filter that splits the log into inbound and outbound. We will return false (F) from the predicate whenever the originator of the connection is local, to prevent the log record from proceeding.

event bro_init()
        {
        # First remove the default filter.
        Log::remove_default_filter(HTTP::LOG);
        # Add the filter to direct logs to the appropriate file name.
        Log::add_filter(HTTP::LOG, [$name = "http-directions",
                                    $pred(rec: HTTP::Info) = {
                                        return ! Site::is_local_addr(rec$id$orig_h);
                                    },
                                    $path_func(id: Log::ID, path: string, rec: HTTP::Info) = {
                                        return (Site::is_local_addr(rec$id$orig_h) ? "http_outbound" : "http_inbound");
                                    }]);

        # Add a filter to pull Windows PE executables into a separate log.
        Log::add_filter(HTTP::LOG, [$name = "http-executables",
                                    $path = "http_exe",
                                    $pred(rec: HTTP::Info) = { return rec?$mime_type && rec$mime_type == "application/x-dosexec"; }]);
        }

The above filters will result in two log files with the right traffic: "http_inbound.log" and "http_exe.log". The log with Windows executables will still contain outbound requests as long as a Windows executable was returned, because the predicate on that filter only filters out records that didn't result in a Windows EXE from the server.

Now, we've barely scratched the surface of filtering for the logging framework. Perhaps a few more examples?

Example 2

I created some filters recently for Doug Burks' excellent Security Onion Linux distribution, to help with data management. He let me know that he needed to know which host interface saw the traffic resulting in any particular log record. The way Bro clusters normally work is that logs output by any worker are merged together into single logs on the manager, which theoretically loses the information he needs. It turns out that the logging framework can cope with this. Specifically, he needed the HTTP and Conn logs identified by their interface, and here is the script that implements it.

event bro_init()
        {
        if ( reading_live_traffic() )
                {
                Log::remove_default_filter(HTTP::LOG);
                Log::add_filter(HTTP::LOG, [$name = "http-interfaces",
                                            $path_func(id: Log::ID, path: string, rec: HTTP::Info) =
                                                {
                                                local peer = get_event_peer()$descr;
                                                if ( peer in Cluster::nodes && Cluster::nodes[peer]?$interface )
                                                        return cat("http_", Cluster::nodes[peer]$interface);
                                                else
                                                        return "http";
                                                }
                                            ]);

                Log::remove_default_filter(Conn::LOG);
                Log::add_filter(Conn::LOG, [$name = "conn-interfaces",
                                            $path_func(id: Log::ID, path: string, rec: Conn::Info) =
                                                {
                                                local peer = get_event_peer()$descr;
                                                if ( peer in Cluster::nodes && Cluster::nodes[peer]?$interface )
                                                        return cat("conn_", Cluster::nodes[peer]$interface);
                                                else
                                                        return "conn";
                                                }
                                            ]);
                }
        }

What this script does is look up, in the cluster configuration, the interface of the host that most recently sent an event and append that interface name to the log name. Most people won't need this filter because it's fairly specific to what Doug is trying to accomplish on Security Onion, but I wanted to point it out because this is not something we ever envisioned doing with the logging framework, yet it works flawlessly.

Example 3

There is one last demonstration filter that I wanted to show for filtering DNS logs, but it's actually very applicable to HTTP and SSL as well. Someone came to me recently because they were using Bro 2.0 to monitor their DNS server and they wanted to split their DNS logs into separate logs based on whether a requested name is in a local or nonlocal zone. Here is the script I wrote.

redef Site::local_zones = { "example.com", "example.org" };

event bro_init()
        {
        Log::remove_default_filter(DNS::LOG);
        Log::add_filter(DNS::LOG, [$name = "dns_split",
                                   $path_func(id: Log::ID, path: string, rec: DNS::Info) = {
                                        return (Site::is_local_name(rec$query) ? "dns_localzone" : "dns_remotezone"); }]);
        }

You need to be sure to fill in all of your top-level DNS zones in the Site::local_zones variable, as I have done at the top of the script. All this is doing is removing the default DNS filter and applying a new filter which selectively guides logs into either a file named "dns_localzone.log" or "dns_remotezone.log", depending on whether the name is contained within one of your configured local zones.

Example 4

Ok, I lied. That wasn't the last filter. I want to show two more small filtering tricks before wrapping this up. Sometimes the default logs contain more information than you are allowed to log or have disk space to store. In this case you can selectively choose to include or exclude fields in the output. Here is an example that logs only the timestamp, querying IP address, and query for the DNS log.

event bro_init()
        {
        Log::remove_default_filter(DNS::LOG);
        Log::add_filter(DNS::LOG, [$name="new-default",
                                   $include=set("ts", "id.orig_h", "query")]);
        }

That will result in only those three fields in your dns.log file.

In some cases sites can't log the subject of email in SMTP traffic which is included in the default SMTP logs. It's also easy to just remove a single field and leave all other fields in a log intact. Here's an example which removes just the subject field from the SMTP logs.

event bro_init()
        {
        Log::remove_default_filter(SMTP::LOG);
        Log::add_filter(SMTP::LOG, [$name="new-default",
                                    $exclude=set("subject")]);
        }

Wrap up

Hopefully this post will inspire people to ask more questions about how to filter logs in even trickier ways and really press the logging framework in new and unexpected ways. At least it should give you a few examples to copy&paste and then build off of as you customize Bro's output to suit your requirements. Keep in mind that the filtering and redirection techniques from the examples can be combined in various ways.

For further information on Bro's logging framework you can find our full documentation here: http://www.bro-ids.org/documentation/logging.html

Wednesday, January 11, 2012

"It's a Bro!": Version 2.0 Released

We are happy to announce that Bro 2.0 is now available for download. The source code is there already, and a few binary packages will follow soon. (Update: Binaries are now available.)

Many of you have already tried the 2.0 Beta version that we put out a while ago, and we have used the time since then for integrating the feedback we received and doing a lot of further polishing. We're pretty happy with how 2.0 looks now and invite you to give it a try.

Bro 2.0 is a major step forward in Bro's evolution. I'm not going to repeat what we already said for the beta, but let me add that the feedback we have received in the meantime has been extremely positive. It seems we're heading in the right direction, and Bro is on track to become a crucial part of the toolbox of many operators facing the challenge of fighting increasingly sophisticated attacks.

The part that took us the most time since the Beta is something I'm sure many of you will appreciate: Bro's default scripts now come with extensive documentation in the form of embedded comments describing their public interface—and Bro's new "Broxygen" mode then turns that into a comprehensive hyperlinked reference. To take a look, browse through the new package index.

Now that Bro 2.0 is out, we're looking forward to going right back to work and start the next release cycle. We already have a number of things in the queue that wait for integration after our self-imposed feature freeze is finally over. And we have plenty more ideas on our roadmap waiting for some attention.

But before that, I'd really like to thank everybody on the Bro core team for working so hard on preparing Bro 2.0. Not only did we make a lot of progress over the last year, but we also put some pretty good infrastructure in place that I believe will allow us to pick up even more steam as we proceed.

Stay tuned, and feel yourself invited to follow and perhaps even join development on our developer's mailing list. If you have a cool Bro script to share, keep the new Contributed Scripts Repository in mind that we are in the process of setting up.

Wednesday, January 4, 2012

Monster Logs

This is a guest blog post from Martin Holste. He's been a great participant in our community and is the lead developer of the log search utility ELSA. We asked him to do a guest blog post because we think ELSA is so important for giving security analysts better visibility into their Bro logs.

One of Bro's greatest strengths is the massive amount of incredibly detailed information it produces that describes exactly what's taking place on your network. It does all of this by default, with no extra configuration or tuning required. Then on top of that, it provides a framework for creating advanced IDS signatures. This is an amazing thing, but the benefit is only as good as the extent to which the security or IT staff is able to make use of the data. Here is an example line of output from Bro:

1322829241.041505 drj3tWq4mu8 10.236.41.95 63714 198.78.209.254 80 HTTP::MD5 10.236.41.95 c28ec592ac13e009feeea6de6b71f130 http://au.download.windowsupdate.com/msdownload/update/software/secu/2011/01/msipatchregfix-amd64_fdc2d81714535111f2c69c70b39ed1b7cd2c6266.exe c28ec592ac13e009feeea6de6b71f130 10.236.41.95 198.78.209.254 80 - worker-0 Notice::ACTION_LOG 6 3600.000000 - - - - - - - - -

There are many currently available methods for making sense of this output. Most of those methods involve variations of using text utilities to search and format the log data into an output that is requested. The problem with this is that for large installations, scalability quickly becomes an issue. To start with, combining logs from multiple servers is non-trivial if a single location does not have enough disk space to store all of the logs. Even if you can get all of the logs in one location, grepping through the hundreds of Gigabytes per day per sensor that Bro can produce in large environments is prohibitively inefficient.

How much does Bro log? A large network with tens of thousands of users will generate a few thousand HTTP requests per second during the day. Bro will create many logs describing this activity, namely, per request:

  • 1 HTTP connect log
  • 1 DNS log (when a lookup is necessary)
  • 1 Notice log (if an executable is downloaded)
  • 2 Connection logs (TCP for HTTP, UDP for DNS)
  • 1 Software inventory log (if this client hasn't been seen before)

That's a total of six logs for just one HTTP request. If the network is seeing 2,000 requests per second, that's 12,000 logs per second (about one billion per day). The logs average about 300 bytes, which means this is about 3.6 MB/sec of logs. That's about 311 Gigabytes of logs per day (if the rate were constant). Text utility speeds vary greatly, but searching even a few Gigabytes of data will take many seconds or minutes. Searching 311 Gigabytes will take hours.

To put this in perspective, if we assume that a single log entry is represented by a stalk of hay, and a stalk of hay is 50 grams, and a hay bale contains 1,000 stalks for 50 kg, then one billion logs would take 1,000,000 bales. If a bale is one meter long and half a meter wide, that would be half a square kilometer of hay to search through, per day. That's a haystack of about 15 square kilometers per month to search through for a given log.

Constant Time

Enter ELSA: the open-source project for Enterprise Log Search and Archive. ELSA (http://enterprise-log-search-and-archive.googlecode.com) is capable of receiving, parsing, indexing, and storing logs at obscene rates. It provides an easy to use full-text web search interface for getting that data into the hands of analysts and customers. In addition to basic search, ELSA provides ways to report on arbitrary fields such as time, hostname, URL, etc., email alerts for log searches, and a mechanism for storing and sharing search results.


How fast is ELSA? It will receive and index logs at a sustained rate of 35,000 events/sec on a single modest server and can burst to 100,000 events/sec for long periods.

ELSA will have no problems indexing and archiving the log volume described above for a busy network, even on a single modest system. It will (more importantly) return data in less than a second when you ask, for example, for all the unique IPs to visit a certain website in the last month (about 30 billion or 9.3 TB of logs). It's these arbitrary, ad-hoc reports that make ELSA so helpful. However, there are more conventional ways of decreasing search time if you have a good idea of what you're looking for ahead of time or know what time period you're searching in. ELSA's core strength is that you do not have to know what you will be looking for, and yet you can send enormous volumes of logs to it. This means you do not need to waste time deciding what the log volume will be or whether or not storing the log is worth it, because storing and searching is done in constant time; search times do not increase with log volume.

Example Usage

A common use for ELSA with Bro logs is to drill down on and investigate notices. For instance, Bro ships with notices for SQL injection attacks. When you see these notices (or have ELSA alert you on them with “notice_type=SQL_Injection_Attacker”), you will want to investigate to see if the attack is a true positive and if it was successful. Doing so is easy because you can take the IP addresses described in the notice and plug them in as a search for HTTP traffic to see the actual requests. This greatly decreases the amount of time it takes for analysts to get through the day's workload of investigations, because alert data can be confirmed or refuted with ease.

Another use case is when you have been given credible intelligence of a possible threat to your network from an outside source, and you want to retroactively search to see if there are any prior instances of contact with the hostile entity. Simply plugging in the hostname, URI, IP address, HTTP user-agent, or other bit of intelligence into ELSA will tell you if there are any prior incidents to investigate based on the network activity that Bro has logged.

A third use case is using ELSA to guide the creation of Bro scripts. For instance, if you're not sure whether a certain URI parameter that a script looks for is common, you can quickly test that in ELSA to see if the script will generate a lot of meaningless notices. The ability to explore historical data quickly means that you will have pre-tuned scripts and an excellent understanding of what the data on your network looks like.

In addition to Bro logs, ELSA will store and index any other kind of log sent to it. This allows analysts to corroborate notices with logs from servers, network gear, web proxies, and other devices that can send syslog. All of the logs can be presented in the same search result for easy correlation.

Plugins

ELSA uses a plugin infrastructure to allow its functionality to be extended. It ships with a plugin for a separate project, StreamDB (http://streamdb.googlecode.com), which is similar to TimeMachine. Using the StreamDB plugin, you can go directly from viewing a Bro notice to viewing the actual traffic (shown in either printable ASCII or hex form) in two clicks. StreamDB will retrieve the traffic and present it in less than a second.

Plugins are very extensible and are run on the server side of the web interface, so a plugin that interfaces with a local asset management database or personnel directory would be trivial to write and implement for quick asset lookups, vulnerability management information, etc.

Getting ELSA

To start with, you need to configure Bro to send its logs as syslog. Instructions on setting up Bro for doing this on Ubuntu can be found here: http://ossectools.blogspot.com/2011/09/bro-quickstart-cluster-edition.html. Instructions for installing ELSA can be found here: http://ossectools.blogspot.com/2011/11/elsa-beta-available.html. There is a Google Group for ELSA support and questions at https://groups.google.com/d/forum/enterprise-log-search-and-archive.