The Bro Blog

Friday, May 25, 2012

Upcoming: Binary Output with DataSeries

Bro's default ASCII log format is not exactly the most efficient way for storing and searching large volumes of data. An an alternative, Bro 2.1 will come with experimental support for DataSeries output, an efficient binary format for recording structured bulk data. DataSeries is developed and maintained at HP Labs.

The code is now merged into git and we'll give a summary below on how to use it. At this time, we see the DataSeries support primarily as an experiment for understanding the utility of alternative output format; feedback is appreciated. As usual, feel free to send questions to the mailing list and file tickets with our tracker for specific bugs and feature requests.

Bro's DataSeries module also constitutes a case study on writing output plugins using the new internal API we added in Bro 2.0. Adding new output formats is pretty straight-forward now, and we have a few more things in mind here for the future.

Installing DataSeries

To use DataSeries, its libraries must be available at compile-time, along with the supporting Lintel package. Generally, both are distributed on HP Labs' web site. Currently, however, you need to use recent developments versions for both packages, which you can download from github like this:

git clone http://github.com/dataseries/Lintel
git clone http://github.com/dataseries/DataSeries

To build and install the two into <prefix>, do:

( cd Lintel     && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )
( cd DataSeries && mkdir build && cd build && cmake -DCMAKE_INSTALL_PREFIX=<prefix> .. && make && make install )

Please refer to the packages' documentation for more information about the installation process. In particular, there's more information on required and optional dependencies for Lintel and dependencies for DataSeries. For users on RedHat-style systems, you'll need the following:

yum install libxml2-devel boost-devel

Compiling Bro with DataSeries Support

Once you have installed DataSeries, Bro's configure should pick it up automatically as long as it finds it in a standard system location. Alternatively, you can specify the DataSeries installation prefix manually with --with-dataseries=<prefix>. Keep an eye on configure's summary output, if it looks like the following, Bro found DataSeries and will compile in the support:

# ./configure --with-dataseries=/usr/local
[...]
====================|  Bro Build Summary  |=====================
[...]
DataSeries:        true
[...]
================================================================

Activating DataSeries

The direct way to use DataSeries is to switch all log files over to the binary format. To do that, just add redef Log::default_writer=Log::WRITER_DATASERIES; to your local.bro. For testing, you can also just pass that on the command line:

bro -r trace.pcap Log::default_writer=Log::WRITER_DATASERIES

With that, Bro will now write all its output into DataSeries files *.ds. You can inspect these using DataSeries's set of command line tools, which its installation process installs into <prefix>/bin. For example, to convert a file back into an ASCII representation:

$ ds2txt conn.log
[... We skip a bunch of meta data here ...]
ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes
1300475167.096535 CRCC5OdDlXe 141.142.220.202 5353 224.0.0.251 5353 udp dns 0.000000 0 0 S0 F 0 D 1 73 0 0
1300475167.097012 o7XBsfvo3U1 fe80::217:f2ff:fed7:cf65 5353 ff02::fb 5353 udp  0.000000 0 0 S0 F 0 D 1 199 0 0
1300475167.099816 pXPi1kPMgxb 141.142.220.50 5353 224.0.0.251 5353 udp  0.000000 0 0 S0 F 0 D 1 179 0 0
1300475168.853899 R7sOc16woCj 141.142.220.118 43927 141.142.2.2 53 udp dns 0.000435 38 89 SF F 0 Dd 1 66 1 117
1300475168.854378 Z6dfHVmt0X7 141.142.220.118 37676 141.142.2.2 53 udp dns 0.000420 52 99 SF F 0 Dd 1 80 1 127
1300475168.854837 k6T92WxgNAh 141.142.220.118 40526 141.142.2.2 53 udp dns 0.000392 38 183 SF F 0 Dd 1 66 1 211
[...]

(--skip-all suppresses the meta data.)

Note that the ASCII conversion is not equivalent to Bro's default output format.

You can also switch only individual files over to DataSeries by adding code like this to your local.bro:

event bro_init()
    {
    local f = Log::get_filter(Conn::LOG, "default"); # Get default filter for connection log.
    f$writer = Log::WRITER_DATASERIES;               # Change writer type.
    Log::add_filter(Conn::LOG, f);                   # Replace filter with adapted version.
    }

Bro's DataSeries writer comes with a few tuning options, see the script scripts/base/frameworks/logging/writers/dataseries.bro in the Bro distribution.

Working with DataSeries

Here are few examples of using DataSeries' command line tools to work with the output files.

  • Printing CSV:

    $ ds2txt --csv conn.log
    ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,duration,orig_bytes,resp_bytes,conn_state,local_orig,missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes
    1258790493.773208,ZTtgbHvf4s3,192.168.1.104,137,192.168.1.255,137,udp,dns,3.748891,350,0,S0,F,0,D,7,546,0,0
    1258790451.402091,pOY6Rw7lhUd,192.168.1.106,138,192.168.1.255,138,udp,,0.000000,0,0,S0,F,0,D,1,229,0,0
    1258790493.787448,pn5IiEslca9,192.168.1.104,138,192.168.1.255,138,udp,,2.243339,348,0,S0,F,0,D,2,404,0,0
    1258790615.268111,D9slyIu3hFj,192.168.1.106,137,192.168.1.255,137,udp,dns,3.764626,350,0,S0,F,0,D,7,546,0,0
    [...]
    

    Add --separator=X to set a different separator.

  • Extracting a subset of columns:

    $ ds2txt --select '*' ts,id.resp_h,id.resp_p --skip-all conn.log
    1258790493.773208 192.168.1.255 137
    1258790451.402091 192.168.1.255 138
    1258790493.787448 192.168.1.255 138
    1258790615.268111 192.168.1.255 137
    1258790615.289842 192.168.1.255 138
    [...]
    
  • Filtering rows:

    $ ds2txt --where '*' 'duration > 5 && id.resp_p > 1024' --skip-all  conn.ds
    1258790631.532888 V8mV5WLITu5 192.168.1.105 55890 239.255.255.250 1900 udp  15.004568 798 0 S0 F 0 D 6 966 0 0
    1258792413.439596 tMcWVWQptvd 192.168.1.105 55890 239.255.255.250 1900 udp  15.004581 798 0 S0 F 0 D 6 966 0 0
    1258794195.346127 cQwQMRdBrKa 192.168.1.105 55890 239.255.255.250 1900 udp  15.005071 798 0 S0 F 0 D 6 966 0 0
    1258795977.253200 i8TEjhWd2W8 192.168.1.105 55890 239.255.255.250 1900 udp  15.004824 798 0 S0 F 0 D 6 966 0 0
    1258797759.160217 MsLsBA8Ia49 192.168.1.105 55890 239.255.255.250 1900 udp  15.005078 798 0 S0 F 0 D 6 966 0 0
    1258799541.068452 TsOxRWJRGwf 192.168.1.105 55890 239.255.255.250 1900 udp  15.004082 798 0 S0 F 0 D 6 966 0 0
    [...]
    
  • Calculate some statistics:

    Mean/stdev/min/max over a column:

    $ dsstatgroupby '*' basic duration from conn.ds
    # Begin DSStatGroupByModule
    # processed 2159 rows, where clause eliminated 0 rows
    # count(*), mean(duration), stddev, min, max
    2159, 42.7938, 1858.34, 0, 86370
    [...]
    

    Quantiles of total connection volume:

    > dsstatgroupby '*' quantile 'orig_bytes + resp_bytes' from conn.ds
    [...]
    2159 data points, mean 24616 +- 343295 [0,1.26615e+07]
    quantiles about every 216 data points:
    10%: 0, 124, 317, 348, 350, 350, 601, 798, 1469
    tails: 90%: 1469, 95%: 7302, 99%: 242629, 99.5%: 1226262
    [...]
    

The man pages for these tools show further options, and their -h option gives some more information (either can be a bit cryptic unfortunately though).

Deficiencies

Due to limitations of the DataSeries format, one can unfortunately not inspect files before they have been fully written. In other words, when using DataSeries, it's currently it's not possible to inspect the live log files inside the spool directory before they are rotated to their final location. It seems that this could be fixed with some effort, and we plan to work with the DataSeries development team on that if the format gains traction among Bro users.

Likewise, we're considering writing custom command line tools for interacting with DataSeries files, making that a bit more convenient than what the standard utilities provide.

Tuesday, May 22, 2012

Announcing Bro Exchange 2012

UPDATE: We have finalized the dates: August 7-8, 2012. See the new Exchange 2012 web page for more information.

Due to overwhelming demand for a user meeting instead of a workshop this year I’m pleased to announce that we are going to be holding an event that we are calling “Bro Exchange 2012”. The name derives our desire to get a large number Bro users together in the same room to exchange thoughts and talk about what they are doing with Bro. I believe that the community has grown to the point where this is needed, it’s time for people to stop operating in isolation.

We received a generous offer from the folks at the National Center for Atmospheric Research in Boulder, Colorado to use their facilities. Thank you NCAR! If you are curious about what goes on at NCAR, you can check out their website: http://ncar.ucar.edu/

What we need from you now is proposals! Get in touch with us at info@bro-ids.org if you have any ideas for talks, demos, interpretive dance, anything really. We are looking for people to use and abuse Bro in any way imaginable. You will get major consideration if you talk about operationally useful aspects of Bro, but personally I would like some completely fun presentations too.

Proposals don’t need to be concretely formed at this point either. If you have an idea we are more than willing to work with you to see if we can find the nugget in your idea that could turn in a great presentation.

Here are some quick ideas for presentations:

  • Tell an appropriately anonymized story of a security incident and how Bro was or could have been used during the incident.
  • Demonstrate how you integrated Bro or Bro data with some external system.
  • Give a brain dump of feature requests or detections you’d like to see.
  • Write and perform a song. OpenBSD always has them, why not Bro?
  • Do a short tutorial of some small part of Bro akin to our workshop presentations.

I can’t wait to see what you all come up with! More information and a registration website will be coming soon.

Friday, May 18, 2012

Upcoming: Bro 2.1 IPv6 Support

The upcoming Bro 2.1 release includes major improvements to its IPv6 support. IPv6 is enabled by default and no longer needs any special configuration. IPv6 has been fully integrated into all parts of Bro including protocol analysis and the scripting language.

Some of the most significant enhancements include support for IPv6 fragment reassembly, support for following IPv6 extension header chains, and support for tunnel decapsulation (6to4 and Teredo). The DNS analyzer now handles AAAA records properly, and DNS lookups that Bro itself performs now include AAAA queries, so that, for example, the result returned by the "lookup_hostname" built-in function is a set that can contain both IPv4 and IPv6 addresses. Support for the most common ICMPv6 message types has been added. Also, the FTP EPSV and EPRT commands are now handled properly.

When building Bro from source, the "--enable-brov6" configure option has been removed because it is no longer relevant. The way IP addresses are stored internally has been improved, so Bro can handle both IPv4 and IPv6 by default without any special configuration.

There are a couple of changes to the Bro scripting language to better support IPv6. First, IPv6 literals appearing in a Bro script must now be enclosed in square brackets (for example, [fe80::db15]). For subnet literals, the slash "/" appears after the closing square bracket (for example, [fe80:1234::]/32). Second, when an IP address variable or IP address literal is enclosed in pipes (for example, |[fe80::db15]|) the result is now the size of the address in bits (32 for IPv4 and 128 for IPv6).

There are several new built-in functions. The "is_v4_addr" and "is_v6_addr" built-in functions can be used to determine whether a given IP address is IPv4 or IPv6. The "to_subnet" built-in function can do conversions from a string representation of a subnet (such as "192.168.0.0/16" or "2607:f8b0::/32") to the corresponding value as a Bro "subnet" type. Similarly, "addr_to_counts" and "counts_to_addr" can do conversions between an IP address and a vector of counts (four elements if address is IPv6 and one if IPv4). Finally, "routing0_data_to_addrs" takes the "data" field of an IPv6 type 0 routing header and returns a vector of IP addresses contained in the routing header data.

A couple built-in functions have been removed: "addr_to_count" (this only worked with IPv4 addresses; use "addr_to_counts" instead), and "bro_has_ipv6" (this is no longer needed, because Bro always supports IPv6 now).

There are some new events that improve support for IPv6 (although neither of these events are yet handled in any of the Bro scripts). The event "ipv6_ext_headers" is generated for any IPv6 packet containing extension headers. Another new event "esp_packet" is generated for any packets using ESP (Encapsulating Security Payload).

There are some new events that are generated for specific ICMPv6 message types: "icmp_packet_too_big", "icmp_parameter_problem", "icmp_router_solicitation", "icmp_router_advertisement", "icmp_neighbor_solicitation", and "icmp_neighbor_advertisement". And there's a new event "icmp_error_message" that is generated if Bro sees an ICMPv6 error message for which there is no dedicated event. It should be noted that none of these new events are currently handled in any of the Bro scripts.

One other small change related to ICMP events is that the "icmp_redirect" event signature has changed (it now includes both the target and destination addresses).

Although not a new event, the "dns_AAAA_reply" event is now generated for DNS replies of type AAAA (previously, Bro would generate a "dns_A_reply" instead), and the event signature has changed slightly (the last parameter has been removed because it was unused). There is a new event "dns_A6_reply" that is generated for DNS replies of type A6.

There is a new experimental feature (to enable it, build Bro with the new configure option "--enable-mobile-ipv6") to analyze Mobile IPv6 (see RFC 6275). If enabled, there is a new event "mobile_ipv6_message" (although currently none of the scripts handle this event).

In addition to Bro itself, the other Bro components have also been made IPv6-aware by default. In particular, significant changes were made to trace-summary, PySubnetTree, and Broccoli to support IPv6.

There are a few API changes in PySubnetTree to support a new concept called binary lookup mode, which only affects IP address lookups (i.e., this feature does not affect how subnets are added to the SubnetTree data structure). There is a new method "set_binary_lookup_mode" which can be used to enable or disable binary lookup mode, and there's a new method "binary_lookup_mode" to check whether or not binary lookup mode is currently enabled. Finally, the SubnetTree constructor has a new optional argument which lets you choose whether or not to enable binary lookup mode immediately, but you can always use "set_binary_lookup_mode" at a later time.

There are a few API changes in Broccoli to support IPv6. First, there is a new type "BroAddr" which can store either an IPv4 or IPv6 address, and the "BroSubnet" type has been made larger to accommodate both IPv4 and IPv6. Also, there is a new function "bro_util_is_v4_addr" which can be used to check if an address is IPv4 or not. Finally, there is a new constant "BRO_IPV4_MAPPED_PREFIX" which is the first 12 bytes of a 16-byte IPv4-mapped IPv6 address (see RFC 4291).

Tuesday, May 15, 2012

Upcoming: Bro 2.1 Development Updates

We are getting close to finalizing the feature set for the upcoming Bro 2.1 release. To give you an idea what's in the queue, we will be doing a series of blog postings that focus on the main areas we have been working on since 2.0. Specifically, expect to see development updates on the following areas:

Extensive IPv6 Support
We are completely revamping Bro's IPv6 support. With Bro 2.1, IPv6 will be fully integrated into protocol analysis and scripting language (and no longer be the fragile, optional code that it used to be). In addition, we are adding support for many more IPv6 features, including ICMPv6 and tunnel decapsulation.
Binary Logging
Bro's default ASCII output is not ideal for handling large volumes of logs. In 2.1, we are adding experimental support for binary output using HP Lab's DataSeries. DataSeries is a format optimized for handling high-volume logs.
Input Framework
Bro 2.1 will come with a new framework for reading data into script-land at runtime, such as blacklists and other external context. Initially, we are focussing on reading ASCII files with a column-based structure similar to Bro's logs. But we designed the framework internals more generally, and new input formats can be added as plugins, similar to how the existing logging framework operates.
File Analysis Framework
We are unifying Bro's approach to inspecting file transfers it observes on the wire. In 2.1, a new framework will provide protocol-independent file reassembly and analysis, with extensive hooks to get access to their content.

The code for all these is either already merged into current git master or is currently waiting for final touches in a feature branch. Stay tuned for more information.