This is a guest blog post from Martin Holste. He's been a great participant in our community and is the lead developer of the log search utility ELSA. We asked him to write a guest post because we think ELSA is so important for giving security analysts better visibility into their Bro logs.
One of Bro's greatest strengths is the massive amount of incredibly detailed information it produces that describes exactly what's taking place on your network. It does all of this by default, with no extra configuration or tuning required. Then on top of that, it provides a framework for creating advanced IDS signatures. This is an amazing thing, but the benefit is only as good as the extent to which the security or IT staff is able to make use of the data. Here is an example line of output from Bro:
1322829241.041505 drj3tWq4mu8 10.236.41.95 63714 220.127.116.11 80 HTTP::MD5 10.236.41.95 c28ec592ac13e009feeea6de6b71f130 http://au.download.windowsupdate.com/msdownload/update/software/secu/2011/01/msipatchregfix-amd64_fdc2d81714535111f2c69c70b39ed1b7cd2c6266.exe c28ec592ac13e009feeea6de6b71f130 10.236.41.95 18.104.22.168 80 - worker-0 Notice::ACTION_LOG 6 3600.000000 - - - - - - - - -
There are many methods currently available for making sense of this output. Most of them are variations on using text utilities to search the log data and format it into the requested output. The problem is that for large installations, scalability quickly becomes an issue. To start with, combining logs from multiple servers is non-trivial if no single location has enough disk space to store all of the logs. Even if you can get all of the logs in one location, grepping through the hundreds of Gigabytes per day per sensor that Bro can produce in large environments is prohibitively inefficient.
How much does Bro log? A large network with tens of thousands of users will generate a few thousand HTTP requests per second during the day. Bro will create many logs describing this activity, namely, per request:
- 1 HTTP connect log
- 1 DNS log (when a lookup is necessary)
- 1 Notice log (if an executable is downloaded)
- 2 Connection logs (TCP for HTTP, UDP for DNS)
- 1 Software inventory log (if this client hasn't been seen before)
That's a total of six logs for just one HTTP request. If the network is seeing 2,000 requests per second, that's 12,000 logs per second (about one billion per day). The logs average about 300 bytes, which means this is about 3.6 MB/sec of logs. That's about 311 Gigabytes of logs per day (if the rate were constant). Text utility speeds vary greatly, but searching even a few Gigabytes of data will take many seconds or minutes. Searching 311 Gigabytes will take hours.
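The arithmetic above can be checked with a short Python sketch. The constants are the figures quoted in this post (six logs per request, 2,000 requests per second, about 300 bytes per log); the function name is made up for illustration, and the rate is assumed to be constant over the whole day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_volume(requests_per_sec, logs_per_request=6, avg_log_bytes=300):
    """Return (logs/sec, logs/day, bytes/sec, bytes/day) at a constant request rate."""
    logs_per_sec = requests_per_sec * logs_per_request
    bytes_per_sec = logs_per_sec * avg_log_bytes
    return (
        logs_per_sec,
        logs_per_sec * SECONDS_PER_DAY,
        bytes_per_sec,
        bytes_per_sec * SECONDS_PER_DAY,
    )


logs_per_sec, logs_per_day, bytes_per_sec, bytes_per_day = daily_log_volume(2000)
print(f"{logs_per_sec:,} logs/sec")          # 12,000 logs/sec
print(f"{logs_per_day:,} logs/day")          # 1,036,800,000 logs/day
print(f"{bytes_per_sec / 1e6:.1f} MB/sec")   # 3.6 MB/sec
print(f"{bytes_per_day / 1e9:.0f} GB/day")   # 311 GB/day
```

Changing the first argument shows how quickly the daily total grows on a busier network.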
To put this in perspective, if we assume that a single log entry is represented by a stalk of hay, and a stalk of hay is 50 grams, and a hay bale contains 1,000 stalks for 50 kg, then one billion logs would take 1,000,000 bales. If a bale is one meter long and half a meter wide, the bales laid side by side would cover 500,000 square meters (half a square kilometer) of hay to search through, per day. That's a haystack of about 15 square kilometers per month to search through for a given log.
Enter ELSA: the open-source project for Enterprise Log Search and Archive. ELSA (http://enterprise-log-search-and-archive.googlecode.com) is capable of receiving, parsing, indexing, and storing logs at obscene rates. It provides an easy-to-use full-text web search interface for getting that data into the hands of analysts and customers. In addition to basic search, ELSA provides ways to report on arbitrary fields such as time, hostname, and URL, email alerts for log searches, and a mechanism for storing and sharing search results.
How fast is ELSA? It will receive and index logs at a sustained rate of 35,000 events/sec on a single modest server and can burst to 100,000 events/sec for long periods.
ELSA will have no problem indexing and archiving the log volume described above for a busy network, even on a single modest system. More importantly, it will return data in less than a second when you ask, for example, for all the unique IPs that visited a certain website in the last month (about 30 billion logs, or 9.3 TB). It's these arbitrary, ad-hoc reports that make ELSA so helpful. There are also more conventional ways of decreasing search time if you have a good idea of what you're looking for ahead of time or know what time period you're searching in. ELSA's core strength is that you do not have to know what you will be looking for and yet you can send enormous volumes of logs to it. This means you do not need to waste time estimating what the log volume will be or deciding whether storing a log is worth it, because storing and searching are done in constant time; search times do not increase with log volume.
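To see why an indexed search does not have to read every stored log, consider a toy inverted index. This is only an illustrative sketch and not ELSA's actual implementation; the class name, the record layout, and the sample data are all hypothetical. Each term maps directly to the set of logs that contain it, so a report such as "unique client IPs that visited this site" costs a dictionary lookup plus work proportional to the number of matches, rather than a scan of everything stored.

```python
from collections import defaultdict


class ToyLogIndex:
    """Hypothetical in-memory inverted index: term -> set of log IDs."""

    def __init__(self):
        self.logs = []                    # a log's ID is its position in this list
        self.postings = defaultdict(set)  # term -> IDs of the logs containing it

    def add(self, record):
        """Store one parsed log (a dict) and index each of its field values."""
        log_id = len(self.logs)
        self.logs.append(record)
        for value in record.values():
            self.postings[str(value).lower()].add(log_id)
        return log_id

    def search(self, term):
        """Return the logs containing `term`, without touching the others."""
        return [self.logs[i] for i in sorted(self.postings.get(term.lower(), ()))]

    def unique(self, term, field):
        """Report the distinct values of `field` among the logs matching `term`."""
        return sorted({log[field] for log in self.search(term)})


# Made-up sample records standing in for parsed HTTP logs.
index = ToyLogIndex()
index.add({"srcip": "10.0.0.1", "site": "example.com", "uri": "/a"})
index.add({"srcip": "10.0.0.2", "site": "example.com", "uri": "/b"})
index.add({"srcip": "10.0.0.1", "site": "example.com", "uri": "/c"})
index.add({"srcip": "10.0.0.3", "site": "example.org", "uri": "/a"})

print(index.unique("example.com", "srcip"))  # ['10.0.0.1', '10.0.0.2']
```

A real system also has to persist the index to disk and spread it across time ranges and servers, which this sketch leaves out entirely.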
A common use for ELSA with Bro logs is to drill down on and investigate notices. For instance, Bro ships with notices for SQL injection attacks. When you see these notices (or have ELSA alert you on them with “notice_type=SQL_Injection_Attacker”), you will want to investigate to see if the attack is a true positive and if it was successful. Doing so is easy because you can take the IP addresses described in the notice and plug them in as a search for HTTP traffic to see the actual requests. This greatly decreases the amount of time it takes for analysts to get through the day's workload of investigations because alert data can be confirmed or refuted with ease.
Another use case is when you have been given credible intelligence of a possible threat to your network from an outside source, and you want to retroactively search to see if there are any prior instances of contact with the hostile entity. Simply plugging in the hostname, URI, IP address, HTTP user-agent, or other bit of intelligence into ELSA will tell you if there are any prior incidents to investigate based on the network activity that Bro has logged.
A third use case is using ELSA to guide the creation of Bro scripts. For instance, if you're not sure whether a certain URI parameter that a script looks for is common, you can quickly search for it in ELSA to see whether the script would generate a lot of meaningless notices. The ability to explore historical data quickly means that you will have pre-tuned scripts and an excellent understanding of what the data on your network looks like.
In addition to Bro logs, ELSA will store and index any other kind of log sent to it. This allows analysts to corroborate notices with logs from servers, network gear, web proxies, and other devices that can send syslog. All of the logs can be presented in the same search result for easy correlation.
ELSA uses a plugin infrastructure that allows its functionality to be extended. It ships with a plugin for a separate project, StreamDB (http://streamdb.googlecode.com), which is similar to TimeMachine. Using the StreamDB plugin, you can go directly from viewing a Bro notice to viewing the actual traffic (shown in either printable ASCII or hex form) in two clicks. StreamDB will retrieve the traffic and present it in less than a second.
Plugins are very extensible and run on the server side of the web interface, so a plugin that interfaces with a local asset management database or personnel directory would be trivial to write and implement for quick asset lookups, vulnerability management information, etc.
To start with, you need to configure Bro to send its logs as syslog. Instructions on setting up Bro for doing this on Ubuntu can be found here: http://ossectools.blogspot.com/2011/09/bro-quickstart-cluster-edition.html. Instructions for installing ELSA can be found here: http://ossectools.blogspot.com/2011/11/elsa-beta-available.html. There is a Google Group for ELSA support and questions at https://groups.google.com/d/forum/enterprise-log-search-and-archive.
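As a rough illustration of what "sending logs as syslog" means, the sketch below uses Python's standard `logging.handlers.SysLogHandler` to forward lines of text to a syslog receiver over UDP. It is not the method described in the linked instructions, which should be followed for a real deployment. The host `127.0.0.1`, port `514`, the `bro_notice` program tag, and the sample message are all placeholder assumptions.

```python
import logging
import logging.handlers


def make_syslog_logger(host="127.0.0.1", port=514, tag="bro_notice"):
    """Build a logger that forwards each message to a syslog receiver over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with a program tag so the receiver can tell log types apart.
    handler.setFormatter(logging.Formatter(f"{tag}: %(message)s"))
    logger = logging.getLogger(f"forwarder.{tag}")
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace handlers so repeated calls don't duplicate output
    logger.propagate = False
    return logger


def forward_lines(lines, logger):
    """Send each non-empty line as one syslog message; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            sent += 1
    return sent


if __name__ == "__main__":
    log = make_syslog_logger()
    # Placeholder text; a real forwarder would read lines from a log file.
    forward_lines(["example notice line"], log)
```

Because UDP is connectionless, the script runs even when nothing is listening on the target port; the messages are simply dropped.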