Friday, May 25, 2018

Broker is Coming: Persistent Stores

Note: This is a guest blog post by Mike Dopheide.

----------------------------------------------------------------------------------------

Disclaimer:  If you aren't familiar with the Bro IDS software, this is going to make zero sense.

The Bro development team has been hard at work, and Broker, the new communication framework, isn't far off.  I started looking into it to solve a problem I was having and learned quite a bit along the way.  Given that the documentation and examples aren't all done yet, I thought it might be nice to share a basic example of how data stores can be used.

Getting started, you'll need to be using the master branch.  Of note, Bro now ships with libcaf, so if you don't specify --with-caf during configure, Bro will just build it for you, which is super nice.

Problem Statement


To give you a little bit of context for where my head was at, we'd noticed our worker nodes continuously doing thousands of reverse DNS requests.  I tracked this down to the built-in protocols/ssh/interesting-hostnames.bro policy, which tries to tell you if there is a successful ssh login to a host with an interesting name, like 'mail' or 'dns'.  Our issue was that internal processes were ssh'ing into all of our systems on a regular basis, causing tons of redundant queries for local systems.  Wouldn't it be nice if we cached those locally on the worker?  Since we already have a list of locally known hosts provided by protocols/conn/known-hosts.bro, I set out to extend/replace that to also do the DNS lookups.  Might as well figure out how to do it with the upcoming Broker framework so it doesn't have to be rewritten later.

What I'm going to show isn't that full extension (Part 2?), but an example of how to interact with persistent Broker stores.  If you want to follow along, now is a good time to make sure your known_hosts.log is working correctly[1] to begin with.

Making known_hosts Persistent


Let's look at the newly rewritten protocols/conn/known-hosts.bro to point out a few things.  First, let's take note of the variable defining the name of the data store; this will be relevant in a little bit:

global host_store: Cluster::StoreInfo;
...
const host_store_name = "bro/known/hosts" &redef;

Next, inside bro_init() we see this:

Known::host_store = Cluster::create_store(Known::host_store_name);

The data store is created when Bro initializes, and this needs to happen before you can access data in the store, regardless of whether the store is persistent or not[2].  We'll see later that this may be a bit counterintuitive, as you have to 'create' a store that may already exist on disk.

In order to make Known::host_store persistent, it's helpful to understand what the Cluster::create_store function is doing.  From share/bro/base/frameworks/cluster/main.bro:


global create_store: function(name: string, persistent: bool &default=F): StoreInfo;
...
const default_backend = Broker::MEMORY &redef;
...
const default_persistent_backend = Broker::SQLITE &redef;

Here we notice that the default for create_store is to create a store that is not persistent, which results in simply using memory for your backend.  If we could change the call in known-hosts.bro to be Cluster::create_store(Known::host_store_name, T), we'd be all set.  However, that's not possible without modifying shipped code.  Fortunately, there's another option.

Recall that host_store was declared above as a Cluster::StoreInfo, which I skipped over at the time.  This is a record that contains a backend field.

type StoreInfo: record {
...
               ## The type of backend used for storing data.
               backend: Broker::BackendType &default=default_backend;
...

So the backend for our store is set to the default of Broker::MEMORY before the store is actually created, and the Cluster::stores table that holds these StoreInfo records is redef'able.

Add these lines to a new test script (broker-test.bro in my case) or local.bro, as long as it happens after loading known-hosts.bro:


redef Cluster::stores += {
 [Known::host_store_name] = Cluster::StoreInfo($backend = Broker::SQLITE)
};
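To confirm from the script layer that the redef took effect, something like the following sketch could be loaded alongside it.  It assumes Cluster::stores (a table of StoreInfo records indexed by store name, covered further below) already carries the redef'd entry by the time bro_init() runs.  The Check module name is made up for illustration.

```bro
@load base/frameworks/cluster
@load protocols/conn/known-hosts

module Check;   # hypothetical module name, only for this sketch

redef Cluster::stores += {
    [Known::host_store_name] = Cluster::StoreInfo($backend = Broker::SQLITE)
};

event bro_init() &priority=-5
    {
    # Print every configured store together with its backend type.
    for ( name in Cluster::stores )
        print fmt("%s -> %s", name, Cluster::stores[name]$backend);
    }
```

If the redef is in place, the bro/known/hosts entry should report Broker::SQLITE rather than Broker::MEMORY.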

Checking to Make Sure the Data Store Exists via CLI


First, let's examine what we should see at this point.  Back in protocols/conn/known-hosts.bro we can find the call to Broker::put_unique():

event Known::host_found(info: HostsInfo)
       {
       when ( local r = Broker::put_unique(Known::host_store$store,
                                           info$host, T, Known::host_store_expiry) )

There are two things to notice here.  First, Broker data store functions need to be called through asynchronous when() statements; that's important to know for later.  Second, the arguments to put_unique() are the store, key, value, and expiration.  What this means is we should see keys in the store with values that are simply the T bool.
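To make the shape of such a call concrete, here is a minimal sketch of a complete when()/timeout block around put_unique().  It does not reproduce the shipped known-hosts.bro code: the Demo module, the "demo/seen" store name, the 10.0.0.1 key, and the 15sec timeout are all made up for illustration.  It also assumes Bro keeps running long enough for the asynchronous call to complete (for example, when started against a live interface), and it uses the as cast that is covered later in this post.

```bro
@load base/frameworks/cluster

module Demo;   # hypothetical module, store name, and timeout below

export {
    global seen_store: Cluster::StoreInfo;
    const seen_store_name = "demo/seen" &redef;
    const seen_store_timeout = 15sec &redef;
}

event bro_init()
    {
    # Default (in-memory) backend; nothing is persisted in this sketch.
    seen_store = Cluster::create_store(seen_store_name);

    when ( local r = Broker::put_unique(seen_store$store, 10.0.0.1, T, 1day) )
        {
        if ( r$status == Broker::SUCCESS )
            # T if the key was newly inserted, F if it already existed.
            print fmt("newly inserted: %s", r$result as bool);
        else
            print "put_unique failed";
        }
    timeout seen_store_timeout
        {
        print "Broker put_unique() timeout";
        }
    }
```

Checking r$status before touching r$result keeps a failed query from being mistaken for a real answer.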

Now run Bro.  If you run Bro in standalone mode, you will find the data store under your current directory at a path that matches the name of the data store from above: bro/known/hosts.sqlite.  In a cluster, the default store location will be $BROPATH/spool/stores/bro/known/hosts.sqlite.

Hopefully you've got sqlite3 so you can use it on the command line to verify the data store:

[bro@hostname known]$ sqlite3 hosts.sqlite
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
meta   store
sqlite> .schema store
CREATE TABLE store(key blob primary key, value blob, expiry integer);
sqlite> select * from store;
||1526488517607812295
||1526488517609842240
||1526488517611842000
...
||1526488517613785435
||1526488517615733229


sqlite>

We see the expiration times, but at first glance it looks like something may be wrong.  The keys and values don't appear to exist, but I assure you they're there.  The 'blob's can be seen more clearly in hex:

sqlite> select hex(key) from store limit 1;
0600000000000000000000FFFFAABBCCDD
sqlite> select hex(value) from store limit 1;
0101

The trailing bytes of the key (AABBCCDD in this example) are the addr for our key, stored in IPv4-mapped form after the FFFF prefix, and 0101 is our bool T value.  Perfect, that's exactly what we wanted to see.

Accessing an Existing Data Store


Let's take a look at accessing that data store outside of the original policy that created it.  I'll build a short new Bro policy and show the difference depending on whether or not protocols/conn/known-hosts.bro is loaded first.


@load base/frameworks/cluster
module Test;
event bro_init() {
     for(store in Cluster::stores){
               print store;
     }
}

The Cluster::stores variable should hold all of the data stores Bro is aware of.  However, when we run Bro with this script there's no output.  Recall I mentioned that the store needs to be created when Bro initializes, regardless of whether it's persistent or not.  Let's try that:

@load base/frameworks/cluster
module Test;


export {
      global host_store: Cluster::StoreInfo;
}


event bro_init() {
     host_store = Cluster::create_store("bro/known/hosts", T);
     for(store in Cluster::stores){
               print store;
     }
}

Here we create the store with the extra T argument to make it persistent without having to do the redef.  Now when we run Bro, the store is initialized and we can see "bro/known/hosts" in the output.  However, there's a slightly cleaner way if we assume protocols/conn/known-hosts will already be loaded.  Make sure to redef the backend in this case (you may have already done this in local.bro).

@load base/frameworks/cluster
@load protocols/conn/known-hosts
module Test;


redef Cluster::stores += {
   [Known::host_store_name] = Cluster::StoreInfo($backend = Broker::SQLITE)
};


event bro_init() &priority=-5 {
     for(store in Cluster::stores){
               print store;
     }
}

Note that by loading known-hosts and setting a negative priority, we allow the create_store() call to happen before we get to this point.  We also still need the redef for the store backend.  That doesn't really count as accessing the store, since we've just verified that it's loaded, so let's add a little more:


@load base/frameworks/cluster
@load protocols/conn/known-hosts
module Test;


redef Cluster::stores += {
   [Known::host_store_name] = Cluster::StoreInfo($backend = Broker::SQLITE)
};


event bro_init() &priority=-5 {
     when ( local r = Broker::keys(Known::host_store$store)){
          if ( r$status == Broker::SUCCESS ){
               print r$result;
          }
     }timeout Known::host_store_timeout{
          print "Broker timeout\n";
     }
}

The Broker::keys() function, as the name implies, returns all of the keys from a data store.  It's important to note that all when() calls that involve Broker require having a timeout defined. Known::host_store$store and Known::host_store_timeout are available to us as a result of known-hosts being loaded.  Your output should include broker::data that includes a set of your keys:

[data=broker::data{{192.168.1.1, 192.168.2.2, 192.168.3.3}}]

Obviously, if we had values in our store other than just T, we'd want to be able to access those as well.  For that we can use the Broker::get() function.  If we already know the key we need the value for, it's fairly simple:

     when ( local r2 = Broker::get(Known::host_store$store, 192.168.1.1)){
          print r2$result;
     }timeout Known::host_store_timeout{
          print "Broker timeout\n";
     }

The output in this case should be:

[data=broker::data{T}]
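The snippet above prints r2$result unconditionally.  As a hedged sketch, the same lookup can check the query status first; my understanding is that a get() for a key that is absent or has expired comes back with a status other than Broker::SUCCESS, but treat that as an assumption to verify against your build.  The 192.168.1.1 key is just the example address used above.

```bro
@load base/frameworks/cluster
@load protocols/conn/known-hosts

event bro_init() &priority=-5
    {
    when ( local r2 = Broker::get(Known::host_store$store, 192.168.1.1) )
        {
        # Only trust the result when the query itself succeeded.
        if ( r2$status == Broker::SUCCESS )
            print r2$result;
        else
            print "no value for that key";
        }
    timeout Known::host_store_timeout
        {
        print "Broker get() timeout";
        }
    }
```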

This is a good time to mention that values can now be cast to other types with the as operator.

     when ( local r2 = Broker::get(Known::host_store$store,192.168.1.1)){
          print r2$result as bool;
     }timeout Known::host_store_timeout{
              print "Broker timeout\n";
     }

That will give you just the T value.  So that's how it works if you know the key you want the value for.  If you need to iterate over all of the keys, that's possible as well, by embedding the get() inside the result of keys().

when ( local r = Broker::keys(Known::host_store$store)){
     if ( r$status == Broker::SUCCESS ){
          for (ip in r$result as addr_set) {
               when ( local r2 = Broker::get(Known::host_store$store,ip)){
                    print fmt("%s %s",ip,r2$result as bool);
               }timeout Known::host_store_timeout{
                    print "Broker get() timeout\n";
               }
          }
     }
}timeout Known::host_store_timeout{
     print "Broker timeout\n";
}

And there you have it, iterating over your Broker data store.  Obviously this becomes much more useful and has less overhead when calls to Broker are made through events and not layers of embedded when()'s, but I think this example helps illustrate how things are working at a basic level.
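One possible way to flatten the nesting, offered only as a sketch and not necessarily what a production policy would do, is to hand each key to an event and do the get() in that event's handler.  The Test::lookup_host event is hypothetical and exists only for this example.  The rest relies on the same Known::host_store$store and Known::host_store_timeout used above, and again assumes Bro stays running long enough for the asynchronous calls to complete.

```bro
@load base/frameworks/cluster
@load protocols/conn/known-hosts

module Test;

export {
    # Hypothetical event, introduced only for this sketch.
    global lookup_host: event(ip: addr);
}

event Test::lookup_host(ip: addr)
    {
    when ( local r = Broker::get(Known::host_store$store, ip) )
        {
        # Skip keys whose lookup did not succeed.
        if ( r$status == Broker::SUCCESS )
            print fmt("%s %s", ip, r$result as bool);
        }
    timeout Known::host_store_timeout
        {
        print fmt("Broker get() timeout for %s", ip);
        }
    }

event bro_init() &priority=-5
    {
    when ( local k = Broker::keys(Known::host_store$store) )
        {
        if ( k$status == Broker::SUCCESS )
            {
            # Queue one lookup event per key instead of nesting when() blocks.
            for ( ip in k$result as addr_set )
                event Test::lookup_host(ip);
            }
        }
    timeout Known::host_store_timeout
        {
        print "Broker keys() timeout";
        }
    }
```

Each handler now holds a single when() block, which keeps the status check and the timeout for a given query next to each other.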

Acknowledgements:  Thanks to Justin Azoff and Jon Siwek for answering my questions and Samson Hille for providing the initial motivation.

Footnotes:

1)  Common problems are not having protocols/conn/known-hosts loaded or not having your Site::local_nets defined (or listed in networks.cfg).
2)  Persistence is just whether or not your data will be saved across Bro restarts.  By default, if you restart Bro, your known_hosts will be empty and all of the hosts you knew before will be re-logged.
