Chopshop Module Writing Details

January 4, 2013
CND Tools: Post by Wesley Shields

In the first post, we introduced why Chopshop, a network protocol analyzer framework, can be a useful tool in decoding network traffic for incident response. In this second part, we'll discuss how writing additional modules can extend Chopshop.

Module authors need to understand the data structures Chopshop provides and the functions it expects. Although Chopshop supports both TCP and UDP, I'm focusing only on TCP in this post. In the next post I'll use the topics discussed here to give a detailed walkthrough of the "payloads" module provided by Chopshop.

At a high level Chopshop is made up of 3 distinct parts:

  • Core
  • Modules
  • External libraries

Core is responsible for handling the mundane parts of decoder writing. It opens the input file(s), handles output, provides an interface to save files, and does other things that decoder authors should not have to care about. The core takes pcap as input, uses pynids to reassemble the streams and presents the streams to the modules. It also does some bookkeeping on the data structures to ensure modules are only processing data they care about.

Modules are responsible for doing the decoding. They must conform to a very simple API required by the core and are given only the necessary data to get the decoding done.

External libraries contain code that can be shared between decoders. If you have two decoders performing the same de-obfuscation on a block of data it makes sense to put it in a common place and share it between the decoders.

The Anatomy of a Module

Chopshop requires one variable and expects a handful of functions to be in your module. Your module must provide a global moduleName variable whose value is a string. Only a small number of the functions are required; the rest are optional. I'll describe the arguments to these functions later.

  • init(module_data): Called once to initialize this instance of your module. When this is called you can use the 'args' key in the module_data dictionary to obtain the string of arguments given to your module on the command line. This string is suitable to pass into your favorite argument parsing routine. The init function must return a dictionary with at least the 'proto' key set with a value of either 'tcp' or 'udp'. Your choice of 'tcp' or 'udp' indicates which protocol your module is interested in. If, for any reason, your module needs to indicate a failure to initialize, you must set an 'error' key in the returned dictionary with a value of the string you want printed. If the 'error' key is in the dictionary, Chopshop will print the value and exit without processing any data.
  • module_info(): Called by Chopshop when a user wants to know information about your module (the -m argument to chopshop). This function is optional, but to increase usability we highly recommend you provide it.
  • shutdown(module_data): Called once when Chopshop is shutting down. This function is optional.
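
As a minimal sketch of the init() contract described above, the following skeleton parses the argument string with Python's argparse and reports problems through the 'error' key. The module name, the -p/--port option, and the 'port' key are hypothetical and chosen only for illustration; only moduleName, the 'args' key, and the 'proto' and 'error' keys come from the description above.

```python
import argparse
import shlex

# Required global: the name of this module (hypothetical value).
moduleName = 'example_counter'


class _QuietParser(argparse.ArgumentParser):
    # argparse normally prints usage and exits on bad input. Raising
    # instead lets init() report the problem through the 'error' key.
    def error(self, message):
        raise ValueError(message)


def init(module_data):
    # Hypothetical option, for illustration only.
    parser = _QuietParser(prog=moduleName, add_help=False)
    parser.add_argument('-p', '--port', type=int, default=80)

    try:
        opts = parser.parse_args(shlex.split(module_data.get('args', '')))
    except ValueError as exc:
        # Chopshop prints this string and exits without processing data.
        return {'proto': 'tcp', 'error': 'bad arguments: %s' % exc}

    # Options are useful for the lifetime of the module.
    module_data['port'] = opts.port
    return {'proto': 'tcp'}


def shutdown(module_data):
    # Optional: nothing to clean up in this sketch.
    pass
```

Subclassing the parser is one design choice among several; the point is only that a parsing failure ends up in the returned dictionary rather than terminating the process from inside your module.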

If your module is going to process TCP streams you provide these functions:

  • taste(tcp): Called on every stream as soon as the handshake is finished. If the taste function returns true, Chopshop will continue sending the stream to the module for further processing. If it returns false, the module will not receive packets from this stream. At this point there is no layer 7 payload data to inspect, so this function gives your module an initial chance to determine if it is interested in this TCP session or not. You can only inspect the addresses and ports.
  • handleStream(tcp): Called on every reassembled packet from the stream. This is where you'll write the guts of your module. The data you need to process is exposed in either the 'tcp.client.data' or the 'tcp.server.data' arrays. You can use the other attributes of the 'tcp.client' or 'tcp.server' objects to determine which of these you need to use.
  • teardown(tcp): Called on every stream as soon as one of the TCP teardown states is entered. This function is optional.

Attributes of a TCP Object

As a module author, there are only a handful of objects that you need to understand. For modules that deal with TCP, the main object is called the tcp object (see image). It contains a few attributes and functions that are of interest to a module author.

TCP Object
  • addr: A quad-tuple ((src, sport), (dst, dport))
  • timestamp
  • client: An object containing client specific data.
  • server: An object containing server specific data.
  • module_data: A dictionary specific to the lifetime of your module.
  • stream_data: A dictionary specific to the lifetime of the stream being processed.

The functions of the TCP object are:

  • stop(): Tells the Chopshop core that this module is no longer interested in processing this stream.
  • discard(integer): Tells Chopshop core to throw away some number of bytes from the reassembled stream. These bytes will be removed from the start of the 'data' array in either the 'client' or 'server' objects, depending upon which half of a stream is being processed at the time 'discard()' is called.
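
To make the effect of discard() concrete, here is a toy stand-in that mimics only the behavior described above: bytes are dropped from the front of the 'data' buffer of whichever half is being processed. This is not Chopshop's implementation. The class names FakeHalf and FakeTcp and the 'current' and 'stopped' attributes are hypothetical, and the data is modeled as a plain string.

```python
class FakeHalf(object):
    """Hypothetical stand-in for a client or server object."""

    def __init__(self, data=''):
        self.data = data


class FakeTcp(object):
    """Hypothetical stand-in for the tcp object; models stop() and discard()."""

    def __init__(self, client_data='', server_data=''):
        self.client = FakeHalf(client_data)
        self.server = FakeHalf(server_data)
        # Assumption: the real core tracks which half is being processed.
        self.current = self.server
        self.stopped = False

    def discard(self, count):
        # Drop 'count' bytes from the start of the active half's data.
        self.current.data = self.current.data[count:]

    def stop(self):
        # Record that the module no longer wants this stream.
        self.stopped = True


tcp = FakeTcp(server_data='GET / HTTP/1.1\r\n')
tcp.discard(4)            # throw away 'GET '
print(repr(tcp.server.data))
```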

Client and Server Objects

The data to process is contained in the client and server objects. These objects contain only attributes:

  • count_new: This will always be the number of new bytes since the last call into your module.
  • count: The number of reassembled bytes from the stream.
  • offset: The current position in the stream.
  • data: An array of the available data. The contents of this array will change depending upon your calls to 'tcp.discard()' and general space. It is a fixed size, so if you intend on buffering up large quantities of data you should create an array in your 'tcp.stream_data' dictionary and buffer the data there.
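
The buffering advice above might look like the following sketch. The 'buffer' key and the buffered() helper are hypothetical. The sketch also assumes that, because every call discards what it has seen, 'data' holds only the new bytes when handleStream() runs, and that 'data' behaves like a string. The SimpleNamespace objects at the bottom are stand-ins so the sketch can run outside Chopshop.

```python
from types import SimpleNamespace


def handleStream(tcp):
    if tcp.server.count_new > 0:
        # Copy the new bytes into per-stream storage that outlives this
        # call, then let the core reclaim its fixed-size array.
        chunks = tcp.stream_data.setdefault('buffer', [])
        chunks.append(tcp.server.data[:tcp.server.count_new])
        tcp.discard(tcp.server.count_new)
    elif tcp.client.count_new > 0:
        # This sketch does not buffer the other direction.
        tcp.discard(tcp.client.count_new)


def buffered(tcp):
    # Hypothetical helper: everything buffered so far for this stream.
    return ''.join(tcp.stream_data.get('buffer', []))


# Stand-in tcp object (not Chopshop's): discard() only records the count.
discarded = []
tcp = SimpleNamespace(
    server=SimpleNamespace(data='', count_new=0),
    client=SimpleNamespace(data='', count_new=0),
    stream_data={},
    discard=discarded.append,
)

for chunk in ('GET /index', '.html HTTP/1.1\r\n'):
    # Simulate the core delivering newly reassembled bytes.
    tcp.server.data, tcp.server.count_new = chunk, len(chunk)
    handleStream(tcp)

print(repr(buffered(tcp)))
```

Keeping a list of chunks and joining on demand avoids repeatedly copying a growing string; a single string in 'stream_data' would work just as well for small streams.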

The client and server objects are identical, except they handle different sides of the TCP stream. Normal TCP terminology calls the endpoint that sends the first part of the handshake the "client" and the endpoint that accepts the handshake the "server". The terminology used by Chopshop is slightly different. Chopshop names the client and server objects based on where the data is destined. The server object is for data coming from the machine that sent the first SYN (a normal TCP client), and the client object is for data coming from the machine that sent the SYN/ACK (a normal TCP server).

Example

To make this clear let's describe it in terms of a layer 7 protocol like HTTP. The server object contains data related to an HTTP request and the client object contains data related to an HTTP response. So if you wanted to check what is in an HTTP request packet you look at 'tcp.server.data' and to look at the HTTP response packet you look at 'tcp.client.data'.
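
The naming convention can be captured in a small helper. This is a sketch only: the function name direction() and its return strings are hypothetical, and the SimpleNamespace objects are stand-ins for the real tcp object, carrying just the count_new attribute.

```python
from types import SimpleNamespace


def direction(tcp):
    # New bytes on the server object are destined for the server,
    # for example an HTTP request.
    if tcp.server.count_new > 0:
        return 'to server'
    # New bytes on the client object are destined for the client,
    # for example an HTTP response.
    if tcp.client.count_new > 0:
        return 'to client'
    return None


# Stand-ins for a request and a response arriving.
request = SimpleNamespace(server=SimpleNamespace(count_new=16),
                          client=SimpleNamespace(count_new=0))
response = SimpleNamespace(server=SimpleNamespace(count_new=0),
                           client=SimpleNamespace(count_new=17))

print(direction(request))
print(direction(response))
```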

The module_data and stream_data dictionaries serve special and distinct purposes. These are arbitrary dictionaries provided to your module and are used for your module to store any data it needs. Which dictionary you use to store a piece of data depends upon what that data is and its relationship to your module. For data useful over the lifetime of your module, you should store it in the module_data dictionary, e.g., options and arguments. For data tied to a specific stream, use the stream_data dictionary. These normally include things like stream state (am I in the middle of a file transfer, or the middle of a shell), or data buffering.

To give this a concrete example let's put it in the HTTP context again. Let's say we wanted to store the number of HTTP request packets in a given stream that start with 'G'. At the end of a TCP stream we want to output the count of those packets, and at the end of the module we want to output a total count.

To do this we initialize two different counters. The first counter is for the per-stream count of matching packets and the second one is for the total count of matching packets across all streams. The total count should be initialized in the init() method of your module.

The total counter initialization would look something like this:

def init(module_data):
    module_data['counter'] = 0
    module_options = { 'proto': 'tcp' }
    return module_options

Here we are creating the 'counter' key in the 'module_data' dictionary and setting the value to 0. The other two lines are the necessary return of a dictionary to tell Chopshop that this module deals with TCP.

The per-stream counter initialization would look something like this:

def taste(tcp):
    ((src, sport), (dst, dport)) = tcp.addr
    if sport == 80 or dport == 80:
        tcp.stream_data['counter'] = 0
        return True
    return False

In the above we are checking to make sure the TCP session involves port 80. (Let's just ignore the fact that the layer 7 protocol is independent of the port for now; this is a purely illustrative example.)

If the TCP session is in fact on port 80, we set the 'counter' key in the 'tcp.stream_data' dictionary to 0 and return True; otherwise we return False. Remember that the purpose of the taste function is to give your module a quick sample of the layer 4 headers and let you decide whether you care about this stream.

Incrementing the counters is done in the handleStream() function.

def handleStream(tcp):
    if tcp.server.count_new > 0: # This is an HTTP request
        if tcp.server.data[0] == 'G':
            tcp.stream_data['counter'] += 1
            tcp.module_data['counter'] += 1
        tcp.discard(tcp.server.count_new) # Discard everything
    elif tcp.client.count_new > 0: # This is an HTTP response
        tcp.discard(tcp.client.count_new) # Discard everything

In the above snippet we check to see the direction of the bytes we are processing, and if they are going to an HTTP server and the first byte is 'G' we increment each of our counters.

Displaying the counters is done in two places. For displaying the per-stream counter (stored in the 'stream_data' dictionary) we can use the 'teardown()' function in our module.

def teardown(tcp):
    chop.prnt("Stream tearing down. Saw %i." % tcp.stream_data['counter'])

The 'teardown()' function is called as soon as one of the TCP teardown states is recognized by Chopshop. In this case we use the 'chop.prnt()' function provided by Chopshop to print an informative message and the per-stream counter we have been accumulating.

Displaying the total count across all streams is done in the 'shutdown()' function.

def shutdown(module_data):
    chop.prnt("Shutdown reached. Saw %i." % module_data['counter'])

You may notice in the 'handleStream()' function we are referencing 'tcp.module_data' instead of just 'module_data' directly. This is because we bundle everything together into a single object to pass into functions where it makes sense to do so. In the case of 'shutdown()', there is no associated TCP stream so there is no 'stream_data' dictionary and no client or server objects, so it makes sense to pass in the 'module_data' dictionary on its own.

A quick note on the 'chop.prnt' statement used above. As a module author you shouldn't have to care about how your output gets handled; you just know you have a string you want to present. Chopshop provides the 'chop' library to abstract the details of this away. As a module author you can call 'chop.prnt()' and let Chopshop decide where to put that string (to stdout, to a file, or both). The chop library also provides other output capabilities including JSON, printing with colors on a curses GUI, and writing carved files to disk. The details of these are covered in the Chopshop documentation at our github page.

Looking Ahead

In the next post I'll give a detailed walkthrough of one of our modules that is useful for quickly decoding xor encrypted reverse shells. After that I will cover some more complicated protocols like HTTP and how we have built other tools that integrate very well with Chopshop and make parsing HTTP easy.

As always, if you want to take a look at Chopshop you can get it from https://github.com/MITRECND/chopshop.

Attending ShmooCon 2013?

Join us at ShmooCon, the annual hacker convention, February 15-17, where we'll talk about how ChopShop can be used to "bust the Gh0st".