Introducing htpy

November 12, 2013
CND Tools: Post by Wesley Shields

Although HTTP is an application layer protocol, it is commonly used as a transport for all kinds of other protocols, including malware command and control (C2). MITRE, like many other organizations, has had to find ways to understand the C2 channels that use HTTP. Anyone who has had to deal with embedded C2 protocols knows how tedious it can be to deal with the underlying protocol, so we found a way to quickly parse the HTTP to get on with the task of analyzing the embedded C2 protocol.

Rather than invent our own parser, we developed a Python interface, called htpy (pronounced h-t-pie), to an HTTP parser called LibHTP that is used by the Suricata project. We use htpy from ChopShop to quickly and easily parse the HTTP, leaving us with just the embedded malware protocols to analyze. While htpy ties in nicely with ChopShop, it's a separate project and can be used for HTTP parsing in other Python programs.

In this post, I will walk through a high-level overview of htpy and its use within ChopShop. You can also read the htpy documentation for more detailed information.

Let's start with an example

A good example of how to combine htpy and ChopShop into a powerful HTTP parsing capability is the http_extractor module. I won't go through the entire module, but will instead focus on the important interactions between ChopShop and htpy, primarily pointing out the htpy API and some basic do's and don'ts to keep in mind. I'm going to assume you have a working knowledge of HTTP and understand the different components that make up a transaction.

Connection Parsers

The main object used in htpy is called the connection parser (often abbreviated to cp). The primary way to instantiate a connection parser is using htpy.init(), which returns a connection parser with the default configuration.

Modern HTTP implementations tend to serialize multiple requests, waiting for a response before sending another request in the same session. To speed things up, they parallelize the overall process by spawning multiple simultaneous TCP connections. From a ChopShop perspective, this translates into needing one connection parser per session. The best place to instantiate the connection parser is in the taste function, and the best place to store it is in the stream data dictionary:

tcp.stream_data['cp'] = htpy.init()
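To make the per-session bookkeeping concrete, here is a pure-Python sketch in which a plain dict stands in for ChopShop's per-stream storage and a placeholder object stands in for the parser returned by htpy.init(); all names here are illustrative, not part of either API:

```python
# One connection parser per TCP session: a dict keyed by the session's
# address tuple stands in for ChopShop's stream_data mechanism, and
# object() stands in for htpy.init().
parsers = {}

def get_parser(session):
    # session is a (src_ip, src_port, dst_ip, dst_port) tuple; each
    # distinct session gets its own independent parser instance.
    if session not in parsers:
        parsers[session] = object()  # stand-in for htpy.init()
    return parsers[session]

a = get_parser(('10.0.0.1', 49152, '10.0.0.2', 80))
b = get_parser(('10.0.0.1', 49153, '10.0.0.2', 80))  # parallel connection
assert a is not b  # parallel connections must not share a parser
assert a is get_parser(('10.0.0.1', 49152, '10.0.0.2', 80))
```

In ChopShop itself this bookkeeping is already done for you: stream_data is per-session, so storing the parser there gives you exactly this one-parser-per-connection behavior.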


Once you have instantiated a connection parser for a given stream, you have to decide which parts of the HTTP protocol you want to parse. Callback functions are heavily used by LibHTP and by extension htpy to allow users of the library to process the parsed HTTP contents. Four of the more commonly used callbacks in ChopShop modules are listed below:

  • request_headers
  • response_headers
  • request_body_data
  • response_body_data

As you might guess, the names of these callbacks reflect the point in an HTTP transaction when the callback is called. For example, when using the request_headers callback you know that your callback function will be called as soon as parsing the request headers is finished. The request_body_data callback is called whenever a blob of body data is processed.

On Buffering, Chunked Encoding, and Compressed Bodies

Due to the way LibHTP is designed, if you have a request body that spans multiple packets, your request_body_data callback will be called once per packet, and it is up to your callback function to buffer up the data, if desired. Buffering is best done in the stream_data dictionary. To get this dictionary passed to your callbacks, please see the section on sending an object to callbacks.
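As a concrete illustration of this buffering pattern, here is a pure-Python sketch; the dict stands in for ChopShop's stream_data, and HTP_OK is a stand-in constant rather than the real value from the htpy module:

```python
# Buffering body data across multiple callback invocations. Each call
# delivers only the bytes parsed from one packet, so the callback
# appends them until the transaction is complete.
HTP_OK = 0                      # stand-in for htpy.HTP_OK
stream_data = {'body': b''}     # stand-in for ChopShop's stream_data

def request_body_data_callback(data, length):
    # Only the first `length` bytes of `data` are meaningful.
    if data is not None:
        stream_data['body'] += data[:length]
    return HTP_OK

# Simulate a body that arrived split across three packets:
for packet in (b'id=42&', b'host=abc&', b'cmd=beacon'):
    request_body_data_callback(packet, len(packet))

print(stream_data['body'])  # b'id=42&host=abc&cmd=beacon'
```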

HTTP supports something known as chunked transfer encoding. This is a means for servers to send data to clients without knowing the full size of the data ahead of time. From a parsing perspective, it is important to recognize and deal with chunked transfers appropriately, and LibHTP does that automatically for you. Lastly, if the request body is gzip compressed, it will be decompressed automatically, prior to calling your callback function.
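To show what LibHTP is saving you from, here is a hand-rolled decoder for the chunked wire format (illustration only; with htpy your callbacks receive the already-dechunked data):

```python
def dechunk(raw: bytes) -> bytes:
    """Decode an HTTP chunked-encoded body (illustration only; LibHTP
    performs this before your body-data callback sees the data)."""
    body = b''
    while raw:
        # Each chunk starts with its size in hex, optionally followed
        # by extensions after a ';', then CRLF.
        size_line, _, raw = raw.partition(b'\r\n')
        size = int(size_line.split(b';')[0], 16)
        if size == 0:           # zero-length chunk terminates the body
            break
        body += raw[:size]
        raw = raw[size + 2:]    # skip chunk data and its trailing CRLF
    return body

wire = b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'
print(dechunk(wire))  # b'Wikipedia'
```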

Registering Callbacks

There are multiple types of callback functions, all of which are registered the same way. The difference among the callback types is the signature your callback function must have. The two most commonly used callback types are regular callbacks and transaction callbacks.

A typical function definition for a regular callback is:

def request_headers_callback(cp)

The cp parameter is the connection parser that was in use at the time your callback was called. This will come into play later when we examine the other methods of connection parsers used to examine the parsed data.

A typical function definition for a transaction callback is:

def request_body_data_callback(data, length)

The data parameter is the blob of body data and the length parameter is the length of that data. Seeing the length parameter in a Python program may seem weird, but remember that htpy is a very thin Python wrapper around LibHTP, and that LibHTP is written in C. The concept of "pointer to some data and the size of that data in memory" is a common idiom in C, and htpy is just exposing it directly. For more information on the different callback types and their definitions, please see the callback section of the README on GitHub.

Now that we know what the definition of our callback function needs to look like, we can register the callback with the connection parser. The connection parser object has one registration method per callback, and each one takes the callback function as its single argument. To register the two callbacks described above you would use:

cp = tcp.stream_data['cp']
cp.register_request_headers(request_headers_callback)
cp.register_request_body_data(request_body_data_callback)

Parsing Data

Now that we have defined some functions to be called at specific points in an HTTP transaction, we have to get the HTTP data into the parser. The connection parser object has two methods for consuming data: req_data and res_data (request data and response data). This happens to align nicely with ChopShop's directionality checks, described in our previous posting. If the server's count_new attribute is greater than 0, we know we have an HTTP request (data going to the HTTP server makes it a request). If the client's count_new attribute is greater than 0, we know we have an HTTP response. You can use this knowledge to feed the data into the appropriate connection parser method.

if tcp.server.count_new > 0:
    tcp.stream_data['cp'].req_data(tcp.server.data[:tcp.server.count_new])
elif tcp.client.count_new > 0:
    tcp.stream_data['cp'].res_data(tcp.client.data[:tcp.client.count_new])

Using Callbacks

Using callbacks depends entirely on what you want to do with the parsed data. The connection parser object that gets passed into your callback has many useful methods. If you want to check the Host header field of an HTTP request, you can use the get_request_header method:

def request_headers_callback(cp):
    # get_request_header() returns None when the header is absent.
    if cp.get_request_header('host') is not None:
        return htpy.HTP_STOP
    return htpy.HTP_OK

Each callback must return a constant value to indicate its status. The defined constants are htpy.HTP_OK, htpy.HTP_ERROR, and htpy.HTP_STOP. These return values tell the underlying parser what to do. If htpy.HTP_OK is returned, the parser continues as normal. If htpy.HTP_ERROR is returned, the parser will refuse to parse any more data and an exception will be raised. Lastly, if htpy.HTP_STOP is returned, the parser will also refuse to parse any more data, but a different exception will be raised. This allows you to catch the appropriate stop exception and call tcp.stop() in your module.
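The control flow those return values drive can be pictured in pure Python; the constants and exception classes below are stand-ins, not htpy's real names:

```python
# Stand-in constants and exceptions illustrating how the parser reacts
# to each callback return value.
HTP_OK, HTP_ERROR, HTP_STOP = 0, -1, -2

class ParserError(Exception):
    pass

class ParserStopped(Exception):
    pass

def run_callback(callback, *args):
    rc = callback(*args)
    if rc == HTP_ERROR:
        raise ParserError('callback reported an error')
    if rc == HTP_STOP:
        # A ChopShop module would catch this and call tcp.stop().
        raise ParserStopped('callback asked the parser to stop')
    return rc

# HTP_OK lets parsing continue; HTP_STOP surfaces as a catchable exception.
assert run_callback(lambda cp: HTP_OK, None) == HTP_OK
try:
    run_callback(lambda cp: HTP_STOP, None)
except ParserStopped:
    pass
```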

Wrap Up

Parsing HTTP can be a very complex undertaking. By leveraging LibHTP, htpy provides a simpler way for ChopShop authors to write HTTP-based decoders. An analyst with a good understanding of these tools can quickly parse through the HTTP and be left with only the malware protocol, thus substantially reducing the overall time needed to produce the decoder.

As always, if you have questions about ChopShop, htpy, or any of the other things discussed here, please contact me.