Scanning Streaming Data with Yarashop

April 16, 2013
CND Tools: Post by Stephen DiCato

Wesley Shields: Today we're taking a detour from our regular ChopShop posts to talk about another one of MITRE's open source projects: Yaraprocessor. Stephen DiCato will be illustrating how the yarashop module for ChopShop can be used to scan network streams with Yaraprocessor.

We'll skip the ChopShop details for now. Instead we'll discuss how to make use of the Yarashop module to apply your custom YARA rules to network streams.

We'll be resuming our deep dive of ChopShop modules in the next post.

Stephen DiCato:
For those new to YARA, a bit of background is in order. YARA is a "tool aimed at helping malware researchers identify and classify malware samples. With YARA you can create descriptions of malware families based on text or binary patterns contained in samples of those families." Essentially, YARA is useful for finding patterns.

While YARA excels at analyzing static data, scanning streaming data is an entirely different story. Because of the structure of PCAP (packet capture) files, scanning one directly with YARA only finds signature matches contained within an individual packet. While this is useful, it suffers from one big drawback: a signature that spans multiple packet payloads can never match. Realistically, you'd be better off analyzing the contents of the actual TCP streams contained in the PCAP file rather than the PCAP file itself.
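To make the drawback concrete, here is a minimal sketch in plain Python. It does not use YARA at all: a simple byte-string search stands in for a YARA string match, and the payloads and the pattern are made up for illustration. It shows a pattern that is split across two payloads being missed when each payload is checked on its own, but found once the payloads are joined.

```python
# Hypothetical payloads of three TCP packets flowing in one direction.
payloads = [
    b"GET /inde",
    b"x.html HTTP/1.1\r\nHost: exam",
    b"ple.com\r\n\r\n",
]

# Stand-in for a YARA string. A real rule would be compiled and matched by YARA.
pattern = b"index.html"


def matches(data, needle=pattern):
    """Return True if the needle occurs anywhere in the given bytes."""
    return needle in data


# Checking each payload on its own: the pattern straddles packets 1 and 2.
per_packet_hits = [matches(p) for p in payloads]

# Checking the reassembled stream: the pattern is now contiguous.
stream_hit = matches(b"".join(payloads))

print(per_packet_hits)  # [False, False, False]
print(stream_hit)       # True
```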

That idea of analyzing TCP streams is the origin of Yaraprocessor, a standalone Python library that allows for scanning data streams in multiple ways, and Yarashop, a plugin for ChopShop that leverages Yaraprocessor.

What ChopShop with Yarashop brings to CND

With Yarashop, an analyst can scan an entire TCP session, the payload of each TCP message individually, or use a fixed-sized buffer to scan chunks of reassembled data in a streaming fashion.

Let's walk through each of these scenarios using a very simple network session containing three packets (see Figure 1 below) that has been reassembled by ChopShop. For illustrative purposes, only the packets flowing in one direction are shown, even though Yarashop will analyze each direction of a TCP session individually.

Figure 1: TCP network session reassembled in ChopShop, ready for processing by Yarashop

Scenario 1: Scanning an entire TCP session

By default, Yarashop will concatenate all payloads and scan the result. This is the equivalent of scanning the entire network session in one direction. Figure 2 below illustrates this, using green highlighting to indicate what data is being scanned by YARA. Each green overlay represents one scanning instance. Depending upon the mode chosen, there may be one scan or many scans.

Figure 2: TCP session to be scanned by Yarashop

For small network sessions, loading the entire session into YARA is easy and provides good results. However, because the entire stream is buffered and scanned all at once, you should determine the performance impact that different stream lengths will have before you begin doing any processing.

Scenario 2: Scanning a TCP payload

Scanning each individual payload is useful in situations where the size of your stream is large and performance would be negatively impacted if it were processed as one large chunk. Figure 3 shows per-packet scanning, where YARA analyzes one packet at a time.

Figure 3: Per packet scanning performed by Yarashop

This per-packet method is effectively the same as scanning the entire PCAP file directly, except that it skips the PCAP header that precedes each packet as well as the layer 4 and lower protocol headers.

Scenario 3: Scanning fixed-sized buffers

Scanning fixed-sized buffers is the last scenario. The variables used in this scanning method are the buffer size and the window step. The buffer size refers to the amount of data to scan at a time. Window step dictates how many bytes to advance the buffer between scans. A 1024-byte buffer with a 1024-byte window step would scan an entire network session 1024 bytes at a time, never skipping a byte or scanning a byte more than once. By controlling the window step, the user can control the overlap between scans.
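The relationship between buffer size and window step can be sketched as a small Python generator. This is only an illustration of the idea, not Yarashop's or Yaraprocessor's actual code: the function name `windows` is hypothetical, and yielding the shorter chunks left over at the end of the data is an assumption about how the tail might be handled.

```python
def windows(data, buffer_size, window_step):
    """Yield (offset, chunk) pairs for fixed-size buffer scanning.

    buffer_size: number of bytes handed to the scanner at a time.
    window_step: number of bytes to advance between scans.
    Assumption: trailing chunks shorter than buffer_size are still yielded.
    """
    if buffer_size <= 0 or window_step <= 0:
        raise ValueError("buffer_size and window_step must be positive")
    for offset in range(0, len(data), window_step):
        yield offset, data[offset:offset + buffer_size]


# A made-up 25-byte "session" standing in for reassembled TCP data.
session = bytes(range(25))

# Step equal to buffer size: every byte is covered exactly once.
no_overlap = list(windows(session, buffer_size=10, window_step=10))
print([(off, len(chunk)) for off, chunk in no_overlap])
# [(0, 10), (10, 10), (20, 5)]

# Step smaller than buffer size: consecutive chunks overlap by 5 bytes.
overlapping = list(windows(session, buffer_size=10, window_step=5))
print([off for off, _ in overlapping])
# [0, 5, 10, 15, 20]
```

Each yielded chunk would then be passed to whatever scanning routine is in use.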

Scanning our example session using a 10-byte buffer size and a 5-byte window step is illustrated in Figure 4.

Figure 4: Fixed buffer scanning performed by Yarashop

As you can see in the above example, some bytes are scanned more than once. This is because the window step is smaller than the buffer size. A window step greater than the buffer size produces a "sampling" effect, where only intermittent sections of the network session are analyzed.
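The overlap and sampling effects can be checked with a few lines of Python that count how many times each byte position would be scanned. This is a rough sketch under assumptions: the helper name `scan_counts` is hypothetical, the 20-byte session length is arbitrary, and windows are assumed to start at offset 0 and advance by the window step until the data runs out.

```python
from collections import Counter


def scan_counts(length, buffer_size, window_step):
    """Return a list giving how many times each byte offset is scanned."""
    counts = Counter()
    for offset in range(0, length, window_step):
        # Each window covers buffer_size bytes, clipped at the end of the data.
        for i in range(offset, min(offset + buffer_size, length)):
            counts[i] += 1
    return [counts[i] for i in range(length)]


# Step smaller than buffer: most bytes are scanned twice (overlap).
print(scan_counts(20, buffer_size=10, window_step=5))
# [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

# Step equal to buffer: every byte is scanned exactly once.
print(scan_counts(20, buffer_size=10, window_step=10))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Step larger than buffer: some bytes are never scanned (sampling).
print(scan_counts(20, buffer_size=5, window_step=10))
# [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```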

Obtaining Yarashop

In closing, we hope to have illustrated the value of Yaraprocessor and Yarashop when scanning streaming data. As you experiment more with running YARA against network streams, we hope to hear about it. You can find out more about the Yaraprocessor and Yarashop projects on their respective GitHub pages. If you run into problems, feel free to report them on GitHub!

Wesley Shields: Thanks, Stephen.

In our next post, we will be talking about analyzing HTTP with ChopShop and htpy.