Wesley Shields: Today we're taking a detour from our regular ChopShop posts to talk about another one of MITRE's open source projects: Yaraprocessor. Stephen DiCato will be illustrating how the yarashop module for ChopShop can be used to scan network streams with Yaraprocessor.
We'll skip the ChopShop details for now. Instead we'll discuss how to make use of the Yarashop module to apply your custom YARA rules to network streams.
We'll be resuming our deep dive of ChopShop modules in the next post.
Stephen DiCato:
For those new to YARA, a bit of background is in order. YARA is a "tool aimed at helping malware researchers identify and classify malware samples. With YARA you can create descriptions of malware families based on text or binary patterns contained in samples of those families." Essentially, YARA is useful for finding patterns.
While YARA excels at analyzing static data, scanning streaming data is an entirely different story. Due to the structure of PCAP (packet capture) files, scanning one directly with YARA only finds signature matches contained within an individual packet. While this is useful, it suffers from one big drawback: a signature that spans multiple packet payloads can never match. Realistically, you'd be better off analyzing the contents of the actual TCP streams contained in the PCAP file rather than the PCAP file itself.
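To make the drawback concrete, here is a minimal sketch in plain Python. It does not use YARA or Yaraprocessor: a simple byte-pattern search stands in for a signature, and the payloads are made up for illustration. It only shows that a pattern split across two packets is invisible to per-packet scanning but visible once the payloads are reassembled.

```python
# Stand-in for a YARA signature: a plain byte pattern (an assumption for
# illustration, not real YARA matching).
SIGNATURE = b"EVIL_PAYLOAD"

# Hypothetical payloads from three packets flowing in one direction.
# The pattern straddles the boundary between the first two packets.
payloads = [b"GET /index EVIL_", b"PAYLOAD HTTP/1.1\r\n", b"Host: example\r\n"]


def scan(data: bytes) -> bool:
    """Return True if the stand-in signature occurs anywhere in data."""
    return SIGNATURE in data


# Scanning each payload on its own never sees the whole pattern.
per_packet_hits = [scan(p) for p in payloads]

# Scanning the reassembled stream does.
stream_hit = scan(b"".join(payloads))

print(per_packet_hits)  # [False, False, False]
print(stream_hit)       # True
```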
That idea of analyzing TCP streams is the origin of Yaraprocessor, a standalone Python library that allows for scanning data streams in multiple ways, and Yarashop, a plugin for ChopShop that leverages Yaraprocessor.
What ChopShop with Yarashop brings to CND
Mature CND (computer network defense) shops are writing their own signatures (we happen to use YARA) to detect specific patterns in the data they are collecting. If you scan an entire PCAP file, as opposed to the contents of the TCP streams, and a signature matches, you don't get any of the metadata about the individual packet or stream where the match occurred (e.g., the output might only be something like this: Rule "gh0st backdoor" found at offset 7200). This is because YARA has no idea WHAT it is scanning. You need a tool like ChopShop to help you understand the TCP streams as you are scanning.
Yarashop is a way to combine YARA and ChopShop to provide the context about WHAT is being scanned. When a match is found, that extra context can be presented to help analysts understand the situation (e.g., the output for the above example would be something like this: Rule "gh0st backdoor" found at offset 400 in stream 1.1.1.1:2456->2.2.2.2:80 at 7:42AM on 2012/06/03).
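As a rough sketch of what "attaching context" means, the snippet below pairs a match with the stream it came from. Everything here is hypothetical: `StreamContext` and `describe_match` are illustrative names, not Yarashop's or ChopShop's actual data structures, and the output format is only modeled on the example above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StreamContext:
    """Hypothetical container for the metadata a stream-aware tool can supply."""
    src: str
    sport: int
    dst: str
    dport: int
    timestamp: datetime


def describe_match(rule: str, offset: int, ctx: StreamContext) -> str:
    """Format a rule match together with the stream it was found in."""
    when = ctx.timestamp.strftime("%H:%M on %Y/%m/%d")
    return (
        f'Rule "{rule}" found at offset {offset} in stream '
        f"{ctx.src}:{ctx.sport}->{ctx.dst}:{ctx.dport} at {when}"
    )


ctx = StreamContext("1.1.1.1", 2456, "2.2.2.2", 80, datetime(2012, 6, 3, 7, 42))
print(describe_match("gh0st backdoor", 400, ctx))
```

The point is simply that the scanner alone can only report a rule name and an offset; the addresses, ports and timestamp have to come from whatever reassembled the stream.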
Yarashop utilizes the Yaraprocessor codebase to do the slicing, dicing and scanning of data. By separating Yarashop from Yaraprocessor, we have made it possible to apply the same scanning processes to more than just network data. You can take Yaraprocessor and use it to scan very large files in chunks, or you can use it to scan other streaming data sources, or anything you want!
With Yarashop, an analyst can scan an entire TCP session, the payload of each TCP message individually, or use a fixed-sized buffer to scan chunks of reassembled data in a streaming fashion.
Let's walk through each of these scenarios using a very simple network session containing three packets (see Figure 1 below) that has been reassembled by ChopShop. For illustrative purposes only the packets flowing in one direction are shown, even though Yarashop will analyze each direction of a TCP session individually.
Figure 1: TCP network session reassembled in ChopShop, ready for processing by Yarashop
Scenario 1: Scanning an entire TCP session
By default, Yarashop will concatenate all payloads and scan the result. This is the equivalent of scanning the entire network session in one direction. Figure 2 below illustrates this, using green highlighting to indicate what data is being scanned by YARA. Each green overlay represents one scanning instance. Depending upon the mode chosen, there may be one scan or many scans.
Figure 2: TCP session to be scanned by Yarashop
For small network sessions, loading the entire session into YARA is easy and provides good results. Because the entire stream is buffered and scanned all at once, you should determine the performance impact that different stream lengths will have before you begin any processing.
Scenario 2: Scanning a TCP payload
Scanning each individual payload is useful in situations where the size of your stream is large, and performance would be negatively impacted if it were processed as one large chunk. Figure 3 shows per-packet scanning, where YARA analyzes one packet payload at a time.
Figure 3: Per-packet scanning performed by Yarashop
This per-packet method is effectively the same as scanning the entire PCAP file directly, except that the PCAP header preceding each packet and the protocol headers at layer 4 and below are skipped.
Scenario 3: Scanning fixed-sized buffers
Scanning fixed-sized buffers is the last scenario. The variables used in this scanning method are the buffer size and the window step. The buffer size refers to the amount of data to scan at a time. Window step dictates how many bytes to advance the buffer between scans. A 1024-byte buffer with a 1024-byte window step would scan an entire network session 1024 bytes at a time, never skipping a byte or scanning a byte more than once. By controlling the window step, the user can control the overlap between scans.
Scanning our example session using a 10-byte buffer size and a 5-byte window step is illustrated in Figure 4.
Figure 4: Fixed buffer scanning performed by Yarashop
As you can see in the above example, some bytes are actually scanned more than once. This is because the window step value is less than the buffer size. A window step greater than the buffer size produces a "sampling" effect, where only intermittent sections of the network session are analyzed.
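The relationship between buffer size and window step can be sketched in a few lines of plain Python. This is not Yaraprocessor's implementation or API: `windows` is a hypothetical helper, and how it treats the short final chunk (it is emitted as-is) is an assumption made for illustration.

```python
from typing import Iterator, Tuple


def windows(data: bytes, buffer_size: int, window_step: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs for a sliding window over data.

    Hypothetical helper: each chunk is at most buffer_size bytes, and the
    window advances by window_step bytes between chunks. A short final
    chunk is still emitted (an assumption of this sketch).
    """
    if buffer_size <= 0 or window_step <= 0:
        raise ValueError("buffer_size and window_step must be positive")
    for offset in range(0, len(data), window_step):
        yield offset, data[offset:offset + buffer_size]


session = b"ABCDEFGHIJKLMNOPQRSTUVWXY"  # 25 bytes of made-up reassembled data

# Step smaller than buffer: consecutive chunks overlap by 5 bytes.
overlapping = list(windows(session, buffer_size=10, window_step=5))

# Step equal to buffer: every byte is scanned exactly once.
contiguous = list(windows(session, buffer_size=10, window_step=10))

# Step larger than buffer: "sampling", bytes between chunks are skipped.
sampled = list(windows(session, buffer_size=5, window_step=10))

print(overlapping[:2])  # [(0, b'ABCDEFGHIJ'), (5, b'FGHIJKLMNO')]
print(contiguous)       # [(0, b'ABCDEFGHIJ'), (10, b'KLMNOPQRST'), (20, b'UVWXY')]
print(sampled)          # [(0, b'ABCDE'), (10, b'KLMNO'), (20, b'UVWXY')]
```

Each chunk would then be handed to the scanner in turn. The overlapping case is what lets a signature that straddles a chunk boundary still be seen whole in a later chunk, provided the signature is no longer than the overlap.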
Obtaining Yarashop
In closing, we hope to have illustrated the value of Yaraprocessor and Yarashop when scanning streaming data. As you experiment more with running YARA against network streams, we hope to hear about it. You can find out more about the Yaraprocessor and Yarashop projects on their respective GitHub pages. If you run into problems, feel free to report them on GitHub!
Wesley Shields: Thanks, Stephen.
In our next post, we will be talking about analyzing HTTP with ChopShop and htpy.