The Life of Binaries

January 18, 2013
Cyber Training: Post by Xeno Kovah

This is the third in a series of posts about internally developed computer security training classes that have been taught within MITRE (Technical Training) and have been publicly released.

In this post, the editor continues an interview with Xeno Kovah, this time about his 2-day class, The Life of Binaries, released to the Open Training community.

Editor: A class on binaries seems like an unusual thing to teach, so there must be more to it. What topics does your class cover?

Xeno Kovah: Well, the idea behind this class is to cover the life cycle of binaries, from their conception in high-level programming languages to their death as a process being unloaded from memory (only to be re-born the next time they're executed!). In support of this timeline-based approach, I start by covering some basics of compilation. Of course, college classes cover this more thoroughly, so my goal in covering it is to show the machinery of binary generation and how that machinery influences some quirks of the final generated assembly code.

A major portion of the class is spent dissecting the Windows Portable Executable (PE) file format. So much so that the first year this class was taught, I didn't have enough time to cover the Linux Executable and Linkable Format (ELF). As a result, the next version of the class has dropped the compiler information in favor of jumping right to PE format in order to make room for ELF.
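To make the starting point of that dissection concrete, here is a minimal sketch (not taken from the class materials) of how a tool might tell the two formats apart. It relies only on the documented header magics: ELF files start with 0x7F 'E' 'L' 'F', while PE files start with a DOS header ('MZ') whose e_lfanew field at offset 0x3C points to the 'PE\0\0' signature. The function name identify_binary_format is hypothetical.

```python
import struct


def identify_binary_format(data: bytes) -> str:
    """Return 'PE', 'ELF', or 'unknown' based on header magic values."""
    # ELF files begin with the 4-byte magic 0x7F 'E' 'L' 'F'.
    if data[:4] == b"\x7fELF":
        return "ELF"
    # PE files begin with a DOS header whose first two bytes are 'MZ'.
    # The 32-bit little-endian field e_lfanew at offset 0x3C holds the
    # file offset of the 'PE\0\0' signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"
```

A plain DOS executable also starts with 'MZ', which is why the sketch follows e_lfanew to the PE signature instead of stopping at the first two bytes.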

As with the other classes I've developed, if you scroll to the bottom of the training page, then click on the π (pi) symbol, you'll see a work-in-progress map (PDF) of the nitty-gritty details of what the class covers. As I've mentioned in our previous interviews, I'm not yet giving the map a more prominent position on the class page because I want to make it interactive, such that when you click on a topic in the map, you'll go to the portion of the class video that addresses it.

[Ed.]: Why do you think it's important that people know this level of depth on binaries?

[XK]: As I already mentioned, certainly compiler classes are common in college curricula. However, many classes stop short of having the students' compilers actually output real assembly language code. So the point of this class is to highlight one example approach to compilation, though by no means the most common, that explicitly outputs x86 assembly. In this way I hope to tie foundational knowledge to vocational knowledge, which often shows up in the context of reverse engineering.

For the binary formats like PE and ELF, it's very important to understand how they are used in order to understand how certain classes of malware, such as viruses and "packers," can abuse them. That's why there are some labs at the end of the PE section, showing how it's applicable to a custom (mostly-safe ;)) example virus and the most common packer, UPX. "Packers" are programs that play games with memory layout and the contents of an executable in order to obfuscate the true purpose of a binary from a reverse engineer.
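As a rough illustration of how format knowledge helps spot a packer, the sketch below walks the PE section table and checks for the section names UPX uses by default (UPX0 and UPX1). This is only a heuristic and not the lab from the class: section names are trivial to rename, so a missing match proves nothing. The helper names section_summary and looks_upx_packed are hypothetical.

```python
import struct


def section_summary(data: bytes) -> list:
    """Return (name, virtual_size, raw_size) for each PE section header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4  # COFF file header follows the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table starts after the 20-byte COFF header and the
    # optional header; each section header is 40 bytes long.
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        # Name, VirtualSize, VirtualAddress, SizeOfRawData
        name, vsize, _vaddr, rsize = struct.unpack_from(
            "<8sIII", data, table + 40 * i
        )
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        sections.append((label, vsize, rsize))
    return sections


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic: True if both default UPX section names are present."""
    names = {name for name, _vsize, _rsize in section_summary(data)}
    return {"UPX0", "UPX1"} <= names
```

In a typical UPX-packed file, the first of those sections has a raw size of zero but a large virtual size: it is an empty placeholder on disk that the unpacking stub fills in at run time, which is one example of the memory-layout games mentioned above.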

Another reason for including this information is that I feel there are a lot of people who know network security, but not enough who know host-based security. When you treat hosts as black boxes, there's an awful lot of capability and complexity hidden from your view. And when you learn in depth, not just broadly, you'll probably be better at finding solutions for new problems, and just generally make yourself more adaptable to the challenging projects in your workplace.

[Ed.]: OK. What other security jobs can make use of this type of knowledge?

[XK]: For this particular class, I like to think it's interesting enough just to know what actually goes on when you click on an executable on your PC :)

But the primary area where this information is useful is in supporting reverse engineers' understanding of possible malware manipulations. That's why we make this a prerequisite for the Introduction to Software Reverse Engineering class, as well as the Rootkits class. Speaking of rootkits, one of the two core userspace hooking techniques is to manipulate a data structure that is covered extensively in the Life of Binaries class. And in the general malware world, there are plenty of tricks malware can play on a malware analyst by manipulating the binary format; an analyst needs to know the details of the format in order to recognize and counter the tricks.

[Ed.]: So, you must be making use of this in your work at MITRE. Is that right?

[XK]: Yes. For a particular project I was working on a few years back, I had to learn the nitty-gritty details of how binary formats work in order to analyze binary packers. After that project, this knowledge found new life when I applied it to the Checkmate project, which I've been mentioning in every blog post. ;) In support of its memory integrity verification capabilities, Checkmate needs to be able to analyze code in memory to ensure it hasn't been tampered with by malware. Part of the way to do this is to parse the PE header information in memory, determine the locations that should never legitimately change, then hash those regions and send the measurement back to confirm that it matches expectations. It was this earlier work on analyzing malicious software that made my life that much easier when I went to build defensive software.
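The parse-then-hash idea described above can be sketched in a few lines. This is not Checkmate's implementation; it is a minimal illustration that assumes the module has been captured as a bytes buffer laid out as it is in memory (so section data sits at its VirtualAddress offset from the image base), and it treats every section without the writable flag as a region that should not change. The function name hash_nonwritable_sections is hypothetical.

```python
import hashlib
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000  # section characteristic: writable


def hash_nonwritable_sections(image: bytes) -> dict:
    """Map section name -> SHA-256 hex digest of each non-writable section.

    `image` is assumed to be a PE module as mapped in memory, so each
    section's bytes start at its VirtualAddress (offset from image base).
    """
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[:2] != b"MZ" or image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    coff = e_lfanew + 4  # COFF file header follows the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    digests = {}
    for i in range(num_sections):
        # 40-byte section header; only name, sizes, and flags are used here.
        (name, vsize, vaddr, _rsize, _rptr, _reloc_ptr, _line_ptr,
         _nreloc, _nline, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", image, table + 40 * i
        )
        if flags & IMAGE_SCN_MEM_WRITE:
            continue  # writable data legitimately changes at run time
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        digests[label] = hashlib.sha256(image[vaddr:vaddr + vsize]).hexdigest()
    return digests
```

A real measurement agent has more to account for than this sketch does. For example, relocations applied at load time change bytes in otherwise read-only code depending on where the module was loaded, so expected values cannot simply be taken from the file on disk.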

[Ed.]: You mentioned two classes in your curriculum that make use of this knowledge. Could you explain more?

[XK]: Sure. Viewing the course map at the bottom of the course page shows the relationship between this class and the others. This class contains important knowledge that is highly recommended for a student to know before taking the later Rootkits class. To be more specific, knowledge of the Import Address Table and Export Address Table is very important because user space rootkits can use these tables to intercept functions for the purpose of monitoring or hiding activity. Also, the discussion of the very common but very basic packer UPX, and how it manipulates binary section mappings, foreshadows the more advanced compression and obfuscation techniques that can occur. This is talked about a bit in the Reverse Engineering Malware class, but honestly it's a topic unto itself, and we hope to have a class that focuses on how to "unpack" packed code in the future.

[Ed.]: What made you want to make this class publicly available?

[XK]: The main driver for this class is that I saw how long it took me to learn the topic of binary formats when I was working on the packer analysis project, and I wanted to make it easier to bootstrap those who would be working on the project after me. And after I later found an unexpected reuse for this knowledge, I figured that other people might find other uses that I hadn't even thought of. So I decided to put the knowledge out there in an easily accessible manner.

[Ed.]: Nice. What else do you have up your sleeve for future follow-up classes?

[XK]: I'm mostly focused on improving my existing four classes. This class originally focused on 32-bit architectural details. I've updated it to focus on 64-bit systems, which are becoming more common as people switch to 64-bit desktop OSes. For this class it turns out that there are not a lot of changes to the binary format for 64-bit vs. 32-bit binaries (which is always good for increasing compatibility and making the knowledge last). The updated class was delivered in November 2012.
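One small example of how little the format changes: in a PE file, the 32-bit and 64-bit variants share the same DOS header, signature, and COFF header, and are distinguished by a two-byte magic at the start of the optional header (0x10B for PE32, 0x20B for PE32+). The sketch below is an illustration only, not class material, and the function name pe_bitness is hypothetical.

```python
import struct

PE32_MAGIC = 0x10B       # optional header magic for 32-bit images (PE32)
PE32_PLUS_MAGIC = 0x20B  # optional header magic for 64-bit images (PE32+)


def pe_bitness(data: bytes) -> int:
    """Return 32 or 64 for a PE image, based on the optional header magic."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the
    # 20-byte COFF file header.
    (magic,) = struct.unpack_from("<H", data, e_lfanew + 4 + 20)
    if magic == PE32_MAGIC:
        return 32
    if magic == PE32_PLUS_MAGIC:
        return 64
    raise ValueError(f"unexpected optional header magic {magic:#x}")
```

Everything up to that magic is parsed identically for both variants, so a parser written for 32-bit files needs only modest changes to handle 64-bit ones.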

Perhaps not new work per se, but I'm looking at integrating a training game as the core delivery of the class examples. Basically after every section of the class, students get a randomized set of questions about the previous material, then they have to dive right in with the tools in order to figure out how to answer the questions. As the game is being refined, I'm posting it over in the "BinaryScavengerHunt" portion of the R0x0r Arcade games project.

[Ed.]: Cool. Thank you, Xeno. We'll be talking to you again next time about your Rootkits class.