
The Edge, August 2000, Volume 4, Number 2


Data Mining for Intrusion Detection


In the Network Operations Center of the future, the security analyst will come to work in the morning, sit down with a cup of coffee, and press the “What’s New?” button on the network monitoring and analysis screen. A list of suspicious incidents and attempted intrusions (more commonly called “attacks”) on the network will appear. Perhaps there is a file transfer at 2 a.m. from a host that usually only has activity during business hours. The analyst will then investigate these incidents to identify them as, for example, “attack” or “false alarm.” The analyst will also be presented with distilled descriptions of attacks that he or she had identified the previous day.

An essential technology in this scenario will be data mining (DM). DM analysis will determine the bounds of normal network activity, and DM techniques will let the software spend the night determining which characteristics of previously identified attack activity distinguish it from normal network usage.

To understand the improvement this will represent, it is necessary to understand the current network intrusion detection (ID) environment. Software “sensors” deployed along the network record activity: the initiation of a World Wide Web connection from host A to host B, for example, or a single outside host trying to connect to every MITRE host. Each sensor records certain important pieces of information about this activity, such as the time of day and the duration of the connection. This information is stored in a database that easily accrues millions of records each day. On a regular basis, security analysts sift through this data looking for the most serious attacks. There can be thousands of suspicious activity alarms and each requires further analysis to fully understand its purpose. Moreover, as commercial ID software currently favors heightened sensitivity, many of the alerts generated are false alarms and result in wasted time.

One of the most serious limitations in identifying and describing new attacks is that there is simply so much data that security experts cannot thoroughly examine every alerted activity. And, as data collection grows with increased network usage, little is being done to mitigate the problem by analyzing which data is most relevant and which is unnecessary to collect.

This area of data overload is where data mining can make its most significant contribution. A number of MITRE research projects have begun to explore the use of DM to address data overload in ID by taking one of two basic approaches: profiling or classification. In profiling, the goal is to establish some notion of “normal” and then look for deviations from that. In classification, we take known attacks and try to determine meaningful features that distinguish that set of traffic from the remainder of the traffic.

Of these two approaches, classification has been used less often in the ID environment. This is because classification analysis requires adequate collections of data representing both attacks and non-attacks. Because this type of analysis is new to the ID world, this information is rarely collected in the proper form. For example, when the recent Knowledge Discovery and Data Mining (KDD) Cup (an annual competition at the preeminent technical conference in the data mining industry) challenged contestants to classify attacks in network activity logs, it had to enhance actual network data with attacks artificially generated according to predetermined attack signatures. Without explicit labels on attack records, it has been nearly impossible for classifiers to learn to discriminate between attacks and non-attacks.
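To make the dependence on labeled data concrete, here is a minimal sketch of one very simple classification technique, a one-rule learner that picks the single feature best separating the labeled classes. The article does not name a specific classifier, so the technique, the feature names (`service`, `off_hours`), and the records are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def train_one_rule(records, labels):
    """Pick the single feature whose values best predict the label."""
    best = None
    for feature in records[0]:
        # Tally the labels seen for each value of this feature.
        by_value = defaultdict(Counter)
        for rec, label in zip(records, labels):
            by_value[rec[feature]][label] += 1
        # Predict the majority label for each feature value.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(label != rule[rec[feature]]
                     for rec, label in zip(records, labels))
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    return feature, rule

def classify(record, feature, rule, default="legitimate"):
    """Apply the learned rule; unseen values fall back to the default."""
    return rule.get(record[feature], default)

# Hypothetical analyst-labeled connection records (not real MITRE data).
records = [
    {"service": "ftp", "off_hours": True},
    {"service": "ftp", "off_hours": False},
    {"service": "http", "off_hours": False},
    {"service": "http", "off_hours": True},
    {"service": "ftp", "off_hours": False},
    {"service": "http", "off_hours": True},
]
labels = ["attack", "legitimate", "legitimate",
          "attack", "legitimate", "attack"]

feature, rule = train_one_rule(records, labels)
prediction = classify({"service": "ftp", "off_hours": True}, feature, rule)
print(feature, rule, prediction)
```

Without the `labels` list there is nothing for the learner to tally, which is the gap that record tagging is meant to close.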

MITRE’s current “Data Mining in ID” project is starting to address this deficiency by enabling security analysts to tag important records in the database and assign them to meaningful classes (attack, probe, legitimate, etc.). By providing the capabilities needed to label attacks and a better way to maintain the history of intrusion behavior, this work represents a significant enhancement to the existing security infrastructure. In the near future, this labeled data will be used to explore and test various data mining classification techniques.

This project has also begun to perform profiling on individual hosts. This profiling analysis can operate on the basic network traffic data that is already collected. The hope is that by looking at the traffic to and from specific machines, unusual activity can be identified. The initial approach involves simple statistical analyses of isolated features. For example, the chart below shows a 30-day summary of the frequency of File Transfer Protocol (FTP) connections to a particular host for each hour of the day. Notice that the activity from 1 a.m. to 2 a.m. falls outside the hours during which the vast majority of connections are made; analysts should be alerted so they can investigate that activity further. The next stage of this project will use data clustering techniques to identify more sophisticated partitions of common activity for that host. Then, traffic that does not “fit” into any of the normal groups will be reported to the security analyst for further investigation.

Thirty-day summary of File Transfer Protocol connections.
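The hour-of-day analysis described above can be sketched in a few lines. This is an illustrative assumption of how such a profile might be computed, not the project's actual code: the function names, the 1% threshold (`min_share`), and the synthetic timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_profile(timestamps):
    """Count connections per hour of day (0-23) over the whole window."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(h, 0) for h in range(24)]

def rare_hours(profile, min_share=0.01):
    """Return hours that saw activity but hold under min_share of the total."""
    total = sum(profile)
    if total == 0:
        return []
    return [h for h, n in enumerate(profile) if 0 < n / total < min_share]

# Synthetic data: 25 days of FTP connections at 9 a.m. through 4 p.m.,
# plus a single connection at 1:30 a.m. (hypothetical, for illustration).
connections = [datetime(2000, 8, 1 + day, 9 + slot, 0)
               for day in range(25) for slot in range(8)]
connections.append(datetime(2000, 8, 3, 1, 30))

profile = hourly_profile(connections)
flagged = rare_hours(profile)
print(flagged)
```

A fixed share threshold is the simplest possible notion of "outside normal hours"; the clustering stage mentioned in the article would replace it with groups learned from the data.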


In other emerging work, MITRE is addressing the issue of false alarms produced by current ID sensors. This work uses data mining to look for recurring sequences of alarms to help understand which alarms might be the result of legitimate usage. For example, alarm “A” may be frequently followed by alarm “B” as a result of legitimate operations. Once this is recognized, future occurrences of this sequence can be filtered out.

In joint work with George Mason University, MITRE is working on an approach that includes filtering out data that captures “common” connection activity. It makes use of association rule detection to identify frequent host pairings. For example, perhaps host X regularly connects to host Y four times a day. Once these common connections have been removed, the remaining data is fed to a classification system to detect attacks. This work has been successfully tested on synthetically generated data, and it will soon be applied to actual network data.
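Both filtering ideas in this section reduce to finding patterns that recur often enough to be treated as routine. The sketch below uses plain frequency counting as a simplified stand-in for the sequence mining and association rule detection the article mentions; the function names, thresholds, and sample data are assumptions, not details of the MITRE or George Mason University work.

```python
from collections import Counter

def frequent_followers(alarms, min_count=3):
    """Find (first, next) alarm pairs that recur at least min_count times."""
    pairs = Counter(zip(alarms, alarms[1:]))
    return {pair for pair, n in pairs.items() if n >= min_count}

def frequent_host_pairs(connections, min_support=0.2):
    """Find (src, dst) pairs making up at least min_support of all connections."""
    counts = Counter(connections)
    total = len(connections)
    return {pair for pair, n in counts.items() if n / total >= min_support}

def drop_common(connections, common):
    """Keep only connections whose host pair is not considered routine."""
    return [c for c in connections if c not in common]

# Hypothetical alarm stream: "A" is followed by "B" three times.
alarms = ["A", "B", "C", "A", "B", "D", "A", "B", "C"]
routine_sequences = frequent_followers(alarms)

# Hypothetical connection log: host X connects to host Y four times.
log = [("X", "Y")] * 4 + [("Z", "Y"), ("Q", "W")]
common = frequent_host_pairs(log)
remaining = drop_common(log, common)

print(routine_sequences, common, remaining)
```

In this sketch, `remaining` is what would be handed to the downstream classification system once routine traffic has been removed.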

The fields of network intrusion detection and data mining are just beginning to work together. MITRE research is beginning to demonstrate that the network activity data whose sheer quantity has been one of the primary challenges to current ID efforts can be amenable to analysis via a variety of data mining techniques. The application of those techniques has already begun to prove useful in filtering out false alarms and characterizing normal connection pairs. In the near future, data mining should be able to help us understand what normal behavior is for individual host machines and better discriminate network attacks from innocuous activity.


For more information, please contact Bill Hill using the employee directory.


Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
