In the Network Operations Center of the future, the security analyst will come to work in the morning, sit down with a cup of coffee, and press the "What's New?" button on the network monitoring and analysis screen. A list of suspicious incidents and attempted intrusions (more commonly called attacks) on the network will appear. Perhaps there is a file transfer at 2 a.m. from a host that usually has activity only during business hours. The analyst will then investigate these incidents and classify each one as, for example, an attack or a false alarm. The analyst will also be presented with distilled descriptions of the attacks that he or she identified the previous day.
An essential technology in this scenario will be data mining (DM). DM analysis will determine the bounds of normal network activity, and DM techniques will enable the software to spend the night determining which characteristics of previously identified attack activity distinguish it from normal network usage.
To understand the improvement this will represent, it is necessary to understand the current network intrusion detection (ID) environment. Software sensors deployed along the network record activity: the initiation of a World Wide Web connection from host A to host B, for example, or a single outside host trying to connect to every MITRE host. Each sensor records certain important pieces of information about this activity, such as the time of day and the duration of the connection. This information is stored in a database that easily accrues millions of records each day. On a regular basis, security analysts sift through this data looking for the most serious attacks. There can be thousands of suspicious-activity alarms, and each requires further analysis to fully understand its purpose. Moreover, because commercial ID software currently favors heightened sensitivity, many of the alerts generated are false alarms that waste analysts' time.
One of the most serious limitations in identifying and describing new attacks is that there is simply so much data that security experts cannot thoroughly examine every alerted activity. And, as data collection grows with increased network usage, little is being done to mitigate this situation by analyzing which data is most relevant and which is unnecessary to collect. This area of data overload is where data mining can make its most significant contribution.
A number of MITRE research projects have begun to explore the use of DM to address data overload in ID by taking one of two basic approaches: profiling or classification. In profiling, the goal is to establish some notion of normal and then look for deviations from it. In classification, we take known attacks and try to determine meaningful features that distinguish that set of traffic from the remainder of the traffic.
Of these two approaches, classification has been used less often in the ID environment, because classification analysis requires adequate collections of data representing both attacks and non-attacks. Because this type of analysis is new to the ID world, this information is rarely collected in the proper form. For example, when the recent Knowledge Discovery and Data Mining (KDD) Cup, an annual competition at the preeminent technical conference in the data mining industry, challenged contestants to classify attacks in network activity logs, it had to enhance actual network data with attacks artificially generated according to predetermined attack signatures. Without explicit labels on identified attack records, it has been nearly impossible for classifiers to learn to discriminate between attacks and non-attacks.
MITRE's current Data Mining in ID project is starting to address this deficiency by enabling security analysts to tag important records in the database and assign them to meaningful classes (attack, probe, legitimate, etc.). By providing the necessary capabilities for labeling attacks and a better way to maintain the history of intrusion behavior, this work represents a significant enhancement to the existing security infrastructure. In the near future, this labeled data will be used to explore and test various data mining classification techniques.
This project has also begun to perform profiling on individual hosts. This profiling analysis can operate on the basic network traffic data that is already collected. The hope is that by looking at the traffic to and from specific machines, unusual activity can be identified. The initial approach involves simple statistical analyses of isolated features. For example, the chart below shows a 30-day summary of the frequency of File Transfer Protocol (FTP) connections to a particular host for each hour of the day. Notice that the activity from 1 a.m. to 2 a.m. falls outside the hours during which the vast majority of connections are made; analysts should be alerted so they can investigate that activity further. The next stage of this project will use data clustering techniques to identify more sophisticated partitions of common activity for that host. Then, traffic that does not fit into any of the normal groups will be reported to the security analyst for further investigation.
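The simple hour-of-day profile described above might be computed along the following lines. This is only a minimal sketch, not the project's actual implementation: the function names `hourly_profile` and `unusual_hours`, the 1 percent `min_share` threshold, and the synthetic connection history are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_profile(timestamps):
    """Count connections for each hour of the day (0-23) across the whole window."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[h] for h in range(24)]


def unusual_hours(profile, min_share=0.01):
    """Return hours that saw some activity, but less than min_share of all connections.

    min_share is a hypothetical threshold chosen for this example.
    """
    total = sum(profile)
    if total == 0:
        return []
    return [h for h, n in enumerate(profile) if n > 0 and n / total < min_share]


# Synthetic 30-day history (illustrative data, not real traffic):
# ten FTP connections per hour from 9 a.m. to 5 p.m. every day ...
start = datetime(2000, 1, 1)
history = [
    start + timedelta(days=d, hours=h)
    for d in range(30)
    for h in range(9, 18)
    for _ in range(10)
]
# ... plus a single connection at 1:30 a.m. on one day.
history.append(start + timedelta(days=12, hours=1, minutes=30))

profile = hourly_profile(history)
print(unusual_hours(profile))  # prints [1]
```

A share-based threshold is used here only because it is easy to read; any other statistical test of an isolated feature could be substituted.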
Thirty-day summary of File Transfer Protocol connections.
In other emerging work, MITRE is addressing the issue of false alarms produced by current ID sensors. This work uses data mining to look for recurring sequences of alarms to help understand which alarms might be the result of legitimate usage. For example, alarm A may frequently be followed by alarm B as a result of legitimate operations. Once this is recognized, future occurrences of this sequence can be filtered out.
In joint work with George Mason University, MITRE is working on an approach that includes filtering out data that captures common connection activity. It uses association rule detection to identify frequent host pairings. For example, perhaps host X regularly connects to host Y four times a day. Once these common connections have been removed, the remaining data is fed to a classification system to detect attacks. This work has been successfully tested on synthetically generated data, and it will soon be applied to actual network data.
The fields of network intrusion detection and data mining are just beginning to work together. MITRE research is beginning to demonstrate that the network activity data whose sheer quantity has been one of the primary challenges to current ID efforts can be amenable to analysis via a variety of data mining techniques. The application of those techniques has already begun to prove useful in filtering out false alarms and characterizing normal connection pairs. In the near future, data mining should be able to help us understand what normal behavior is for individual host machines and better discriminate network attacks from innocuous activity.
For more information, please contact Bill Hill using the employee directory.
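As a rough illustration of the host-pairing filter described earlier, the sketch below removes connections between host pairs that appear frequently in past traffic. It is a simplified stand-in that counts how often each pair occurs rather than performing full association rule mining, and it is not the MITRE and George Mason University system. The names `frequent_pairs` and `filter_common`, the `min_support` value, and the sample hosts are assumptions made for the example.

```python
from collections import Counter


def frequent_pairs(connections, min_support):
    """Return the (source, destination) pairs seen at least min_support times.

    connections is a list of (source_host, dest_host) tuples;
    min_support is a hypothetical cutoff chosen by the analyst.
    """
    counts = Counter(connections)
    return {pair for pair, n in counts.items() if n >= min_support}


def filter_common(connections, common_pairs):
    """Drop connections between known frequent pairs, keeping the rest in order."""
    return [c for c in connections if c not in common_pairs]


# Illustrative history: host X connects to host Y four times a day for 30 days,
# while host A connects to host B only twice.
history = [("X", "Y")] * 120 + [("A", "B")] * 2
common = frequent_pairs(history, min_support=100)

# New traffic: the routine X-to-Y connection is removed, the unfamiliar one remains
# and would be passed on to a classification step.
new_traffic = [("X", "Y"), ("Z", "Y")]
print(filter_common(new_traffic, common))  # prints [('Z', 'Y')]
```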