We all have access to a great deal of information but are seldom in a position to exploit it effectively for decision making. In times of crisis, this problem can be especially severe. Imagine you are a senior analyst besieged with news and intelligence reports of a hostage situation at an American embassy. Who is in charge of the terrorists? Is their group likely to attack other embassies? When the president calls for an emergency meeting, your boss is asked to make a 20-minute presentation that profiles the terrorist group and develops arguments describing their likely negotiation positions and the potential for further attacks. How can computers help this process, which relies so critically on collective human understanding and insight, in the midst of the furor of a crisis?

Genoa, a project of the Defense Advanced Research Projects Agency (DARPA), is aimed at improving analysis and decision making in crisis situations. It provides tools that allow analysts to collaborate in developing structured arguments in support of particular conclusions and that help predict likely future scenarios. Genoa also provides knowledge discovery tools that mine news and intelligence sources for important patterns, trends, and anomalies, in order to discover nuggets of valuable information.

One of the challenges Genoa faces is to make it easy for analysts to take knowledge gleaned with these discovery tools and embed it, in a concise and useful form, in an intelligence product as evidence in support of structured arguments. MITRE has been tasked with developing a summarization filter architecture to address this challenge. MITRE's approach relies on component-based software composition, i.e., the assembly of software units that have contractually specified interfaces and that can be independently deployed and reused.
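To make the idea of composable units with a specified interface concrete, here is a minimal Python sketch. It is not the Genoa or MITRE implementation: the `Filter` type, `compose`, `keyword_filter`, and `first_sentence_filter` are hypothetical names chosen for illustration, and the "contract" is simply that every filter maps a list of texts to a list of texts.

```python
from typing import Callable, List

# Hypothetical contract (an assumption for this sketch): a filter takes a
# list of text items and returns a list of text items.
Filter = Callable[[List[str]], List[str]]


def compose(*filters: Filter) -> Filter:
    """Chain filters left to right into a single composite filter."""
    def composite(items: List[str]) -> List[str]:
        for f in filters:
            items = f(items)
        return items
    return composite


def keyword_filter(keyword: str) -> Filter:
    """Build a filter that keeps only items mentioning the keyword."""
    def run(items: List[str]) -> List[str]:
        return [t for t in items if keyword.lower() in t.lower()]
    return run


def first_sentence_filter(items: List[str]) -> List[str]:
    """Reduce each item to its first sentence (naive split on '.')."""
    return [t.split(".")[0].strip() + "." for t in items]


# Because both units honor the same contract, they can be assembled freely.
embassy_brief = compose(keyword_filter("embassy"), first_sentence_filter)
```

Calling `embassy_brief` on a list of documents first weeds out those that do not mention the keyword and then shortens the rest. Since each unit depends only on the shared contract, either one can be replaced or reused without touching the other.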
This component-based approach, which leverages XML and JavaBeans technologies, allows the analyst to select various text mining tools from a menu and, with just a few mouse clicks, assemble them into a complex filter that fulfills whatever information discovery function is currently needed. A filter here is a tool that takes input information and turns it into some more abstract and useful representation. Filters can also weed out irrelevant parts of the input information. For example, in response to the crisis situation discussed earlier, an analyst might use these mining tools to discover important nuggets of information in a large collection of news sources.

This use of data mining tools can be illustrated by looking at TopCat, a MITRE-developed system that identifies different topics in a collection of documents and displays the key players for each topic. TopCat uses association rule mining technology to identify correlations among people, organizations, locations, and events (shown below in blue, violet, green, and red, respectively). Clustering these correlations creates topics such as the three in the following figure, built from six months of global news from several print, radio, and video sources, more than 60,000 news stories in all.
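The correlation step can be illustrated with a small Python sketch that counts how often named entities co-occur across documents and computes the support and confidence measures used in association rule mining. This is a simplified stand-in, not TopCat's actual algorithm: the function names, the `min_support` threshold, and the toy document sets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support=2):
    """Return entity pairs that co-occur in at least min_support documents."""
    pair_counts = Counter()
    for entities in docs:
        # Sort so each pair has one canonical order regardless of input order.
        for pair in combinations(sorted(set(entities)), 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}


def confidence(docs, antecedent, consequent):
    """Estimate the share of documents naming antecedent that also name consequent."""
    with_antecedent = [set(d) for d in docs if antecedent in d]
    if not with_antecedent:
        return 0.0
    return sum(consequent in d for d in with_antecedent) / len(with_antecedent)


# Toy data (made up for this sketch): each set holds the entities
# extracted from one news story.
stories = [
    {"McVeigh", "Nichols", "Oklahoma City"},
    {"McVeigh", "Nichols"},
    {"McVeigh", "Oklahoma City"},
    {"Pol Pot", "Cambodia"},
]

pairs = frequent_pairs(stories, min_support=2)
```

Pairs that pass the support threshold are candidate correlations. A clustering step, omitted here, would then group overlapping pairs into topics.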
[Figure: Topics derived from clustering 60,000 news stories.]

These topics allow the analyst to discover, say, an association between people involved in a bombing incident, which gives a starting point for further analysis, e.g., do McVeigh and Nichols belong to a common organization? This, in turn, can lead to new knowledge that can be leveraged in the analytical model used to help predict whether this terrorist organization is likely to strike elsewhere in the next few days. Similarly, the third topic reveals the important players in an election in Cambodia. This discovered information can be leveraged to help predict whether the situation in Cambodia is going to explode into a crisis that affects U.S. interests.

Now, suppose an analyst wants to know more about the people in the last topic. Instead of reading more than 6,000 words of text from 10 articles on the topic, the analyst can compose a topic detection filter like TopCat with a biographical summarization filter that gleans facts about key persons from the topic's articles. The result of the composition is a short, 86-word summary, seen below.
[Figure: An 86-word summary of the news collection.]

This summarization filter, developed under DARPA funding, identifies and aggregates descriptions of people from a collection of documents by means of an efficient syntactic analysis, the use of a thesaurus, and some simple natural language generation techniques. It also extracts salient sentences related to these people from the documents, weighting each sentence by the presence of people's names as well as by the location, proximity, and frequency of terms in the document. (TopCat and a summarization filter perform a similar function for MITRE's Broadcast News Navigator, which applies them to continuously collected broadcast news in order to extract named entities and keywords and to identify the transcripts and sentences that contain them; see Personalized Broadcast News.)

The summarization filter includes a parameter to specify the target length or the reduction rate, allowing summaries of different lengths to be generated. For example, allowing a longer summary would mean that facts about other people (e.g., Pol Pot) would also appear in the summary. This example illustrates how mining a text collection using a composed summarization filter can reveal important associations at varying levels of detail.

The component-based approach also allows these filters to be easily integrated into intelligence products such as reports and briefings. To help analysts present structured arguments and supporting information to decision makers, Genoa provides an electronic notebook briefing tool (the Virtual Situation Book) developed by Global Infotek. Summarization filters can be associated with regions on a page in a briefing book that can be shared across a community of collaborating analysts. When a document or a folder of documents is dropped onto a region associated with a filter, the filter is applied and the textual summary or visualization appears in that region.
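The sentence-weighting idea and the reduction-rate parameter described above can be sketched in a few lines of Python. This is an illustrative approximation only, not the DARPA-funded filter: the `summarize` function, the weight of 2.0 per name mention, and the position bonus are assumptions, and the sketch omits the syntactic analysis, thesaurus lookup, and language generation the article mentions.

```python
import math
import re


def summarize(text, names, reduction_rate=0.3):
    """Keep roughly reduction_rate of the sentences, favoring sentences that
    mention a name of interest and sentences that occur early in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    # Always keep at least one sentence, however small the rate.
    keep = max(1, math.ceil(len(sentences) * reduction_rate))

    def score(index, sentence):
        lowered = sentence.lower()
        name_hits = sum(lowered.count(n.lower()) for n in names)
        position_bonus = 1.0 / (index + 1)  # earlier sentences score higher
        return 2.0 * name_hits + position_bonus  # weights are assumptions

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(i, sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:keep])  # restore original document order
    return " ".join(sentences[i] for i in chosen)
```

Raising `reduction_rate` lets lower-scoring sentences into the output, which mirrors the point above that a longer summary would bring in facts about additional people.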
For more information, please contact Inderjeet Mani using the employee directory.