Intelligent Information Processing
Intelligent Information Processing investigates technologies, tools,
and processes that support the discovery, processing, exploitation and
dissemination of information, tools and knowledge. Intelligent agents
are covered in this area.
Analyze, Share, Know (ASK)
Bedford and Washington
Problem
Analysts must form a correct view of the world from very large quantities
of data that are not necessarily structured to help them answer their
questions. Every analyst should be able to leverage the insights
gained by the work of others in the organization. The analyst should be
able to know what the rest of the organization knows about the target
subject and that information's relevance to the current analysis
task.
Objectives
Using a mix of COTS, GOTS, and research components, the ASK program seeks
to demonstrate novel analytic tools to support efficient inference against
large data collections, analysis coordinated through structured argumentation,
and an integrated, collaborative knowledge management environment for
the analytical enterprise.
Activities
Leveraging previous work through judicious systems integration, the ASK
team is creating prototypes of several advanced capabilities, including:
automatic generation of timelines with event extraction; spatio-temporal
fusion of imagery, signal processing, and linguistic sources; structured
argumentation to organize all-source analysis; dialogue agent analytic
tool interfaces mediated by an instant messaging system; reinforcement
learning in the analytic environment; and analytic enterprise integration
strategies.
Impacts
On-going interaction between members of the ASK team and specific government
communities informs our choice of tasks. Capabilities demonstrated here
are being fed back into the real-world work of intelligence systems engineering.
Audio Hot Spotting
Qian Hu, Principal Investigator
Bedford and Washington
Problem
Large volumes of recordings require rapid retrieval of segments potentially
relevant to a given query (audio hot spotting). Spoken document retrieval
systems that simply combine automatic speech recognition (ASR) with information
retrieval (IR) do not meet this need in real applications. This is because
of high ASR word error rates and the loss of important audio information
in the speech transcription.
Objectives
We propose to research and develop audio-specific retrieval algorithms
in critical domains by 1) exploiting multiple types of acoustic information
from the audio signals; 2) exploring several adaptive techniques to improve
existing ASR performance; and 3) fusing component technologies such as
ASR, language/speaker identification, audio feature extraction, and information
retrieval.
Activities
We will research algorithms and techniques to extend and improve ASR and
audio feature extraction and to develop audio-based query algorithms making
use of the multiple types of audio information. We will research and develop
fusion algorithms to build an audio hot spotting system based on the extended
ASR, audio feature extraction, language/speaker identification, and the
new audio query language.
Impacts
Our research in audio hot spotting algorithms and prototype development
will address the needs of MITRE's sponsors with warehouses of recordings
waiting for efficient retrieval. It will extend MITRE's information retrieval
capability from text to include audio. The expertise gained through the
research will equip MITRE to better advise industry developers and our
sponsors on audio information retrieval topics and evaluation standards.
Automated Discovery of Structural Patterns in Link Analysis
Washington only
Problem
The threat from terrorist actions is perceived to be immediate and growing.
However, it is believed that clues to the capabilities, intentions and
organization of terrorist and other criminal groups can be found in already
available data. Identification of suspect behavior is made difficult through
the intentional concealment of relationships, and thus the efficient discovery
of patterns from large databases is a very difficult problem.
Objectives
We will explore promising, although unproven and technically risky, new
approaches for automating the discovery of patterns of suspicious behavior
and associations. We will attempt to train a classifier that can identify
criminals or terrorists based on descriptions of the types of associations
or relationships that they have in common.
Activities
We will develop a demonstration prototype for a large-scale repository
supporting link discovery and analysis. Initial emphasis will be on techniques
for transforming multiple, large databases into an integrated, searchable
link representation. We will test approaches for storage and traversal
of the links and mechanisms for inserting additional links. A series of
three prototypes will demonstrate increasingly sophisticated techniques.
The link repository will grow with each demonstration.
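The storage, insertion, and traversal of links described above can be illustrated with a minimal sketch. It is not the project's design: the class name `LinkRepository`, its methods, and the entity and relation names are all hypothetical, and a small in-memory adjacency map stands in for a large-scale repository.

```python
from collections import defaultdict, deque


class LinkRepository:
    """Toy in-memory store of typed links between entities (hypothetical design)."""

    def __init__(self):
        # Maps each entity to a set of (relation, neighbor) pairs.
        self._links = defaultdict(set)

    def insert_link(self, source, relation, target):
        """Insert an undirected, typed link between two entities."""
        self._links[source].add((relation, target))
        self._links[target].add((relation, source))

    def neighbors(self, entity):
        """Return the set of entities directly linked to `entity`."""
        return {other for _, other in self._links.get(entity, ())}

    def shortest_path(self, start, goal):
        """Breadth-first traversal: one shortest chain of entities, or None."""
        if start == goal:
            return [start]
        previous = {start: None}  # also serves as the visited set
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in sorted(self.neighbors(current)):
                if nxt in previous:
                    continue
                previous[nxt] = current
                if nxt == goal:
                    # Walk the predecessor chain back to the start.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None


# Made-up example data, for illustration only.
repo = LinkRepository()
repo.insert_link("person_a", "shares_address", "person_b")
repo.insert_link("person_b", "wire_transfer", "company_x")
repo.insert_link("company_x", "registered_by", "person_c")
path = repo.shortest_path("person_a", "person_c")
```

In this sketch, additional links can be inserted at any time without rebuilding the structure, which mirrors the idea of a repository that grows between demonstrations. A repository at the scale the abstract describes would presumably need persistent, indexed storage rather than a Python dictionary.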
Impacts
The proposed research can contribute to the national effort by bringing
new methodologies to bear in discovering terrorists, terrorist organizations,
fraud, and other criminal behavior. There are challenges in attempting
to scale current practices in link analysis to large-scale databases and
to find suspects based upon data that is intentionally being manipulated.
The methodology could be transferred into the counterterrorism and law
enforcement domains if it proves effective.
Automated Information Discovery and Retrieval from Asian Language Sources
Ray LeBlanc, Principal Investigator
Bedford and Washington
Problem
While several commercial capabilities exist to address particular facets
of machine translation (MT) needs, emphasis has been placed on European
languages. Furthermore, none of the existing COTS products is particularly
well suited to the military environment. Translating Asian languages into
English is a much more difficult problem than translating European languages
and has presented the MT community with significant challenges.
Objectives
This project will develop a capability to perform Chinese and Korean cross-language
information retrieval, information discovery (ID), data mining (DM), and
knowledge management (KM) in support of open source intelligence analysis.
The project will develop a prototype capability that can support in-field
experimentation with a broad spectrum of users.
Activities
The project will provide an automated capability to translate electronic
textual information between Chinese and English, and between Korean and
English. We will characterize and subsequently retrieve information, based
on user-specified profiles, from Chinese and Korean language sources by
means of a prototype analytic tool. A dictionary management capability
will allow users to build, import/export, and aggregate custom dictionaries.
Impacts
This project has the potential for improving the efficiency and effectiveness
of intelligence organizations currently impacted by foreign language translation
issues. It is expected to provide the beneficiaries with needed interim
capabilities and validation of the most fertile areas for the future application
of government funds.
Foundations for Next Generation Information Access
Bedford and Washington
Problem
Computerized support for information gathering is fragmented across multiple
research communities, and integration is difficult because no underlying
formalism cuts across the different technologies. Statistical techniques
for individual components have been developed in isolation and without
a common theoretical foundation. As a result, we are left with a number
of reasonably effective, semi-principled, incompatible techniques.
Objectives
The principal objective is the development of statistical foundations
for information access. A successful foundation will comprise rigorous
characterizations of the issues of modeling and estimation, together with
principled methodologies for adapting to new languages, genres, information
domains, auxiliary knowledge sources and tasks.
Activities
We will develop simulations that model the stochastic generation of latent
document features, observable document features, the determination of
document relevance, and the distribution of query characteristics. We
will perform exploratory data analysis on available research corpora to
verify our models. A central focus will be research into the importance
of variance reduction and the potential benefit of various bias-variance
strategies.
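The bias-variance trade-off mentioned above can be made concrete with a small textbook example. It is not drawn from the project itself. Assume we estimate a mean mu from n samples with variance sigma2, using the shrunken estimator c * sample_mean. Its mean squared error splits into a squared bias, ((c - 1) * mu)^2, and a variance, c^2 * sigma2 / n. The function names and numeric values below are hypothetical.

```python
def shrinkage_mse(c, mu, sigma2, n):
    """Return (bias^2, variance, mse) for the estimator c * sample_mean."""
    bias_sq = ((c - 1.0) * mu) ** 2   # shrinking toward zero introduces bias
    variance = (c ** 2) * sigma2 / n  # but scales the variance by c^2
    return bias_sq, variance, bias_sq + variance


def best_shrinkage(mu, sigma2, n):
    """Closed-form shrinkage factor that minimizes the MSE above."""
    return mu ** 2 / (mu ** 2 + sigma2 / n)


# Hypothetical values chosen so the arithmetic is easy to follow.
mu, sigma2, n = 1.0, 4.0, 4
unbiased = shrinkage_mse(1.0, mu, sigma2, n)   # plain sample mean
c_star = best_shrinkage(mu, sigma2, n)
shrunk = shrinkage_mse(c_star, mu, sigma2, n)  # biased, lower total error
```

With these values, the unbiased estimator has an error of 1.0, all of it variance. The shrunken estimator accepts a squared bias of 0.25 in exchange for a variance of 0.25, halving the total error. In practice mu is unknown, so the optimal factor has to be estimated. That is one reason principled bias-variance strategies are a research question rather than a formula.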
Impacts
This research is of direct relevance to existing MITRE projects. The results
will allow MITRE to develop information access systems incorporating new
sources of evidence and to tailor information systems to meet specific
military and intelligence needs. MITRE will then be strategically positioned
to set the direction of research into, and development of, next generation
information access technology.
Graph Based Data Mining
Washington only
Problem
The threat from terrorist actions is perceived to be immediate and growing.
However, it is believed that clues to the capabilities, intentions and
organization of terrorist and other criminal groups can be found in already
available data. Identification of suspect behavior is made difficult through
the intentional concealment of relationships, and thus the efficient discovery
of patterns from large databases is a very difficult problem.
Objectives
We will explore promising, although unproven and technically risky, new
approaches for automating the discovery of patterns of suspicious behavior
and associations. We will attempt to train a classifier that can identify
criminals or terrorists based on descriptions of the types of associations
or relationships that they have in common.
Activities
We will develop a demonstration prototype for a large-scale repository
supporting link discovery and analysis, emphasizing techniques for transforming
multiple, large databases into an integrated, searchable link representation.
Three prototypes will demonstrate increasingly sophisticated techniques
for storage and traversal of the links and mechanisms for inserting additional
links. Sensitive but unclassified databases from the USCS will form the
basis for the second and third demonstrations.
Impacts
The research can contribute to the national effort by applying new methodologies
to discover terrorists, terrorist organizations, fraud, and other criminal
behavior. Feasibility of these techniques in the USCS environment will
be demonstrated in the areas of counterdrugs and counterterrorism. The
methodology could be transferred into the counterterrorism and law enforcement
domains if it proves effective.
Robot Platoon Command and Control
Washington only
Problem
Reliable autonomous soldier robot teams will not be possible for many
years. However, an intermediate level of autonomy, where a commander gives
high-level commands (e.g., go to the top of Hill 203), is achievable in
the near future. This supervisory control requires only occasional intervention
by a commander during a mission.
Objectives
This proposal asserts that one human is adequate for directing a small
team of robots. We will use reconnaissance tasks in urban terrain as our
test bed. Validating the assertion will require us to demonstrate a working
team system where robots exhibit some automated reasoning (route planning,
navigation) and cooperative behavior, while attending to human guidance.
Activities
We will extend behavior-based robotics approaches to include the memory
and communication required for human participation in the team. Our principal
demonstration task will be to produce a team entry for the RoboCup-Rescue
annual competition. We will also investigate the utility of platform mobility
for reconnaissance-directed sensor networks.
Impacts
MITRE's capability in robotics will be of considerable importance to our
customers in the near future. This proposal builds on MITRE's current
expertise in command and control and artificial intelligence. Robot platoon
command and control defines a niche that is a natural extension of this
expertise.
Social Information Retrieval
Washington only
Problem
Our research is focused on developing new technology for tracking internet-based
networked organizations, and using those results to identify potential
vulnerabilities and threats. Current information retrieval technology
does not directly address the problem of detecting activist networks,
assessing behavior, and tracking their evolution; new technology is needed
to detect networks based on their structure and context.
Objectives
The main objective is to develop technology for a worldwide monitoring
system used to detect the emergence of new groups (e.g., activists) and
track the evolution of existing organizations based on their online activity.
The focus will be on assessing an organization's behavior and its vulnerabilities.
Activities
We are exploring the confluence of information retrieval for collecting
distributed information, social network analysis for determining network
structure and characteristics, and dynamical systems modeling for determining
network function or behavior. Work includes the development of advanced
smart crawler collection tools that will use adaptive and cooperative
searching techniques to provide efficient and high-coverage collection
from the Web or other network search environments.
Impacts
This research will provide new tools for detecting emergent networked
organizations in the open Web and enterprise environments, and will provide
a basis for modeling their behavior and identifying critical nodes in order
to assess vulnerabilities and network robustness. Our initial work has already had
impact on several sponsor mission areas.