Information Management
Information Management investigates databases, distributed databases,
data mining and legacy databases.
Building the Semantic Web
Bedford and Washington
Problem
As the amount of information on the World Wide Web continues to grow,
the value of automated tools capable of finding, filtering, and combining
information in response to specific user requirements greatly increases.
The largest barrier preventing more automated use of Web resources is
that the semantics (meaning) of these resources is generally unavailable
to automated agents.
Objectives
The objective of this project is to develop technical foundations for
a "Semantic Web," in which programs such as agents, search engines,
or service brokers can identify and use World Wide Web resources (including
both information and services) based on machine-readable representations
of their semantics.
Activities
We are investigating language concepts for representing and processing
semantic information that scale to the Web environment, and application
areas that include eBusiness and disaster relief. We are participating
in the World Wide Web Consortium's Semantic Web Activity, engaging in
joint research with MIT's Context Interchange (COIN) project, and cooperating
with researchers in DARPA's DAML (DARPA Agent Markup Language) program.
Impacts
This research addresses a key area of current Web technology development,
impacting numerous MITRE programs dependent on Web technologies such as
XML, as well as wider eBusiness and other communities addressing issues
of large-scale interoperability. The research also provides technology
transfer opportunities with a wide range of academic and industry R&D
activities and standards groups.
CEM Design and Project Support Laboratory
Washington only
Problem
The evaluation of proposed approaches and technologies that will be used
in the IRS information and data management research project requires access
to development tools used in the modernization effort. For example, tools
are needed to support research on the impact of XML on data processing,
and research on a metadata repository.
Objectives
CEM will develop an internal facility of resources and technologies to
support the research on key topics identified in the IRS information and
data management research project. This facility would allow CEM staff
to have access to development tools used in modernization efforts as well
as alternative tool sets for the evaluation of proposed technical approaches.
Activities
The CEM Design and Project Support Laboratory will host a set of tools
for research and evaluation of key topics. The initial set of tools includes
the Rational Enterprise Suite and System Architect Modeling Tool for the
initial evaluation of metadata repository approaches, and IBM DB2 and
Extend Business Process Modeling Tool for the impact analysis of XML in
data processing.
Impacts
The facility will enable CEM staff to conduct independent research on
topics that are directly relevant to the IRS modernization program. Results
of this research should provide the basis for MITRE to make proactive
recommendations on related technical work by contractors.
Data and Information Management Research at the IRS
Washington only
Problem
Currently, the IRS has limited ability to capture information and data
from multiple inputs such as tax returns and information returns and to
analyze and provide timely reports from multiple data source systems.
Changing regulations have altered business rules that affect the processing
of more than 40 terabytes of taxpayer data.
Objectives
The project will define a research agenda that will guide information
innovation grant studies addressing information and data management (IM/DM)
topics relevant to the IRS. These studies will assist in determining approaches
for dealing with IRS challenges in managing the increasing information
demand and volume of data for the current and future systems.
Activities
Current IRS information and data management challenges will be identified
and will guide the identification of research topics. Investigators will
further define each topic and a proposed approach for addressing it. The
initial set of topics will include determining the impact of XML on business
processing, defining approaches for metadata repositories, and implementing
and managing business rules.
Impacts
As new components of the IRS modernization program are defined and developed,
MITRE will be able to directly relate and apply its research results regarding
the topics defined in this project. This work should assist the IRS in
implementing information systems that support the goal of providing timely
information to stakeholders.
Data Integration as an Industrial Process
Len Seligman, Co-Principal Investigator
Washington only
Problem
Data integration requires too much human time and skill. We need to industrialize,
to create narrow-skill steps, each of which produces reusable knowledge
rather than opaque code. To move from (easily evaded) mandates to natural
incentives, we will explore "describe and generate" tools to
make even the first connection easier. The approach should be incremental,
driven by real interoperability needs, not special initiatives.
Objectives
Our goals are to refine the industrial approach and to move industry,
the research community, and sponsors toward that vision. Specifically,
we will extend (very scalable) profile-driven integration techniques to
be compatible with commercial multidatabase query tools, develop metrics
to help project planners compare data integration techniques and judge
tools' utility, and evaluate emerging describe-and-generate data integration
research prototypes in real projects.
Activities
In the first year we performed a modular breakdown of integration steps
and compared profile-driven against federated techniques. We also began
constructing an experimental framework and metrics for comparing integration
techniques. In the second year we will conduct experiments using research
prototypes (e.g., IBM Research's Clio) with aviation, brain mapping, and
tax administration data; conduct a survey of data integration practitioners
to determine where the costs are the greatest; and adapt metrics to improve
project planning. In the third year we will refine the metrics and perform
further experimentation. Throughout, we will publish results and transition
them to MITRE and sponsor projects.
Impacts
We will reframe a critical technology to reflect rarely addressed organizational
realities. We will influence emerging industrial tools and researchers'
agendas, and provide metrics where none previously existed. MITRE's sponsors
will be aided in moving from giant doomed data integration initiatives
to incremental progress.
Database Curation and Access for Bioinformatics
Bedford and Washington
Problem
Biological databases store information on proteins, genes, and their functions.
The biomedical literature describes the experiments behind the database
entries. Many databases lag behind the literature because they require
biologists to transfer the information from articles to database entries.
Biologists need interactive tools to help in the timely and consistent
transfer of information from the literature into the databases (the "curation"
process).
Objectives
This project will develop interactive techniques for the curation of biological
databases. These techniques will allow curators of databases to maintain
currency and consistency of these databases in the face of exponential
growth of research in genomics and proteomics. To provide the curation
tools, we will develop information mining methods for free text and structured
data, specifically geared to the biology domain.
Activities
In the first year we will determine requirements for biological database
curation and mine existing databases for training and test data. We will
also develop an initial curation system prototype and conduct initial
technology and user-centered evaluations. In the second year we will refine
our interactive curation system and further evaluate it. Finally, we will
explore a Question & Answer front-end.
Impacts
This work will have impact on the biology community by defining methods
and evaluations for automating database curation. We estimate that hundreds
to thousands of biology databases exist, and the number is growing rapidly.
Investment in this area leverages MITRE's expertise in text data mining
and databases, allowing MITRE to become a significant player in bioinformatics.
Distributed Metadata Service
Dock Allen, Principal Investigator
Bedford only
Problem
DOD and commercial practice are moving towards dynamic composition of
Web-based systems using publish/subscribe paradigms and metadata (e.g.,
XML). The Joint Battlespace Infosphere (JBI), Universal Description, Discovery,
and Integration (UDDI), and peer-to-peer systems are examples of this.
For these "infospheres" to succeed, they must be scalable, flexible,
and evolvable, and support component reuse both in applications and in
the infosphere services themselves.
Objectives
We are "pushing the edge" of the architecture envelope for Web-based
infospheres with respect to distribution and participant stewardship of
metadata (for scalability), uniformity (for simplicity and reuse), and
dynamic integration of components (for flexibility). As these architectural
principles are validated, we transition them into DOD initiatives, programs,
and commercial use.
Activities
We will define a metadata-based architecture to improve scalability, uniformity,
and flexibility, and a profiling language for advertising Web service
/ information "haves" and "needs." We will develop
components and a software development kit (SDK) for architecture evaluation,
technology transition, and demonstrations. As architectural principles
are validated, we will transition them via consultation, delivery of components,
and training and provide inputs to industry consortia, conferences, and
discussion groups.
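As a rough illustration of the "haves" and "needs" idea, the sketch below matches advertised information against stated needs by comparing metadata attributes. The Profile class, the attribute names, and the exact-match rule are hypothetical and chosen for illustration only; they are not the project's profiling language or broker design.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical profile: a participant plus descriptive metadata attributes."""
    owner: str
    attributes: dict = field(default_factory=dict)


def satisfies(have: Profile, need: Profile) -> bool:
    """A 'have' satisfies a 'need' when every attribute the need names matches."""
    return all(have.attributes.get(k) == v for k, v in need.attributes.items())


def match(haves, needs):
    """Return (need owner, have owner) pairs in which the have satisfies the need."""
    return [(n.owner, h.owner) for n in needs for h in haves if satisfies(h, n)]


# Illustrative data only (assumed names, not real services).
haves = [
    Profile("uav_feed", {"type": "imagery", "platform": "UAV"}),
    Profile("weather", {"type": "forecast"}),
]
needs = [Profile("planner", {"type": "imagery"})]

matches = match(haves, needs)
print(matches)
```

A broker built along these lines would only need the profiles, not the underlying data, to decide which publishers and subscribers to connect.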
Impacts
Our profiling language, which expresses both "haves" and "needs,"
is used for the JBI. Our brokers, profile editors, and SDK will be transitioned
to the 3Q2002 JBI release. We support projects in becoming "JBI enabled."
We consulted with a MITRE project developing an information service and
XML schema that allows "advertising" of UAV intelligence products,
as well as "publishing needs" for UAV intelligence products.
ISR Information Service (ISRIS)
John Kane, Principal Investigator
Bedford and Washington
Problem
Legacy intelligence, surveillance, and reconnaissance (ISR) systems connect
through stove-piped interfaces to command and control (C2) systems. This
limits the ability to form a common operational picture to support missions
such as time-critical targeting. A variety of "battlespace intranets"
are emerging to address this problem, but the challenge remains: how will
ISR assets connect to these information management initiatives?
Objectives
This project is experimenting with the integration of advanced Internet
technologies into the ISR sensor ground station. The objective is to enable
access for all C2 users to the real-time services and data of an ISR platform
from within a user's standard Web browser, and ultimately show the way
ahead to the next generation of DOD ISR and C2 Web services.
Activities
The FY01 ISR Information Service (ISRIS) prototype supports the Air Force
Global Hawk UAV. We plan to experiment with ISRIS during live flights
in FY02 using servers on the MITRE MII and on SIPRNET. We will extend
the ISRIS concept to the Predator UAV and U2. The subscription and profiling
technology of the Air Force Joint Battlespace Infosphere (JBI) will be
integrated with ISRIS.
Impacts
ISRIS research is helping the DOD develop the concepts and technology
for next generation Web services on battlespace intranets. These services
will enable an unprecedented level of access to real-time situational
awareness information and raw sensor data. Doing this within the user's
generic browser will streamline deployment and enable a browser-based
common operational picture.
Managing Data Quality with Shared Views
Bedford and Washington
Problem
Data quality, defined as fitness for use, is increasingly seen as a serious
problem in government and private sector databases. Sometimes the data
is of low quality; in other cases users cannot easily determine whether
the data is satisfactory. Consequences range from user mistrust, lack
of use, and creation of redundant stores to mission failures.
Objectives
We have developed technology to manage data quality annotations: to store
them using a defined model, to capture them with minimal impact on users
by modifying existing production tools, and to use them by extending existing
applications. We will show how this technology can be applied to two systems
by extending database view technology to include quality annotations and
to propagate data quality annotations between systems.
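One way to picture a view that carries quality annotations is sketched below. The data model (annotations keyed by row id and column) and the make_view function are assumptions made for illustration, not the annotation model developed in the project.

```python
def make_view(rows, annotations, columns, predicate=lambda row: True):
    """Project `columns` from the rows that pass `predicate`, carrying along
    any quality annotation attached to each projected cell."""
    view = []
    for row_id, row in rows.items():
        if not predicate(row):
            continue
        cells = {c: row[c] for c in columns}
        notes = {
            c: annotations[(row_id, c)]
            for c in columns
            if (row_id, c) in annotations
        }
        view.append({"id": row_id, "data": cells, "quality": notes})
    return view


# Illustrative data only.
rows = {
    1: {"name": "Alpha", "lat": 38.9, "source": "survey"},
    2: {"name": "Bravo", "lat": 42.5, "source": "estimate"},
}
annotations = {
    (1, "lat"): "verified by survey",
    (2, "lat"): "estimated, low confidence",
}

view = make_view(rows, annotations, ["name", "lat"])
for entry in view:
    print(entry)
```

Because the annotations travel with the projected cells, a consumer of the view can judge fitness for use without access to the base table.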
Activities
We will make recommendations on improving data quality within two key
government databases. These recommendations will include use of views
and quality annotations for improved propagation of data between the systems.
We will also develop a data quality assessment tool and demonstrate it
for one of the systems.
Impacts
Data quality annotations help a user employ data appropriately and can
inform organizational efforts at data quality improvement. We will enable
end-to-end management of data quality via creation of better (and more
easily maintained) data quality views for various communities of interest
(COIs). Our work will make it easier for COIs to form and interact with
government databases.
Neuroinformatics
Washington only
Problem
The neuroscience community is accumulating a vast amount of human brain
mapping data using techniques that operate over many spatial scales. Currently
data use is generally limited to the lab of origin; the data is not readily
available to other investigators for subsequent studies. Data may exist
that a researcher could use to explore a particular hypothesis, but that
investigator is not aware of its existence or does not have ready access
to it.
Objectives
The overall goal of this research is to design, prototype, and evaluate
an information infrastructure for human brain mapping data which will
help realize the full potential of this growing store of mapping data.
In this initial undertaking, we focus on a system that enables the analysis,
exploration, and dissemination of structural magnetic resonance imaging
(MRI) data from multiple labs.
Activities
The project will develop and deploy a digital library for structural MRI
data; design a warehouse of structural MRI data; design a content-based
system that will allow users to retrieve images with features similar
to those in a submitted example; develop techniques that enable users
to aggregate warehouse information into a probabilistic brain atlas; and
develop a prototype system that will ground architecture and query language
development in a real-world setting.
Impacts
The problems encountered by the human brain mapping community are isomorphic
to those encountered by many of MITRE's traditional sponsors. We expect
research conducted under the auspices of this project to be readily transitioned
to our Treasury Department, IRS, DOD, and USGC sponsors. In addition,
this project provides an important public service to the neuroscience
research and clinical communities.
Next Generation Joint Battlespace Infosphere (JBI) Core Services
Bedford only
Problem
Digital information is rapidly becoming integrated into all aspects of
military activities. Operations are becoming increasingly fast-paced and
diverse. To provide commanders with the knowledge required to make decisions
in this environment, a greatly enhanced command and control (C2) concept
for intelligence gathering, dissemination and visualization is needed,
based on revolutionary new information-age concepts and technologies.
Objectives
Our primary objective is to evaluate and integrate the first versions
of the best existing research and commercial technologies into the core
services of the C2 Enterprise Integration/Common Integrated Infrastructure
(C2EI/CII) platform in a manner consistent with the overall JBI vision.
This initial integrated set of services is expected to be used by early
adopter SPO prototyping efforts.
Activities
The modeling of user information needs and decision goals will be the
focus for FY02 research. In addition, investigation of deployable commercial
products consistent with major C2 SPO selections and the JBI vision must
be continued. Finally, improvement of existing JBI functional
services will continue. These core services will be evaluated and prototyped
for integration into C2EI/CII enterprise services.
Impacts
This project should provide better insight and a working relationship
among major S&T community members and form a bridge between technology
and information management research. The integration of "best of
breed" implementations for each of the C2EI/JBI platform components
can form the basis for C2 SPOs and other DOD related programs to adopt
an adaptable Web services-based architecture.
Using Domain Knowledge in Data Mining
Washington only
Problem
Domain knowledge is not being fully exploited by current data mining methods.
As a result of this deficiency, data mining tools generate a large number
of patterns that are not "interesting," i.e., overly complex,
already known or unnecessarily low in predictive power.
Objectives
Directly incorporating domain knowledge into data mining algorithms should
improve the quality of the output, reduce the amount of manual filtering
and interpretation of the discovered rules, and thereby improve the overall
efficiency of the process.
Activities
We plan to research the proposed method by accomplishing the following:
acquiring domain knowledge and preferences and representing them in a
suitable format; modifying Association Rules and Decision Trees algorithms
to allow direct use of domain knowledge and preferences within the mining
process; evaluating the effect of our techniques; and summarizing our
research activities in a conference paper.
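To make the idea of using domain knowledge inside the mining process concrete, here is a deliberately simplified sketch. It enumerates only single-item association rules and skips any rule that an analyst has marked as already known before its support and confidence are computed. The function name, thresholds, and data are hypothetical; this is not the project's modified algorithm.

```python
from itertools import permutations


def mine_rules(transactions, known_rules, min_support=0.5, min_confidence=0.6):
    """Enumerate single-item rules A -> B, skipping rules listed as known."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        if (a, b) in known_rules:  # domain knowledge: already known, skip early
            continue
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Illustrative data only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "jam"},
]
known_rules = {("bread", "butter")}

rules = mine_rules(transactions, known_rules)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Skipping known rules during enumeration, rather than filtering the output afterward, also avoids the cost of scoring them.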
Impacts
Because of the popularity of the selected algorithms and the generality
of our approach, this work can have a positive impact on any of the ongoing
data mining projects within MITRE and on the broader data mining community
and tool vendors who support this work.