
Technology Symposium


Projects Featured in Human Language:

Analyzing Corporations Through Financial Text Mining

ASSIST

CLASR: Cross Language Automatic Speech Recognition

Clipper, TrIM, XTRIM, & OpenMT: MITRE Embedded Machine Translation Prototypes

Closing the Semantic Gap

Exploring Culture Through Language

Facilitating Sense-Making for Situational Awareness

Quick International Character Recognition (QUICR)

Spatio-Temporal Information Extraction and Reasoning from Natural Language

TransTac


Human Language

Human Language researches computer systems that understand and/or synthesize spoken and written human languages. Included in this area are speech processing (recognition, understanding, and synthesis), information extraction, handwriting recognition, machine translation, text summarization, and language generation.


Analyzing Corporations Through Financial Text Mining

Marc Vilain, Principal Investigator

Problems:
Much key financial information exists only as unanalyzed text: SEC filings are rich with corporate financial details, while newswires provide timely notification of corporate changes. Manual analysis of these data is impractical, however, due to the enormous volume of available texts and the sheer length of many documents. Automated text mining is needed to help financial analysts handle these texts.

Objectives:
Recent MSR activity has demonstrated that key financial entities, facts, and events can be extracted from mission-relevant texts and made available for human analysis. For this proof of concept to become practical, however, the performance of current language processing methods must be improved, especially as applied to financial texts. Our objective is to boost the performance of these techniques.

Activities:
Our first area of focus is entity extraction, i.e., the identification of company names and other key financial elements. Current extraction tools often fail to identify these building blocks of financial analysis; we will apply new machine learning methods to boost extraction performance. Time permitting, we will also apply learning to the identification of scenario-specific facts and events.

Impact:
The need for quality automated financial analysis is widespread, as many government financial specialists are reaching retirement age without being replaced. At the same time, closing the growing corporate tax gap and meeting increased regulatory oversight responsibilities are taxing the staff who remain. Only through data mining methods such as these will the demands of these missions be met.

Approved for Public Release: 07-0323


ASSIST

Lisa D. Harper, Principal Investigator

Problems:

Objectives:

Activities:

Impact:

Approved for Public Release: 05-0825


CLASR: Cross Language Automatic Speech Recognition

John Henderson, Principal Investigator

Problems:
Fewer than 50 languages have large available lexical resources. Machine translation systems address only the 20 languages with obvious commercial and military impact. While progress has been made on these languages over 50 years of research, the techniques developed exploit large quantities of written resources. Consequently, these technologies are not applicable to languages with few written resources.

Objectives:
We will investigate a novel technique for identifying and characterizing the concepts represented by target language (English) words in digitized audio recordings of a foreign source language. The result is a system for recovering English text from source language audio. It requires neither source language written resources for system development nor a source language written intermediate form at decoding time.

Activities:
The problem of spoken language translation is broken down into several modeling subproblems, each of which is the target of recent research advances: automated acoustic unit discovery, bilingual lexicon design, structured language modeling, and transduction grammars. This project investigates how these separate research advances can be combined into a unified model and system for cross-language speech recognition.

Impact:
Machine translation and speech recognition are typically viewed as properly separable efforts. This attempt at making a joint model will likely spur other such approaches in the research community, reminding many to explore bridging techniques. More practically, but longer term, many developing nations require assistance and resources for continued existence. This technology will aid international assistance and monitoring.

Approved for Public Release: 05-0232

Presentation [PDF]


Clipper, TrIM, XTRIM, & OpenMT: MITRE Embedded Machine Translation Prototypes

Rod Holland, Principal Investigator


Closing the Semantic Gap

Marc Vilain, Principal Investigator

Problems:
The explosive adoption of language-enabled analytic tools speaks to their abilities to interpolate what people mean from what people say. But much language-enabled analysis has reached a semantic gap: progress on key tasks is stymied because these methods can only approximate what humans mean. This is especially true for the critical unsolved problem of identifying events and their ramifications.

Objectives:
We will create a computational database of word meanings that will bridge the semantic gap and enable analytic tools to better model human language. This database will be compiled from dictionary definitions that cover the full range of the English language. Further, we will marshal algorithmic methods that apply the inter-relationships of word meanings to identifying events and their ramifications.

Activities:
In order to create this lexical database, we will: infer hierarchies of word meanings from their dictionary definitions; establish those non-hierarchical word relations that further capture essential meanings; and derive a repertoire of primitive semantic elements from which meanings are composed. We will also define evaluation measures to better assess progress and determine goodness of fit to actual analytic tasks.

Impact:
This work will sharpen analytic capabilities in many areas. In particular, being able to reliably identify and compare events is necessary to next-generation capabilities in Indications and Warnings. It will enable: the ability to catalogue what happens to entities of interest; the ability to filter redundant information about the same event; and the ability to detect inconsistent versions of events.

Approved for Public Release: 06-0209

Presentation [PDF]


Exploring Culture Through Language

Lisa Ferro, Principal Investigator


Facilitating Sense-Making for Situational Awareness

Christopher Berube, Principal Investigator

Problems:
Military domains that use Internet chat to supplement their situational awareness (SA) do not currently exploit the correlation between chat and structured data (e.g., ground moving target indicator (GMTI) reports). Therefore, those who rely heavily on chat for sense-making are forced to establish relationships manually (if they do so at all) between chat and structured data, often under time pressure.

Objectives:
Our research will focus on the development of algorithms and a software prototype for the correlation of data in multiple chat rooms with data from structured sources. Such sources will include those typically exploited in time-critical targeting (TCT) domains, such as GMTI and intelligence platforms. Algorithms for aggregating data will be developed to support correlation with chat and high-level SA.

Activities:
We will optimize existing information extraction software for use with chat, drawing on lexicons developed for TCT domains. Development of software for extracting elements from structured data will support the correlation of these elements with chat using algorithms based on statistical metrics and heuristics. We will use real and simulated data sets to demonstrate a prototype system, including user interfaces.
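As a rough illustration of what correlating chat with structured data through a simple heuristic might look like, the Python sketch below scores a chat message against a structured report by combining term overlap with time proximity. Everything in it is an assumption made for illustration: the `ChatMessage` and `StructuredReport` types, the 300-second window, the 0.25 threshold, and the scoring formula are hypothetical and are not taken from the project's algorithms or prototype.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Hypothetical record for one chat line.
    room: str
    timestamp: float  # seconds on a clock shared with the reports
    text: str


@dataclass
class StructuredReport:
    # Hypothetical record for one structured-source entry (e.g. a GMTI report).
    source: str
    timestamp: float
    terms: frozenset  # lowercase terms extracted from the report's fields


def correlation_score(msg, report, window=300.0):
    """Heuristic score in [0, 1]: term overlap weighted by time proximity."""
    gap = abs(msg.timestamp - report.timestamp)
    if gap > window:
        return 0.0
    # Crude tokenization: split on whitespace, strip punctuation, lowercase.
    words = {w.strip(".,;:!?").lower() for w in msg.text.split()}
    words.discard("")
    if not words or not report.terms:
        return 0.0
    # Fraction of the report's terms that appear in the chat message.
    overlap = len(words & report.terms) / len(report.terms)
    # Linear decay from 1 (same instant) to 0 (edge of the window).
    proximity = 1.0 - gap / window
    return overlap * proximity


def correlate(messages, reports, threshold=0.25):
    """Return (message, report, score) triples at or above threshold, best first."""
    pairs = []
    for m in messages:
        for r in reports:
            s = correlation_score(m, r)
            if s >= threshold:
                pairs.append((m, r, s))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Calling `correlate(messages, reports)` on lists of these records yields candidate chat-to-report links ranked by score. A real system would presumably replace the whitespace tokenizer with domain-tuned information extraction and the hand-picked weights with statistically estimated ones.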

Impact:
This research will allow users to establish relationships between disparate data sources, resulting in a better capability to balance the demands of monitoring data across multiple chat rooms while exploiting "traditional" structured data sources. While this prototype is being developed using selected sources from a TCT environment, other sources of structured data (e.g., air tasking orders) can be considered.

Approved for Public Release: 05-1395

Presentation [PDF]


Quick International Character Recognition (QUICR)

Linda Van Guilder, Principal Investigator

Problems:
Many of the Arabic script documents gathered in Afghanistan, Iraq, and the War on Terror are handwritten. Limited technology currently exists for recognizing offline handwritten Arabic. The number of documents to be translated far exceeds the manpower available to process them quickly. We must assemble technology solutions for digitizing, translating, and triaging the data so that we can deliver high-priority documents to analysts more rapidly.

Objectives:
We will develop techniques to rapidly modify existing handwriting recognition algorithms to process Arabic and other low-density languages. We will solidify the methodology on handwritten Arabic and fine-tune it on another low-density language.

Activities:
We will collect and publicly release training data and corpus generation tools. We will attempt to improve performance by investigating search and matching algorithms, cross-lingual feature typologies, algorithms for segmentation and feature extraction, alternatives for diacritic handling, and a multi-pass processing scheme. We will investigate task-focused recognition and extensions to enable dynamic swapping of language models.

Impact:
By taking a systematic approach to rapid development of international character recognition systems, MITRE will be prepared to help overcome the next language processing crisis. Prototypes for handwritten Arabic script recognition and corpus generation, as well as corpora of handwritten Arabic, will be made available for technology transfer. Technical reports and white papers published under this research program will advance the state of the art.

Approved for Public Release: 05-0323

Presentation [PDF]


Spatio-Temporal Information Extraction and Reasoning from Natural Language

Inderjeet Mani, Principal Investigator

Problems:
In many customer problems, a large fraction of important events are given vague relative spatial and temporal characterizations that are ignored by today's systems. These systems reason very little, if at all, about space and time, requiring considerable interpolation and extrapolation by the analyst. Finally, current approaches make it difficult to incorporate end-user preferences.

Objectives:
We will develop information extraction and reasoning algorithms within a machine learning framework that will address the spatio-temporal location of events in text. We will develop the SpaceML annotation scheme to produce annotated corpora for training systems to integrate learning with reasoning, use active learning to ask user questions about particular examples, and measure system performance on a particular application.

Activities:
In Year 1, we will develop the annotation framework, including a spatial expression normalizer evaluated in an operational setting, and integrate reasoning and learning. In Year 2, we will test active learning with analysts, develop a location labeler, and initiate a challenge task for the community. In Year 3, we will transition the system and port it to Mandarin Chinese.

Impact:
This retargetable approach will allow for common solutions across many domains. The key problems of spatial and temporal representation from language will be solved. Particular problems in integration of learning and reasoning will be solved for spatio-temporal domains. Richer structured spatio-temporal data will become available for data mining and visualization.

Approved for Public Release: 06-1442

Presentation [PDF]


TransTac

Sherri Condon, Principal Investigator

Presentation [PDF]



Last Updated: 05/02/2007

Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
