
Projects Featured in Human Language:


CCL Babylon: Installing TrIM at the Tower of Babel

Compact Aids for Speech Translation (CAST)

Core Dialogue Research

Document Analysis Methods for Fraud Detection

Foreign Language-tool Improvement Through Evaluation (FLITE)

Improved OCR-Based Foreign Language Acquisition

Language and Speech Exploitation Resources (LASER ACTD)

Reading Comprehension: Reading, Learning, Teaching


2004 Technology Symposium > Human Language

Human Language

Human Language researches computer systems that understand and/or synthesize spoken and written human languages. Included in this area are speech processing (recognition, understanding, and synthesis), information extraction, handwriting recognition, machine translation, text summarization, and language generation.


CCL Babylon: Installing TrIM at the Tower of Babel

Rod Holland, Principal Investigator

Location(s): Washington and Bedford



Compact Aids for Speech Translation (CAST)

Lisa Harper, Principal Investigator

Location(s): Washington

Problems
Commercial vendors do not support translation of languages spoken by small populations, even when there is a high probability that terrorists use these languages. Ground troops and support personnel have limited access to human translators when deployed to regions of the world where these languages are spoken.

Objectives
The goal of the Babylon program is to develop rapid, two-way, natural-language speech translation interfaces and platforms for warfighters to use in field environments for force protection, refugee processing, and medical triage. Babylon will focus on overcoming the many technical and engineering challenges limiting current multilingual translation technology to enable future full-domain, unconstrained dialogue translation in multiple environments.

Activities
MITRE is responsible for evaluating 1+1-way systems by June of FY03 and two-way translation systems by early FY04. MITRE will help develop new metrics to account for the unique characteristics of speech-to-speech translation systems. In support of these evaluations, MITRE is also spearheading the collection of foreign language dialogue data that will reflect end-user requirements.

Impact
Babylon translation devices will aid personnel with Special Forces missions, emergency medical response, intelligence gathering, and force protection. Critically, Babylon targets low-density languages such as Pashto, in addition to other high-priority languages such as Farsi, Mandarin Chinese, and Arabic.



Core Dialogue Research

Christine Doran, Principal Investigator

Location(s): Washington and Bedford

Problems
Dialogue managers (DMs) are of increasing interest to our sponsors, but have not been useful to date because they are not flexible enough to handle conversations of moderate complexity, multiple modalities, or more than two participants, or to be adapted to new conversational tasks or domains without considerable effort.

Objectives
Our objectives are twofold: first, to advance the state of the art of operational dialogue managers along the continuum of dialogue complexity, and, second, to develop a new paradigm for the rapid development of modular, extensible and robust dialogue managers and for their evaluation.

Activities
In year one, we will assess the dialogue needs of three areas -- training, question-answering, and multimodal, multiparty robot control -- by porting existing DMs to them. In year two, we will focus on developing our modular information-state DM toolkit. In year three, we will formally evaluate our development paradigm by porting our toolkit to the same three areas.

Impact
By promoting a systematic approach to development of robust, portable DMs, we will transition this technology out of the laboratory into sponsor hands. The experience we gain in evaluating the portability and robustness of the toolkit will give sponsors the information they need to evaluate the potential effort and resources needed to build a new dialogue system.

Presentation [PDF]



Document Analysis Methods for Fraud Detection

Marc Vilain, Principal Investigator

Location(s): Washington and Bedford

Problems
Each year, the U.S. Treasury is defrauded of $85 billion in corporate taxes owed by large and medium-sized businesses. At fault are ever-new revenue-hiding schemes that are legally disallowed. The IRS, however, is limited in its ability to investigate, as corporate tax filings often lack key details required to detect noncompliance.

Objectives
A promising alternative is to exploit SEC filings to detect potentially noncompliant accounting practices. Our objective is to introduce text-oriented methods for securities filings that can complement the data-oriented methods used for tax filings. We are especially concerned with creating an exploratory testbed that will enable adaptability and interactive control by end-users.

Activities
We will apply current information extraction techniques to SEC filings, and will identify relevant facts ("the company registered a tax loss") and relationships ("Mr. Smith is a shareholder in the partnership"). These language elements will be collated into document-level analyses through linguistic and statistical models. We will also pursue trainable classification techniques to abstract these analyses into automated noncompliance detectors.
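As a rough illustration of what a relationship fact such as "Mr. Smith is a shareholder in the partnership" might look like once extracted, the following is a minimal sketch using a single hand-written pattern. The pattern, the field names (`person`, `role`, `entity`), and the function `extract_relations` are assumptions made for this example; the project describes linguistic and statistical models, which this sketch does not attempt to reproduce.

```python
import re

# Hypothetical pattern for "<Title>. <Name> is a <role> in/of the <entity>".
# A real extractor would cover far more variation than this single template.
RELATION_PATTERN = re.compile(
    r"(?P<person>(?:Mr|Ms|Mrs|Dr)\.\s+[A-Z][a-z]+)\s+is\s+an?\s+"
    r"(?P<role>[a-z ]+?)\s+(?:in|of)\s+the\s+(?P<entity>[a-z]+)"
)


def extract_relations(text):
    """Return one dict per person-role-entity relationship found in the text."""
    return [match.groupdict() for match in RELATION_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Mr. Smith is a shareholder in the partnership."
    for relation in extract_relations(sample):
        print(relation)
```

Records of this shape could then be collated per document before any classification step.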

Impact
As a research endeavor, this project will further the practice of information extraction through advances in fact detection, document categorization, and document structure understanding. Our experimental testbed will look at a number of key problems, among them discovering suspect ownerships in partnerships, identifying disallowed tax losses, and detecting new variants of existing tax abuse schemes. Because of the sheer magnitude of uncollected revenues, even partial solutions are valuable.

Presentation [PDF]



Foreign Language-tool Improvement Through Evaluation (FLITE)

Florence Reeder, Principal Investigator

Location(s): Washington

Problems
Foreign language processing problems have not diminished. Instead of less data and fewer languages, we see more. Instead of more analysts and linguists, there are fewer. The need for better foreign language processing and translation tools is more critical and the penalties for failure are more spectacular. We seek to improve systems and their capabilities.

Objectives
We will establish automated evaluation of language processing. We will start with automated evaluation for machine translation (MT) tools, use these evaluations to select among alternate translations, and use this framework to improve the quality of language generated in MT. Our tools will incorporate recent advances in natural language generation (NLG) and look at evaluation from a psycholinguistics perspective.

Activities
To improve evaluation we will use multiple evaluation strategies and techniques in an integrated platform, examine metrics and psycholinguistic implications, and apply evaluation techniques to two problems. To improve MT we will build a system that uses evaluation to select among alternate translations, design evaluation techniques that do not rely on reference translations, and design an interlingual structure supporting better NLG.
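One simple way to picture selecting among alternate translations without a reference translation is to score each candidate for fluency and keep the best one. The sketch below assumes an add-one-smoothed bigram model trained on a small sample of target-language text; the scoring method and the names `train_bigram_model`, `fluency_score`, and `select_translation` are illustrative assumptions, not the project's evaluation technique.

```python
import math
from collections import Counter


def _tokens(sentence):
    """Lowercase, whitespace-tokenize, and pad with boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_model(sentences):
    """Count unigrams and bigrams over a sample of target-language text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = _tokens(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def fluency_score(sentence, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability; higher is more fluent."""
    tokens = _tokens(sentence)
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)


def select_translation(candidates, unigrams, bigrams):
    """Pick the candidate with the highest fluency score; no reference needed."""
    return max(candidates, key=lambda c: fluency_score(c, unigrams, bigrams))


if __name__ == "__main__":
    sample_text = [
        "the soldier asked for water",
        "the doctor asked for help",
        "the patient needs water",
    ]
    model = train_bigram_model(sample_text)
    candidates = ["the patient asked for water", "water patient the for asked"]
    print(select_translation(candidates, *model))
```

A fluency score alone says nothing about whether the meaning was preserved, so in practice it would be only one signal among several.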

Impact
This work addresses a shortfall in NLG. It supports on-demand evaluations for multiple customers, and also supports "MT in a box" for adaptation of MT to specific domains. Additionally, it will become a key part of the application evaluation facility within the Foreign Language Technology Center.

Presentation [PDF]



Improved OCR-Based Foreign Language Acquisition

Linda Van Guilder, Principal Investigator

Location(s): Washington and Bedford

Problems
Although multilingual optical character recognition (OCR) has been integrated into operational intelligence discovery systems, the recognition results vary from mediocre to extremely poor on real foreign-language data. Such poor quality OCR output adversely affects downstream information discovery and exploitation technologies such as information retrieval, named entity extraction, machine translation (MT), and automatic summarization.

Objectives
The main objective of this project is to evaluate and identify technologies and resources that can be used to improve foreign-language OCR quality. In addition to producing quantitative evaluations and recommendations for the OCR-to-MT pipeline, we are developing an annotated ground truth corpus of handwritten Arabic as well as a prototype for Arabic handwriting recognition.

Activities
We will prepare evaluation data, gather COTS/GOTS tools, and develop an Arabic handwriting recognition algorithm, a corpus of handwritten Arabic, and Arabic post-OCR error correction. After evaluating the components, we will investigate topic detection for lexicon selection, design and execute keyword OCR/MT evaluation, and evaluate the Arabic handwriting prototype. We will begin transition to customers and extend our testbed to Korean.
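To make the idea of post-OCR error correction concrete, here is a minimal sketch that replaces an unrecognized token with the closest lexicon entry by Levenshtein edit distance. The lexicon contents, the `max_distance` threshold, and the function names are assumptions for illustration; the approach is a generic one and is not claimed to be the project's Arabic correction method, although the same code works on any Unicode strings.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # deletion
                cur[j - 1] + 1,                    # insertion
                prev[j - 1] + (char_a != char_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def correct_token(token, lexicon, max_distance=2):
    """Return the nearest lexicon word, or the token itself if none is close."""
    if not lexicon or token in lexicon:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda word: (edit_distance(token, word), word))
    return best if edit_distance(token, best) <= max_distance else token


if __name__ == "__main__":
    lexicon = {"translation", "recognition", "handwriting"}
    print(correct_token("recogniti0n", lexicon))
    print(correct_token("xyz", lexicon))
```

Selecting the lexicon by detected topic, as the activities mention, would amount to passing a different `lexicon` set per document.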

Impact
The prototypes, recommendations, and data that we develop will be extremely valuable to our customers, and will also benefit a broader community when the results are published in journals such as Computational Linguistics. By disseminating our research in this fashion, we expect to drive commercial and scholarly development to incorporate critical functionality.



Language and Speech Exploitation Resources (LASER ACTD)

Rod Holland, Principal Investigator

Location(s): Washington



Reading Comprehension: Reading, Learning, Teaching

Lynette Hirschman, Principal Investigator

Location(s): Washington and Bedford

Problems
This project is addressing a three-stage grand challenge application for human language technology: building a system that can "learn to read," then "read to learn" and finally "teach to learn." It deals with issues of machine learning, knowledge acquisition, and instructional technology.

Objectives
First, we will build a computer-based system capable of passing a third-grade reading-comprehension test. Second, we will build a system that will "read to learn," passing a test on the subject matter of a text after having read it. Finally, we will build a system that can learn through interacting with a person and, at the same time, help to teach the person.

Activities
We have applied prototype systems to reading comprehension tests designed for fourth to eighth graders, achieving 30%-40% accuracy. We are improving the system to include more components. We will implement a reciprocal teaching demonstration, where the system plays the role of teacher (grading student answers) or the role of peer learner (answering questions posed by a real student).
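For readers unfamiliar with the task, a very simple baseline for a reading comprehension test is to return the passage sentence that shares the most content words with the question. The sketch below assumes that baseline; the stopword list, the sentence splitter, and the function names are illustrative choices, and the prototype systems described above are not claimed to work this way.

```python
import re

# A small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {
    "a", "an", "the", "who", "what", "when", "where", "why", "how",
    "did", "do", "does", "is", "was", "were", "of", "in", "to", "and",
}


def content_words(text):
    """Lowercased words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def answer_sentence(passage, question):
    """Return the passage sentence with the largest word overlap with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    question_words = content_words(question)
    return max(sentences, key=lambda s: len(question_words & content_words(s)))


if __name__ == "__main__":
    passage = (
        "Maria planted a garden behind the school. "
        "The students watered the plants every morning. "
        "In June the garden was full of tomatoes."
    )
    print(answer_sentence(passage, "Who watered the plants?"))
```

Word overlap ignores word order and meaning, which is one reason a baseline of this kind leaves considerable room for the additional components mentioned above.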

Impact
This work will open new areas of research by addressing issues of machine learning, breaking the knowledge acquisition bottleneck, developing new evaluation measures for understanding and learning, and creating new instructional technologies via learning companions and interactive teaching environments.

Presentation [PDF]



 

 

Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
