
Projects Featured in Human Language:


CLASR: Cross Language Automatic Speech Recognition

Clipper & TrIM: Embedded Machine Translation Prototypes

Core Dialogue Research

Document Analysis Methods in Fraud Detection

Document Exploitation (DOCEX) Improvement by Component Evaluation

Embedded Foreign Language Exploitation Tools

Quick International Character Recognition (QUICR)

Reading Comprehension: Reading, Learning, Teaching


2005 Technology Symposium > Human Language

Human Language

The Human Language area researches computer systems that understand or synthesize spoken and written human languages. It covers speech processing (recognition, understanding, and synthesis), information extraction, handwriting recognition, machine translation, text summarization, and language generation.


CLASR: Cross Language Automatic Speech Recognition

John Henderson, Principal Investigator

Location(s): Washington

Problem
Fewer than 50 languages have large available lexical resources. Machine translation systems address only the 20 languages with obvious commercial and military impact. While progress has been made on these languages over 50 years of research, the techniques developed exploit large quantities of written resources. Consequently, these technologies are not applicable to languages with few written resources.

Objectives
We will investigate a novel technique for identifying and characterizing the concepts represented by target language (English) words in digitized audio recordings of a source (foreign) language. The result is a system for recovering English text from source language audio. It requires neither source language written resources for system development nor a source language written intermediate form at decoding time.

Activities
The problem of spoken language translation is broken down into several modeling subproblems, each of which is the target of recent research advances: automated acoustic unit discovery, bilingual lexicon design, structured language modeling, and transduction grammars. This project investigates how these separate research advances can be combined into a unified model and system for cross-language speech recognition.

Impact
Machine translation and speech recognition are typically viewed as properly separable efforts. This attempt at making a joint model will likely spur other such approaches in the research community, reminding many to explore bridging techniques. More practically, but longer term, many developing nations require assistance and resources for continued existence. This technology will aid international assistance and monitoring.

Presentation [PDF]



Clipper & TrIM: Embedded Machine Translation Prototypes

Rod Holland, Principal Investigator

Location(s): Washington



Core Dialogue Research

Christine Doran, Principal Investigator

Location(s): Washington and Bedford

Problem
Dialogue managers (DMs) are of increasing interest to our sponsors, but have not been useful to date because they are not flexible enough to handle conversations of moderate complexity, multiple modalities, or more than two participants, or to be adapted to new conversational tasks or domains without considerable effort.

Objectives
Our objectives are twofold: first, to advance the state of the art of operational dialogue managers along the continuum of dialogue complexity, and, second, to develop a new paradigm for the rapid development of modular, extensible and robust dialogue managers and for their evaluation.

Activities
In year one, we will assess the dialogue needs of three areas -- training, question-answering, and multimodal, multiparty robot control -- by porting existing DMs to them. In year two, we will focus on developing our modular information-state DM toolkit. In year three, we will formally evaluate our development paradigm by porting our toolkit to the same three areas.
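As a rough illustration of what an information-state dialogue manager looks like, the sketch below keeps the dialogue state in one record and changes it only through small update rules. The toolkit's actual design is not described here, so every name (InfoState, Rule, update, next_system_move) and the slot-filling task are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class InfoState:
    """Hypothetical information state: everything the manager tracks."""
    agenda: List[str] = field(default_factory=list)        # slots still to be asked about
    beliefs: Dict[str, str] = field(default_factory=dict)  # slot values learned so far
    last_user_move: Optional[Tuple[str, str, str]] = None  # e.g. ("answer", slot, value)


@dataclass
class Rule:
    """An update rule: a precondition on the state plus an effect on it."""
    name: str
    applies: Callable[[InfoState], bool]
    effect: Callable[[InfoState], None]


def _is_answer(state: InfoState) -> bool:
    return state.last_user_move is not None and state.last_user_move[0] == "answer"


def _integrate_answer(state: InfoState) -> None:
    _, slot, value = state.last_user_move
    state.beliefs[slot] = value
    if slot in state.agenda:
        state.agenda.remove(slot)
    state.last_user_move = None  # the move has been consumed


RULES = [Rule("integrate_answer", _is_answer, _integrate_answer)]


def update(state: InfoState, rules: List[Rule]) -> None:
    """Fire the first applicable rule repeatedly until no rule applies."""
    fired = True
    while fired:
        fired = False
        for rule in rules:
            if rule.applies(state):
                rule.effect(state)
                fired = True
                break


def next_system_move(state: InfoState) -> Tuple[str, object]:
    """Ask about the next open slot, or confirm once the agenda is empty."""
    if state.agenda:
        return ("ask", state.agenda[0])
    return ("confirm", dict(state.beliefs))
```

In this sketch, moving to a new task would mean supplying a different agenda and rule list while leaving update and next_system_move untouched, which is one way to read the modularity goal stated above.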

Impact
By promoting a systematic approach to development of robust, portable DMs, we will transition this technology out of the laboratory into sponsor hands. The experience we gain in evaluating the portability and robustness of the toolkit will give sponsors the information they need to evaluate the potential effort and resources needed to build a new dialogue system.

Presentation [PDF]



Document Analysis Methods in Fraud Detection

Marc Vilain, Principal Investigator

Location(s): Washington

Problem
Each year, the U.S. Treasury is defrauded of $85 billion in corporate taxes owed by large- and medium-sized businesses. At fault are ever-new revenue-hiding schemes that are legally disallowed. The IRS, however, is limited in its ability to investigate, as corporate tax filings often lack key details required to detect noncompliance.

Objectives
A promising alternative is to exploit SEC filings to detect potentially noncompliant accounting practices. Our objective is to introduce text-oriented methods for securities filings that can complement the data-oriented methods used for tax filings. We are especially concerned with creating an exploratory testbed that will enable adaptability and interactive control by end-users.

Activities
We will apply current information extraction techniques to SEC filings and will identify relevant facts ("the company registered a tax loss") and relationships ("Mr. Smith is a shareholder in the partnership"). These language elements will be collated into document-level analyses through linguistic and statistical models. We will also pursue trainable classification techniques to abstract these analyses into automated noncompliance detectors.
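To make the fact and relationship extraction step concrete, here is a minimal pattern-based sketch applied to the two example sentences quoted above. It is only an illustration: the regular expressions, the Fact record, and the function names are assumptions made for this example, and the project's actual extraction techniques (described as linguistic and statistical models) are not shown here.

```python
import re
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Fact(NamedTuple):
    """Hypothetical record for one extracted fact or relationship."""
    kind: str
    args: Tuple[str, ...]


# Hand-written illustrative patterns; a real system would be far more general.
TAX_LOSS = re.compile(r"\b(the company) registered a tax loss\b", re.IGNORECASE)
SHAREHOLDER = re.compile(
    r"\b((?:Mr|Ms|Mrs|Dr)\. [A-Z][a-z]+) is a shareholder in (the \w+)"
)


def extract_facts(text: str) -> List[Fact]:
    """Return tax-loss facts first, then shareholder relationships."""
    facts = [Fact("tax_loss", (m.group(1),)) for m in TAX_LOSS.finditer(text)]
    facts += [
        Fact("shareholder_of", (m.group(1), m.group(2)))
        for m in SHAREHOLDER.finditer(text)
    ]
    return facts


def document_features(text: str) -> Dict[str, int]:
    """Collate extractions into per-document counts, usable as classifier input."""
    return dict(Counter(fact.kind for fact in extract_facts(text)))
```

Counts like those returned by document_features are one simple form a document-level analysis could take before being passed to a trainable classifier.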

Impact
As a research endeavor, this project will further the practice of information extraction through advances in fact detection, document categorization, and document structure understanding. Our experimental testbed will look at a number of key problems, among them discovering suspect ownerships in partnerships, identifying disallowed tax losses, and detecting new variants of existing tax abuse schemes. Because of the sheer magnitude of uncollected revenues, even partial solutions are valuable.

Presentation [PDF]



Document Exploitation (DOCEX) Improvement by Component Evaluation

Linda Van Guilder, Principal Investigator

Location(s):

Problem
A huge portion of the Arabic script documents gathered in Afghanistan, Iraq, and the War on Terror are handwritten. No technology currently exists for recognizing offline handwritten Arabic. The number of documents to be translated far exceeds the manpower available to process them quickly. We must assemble technology solutions for digitizing, translating, and triaging the data so that we can deliver high-priority documents to analysts more rapidly.

Objectives
We will develop techniques to rapidly modify existing handwriting recognition algorithms to process Arabic and other low-density languages. We will solidify the methodology on handwritten Arabic and fine-tune it on Pashto or another low-density language.

Activities
We will collect and publicly release training data and corpus generation tools. We will attempt to improve performance by investigating search and matching algorithms, cross-lingual feature typologies, algorithms for segmentation and feature extraction, alternatives for diacritic handling, and a multi-pass processing scheme. We will investigate task-focused recognition and extensions to enable dynamic swapping of language models.

Impact
By taking a systematic approach to rapid development of international character recognition systems, MITRE will be prepared to help overcome the next language processing crisis. Prototypes for handwritten Arabic script recognition and corpus generation and corpora of handwritten Arabic will be made available for technology transfer. Technical reports and white papers published under this research program will advance the state of the art.

Presentation [PDF]



Embedded Foreign Language Exploitation Tools

David Day, Principal Investigator

Location(s): Washington and Bedford

Problem
Because linguists face excessive workloads of backlogged documents, aids are needed that support timely human identification, extraction, and translation of critical intelligence content. Many analysts today are insufficiently trained in the source language being exploited. Can we leverage automated tools and improved visualization to aid in the efficient understanding of foreign language material?

Objectives
We seek to improve the productivity and quality of human exploitation of foreign language material, including both document translation and the direct extraction of information for intelligence applications. We will establish an instrumented exploitation toolbox sufficient to perform carefully controlled experimentation that can help identify those capabilities that can provide the greatest increases in human quality and productivity.

Activities
We will perform a survey of existing tools that have been proposed or used as productivity aids for foreign language DocEx. We will integrate newly developed automatic foreign language processing and visualization capabilities that might contribute to effective DocEx. Using an instrumented toolbox we will study empirically the ways in which analyst productivity is most enhanced.

Impact
This project's investment in integrated tool development and experimentation will yield performance metrics and empirical results that can guide investment in improving the nation's DocEx capabilities, as well as an operational prototype that demonstrates some of these productivity and quality improvements.

Presentation [PDF]



Quick International Character Recognition (QUICR)

Amlan Kundu, Principal Investigator

Location(s): Washington

Problem
Many of the Arabic script documents gathered in Afghanistan, Iraq, and the War on Terror are handwritten. Limited technology currently exists for recognizing offline handwritten Arabic. The number of documents to be translated far exceeds the manpower available to process them quickly. We must assemble technology solutions for digitizing, translating, and triaging the data so that we can deliver high-priority documents to analysts more rapidly.

Objectives
We will develop techniques to rapidly modify existing handwriting recognition algorithms to process Arabic and other low-density languages. We will solidify the methodology on handwritten Arabic and fine-tune it on Pashto or another low-density language.

Activities
We will collect and publicly release training data and corpus generation tools. We will attempt to improve performance by investigating search and matching algorithms, cross-lingual feature typologies, algorithms for segmentation and feature extraction, alternatives for diacritic handling, and a multi-pass processing scheme. We will investigate task-focused recognition and extensions to enable dynamic swapping of language models.

Impact
By taking a systematic approach to rapid development of international character recognition systems, MITRE will be prepared to help overcome the next language processing crisis. Prototypes for handwritten Arabic script recognition and corpus generation and corpora of handwritten Arabic will be made available for technology transfer. Technical reports and white papers published under this research program will advance the state of the art.

Presentation [PDF]



Reading Comprehension: Reading, Learning, Teaching

Lynette Hirschman, Principal Investigator

Location(s): Washington and Bedford

Problem
This project is addressing a three-stage grand challenge application for human language technology: building a system that can "learn to read," then "read to learn" and finally "teach to learn." It deals with issues of machine learning, knowledge acquisition, and instructional technology.

Objectives
First, we will build a computer-based system capable of passing a third-grade reading-comprehension test. Second, we will build a system that will "read to learn," passing a test on that subject matter after having read the text. Finally, we will build a system that can learn through interacting with a person, and, at the same time, help to teach the person.

Activities
We have applied prototype systems to reading comprehension tests designed for fourth to eighth graders, achieving 30%-40% accuracy. We are improving the system to include more components. We will implement a reciprocal teaching demonstration, where the system plays the role of teacher (grading student answers) or the role of peer learner (answering questions posed by a real student).
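For readers unfamiliar with the task, the sketch below shows one very simple way a system can attempt a reading comprehension question: pick the story sentence that shares the most content words with the question. This word-overlap baseline is an assumption chosen for illustration, not a description of the prototype's components, and the stopword list and function names are hypothetical.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "to", "in", "is", "was", "did",
     "what", "who", "where", "when", "why"}
)


def content_words(text: str) -> set:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def answer_sentence(story: str, question: str) -> str:
    """Return the story sentence with the largest word overlap with the question.

    Ties go to the earliest sentence, because max() keeps the first maximum.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", story) if s.strip()]
    question_words = content_words(question)
    return max(sentences, key=lambda s: len(question_words & content_words(s)))
```

Given the story "Maria planted a tree in the park. The tree grew tall after the rain. Birds built a nest in it." and the question "Who planted the tree?", the function selects the first sentence, since it shares both "planted" and "tree" with the question.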

Impact
This project will open new areas of research: addressing issues of machine learning, breaking the knowledge acquisition bottleneck, developing new evaluation measures for understanding and learning, and creating new instructional technologies via learning companions and interactive teaching environments.

Presentation [PDF]



 

 

Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
