The Edge, MITRE Publications
Winter 2002, Volume 7, Number 1

Voice Access Gives Fast Data Response

Currently, lower-echelon military commanders cannot directly access the information in Battlefield Tactical Operations Center databases. Near-real-time access to this data would improve both their awareness of overall conditions and their responsiveness to changes in the tactical situation, but it has proven difficult to achieve. Security concerns, limited bandwidth, and the need to keep users' hands and eyes free make computer-to-computer links problematic.

To solve this problem, MITRE decided to access the databases with low-bit-rate coded voice. Speech over secure narrowband wireless links offers several advantages. First, voice coders and encryption tools already exist in military communications equipment. Second, a stolen or captured device would be of little use to adversaries unless they also knew passwords and other detailed system information; losing a full computer to a foe would be more costly. Third, a thin client (a device with limited resources that relies on a server for processing) requires no software updates: when the database changes, the user simply asks new questions.

Fred Goodman and Bryan George work on the prototype system.

How Does It Work?

In a peacekeeping operation, for example, military personnel might query a database about both the current situation and historical data. Representative queries, with typical responses, include:

1) Current time?
“The current time is 3:59 PM, February 14.”

2) Threatcon level?
“The Threatcon level is Charlie.”

3) How far am I from the last incident?
“The incident occurred 81.2 miles to the NNE.”

4) When did the last incident occur?
“The incident occurred at 9:39 PM, August 15.”
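The article does not say how the server computes an answer such as “81.2 miles to the NNE.” A minimal sketch, assuming positions are stored as latitude/longitude pairs in degrees, takes the great-circle (haversine) distance and rounds the initial bearing to the nearest point of a 16-point compass rose, which is where labels like NNE come from. The function names and the mean Earth radius of 3,958.8 miles are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

# 16-point compass rose, clockwise from north in 22.5-degree steps
COMPASS_POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (degrees in, miles out)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(x, y)) % 360.0


def compass_point(bearing):
    """Map a bearing in degrees to the nearest of the 16 compass points."""
    return COMPASS_POINTS[int((bearing + 11.25) // 22.5) % 16]


def last_incident_answer(user_pos, incident_pos):
    """Format a reply in the style of query 3 (hypothetical helper)."""
    dist = distance_miles(*user_pos, *incident_pos)
    point = compass_point(initial_bearing_deg(*user_pos, *incident_pos))
    return f"The incident occurred {dist:.1f} miles to the {point}."
```

For example, `last_incident_answer((0.0, 0.0), (1.0, 0.0))` returns “The incident occurred 69.1 miles to the N.”, since one degree of latitude spans roughly 69 miles.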

This capability could not have been achieved until recently because the key component technologies were not mature enough. In the 1980s, speech coded at very low bit rates (e.g., by the STU-III secure telephone) was barely adequate for human-to-human communication; today's low-rate coders introduce so little distortion that their output can be passed to a speech recognizer with good results.

Similarly, as recently as the late 1990s, speech recognition worked only under near-ideal noise and channel conditions and performed poorly on female voices. Recognizers are now used in a wide variety of commercial applications with good results.

Speech synthesis is the third vital technology for this application. Synthetic speech has been intelligible for many years, but its quality has recently improved through more natural intonation contours and syllable stress.

Figure 1. Data flow of prototype systems.

We built a prototype system with the following data flow (see Figure 1). When the user presses the push-to-talk button, microphone input is digitized; when the user releases the button, the sampled data is encoded. The resulting bit stream is sent over a wireless connection to a remote server, which decodes it and passes the audio to a speech recognizer that converts it into text. This text forms the basis of the database query. The query results are converted into user-friendly strings and passed to a text-to-speech (TTS) synthesizer, and the synthesized speech is encoded for transmission. Finally, the encoded speech is sent back over the wireless network to the thin client, where it is decoded and played to the user.
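The round trip in Figure 1 can be sketched as a chain of pluggable stages. This is a hypothetical outline, not the prototype's code: every stage is a caller-supplied function (in the prototype these roles were filled by the MELP coder, Nuance, Oracle, and Festival), and the names `VoiceQueryServer`, `handle`, and `client_round_trip` are invented here.

```python
from dataclasses import dataclass
from typing import Callable, List

Samples = List[float]  # digitized audio


@dataclass
class VoiceQueryServer:
    """Server side of Figure 1; each stage is a caller-supplied (hypothetical) function."""
    decode: Callable[[bytes], Samples]     # speech decoder (MELP in the prototype)
    recognize: Callable[[Samples], str]    # speech recognizer: audio -> text
    run_query: Callable[[str], dict]       # text -> database result
    to_sentence: Callable[[dict], str]     # result -> user-friendly string
    synthesize: Callable[[str], Samples]   # text-to-speech
    encode: Callable[[Samples], bytes]     # speech encoder for the reply

    def handle(self, coded_query: bytes) -> bytes:
        """Decode, recognize, query, phrase, synthesize, and re-encode."""
        text = self.recognize(self.decode(coded_query))
        reply = self.to_sentence(self.run_query(text))
        return self.encode(self.synthesize(reply))


def client_round_trip(recorded: Samples,
                      encode: Callable[[Samples], bytes],
                      send: Callable[[bytes], bytes],
                      decode: Callable[[bytes], Samples]) -> Samples:
    """Thin client: encode after push-to-talk release, send, decode the reply for playback."""
    return decode(send(encode(recorded)))
```

Keeping each stage behind a plain function boundary mirrors the thin-client argument: the client only encodes, sends, and decodes, so new queries or a better recognizer change only the server side.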

Design Questions
To make this system successful, we needed to know what speech-coder bit rate is required for good recognition performance and how that performance degrades in the presence of noise and coding distortion. We ran experiments using a commercial off-the-shelf (COTS) speech recognizer (Nuance) to test a variety of commercial and federal-standard speech coders at bit rates from 2.4 kilobits per second (Kbps) up to 16 Kbps. Our tests covered three types of military noise over a 30-decibel (dB) range of signal-to-noise ratio (SNR): noise from a high mobility multipurpose wheeled vehicle (HMMWV, a jeep-like military utility vehicle), Lynx helicopter noise, and machine gun noise. As source material, we used the Texas Instruments TIDIGITS speech database (strings of digits spoken by a large number of people).
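The article does not describe how the noisy test material was prepared. One common approach, assumed here, adds a recorded noise track to clean speech after scaling the noise so that the speech-to-noise power ratio hits a target, using SNR (dB) = 10 log10(P_speech / P_noise). The sketch uses pure Python lists; the helper names, the tone and Gaussian stand-ins, and the sweep values are hypothetical.

```python
import math
import random


def power(signal):
    """Mean-square power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that 10*log10(P_speech / P_noise) == snr_db.

    The noise is repeated (looped) if it is shorter than the speech.
    """
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(looped))
    return [s + gain * n for s, n in zip(speech, looped)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture, treating (noisy - speech) as the noise component."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins: a 440 Hz tone for speech, Gaussian noise for a vehicle recording.
    speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]
    for target in (30, 24, 18, 12, 6, 0):  # hypothetical sweep spanning 30 dB
        noisy = mix_at_snr(speech, noise, target)
        print(f"target {target:2d} dB -> measured {measured_snr_db(speech, noisy):.2f} dB")
```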

Figure 2. Recognition performance for 9 coders.

Figure 2 shows recognition performance in HMMWV noise for nine coders with bit rates from 2.4 to 16.0 Kbps. The Y-axis shows the word error rate as a percentage; the lower the score, the better the performance. The two oldest coders performed worst, while the most recently developed coders (G.728 and G.729) performed best. The 2.4 Kbps MELP (Mixed-Excitation Linear Prediction) coder performed well down to about 6 dB SNR, so we used MELP in our demonstration system. At such a low rate, spare channel capacity is available for error detection and correction, and the strongest encryption methods can be applied. We did not test the effects of channel errors when error protection is inadequate (e.g., in a burst-noise environment).
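The article reports word error percentage without defining how it was scored. The usual definition, assumed here, aligns each recognized word string with the reference transcript by minimum edit distance and counts substitutions (S), deletions (D), and insertions (I): WER = 100 × (S + D + I) / N, where N is the number of reference words. A minimal sketch for digit strings like those in TIDIGITS; the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions (word-level Levenshtein distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def word_error_rate(pairs):
    """Percent WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = sum(word_errors(r, h) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * errors / words
```

For instance, recognizing “five six” as “five nine” costs one substitution, so that pair alone scores 50 percent. Because insertions count, a WER can exceed 100 percent when a recognizer emits many spurious words.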

Implementation
The demonstration system consists of a laptop computer with an IEEE 802.11 wireless local area network connection, a Linux personal computer, and a Sun server. We used the Nuance COTS recognizer, the Festival open-source TTS synthesizer, and an Oracle database to store the “military conditions.” The system currently responds with a delay of about 5 seconds, but this can be reduced substantially with better data buffering.
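In the prototype, encoding starts only after the push-to-talk button is released, which adds to the response time. The article does not say what buffering change was planned; one possibility, sketched here as an assumption, is to encode and transmit fixed-size frames while the user is still talking. The constants follow the 2.4 Kbps MELP standard (8 kHz sampling, 22.5-ms frames of 180 samples, 54 bits per frame); `encode_frame` is a hypothetical stand-in for a real MELP encoder, and four frames per packet is an illustrative choice.

```python
import math

SAMPLE_RATE_HZ = 8000   # MELP operates on 8 kHz speech
FRAME_SAMPLES = 180     # one 22.5 ms frame at 8 kHz
BITS_PER_FRAME = 54     # 54 bits every 22.5 ms = 2400 bit/s


def frames(samples):
    """Split captured samples into 180-sample frames, zero-padding the last one."""
    for start in range(0, len(samples), FRAME_SAMPLES):
        frame = samples[start:start + FRAME_SAMPLES]
        yield frame + [0.0] * (FRAME_SAMPLES - len(frame))


def packets(frame_source, encode_frame, frames_per_packet=4):
    """Yield a packet every frames_per_packet frames instead of waiting for release.

    encode_frame is a hypothetical MELP encoder stand-in that returns bytes.
    """
    packet = []
    for frame in frame_source:
        packet.append(encode_frame(frame))
        if len(packet) == frames_per_packet:
            yield b"".join(packet)
            packet = []
    if packet:  # flush the partial packet when the push-to-talk button is released
        yield b"".join(packet)


def coded_bits(duration_s):
    """Bits needed to send duration_s seconds of speech at the MELP frame rate."""
    n_frames = math.ceil(duration_s * SAMPLE_RATE_HZ / FRAME_SAMPLES)
    return n_frames * BITS_PER_FRAME
```

A real 54-bit frame does not fill whole bytes, so an actual packetizer would bit-pack frames; four frames carry 4 × 54 = 216 bits, exactly 27 bytes, covering 90 ms of speech. At this rate a 2.25-second query is 100 frames, or 5,400 bits.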

Our use of the Defense Advanced Research Projects Agency (DARPA) Communicator software infrastructure greatly aided our ability to build this rapid prototype in less than six months: its plug-and-play approach let us combine COTS, open-source, and homegrown software with minimal integration difficulty. Communicator was originally developed at MIT, and MITRE has maintained and upgraded it. Communicator software and documentation are available at the SourceForge Web site. We are currently porting the client software to an iPAQ hand-held device to produce a fieldable system.
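Communicator's plug-and-play property comes from routing messages among independent servers through a central hub, so one component can be replaced without touching the others. The toy `Hub` below illustrates only that idea under stated assumptions; it is not the Communicator API, and the service names and key-value frame format are hypothetical.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]           # hypothetical key-value message format
Service = Callable[[Frame], Frame]  # a server: reads a frame, returns new keys


class Hub:
    """Toy hub that passes a frame through registered services in script order."""

    def __init__(self) -> None:
        self.services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Re-registering a name swaps the component; other services are untouched.
        self.services[name] = service

    def run(self, script: List[str], frame: Frame) -> Frame:
        for name in script:
            # Merge each service's output keys into the frame for the next step.
            frame = {**frame, **self.services[name](frame)}
        return frame


if __name__ == "__main__":
    hub = Hub()
    hub.register("recognizer", lambda f: {"text": "threatcon level"})
    hub.register("database", lambda f: {"result": "Charlie"})
    hub.register("generator", lambda f: {"reply": f"The Threatcon level is {f['result']}."})
    print(hub.run(["recognizer", "database", "generator"], {"audio": b""})["reply"])
```

Swapping a COTS recognizer for an open-source one, in this sketch, is a single `register` call with the same name.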

This project has demonstrated that secure voice-driven database access can be achieved even in noisy environments. The prototype worked well in the very noisy setting of MITRE’s 2001 Technology Symposium. For a low-complexity recognition task (10–15 queries), we achieved excellent results with 2.4 Kbps coded speech, and performance remained acceptable in HMMWV noise down to 6 dB SNR. Our listeners also judged the synthetic speech output acceptable.

The number and complexity of queries will grow as we gain experience with real-world use of the system. Because the client is thin, all of the added complexity will reside in the back-end server. As recognition performance and coder quality continue to improve, the system can expand accordingly.


For more information, please contact Fred Goodman using the employee directory.


Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
