Currently, lower-echelon military commanders cannot directly access
information contained in Battlefield Tactical Operations Center databases.
Having near-real-time access to this data would improve both their knowledge
of overall conditions and their responsiveness to changes in tactical
situations, but achieving this has proven to be difficult. Security
concerns, limited bandwidth, and the desire to keep hands and eyes free
have combined to make computer-to-computer links problematic. To solve
this problem, MITRE decided to use low-bit-rate coded voice to access
the databases. Using speech over secure narrowband wireless links has
several advantages. First, voice coders and encryption tools already
exist in military communications equipment. Second, adversaries would
gain little benefit if the device were stolen or captured unless they
knew passwords and other detailed system information. By contrast, losing
a computer to a foe would be more costly. Third, a thin client—a
computer with limited resources working through a server—requires
no software updates. When the database is updated, the user can simply
ask more questions.

How Does It Work?

In a peacekeeping operation, military personnel query a database about the current situation, as well as historical data. Representative queries include:

1) Current time?
2) Threatcon level?
3) How far am I from the last incident?
4) When did the last incident occur?

This capability could not have been achieved until recently because the key component technologies were insufficiently mature. Speech coding at very low bit rates was barely adequate for human-to-human communications in the 1980s (e.g., STU-III), but distortion is now low enough that the coder output can be passed to a speech recognizer with good results. Similarly, as recently as the late 1990s, speech recognition was limited to near-ideal noise and channel conditions and performed poorly on female voices. Recognizers are now used in a wide variety of commercial applications and give good results. Speech synthesis is the third vital technology for this application. While synthetic speech has been intelligible for many years, its quality has recently improved through more natural intonation contours and syllabic stresses.
We built a prototype system with the following data flow (see Figure 1). Microphone input is digitized while the user holds down a push-to-talk button. When the user releases the button, the sampled data is encoded. The resulting bit stream is sent over a wireless connection to a remote server, which decodes the data and passes it to a speech recognizer that converts the audio into text. This text forms the basis of the database query. Query results are converted to user-friendly strings, passed to a text-to-speech (TTS) synthesizer, and encoded for transmission. Finally, the encoded speech is sent back over the wireless network to the thin client, decoded, and played to the user.

Design Questions
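The coder comparison in this section is scored by word error percentage. As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the prototype's scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error percentage of `hypothesis` against `reference`.

    Hypothetical helper: counts the minimum number of word substitutions,
    insertions, and deletions needed to turn the reference into the
    hypothesis, then divides by the reference length.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of eight reference words.
    print(word_error_rate("how far am I from the last incident",
                          "how far am I from last incident"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100 percent; lower is always better.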
Figure 2 shows recognition performance in HMMWV noise for nine coders with bit rates ranging from 2.4 to 16.0 Kbps. The Y-axis shows word error percentage; the lower the score, the better the performance. The results show that the two oldest coders performed the worst, while the most recently developed coders (i.e., G.728 and G.729) performed the best. The 2.4 Kbps MELP (Mixed Excitation Linear Prediction) coder performed well down to ~6 dB SNR; as a result, we used MELP in our demonstration system. At that low rate, additional bits are available for error detection/correction, and the best encryption methods can be applied. We did not test for the effects of channel errors when error protection is inadequate (e.g., in a burst-noise environment).

Implementation

Our ability to create this rapid prototype in less than six months was greatly aided by our use of the Defense Advanced Research Projects Agency (DARPA) Communicator software infrastructure, whose plug-and-play approach enabled us to combine COTS, open source, and homegrown software with a minimum of integration difficulties. MITRE has maintained and upgraded Communicator, which was originally developed at MIT. Communicator software and documentation are available at the SourceForge Web site. We are currently porting the client software to an iPAQ hand-held device to produce a fieldable system.

This project has proven that secure voice-driven database access can be achieved even in noisy environments. The prototype system worked well at MITRE’s very noisy 2001 Technology Symposium. For a low-complexity speech recognition problem (10–15 queries), we achieved excellent results with 2.4 Kbps coded speech, and performance remained acceptable in HMMWV noise down to 6 dB SNR. Our listeners also deemed the synthetic speech output acceptable. The number and complexity of queries will grow as we gain more experience in the real-world use of the system. Of course, all of the added complexity will be in the back-end server.
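To make the thin-client split concrete, the prototype's data flow described earlier can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (`encode`, `decode`, `recognize`, `synthesize`, `Server`, `ThinClient`) is hypothetical, the "audio" is plain UTF-8 text standing in for speech samples, and the codec, recognizer, and synthesizer are trivial stand-ins rather than MELP, a real recognizer, or the Communicator infrastructure.

```python
from typing import Callable, Dict

FALLBACK = "Sorry, I did not understand the question."


def encode(audio: bytes) -> bytes:
    """Stand-in for a low-bit-rate voice coder; a real coder would compress."""
    return audio


def decode(bitstream: bytes) -> bytes:
    """Stand-in for the matching voice decoder."""
    return bitstream


def recognize(audio: bytes) -> str:
    """Stand-in recognizer: the 'audio' here is simply UTF-8 text."""
    return audio.decode("utf-8").strip().rstrip("?").strip().lower()


def synthesize(text: str) -> bytes:
    """Stand-in text-to-speech: returns the text as bytes instead of audio."""
    return text.encode("utf-8")


class Server:
    """Back end: decode, recognize, look up the answer, synthesize, encode."""

    def __init__(self, answers: Dict[str, str]) -> None:
        self.answers = answers  # hypothetical query-to-response table

    def handle(self, bitstream: bytes) -> bytes:
        query = recognize(decode(bitstream))
        reply = self.answers.get(query, FALLBACK)
        return encode(synthesize(reply))


class ThinClient:
    """Client: only captures, encodes, sends, decodes, and plays back."""

    def __init__(self, link: Callable[[bytes], bytes]) -> None:
        self.link = link  # stands in for the wireless round trip

    def ask(self, spoken: str) -> str:
        captured = spoken.encode("utf-8")          # simulated push-to-talk capture
        reply_bits = self.link(encode(captured))   # send query, receive reply
        return decode(reply_bits).decode("utf-8")  # simulated playback


if __name__ == "__main__":
    # The response string is a made-up placeholder, not real data.
    server = Server({"threatcon level": "Threatcon level is PLACEHOLDER."})
    client = ThinClient(server.handle)
    print(client.ask("Threatcon level?"))
```

In this sketch, supporting a new query means adding an entry to the server's table; the client code does not change, which mirrors the point that added complexity stays in the back end.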
As recognition performance and coder quality continue to improve, the system can expand accordingly. For more information, please contact Fred Goodman using the employee directory.