Audio Hot Spotting Retrieves Information from Multimedia
February 2010
Topics: Multimedia Information Systems, Information Storage and Retrieval
As a research company, MITRE often works at the intersection of technology and human interaction. An example is the audio hot spotting process, a tool for analyzing multimedia content that was recently granted a patent.
Audio hot spotting, or AHS, combines MITRE-developed algorithms with technologies from other government research laboratories and commercial companies. It uses speech-to-text software to transcribe audio files into text, which can then be searched quickly for any word or phrase. As the research has continued, advances in the technology have allowed users to query and retrieve data from multimedia sources using multiple features and speech recognition engines.
How It Works: Phoneme and Word-based Retrieval
Analysts and researchers use automated search and query capabilities (built around criteria such as keywords, phrases, speaker identity, speech rate and emphasis, laughter, or applause) to draw out desired information in real time from multiple forms of media. For example, they can study financial broadcasts to get a reading on stock movements or monitor political speeches to gauge audience reaction quickly. Such information can be helpful to government agencies responding to developing situations.
Audio hot spotting combines word-based speech recognition with phoneme-based audio retrieval. (An example of a phoneme is the "t" sound in the words tip, stand, water, and cat.) Each system has its advantages and disadvantages, but combining the two produces better precision and recall for audio retrieval than using a single speech processing engine.
Phoneme-based audio retrieval is fast and can handle spelling variations, out-of-vocabulary words, and varying audio quality, but it may produce more false positives for short-word queries. For example, the sounds for the set of letters [w + u + n] occur in one, won, and wonderful. The phoneme-based engine can also retrieve proper names or words, such as "Negandra" or "Shenghzen," that aren't found in the dictionary of word-based speech recognition systems. However, the phoneme-based system alone does not yield a speech transcript, although it can retrieve the audio segments containing the user's query words or phrases.
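The short-query false-positive problem described above can be illustrated with a minimal sketch. The phoneme transcriptions and the matching routine here are illustrative assumptions (simplified ARPAbet-style symbols), not the actual AHS engine:

```python
# Illustrative sketch: why a short phoneme query produces false positives.
# Phoneme transcriptions are simplified ARPAbet-style strings, chosen for
# illustration only.

def phoneme_hits(query, utterances):
    """Return words whose phoneme sequence contains the query sequence."""
    hits = []
    for word, phones in utterances:
        # Slide the query window across the utterance's phonemes.
        for i in range(len(phones) - len(query) + 1):
            if phones[i:i + len(query)] == query:
                hits.append(word)
                break
    return hits

utterances = [
    ("one",       ["w", "ah", "n"]),
    ("won",       ["w", "ah", "n"]),
    ("wonderful", ["w", "ah", "n", "d", "er", "f", "ah", "l"]),
    ("window",    ["w", "ih", "n", "d", "ow"]),
]

# The short query [w + ah + n] matches three different words; only
# context can tell which hit the user actually wanted.
print(phoneme_hits(["w", "ah", "n"], utterances))
```

Running this returns hits for "one," "won," and "wonderful," even though a user searching for "one" wanted only the first.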
Word-based retrieval, on the other hand, is more precise for single-word queries using good quality audio and for providing transcripts for automatic analysis. But word-based retrieval may miss hits for phrase-based queries or out-of-vocabulary words and is slower in pre-processing and more sensitive to noisy audio.
By fusing the two types of commercial off-the-shelf speech processing systems, MITRE's audio hot spotting algorithm takes advantage of each system's strengths while simultaneously filling gaps in their capabilities. MITRE's algorithms order the results based on a computation of the two systems' weighted confidence scores. The AHS system also combines the query results from both speech processing engines and ranks them as a single result set.
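The fusion step described above can be sketched as follows. The weights, the segment-overlap rule, and the hit format are illustrative assumptions for this example, not MITRE's patented algorithm:

```python
# Hedged sketch of fusing hits from two engines into one ranked result set.
# Each hit is (start_seconds, confidence); weights are assumed values.

WORD_WEIGHT = 0.6      # word-based engine: more precise on clean audio
PHONEME_WEIGHT = 0.4   # phoneme engine: better recall on OOV words

def fuse(word_hits, phoneme_hits, window=1.0):
    """Merge two hit lists, ranked by weighted combined confidence."""
    fused = [(t, c * WORD_WEIGHT) for t, c in word_hits]
    for t, c in phoneme_hits:
        score = c * PHONEME_WEIGHT
        # If both engines flag (roughly) the same segment, combine scores.
        for i, (ft, fc) in enumerate(fused):
            if abs(ft - t) <= window:
                fused[i] = (ft, fc + score)
                break
        else:
            fused.append((t, score))
    # One ranked result set, highest combined confidence first.
    return sorted(fused, key=lambda hit: hit[1], reverse=True)

word_hits = [(12.0, 0.9), (47.5, 0.4)]     # seconds, confidence
phoneme_hits = [(12.3, 0.8), (95.0, 0.7)]
print(fuse(word_hits, phoneme_hits))
```

In this run, the segment near 12 seconds is flagged by both engines, so its combined score places it first; hits found by only one engine still appear, but lower in the ranking.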
AHS was created nine years ago by MITRE principal scientist Qian Hu, a researcher in speech and natural language processing. Drawing on MITRE's strength in human language technologies, Hu brought together a team of five scientists, including herself, who jointly hold the patent for "System and method for audio hot spotting," Number 7,617,188.
In addition to Hu, the other patent holders include Fred Goodman, a lead signal processing engineer and a specialist in speech processing algorithm development; Stanley Boykin, a lead database software engineer and the lead architect for the AHS database; Randall Fish, a principal artificial intelligence engineer and specialist in speech processing algorithms; and Warren Greiff, a principal artificial intelligence engineer who worked on information retrieval.
"Since the patent was issued, the team has expanded beyond its original five members," says Hu. "We are continuing our research and applying the technology to user needs in multimedia information retrieval and analysis."
by David A. Van Cleave