Audio Hot Spotting Retrieves Information from Multimedia

February 2010
Topics: Multimedia Information Systems, Information Storage and Retrieval
MITRE's patented audio hot spotting technology combines word-based speech recognition with phoneme-based audio retrieval to quickly search vast numbers of multimedia files for keywords, phrases, speech rate, laughter, and applause.

As a research company, MITRE often works at the intersection of technology and human interaction. An example is audio hot spotting, a process recently granted a patent as a tool for analyzing multimedia content.

Audio hot spotting, or AHS, combines MITRE-developed algorithms with technologies from other government research laboratories and commercial companies. It uses speech-to-text software to convert audio files into text, which can then be searched quickly by computer for any word or phrase. As the research has continued, advances in the technology have allowed users to query and retrieve data from multimedia sources using multiple features and speech recognition engines.

How It Works: Phoneme and Word-based Retrieval

Analysts and researchers use automated search and query capabilities—built around criteria such as keywords, phrases, speaker identity, speech rate and emphasis, laughter, or applause—to draw out desired information in real time from multiple forms of media. For example, they can study financial broadcasts to get a reading on stock movements or monitor political speeches to gauge audience reaction quickly. Such information can be helpful to government agencies responding to developing situations.
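
The Python sketch below is one way such a multi-criteria query might be represented. The class and field names are illustrative assumptions for this article, not MITRE's actual interface.

```python
# A minimal sketch of how a multi-criteria AHS-style query might be expressed.
# The field names below are illustrative assumptions, not MITRE's API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HotSpotQuery:
    keywords: List[str] = field(default_factory=list)       # words or phrases to find
    speaker: Optional[str] = None                            # restrict to one speaker identity
    min_speech_rate: Optional[float] = None                  # e.g., words per second threshold
    detect_events: List[str] = field(default_factory=list)   # e.g., "laughter", "applause"

# Example: find mentions of interest rates followed by applause in a broadcast.
query = HotSpotQuery(keywords=["interest rates"], detect_events=["applause"])
```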

Audio hot spotting combines word-based speech recognition with phoneme-based audio retrieval. (An example of a phoneme is the "t" sound in the words tip, stand, water, and cat.) Each system has its advantages and disadvantages, but combining the two produces better precision and recall for audio retrieval than using a single speech processing engine.

Phoneme-based audio retrieval is fast and can handle variations in spelling, out-of-vocabulary words, and varying audio quality, but it may produce more false positives for short-word queries. For example, the sounds for the set of letters [w + u + n] occur in one, won, and wonderful. The phoneme-based engine can also retrieve proper names or words, such as "Negandra" or "Shenghzen," that aren't found in the dictionary of word-based speech recognition systems. However, the phoneme-based system alone does not yield a speech transcript, although it can retrieve audio segments containing the user's query words or phrases.
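
As a rough illustration of the idea, the sketch below matches a query's phoneme sequence against a time-stamped phoneme stream. The data format and function are assumptions made for illustration, not MITRE's engine.

```python
# Illustrative sketch of phoneme-sequence matching (not MITRE's engine).
# The recognizer output is assumed to be a time-stamped phoneme list; because
# matching works on sounds, out-of-vocabulary names can still be located.

def phoneme_hits(decoded, query_phones):
    """Return start times where the query phoneme sequence appears.

    decoded: list of (phoneme, start_time) pairs from a phoneme recognizer.
    query_phones: phoneme sequence for the query word, e.g. ["W", "AH", "N"].
    """
    phones = [p for p, _ in decoded]
    hits = []
    for i in range(len(phones) - len(query_phones) + 1):
        if phones[i:i + len(query_phones)] == query_phones:
            hits.append(decoded[i][1])
    return hits

# "one", "won", and the start of "wonderful" all match [W, AH, N],
# which is why short phoneme queries can produce false positives.
decoded = [("W", 0.0), ("AH", 0.1), ("N", 0.2), ("D", 0.3), ("ER", 0.4)]
print(phoneme_hits(decoded, ["W", "AH", "N"]))  # [0.0]
```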

Word-based retrieval, on the other hand, is more precise for single-word queries using good quality audio and for providing transcripts for automatic analysis. But word-based retrieval may miss hits for phrase-based queries or out-of-vocabulary words and is slower in pre-processing and more sensitive to noisy audio.
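
A comparable sketch for word-based retrieval, again using an assumed transcript format rather than any particular engine's output, shows why an out-of-vocabulary name simply never turns up: it is absent from the transcript itself.

```python
# Sketch of word-based retrieval over a time-aligned transcript (assumed format).
# Precise for in-vocabulary single words, but an out-of-vocabulary name never
# appears in the transcript at all, so the query misses it entirely.

def transcript_hits(transcript, query_word):
    """transcript: list of (word, start_time, confidence) from a speech-to-text engine."""
    return [(start, conf) for word, start, conf in transcript
            if word.lower() == query_word.lower()]

transcript = [("the", 0.0, 0.97), ("market", 0.2, 0.95), ("rallied", 0.5, 0.88)]
print(transcript_hits(transcript, "market"))  # [(0.2, 0.95)]
```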

By fusing results from the two types of commercial off-the-shelf speech processing systems, MITRE's audio hot spotting algorithm takes advantage of each system's strengths while filling gaps in their capabilities. MITRE's algorithms order the results based on the two systems' weighted confidence scores. The AHS system also combines the query results from both speech processing engines and ranks them as one set.
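
The sketch below illustrates the general fusion idea under stated assumptions: the weights, the de-duplication window, and the data layout are placeholders, not the values used in the patented algorithm.

```python
# A minimal sketch of the fusion idea: merge hits from both engines and rank
# them as one list using weighted confidence scores. The weights and the
# de-duplication window are illustrative assumptions.

def fuse_hits(word_hits, phoneme_hits, w_word=0.6, w_phoneme=0.4, window=0.5):
    """Each hit is (start_time, confidence). Returns one ranked result list."""
    combined = [(t, w_word * c, "word") for t, c in word_hits] + \
               [(t, w_phoneme * c, "phoneme") for t, c in phoneme_hits]
    combined.sort(key=lambda h: h[0])

    # Collapse hits from the two engines that point at nearly the same moment,
    # keeping the one with the higher weighted score.
    fused = []
    for t, score, source in combined:
        if fused and abs(t - fused[-1][0]) <= window:
            if score > fused[-1][1]:
                fused[-1] = (t, score, source)
        else:
            fused.append((t, score, source))

    # Rank the merged hits as a single result set.
    return sorted(fused, key=lambda h: h[1], reverse=True)

print(fuse_hits([(12.3, 0.9)], [(12.4, 0.7), (55.0, 0.6)]))
```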

Collaborative Effort

AHS was created nine years ago by MITRE principal scientist Qian Hu, a researcher in speech and natural language processing. Drawing on MITRE's strength in human language technologies, Hu brought together a team of five scientists, including herself, who jointly hold the patent for "System and method for audio hot spotting," Number 7,617,188.

In addition to Hu, the other patent holders include Fred Goodman, a lead signal processing engineer and a specialist in speech processing algorithm development; Stanley Boykin, a lead database software engineer and the lead architect for the AHS database; Randall Fish, a principal artificial intelligence engineer and specialist in speech processing algorithms; and Warren Greiff, a principal artificial intelligence engineer who worked on information retrieval.

"Since the patent was issued, the team has expanded beyond its original five members," says Hu. "We are continuing our research and applying the technology to user needs in multimedia information retrieval and analysis."

Audio hot spotting can retrieve various kinds of information from multimedia. Areas of interest can be specified using keywords, phrases, language, and speaker identification. Background sounds, such as applause and laughter, can also tell how a speaker is received by an audience. When matches are found, the system displays the recognized text and allows the user to play the audio or video in the vicinity of the identified "hot spot."
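
As a simple illustration of that last step, the sketch below turns a hot-spot timestamp into a playback window around the match; the three-second padding is an arbitrary assumption.

```python
# Sketch of turning a hot-spot timestamp into a playback window so the user
# can hear the match in context. The padding value is an assumption.

def playback_window(hit_time, duration, pad=3.0):
    """Return (start, end) seconds clipped to the length of the recording."""
    return max(0.0, hit_time - pad), min(duration, hit_time + pad)

print(playback_window(12.0, duration=600.0))  # (9.0, 15.0)
```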

—by David A. Van Cleave
