Listening to the Future
Fred Goodman
March 2003
The ability to instantly retrieve an exact audio file adds enormous
benefit to situational analysis.
Like raking pearls from the mire, locating and extracting critical
bits of sound from an impenetrable sea of audio files demands broad
expertise, patience, and a really good ear. Fred Goodman has all
three. With several decades of signal processing experience, the
past two at MITRE as an engineer in the Signal Processing Center,
Goodman is totally immersed in his newest challenge: audio hot spotting.
Hot spotting is the ability to instantly search prerecorded video
and audio files and footage, or live events (e.g., court testimony,
speeches, or telephone conversations), for specific spoken words,
sound effects, or noises. And it is done with the ease and speed
of an Internet-type browser.
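To make the idea concrete, here is a minimal Python sketch of keyword hot spotting. It is not the MITRE prototype's method. It assumes a speech recognizer has already turned the audio into a list of time-stamped words, and the names `Word`, `hot_spot`, and `context` are hypothetical, chosen only for illustration. Given a query phrase, it returns the playback windows in which that phrase is spoken.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its start and end time in seconds (assumed input format)."""
    text: str
    start: float
    end: float


def hot_spot(transcript, query, context=2.0):
    """Return (start, end) playback windows around each occurrence of the query phrase.

    `context` is a hypothetical padding, in seconds, added on both sides of a hit
    so the listener hears a little of what surrounds it.
    """
    terms = query.lower().split()
    if not terms:
        return []
    n = len(terms)
    hits = []
    # Slide a window of n words across the transcript and compare it to the query.
    for i in range(len(transcript) - n + 1):
        window = transcript[i:i + n]
        if [w.text.lower() for w in window] == terms:
            start = max(0.0, window[0].start - context)
            end = window[-1].end + context
            hits.append((start, end))
    return hits


# Illustrative, made-up transcript; the timings are not from any real recording.
transcript = [
    Word("we", 0.0, 0.25),
    Word("discussed", 0.25, 0.75),
    Word("the", 0.75, 1.0),
    Word("money", 1.0, 1.5),
    Word("yesterday", 1.5, 2.0),
]

print(hot_spot(transcript, "the money", context=0.5))
```

A real system would also have to cope with recognition errors, speaker identity, and non-speech sounds, none of which this exact-match sketch attempts.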
"The hot-spotter will never replace the human listener," says Goodman,
"but it can save hours of labor, listening to audio, waiting for
an interesting bit. The ability to query and instantly retrieve
the exact audio file you are looking for—a few sentences of
a speech, for example—adds enormous benefit to situational
analysis." He mentions the now infamous Watergate tapes that led
to President Nixon's resignation. "Imagine if those investigators
could hot-spot all instances of, say, Haldeman's conversations
about money. It could be done in a couple of hours. Back then, however,
everything had to be manually transcribed and then played back on
audiotape. It took weeks to find out where the critical stuff was."
Five years from today, the Gartner Report predicts, moving images
(essentially video and multimedia), together with accompanying audio,
will dominate the Web landscape. If so, people will require future
search engines that can search video and audio files as deftly as
today's handle text. That's not possible yet. Recently, however,
Goodman and a team of researchers in MITRE's Bedford and Washington
locations have engineered a breakthrough solution.
Goodman brings years of speech technology experience to the project,
especially in the area of automatic speaker recognition. During
the team's first month of work, he helped define the audio hot spotting
prototype architecture and began investigating the speaker recognition
subsystem. "Members of the team have very different and complementary
skills, which made the system come together very quickly," says
Goodman. "For example, Project Leader Qian Hu has great knowledge
of current speech recognition research and what's happening in the
commercial sector. I know more about government research and development."
In the first seven months of a three-year project, the team produced
a working audio hot spotting prototype and demonstrated its capabilities
at MITRE's annual Technology Symposium. "That was a great moment
for the team. We'd hit it, and we knew it," says Goodman.
The hot spotting prototype also retrieves video, which is an all-important
capability for today's video-driven communications. "We hear so
much better when our eyes work together with our ears," explains
Goodman. He considers the observable nuances in lip or eye movement
extremely important to any audio analysis.
Goodman is enthusiastic about taking the prototype to the next
level. "The Internet will be the ultimate beneficiary down the road,"
says Goodman, "but right now we are focused on our sponsors' needs.
The government has millions of hours of audio, terabytes upon
terabytes, in its files. These include analog and digital
recordings from television and radio, telephone conversations, and
surveillance tapes, not to mention military communications and
material in the Library of Congress."
With one ear cocked to the future of audio recognition, Goodman
and his teammates are pioneering the new frontier of sound as they
continue to improve on their discovery.