Chasing HAL: In Pursuit of a Human-Computer Interface
The MITRE Digest
November 2001
No machine of any kind in history comes even close to performing such a feat. With computers rapidly becoming bigger, faster, and better, and the daily news brimming with exploits of nearly unimaginable computer prowess, the natural tendency is to think that talking language machines must be just around the corner. Not so fast! Nature's great prize is not so easily won.

As science continues to wrestle with the question of exactly how the brain understands language, Human Language Technology tries to coax conversation between machines and humans into being. According to Speech Technology Magazine, after a 15-year adolescence, "understanding" between machines and humans is now a definite "when," and not an "if." But the "when" seems grudgingly slow in coming, and the advent of sentient computers is still in the distant future.

For eight years now, Lynette Hirschman, chief scientist of MITRE's Human Language Technology research group, together with her colleagues, has sought to marry human language capabilities to the ever-growing power and sophistication of computers. Erasing a small corner of a large, crammed whiteboard in her office, she drew a matrix illustrating the many subsets of this sprawling field of research.

"Human language technology makes it possible for computers to recognize human speech and to understand what it means; it makes it possible for machines to use speech to communicate with humans," she explained, "that is, to have a successful conversation or interaction via language."

Making Smart Machines

A watershed moment in Human Language Technology would be achieving a true conversational interface: a go-between device enabling a human and a computer to speak, to be understood, and to reply in conversation as comfortable and natural sounding as the human's native tongue. To date, algorithms have been created for parsing sentences, and large databases of vocabularies with pronunciations have been developed to match words.
Yet for Hirschman and her colleagues, a machine able to emulate true language interaction is still far off. Far off, of course, can be a relative term, especially to researchers like Victor Zue, head of MIT's Spoken Language Systems Group, who recently delivered a presentation at MITRE on next-generation speech-based interfaces. Zue is upbeat about the near-term potential of what he calls true "intelligent agents," also known as smart interfaces.

For example, MIT's Oxygen project, scheduled for rollout in stages over the next 10 years, is, according to Zue, a true multi-domain, perceptual interface that combines both speech and vision: a computer with ears, eyes, and vocal cords serving up a vast, digital storehouse of varied and expert knowledge and capable of conversing with the smoothness of human or near-human speech. To wit, a distant forebear of HAL, the omnipotent computer in the film 2001: A Space Odyssey.

During an onstage demo by telephone hookup to MIT's airline booking interface Pegasus (a much less advanced machine than the proposed Oxygen), Zue successfully booked a roundtrip flight from Boston to San Francisco. However, when he asked the computer for the cheapest fare, he got a reply of $3,300.

Explained associate professor and speech interface researcher Roni Rosenfeld of Carnegie Mellon University, "No suitable universal interaction paradigm has yet been proposed for humans to effectively, efficiently, and effortlessly communicate by voice with machines. Natural language systems require a lengthy development phase which is data and labor intensive, with heavy involvement by experts who meticulously craft the vocabulary, grammar, and semantics for a specific field of knowledge."

MITRE's DARPA Communicator

MITRE's DARPA Communicator program, funded by the Defense Advanced Research Projects Agency, has worked to provide human-to-machine interaction via speech and represents forward progress in the evolution of interface technology. This summer, Communicator will roll out of the laboratory for field tests. Communicator will integrate databases from multiple fields of knowledge, such as weather information, airline travel, car and hotel rentals, and calendar access, as well as e-mail and voice mail access. The user and Communicator will interact conversationally to access and exchange this information, including the human-like abilities to signal non-understanding or to interrupt to clarify information. The success of DARPA Communicator could well hasten fuller development of true, "natural" dialog interaction with computers.

A conversational interface such as DARPA Communicator, once operational, brings up another question: could the computer be taught to learn on its own? And, if so, can it be tested to see what it has learned?

"First," answered Hirschman, "we must teach it how to read. After it has read something, say, a chemistry book, we then need to be able to question it on what it knows about chemistry." Even more brain-like and more economical for the system, she continued, the entire chemistry book need not be learned to pull out some facts about specific areas. All the machine would really need to learn are the highlights, the essentials, just as humans learn. The division's Reading Comprehension program is actively researching the process, beginning slowly with elementary school third- and fourth-grade reading material and tests.

When it comes to an interface, one size does not fit all.
Although some interfaces are commercially available and some are, according to Hirschman and Zue, quite good, they operate in narrow, specialized fields of knowledge, such as weather reporting or airline bookings. Machine voice response, or dialogue modeling, is still somewhat stilted and not as smooth or natural sounding as a true conversational interface demands. But not everything needs to be big to succeed. Hirschman and her group also custom engineer small-scale, highly specialized interfaces for computer-mediated communication.

Specialty Interfaces

A computer that understands rules of grammar and syntax would be useful not only for understanding speech, but also for reading large amounts of text and extracting information. Another MITRE project, DARPA TIDES, scans running text from newswire feeds and newsletters for references to outbreaks of infectious diseases. DARPA TIDES uses Alembic, a trainable language processing system and rule sequence processor for high-speed, intelligent text extraction. Conceptual Browsing automatically organizes the information extracted from large collections of text.

Hirschman's lab is also currently researching a speech-to-speech translation interface for what are called "low density languages," that is, rare languages, many without easily accessible grammars or dictionaries, such as Tetun in East Timor, on the eastern half of the island of Timor in Southeast Asia. Military or humanitarian intervention in East Timor, such as the violence there in 1999 made necessary, would make knowledge of Tetun a very important asset.

Hirschman speculated on still another important specialty interface regarding the recently completed Human Genome Project. "Now the real work begins," she said, referring to the gargantuan database of genome information needing to be interpreted and parsed into genes and their associated proteins.
Inderjeet Mani, the principal investigator for MITRE's Conceptual Browsing program, estimates that humanity's yearly output of information is more than an exabyte, or 1,000,000,000,000,000,000 bytes (1 quintillion bytes). The astounding complexity of speech, together with this immensity of information, presents both a daunting challenge and an enormous opportunity for the field of Human Language Technology.

A Single Hunt with Many Eyes

Hirschman believes that breakthroughs in Human Language Technology will not come in one fell swoop. Rather, progress will be realized through a network of research communities over time. Certainly, as the Internet had Bell Labs and Xerox PARC as high-focus research facilities, places like MITRE, MIT, and Carnegie Mellon provide the same for interface research. But, she emphasized, an interconnected worldwide community is absolutely necessary to finally unravel the secrets of speech.

By Tom Green

Page last updated: August 23, 2001