Human Language Technology
March 2001
Chasing HAL - The Pursuit of a Human-Computer Interface

"Man became man not by the tool but by the Word. It is not walking upright and using a stick to dig for food or strike a blow that makes a human being, it is speech." —Nadine Gordimer, The Essential Gesture

Consider for a moment two people in casual conversation. Each brings to it a vast, enormously versatile and powerful ability: speech. Steven Pinker, director of MIT's Center for Cognitive Neuroscience, estimates that each speaker possesses the ability to produce a hundred million trillion sentences. If that isn't astounding enough, each speaker is easily able to understand all of the other's conversational offerings. No machine of any kind in history comes even close to performing such a feat.

With computers rapidly becoming bigger, faster, and better, and the daily newspapers brimming with exploits of nearly unimaginable computer prowess, the natural tendency is to think that talking language machines must be just around the corner. Not so fast! Nature's great prize is not so easily won. As science continues to wrestle with the question of exactly how the brain understands language, Human Language Technology tries to coax conversation between machines and humans into being. According to Speech Technology Magazine, after a 15-year adolescence, "understanding" between machines and humans is now a definite "when" and not an "if." But the "when" seems grudgingly slow in coming, and the advent of sentient computers remains far off.

For eight years now, Lynette Hirschman, chief scientist of MITRE's Human Language Technology research group, together with her colleagues, has sought to marry human language capabilities to the ever-growing power and sophistication of computers. Erasing a small corner of a large, crammed whiteboard in her office, she drew a matrix illustrating the many subsets of this sprawling field of research.
"Human language technologies are those innovations that make it possible to have computers recognize human speech, to understand the meaning of human speech, and to use speech to communicate with humans," she explained, "that is, to have a successful conversation or interaction via language."

Dave: Open the pod bay doors, HAL. —2001: A Space Odyssey

A watershed moment in Human Language Technology would be achieving a true conversational interface: a go-between device enabling a human and a computer to speak, to be understood, and to reply in conversation as comfortable and natural sounding as the human's native tongue. To date, algorithms have been created for parsing sentences, and large databases of vocabularies have been developed to match words. Yet for Hirschman and her colleagues, a machine able to emulate true language interaction is still far off.
Far off, of course, can be a relative term, especially to researchers like Victor Zue, head of MIT's Spoken Language Systems Group, who recently delivered a presentation at MITRE on next-generation speech-based interfaces. Zue is upbeat about the near-term potential of what he calls true "intelligent agents," also known as smart interfaces. MIT's Oxygen project, for example, scheduled for rollout in stages over the next 10 years, is, according to Zue, a true multi-domain, perceptual interface that combines both speech and vision: a computer with ears, eyes, and vocal cords serving up a vast digital storehouse of varied and expert knowledge, and capable of conversing with the smoothness of human or near-human speech. In effect, a distant forebear of HAL, the omnipotent computer in the film 2001: A Space Odyssey.

Explained associate professor and speech interface researcher Roni Rosenfeld of Carnegie Mellon University: "No suitable universal interaction paradigm has yet been proposed for humans to effectively, efficiently and effortlessly communicate by voice with machines. Natural language systems require a lengthy development phase which is data and labor intensive, with heavy involvement by experts who meticulously craft the vocabulary, grammar, and semantics for a specific field of knowledge."

HAL, Gort, Data, and Robby the Robot aside, infusing computers with an awareness of human language and the ability to communicate freely with humans is one of the supreme challenges, and one of the most arduous tasks, in all of modern computing. Gordon Bell, of computer architecture fame, once confided that he looked at the problems of human language technology, thought them inordinately difficult, and moved on to work in other areas. So what scared Bell off? And why is it that, although computers have become so much faster, cheaper, and more powerful, they have not become any better at understanding what we want them to do?
Terrence Deacon, author of The Symbolic Species: The Co-evolution of Language and the Brain, is blunt about the huge gaps in our understanding of language. "We know how to use a word to mean something," he writes, "and to refer to something. We know how to coin new words and to assign meanings to them. We know how to create codes and artificial languages. Yet we do not know how we know how to do this, nor what we are doing when we do. Or rather, we know on the surface, but we do not know what mental processes underlie these activities, much less what neural processes are involved." Enabling a computer to duplicate even some of these capabilities would be a great leap forward for interface technology.

MITRE's Communicator program, funded by the Defense Advanced Research Projects Agency (DARPA), has worked to provide human-to-machine interaction via speech and represents a solid step forward in the evolution of interface technology. This summer it will roll out of the laboratory for field tests. Communicator will integrate databases from multiple fields of knowledge, such as weather information, airline travel, car and hotel rentals, and calendar, e-mail, and voice mail access. The user and Communicator will interact conversationally to access and exchange this information, including the human-like abilities to signal non-understanding or to interrupt the other to clarify information.

When it comes to an interface, one size does not fit all. Although some interfaces are commercially available and some are, according to Hirschman and Zue, quite good, they operate in narrow, specialized fields of knowledge, say, weather reporting or airline bookings. Machine voice response, or dialogue modeling, is still somewhat stilted and not as smooth or natural sounding as the goal for a true conversational interface. But not everything needs to be big to succeed.
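To make the Communicator-style exchange above concrete, the toy sketch below fills travel "slots" from user utterances and signals non-understanding when it cannot. This is a hypothetical illustration of slot-filling dialogue management in general, not Communicator's actual architecture; the city list, slot names, and crude keyword "parser" are all invented for the example.

```python
# Toy slot-filling dialogue manager, in the spirit of (but not based on)
# the DARPA Communicator travel domain. Everything here is illustrative.
SLOTS = ("origin", "destination", "date")
CITIES = {"boston", "denver", "seattle"}

def parse(utterance):
    """Very crude 'understanding': pick out known cities and a weekday."""
    words = utterance.lower().replace(",", " ").split()
    found = {}
    cities = [w for w in words if w in CITIES]
    if "from" in words and cities:
        found["origin"] = cities[0]
    if "to" in words and len(cities) > (1 if "from" in words else 0):
        found["destination"] = cities[-1]
    for day in ("monday", "tuesday", "friday"):
        if day in words:
            found["date"] = day
    return found

def respond(state, utterance):
    """One dialogue turn: update slots, then ask for whatever is missing."""
    update = parse(utterance)
    if not update:
        # Signal non-understanding rather than guessing.
        return "Sorry, I didn't understand. Could you rephrase?"
    state.update(update)
    missing = [s for s in SLOTS if s not in state]
    if missing:
        return f"What is your {missing[0]}?"
    return f"Booking {state['origin']} to {state['destination']} on {state['date']}."

state = {}
print(respond(state, "I want to fly from Boston to Denver"))
print(respond(state, "on Friday"))
```

Because the dialogue state persists across turns, the system can take a partial answer ("on Friday") and combine it with what it already knows, which is the essence of the mixed-initiative interaction the article describes.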
Hirschman and her group also custom-engineer small-scale, highly specialized interfaces for computer-mediated communication. A computer that understands rules of grammar and syntax would be useful not only for understanding speech, but also for reading large amounts of text and extracting information. Another MITRE project, DARPA TIDES, scans running text from newswire feeds and newsletters for references to outbreaks of infectious diseases using Alembic, a trainable language processing system and rule sequence processor for high-speed, intelligent text extraction. Conceptual Browsing automatically organizes the information extracted from large collections of text.

Hirschman's lab is also currently researching a speech-to-speech translation interface for what are called "low-density languages," that is, rare languages, many without easily accessible grammars or dictionaries, such as Tetun, spoken in East Timor, the eastern half of the island of Timor in Southeast Asia. Military or humanitarian intervention in East Timor, as the violence there in 1999 made necessary, would make knowledge of Tetun a very important asset.

Hirschman speculated on still another important specialty interface, this one related to the recently completed Human Genome Project. "Now the real work begins," she said, referring to the gargantuan database of genome information needing to be interpreted and parsed into genes and their associated proteins. Inderjeet Mani, the principal investigator for MITRE's Conceptual Browsing program, estimates that humanity's yearly output of information exceeds an exabyte (1,000,000,000,000,000,000, or one quintillion, bytes). The astounding complexity of speech, together with this immensity of information, presents both a daunting challenge and an enormous opportunity for the field of Human Language Technology. An amazing aspect of the pursuit is the diversity of disciplines necessary to keep up the chase.
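The kind of rule-driven scanning TIDES performs over newswire text can be hinted at with a toy sketch. This is not Alembic's actual interface, and real rule-sequence processors are trainable and far more sophisticated; the disease names, trigger words, and regular-expression "rule" below are purely illustrative.

```python
import re

# Illustrative word lists; a real system would use trainable rule sequences.
DISEASES = r"(cholera|ebola|influenza|malaria)"
TRIGGERS = r"(outbreak|epidemic|cases)"

# One toy "rule": a disease name within a few words of an outbreak trigger,
# in either order.
PATTERN = re.compile(
    rf"{DISEASES}\W+(?:\w+\W+){{0,5}}?{TRIGGERS}"
    rf"|{TRIGGERS}\W+(?:\w+\W+){{0,5}}?{DISEASES}",
    re.IGNORECASE,
)

def extract_outbreaks(text):
    """Return the matched (word, word) pairs found in running text."""
    hits = []
    for m in PATTERN.finditer(text):
        groups = [g.lower() for g in m.groups() if g]
        hits.append(tuple(groups))
    return hits

wire = "Officials reported a cholera outbreak in the region last week."
print(extract_outbreaks(wire))
```

Even this trivial rule shows why the approach is labor-intensive: every vocabulary item and proximity constraint must be crafted by hand for the domain, which is exactly the bottleneck Rosenfeld describes above.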
It takes linguists, psycholinguists, psychologists, mathematicians, electrical engineers, acoustical engineers, physicists, computer scientists, information technologists, cognitive scientists, and neurobiologists, working separately and together, just to keep pace. As Hirschman, with a smile of understatement, succinctly puts it: "These are exciting times." Indeed.
Page last updated: May 21, 2002