DEEPLANG Teaches Computers to Measure Their Words
October 2015
Topics: Artificial Intelligence, Modeling and Simulation
Coming across these two posts on your Twitter feed, you would know (or soon puzzle out after some grumbling about the generation gap) that they convey the same message. But if you fed them into a database along with 18,000 other tweets, would a computer program be able to match them together as having identical meanings?
Last fall a MITRE team entered an international competition to design a natural language processing program that could determine whether two sentences expressed the same or similar meaning. MITRE's entry in the 2015 SemEval Semantic Similarity task performed the best of the entries submitted by 18 teams. In fact, it performed only a little worse than a human annotator did.
"The result was really exciting for us," says MITRE's Guido Zarrella. "First, because we hadn't expected to do so well against so many quality teams. But also because we can now apply what we've learned to designing better and cheaper natural language processing tools for our sponsors." Zarrella is the principal investigator for the DEEPLANG project, whose findings the MITRE team mined in designing their entry.
Deep into the Heart of Language
For more than 20 years, MITRE researchers have made key contributions to the field of natural language processing, which seeks to equip computers with the ability to understand the meaning of human language. Using this technology, MITRE's sponsors can feed their computer systems data in the form of plain written language, or have their system search through books, user surveys, or Internet forums to gather the data itself. Another project helps your phone tell the difference between a real voice and a sound-alike recording for secure passwords.
DEEPLANG is one of several natural language processing projects MITRE is pursuing with the ultimate goal of making artificially intelligent systems completely fluent in "humanese." DEEPLANG explores the use of "deep learning" to expand the capability of artificial intelligence programs to learn the structures of human language. With deep learning, an AI builds itself a sophisticated model of how language works by piecing together knowledge gained from analyzing words and sentences.
Model Measurements Require Millions of Language Examples
"The idea is that first you feed the program a sentence," Zarrella explains. "The program takes the first word of the sentence and measures its use and meaning. Then the program takes its analysis of the first word and combines it with its analysis of the second word. Then it takes the analysis from both of those words and combines it with a third word and so forth.
"By the time the program has gone through the entire sentence, it's considered how each individual word relates to all the other words."
By analyzing millions and millions of such sentences, the program can build a model of how language works. Using that model, it can "measure" the meanings of words and sentences and then compare the measurements of different sentences. Sentences with similar measurements will likely have similar meanings.
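The word-by-word combination Zarrella describes, and the comparison of the resulting "measurements," can be illustrated with a toy sketch. It assumes a simple recurrent update and cosine similarity, neither of which the article confirms as MITRE's actual method. The vocabulary, vector sizes, and weights are all made up and untrained, so the numbers show only the mechanics, not real meaning.

```python
import math
import random

def random_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative and untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def measure_sentence(words, embeddings, w_state, w_input):
    """Fold a sentence into one vector, one word at a time.

    The running state carries the analysis of all earlier words and is
    combined with each new word in turn.
    """
    state = [0.0] * len(w_state)
    for word in words:
        from_state = matvec(w_state, state)
        from_word = matvec(w_input, embeddings[word])
        state = [math.tanh(s + x) for s, x in zip(from_state, from_word)]
    return state

def cosine_similarity(a, b):
    """Compare two sentence vectors; values near 1 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical setup: 3-dimensional word vectors, 4-dimensional sentence state.
rng = random.Random(0)
vocab = ["the", "dog", "chased", "cat"]
embeddings = {word: [rng.uniform(-1, 1) for _ in range(3)] for word in vocab}
w_state = random_matrix(4, 4, rng)
w_input = random_matrix(4, 3, rng)

first = measure_sentence("the dog chased the cat".split(), embeddings, w_state, w_input)
second = measure_sentence("the cat chased the dog".split(), embeddings, w_state, w_input)
similarity = cosine_similarity(first, second)
```

Because each step feeds the previous state back in, the two sentences above produce different vectors even though they contain the same words. In a trained model, the weights would be learned from the millions of example sentences rather than drawn at random.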
This was the cornerstone idea for MITRE's SemEval entry. "The challenge for the SemEval task was to measure semantic similarity between two sentences," Zarrella says. "It's a challenge that computers are ill-suited to solve because they don't understand language in the same way that people do."
Years of Human Language Research into the World of Words
But it's a crucial challenge. Daily life is swamped with words: news headlines and Facebook posts and doctor's notes and legal briefs. Having automated help to wade through it all would be a boon to productivity, which is why companies like Google and Microsoft are investing heavily in natural language processing research.
Looking to make sure that our government sponsors stay ahead of the learning curve, MITRE informs them on the latest research. We also pursue our own research in the field and enter competitions like SemEval to test how well our ideas work.
Because the team received late notice of the 2015 SemEval, completing MITRE's entry required a scramble. "We had about three weeks to devote to this project," Zarrella says, "while other teams had three months. So it was kind of a sprint."
The DEEPLANG team of Zarrella, John Henderson, Liz Merkhofer, and Laura Strickhart approached the challenge with a kitchen-sink mentality. "We wanted to understand which approaches had promise and which didn't. So we brought together a number of technologies and ideas, some of which were brand new and had never been developed before. Others dated back to MITRE's first steps into human language processing technology."
Sentences in Alignment
To construct its computer model, the MITRE team downloaded 300 million sentences from Twitter and then analyzed them with deep-learning algorithms. "First we constructed what's called an 'alignment' between two sentences," Zarrella says. "This is an approach that's used a lot when you're training a computer to translate from one language to another. You feed the model different translations of a sentence, and the model then figures out how the corresponding words in the translations align to each other."
Using the alignment technique, the computer measured the "distance" between the words in the sample tweets. Short distances indicated shared meaning; long distances indicated disparate meaning. Once the computer model had an understanding of how Twitter sentences might relate to each other, MITRE fed the model the 17,790 tweet pairs SemEval provided to each team. The model then matched the tweets that it judged to have the same or similar meanings.
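The idea of aligning words and measuring the distance between them can be sketched in a few lines. This is a simplified illustration, not MITRE's system: it assumes each word has a vector, pairs every word in one sentence with its nearest word in the other, and averages the distances. The two-dimensional vectors and the function names are hypothetical.

```python
import math

# Hypothetical 2-dimensional word vectors; a real system would learn these.
WORD_VECTORS = {
    "great": [0.9, 0.8],
    "awesome": [0.85, 0.9],
    "game": [0.1, 0.5],
    "match": [0.15, 0.45],
    "terrible": [-0.9, -0.7],
    "traffic": [-0.3, 0.2],
}

def distance(word_a, word_b):
    """Euclidean distance between two words' vectors."""
    return math.dist(WORD_VECTORS[word_a], WORD_VECTORS[word_b])

def align(source, target):
    """Pair each source word with its nearest word in the target sentence."""
    return [(word, min(target, key=lambda t: distance(word, t))) for word in source]

def mean_aligned_distance(source, target):
    """Average distance across aligned pairs; smaller suggests closer meaning."""
    pairs = align(source, target)
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

pairs = align(["great", "game"], ["awesome", "match"])
near = mean_aligned_distance(["great", "game"], ["awesome", "match"])
far = mean_aligned_distance(["great", "game"], ["terrible", "traffic"])
```

With these made-up vectors, "great" aligns to "awesome" and "game" to "match," and the first sentence pair ends up with a much smaller average distance than the second.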
The results were impressive. SemEval graded each entry by how well its matches correlated with expert judgment, on a scale of 0 to 1, with 1 indicating perfect agreement.
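The article does not say which correlation measure SemEval used. As an illustration only, the sketch below assumes the Pearson correlation coefficient, a common choice for comparing system scores with human ratings. The scores are invented for the example and are not SemEval data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Invented example: human ratings on a 0-5 scale and system similarity scores.
human_ratings = [5, 4, 1, 0, 3, 2]
system_scores = [0.9, 0.7, 0.3, 0.1, 0.4, 0.5]

agreement = pearson(system_scores, human_ratings)
```

Pearson correlation can in principle range from -1 to 1; a system whose scores track human judgment lands in the 0-to-1 range the article describes.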
"Our system’s score was 0.62," says Zarrella. "No other entry topped 0.57, while human annotators scored 0.74."
With measured proof that their DEEPLANG research is on the right path, Zarrella and his team hope to stretch out the model's matching capabilities. "We want to be able to analyze not just words and sentences, but paragraphs, documents, dialogues—and turn that analysis into useful information."
The uses for that information are as varied as the missions of MITRE's sponsors. The U.S. Department of Veterans Affairs hopes to improve veterans' health by making better use of the data in their medical histories. And MITRE is partnering with the Federal Aviation Administration to analyze the transcripts of conversations between air traffic controllers and pilots to improve how information passes between them.
"Really, any aspect of government that deals with language in some way can benefit from this research," Zarrella says. "MITRE's goal is to provide sponsors with new technologies based on the latest findings in the field of human language processing, and, in cases where the technology is not there yet, to help the field advance."
—by Christopher Lockheardt