Dreaming CGI Teaches Computers to See Better than Humans

January 2017
Topics: Machine Learning, Artificial Intelligence, Computing Methodologies, Data (General), Mathematics, Sensing and Signal Processing
In 2016, MITRE achieved a milestone that's eluded artificial intelligence experts for decades—a computer vision system that recognizes objects with greater accuracy than the average person. Inspiration from game-industry CGI played an important role.

Back in 1968, when Philip K. Dick wrote the novel Do Androids Dream of Electric Sheep? (later adapted as the movie Blade Runner), many artificial intelligence (AI) pioneers thought that teaching computers to recognize objects was just over the horizon. Yet over the decades, the scale of the challenge became clearer.

"Think of it this way," says Mikel Rodriguez, head of the MITRE Computer Vision Group. "We humans apply about 50 percent of our brain cells to visualization. It’s incredibly complex, but we take it for granted."

In 2016, MITRE passed a major milestone by creating a computer system that can recognize objects better than the average person. And, much like the stuff of science fiction, computers "dreaming" in computer-generated imagery (CGI) were instrumental in the breakthrough.

Embracing the Volume of Data

As with earlier computer vision breakthroughs such as Holodeck (created in response to the Boston Marathon bombing), Rodriguez and his team recognized that the combination of fast, inexpensive computer processing power and the vast amounts of visual imagery flooding the internet held a solution. "With more than 18,000 hours of video uploaded to the internet every day, there is no shortage of visual information," Rodriguez says.

Rodriguez explains how embracing the volume of data helps the team's computer vision algorithm learn. "Typical code is written in straight, logical steps. But with deep learning, instead of having a lot of 'If this is true, do that' instructions, the system works loosely like a human brain."
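The contrast Rodriguez draws can be sketched in a few lines. This is a toy illustration only, not MITRE's system; the feature names and weight values are invented for the example:

```python
# Rule-based recognition: a programmer hand-writes every condition.
def rule_based_is_car(features):
    # "If this is true, do that" logic, spelled out explicitly.
    if features["has_wheels"] and features["wheel_count"] == 4:
        if features["has_windshield"]:
            return True
    return False

# Learned recognition: the logic lives in weights, not in if-statements.
def learned_is_car(features, weights, threshold=0.5):
    # The weights would come from training on labeled examples,
    # not from a programmer writing rules.
    score = sum(weights[name] * float(value)
                for name, value in features.items())
    return score > threshold

example = {"has_wheels": True, "wheel_count": 4, "has_windshield": True}
weights = {"has_wheels": 0.3, "wheel_count": 0.1, "has_windshield": 0.2}
print(rule_based_is_car(example))            # True
print(learned_is_car(example, weights))      # True
```

The practical difference is that the rule-based version must be rewritten by hand for every new object category, while the learned version only needs new training examples.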

While this approach to artificial intelligence has been around for decades, in recent years deep learning has become immensely more powerful. This is thanks in large part to the rise of graphical processing units and the widespread availability of large collections of data that can be used to train these deep neural networks.

The system resembles a giant tangle of interconnected nodes that reweight their connections as new experiences arrive. The computer vision team feeds the algorithm massive numbers of labeled images: "this is a car, this is a chair."

"With deep learning, we ‘train’ computers to do what we want by providing examples rather than programming them. After many iterations, the system develops competency or even better than human performance in the desired task," Rodriguez says. "It’s really cool to watch."

But the vital step of labeling images was a time-intensive, manual process, and it formed a bottleneck. Previously, the team gathered labeled images as part of MITRE's involvement in ImageNet, a crowdsourcing project involving more than 50,000 people worldwide.

Then Eliza Mace, an intern from MIT (now a MITRE employee) working in the Computer Vision Lab, was searching online for images of vehicles to help train the system. Images from video games kept popping up.

At first it was just an irritation, but then she had an epiphany: "I thought, CGI is so sophisticated, we could create realistic images of objects whose annotated examples are difficult to find online. Plus, this is a process we can automate, which is far faster than having a person download a million examples and check through them all."

Creating Realistic Images and Labeling Them in CGI

And that's what they did. The computer vision team wrote a program that randomly picked different backgrounds and placed realistic-looking 3D models into them, letting the computer randomly "dream" thousands of synthetic examples that could then be used to train a neural network. It turned out to be very effective, Rodriguez says.
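The key property of this approach is that every synthetic example is labeled for free: the generator chose the object, so it already knows the annotation. A minimal sketch of that idea, with invented background and object names standing in for real 3D assets and rendering:

```python
import random

# Placeholder asset lists; a real pipeline would reference 3D models
# and background scenes, and render an actual image for each choice.
BACKGROUNDS = ["highway", "parking_lot", "city_street"]
OBJECT_MODELS = ["sedan", "bus", "pickup_truck"]

def dream_scene(rng):
    """Compose one synthetic training example; the label comes free."""
    obj = rng.choice(OBJECT_MODELS)
    scene = {
        "background": rng.choice(BACKGROUNDS),
        "object": obj,
        "x": rng.uniform(0.0, 1.0),     # placement within the frame
        "y": rng.uniform(0.0, 1.0),
        "scale": rng.uniform(0.5, 2.0),
    }
    # Because the generator placed the object, no human ever has to
    # look at the rendered image and label it afterward.
    return scene, obj

rng = random.Random(0)
dataset = [dream_scene(rng) for _ in range(1000)]
print(len(dataset))  # 1000 labeled examples, zero manual labeling
```

This is why the approach broke the labeling bottleneck: the cost of an additional labeled example drops to the cost of rendering it.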

"We let our CGI computer hallucinate for two weeks, essentially dreaming different scenes and objects. We established some parameters—like a bus stays on the road, it doesn’t fly—because real buses don't do that." Afterwards, the CGI system "taught" the AI system. "It worked. We had a treasure trove of data. And that boosted us to the point where the computer could surpass human abilities and recognize almost any image."

Rodriguez points out that people still read emotions better. But whereas an average person might recognize a picture or video of a car as "a sedan" or "a Honda," the computer can instantly recognize it as a "2013 Honda Accord DX."

He explains that this milestone may have implications for MITRE's sponsors in areas ranging from aviation and driverless cars to national security.

"Right now, there are whole buildings full of people who look at video screens all day for security purposes. I believe it makes more sense to let a computer filter through those hours of footage first and let people do the creative problem-solving."

There are also numerous possibilities in healthcare, such as scanning blood samples for cancer cells or detecting melanoma skin cancer. Currently, his team is working with Carnegie Mellon University to use computer vision in an assisted-living environment. For example, the system can alert a caregiver if an elderly person falls, leaves an oven door open, or signals for help.

"I've been in this field for 15 years, and I didn't think we'd get to this point in my lifetime," Rodriguez says. "I think what we have here is transformative technology. And at MITRE, I’m excited to share it with our sponsors and apply it to some of our country’s most difficult and important problems."

For more information about Mikel Rodriguez and computer vision, check out these stories in The Boston Globe and WCVB-Boston.

—by Bill Eidson