Technology that Spots What Humans Might Miss in a Crowd

July 2017
Topics: Human Language Technology, Machine Learning, Information Security
Security cameras in public places are commonplace, but they don't always prevent disturbances. MITRE researchers are exploring how a fusion of soft biometrics, audio, and other data from video feeds may help analysts detect and avert public threats.
Crowd at a concert

Whether you're shopping at the mall, cheering on the hometown team at a large sporting event, or in line at the airport, you know they're there—security cameras.

"Even in a location as benign as a parking garage, there are so many surveillance cameras capturing images at various angles that a person in a control room can only monitor so much before losing sensitivity," says Chongeun Lee, a MITRE engineer who specializes in biometrics. "It's easy to miss something critical among all the information flowing in."

Lee is the principal investigator on the "LinkBioMan" technology project, which is part of MITRE's internal research program. A team of researchers including Monica Carley-Spencer, Chris Pike, Benjamin Skerritt-Davis, Haluk Tokgozoglu, and Amanda Vu is contributing its expertise in video analytics, biometrics, machine learning, human language technology, and computational auditory perception to create sensors that can spot anomalies in video. By flagging unusual activities, especially at large public gatherings, the sensors can help humans act to prevent a crisis or respond promptly once one occurs. The key is to identify not just objects of interest or obvious concerns, but also patterns of behavior that raise security and safety concerns, and to predict which issues are most critical. Because each response is costly, and because necessary privacy protections must be respected, striking the right balance is critical to this research.

"Our idea is when something abnormal happens, an alert is triggered so personnel on duty can act quickly," Lee says.

Unusual Behaviors: Making the [Automated] Connection

Surprisingly, most mainstream video security technology lacks sound, color, or both. Think of the grainy, silent, black-and-white images often replayed on television news reports. While today's video technology can solve small, specific problems, it doesn't effectively fuse visuals with audio and other sensor data.

"The color and audio technology found in social media video offers great clues," Lee says. "The sounds of glass breaking, sirens, or gunshots might indicate civil disturbance." For this reason, the team uses selected cellphone videos of incidents, posted online by individuals, as training and test data.

The LinkBioMan system is being designed to fuse data from real-time monitoring feeds and to apply the same algorithms, for forensic purposes, to pre-recorded video, regardless of how that video was captured.

This type of "smart video" isn't normally available to security and law enforcement personnel until after an incident occurs. (MITRE conducts research in that area as well.)
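The article doesn't describe how LinkBioMan actually combines its audio and video analytics. As a purely illustrative sketch, assuming each short segment of footage has already been scored by an audio event detector and a visual anomaly detector (both hypothetical here), a simple "late fusion" rule could combine the two scores. Because the alert function accepts any iterable, the same logic runs over a live generator or a list of segments from recorded video. The names `AUDIO_EVENTS`, `fuse_scores`, and `alerts`, the equal weights, and the 0.6 threshold are all assumptions, not MITRE's design.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Hypothetical audio events of interest, taken from the examples quoted above.
AUDIO_EVENTS = ("glass_breaking", "siren", "gunshot")


def fuse_scores(audio: Mapping[str, float], visual: float,
                w_audio: float = 0.5, w_visual: float = 0.5) -> float:
    """Late fusion: weighted sum of the strongest audio cue and a visual
    anomaly score. Both inputs are assumed to lie in [0, 1]."""
    strongest_audio = max((audio.get(e, 0.0) for e in AUDIO_EVENTS), default=0.0)
    return w_audio * strongest_audio + w_visual * visual


def alerts(segments: Iterable[Tuple[Mapping[str, float], float]],
           threshold: float = 0.6) -> Iterator[int]:
    """Yield the index of every segment whose fused score reaches the threshold.

    Each segment is (audio_scores, visual_score). `segments` can be a live
    generator or a list built from recorded video; the logic is identical.
    """
    for i, (audio, visual) in enumerate(segments):
        if fuse_scores(audio, visual) >= threshold:
            yield i


# Usage on "recorded" segments with made-up scores:
recorded = [({"siren": 0.1}, 0.2), ({"glass_breaking": 0.9}, 0.7), ({}, 0.95)]
print(list(alerts(recorded)))
```

Here only the second segment, where a strong glass-breaking cue coincides with unusual visual activity, crosses the threshold. The third segment's visual score alone is not enough under these weights. Weighted late fusion is just one simple option; learned fusion models are common alternatives.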

Sound is just one analytic in LinkBioMan, whose formal project title is "Linking Soft Biometrics to Semantic Description of an Event." The MITRE team is going beyond audio. They have also developed machine learning algorithms that create a semantic linkage, meaning a contextual understanding, of the relationships among people, activities, objects, and environments.

Training to Recognize a Link

A good example of this type of technological understanding is a scenario in which a group of children is playing on railroad tracks. Children playing is normally harmless, but their location, on railroad tracks, makes the scene ominous.

Factor in the time-honored truth that children often aren't aware of their surroundings, and the danger escalates. The vision is that LinkBioMan would alert the train's conductor or nearby personnel that children are on the tracks and need to get off.
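To make the idea of a semantic link concrete, here is a minimal, hypothetical sketch. It is not LinkBioMan's algorithm, which the article doesn't detail. The sketch assumes an object detector has already labeled people in a frame, and that an operator has marked hazard zones such as the tracks as rectangles. A rule table lists (who, where) pairs that are harmless separately but risky together. The names `Box`, `Detection`, `RISKY_PAIRS`, and `link_and_alert` are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def foot_point(self):
        # Bottom-center of the box: roughly where a person touches the ground.
        return ((self.x1 + self.x2) / 2, self.y2)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass(frozen=True)
class Detection:
    label: str  # class name from a hypothetical detector, e.g., "child"
    box: Box


# Hypothetical rule table: pairs that are harmless alone but risky together.
RISKY_PAIRS = {("child", "railroad_tracks"), ("child", "roadway")}


def link_and_alert(detections, zones):
    """Return an alert for each detection whose foot point lies in a zone
    that the rule table pairs with that detection's label.

    zones maps a zone name (e.g., "railroad_tracks") to a Box.
    """
    alerts = []
    for det in detections:
        for zone_name, zone_box in zones.items():
            if (det.label, zone_name) in RISKY_PAIRS and zone_box.contains(det.box.foot_point()):
                alerts.append(f"ALERT: {det.label} detected on {zone_name}")
    return alerts


# Usage with made-up coordinates:
zones = {"railroad_tracks": Box(0, 400, 1920, 520)}
frame = [Detection("child", Box(300, 250, 360, 450)),
         Detection("adult", Box(900, 100, 980, 300))]
print(link_and_alert(frame, zones))
```

Only the child standing in the tracks zone triggers an alert. The adult is outside the zone, and "adult on tracks" isn't in the rule table anyway. A learned system would infer such pairings rather than listing them by hand. The table simply shows the kind of person-plus-place linkage the text describes.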

"We're trying to elevate security to the next level, because presently there isn't much automation that can intelligently interpret activities and events within video," says Carley-Spencer, the co-principal investigator on the project. "We're applying computer vision and machine learning to multimedia (video frames, audio track, and potentially other data feeds), to essentially train a computer to recognize what is happening."

The LinkBioMan algorithms also seek to catch unusual behavior that the human eye may not immediately detect. Think of someone carrying a large bag and walking against a crowd of thousands of people standing to watch a parade. Maybe the person is just scanning the crowd for a friend. But what if there's more going on?
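One common heuristic for this kind of scene compares each person's motion with the crowd's. The following is a hedged sketch of that heuristic, not the project's method. It assumes a tracker has already produced a list of (x, y) positions per person, one per frame. If most people are nearly still, anyone moving briskly stands out. If the crowd is flowing, anyone moving against the mean flow direction is flagged. The function names and all thresholds (in pixels per frame) are illustrative assumptions.

```python
import math
from statistics import median


def velocity(track):
    """Average per-frame displacement (dx, dy) of a track [(x, y), ...]."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    steps = max(len(track) - 1, 1)
    return ((x1 - x0) / steps, (y1 - y0) / steps)


def flag_unusual_motion(tracks, still_speed=0.5, moving_speed=2.0, counter_cos=-0.5):
    """Return sorted ids of tracks that move unusually relative to the crowd.

    tracks: dict of track id -> list of (x, y) positions, one per frame.
    Thresholds are illustrative, not tuned values.
    """
    if not tracks:
        return []
    vels = {tid: velocity(t) for tid, t in tracks.items()}
    speeds = {tid: math.hypot(*v) for tid, v in vels.items()}
    flagged = []
    if median(speeds.values()) < still_speed:
        # Mostly stationary crowd: anyone walking briskly stands out.
        flagged = [tid for tid, s in speeds.items() if s >= moving_speed]
    else:
        # Moving crowd: compare each heading with the mean flow direction.
        mx = sum(v[0] for v in vels.values()) / len(vels)
        my = sum(v[1] for v in vels.values()) / len(vels)
        norm = math.hypot(mx, my)
        for tid, (vx, vy) in vels.items():
            s = speeds[tid]
            if s >= moving_speed and norm > 0:
                cos = (vx * mx + vy * my) / (s * norm)  # cosine of angle to the flow
                if cos <= counter_cos:
                    flagged.append(tid)
    return sorted(flagged)


# Usage: four spectators standing still and one person walking past them.
parade = {f"spectator{i}": [(100 * i, 200)] * 3 for i in range(4)}
parade["walker"] = [(0, 300), (5, 300), (10, 300)]
print(flag_unusual_motion(parade))
```

As the paragraph above notes, a flag like this only says the motion is unusual, not that it is a threat. A person searching for a friend would be flagged too, which is why such output goes to a human analyst.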

Safer in a Crowd

A blend of open-source tools forms the backbone of LinkBioMan technology. The team is also developing image captioning as an additional element of this new fusion of tools. Image captioning software examines a scene and gives analysts a verbal description of what may be amiss.

"We're striving to connect object detection to the descriptions provided by the captioner," says Tokgozoglu, MITRE's technical lead of video analytics for LinkBioMan. "We can create a contextual link between what the scene description says is happening, and what visual entities corresponding to the text are saying about the scene."
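The article doesn't explain how the captioner's text is tied to detected objects. The simplest version of such a link is sketched below under stated assumptions: match object words in a caption against the class labels an object detector reports above a confidence cutoff. Caption mentions that a detection supports are "grounded." Mentions that no detection supports may deserve a second look. The synonym table, the function name, and the 0.5 cutoff are hypothetical. Real systems typically learn this grounding rather than using word lists.

```python
import re

# Hypothetical mapping from caption words to detector class labels.
SYNONYMS = {
    "man": "person", "woman": "person", "people": "person", "crowd": "person",
    "child": "person", "children": "person", "bag": "backpack",
    "car": "car", "train": "train",
}


def link_caption_to_detections(caption, detections, min_score=0.5):
    """Split caption object words into those supported by a detection and those not.

    caption: text from an image captioner.
    detections: list of (label, score) pairs from an object detector.
    Returns (grounded, ungrounded) as sorted lists of caption words.
    """
    present = {label for label, score in detections if score >= min_score}
    words = re.findall(r"[a-z]+", caption.lower())
    grounded, ungrounded = set(), set()
    for word in words:
        label = SYNONYMS.get(word)
        if label is None:
            continue  # not an object word this sketch knows about
        (grounded if label in present else ungrounded).add(word)
    return sorted(grounded), sorted(ungrounded)


# Usage with a made-up caption and detector output:
result = link_caption_to_detections(
    "A man carrying a bag near a train",
    [("person", 0.92), ("backpack", 0.81), ("car", 0.40)],
)
print(result)
```

In this example, "man" and "bag" are backed by confident detections, but "train" is not. That disagreement between text and pixels is the kind of mismatch a contextual link can surface.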

The team is currently working to increase the accuracy and robustness of the LinkBioMan family of acoustic, video, and fusion algorithms. Their next step is to expand LinkBioMan to other use cases, such as natural disasters, environmental hazards, and construction safety.

The work on LinkBioMan aligns with the needs of MITRE's sponsors, because "the commercial sector is addressing the interests of their largest customer base, and these don't necessarily overlap with the problems that government and federal law enforcement agencies are most concerned with," Carley-Spencer notes. "MITRE has significant in-house research expertise, so we can quickly take advantage of knowledge gained from other sponsor projects to generate prototypes on a low budget."

"Once we fully develop this system, we can re-tune it to address cases for a government organization's needs," Tokgozoglu adds. "We already have an algorithm for riots, and we plan to tailor it to large public speaking gatherings, parades, and athletic events."

Ultimately, the LinkBioMan research team is counting on the technology's flexibility and customizability for its success.

"Our goal is to make the algorithm useful for all kinds of situations where security is difficult to maintain, because of the many elements in play," Lee concludes.

—by Cheryl Scaparrotta
