Text Mining in Foreign Languages Has the Global Aviation Safety Community Talking

November 2016
Topics: Aviation Safety, Software Engineering, Machine Learning, Systems Engineering, Aviation and Aeronautics
Written reports are an invaluable source of data for analyzing aviation safety events. MITRE developed text processing capabilities that tap into that rich data source more efficiently. Now we’ve adapted those capabilities to work with a foreign language.
Camille Shiotsuki, Sheng Liu, and Frank Sogandares in front of whiteboard.

MITRE has demonstrated that a system capable of analyzing aviation text reports for particular safety issues can potentially be adapted to a complex foreign language. This is a transformational capability for enhancing aviation safety globally.

This achievement builds on a MITRE-developed capability to analyze text reports from air traffic controllers and pilots that has been a key component of safety analyses in the U.S. That automated system identifies the safety concerns described in each text report. These may include course deviations, disruptive passengers, aircraft entering a runway without clearance to do so, equipment malfunctions, and unstable approaches during landings. The system then groups these reports into categories so analysts can examine them in depth to determine whether the events the reports describe represent a safety pattern.
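The grouping step described above can be illustrated with a minimal sketch. It assumes simple keyword matching, which is a stand-in chosen for illustration and not MITRE's actual method; the category names come from the examples above, while the keyword lists and the `group_reports` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword rules; a real system would use far richer models.
CATEGORY_KEYWORDS = {
    "course deviation": ["deviated", "off course"],
    "disruptive passenger": ["disruptive passenger", "unruly"],
    "runway entry without clearance": ["without clearance"],
    "equipment malfunction": ["malfunction", "inoperative"],
    "unstable approach": ["unstable approach"],
}

def group_reports(reports):
    """Group report texts by every category whose keywords they mention."""
    groups = defaultdict(list)
    for report in reports:
        text = report.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                groups[category].append(report)
    return dict(groups)

# Made-up report texts, for illustration only.
reports = [
    "Aircraft entered runway without clearance.",
    "Unruly passenger in row 12; autopilot malfunction on descent.",
]
groups = group_reports(reports)
```

A single report can land in more than one category, which matches the idea that one narrative may describe several safety concerns.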

These capabilities were originally designed to work with reports in English. But aviation safety is a global concern, with the safety initiatives of one country having an impact on aviation safety throughout the world, so we wanted to know whether we could adapt them to work with reports written in other languages. Because rapid growth in air traffic is forecast for the coming 20 years in the Middle East, Asia, and Africa, we wanted to learn whether the system could work with the languages used in these regions, many of which are character-based rather than alphabet-based.

Testing the Concept

In 2015, MITRE launched an internal research project to tackle this challenge. We began by obtaining a database of 20,000 sample aviation safety reports in one character-based language. (Due to an agreement with the organizations providing that data, we are not at liberty to reveal the source of those reports, or even the language in which they are written.)

With the data in hand, a team of engineers with expertise in system architecture, software systems engineering, and foreign languages began making the existing text-mining computer programs work with the test language.

That was a tall order, for several reasons. Even in English, different air traffic controllers or pilots will describe similar events in different ways. For instance, one might describe "engine trouble," another "engine failure," and yet another "lost engine number 2." One might spell out "runway" while another might abbreviate it as "RNWY" or "RWY."

The MITRE team first had to analyze the language that appears in text-based reports and then create a data dictionary that captures these differences in terminology, so that reports describing what is essentially the same type of safety event can be recognized as such.
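As a rough illustration of how such a data dictionary might be applied, here is a minimal sketch in Python. The variants are taken from the English examples above; the dictionary layout, the canonical labels such as `engine_problem`, and the `normalize` function are assumptions made for this example, not the team's actual design.

```python
import re

# Hypothetical data dictionary: variant wording -> canonical term.
TERM_DICTIONARY = {
    "rnwy": "runway",
    "rwy": "runway",
    "engine trouble": "engine_problem",
    "engine failure": "engine_problem",
    "lost engine": "engine_problem",
}

def normalize(report: str) -> str:
    """Lowercase a report and replace known variants with canonical terms."""
    text = report.lower()
    # Try longer variants first so multi-word phrases are matched whole.
    for variant in sorted(TERM_DICTIONARY, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, TERM_DICTIONARY[variant], text)
    return text

normalized = normalize("Lost engine number 2 on RWY 27")
```

Once differently worded reports share the same canonical terms, later steps can treat them as describing the same type of event.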

Training the Software to Identify Safety Concerns

In working with the test language, there was the added challenge of making the system recognize foreign characters. Once the words were correctly put into the system, the team conducted testing to ensure that their automated text processing and data mining algorithms worked regardless of the language the reports were written in. 

"Automation will not eliminate safety analysts' work, but it can create a consistent, repeatable way for them to get a snapshot of what the reports contain," says Camille Shiotsuki, the project's principal investigator. "That will dramatically improve their ability to recognize trends and address them."

Validating the System with End Users

Besides proving that the existing models and algorithms could work with a foreign language, MITRE worked with the providers of the test data to ensure the system produced meaningful information.

"They're very happy with the analyses and statistics we've delivered," reports Valerie Gawron, who oversees the research. "In addition to being able to quickly identify similar incidents, they've been able to see how trends track over time.

"For example, this capability can be applied to identify the seasonality of incidents such as bird strikes—or collisions between aircraft and birds—that may be very common during some months but not so much at other times of the year. And incidents involving unruly passengers may correlate closely with certain holidays." This ability to forecast the region-specific seasonality of safety incidents may help prevent them in the future.
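One simple way to look for this kind of seasonality is to count categorized reports by calendar month. The sketch below assumes each report has already been reduced to a date and a category; the record layout, the `monthly_counts` function, and the sample records are all made up for illustration and are not real safety data.

```python
from collections import Counter
from datetime import date

def monthly_counts(reports, category):
    """Return 12 counts (January to December) for one incident category."""
    counts = Counter(d.month for d, c in reports if c == category)
    return [counts[month] for month in range(1, 13)]

# Made-up sample records, for illustration only.
sample = [
    (date(2015, 4, 3), "bird strike"),
    (date(2015, 4, 19), "bird strike"),
    (date(2015, 10, 7), "bird strike"),
    (date(2015, 12, 24), "unruly passenger"),
]
bird_by_month = monthly_counts(sample, "bird strike")
```

Comparing these monthly totals across several years would show whether a peak recurs in the same months.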

The team also became aware of certain cultural differences in safety reporting and will take them into consideration when adapting the models for other languages. "For instance, what one culture might describe as 'moderate turbulence' may be considered severe turbulence for most of the rest of the world's pilots," Gawron explains. "There is no guarantee that translated words have the same weight or meaning in describing safety concepts. That's why working with the contributors of the data to validate the results is so important."

Sharing the Technology with Other Nations

Next steps include exploring who in the global aviation community may benefit from this capability. MITRE could then apply the know-how developed during this research to another language, another country, or another region of the world.

"The potential of this work is that individual countries could one day have the ability to process text reports in their own language," Shiotsuki says. "And if several countries band together to share their aviation safety data, this kind of tool could be adapted to process and analyze reports in multiple languages." That could be a game-changer for identifying and addressing regional safety trends. The end result is safer air travel for everyone.

—by Marlis McCollum

