MITRE Helps the Government Address the Sizeable Challenges of Big Data Analytics
June 2012
Topics: Data Management, Distributed Systems
"Big data" is the catch-all phrase for data sets so large (petabytes, exabytes, even zettabytes of information) that they are difficult to manage using traditional tools. But the wealth of information such data sets contain makes developing the tools and techniques to mine that information worth the investment.
Google and Facebook have invested hundreds of staff years to develop new approaches, many of which are realized in open source technologies. And Eric Hughes, department head of MITRE's Big Data Analytics Center of Expertise in our Department of Defense federally funded research and development center, plans to help our sponsors gain equally potent benefits from their stores of data. "Our customers are interested in having systems that can manage large amounts of data to get the most information value out of it."
MITRE partners such as DARPA (Defense Advanced Research Projects Agency), the National Institutes of Health, and the U.S. Geological Survey are among those federal departments and agencies that announced more than $200 million in new commitments to develop data tools and techniques as part of the Obama administration's recently launched "Big Data Research and Development Initiative."
Dealing with the Data
MITRE's sponsors desperately need those tools and techniques because what was once a stream of data for them to manage has become a white-water river. When a single UAV mission can collect 1.4 petabytes of data (1 petabyte equals 1,000 terabytes, or 1 million gigabytes), simply processing and storing it is a challenge.
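To put the 1.4-petabyte figure in more familiar units, the conversion can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the decimal (powers of 1,000) units used in the parenthetical above, and the function names are hypothetical rather than part of any MITRE tool.

```python
from decimal import Decimal

# Decimal storage units, as stated in the article:
# 1 petabyte = 1,000 terabytes = 1,000,000 gigabytes.
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def petabytes_to_terabytes(pb: Decimal) -> Decimal:
    """Convert petabytes to terabytes (hypothetical helper)."""
    return pb * TB_PER_PB


def petabytes_to_gigabytes(pb: Decimal) -> Decimal:
    """Convert petabytes to gigabytes (hypothetical helper)."""
    return pb * TB_PER_PB * GB_PER_TB


# The single-mission figure quoted in the article. Decimal avoids
# binary floating-point rounding on the value 1.4.
mission_pb = Decimal("1.4")

print(f"{mission_pb} PB = {petabytes_to_terabytes(mission_pb):,} TB")
print(f"{mission_pb} PB = {petabytes_to_gigabytes(mission_pb):,} GB")
```

In other words, one such mission yields 1,400 terabytes, or 1.4 million gigabytes, of data.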
Then comes the task of analyzing it. To fulfill their missions, MITRE's sponsors are under pressure to find faster, more efficient, and more cost-effective methods of performing increasingly complex analytics on the growing amounts of available data. The fact that the data often comes in multiple forms (machine-generated content, video footage, audio streams) only increases the challenge.
Hughes is helping develop big data analytics tools to assist sponsors in meeting these challenges. "Big data analytics is a set of approaches that MITRE's trying to develop to address a number of sponsors' data problems," he says. Those problems include handling data that arrives at high rates in amounts that exceed available storage space; formatting data that is noisy, corrupted, and in multiple formats; developing quick and efficient analytics; and designing data systems that can evolve as analytic needs change.
A Checklist for Customers' Needs
When Hughes speaks to customers to assess their data needs, he first works through a checklist of questions:
- Is the data continuously arriving or is it static? Customers worry about having the agility to analyze the data they have on hand while still being able to assess data that is not yet, but may become, relevant to their problem.
- How much complexity do they need from their analytics? Do they want to find simple correlations between things or complex patterns over time?
- How much historical data do they need? At one point, sponsors may have only been interested in analyzing data for ongoing operations. Now they've learned that interesting patterns can be gleaned with data spans of months or even years.
Once he has assessed their needs, Hughes can direct a wide variety of MITRE resources to meeting them (see "More Big Data"). The big data support MITRE provides to sponsors includes developing analytics for specific missions; providing systems engineering expertise to customers acquiring big databases and analytics systems; researching new developments in databases, data mining, and other relevant areas; and sharing information from industry, academia, and the open source community.
Collaborating on the Cloud
MITRE's culture of collaboration means these resources can come from many different working groups. "There are several different approaches to getting actionable information from big data, and MITRE has experts in all the relevant approaches," says Hughes. "We have an email list, a SharePoint site, training courses through the MITRE Institute [the company's in-house training resource], and even a Handshake site that's open to sponsors. Our senior managers and officers are very supportive of our efforts to find and form links between projects." (Handshake is a MITRE-developed social networking tool.)
For instance, Hughes has been keeping tabs on MITRE's cloud computing research to see what big data applications it might have. "The increase in efficiency and other benefits that some types of cloud computing can bring to big data analytics is very promising."
MITRE's expertise in big data analysis is all the more valuable to customers and sponsors, considering the current dearth of such skills. A 2011 report by the McKinsey Global Institute states, "The United States alone faces a shortage of 140K-190K people with analytical and managerial expertise, and 1.5 million managers and analysts with the skills to understand and make decisions based on the study of big data."
Data is destined to become, as the New York Times summed up the World Economic Forum's view, "a new class of economic asset." Against that forecast, MITRE's efforts to increase its expertise in big data look to be an equally important asset.
—by Christopher Lockheardt