Network Representations Support Powerful Data Analysis

Sarah Piekut, Lowell Rosen, and Daniel Venese

People involved in complicated investigations, from law enforcement to the intelligence community, often face a large pile of information, some of which is critical and some of which is contradictory, redundant, or unreliable. Data obtained in the course of an investigation or seized at a crime scene can run the gamut from handwritten notes to credit card receipts to computer records of network traffic to database records. Such disparate sources make data integration a true challenge, particularly when trying to solve a case quickly.

Some environments give investigators time to normalize and correct data anomalies to create a data warehouse before they analyze the data. Time-critical scenarios involving crime and terrorism do not permit this luxury. People must exploit the data in its native form while capturing imprecise and partial relationships, knowing that not all of that information will be trustworthy. Criminals change official records, such as driver's licenses or visa applications, to conceal their true identities. Even trustworthy databases may differ in how they represent data elements such as names and addresses. It's important to retain all reported "facts," even redundant and contradictory ones, since investigators cannot be sure which will prove true or useful. Nothing can be discarded, because something seemingly trivial might be the key to solving the crime.

How do investigators organize this pile of information so they can analyze it and get to the critical clues? One approach for combining multisource information is a network representation that describes many types of entities and emphasizes relationships between them, such as pairing a name to an address. Such networks provide greater flexibility than conventional databases, facilitate analyses of structural patterns, and reveal associations that otherwise would be difficult to detect. In conventional data analysis systems, the data capture and exploitation strategy is planned in advance. Network representations, by contrast, exploit whatever information is available, because the types of entities and their relationships are hard to predict.

MITRE has explored the use of network representations for several sponsors, coming up with new approaches and information management capabilities not yet available through commercial products.

A network repository supports powerful data analysis operations that complement those of a traditional data mart.

Building a Network Representation

A network is a "least common denominator" data model for expressing relationship information. This structure is able to capture and portray complex and often obscure relationships, such as suspicious connections between people. Formally, a network structure equates to a graph representation: a set of vertices connected by edges.

In graph data model terminology, entity instances are the vertices (or nodes) and relationships are the edges (or links). This type of representation emphasizes structural patterns, e.g., "this address is related to this name." Attributes can be associated with each edge and vertex. An address might carry its coordinates or an indication of whether it is a commercial or residential address. A person's name might include a list of aliases. The relationship of name to address might include the dates when the name was associated with the address and a rating of the confidence level of the relationship.
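The vertex, edge, and attribute vocabulary above can be sketched in a few lines of Python. This is only an illustrative sketch, not the repository design described in this article: the class names (`Vertex`, `Edge`, `Network`), the attribute keys, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """An entity instance, e.g. a name or an address (hypothetical class)."""
    vertex_id: str
    kind: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship between two vertices, carrying its own attributes."""
    source: str
    target: str
    kind: str
    attributes: dict = field(default_factory=dict)


class Network:
    """A minimal in-memory network: vertices by id plus a list of edges."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex):
        self.vertices[vertex.vertex_id] = vertex

    def add_edge(self, edge):
        # Every reported relationship is kept, even redundant or
        # contradictory ones; nothing is overwritten.
        self.edges.append(edge)

    def neighbors(self, vertex_id):
        """Return ids of vertices linked to vertex_id in either direction."""
        found = set()
        for e in self.edges:
            if e.source == vertex_id:
                found.add(e.target)
            elif e.target == vertex_id:
                found.add(e.source)
        return found


# Made-up sample data: one name, one address, one link between them.
net = Network()
net.add_vertex(Vertex("n1", "name", {"aliases": ["J. Doe", "Johnny D"]}))
net.add_vertex(Vertex("a1", "address", {"use": "residential"}))
net.add_edge(Edge("n1", "a1", "resides_at",
                  {"from": "2001-03", "to": "2003-11", "confidence": 0.7}))
```

Storing attributes as free-form dictionaries is one way to let new attribute types appear without a schema change, at the cost of weaker validation.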

A name or address might have an attribute indicating it is five hops away from a suspect organization. A network representation can be enriched over time with more instances of edges and vertices and with new attributes that improve the understanding of them.
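A "hops away" attribute of this kind can be derived with a breadth-first search. The sketch below assumes an undirected network held as a plain adjacency mapping; the node names and the attribute key `hops_from_suspect_org` are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, start):
    """Breadth-first search: fewest hops from start to each reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical undirected network stored as an adjacency mapping.
adjacency = {
    "suspect_org": ["phone_1"],
    "phone_1": ["suspect_org", "name_1"],
    "name_1": ["phone_1", "address_1"],
    "address_1": ["name_1"],
    "name_2": [],  # not connected to the suspect organization
}

hops = hop_distances(adjacency, "suspect_org")

# Record the derived value as an attribute on each reachable node.
attributes = {node: {"hops_from_suspect_org": d} for node, d in hops.items()}
```

Nodes with no path to the starting point, such as `name_2` here, simply receive no attribute.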

The implementation of a network representation (such as a network repository) can employ a variety of technologies. Links can be stored in a relational database or as a series of flat files. Attributes could be specified using a data dictionary or data language, such as Extensible Markup Language (XML) or Resource Description Framework (RDF).
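As one illustration of the relational option, the sketch below stores nodes and links in two tables using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example, not a schema taken from any actual repository.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        node_id TEXT PRIMARY KEY,
        kind    TEXT NOT NULL
    );
    CREATE TABLE link (
        source_id  TEXT NOT NULL REFERENCES node(node_id),
        target_id  TEXT NOT NULL REFERENCES node(node_id),
        kind       TEXT NOT NULL,
        confidence REAL
    );
""")
conn.executemany("INSERT INTO node VALUES (?, ?)", [
    ("n1", "name"), ("n2", "name"), ("a1", "address"),
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?, ?)", [
    ("n1", "a1", "resides_at", 0.9),
    ("n2", "a1", "resides_at", 0.4),
])

# Names that share an address: a self-join on the link table.
rows = conn.execute("""
    SELECT l1.source_id, l2.source_id, l1.target_id
    FROM link AS l1
    JOIN link AS l2
      ON l1.target_id = l2.target_id
     AND l1.source_id < l2.source_id
""").fetchall()
```

A one-hop question such as "who shares an address" maps onto a single join. Longer paths need repeated joins or recursive queries, which is one reason the choice of storage matters for large networks.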

A common semantic understanding is important for matching field names and attribute values across multiple data sources. For example, the field named "make" in one database may be the same as "vehicle_make" in another.

Attribute values also require a common understanding, e.g., the automobile make attribute "Chevy" is the same as "Chevrolet." Integration of information from multiple data sources requires a degree of preprocessing, normalization, and standardization. Depending on time and resource constraints, this can be accomplished in phases. Raw information might be loaded immediately, while derived and calculated attributes are added over time. Building a persistent network involves linking records, objects, and even individual data elements from disparate data sources that were not intended to be used together.
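The field-name and value matching described above can be sketched with two lookup tables. The mappings (`FIELD_SYNONYMS`, `VALUE_SYNONYMS`) and the function name are hypothetical; a real system would presumably draw them from a data dictionary rather than hard-code them.

```python
# Hypothetical synonym tables for illustration only.
FIELD_SYNONYMS = {"vehicle_make": "make"}
VALUE_SYNONYMS = {"make": {"chevy": "Chevrolet"}}


def normalize_record(record):
    """Map field names and attribute values onto their canonical forms."""
    normalized = {}
    for field_name, value in record.items():
        key = field_name.lower()
        canonical_field = FIELD_SYNONYMS.get(key, key)
        synonyms = VALUE_SYNONYMS.get(canonical_field, {})
        # Unknown values pass through unchanged.
        canonical_value = synonyms.get(str(value).strip().lower(), value)
        normalized[canonical_field] = canonical_value
    return normalized


# Two records from different (made-up) sources describing the same fact.
a = normalize_record({"make": "Chevy"})
b = normalize_record({"vehicle_make": "Chevrolet"})
```

In keeping with the phased approach, the raw record can be stored as loaded and the normalized form added later as a derived attribute, so that no reported value is lost.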

Once constructed, a network repository supports powerful data analysis operations that complement those of a traditional data mart. Analysis includes concepts such as adjacency (how far one node is from another) and graph topology (tree, cycle, etc.). You can also use network clustering to find subgraphs of related nodes and links. Subgraph matching tools find similar patterns of relationships among groups.
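Two of these operations can be sketched briefly. The example below uses connected components as the simplest possible form of clustering (the article does not say which clustering method was used) and a tree-versus-cycle check as a basic topology test. It assumes an undirected graph in which every node is a key of the adjacency mapping and every edge is listed from both ends.

```python
def connected_components(adjacency):
    """Group nodes into sets such that each set is mutually reachable."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node])
        seen |= component
        components.append(component)
    return components


def is_tree(adjacency, component):
    """A connected undirected component is a tree iff edges == nodes - 1."""
    # Each undirected edge appears twice in the adjacency mapping.
    edge_count = sum(len(adjacency[n]) for n in component) // 2
    return edge_count == len(component) - 1


# Hypothetical network: a triangle (contains a cycle) and a chain (a tree).
adjacency = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y"}, "y": {"x", "z"}, "z": {"y"},
}

components = connected_components(adjacency)
topology = [is_tree(adjacency, c) for c in components]
```

Connected components are a coarse grouping. Finding tighter clusters or matching subgraph patterns takes more specialized algorithms than this sketch shows.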

We have applied these techniques effectively to government data sets containing tens of millions of records. However, there are still challenges to address in constructing investigative applications and exploiting a network repository for storage of node/link instances. For example, applications must be able to handle a large, evolving set of attribute and link types that are captured according to different standards and that describe complex entity relationships. Distance estimates can still be misleading when entities have huge numbers of relationships. Thus, analysis within a domain context is needed to define the criteria for severing low-value linkages. Finally, the sheer size of the network (often many terabytes or larger) can challenge implementers, who must choose among many options for storing it.
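To illustrate why heavily connected entities distort distance estimates, the sketch below removes every node whose number of links exceeds a threshold before any distances are computed. The degree cutoff is only a stand-in assumption: as noted above, the real criteria for severing low-value linkages come from domain analysis. The node names and the `max_degree` parameter are hypothetical.

```python
def prune_hubs(adjacency, max_degree):
    """Return a copy of the network without nodes of degree > max_degree."""
    hubs = {n for n, nbrs in adjacency.items() if len(nbrs) > max_degree}
    return {
        n: {m for m in nbrs if m not in hubs}
        for n, nbrs in adjacency.items()
        if n not in hubs
    }


# Hypothetical network: a shared mailbox links people who are otherwise
# unrelated, making everyone look at most two hops apart.
adjacency = {
    "shared_mailbox": {"p1", "p2", "p3", "p4"},
    "p1": {"shared_mailbox", "p2"},
    "p2": {"shared_mailbox", "p1"},
    "p3": {"shared_mailbox"},
    "p4": {"shared_mailbox"},
}

pruned = prune_hubs(adjacency, max_degree=3)
```

After pruning, only the direct link between `p1` and `p2` remains, so `p3` and `p4` no longer appear close to them. The function returns a new mapping and leaves the original untouched, which fits the principle that no reported information is discarded.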

MITRE continues to address these challenges and look for better ways to manage, visualize, and analyze information in a number of work programs and in our own research projects.

Information Interoperability Issue

Summer 2004
Vol. 8, No. 1




 

For more information, please contact Sarah Piekut or Daniel Venese using the employee directory.


Page last updated: July 30, 2004

Copyright © 1997-2013, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.
