Creating Standards for Multiway Data Sharing

Elizabeth Harding, Leo Obrst, and Arnon Rosenthal

Sharing data across multiple systems within an organization, such as the Department of Defense, or among organizations, such as those of the Intelligence Community, is essential to making fast, informed decisions. This is particularly true in times of crisis. But what happens when you're trying to exchange vital information across dissimilar systems, each with its own data format? Often, humans have to painstakingly do the translations. By creating standards, however, systems can talk to each other, machine to machine. In this article, we describe two MITRE-supported efforts to do just that. Our work to help our sponsors create small, tractable data standards could provide other organizations with ideas on how to solve their data-sharing issues.

The first project involves creating a new capability for the Air Force: the automated passing of key sets of information across multiple machines. MITRE developed the concept and prototype, known as Cursor on Target (CoT), in 2003 to meet an urgent need. Michael Butler and his team did this by narrowing down the content area to the most critical elements and creating a small standard that can be extended for use across the enterprise. Certain Air Force systems can now exchange key information about the tactical environment (targets, troops, etc.) without the need for human translation. Another MITRE team then documented and modeled the CoT approach so that it could be used by other groups.

In the second project, we worked on the Intelligence Community (IC) Metadata Standard for Publications, which is designed to standardize publication metadata (e.g., document structure, author, creation date, security level, topic) across a large community. Standardized metadata enables consistent retrieval of meaningful and relevant information.
Machine-to-Machine Messages

The CoT prototype gave the Air Force the ability to rapidly exchange information on strike missions across multiple systems, enhancing the reaction speed and accuracy of these missions. CoT focuses on "what" (is it a target to be hit or a survivor to be rescued?), "where" (guidance coordinates), and "when" (timeline). With CoT, key information is shared machine to machine, whereas in the past it could only be passed via voice transmission, human transcription, and manual data reentry. This prototype was enthusiastically received by Air Force leaders, has been tested extensively, and is being used in Iraq today. Butler's team is working with other groups within the military services to apply the CoT approach to their needs.

Our MITRE team (part of the Air Force Data Interoperability Group) saw CoT as an opportunity to apply and test our work on standards and metadata expressed in information models. We were looking for ways to reduce system design time through simple ontologies and Community of Interest (COI) processes. We wanted to leverage the work of Butler's team and extend and enhance it through models, which would make it easier for other groups to understand and adopt. Our goals were to demonstrate a scalable approach to solving data interoperability problems by helping to understand, analyze, and represent the meaning of the data; to model the results of the analysis; and to build one of many possible physical representations. We call this work the "Cursor on Target-Extender" (CoT-X) initiatives. The first CoT-X (CoT-X1) provides additional representations in a "visualizable" Unified Modeling Language (UML) and a common Extensible Markup Language (XML) representation based on the UML. Modeling puts everything in English so that people can quickly understand the approach and apply it to other projects.
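The "what, where, when" framing can be made concrete with a small sketch. The element and attribute names below are illustrative assumptions, not the official Cursor on Target schema; they show only how the three core facts travel as a single machine-readable XML message:

```python
# Sketch of a CoT-style "what/where/when" message.
# Element and attribute names are illustrative assumptions,
# not the official Cursor on Target schema.
import xml.etree.ElementTree as ET

def build_event(what: str, lat: float, lon: float, when: str) -> str:
    """Serialize the three core facts as a small XML message."""
    event = ET.Element("event", type=what, time=when)
    ET.SubElement(event, "point", lat=str(lat), lon=str(lon))
    return ET.tostring(event, encoding="unicode")

msg = build_event("target", 34.05, -118.25, "2004-08-05T12:00:00Z")
print(msg)
```

Because every participating system agrees on these few elements, a receiving system can parse the message mechanically instead of relying on voice relay and manual reentry.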
UML representations are key to the CoT-X initiatives because they set the stage for future interoperability by using formal models, which is one alternative to using standard data. Ultimately, negotiation between systems will be based on models, and systems will self-describe using models.

Before developing our information model, we identified stakeholders and selected information analysts/systems engineers to play the role of "information modelers": people who understand data and how to analyze it and represent it for flexible, efficient processing. To get members of the participating organizations involved in creating CoT-X1, we formed a COI made up of subject matter experts. Our team looked at position information in seven systems (including one message set and the CoT standard). Our information modelers met with 17 experts to discuss, describe, and validate their particular system's position representation. We asked them open-ended questions about the information they needed and shared, and we dug through volumes of data documentation. The information modelers then analyzed the results of these sessions and the document review and described each system representation as a UML model. They looked for the intersections in all the information to determine what information is common and unambiguous. We then constructed a consolidated CoT-X1 model to which each of the systems can map.

For the CoT-X1 initiative, we reused the position data from CoT and added 10 additional elements to elaborate on time, source, and accuracy in the consolidated CoT-X1 model. These analysis, modeling, and consolidation steps took approximately 10 weeks (400 hours) of the information modelers' time. The resulting 40 hours per new element was considered quite acceptable, especially since we kept demand on subject matter experts low. We then manually created an XML schema from the CoT-X UML. For the second initiative (CoT-X2), we worked to reduce the modelers' time by up to 20 percent.
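The consolidation step, mapping each system's own position representation into one common model, can be sketched in a few lines. The field names and unit conventions below are invented for illustration and are not taken from the actual systems analyzed:

```python
# Sketch of mapping two systems' position records into one
# consolidated model, in the spirit of CoT-X1. Field names and
# units are invented for illustration.
import math
from dataclasses import dataclass

@dataclass
class ConsolidatedPosition:
    lat_deg: float   # latitude, decimal degrees
    lon_deg: float   # longitude, decimal degrees
    source: str      # which system produced the fix

def from_system_a(rec: dict) -> ConsolidatedPosition:
    # Hypothetical system A already reports decimal degrees.
    return ConsolidatedPosition(rec["latitude"], rec["longitude"], "A")

def from_system_b(rec: dict) -> ConsolidatedPosition:
    # Hypothetical system B reports radians, so convert on the way in.
    return ConsolidatedPosition(
        math.degrees(rec["lat_rad"]), math.degrees(rec["lon_rad"]), "B")

p = from_system_b({"lat_rad": math.pi / 4, "lon_rad": 0.0})
print(p)
```

Once every source system has such a mapping, downstream consumers only ever see the consolidated model, which is the practical payoff of the analysis work described above.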
We realized little improvement, however, because of the ramp-up of a new analysis team. We expect time improvement in the future, but it's important to note that it takes a lot of analysis to reduce data to its core elements.

The interoperability benefits of CoT have extended beyond the immediately
affected programs. The CoT and CoT-X UML models and schemas are both registered
in the DOD Metadata Registry (formerly called the XML Registry) in the
Aerospace Operations namespace, where they are available to anyone in
the Department of Defense. In addition, a number of MITRE-supported programs
are using the CoT-X UML models to articulate requirements for ground moving
target indicators and to communicate with contractors.
Intelligence Community Data Sharing

Much of the data consumed and produced in the IC is in the form of text documents, e.g., reports on economic and political conditions and military capabilities. The IC needs improved capabilities to share and reuse documents across the community, as well as to search and to apply security, archiving, and other applications to these documents. The best way for the agencies to achieve these goals is through shared metadata. That is, information providers must attach descriptive tags to appropriate chunks of their documents (e.g., where the information came from, what the content is about, classification level, etc.) and provide tools that will apply and exploit the tags. To do this, organizations must standardize the tags that producers use.

Standardizing tags is increasing within the IC, but is still spotty. In general, different standards are used across agencies. Consequently, documents are difficult to share across agencies, and frequently within agencies, even when they are well tagged, because the recipient system may use different tags. Often documents are not thoroughly or uniformly tagged, and there is no full set of tools available for tagging or interpreting the tags from one system to another.

To solve these problems, the IC is developing standards in the form of the IC Metadata Standard for Publications (IC-MSP), an ongoing effort aimed at describing bibliographic information as well as generic document components. Such standards will enable the creation of tools to exploit tags provided by other agencies, encourage more complete tagging, and increase the incentive to create new tools. They will also facilitate the interchange and reuse of published IC products and their components. IC-MSP is an implementation of XML intended for document-style intelligence products posted on Intelink (the classified intelligence network shared by the IC) and other domain servers.
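The payoff of standardized tags is that one retrieval routine works over every producer's documents. A minimal sketch follows; the tag names ("author", "topic", "classification") are hypothetical stand-ins for the kinds of metadata IC-MSP covers, not the actual element set:

```python
# Sketch of retrieval over uniformly tagged documents.
# Tag names are hypothetical stand-ins, not the IC-MSP element set.
import xml.etree.ElementTree as ET

DOCS = [
    """<publication>
         <metadata>
           <author>Analyst One</author>
           <topic>economics</topic>
           <classification>UNCLASSIFIED</classification>
         </metadata>
         <body>Report text...</body>
       </publication>""",
    """<publication>
         <metadata>
           <author>Analyst Two</author>
           <topic>military</topic>
           <classification>UNCLASSIFIED</classification>
         </metadata>
         <body>Another report...</body>
       </publication>""",
]

def find_by_topic(docs, topic):
    """Return authors of documents whose <topic> tag matches."""
    hits = []
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        if root.findtext("metadata/topic") == topic:
            hits.append(root.findtext("metadata/author"))
    return hits

print(find_by_topic(DOCS, "economics"))  # ['Analyst One']
```

The query never needs to know which agency produced a document; because the tags are uniform, the same three lines of lookup logic apply community-wide.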
IC-MSP currently consists of more than 200 XML element and attribute definitions, as well as additional elements common to most publication hierarchies, including the IC Information Security Marking standard's security attributes. (More standards will be added in the future.) This IC document markup effort is a large initiative that currently involves hundreds of elements (basic document properties) for all new IC products and many agencies.

The Intelligence Community CIO funds the IC-MSP effort, and agencies have agreed to mandate its use where appropriate. To make this happen, the IC Metadata Working Group (ICMWG) was created in 2000. It includes representatives from major intelligence agencies and partners (e.g., State, Justice, Energy, the military commands and services, and some commercial companies). MITRE representatives sit in on these meetings to observe and advise.

MITRE has provided a variety of assistance to the IC, including creating the requirements for a sound metadata architecture and infrastructure, developing cases and scenarios for exploiting metadata, and giving strategic advice on emerging Semantic Web standards. We recently completed a sponsor-requested review of the IC-MSP standard, and we also helped the ICMWG respond to Congressional inquiries on XML and metadata issues. We continue to work on describing more IC-MSP semantics to support wider sharing and exploitation.

To enable the IC-MSP, many IC agencies have begun pilots to apply tags or to generate them from other formats that were already being captured. Intelink is developing applications that exploit these tags to accomplish information sharing and interoperability across the community and to make search and discovery more effective.

Conclusion

These two efforts in establishing standards for data sharing have been successful for several reasons. The scope of the subject matter was manageable, and the users were interested in finding a solution.
They agreed that the standard did not need to cover all possible information, just the most critical information to be exchanged. Also, both chose widely known languages for expressing their standards: UML (aimed at software developers) and XML (an industry standard for data interchange and document management).

Successful standardization efforts have a realistic understanding of both the benefits and the limits of data standards. Because no single standard will describe all systems in a very large enterprise with many autonomous participants, organizations must develop effective two-pronged strategies. First, they should minimize diversity by developing data standards within focused communities of interest, as described in this article. Second, they must develop tools and processes to help system builders mediate across multiple standards and both conforming and non-conforming systems. MITRE is currently developing such strategies for several of our sponsors.
For more information, please contact Elizabeth Harding, Leo Obrst, Larry Hughes, or Arnon Rosenthal using the employee directory.

Page last updated: August 5, 2004