
The Metadata Coverage Index (MCI): A Standardized Metric for Quantifying Database Annotation Richness

February 2012

Konstantinos Liolios, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Lynn Schriml, Department of Epidemiology and Public Health, Institute for Genome Sciences, University of Maryland School of Medicine
Lynette Hirschman, The MITRE Corporation
Ioanna Pagani, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Bahador Nosrat, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Philippe Rocca-Serra, Centre for Ecology & Hydrology
Susanna-Assunta Sansone, Centre for Ecology & Hydrology
Chris Taylor, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)
Nikos C. Kyrpides, The MITRE Corporation
Dawn Field, The MITRE Corporation, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)

ABSTRACT

Variability in the extent of the descriptions of data (metadata) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Automatically scoring records on the richness of their description enables sorting by quality. Here, we introduce an objective measure for metadata, the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated for a whole database, for individual records or for their component parts (variables or subsets of the data). The MCI score can be used to filter, rank or search for records, to assess the metadata quality of an ad hoc collection, or to determine the frequency with which fields in a particular record type are filled. We demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' standard developed by the Genomic Standards Consortium. Finally, we discuss a number of challenges and the further application of MCI score data to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of those standards, and to credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
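As an illustration of the definition above (the percentage of available fields actually filled), the following is a minimal Python sketch of how MCI scores might be computed per record, per collection and per field. It is not the authors' implementation. The function names, the field names and the rule that `None` and blank strings count as unfilled are all assumptions made for this example, and the sample records are invented rather than taken from GOLD.

```python
from typing import Any, Dict, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Assumption: None and blank strings count as unfilled; anything else is filled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def record_mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """Percentage of the available fields that are filled in one record."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def collection_mci(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """Mean record MCI over a collection that shares one field list."""
    if not records:
        raise ValueError("at least one record is required")
    return sum(record_mci(r, fields) for r in records) / len(records)


def field_fill_frequency(
    records: Sequence[Mapping[str, Any]], fields: Sequence[str]
) -> Dict[str, float]:
    """Percentage of records in which each field is filled."""
    if not records:
        raise ValueError("at least one record is required")
    return {
        name: 100.0 * sum(1 for r in records if is_filled(r.get(name))) / len(records)
        for name in fields
    }


# Hypothetical field list and records, invented for illustration only.
FIELDS = ["organism", "isolation_site", "habitat", "sequencing_method"]
RECORDS = [
    {"id": "rec-1", "organism": "organism A", "isolation_site": "site 1",
     "habitat": "", "sequencing_method": None},
    {"id": "rec-2", "organism": "organism B", "isolation_site": "site 2",
     "habitat": "soil", "sequencing_method": "method X"},
]

# Rank records from richest to poorest description.
ranked = sorted(RECORDS, key=lambda r: record_mci(r, FIELDS), reverse=True)
```

In this sketch the first record scores 50.0 (two of four fields filled), the second scores 100.0, and the collection scores 75.0. What counts as an "available" field and as "filled" is a design choice that a real repository would need to fix for itself; a stricter rule, for example treating placeholder values such as "unknown" as unfilled, would lower the scores.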


Additional Search Keywords

Metadata Coverage Index, MCI, MCI scores, standardized metrics, Genomes Online Database, GOLD, annotation quality, bioscience data

 

Page last updated: March 13, 2012

