The Metadata Coverage Index (MCI): A Standardized Metric for Quantifying Database Annotation Richness

February 2012
Topics: Metadata Management
Konstantinos Liolios, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Lynn Schriml, Department of Epidemiology and Public Health, Institute for Genome Sciences, University of Maryland School of Medicine
Dr. Lynette Hirschman, The MITRE Corporation
Ioanna Pagani, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Bahador Nosrat, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Philippe Rocca-Serra, Centre for Ecology & Hydrology
Susanna-Assunta Sansone, Centre for Ecology & Hydrology
Chris Taylor, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)
Nikos C. Kyrpides, The MITRE Corporation
Dawn Field, The MITRE Corporation, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)

Variability in the extent of the descriptions of data (metadata) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The automatic scoring of records on the richness of their description enables sorting by quality. Here, we introduce an objective measure for metadata, the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated for a whole database, for individual records or for their component parts (variables or subsets of the data). The MCI score can be used to filter, rank or search for records, to assess the metadata quality of an ad hoc collection, or to determine the frequency with which fields in a particular record type are filled. We demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' standard developed by the Genomic Standards Consortium. Finally, we discuss a number of challenges and the further application of MCI score data to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of the same standards, and to credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
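
The definition above (percentage of available fields actually filled) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sample field names, the rule that `None` and blank strings count as unfilled, and the choice to compute a database-level score by pooling all record-field cells are all assumptions made for illustration.

```python
from typing import Any, Dict, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Assumption: None and blank strings count as unfilled; anything else is filled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def record_mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def database_mci(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """Assumed database-level MCI: filled cells over all record-field cells."""
    if not records or not fields:
        raise ValueError("at least one record and one field are required")
    filled = sum(
        1 for rec in records for name in fields if is_filled(rec.get(name))
    )
    return 100.0 * filled / (len(records) * len(fields))


def field_fill_rates(
    records: Sequence[Mapping[str, Any]], fields: Sequence[str]
) -> Dict[str, float]:
    """Percentage of records in which each field is filled."""
    if not records:
        raise ValueError("at least one record is required")
    return {
        name: 100.0 * sum(1 for rec in records if is_filled(rec.get(name))) / len(records)
        for name in fields
    }


if __name__ == "__main__":
    # Hypothetical field names and values, not taken from GOLD or MIGS.
    fields = ["organism", "habitat", "isolation_site", "ph"]
    records = [
        {"organism": "sample A", "habitat": "", "isolation_site": None, "ph": 7.0},
        {"organism": "sample B", "habitat": "soil", "ph": 0},
    ]
    # Rank records from richest to sparsest description.
    ranked = sorted(records, key=lambda rec: record_mci(rec, fields), reverse=True)
    for rec in ranked:
        print(rec["organism"], record_mci(rec, fields))
    print(database_mci(records, fields))
    print(field_fill_rates(records, fields))
```

Sorting by the per-record score corresponds to the filter and rank use described above, while the per-field rates correspond to determining how often fields in a record type are filled. A real repository would need its own rule for what counts as a filled field, for example whether placeholder values such as "unknown" should be counted.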
