|
The Metadata Coverage Index (MCI): A Standardized Metric for Quantifying Database Annotation Richness
February 2012
Konstantinos Liolios, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Lynn Schriml, Department of Epidemiology and Public Health, Institute for Genome Sciences, University of Maryland School of Medicine
Lynette Hirschman, The MITRE Corporation
Ioanna Pagani, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Bahador Nosrat, Microbial Genomics and Metagenomic Super Program, Department of Energy Joint Genome Institute
Philippe Rocca-Serra, Centre for Ecology & Hydrology
Susanna-Assunta Sansone, Centre for Ecology & Hydrology
Chris Taylor, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)
Nikos C. Kyrpides, The MITRE Corporation
Dawn Field, The MITRE Corporation, European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI)
ABSTRACT
Variability in the extent of the descriptions of data (metadata) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The automatic scoring of records on the richness of their description enables sorting by quality. Here, we introduce an objective measure for metadata, the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated for a whole database, for individual records or for their component parts (variables or subsets of the data). The MCI score can be used to filter, rank or search for records, to assess the metadata quality of an ad hoc collection, or to determine the frequency with which fields in a particular record type are filled. We demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' standard developed by the Genomic Standards Consortium. Finally, we discuss a number of challenges and the further application of MCI score data: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their standards, and to credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
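
The definition above (the percentage of available fields actually filled) can be illustrated with a minimal sketch at the three levels the abstract mentions: a single record, a single field across records, and a whole collection. This is not the authors' implementation. The field names, record identifiers, and the rule for what counts as "filled" (here: not missing, not an empty string, not a placeholder such as "unknown") are assumptions made for the example only.

```python
from typing import Any, Mapping, Sequence

# Assumption: values treated as "not filled". The abstract does not define this rule.
EMPTY_VALUES = (None, "", "unknown", "n/a")


def is_filled(value: Any) -> bool:
    """Return True if a metadata value counts as filled under the assumed rule."""
    if isinstance(value, str):
        value = value.strip().lower()
    return value not in EMPTY_VALUES


def record_mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    if not fields:
        return 0.0
    filled = sum(1 for f in fields if is_filled(record.get(f)))
    return 100.0 * filled / len(fields)


def field_mci(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """MCI of one field: percentage of records in which that field is filled."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if is_filled(r.get(field)))
    return 100.0 * filled / len(records)


def database_mci(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """MCI of a collection: filled cells over all (record, field) cells."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for f in fields if is_filled(r.get(f)))
    return 100.0 * filled / total


# Hypothetical field names and records, for illustration only.
FIELDS = ["organism", "isolation_site", "latitude", "longitude"]
RECORDS = [
    {"id": "rec-1", "organism": "Escherichia coli", "isolation_site": "human gut",
     "latitude": None, "longitude": None},
    {"id": "rec-2", "organism": "Prochlorococcus marinus", "isolation_site": "open ocean",
     "latitude": 22.75, "longitude": -158.0},
    {"id": "rec-3", "organism": "Bacillus subtilis", "isolation_site": "",
     "latitude": "unknown", "longitude": None},
]

# Rank records from richest to poorest description.
ranked = sorted(RECORDS, key=lambda r: record_mci(r, FIELDS), reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

With these sample records, `record_mci` gives 50.0, 100.0 and 25.0 for `rec-1`, `rec-2` and `rec-3`, so the ranking places `rec-2` first. A real repository would need its own definition of which fields are "available" for a record type and which values count as empty.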

Additional Search Keywords
Metadata Coverage Index, MCI, MCI scores, standardized metrics, Genomes Online Database, GOLD, annotation quality, bioscience data
|