Overview of BioCreAtIvE task 1B: Normalized Gene Lists
June 2005
Lynette Hirschman, The MITRE Corporation
Marc Colosimo, The MITRE Corporation
Alexander Morgan, The MITRE Corporation
Alexander Yeh, The MITRE Corporation
ABSTRACT
Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with
emphasis on tasks that reflect real biological applications. To this end, we have
focused on the curation process for model organism databases. This paper summarizes
BioCreAtIvE task 1B, the "Gene Identifier List" task, which is inspired by the gene
list typically supplied for each curated paper in a model organism database. For the
assessment, systems were given a set of abstracts from each of three model organism
databases (Yeast, Fly, and Mouse), along with synonym lists for these organisms that
define the correspondence between unique gene identifiers and the mentions of these
genes and gene products in the curated literature. The systems were evaluated on
their ability to produce the correct list of unique gene identifiers for the genes and
gene products mentioned in the abstracts for each organism. For the evaluation, we
prepared a training data set of 5000 abstracts per organism with (noisy) gene lists
derived automatically from the gene lists for the full-text articles; a development test
set of 100-200 abstracts per organism with hand-corrected gene lists; and a blind test
set of 250 abstracts per organism with carefully annotated gene lists.
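
As an illustration of the task setup described above, the following is a minimal sketch in Python, not the evaluation software used for the assessment. It assumes a synonym list represented as a mapping from unique gene identifiers to mention strings, a naive single-token lookup as a stand-in for a participating system, and scoring by precision, recall, and balanced F-measure over the set of identifiers; the abstract does not name the scoring measure, so that choice is an assumption. The function names and the identifiers G1, G2, and G3 are hypothetical.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def build_lexicon(synonyms: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert an identifier -> mentions table into lowercase mention -> identifiers."""
    lexicon: Dict[str, Set[str]] = {}
    for gene_id, names in synonyms.items():
        for name in names:
            lexicon.setdefault(name.lower(), set()).add(gene_id)
    return lexicon


def gene_list(abstract: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Return the identifiers whose synonyms appear as single tokens in the abstract.

    Simplification (assumption): only single-token synonyms are matched, and
    mentions that map to more than one identifier are skipped as ambiguous.
    """
    tokens = re.findall(r"[a-z0-9\-]+", abstract.lower())
    found: Set[str] = set()
    for token in tokens:
        ids = lexicon.get(token)
        if ids is not None and len(ids) == 1:
            found |= ids
    return found


def score(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compare a predicted identifier list with a gold list for one abstract."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical synonym list and abstract, for illustration only.
SYNONYMS = {"G1": ["abc1"], "G2": ["xyz2"], "G3": ["def3"]}
ABSTRACT = "We show that ABC1 interacts with XYZ2 in vivo."
```

Calling `gene_list(ABSTRACT, build_lexicon(SYNONYMS))` yields the identifier set for the abstract, and `score` compares that set against a gold list. Working with sets reflects the task definition: each unique identifier counts once per abstract, however many times the gene is mentioned.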
