Rutabaga By Any Other Name: Extracting Biological Names
2003 Award Winner
Lynette Hirschman, The MITRE Corporation
Alexander A. Morgan, The MITRE Corporation
Alexander S. Yeh, The MITRE Corporation
ABSTRACT
As the pace of biological research accelerates, biologists are becoming
increasingly reliant on computers to manage the information explosion.
Biologists communicate their research findings by relying on precise
biological terms; these terms then provide indices into the literature
and across the growing number of biological databases. This article
examines emerging techniques to access biological resources through
extraction of entity names and relations among them. Information extraction
has been an active area of research in natural language processing and
there are promising results for information extraction applied to news
stories, e.g., balanced precision and recall in the 93-95% range for
identifying person, organization and location names. But these results
do not seem to transfer directly to biological names, where results
remain in the 75-80% range. Multiple factors may be involved, including
absence of shared training and test sets for rigorous measures of progress,
lack of annotated training data specific to biological tasks, pervasive
ambiguity of terms, frequent introduction of new terms, and a mismatch
between evaluation tasks as defined for news and real biological problems.
We present evidence from a simple lexical matching exercise that illustrates
some specific problems encountered when identifying biological names.
We conclude by outlining a research agenda to raise performance of named
entity tagging to a level where it can be used to perform tasks of biological
importance. © 2003 Elsevier Science (USA). All rights reserved.

Publication
Reprinted from Journal of Biomedical Informatics, Vol. 35,
Lynette Hirschman, Alexander A. Morgan, and Alexander S. Yeh,
Rutabaga By Any Other Name: Extracting Biological Names,
pp. 247-259, Copyright 2003, with permission from Elsevier.