Active Learning with a Human In The Loop
April 2013
Seamus Clancy, The MITRE Corporation
Sam Bayer, The MITRE Corporation
Robyn Kozierok, The MITRE Corporation
ABSTRACT
Text annotation is an expensive prerequisite for applying data-driven natural language processing
techniques to new datasets. Tools that can reliably reduce the time and money required to construct
an annotated corpus would be of immediate value to MITRE's sponsors. To this end, we have
explored the possibility of using active learning strategies to aid human annotators in performing
a basic named entity annotation task. Our experiments consider example-based active learning
algorithms that are widely believed to reduce the number of examples and therefore reduce cost,
but instead show that once the true costs of human annotation are taken into consideration, the savings
from using active learning vanish. Our experiments with human annotators confirm that human
annotation times vary greatly and are difficult to predict, a fact that has received relatively little
attention in the academic literature on active learning for natural language processing. While our
study was far from exhaustive, we found that the literature supporting active learning typically
focuses on reducing the number of examples to be annotated while ignoring the costs of manual
annotation. To date there is no published work suggesting that active learning actually reduces
annotation time or cost for the sequence labeling annotation task we consider. For these reasons,
combined with the non-trivial costs and constraints imposed by active learning, we have decided
to exclude active learning support from our annotation tool suite, and we are unable to recommend
to our sponsors the form of active learning detailed in this technical report as a strategy for
reducing the costs of natural language annotation tasks.

Additional Search Keywords
Active Learning, Machine Learning, Annotation, Natural Language Processing