Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference
September 2003
Andrew McCallum, Department of Computer Science, University of Massachusetts Amherst
Ben Wellner, The MITRE Corporation
ABSTRACT
Coreference analysis, also known as record linkage
or identity uncertainty, is a difficult and important
problem in natural language processing, databases,
citation matching and many other tasks. This paper
introduces several discriminative, conditional-probability
models for coreference analysis, all
examples of undirected graphical models. Unlike
many historical approaches to coreference, the
models presented here are relational—they do not
assume that pairwise coreference decisions should
be made independently of one another. Unlike
other relational models of coreference that are generative,
the conditional model here can incorporate
a great variety of features of the input without
having to be concerned about their dependencies—
paralleling the advantages of conditional random
fields over hidden Markov models. We present experiments
on proper noun coreference in two text
data sets, showing results in which we reduce error
by nearly 28% or more over traditional thresholded
record-linkage, and by up to 33% over an alternative
coreference technique previously used in natural
language processing.

Additional Search Keywords
coreference, identity uncertainty, conditional-probability models, natural language processing