
Leveraging Machine Readable Dictionaries in Discriminative Sequence Models

June 2006

Ben Wellner, The MITRE Corporation
Marc Vilain, The MITRE Corporation

ABSTRACT

Many natural language processing tasks make use of a lexicon, typically the words collected from some annotated training data along with their associated properties. We demonstrate here the utility of corpus-independent lexicons derived from machine readable dictionaries. Lexical information is encoded in the form of features in a Conditional Random Field tagger, providing improved performance in cases where: i) only limited training data is available; ii) the data is case-less; and iii) the genre or domain of the test data differs from that of the training data. We show substantial error reductions, especially on unknown words, for the tasks of part-of-speech tagging and shallow parsing, achieving up to a 20% error reduction on Penn TreeBank part-of-speech tagging and up to a 15.7% error reduction for shallow parsing using the CoNLL 2000 data. Our results point towards a simple but effective methodology for increasing the adaptability of text processing systems: train models on annotated data in one genre, augmented with general lexical information or with lexical information pertinent to the target genre (or domain).
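As an illustration of what encoding lexical information as tagger features might look like, here is a minimal Python sketch. It is not the authors' feature set, which this abstract does not specify. The toy lexicon, the feature names (`dict:pos=...`, `dict:unknown`), and the helper functions are all hypothetical. The idea shown is only that each token receives, alongside ordinary surface features, one feature per part of speech that a corpus-independent dictionary lists for it, so a word never seen in training can still carry useful evidence.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for a machine readable dictionary:
# lowercased word -> set of parts of speech the dictionary lists for it.
TOY_LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "dog": {"NOUN", "VERB"},
    "barks": {"NOUN", "VERB"},
    "loudly": {"ADV"},
}


def token_features(tokens: List[str], i: int,
                   lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Build a sparse feature dict for token i (illustrative feature names)."""
    word = tokens[i]
    lower = word.lower()

    # Ordinary surface features that a tagger would learn from training data.
    feats: Dict[str, float] = {
        f"word={lower}": 1.0,
        f"suffix3={lower[-3:]}": 1.0,
    }
    if word[:1].isupper():
        feats["is_capitalized"] = 1.0

    # Dictionary features: lookup is case-insensitive, so these features
    # remain available even when the input text is case-less.
    tags = lexicon.get(lower)
    if tags is None:
        feats["dict:unknown"] = 1.0
    else:
        for tag in sorted(tags):
            feats[f"dict:pos={tag}"] = 1.0
    return feats


def sentence_features(tokens: List[str],
                      lexicon: Dict[str, Set[str]]) -> List[Dict[str, float]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, lexicon) for i in range(len(tokens))]
```

Calling `sentence_features(["The", "dog", "barks"], TOY_LEXICON)` yields one feature dictionary per token. A sequence model such as a Conditional Random Field would then learn weights for these features jointly with its label-transition features.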


Page last updated: July 19, 2006

Copyright © 1997-2008, The MITRE Corporation. All rights reserved.
MITRE is a registered trademark of The MITRE Corporation.
Material on this site may be copied and distributed with permission only.

 
