
Robust Language Identification in Short, Noisy Texts: Improvements to LIGA

July 2012

John Vogel, Brandeis University
David Tresner-Kirsch, Brandeis University and The MITRE Corporation

ABSTRACT

Language identification (LI) is a crucial preprocessing step for natural language processing tasks and other secondary uses of documents from multilingual sources. Conventional machine learning approaches to LI perform very well on long documents written in standard language, but relatively poorly on short documents rife with non-standard orthography. This limitation is an obstacle to secondary uses of social media data from Twitter and similar sources. We propose several linguistically motivated modifications to the LIGA algorithm and evaluate them empirically. Our results show that a modified algorithm achieves 99.8% accuracy in disambiguating among six European languages, reducing baseline LIGA's error rate by roughly an order of magnitude.
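For readers unfamiliar with graph-based LI, the following is a minimal sketch of the general idea behind a LIGA-style classifier: character trigrams become nodes, consecutive trigram pairs become edges, and each carries a per-language count. It is an illustration only. The class name `TrigramGraphIdentifier`, the whitespace padding, and the per-language normalization are assumptions made for this sketch; it is neither the baseline LIGA implementation nor any of the modifications proposed in the paper, which the abstract does not detail.

```python
from collections import Counter, defaultdict


def trigrams(text):
    """Return the overlapping character trigrams of text, in order."""
    # Padding with spaces (an assumption of this sketch) captures word boundaries.
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramGraphIdentifier:
    """Hypothetical, simplified trigram-graph language identifier."""

    def __init__(self):
        self.nodes = defaultdict(Counter)  # language -> trigram -> count
        self.edges = defaultdict(Counter)  # language -> (trigram, next trigram) -> count

    def train(self, text, lang):
        grams = trigrams(text)
        self.nodes[lang].update(grams)
        self.edges[lang].update(zip(grams, grams[1:]))

    def scores(self, text):
        grams = trigrams(text)
        pairs = list(zip(grams, grams[1:]))
        result = {}
        for lang in list(self.nodes):
            node_total = sum(self.nodes[lang].values())
            edge_total = sum(self.edges[lang].values())
            # Normalizing by each language's own totals is an assumed choice;
            # it keeps a language with more training text from dominating.
            node_score = (
                sum(self.nodes[lang][g] for g in grams) / node_total
                if node_total else 0.0
            )
            edge_score = (
                sum(self.edges[lang][p] for p in pairs) / edge_total
                if edge_total else 0.0
            )
            result[lang] = node_score + edge_score
        return result

    def identify(self, text):
        """Return the highest-scoring language, or None if untrained."""
        scored = self.scores(text)
        return max(scored, key=scored.get) if scored else None
```

Trained on one English and one Dutch sentence, this sketch would label a short input by whichever language's graph shares more trigrams and trigram transitions with it. The edge counts are what separate the graph formulation from a plain bag of trigrams: they reward trigrams that appear in the same order as in the training text.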


Additional Search Keywords

Twitter, microblogs, language identification, natural language processing, machine learning

 

Page last updated: August 30, 2012
