Normalization for Automated Metrics: English and Arabic Speech Translation
November 2009
Sherri Condon, The MITRE Corporation
Gregory A. Sanders, National Institute of Standards and Technology
Dan Parvaz, The MITRE Corporation
Alan Rubenstein, The MITRE Corporation
Christy Doran, The MITRE Corporation
John Aberdeen, The MITRE Corporation
Beatrice Oshika, The MITRE Corporation
ABSTRACT
The Defense Advanced Research Projects Agency (DARPA) Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program has experimented with applying automated metrics to speech translation dialogues. For translations into English, BLEU, TER, and METEOR scores correlate well with human judgments, but scores for translation into Arabic correlate with human judgments less strongly. This paper provides evidence for the hypothesis that automated measures of Arabic are lower due to variation and inflection in the language, demonstrating that normalization operations improve the correlation between BLEU scores and Likert-type judgments of semantic adequacy, as well as the correlation between BLEU scores and human judgments of the successful transfer of the meaning of individual content words from English to Arabic.
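
To make the idea of normalization concrete, the sketch below illustrates the kinds of orthographic normalizations commonly applied to Arabic text before computing automated metrics such as BLEU: stripping short-vowel diacritics and the tatweel (kashida) elongation character, collapsing hamza- and madda-bearing alef variants to bare alef, and unifying alef maqsura with ya and ta marbuta with ha. The normalize_arabic helper and this particular set of operations are illustrative assumptions, not necessarily the exact operations evaluated in the paper.

    import re

    # Arabic short-vowel diacritics (fathatan through sukun) plus the
    # dagger alef, and the tatweel (kashida) elongation character.
    # NOTE: illustrative normalization set; the paper's operations may differ.
    DIACRITICS = re.compile(r'[\u064B-\u0652\u0670]')
    TATWEEL = '\u0640'

    def normalize_arabic(text):
        """Apply common Arabic orthographic normalizations."""
        text = DIACRITICS.sub('', text)                        # strip diacritics
        text = text.replace(TATWEEL, '')                       # drop elongation
        text = re.sub('[\u0622\u0623\u0625]', '\u0627', text)  # alef variants -> bare alef
        text = text.replace('\u0649', '\u064A')                # alef maqsura -> ya
        text = text.replace('\u0629', '\u0647')                # ta marbuta -> ha
        return text

    if __name__ == '__main__':
        # Hypothetical example: "he went to the school", written with diacritics.
        print(normalize_arabic('ذَهَبَ إلى المدرسةِ'))  # -> 'ذهب الى المدرسه'

In such a setup, both the reference translations and the system hypotheses would be normalized identically before scoring, so that purely orthographic mismatches no longer count against the metric.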
