Table Classification: An Application of Machine Learning to Web-hosted Financial Documents
April 2006
Marc Vilain, The MITRE Corporation
John Gibson, The MITRE Corporation
Benjamin Wellner, The MITRE Corporation
Rob Quimby, The MITRE Corporation
ABSTRACT
This paper presents learning-based techniques that support the processing
of tables in HTML publications. We are concerned especially with classifying
tables as to format and content, focusing on the domain of corporate
financials. We present performance results based on multiple classification
methods, and make several novel methodological contributions. These
include a new evaluation corpus, a clever technique for creating the
corpus, and an exhaustive approach towards sensitivity analysis for
classification features.
