Testing the Accuracy of Automated Classification Systems Using Only Expert Ratings that are Less Accurate than the System

By Dr. Paul Lehner

Information technology systems are addressing problems of increasing sophistication and complexity. This paper presents an approach for estimating the accuracy of such systems from expert ratings, even when the experts are less accurate than the system being evaluated.


A method is presented to estimate the accuracy of automated classification systems using only expert ratings that may be substantially less accurate than the systems being evaluated. The estimation method begins with multiple expert ratings on test cases, uses the level of inter-rater agreement to estimate rater accuracy, uses Bayesian updating based on estimated rater accuracy to estimate a "ground truth" probability for each classification, and then estimates system accuracy by comparing the relative frequency that the system agrees with the most probable classification at different probability levels. A simulation analysis provides evidence that the method is robust and yields reasonable estimates of system accuracy under diverse and predictable conditions.
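
The pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation; it rests on simplifying assumptions that the abstract does not state: two classes, raters who share one common accuracy and err independently and symmetrically, a system whose errors are independent of the raters' errors, and a pooled estimate in place of the paper's comparison across probability levels. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_agreement(ratings):
    """Fraction of rater pairs, pooled over all cases, that gave the same label."""
    agree = total = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            agree += (a == b)
            total += 1
    return agree / total


def estimate_rater_accuracy(agreement):
    """Assumption: two equally accurate, independent raters agree with
    probability g = a^2 + (1 - a)^2. Solve for a, taking the root a >= 0.5."""
    return 0.5 * (1.0 + sqrt(max(0.0, 2.0 * agreement - 1.0)))


def posterior_class1(case, accuracy, prior=0.5):
    """Bayesian update: P(true class is 1 | binary ratings for one case)."""
    k, n = sum(case), len(case)
    like1 = accuracy ** k * (1 - accuracy) ** (n - k)
    like0 = (1 - accuracy) ** k * accuracy ** (n - k)
    return prior * like1 / (prior * like1 + (1 - prior) * like0)


def estimate_system_accuracy(system_labels, posteriors):
    """Assumption: a system with accuracy s agrees with the most probable
    class (posterior q) with probability s*q + (1 - s)*(1 - q).
    Pool over cases and solve the resulting linear equation for s."""
    n = len(posteriors)
    q_mean = sum(max(p, 1 - p) for p in posteriors) / n
    agree = sum(
        label == (1 if p >= 0.5 else 0)
        for label, p in zip(system_labels, posteriors)
    ) / n
    if q_mean <= 0.5:
        raise ValueError("ratings carry no information about the true class")
    return (agree - (1 - q_mean)) / (2 * q_mean - 1)
```

A typical call sequence would be to compute `pairwise_agreement` on a list of per-case rating lists, pass the result to `estimate_rater_accuracy`, compute `posterior_class1` for each case, and finally call `estimate_system_accuracy` with the system's labels. Under these assumptions the estimate can exceed the raters' own accuracy, which is the point of the method; with few cases it can also fall outside the interval from 0 to 1, so it should be read as a rough estimate.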