Testing the Accuracy of Automated Classification Systems Using Only Expert Ratings that are Less Accurate than the System
February 2014
Topics: Probability and Statistics, Modeling and Simulation, Performance of Systems
A method is presented to estimate the accuracy of automated classification systems using only expert ratings that may be substantially less accurate than the systems being evaluated. The estimation method begins with multiple expert ratings on test cases, uses the level of inter-rater agreement to estimate rater accuracy, uses Bayesian updating based on the estimated rater accuracy to estimate a "ground truth" probability for each classification, and then estimates system accuracy by comparing the relative frequency with which the system agrees with the most probable classification at different probability levels. A simulation analysis provides evidence that the method is robust and yields reasonable estimates of system accuracy under diverse and predictable conditions.
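The steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every function name in it is hypothetical. It assumes binary classes, raters who are equally accurate and err independently, and a uniform prior over the two classes. Under those assumptions two raters agree with probability p^2 + (1 - p)^2, which can be inverted to estimate the rater accuracy p. The last function is a simplification: it averages the posterior probability that each system label is correct, instead of comparing agreement frequencies at different probability levels as the abstract describes.

```python
import math
from itertools import combinations


def estimate_rater_accuracy(ratings):
    """Estimate a common rater accuracy p from pairwise agreement.

    ratings: one list of 0/1 labels per test case, one label per rater.
    Assumption (not from the paper): equally accurate, independent raters,
    so P(two raters agree) = p**2 + (1 - p)**2.
    """
    agree = 0
    total = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            agree += int(a == b)
            total += 1
    rate = agree / total
    # Invert the agreement formula, taking the root with p >= 0.5.
    # Clamp at zero in case sampling noise pushes the rate below 0.5.
    return 0.5 * (1.0 + math.sqrt(max(0.0, 2.0 * rate - 1.0)))


def posterior_class1(case, p, prior=0.5):
    """Bayesian update: probability that the true class is 1 for one case."""
    k = sum(case)          # raters who chose class 1
    n = len(case)          # raters in total
    like1 = p ** k * (1.0 - p) ** (n - k)    # likelihood if truth is 1
    like0 = (1.0 - p) ** k * p ** (n - k)    # likelihood if truth is 0
    return prior * like1 / (prior * like1 + (1.0 - prior) * like0)


def estimate_system_accuracy(ratings, system_labels, p):
    """Simplified estimate: mean posterior probability of the system's labels."""
    total = 0.0
    for case, label in zip(ratings, system_labels):
        q = posterior_class1(case, p)
        total += q if label == 1 else 1.0 - q
    return total / len(system_labels)


if __name__ == "__main__":
    # Toy data: four test cases, three raters each (illustrative only).
    toy_ratings = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
    toy_system = [1, 1, 0, 0]
    p_hat = estimate_rater_accuracy(toy_ratings)
    print("estimated rater accuracy:", round(p_hat, 3))
    print("estimated system accuracy:",
          round(estimate_system_accuracy(toy_ratings, toy_system, p_hat), 3))
```

The sketch treats all raters as interchangeable. Per-rater accuracies or more than two classes would need a different agreement model.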