Method and system for finding similar records in mixed free-text and structured data

Patented

A data-mining technique for data sets that contain both structured and unstructured (free-text) data. Analyzing these sources separately does not fully exploit the available information; for example, clustering records without regard to their narratives can group reports of total electrical failure with reports of human-factors problems. The application describes one approach to combining the information from all of these data types into a single "similarity" score. It also stresses the importance of choosing tools appropriate to the types of data at hand.
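The abstract does not say how the structured and free-text evidence are combined. The sketch below is one possible combination and makes several assumptions. It uses a weighted sum of a field-agreement score for the structured fields and a bag-of-words cosine similarity for the narrative. All function names, the `text_weight` parameter, and the example records and field names are hypothetical illustrations, not taken from the patent.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def text_similarity(a, b):
    """Cosine similarity of raw term-frequency vectors (an assumed choice)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def structured_similarity(a, b, fields):
    """Fraction of the given categorical fields on which two records agree."""
    if not fields:
        return 0.0
    return sum(a.get(f) == b.get(f) for f in fields) / len(fields)

def combined_similarity(a, b, fields, text_field="narrative", text_weight=0.5):
    """Hypothetical single score: weighted sum of structured and text similarity."""
    s = structured_similarity(a, b, fields)
    t = text_similarity(a.get(text_field, ""), b.get(text_field, ""))
    return (1 - text_weight) * s + text_weight * t

# Made-up example records: r1 and r2 share every structured field but
# describe unrelated events; r3 differs in one field but tells the same story.
FIELDS = ["phase", "aircraft"]
r1 = {"phase": "cruise", "aircraft": "B737",
      "narrative": "Total electrical failure, lost all displays"}
r2 = {"phase": "cruise", "aircraft": "B737",
      "narrative": "Crew fatigue led to altitude deviation"}
r3 = {"phase": "climb", "aircraft": "B737",
      "narrative": "Electrical failure in climb, displays lost"}

if __name__ == "__main__":
    print(combined_similarity(r1, r2, FIELDS))  # 0.5
    print(combined_similarity(r1, r3, FIELDS))  # about 0.583
```

A structured-only comparison would rank r2 (score 1.0) above r3 (score 0.5). Adding the narrative term reverses that order, which is the kind of mismatch the abstract warns about. In practice, the weights and the per-type measures (for example TF-IDF instead of raw counts) would need to be chosen to suit the data.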

Patent Number: 7,076,485

Date Issued: July 11, 2006