Method and system for finding similar records in mixed free-text and structured data

A data mining technique for cases where the available data includes both structured and unstructured (free-text) fields. Analyzing these data sources separately does not fully exploit the available information (e.g., clustering records without regard to their narratives can match reports of total electrical failure with reports of human factors problems). The application describes one approach to combining the information from all of these data types into a single "similarity" score. The importance of picking tools appropriate to the types of data at hand is also stressed.
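
The abstract does not state how the single score is computed, so the following is only a minimal sketch of the general idea, not the patented method. It assumes a hypothetical record layout (a `fields` dict plus a `narrative` string), exact-match agreement for structured fields, bag-of-words cosine similarity for the free text, and an illustrative `text_weight` parameter for blending the two. All names and sample records are invented for illustration.

```python
import math
from collections import Counter


def text_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts of two narratives."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def structured_similarity(a: dict, b: dict) -> float:
    """Fraction of the fields present in both records whose values match."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


def combined_similarity(rec_a: dict, rec_b: dict, text_weight: float = 0.5) -> float:
    """Weighted blend of the two scores (the weighting scheme is an assumption)."""
    s = structured_similarity(rec_a["fields"], rec_b["fields"])
    t = text_similarity(rec_a["narrative"], rec_b["narrative"])
    return text_weight * t + (1.0 - text_weight) * s


# Hypothetical records: identical structured fields, different narratives.
fields = {"phase": "cruise", "aircraft": "B737"}
electrical = {"fields": fields, "narrative": "total electrical failure during cruise"}
human = {"fields": fields, "narrative": "crew fatigue led to missed checklist item"}
generator = {"fields": fields,
             "narrative": "electrical failure during cruise after generator fault"}

print(combined_similarity(electrical, human))      # structured fields agree, text does not
print(combined_similarity(electrical, generator))  # both sources agree, so the score is higher
```

With structured fields alone, all three sample records would look identical. Adding the narrative term separates the electrical failure report from the human factors report, which is the failure mode the abstract warns about.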

Patent #: 7076485 Issue Date: July 11, 2006