A technique for data mining where the available data contains both structured and unstructured (free-text) data. Analyzing these sources separately does not fully exploit the available information; for example, clustering records without regard to their narratives can group a report of total electrical failure with a report of a human factors problem. The application describes one approach to combining the information from all of these types of data into a single "similarity" score. It also stresses the importance of choosing tools appropriate to the types of data in hand.
Patent Number: 7,076,485
Date Issued: July 11, 2006
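
The idea of a single similarity score over mixed data can be illustrated with a minimal sketch. This is not the patented method, whose details the abstract does not give. It assumes a simple weighted average of two stand-in measures: the fraction of matching categorical fields for the structured part, and bag-of-words cosine similarity for the narrative. The function names, the text_weight parameter, the field names, and the sample records are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def structured_similarity(a, b, fields):
    """Fraction of the named categorical fields on which two records agree."""
    if not fields:
        return 0.0
    matches = sum(1 for f in fields if a.get(f) == b.get(f))
    return matches / len(fields)


def text_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    counts_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(a, b, fields, text_field="narrative", text_weight=0.5):
    """Weighted average of structured and free-text similarity (an assumed
    combination rule, chosen only to illustrate the idea)."""
    s = structured_similarity(a, b, fields)
    t = text_similarity(a.get(text_field, ""), b.get(text_field, ""))
    return (1.0 - text_weight) * s + text_weight * t


# Invented sample reports: identical structured fields, different narratives.
FIELDS = ["phase", "aircraft"]
electrical = {"phase": "cruise", "aircraft": "B737",
              "narrative": "total electrical failure after generator fault"}
human_factors = {"phase": "cruise", "aircraft": "B737",
                 "narrative": "crew fatigue led to missed checklist item"}
electrical_2 = {"phase": "cruise", "aircraft": "B737",
                "narrative": "electrical failure following generator fault"}

if __name__ == "__main__":
    print(structured_similarity(electrical, human_factors, FIELDS))
    print(combined_similarity(electrical, human_factors, FIELDS))
    print(combined_similarity(electrical, electrical_2, FIELDS))
```

On structured fields alone the electrical report and the human factors report look identical. Once the narrative is included, the two electrical reports score as more similar to each other than either does to the human factors report, which is the kind of distinction the abstract says separate analysis misses.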