Knowledge Base Population: An Orchestrated Path from Content to Insight

By Ransom Winder, Ph.D., Joseph Jubinski, Samuel Bayer, Nathan Giles, Merwyn Taylor, Jason Duncan

Automatically extracting facts from unstructured content into a knowledge base can provide important insights. The Knowledge Base Population effort uses a top-down system design to integrate tools, following a repeatable approach for creating knowledge across different domains.


Unstructured content, especially text, contains important facts about entities of interest. Because manually reviewing all available content for these facts is infeasible, a viable solution is to extract them automatically into a knowledge base. This process involves recognizing, collecting, and adapting the important content into a machine-readable format that can then be queried and explored to answer questions about entities, their properties and relationships, and the inferences that can be drawn from them. Applying such techniques introduces challenges of organization, consistency, flexibility, and extensibility to new technologies or different domains.

The Knowledge Base Population (KBP) work addresses these challenges through a top-down system design that can integrate established or cutting-edge tools and technologies to support each stage of its workflow, while aiming to develop a repeatable approach and reusable architectural design that will serve many different domains. This initial work suggests a core workflow supporting the most fundamental activities and lays out potential extensions once the baseline has been engineered.
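As a rough illustration of the recognize, collect, adapt, and query steps described above, the following Python sketch populates a tiny in-memory knowledge base from text. It is not the KBP design itself: the `recognize` function, the `works_for` relation, the `KnowledgeBase` class, and the sample sentences are all hypothetical names and data chosen for this example. A simple regular expression stands in for whatever extraction tool a real workflow would integrate, and the recognizer is passed in as a parameter to suggest how one stage might be swapped for another tool.

```python
import re

# Hypothetical recognizer: a regular expression stands in for a real
# extraction tool. It matches sentences of the form "<Name> works for <Name>".
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"
PATTERN = re.compile(rf"(?P<subj>{NAME}) works for (?P<obj>{NAME})")


def recognize(text):
    """Recognize candidate facts as (subject, relation, object) triples."""
    return [
        (m.group("subj"), "works_for", m.group("obj"))
        for m in PATTERN.finditer(text)
    ]


class KnowledgeBase:
    """Minimal machine-readable store: subject -> set of (relation, object)."""

    def __init__(self):
        self._by_subject = {}

    def add(self, triple):
        subj, rel, obj = triple
        # A set drops duplicate facts collected from different documents.
        self._by_subject.setdefault(subj, set()).add((rel, obj))

    def query(self, subj, rel):
        """Return the sorted objects related to `subj` by `rel`."""
        facts = self._by_subject.get(subj, ())
        return sorted(o for r, o in facts if r == rel)


def populate(documents, recognizer=recognize):
    """Run the recognizer over each document and collect facts into a store."""
    kb = KnowledgeBase()
    for doc in documents:
        for triple in recognizer(doc):
            kb.add(triple)
    return kb


# Invented sample content, used only to exercise the sketch.
docs = [
    "Alice Smith works for Acme Corp.",
    "Bob Jones works for Acme Corp. Alice Smith works for Acme Corp.",
]
kb = populate(docs)
print(kb.query("Alice Smith", "works_for"))
```

Keeping the recognizer separate from the store mirrors, in a very small way, the idea of a workflow whose stages can each be supported by a different tool without changing the rest of the pipeline.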