MITRE’s novel LILAC™ Test Harness designed to test for problematic content in generative chatbot responses

MITRE LILAC Test Harness
The MITRE-developed List of Interventions for Large Language Model-Assisted Chatbots (MITRE LILAC) is a toolkit for identifying and mitigating problematic outputs from chatbots powered by generative artificial intelligence. Initially developed under our independent research and development program, MITRE LILAC defines 24 different types of harmful content based on real open-source incident reports, content that can lead to negative outcomes for the users and deployers of chatbot applications. The LILAC™ codebase can scan and flag chatbot responses for problematic content during operation, and it includes a battery of tests to evaluate chatbots and other language model applications for acquisition or deployment. The toolkit's prompt library can also be integrated with your existing off-the-shelf or cloud-based AI platform.
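
To illustrate the kind of in-operation scan-and-flag intervention described above, the sketch below wraps a chatbot call with a response scan before the reply reaches the user. It is a minimal sketch only: the names lilac_scan, ScanResult, and guarded_reply are hypothetical placeholders for illustration and are not the actual MITRE LILAC interface.

# Minimal sketch (hypothetical API): screen a chatbot reply before it reaches the user.
# The scanner stub stands in for the toolkit's detectors; only the control flow is shown.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ScanResult:
    flagged: bool
    categories: list[str] = field(default_factory=list)  # e.g., matched harmful-content types

def lilac_scan(response_text: str) -> ScanResult:
    """Placeholder scan that would check a response against problematic-content categories."""
    # A real scanner would apply content detectors here; this stub never flags anything.
    return ScanResult(flagged=False)

def guarded_reply(generate_reply: Callable[[str], str], user_prompt: str) -> str:
    """Generate a chatbot reply, scan it, and intervene if it is flagged."""
    reply = generate_reply(user_prompt)
    result = lilac_scan(reply)
    if result.flagged:
        # Intervene: withhold the flagged reply and record why it was blocked.
        print(f"Flagged categories: {result.categories}")
        return "I'm sorry, I can't provide that response."
    return reply

if __name__ == "__main__":
    # Any callable that maps a prompt to a reply can be wrapped this way.
    def echo_bot(prompt: str) -> str:
        return f"You asked: {prompt}"
    print(guarded_reply(echo_bot, "What are your hours?"))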
MITRE LILAC can improve the testing and use of generative language applications across multiple domains by providing scores, grounded in real incidents, for benchmarking and comparing solutions. It was developed especially with public-facing applications in mind, such as customer service, public health, and government services, but many of the problems it identifies are just as critical for expert-facing systems. By helping to address the risks of inaccurate and wasteful chatbot responses more thoroughly, the toolkit promotes effective and efficient application of generative technologies.
For additional information about the research methodology, see the LILAC technical report: https://www.mitre.org/news-insights/publication/emerging-risks-and-mitigations-public-chatbots-lilac-v1
To learn more about this technology or to inquire about licensing opportunities, contact our Technology Transfer Office at techtransfer@mitre.org.