IT Modernization Pipeline and Roundtrip Code Conversion

Copyrighted

MITRE’s systematic process and methodology establish the stages and quality assessment phases of legacy software modernization.

MITRE’s innovative software modernization pipeline provides a disciplined approach to achieve faster and less costly modernization of legacy IT systems. The pipeline determines the required phases of modernization and identifies the aspects of the modernization process that may be automated. It then applies large language models (LLMs) to these phases to increase the quality and efficiency of legacy IT modernization.

The IT modernization (ITMOD) pipeline begins with onboarding of legacy software into a secure environment equipped with private versions of current LLMs. Once the legacy software is onboarded, custom MITRE-developed tools leverage commercially available LLMs, in accordance with mission owner security requirements, to begin the modernization process.

Within the pipeline itself, MITRE’s IT Modernization independent research investments produced two novel LLM-based self-evaluation methods. The first uses multiple-choice quizzing and answering to allow LLMs to identify “hotspots” in code that lead to translation errors. The second uses LLMs to self-identify and categorize translation errors throughout the modernization process. These LLM self-evaluation techniques accelerate identification of common translation errors and show promise for finding uncommon, tricky translation errors.

The ITMOD pipeline contains additional evaluation methods that provide quality measurements for intermediate representations (IRs), which are non-code, LLM-generated outputs. The pipeline generates documentation from both legacy and converted code, including code comments, UML diagrams, requirements documents, pseudocode, and summary descriptions. These IRs are used to improve LLM code translation, providing a better starting point for human-driven modernization and accelerating translation from legacy languages.

The pipeline also includes several MITRE-developed capabilities and considerations that are stand-alone components outside of the methodology:

  1. A checklist of the controls required to make the software modernization environment suitable for hosting sensitive codebases, along with the required deployment architecture describing the environment.
    1. This allows sensitive data to be hosted under approved security controls, with continuous monitoring for real-time protection.
  2. A MITRE-developed algorithm that enables a more structured preparation of the code, using the Tree-sitter library to build a concrete syntax tree for a source file.
    1. The algorithm determines functional blocks of code and chunks them into logical pieces that fit within the context window, thereby improving the comprehension of code during prompting.
  3. An innovative methodology that evaluates LLM-generated code comments for legacy code using a MadLibs-style approach. It parses and removes the original comments from the legacy codebase and replaces them with uniquely identified placeholder tags, which the LLM then fills with generated comments that replace or augment the human-created ones. This method allows software owners to automatically update or augment code comment documentation.
  4. A software package employing standard metrics that have been researched and tested to be directly applicable to the task of legacy IT modernization. The implementation of these metrics is a novel contribution and allows for programmatic evaluation of an LLM's ability to understand code concepts in legacy languages.
  5. A report card that combines and displays metrics on LLM performance to inform engineer and mission stakeholder decisions on quality and acceptance of LLM outputs.
  6. Highly automated LLM self-evaluation techniques that identify and categorize common LLM errors in converted, modernized code and predict “hotspots” that will be difficult for LLMs to translate.
  7. A process and playbook for using IRs, generating modern code from legacy code effectively, and iteratively refining the code generation process based on expert input and common error modes.
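
The chunking idea in item 2 can be illustrated with a small sketch. This is not MITRE's algorithm: it swaps in Python's standard-library `ast` module for Tree-sitter so the example runs without extra dependencies, and it approximates the context window with a character budget. The function name `chunk_source` and the `max_chars` parameter are hypothetical.

```python
import ast

def chunk_source(source: str, max_chars: int) -> list[str]:
    """Greedily pack top-level syntax-tree nodes into chunks of at most max_chars.

    Assumptions (illustrative only): Python's `ast` stands in for Tree-sitter,
    and a character count stands in for an LLM token budget. Unlike a concrete
    syntax tree, `ast` drops comments that sit between top-level nodes.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    current = ""
    for node in tree.body:
        # Decorators sit above node.lineno, so start from the earliest one.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        block = "\n".join(lines[start - 1:node.end_lineno])
        candidate = block if not current else current + "\n\n" + block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow the budget: close the chunk.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single block larger than max_chars is kept whole as its own chunk.
    return chunks

SAMPLE = '''import math

def area(r):
    return math.pi * r * r

def perimeter(r):
    return 2 * math.pi * r
'''

chunks = chunk_source(SAMPLE, max_chars=60)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ({len(chunk)} chars) ---")
    print(chunk)
```

Splitting only at top-level node boundaries keeps each function intact inside one chunk, which is the property the item describes. A production version would need a real tokenizer and a strategy for blocks that exceed the budget on their own.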
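
The placeholder step in item 3 can be pictured with a similar minimal sketch. It assumes full-line comments introduced by a single prefix and uses a hand-written dictionary in place of real LLM output. The names `mask_comments`, `fill_comments`, and the `<COMMENT_n>` tag format are hypothetical and not taken from MITRE's tooling.

```python
def mask_comments(source: str, prefix: str = "//") -> tuple[str, dict[str, str]]:
    """Replace each full-line comment with a uniquely numbered placeholder tag.

    Assumption: only lines that start with `prefix` (after indentation) are
    treated as comments. Trailing comments and block comments are ignored.
    """
    masked_lines: list[str] = []
    originals: dict[str, str] = {}
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(prefix):
            tag = f"<COMMENT_{len(originals) + 1}>"
            # Keep the human-written text so it can serve as a reference later.
            originals[tag] = stripped[len(prefix):].strip()
            indent = line[:len(line) - len(stripped)]
            masked_lines.append(f"{indent}{prefix} {tag}")
        else:
            masked_lines.append(line)
    return "\n".join(masked_lines), originals

def fill_comments(masked: str, generated: dict[str, str]) -> str:
    """Substitute generated comment text for each placeholder tag."""
    for tag, text in generated.items():
        masked = masked.replace(tag, text)
    return masked

LEGACY = """// compute gross pay
pay = hours * rate;
    // apply flat tax
    net = pay * 0.8;"""

masked, originals = mask_comments(LEGACY)

# Stand-in for LLM output: in practice a model would be prompted with `masked`.
generated = {
    "<COMMENT_1>": "multiply hours by hourly rate",
    "<COMMENT_2>": "keep 80% after tax",
}
filled = fill_comments(masked, generated)
print(masked)
print(filled)
```

Because every tag is unique, each generated comment can be paired with the human-written comment it replaced (`originals`), which is one plausible basis for the kind of evaluation the item mentions.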

The ITMOD pipeline has broad applicability across industry and government sectors with aging software systems that support critical business operations and embed complex policies and rules. It supports entities looking for an end-to-end modernization pipeline. Combined with MITRE's comprehensive legacy system expertise and software engineering best practices, the pipeline improves the assurance, quality, and efficiency of critical legacy IT modernization using LLMs.

For more information on MITRE's IT system modernization technology, or to inquire about licensing opportunities, contact our Technology Transfer Office at techtransfer@mitre.org.