Assess Test and Evaluation Plans and Procedures

Definition: Test and evaluation is the set of practices and processes used to determine whether the product under examination meets the design, whether the design correctly reflects the functional requirements, and whether the product performance satisfies the usability needs of personnel in the field.

Keywords: acceptance test, integration test, operational test, peer reviews, system test

MITRE SE Roles & Expectations: MITRE systems engineers (SEs) are expected to be familiar with different kinds of tests and know which group conducts the tests, how to evaluate test documents, and what the developer's procedures are for test control. MITRE SEs are also expected to analyze test data.


Testing is the way a product, system, or capability under development is evaluated for correctness and robustness and is proved to meet the stated requirements. Testing is done at each stage of development and has characteristics unique to the level of test being performed. At a macro level, testing can be divided into testing conducted before the system is placed under configuration management and testing conducted after. Testing done before configuration management includes peer reviews (sometimes called human testing) and unit tests. Testing done after configuration management includes integration test, system test, acceptance test, and operational test. Government testing agencies normally conduct the operational test. The developer conducts the other tests; in some cases, such as the acceptance test, government observers are present.

Assessing Test and Evaluation Plans and Procedures

Assessment normally begins with the Test and Evaluation Master Plan (TEMP), which is the driver for much of what follows. The government develops the TEMP; the developer creates detailed test plans and procedures. The nature of the program, the life cycle, the user needs, and the user mission drive the TEMP's scope, direction, and content. For example, testing software developed for the program is quite different from testing systems that are largely based on, and require considerable integration of, commercial off-the-shelf (COTS) products. The TEMP influences the testing documents that the developer produces, but the developer's documents are largely driven by what it produces and is working to deliver.

The government program management office (PMO) is tasked with assessing the developer's test and evaluation plans and procedures. Often MITRE plays a central role in helping the PMO perform this assessment. The requirements on which the developer's test plans and procedures are based must be well crafted. A valid requirement is one that is measurable and testable. If it is not measurable and testable, it is a poor requirement. Developer test plans and procedures should be based on the functional requirements, not on the software design. Both the test community within the developer organization and the development community should base their products on the functional requirements.

Assessing the developer's test plans and procedures should focus on the purpose of the test—that is, to assess the correctness and robustness of the product, system, or service. The tests should first prove that the product can do what it is intended to do and second that it can withstand anomalous conditions that may arise. This second point requires particular care because there are huge differences in how robustness is validated in a COTS-based system versus software developed for a real-time embedded system. The environment in many COTS-based business systems can be tightly bounded. A name or address field can be limited in terms of acceptable characters and field length. In a real-time embedded system, you know what the software expects to receive if all is going as it should, but you do not always know what input data might actually arrive, which can vary in data type, data rate, and so on. Denial-of-service attacks often try to overwhelm a system with data, and the developer's skill in building into the system a robustness that allows it to handle data it is not intended to process has a great deal to do with the eventual reliability and availability of the delivered product. It is not unusual for the error protection logic in complex government systems to be as large as, or larger than, the operational software.
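For the tightly bounded COTS-style case, a robustness test can enumerate the field's boundaries directly. The sketch below is illustrative only (the `validate_name` function and its limits are hypothetical, not from this guide); it shows valid input accepted and anomalous input rejected at the boundary rather than allowed to crash the system:

```python
# Hypothetical field validator for a COTS-style business system: the input
# domain is tightly bounded, so robustness tests can enumerate the boundaries.
import re

MAX_NAME_LENGTH = 50
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z .'\-]*$")

def validate_name(value):
    """Accept a name field only if it fits the bounded domain; reject anomalies."""
    if not isinstance(value, str):
        return False                        # wrong data type
    if not (1 <= len(value) <= MAX_NAME_LENGTH):
        return False                        # empty or over the field length
    return bool(NAME_PATTERN.match(value))  # unacceptable characters

# Robustness checks: valid input passes; anomalous inputs are rejected, not crashed on.
assert validate_name("O'Brien") is True
assert validate_name("") is False                        # empty field
assert validate_name("A" * 51) is False                  # exceeds field length
assert validate_name("Robert'); DROP TABLE--") is False  # unexpected characters
assert validate_name(12345) is False                     # wrong data type
```

The same idea does not transfer directly to the real-time embedded case, where the space of possible inputs cannot be enumerated and robustness must instead be probed with varied data types and rates.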

Assessment of the test plans and procedures must take all of these issues into account. The assessor must understand the nature and purpose of the system and the kind of software involved, and the assessor must have the experience to examine the test plans and procedures to ensure they do an appropriate job of verifying that the software functions as intended. The assessor must also verify that, when faced with anomalous data conditions, the software will respond and deal with the situation without crashing. The test conditions in the test plans and procedures should present a wide variety of data conditions and record the responses.

For software systems, especially real-time systems, it is impossible to test all possible paths through the software, but it should be possible to test all independent paths to ensure that the tests exercise all segments of the software. Software tools such as the McCabe suite can facilitate this by identifying the independent paths and the test conditions to include in each test case. However it is accomplished, this level of rigor is necessary to ensure the requisite reliability has been built into the software.
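To illustrate independent-path coverage, the hypothetical function below has a McCabe cyclomatic complexity of 3 (two decision points plus one), so three test cases, one per independent path, are enough to exercise every segment even though the space of all input values is far larger:

```python
# Illustrative function with cyclomatic complexity 3 (two decisions + 1),
# so basis-path testing requires three test cases, one per independent path.
def classify_reading(value, threshold, critical):
    if value < threshold:   # decision 1
        return "nominal"
    if value < critical:    # decision 2
        return "elevated"
    return "critical"

# One test case per independent path exercises every code segment.
assert classify_reading(5, 10, 20) == "nominal"    # first branch taken
assert classify_reading(15, 10, 20) == "elevated"  # second branch taken
assert classify_reading(25, 10, 20) == "critical"  # neither branch taken
```

Tools such as the McCabe suite automate exactly this bookkeeping at scale: counting the independent paths and proposing the conditions needed to drive execution down each one.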

Unlike the unit test, the integration test plans and procedures focus on the interfaces between program elements. These tests must verify that the data being passed between program elements will allow the elements to function as intended, while also ensuring that anomalous data conditions are dealt with at their entry point and not passed to other programs within the system. The assessor must pay particular attention to this when assessing the integration test plans and procedures. These tests must be driven by the functional requirements because those drive what the software must do in order for the sponsor to accept the system.
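A sketch of the interface discipline described above, with hypothetical element and field names: the receiving element validates each record at its entry point and stops anomalous data there instead of passing it to other programs in the system:

```python
# Sketch of an interface contract between two program elements: the receiving
# element validates incoming records at its entry point and rejects anomalies
# rather than passing them downstream. All names here are illustrative.
def ingest_track(record, downstream):
    """Entry point of the receiving element: validate, then forward."""
    required = {"track_id", "lat", "lon"}
    if not isinstance(record, dict) or not required.issubset(record):
        return False  # malformed record stopped at the interface
    if not (-90 <= record["lat"] <= 90 and -180 <= record["lon"] <= 180):
        return False  # out-of-range data stopped at the interface
    downstream(record)
    return True

# Integration-style checks: good data crosses the interface; bad data does not.
received = []
assert ingest_track({"track_id": 1, "lat": 38.9, "lon": -77.1}, received.append) is True
assert ingest_track({"track_id": 2, "lat": 999, "lon": 0}, received.append) is False
assert ingest_track("not a record", received.append) is False
assert len(received) == 1  # only the valid record reached the downstream element
```

Integration test procedures should exercise both sides of this contract: that well-formed data flows through, and that anomalies are contained at the first interface they reach.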

Test and Evaluation Phases

Pre-Configuration Management Testing

The two primary test practices conducted prior to configuration management are:

  • Peer Reviews: Peer reviews are performed to find as many errors as possible in the software before the product enters the integration test. Peer reviews are a key process area at Level 3 of the Software Engineering Institute's (SEI) Capability Maturity Model. The SEI accepts two kinds of peer reviews:
    • Software inspections (SEI preference)—have a well-defined process understood throughout the industry. Done properly, they can remove as much as 87 percent of the life-cycle errors in software. (They are sometimes called Fagan Inspections after their developer, Mike Fagan.)
    • Code walkthroughs—have no standard process. They can have widely differing levels of rigor and effectiveness and at best will remove about 60 percent of the errors in software.
  • Unit Test: The developer conducts the unit test, typically on the individual modules under development. The unit test often requires the use of drivers and stubs because other modules, which are the source of input data or receive the output of the module being tested, are not ready for test.
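The stub-and-driver arrangement described for unit testing can be sketched with Python's standard unittest.mock standing in for an upstream module that is not yet ready (the `compute_average` module and its interface are hypothetical):

```python
# Unit-test sketch: the module under test depends on a data source that is not
# yet available, so a stub stands in for it; the test code acts as the driver.
from unittest import mock

def compute_average(source):
    """Module under test: average the samples returned by its data source."""
    samples = source.read_samples()
    return sum(samples) / len(samples) if samples else 0.0

# Stub replaces the unfinished upstream module that would supply input data.
stub_source = mock.Mock()
stub_source.read_samples.return_value = [10, 20, 30]
assert compute_average(stub_source) == 20.0

stub_source.read_samples.return_value = []
assert compute_average(stub_source) == 0.0  # robustness: empty input handled
```

When the real upstream module completes its own unit test, the stub is retired and the same interface is exercised again during integration test.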

Post-Configuration Management Testing

Testing conducted after the product is placed under developer configuration control includes all testing beyond unit test. Once the system is under configuration management, a problem discovered during testing is recorded as a trouble report. This testing phase becomes progressively more expensive because it involves integrating more and more modules and functional units as they become available; the system therefore becomes increasingly complex. Each test requires a documented test plan and procedure, and each problem encountered is recorded on a trouble report. Each proposed fix must be validated against the test procedure during which the problem was discovered, and testing must also verify that the code inserted to correct the problem does not cause another problem elsewhere. With each change made to respond to a problem, the associated documentation must be updated, the fix must be documented as part of the configuration management process, and the fix must be included in the next system build so that testing is not conducted with patches. The longer it takes to find a problem, the more rework is likely and the more impact the fix may have on other system modules, so the expense continues to increase. Thus performing good peer reviews and unit tests is very important.

  • Integration Test: This developer test grows successively more complex. It begins by integrating the component parts, either modules that have completed the unit test or COTS products, to form functional elements. The integration test progresses from integration of modules to form entire functional elements, to integration between functional elements, to software-hardware integration testing. Modeling and simulation are often used to provide an operational-like testing environment. An integration test is driven by an integration test plan and a set of integration test procedures. Typically an integration test will have embedded within it a subset of tests identified as regression tests, which are conducted following a system build. Their objective is to verify that the build process did not create a serious problem that would prevent the system from being properly tested. Often regression tests can be automated.
  • Test Data Analysis: Peer reviews, unit tests, integration testing, and system tests generate a significant amount of data, and metric analysis of that data shows the state of the system. Significant metric data is produced on defect density, pass-fail results for test procedures, error trend analysis, etc. MITRE SEs should be familiar with test metrics and evaluate the test results to determine the likelihood that the system can meet its performance requirements on time and within budget.
  • System Test: This is an operational-like test of the entire system being developed. Following a successful system test, a determination is made as to whether the system is ready for acceptance test. After the completed system test and before the acceptance test, a test readiness review (TRR) may be conducted to assess the readiness of the system to enter the acceptance test.
  • Acceptance Test: Witnessed by the government, this is the last test before the government formally accepts the system. Similar to the system test, the acceptance test is often a subset of the procedures run during system test.
  • Operational Test: Performed by an operational unit of the government, this is the final test before the system is declared ready for general distribution to the field.
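As noted under Integration Test, the regression subset run after each system build is often automated. A minimal sketch using Python's unittest follows; the two system functions are stand-ins for whatever interface the real build actually exposes:

```python
# Minimal automated regression subset, run after each system build to confirm
# the build did not introduce a problem that would block further testing.
import unittest

def system_version():
    return "1.4.2"   # stand-in: query the built system's version string

def system_self_test():
    return "PASS"    # stand-in: invoke the system's built-in self-test

class BuildRegressionTests(unittest.TestCase):
    def test_build_reports_expected_version(self):
        self.assertEqual(system_version(), "1.4.2")

    def test_built_in_self_test_passes(self):
        self.assertEqual(system_self_test(), "PASS")

# Run the regression subset programmatically (e.g., from a build script).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BuildRegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the subset is small and scripted, it can gate every build: a failure here stops the build from entering further integration or system testing.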

Best Practices and Lessons Learned

Examine the reports on the pre-configuration management tests. Evaluate the error density information in these reports to determine the failure rates that should be expected during subsequent test periods.

Review the peer review and unit test results prior to the start of integration testing. Due to the expense and time needed to correct problems discovered in the post-configuration management tests, SEs should understand how thorough the prior tests were and whether there is a hint of any issues that need to be addressed before the integration test starts. If peer reviews and unit tests are done properly, the error density trend data during the integration test should show an error density of 0.2 to 1.2 defects per 1,000 source lines of code.
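The defect-density band above can be checked mechanically once defect counts and code size are in hand. A small sketch with illustrative numbers (the counts are hypothetical, not from any program):

```python
# Defect density normalized to 1,000 source lines of code (KSLOC), compared
# against the 0.2-1.2 defects/KSLOC band expected after good peer reviews
# and unit tests. The input figures are illustrative only.
def defects_per_ksloc(defects_found, source_lines):
    """Defects per 1,000 source lines of code."""
    return defects_found / (source_lines / 1000.0)

density = defects_per_ksloc(defects_found=18, source_lines=25000)
assert abs(density - 0.72) < 1e-9
assert 0.2 <= density <= 1.2  # within the expected band for integration test
```

A density well above the band suggests the pre-configuration management tests were not thorough; one far below it may mean the integration tests themselves lack rigor.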

Consider modeling and simulation options. These could support or substitute for some aspects of integration that are either of lower risk or extremely expensive or complex to perform with the actual system.

Complete a thorough independent review of the test results to date prior to supporting the TRR. This is especially true for performance or design areas deemed to be of the greatest risk during the design phase. After the TRR is passed and the program enters acceptance testing, correcting problems is extremely expensive and time consuming.

Involve the government Responsible Test Organization (RTO) early. Involve the RTO (during the concept development phase is not too early) so they understand the programmatic and technical issues on the program. Including the RTO as part of the team with the acquisition and engineering organizations will lessen conflicts between the acquisition organization and RTO due to lack of communication and misunderstanding of objectives.

References & Resources

Department of Defense, December 2012, Test & Evaluation Management Guide, Ch. 12, TEMP.

Department of Justice, January 2003, Systems Development Life Cycle Guidance Document, Appendix C-15, Test and Evaluation Master Plan.

Federal Aviation Administration, April 2014, Test and Evaluation Process Guidelines.

