Definition: Performance engineering is a specialty systems engineering discipline that encompasses the practices, techniques, and activities required during each phase of the Systems Development Life Cycle to ensure that a proposed or existing solution will meet its non-functional requirements. Non-functional requirements specify the criteria used to judge the operation of a system rather than its specific behaviors or functions.
Keywords: capacity planning, design validation, feasibility, instrumentation, load testing, measurement, modeling and simulation, monitoring, requirements validation, response time, scalability, stress testing, throughput
MITRE SE Roles & Expectations: MITRE systems engineers (SEs) are expected to understand the purpose and role of performance engineering in the acquisition process, where it occurs in systems development, and the benefits of employing it. MITRE SEs are also expected to understand and recommend when performance engineering is appropriate to a situation. Some aspects of performance engineering are often associated with specialty engineering disciplines. Others, however, are the purview of mainstream systems and design engineering (e.g., many of the dimensions of usability). MITRE SEs are expected to monitor and evaluate performance engineering technical efforts and the acquisition program's overall performance engineering activities and recommend changes when warranted, including the need to apply specialty engineering expertise.
Performance Engineering Scope
Performance engineering focuses on the ability of systems to meet their non-functional requirements. A non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. It may address a property the end product must possess, the standards by which it must be created, or the environment in which it must exist. Examples are usability, maintainability, extensibility, scalability, reusability, security, and transportability. Performance engineering activities occur in each phase of the Systems Development Life Cycle. These activities include defining non-functional requirements; assessing alternative architectures; developing test plans, procedures, and scripts to support load and stress testing; conducting benchmarking and prototyping activities; incorporating performance into software development; monitoring production systems; performing root cause analysis; and supporting capacity planning activities. The performance engineering discipline is grounded in expertise in modeling and simulation, measurement techniques, and statistical methods.
Traditionally, much of performance engineering has been concerned with the performance of hardware and software systems, focusing on measurable items such as throughput, response time, and utilization, as well as some of the "-ilities"— availability, reliability, scalability, and usability. Tying the performance of hardware and software components to the mission or objectives of the enterprise should be the goal when conducting performance engineering activities. Doing so presents performance results to stakeholders in a more meaningful way.
Although performance engineering activities are most often associated with the hardware and software elements of a system, the discipline's principles and techniques can be applied to other aspects of systems that can be measured in some meaningful way, including, for example, business processes. In the most simplistic sense, a system accepts an input and produces an output. Therefore, performance engineering is applicable not only to individual systems but also to networks of systems, enterprises, and other examples of complex systems.
As an example, given the critical nature of air traffic control systems, their ability to meet non-functional requirements, such as response time and availability, is vital to National Airspace System (NAS) operations. Though there are many air traffic control systems within the NAS, the NAS itself is an example of an enterprise comprising people, processes, hardware, and software, among other things. At any given time, the NAS has a finite capacity; however, an opportunity exists to increase that capacity through more efficient processes or new technology. The NAS is an example of a non-IT system to which performance engineering techniques can be applied.
Performance Engineering Across the Systems Engineering Life Cycle
As illustrated in Figure 1, the activities associated with performance engineering span the entire systems life cycle— from Pre-Systems Acquisition through Sustainment. Although performance engineering is recognized as fundamental in manufacturing and production, its activities should begin earlier in the system life cycle, when an opportunity exists to influence the concept or design to ensure that performance requirements can be met. Performance engineering techniques can be used to determine the feasibility of a particular solution or to validate the concept or requirements in the Pre-Systems Acquisition stage of the life cycle. Likewise, performance engineering techniques can be used to conduct design validation.
Figure 1. Performance Engineering in the System Life Cycle
Performance Engineering Activities
Performance engineering includes various risk reduction activities that ensure that a system can meet its non-functional requirements. Performance engineering techniques can be used to validate various aspects of a planned system (whether new or evolving). For instance, performance engineering is concerned with validating that the non-functional requirements for a particular system are feasible even before a design for that system is in place. In this regard, requirements validation ensures that the non-functional requirements, as written, can be met using a reasonable architecture, design, and existing technology.
Once a design is in place, performance engineering techniques can be used to ensure that the particular design will continue to meet the non-functional requirements prior to actually building that system. Design validation is a form of feasibility study used to determine whether the design is feasible with respect to meeting the non-functional requirements. Likewise, performance engineering activities can be used, as part of a technology assessment, to assess a particular high-risk aspect of a design.
Finally, tradeoff analysis is related to all of the activities mentioned previously in that performance engineering stresses the importance of conducting a what-if analysis—an iterative exploration in which various aspects of an architecture or design are traded off to assess the impact. Performance modeling and simulation as well as other quantitative analysis techniques are often used to conduct design validation as well as tradeoff, or what-if, analyses.
Once a system is deployed, it is important to monitor and measure function and performance to ensure that problems are alleviated or avoided. Monitoring a system means being aware of the system's state in order to respond to potential problems. There are different levels of monitoring. At a minimum, monitoring should reveal whether a particular system component is available for use. Monitoring may also include the collection of various measurements such as the system load and resource utilization over time. Ideally, availability and measurement data collected as part of the monitoring process are archived in order to support performance analysis and to track trends, which can be used to make predictions about the future. If a permanent measuring and monitoring capability is to be built into a system, its impact on overall performance must be taken into consideration during the design and implementation of that system. This impact is characterized as measurement overhead and should be factored into the overall performance measurement of the system.
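The collection-and-archiving pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the probe function, names, and intervals are placeholders (a real monitor would query the operating system or a monitoring agent), and the point is that each timestamped sample is archived for later trend analysis while the probe's own cost is tracked as measurement overhead.

```python
import time
from collections import deque

def sample_utilization():
    """Hypothetical probe; stands in for a real OS or agent query.
    Returns the fraction of capacity currently in use."""
    return 0.42  # placeholder reading

def monitor(probe, interval_s, n_samples):
    """Collect and archive timestamped samples from `probe`, tracking
    the probe's own cost so measurement overhead can be reported
    alongside the data it produces."""
    archive = deque()
    overhead_s = 0.0
    for _ in range(n_samples):
        start = time.perf_counter()
        value = probe()                       # the measurement itself
        overhead_s += time.perf_counter() - start
        archive.append((time.time(), value))  # archive for trend analysis
        time.sleep(interval_s)
    return list(archive), overhead_s

samples, overhead = monitor(sample_utilization, interval_s=0.01, n_samples=5)
```

In a permanent monitoring capability, `overhead_s` relative to the sampling interval is one way to quantify the measurement overhead the design must budget for.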
System instrumentation is concerned with the measurement of a system, under controlled conditions, to determine how that system will respond under those conditions. Load testing is a form of system instrumentation in which an artificial load is injected into the system to determine how the system will respond under that load. Understanding how the system responds under a particular load implies that additional measurements, such as response times and resource utilizations, must be collected during the load test activity as well. If the system is unable to handle the load such that the response times or resource utilizations increase to an unacceptable level or show an unhealthy upward trend, it may be necessary to identify the system bottleneck. A system bottleneck is a component that limits the throughput of the system and often impacts its scalability. A scalable system is one whose throughput increases proportionally to the capacity of the hardware when hardware is added. Note that elements like load balancing components can affect the proportion by which capacity can be increased. Careful planning is necessary to ensure that analysis of the collected data will reveal meaningful information.
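The non-linear relationship between offered load and response time near a bottleneck can be illustrated with the textbook M/M/1 queueing approximation, a standard tool in performance analysis (not something prescribed by this guide). Under the simplifying assumptions of that model, mean response time is R = S / (1 - U), where S is the service demand and U the utilization, so response time degrades sharply as load approaches the bottleneck's capacity:

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - U),
    where utilization U = arrival_rate * service_time.
    Valid only while U < 1; beyond that the queue is unstable."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("offered load exceeds the bottleneck's capacity")
    return service_time / (1.0 - utilization)

# With a 10 ms service demand, response time degrades non-linearly
# as the load approaches the 100 req/s capacity of the bottleneck:
for rate in (10, 50, 90):  # requests per second
    r = mm1_response_time(rate, service_time=0.01)
    print(f"{rate:3d} req/s -> {r * 1000:.1f} ms")
```

This is why a load test that only measures throughput can miss an impending problem: utilization of 90 percent already implies a tenfold increase in response time under these assumptions.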
Finally, capacity planning is a performance engineering activity that determines whether a system is capable of handling increased load that is predicted in the future. Capacity planning is related to all the activities mentioned previously— the ability to respond to predicted load and still meet non-functional requirements is a cornerstone of capacity planning. Furthermore, measurements and instrumentation are necessary elements of capacity planning. Likewise, because bottlenecks and non-scalable systems limit the capacity of a system, the activities associated with identifying bottlenecks and scalability are closely related to capacity planning as well.
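One simple capacity planning technique is to fit a trend to archived load measurements and project when the trend crosses a known capacity limit. The sketch below assumes a linear growth trend and uses invented sample data; real capacity planning would use validated measurements and may need more sophisticated models:

```python
def fit_trend(xs, ys):
    """Ordinary least-squares line through historical load observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def months_until_saturation(months, loads, capacity):
    """Project the fitted trend forward to where it crosses capacity.
    Returns None if load is flat or shrinking."""
    slope, intercept = fit_trend(months, loads)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope

# Illustrative (invented) history: transactions/sec over six months,
# against a measured system capacity of 300 transactions/sec.
history_months = [1, 2, 3, 4, 5, 6]
history_loads = [100, 110, 122, 130, 141, 150]
print(months_until_saturation(history_months, history_loads, capacity=300))
```

The projection ties the earlier activities together: the capacity limit comes from load testing and bottleneck analysis, and the trend comes from archived monitoring data.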
Best Practices and Lessons Learned
System vs. Mission Performance. The ability to tie the performance of hardware or software or network components to the mission or objectives of the enterprise should be the goal. This allows the results of performance engineering studies to be presented to stakeholders in a more meaningful way. It also serves to focus testing on outcomes that are meaningful. For example, central processing unit utilization by itself is not meaningful unless it is the cause of a mission failure or a significant delay in processing critical real-time information.
Early Life-Cycle Performance Engineering. Too often, systems are designed and built without doing the early performance engineering analysis associated with the Pre-Systems Acquisition stage shown in Figure 1. When performance engineering is bypassed, stakeholders are often disappointed and the system may even be deemed unusable. Although it is common practice to optimize the system after it is built, the cost associated with implementing changes to accommodate poor performance increases with each phase of the system's life cycle, as shown in Figure 2. Performance engineering activities should begin early in the system's life cycle, when an opportunity exists to influence the concept or design of the system in a way that ensures performance requirements can be met.
Figure 2. The Cost of Change
Risk Reduction. Performance engineering activities are used to validate that the non-functional requirements for a particular system are feasible even before a design for that system is in place, and especially to assess a particular high-risk aspect of a design in the form of a technology assessment. Without proper analysis, it is difficult to identify and address potential performance problems that may be inherent to a system design before that system is built. Waiting until system integration and test phases to identify and resolve system bottlenecks is too late.
Tradeoff Analysis. Performance engineering stresses the importance of conducting a tradeoff, or what-if, analysis—an iterative analysis in which various aspects of an architecture or design are traded off to assess the impact.
Test-Driven Design. Under Agile Development methodologies, such as test-driven design, performance requirements should be a part of the guiding test set. This ensures that the non-functional requirements are taken into consideration at all phases of the engineering life cycle and not overlooked.
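One way to keep a non-functional requirement in the guiding test set is to express it as an executable assertion alongside the functional tests. The example below is a hypothetical sketch: the 50 ms budget, the `lookup` operation, and the workload sample are all invented for illustration, and wall-clock assertions like this can be sensitive to the test environment:

```python
import time

RESPONSE_TIME_BUDGET_S = 0.05  # hypothetical requirement: 50 ms for the workload

def lookup(records, key):
    """Operation under test; stands in for a real system call."""
    return records.get(key)

def test_lookup_meets_response_time_budget():
    """Functional tests check *what* the system does; this test checks
    that it does so within the stated non-functional budget."""
    records = {i: str(i) for i in range(100_000)}
    start = time.perf_counter()
    for key in range(0, 100_000, 1000):  # representative workload sample
        lookup(records, key)
    elapsed = time.perf_counter() - start
    assert elapsed < RESPONSE_TIME_BUDGET_S, f"budget exceeded: {elapsed:.3f}s"

test_lookup_meets_response_time_budget()
```

Because the test runs with every build, a change that regresses performance fails the suite immediately rather than surfacing during integration and test.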
Monitoring, Measurement, and Instrumentation. System instrumentation is a critical performance engineering activity. Careful planning is necessary to ensure that useful metrics are specified, that the right monitoring tools are put in place to collect those metrics, and that analysis of the collected data will reveal meaningful information.
Performance Challenges in Integrated Systems. Projects that involve off-the-shelf components or systems of systems introduce special challenges for performance engineering. Modeling and simulation may be useful in trying to anticipate the problems that arise in such contexts and to support root cause analysis should issues arise. System instrumentation and analysis of the resulting measurements may become more complex, especially if various subsystems operate on incompatible platforms. Isolating performance problems and bottlenecks may become more difficult as a problem initiated in one system or subsystem may emerge as a performance issue in a different component. Resolving performance engineering issues may require cooperation among different organizations, including hardware, software, and network vendors.
Predicting Usage Trends. Performance data collected as part of the monitoring process should be archived and analyzed on a regular basis in order to track trends, which can be used to make predictions about the future.
References & Resources
- Computer Measurement Group (CMG). A professional organization of performance professionals and practitioners. The CMG holds a yearly conference and publishes a quarterly newsletter.
- International Council on Systems Engineering (INCOSE). Recognizes performance engineering as part of the system life cycle.
- Association for Computing Machinery (ACM) [SIGSIM and SIGMETRICS]. Contains special interest groups in both simulation and measurement.
- The Society for Modeling and Simulation International (SCS). Expertise in modeling and simulation, which is used extensively in performance engineering activities.
- Information Technology Infrastructure Library (ITIL). Industry standard for IT service management (includes aspects of performance engineering).
- Federal Enterprise Architecture (FEA) Performance Reference Model. The PRM is a "reference model" or standardized framework to measure the performance of major IT investments and their contribution to program performance.
- Capability Maturity Model Integration (CMMI). CMMI is a process improvement approach that provides organizations with the essential elements of effective processes.
- The Standard Performance Evaluation Corporation (SPEC). A standards body for performance benchmarks. SPEC is an umbrella organization encompassing the efforts of the Open Systems Group.
- Transaction Processing Performance Council (TPC). The Transaction Processing Performance Council defines transaction processing and database benchmarks and delivers trusted results to the industry.
- Object Management Group (MARTE Profile). International, not-for-profit, computer industry consortium focused on the development of enterprise integration standards. MARTE is "Modeling and Analysis of Real-time and Embedded Systems."
- Gunther, Neil. Author of several performance engineering books.
- Maddox, Michael. 2005. A Performance Process Maturity Model. MeasureIT, Issue 3.06.
- Menasce, Daniel A. Professor at George Mason University. Author of several performance engineering books.
- Smith, Dr. Connie U., and Williams, Dr. Lloyd G. Creators of the well-known Software Performance Engineering (SPE) process and associated tool. Authors of "Performance Solutions" as well as numerous white papers.
- Jain, Dr. Raj. Professor at Washington University in St. Louis. Author of several performance engineering books and articles.