Definition: Performance engineering is a specialty systems engineering discipline that applies scientific, mathematical, engineering, and measurement concepts, principles, and methods to deliver a system that meets its nonfunctional performance-related requirements.
Keywords: capacity planning, design validation, feasibility, instrumentation, load testing, measurement, modeling and simulation, monitoring, requirements validation, response time, scalability, stress testing, throughput
MITRE SE Roles and Expectations: MITRE systems engineers (SEs) are expected to understand the purpose and role of performance engineering in the acquisition process, where it occurs in systems development, and the benefits of employing it. MITRE SEs are also expected to understand and recommend when performance engineering is appropriate to a situation. Some aspects of performance engineering are often associated with specialty engineering disciplines. Others, however, are the purview of mainstream systems and design engineering (e.g., many of the dimensions of usability). MITRE SEs are expected to monitor and evaluate performance engineering technical efforts and the acquisition program's overall performance engineering activities and recommend changes when warranted, including the need to apply specialty engineering expertise.
Performance Engineering Scope
Performance engineering focuses on the ability of systems to meet their nonfunctional requirements. A nonfunctional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. It may address a property the end product must possess, the standards by which it must be created, or the environment in which it must exist. Examples are usability, maintainability, extensibility, scalability, reusability, security, and transportability. Performance engineering activities occur in each phase of the systems development life cycle. They include defining nonfunctional requirements; assessing alternative architectures; developing test plans, procedures, and scripts to support load and stress testing; conducting benchmarking and prototyping activities; incorporating performance into software development; monitoring production systems; performing root cause analyses; and supporting capacity planning activities. The performance engineering discipline is grounded in expertise in modeling and simulation, measurement techniques, testing, and statistical methods.
Traditionally, much of performance engineering has been concerned with the performance of hardware and software systems, focusing on measurable items such as throughput, response time, and utilization, as well as some of the "-ilities"—availability, reliability, scalability, and usability. The goal of performance engineering activities should be to tie the performance of hardware and software components to the mission or objectives of the enterprise. This presents performance results to stakeholders in a more meaningful way.
Although performance engineering is most often associated with the hardware and software elements of a system, its principles and techniques can be applied to any aspect of a system that can be measured in a meaningful way, including, for example, business processes. In the simplest sense, a system accepts an input and produces an output. Performance engineering is therefore applicable not only to individual systems but also to networks of systems, enterprises, and other complex systems.
As an example, given the critical nature of air traffic control systems, their ability to meet nonfunctional requirements, such as response time and availability, is vital to National Airspace System (NAS) operations. Though the NAS has many air traffic control systems, the NAS itself is an example of an enterprise comprising people, processes, hardware, and software, among other things. At any time, the NAS has a finite capacity; however, more efficient processes or a new technology could increase that capacity. The NAS is an example of a non-IT system to which performance engineering techniques can be applied.
Performance Engineering Across the Systems Engineering Life Cycle
As illustrated in Figure 1, the activities associated with performance engineering span the entire systems life cycle, from Pre-Systems Acquisition through Sustainment. Although performance engineering is recognized as fundamental in manufacturing and production, its activities should begin earlier in the system life cycle, when an opportunity exists to influence the concept or design to ensure that performance requirements can be met. In the Pre-Systems Acquisition stage, performance engineering techniques can be used to determine the feasibility of a particular solution or to validate the concept or requirements. Later in the life cycle, the same techniques can be used to conduct design validation.
Performance Engineering Activities
Performance engineering includes various risk reduction activities that ensure that a system can meet its nonfunctional requirements. Performance engineering techniques can be used to validate various aspects of a planned system (whether new or evolving). For instance, performance engineering is concerned with validating that the nonfunctional performance-related requirements for a particular system are feasible even before a design for that system is in place. Requirements validation ensures that the nonfunctional requirements, as written, can be met using a reasonable architecture, design, and existing technology.
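As a rough illustration of requirements validation before a design exists, the sketch below applies the utilization law (utilization = throughput x service demand) to ask whether a stated throughput requirement is plausible on a given number of servers. The function name, the example numbers, the even load split across servers, and the 70 percent utilization ceiling are all assumptions made for illustration; they are not taken from this guide.

```python
def is_throughput_feasible(required_tps, service_demand_s, servers=1, max_utilization=0.7):
    """Return (feasible, utilization) for a throughput requirement.

    Utilization law: U = X * D / m, where X is throughput (requests/s),
    D is the service demand per request (s), and m is the number of servers,
    assuming the load is spread evenly (an idealization).
    """
    if required_tps < 0 or service_demand_s < 0 or servers < 1:
        raise ValueError("throughput and demand must be non-negative; servers >= 1")
    utilization = required_tps * service_demand_s / servers
    return utilization <= max_utilization, utilization


# Hypothetical requirement: 200 requests/s at 10 ms of service demand each.
feasible, u = is_throughput_feasible(required_tps=200, service_demand_s=0.010, servers=4)
print(f"utilization={u:.2f}, feasible={feasible}")
```

A check like this does not validate a design; it only flags requirements that no reasonable configuration could satisfy, which is the question requirements validation is meant to answer.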
Once a design is in place, performance engineering techniques can be used to confirm, before the system is actually built, that the design will meet the nonfunctional performance-related requirements. Design validation is a form of feasibility study used to determine whether the design is feasible with respect to meeting the nonfunctional requirements. Performance engineering activities can also be used, as part of a technology assessment, to assess a particular high-risk aspect of a design.
Finally, trade-off analysis is related to all the previously mentioned activities in that performance engineering stresses the importance of conducting a what-if analysis—an iterative exploration in which various aspects of an architecture or design are traded off to assess the impact. Performance modeling and simulation, performance testing, and various quantitative analysis techniques are often used to conduct design validation as well as trade-off, or what-if, analyses.
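A what-if analysis can be sketched with a very simple analytic model. The example below assumes a single-server queue with Poisson arrivals and exponentially distributed service times (the textbook M/M/1 model), for which mean response time is R = S / (1 - U) with U = arrival rate x S. The model choice, the function names, and the design alternatives are illustrative assumptions; real systems usually need richer models or simulation.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time R = S / (1 - U) for an M/M/1 queue, with U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # unstable queue: response time grows without bound
    return service_time / (1.0 - utilization)


def what_if(arrival_rate, alternatives):
    """Map each design alternative (name -> service time in s) to its predicted response time."""
    return {name: mm1_response_time(arrival_rate, s) for name, s in alternatives.items()}


# Hypothetical alternatives evaluated at 40 requests/s.
results = what_if(arrival_rate=40.0, alternatives={"baseline": 0.020, "faster_cpu": 0.010})
for name, r in results.items():
    print(f"{name}: {r * 1000:.1f} ms")
```

In this hypothetical case, halving the service time cuts the predicted response time from about 100 ms to about 17 ms, because utilization falls from 0.8 to 0.4. That nonlinear effect is the kind of result a trade-off analysis is meant to surface.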
Once a system is deployed, it is important to monitor and measure function and performance to ensure that problems are alleviated or avoided. Monitoring a system means being aware of the system's state in order to respond to potential problems. There are different levels of monitoring. At a minimum, monitoring should reveal whether a particular system component is available for use. Monitoring may also include the collection of various measurements such as the response time, system load, and resource utilization over time. Ideally, availability and measurement data collected as part of the monitoring process are archived in order to support performance analyses and to track trends, which can be used to make predictions about the future. If a permanent measuring and monitoring capability is to be built into a system, its impacts on the overall performance must be taken into consideration during the design and implementation of that system. This is characterized as measurement overhead and should be factored into the overall performance measurement of the system.
System instrumentation is concerned with the measurement of a system, under controlled conditions, to determine how that system will respond under those conditions. System instrumentation supports load testing, in which an artificial load is injected into the system to determine how the system will respond under that load. Understanding how the system responds under a particular load requires that additional measurements, such as response times and resource utilizations, be collected during the load test as well. If the system cannot handle the load, so that response times or resource utilization rise to an unacceptable level or show an unhealthy upward trend, it may be necessary to identify the system bottleneck. A system bottleneck is a component that limits the throughput of the system and often impacts its scalability. A scalable system is one whose throughput increases in proportion to the capacity of the hardware when hardware is added. Note that elements such as load-balancing components can affect the proportion by which capacity can be increased. Careful planning is necessary to ensure that analysis of the collected data will reveal meaningful information.
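The relationship between a bottleneck, throughput, and scalability can be sketched as follows. The example uses the standard bottleneck bound from operational analysis: if each request places a service demand D on each resource, throughput cannot exceed 1 / D_max. The resource names, the demand figures, and the helper names are hypothetical, and the demands are assumed to be positive values obtained from load test measurements.

```python
def find_bottleneck(service_demands):
    """Return (resource, max_throughput) given per-request service demands in seconds.

    The resource with the largest demand saturates first, so system
    throughput cannot exceed 1 / D_max (the bottleneck bound).
    """
    if not service_demands:
        raise ValueError("at least one resource is required")
    resource = max(service_demands, key=service_demands.get)
    return resource, 1.0 / service_demands[resource]


def scaling_efficiency(base_throughput, scaled_throughput, capacity_factor):
    """Ratio of achieved speedup to added capacity; 1.0 means proportional scaling."""
    return (scaled_throughput / base_throughput) / capacity_factor


demands = {"cpu": 0.004, "disk": 0.010, "network": 0.002}  # hypothetical measurements
name, x_max = find_bottleneck(demands)
print(f"bottleneck={name}, throughput bound={x_max:.0f} requests/s")

# Doubling hardware raised measured throughput from 100 to 170 requests/s.
print(f"scaling efficiency={scaling_efficiency(100.0, 170.0, 2.0):.2f}")
```

An efficiency well below 1.0 suggests that something other than the added hardware, for example a shared component such as a load balancer, is limiting throughput.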
Finally, capacity planning is a performance engineering activity that determines whether a system is capable of handling the increased load predicted for the future. Capacity planning is related to all the activities mentioned previously: the ability to respond to predicted load and still meet nonfunctional requirements is a cornerstone of capacity planning. Furthermore, instrumentation, measurements, and testing are necessary elements of capacity planning. Likewise, because bottlenecks and nonscalable systems limit the capacity of a system, the activities associated with identifying bottlenecks and assessing scalability are closely related to capacity planning as well.
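One simple capacity planning question is how long the current system can absorb growing load. The sketch below assumes compound growth at a fixed rate per period, which is only one of many possible growth models; the function name and the example figures are hypothetical.

```python
import math


def periods_until_capacity(current_load, capacity, growth_rate):
    """Number of whole periods until compound growth pushes load above capacity.

    Returns 0 if the load already exceeds capacity, and None if it never will
    (growth rate of zero or less).
    """
    if current_load <= 0 or capacity <= 0:
        raise ValueError("load and capacity must be positive")
    if current_load > capacity:
        return 0
    if growth_rate <= 0:
        return None
    # Solve current_load * (1 + g)^n > capacity for the smallest integer n.
    periods = math.log(capacity / current_load) / math.log(1.0 + growth_rate)
    return math.floor(periods) + 1


# Hypothetical: 600 requests/s today, 1000 requests/s capacity, 5% growth per month.
print(periods_until_capacity(600, 1000, 0.05))
```

The capacity figure itself would normally come from load testing and bottleneck analysis, which is why those activities feed directly into capacity planning.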
Best Practices and Lessons Learned
System vs. mission performance. Tying the performance of hardware or software or network components to the mission or objectives of the enterprise should be the goal. This makes the results of performance engineering studies more meaningful to stakeholders and focuses testing on meaningful outcomes. For example, central processing unit usage by itself is not meaningful unless it is the cause of a mission failure or a significant delay in processing critical real-time information.
Early life-cycle performance engineering. Too often, systems are designed and built without doing the early performance engineering analysis associated with the Pre-Systems Acquisition stage shown in Figure 1. When performance engineering is bypassed, stakeholders are often disappointed and the system may even be deemed unusable. Although it is common practice to optimize the system after it's built, the cost associated with implementing changes to accommodate poor performance increases with each phase of the system's life cycle, as shown in Figure 2. Performance engineering activities should begin early in the system's life cycle when an opportunity exists to influence the concept or design of the system in a way that ensures performance requirements can be met.
Risk reduction. Performance engineering activities are used to validate that the nonfunctional requirements for a particular system are feasible even before a design for that system is in place, and especially to assess a particular high-risk aspect of a design in the form of a technology assessment. Without proper analysis, it is difficult to identify and address potential performance problems that may be inherent to a system design before that system is built. Waiting until system integration and test phases to identify and resolve system bottlenecks is too late.
Trade-off analysis. Performance engineering stresses the importance of conducting a trade-off, or what-if, analysis—an iterative analysis in which various aspects of an architecture or design are traded off to assess the impact.
Test-driven design. Under agile development methodologies, such as test-driven design, performance requirements should be a part of the guiding test set. Including them ensures that the nonfunctional requirements are considered in all phases of the engineering life cycle and are not overlooked.
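A performance requirement can be expressed as an executable check alongside functional tests. The sketch below assumes a requirement of the form "the 95th percentile response time must not exceed 500 ms" and evaluates it against response times recorded by a test harness. The requirement, the sample data, the function names, and the nearest-rank percentile method are illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def check_response_time_requirement(samples_ms, pct=95, limit_ms=500.0):
    """Raise AssertionError if the chosen percentile exceeds the stated limit."""
    observed = percentile(samples_ms, pct)
    assert observed <= limit_ms, f"p{pct} = {observed} ms exceeds {limit_ms} ms"
    return observed


# Hypothetical response times (ms) captured during a test run.
recorded = [120, 135, 150, 160, 180, 210, 240, 300, 320, 450]
print(check_response_time_requirement(recorded))
```

Running such a check in the regular test suite makes a performance regression fail the build in the same way a functional defect would.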
Monitoring, measurement, and instrumentation. System instrumentation is a critical performance engineering activity. Careful planning is necessary to ensure that useful metrics are specified, that the right monitoring tools are put in place to collect those metrics, and that analysis of the collected data will reveal meaningful information.
Performance challenges in integrated systems. Projects that involve off-the-shelf components or systems of systems introduce special challenges for performance engineering. Modeling and simulation may be useful in anticipating the problems that arise in such contexts and in supporting root cause analysis should issues emerge. System instrumentation and analysis of the resulting measurements may become more complex, especially if various subsystems operate on incompatible platforms. Isolating performance problems and bottlenecks may become more difficult because a problem initiated in one system or subsystem may emerge as a performance issue in a different component. Resolving performance engineering issues may require cooperation among different organizations, including hardware, software, and network vendors.
Predicting usage trends. Performance data collected as part of the monitoring process should be archived and analyzed on a regular basis in order to track trends, which can be used to make predictions about the future.
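Trend tracking on archived measurements can be sketched with an ordinary least-squares line fitted to evenly spaced observations. A straight-line fit is an assumption made for simplicity, since real usage often shows seasonality or step changes; the function names and the sample utilization series are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at times 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead):
    """Extrapolate the fitted line periods_ahead steps past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


weekly_peak_utilization = [0.42, 0.45, 0.47, 0.50, 0.54, 0.55]  # hypothetical archive
print(f"projected peak utilization in 4 weeks: {forecast(weekly_peak_utilization, 4):.2f}")
```

The projection is only as good as the archived data and the assumed trend shape, so it is best treated as an early warning rather than a precise prediction.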
References and Resources
Computer Measurement Group (CMG). A professional organization of performance professionals and practitioners. The CMG holds a yearly conference and publishes a quarterly newsletter.
International Council on Systems Engineering (INCOSE). Recognizes performance engineering as part of the system life cycle.
Association for Computing Machinery (ACM) [SIGSIM and SIGMETRICS]. Contains special interest groups in both simulation and measurement.
The Society for Modeling and Simulation International (SCS). Expertise in modeling and simulation, which is used extensively in performance engineering activities.
Information Technology Infrastructure Library (ITIL). Industry standard for IT service management (includes aspects of performance engineering).
Federal Enterprise Architecture (FEA) Performance Reference Model (PRM). The PRM is a "reference model" or standardized framework to measure the performance of major IT investments and their contribution to program performance.
Capability Maturity Model Integration (CMMI). CMMI is a process improvement approach that provides organizations with the essential elements of effective processes.
The Standard Performance Evaluation Corporation (SPEC). A standards body for performance benchmarks. SPEC is an umbrella organization encompassing the efforts of the Open Systems Group.
Transaction Processing Performance Council (TPC). The Transaction Processing Performance Council defines transaction processing and database benchmarks and delivers trusted results to the industry.
Object Management Group (MARTE Profile). International, not-for-profit, computer industry consortium focused on the development of enterprise integration standards. MARTE is "Modeling and Analysis of Real-time and Embedded Systems."
Gunther, N. Author of several performance engineering books.
Maddox, M., 2005. A Performance Process Maturity Model, MeasureIT, Issue 3.06.
Menasce, D. A. Professor at George Mason University and author of several performance engineering books.
Smith, C. U., and L. G. Williams. Creators of the well-known Software Performance Engineering (SPE) process and associated tool. Authors of Performance Solutions as well as numerous white papers.
Jain, R. Professor at Washington University in St. Louis and author of several performance engineering books and articles.