Based on historical data, a large percentage of U.S. military systems struggle to achieve their reliability requirements, resulting in significant penalties, such as decreased system availability, increased life-cycle costs, and schedule delays. These impacts are all applicable to studies of Risk Assessment and Analysis of Alternatives (AoA)—which assess technical, schedule, and cost risks. In order to effectively analyze the reliability risks for programs of interest in a Risk Assessment or AoA, a new approach has been developed.
The recently adopted approach consists of four techniques that can be used individually or together to inform decision makers and improve defense acquisition:
- Assess the reliability estimates of similar systems to gauge the feasibility of the reliability requirement and the likelihood of achieving it.
- Conduct an assessment using the U.S. Army Materiel Systems Analysis Activity (AMSAA) Reliability Scorecard to determine the adequacy of the overall reliability program through a quantitative risk score.
- Create a realistic Reliability Growth Planning Curve (RGPC) using AMSAA’s Planning Model based on Projection Methodology (PM2) and gauge the associated risks using AMSAA’s RGPC Risk Assessment Matrix.
- Examine the impact of the reliability requirement on test duration and Operations & Support (O&S) life-cycle costs.
The four techniques are not new. In fact, they are used and widely accepted for planning and managing reliability programs. However, systematically applying these effective techniques to improve the Risk Assessment and AoA process is new. This article presents the four techniques and applies each of them to a notional AoA for a ground vehicle program. It is important to note that executing each of the four techniques may not be possible for every study, as it will depend on the extent of available reliability information for the proposed and alternative systems. In such cases, the analysis should include only the techniques that can be performed.
The first technique gauges the feasibility of the reliability requirement and the likelihood of achieving it by assessing the reliability of similar systems. According to the Defense Acquisition University’s (DAU) Glossary of Defense Acquisition Acronyms and Terms, reliability “measures the probability that the system will perform without failure over a specified interval under specified conditions. Reliability must be sufficient to support the warfighting capability needed in its expected operating environment.” Therefore, when applying this technique to Risk Assessment and AoA, it is important to consider any differences in capabilities and operating environment that exist between the proposed and alternative systems.
It is unlikely that the proposed system will have all the same capabilities as each of the alternative systems. Technological advancements, along with the Department of Defense (DoD) need to adapt to the ever-changing threats on the battlefield, result in the development of systems with enhanced capabilities. These may include capabilities such as increased power, added protection, increased payload, reduced fuel burden, and improved Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) systems. Any differences should be identified between the proposed and alternative systems in terms of capabilities and technologies, as well as their associated Technology Readiness Levels (TRLs), Manufacturing Readiness Levels (MRLs), and Integration Readiness Levels (IRLs).
It also is unlikely that the proposed system will have the same operating environment and usage as each of the alternative systems. Any differences between the proposed and alternative systems should also be identified in terms of environment (such as terrain, temperature, and weather), the tasks the system must complete to accomplish its mission, and the definition and classification of failures. Once the differences in capabilities and operating environment are identified, the reliability requirement for the proposed system should be compared to that of the alternative systems in order to gauge its feasibility.
Using the example AoA, Figure 1 shows that the proposed system has a 148-hour Mean Time Between Failures (MTBF) requirement, which is within the range of the requirements for the alternative systems. Upon further investigation, it is determined that the proposed system will have a few additional capabilities in comparison to the alternative systems. However, the elevated risk of achieving the required MTBF with the additional capabilities is offset by the fact that the required usage environment for the proposed system (mostly primary roads) is not as harsh as that of the alternative systems (mostly secondary roads). Therefore, the first technique indicates that the technical risk associated with achieving the MTBF requirement is fairly low.
Figure 1. Assessing the Reliability of Similar Systems
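The range comparison behind the first technique can be sketched in a few lines. This is only an illustration: the 148-hour requirement comes from the notional AoA, but the alternative-system values and names below are hypothetical placeholders, not the data plotted in Figure 1.

```python
from statistics import median


def requirement_in_range(requirement, comparable_mtbfs):
    """Return True when the requirement lies within the span of comparable values."""
    return min(comparable_mtbfs) <= requirement <= max(comparable_mtbfs)


# Hypothetical MTBF requirements (hours) for the alternative systems.
# These are illustrative placeholders, NOT values from Figure 1.
alternative_requirements = {
    "Alternative A": 120.0,
    "Alternative B": 160.0,
    "Alternative C": 200.0,
}
proposed_requirement = 148.0  # hours, from the notional AoA

values = list(alternative_requirements.values())
within = requirement_in_range(proposed_requirement, values)

print(f"Alternatives span {min(values):.0f}-{max(values):.0f} h "
      f"(median {median(values):.0f} h)")
print(f"Proposed requirement of {proposed_requirement:.0f} h within range: {within}")
```

A requirement that falls inside the range is only a starting point; the differences in capabilities and operating environment still have to be weighed qualitatively, as described above.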
AMSAA Reliability Scorecard
The AMSAA Reliability Scorecard initially was developed to provide a mechanism for consistently and effectively conducting early engineering-based reliability reviews to alert key Army leaders when weapon systems are off track with respect to meeting their reliability requirement. Typically, the Scorecard is used to examine a program’s use of reliability best practices, based on the planned and completed reliability tasks, to assess the adequacy of the overall reliability program. However, the Scorecard is a comprehensive evaluation tool that is not limited to early engineering-based reviews. Instead, the Scorecard is applicable to engineering activities that occur during all phases of the life cycle, making it a useful tool for Risk Assessment and AoA.
The AMSAA Reliability Scorecard contains 40 elements grouped into eight critical categories. Based on each element’s criteria, a rating of high risk, medium risk, low risk, or “Not Evaluated” is assigned to each of the 40 elements. The ratings are used to calculate a risk score for each of the eight categories, as well as an overall risk score for the program. The scores are normalized to a 100-point scale, where 100 is the highest risk. Elements that are not applicable to the program should be rated Not Evaluated, which removes them from the calculations. After assigning a level of risk to each of the 40 elements, the analyst should provide suggestions to decrease the risk for each of the medium- and high-risk elements. Next, cost and schedule estimates should be made to determine the programmatic impacts of executing the recommended activities. An example Scorecard element is shown in Figure 2.
Figure 2. AMSAA Reliability Scorecard
| Category | # | Element | High-Risk Criteria | Medium-Risk Criteria | Low-Risk Criteria |
|---|---|---|---|---|---|
| Reliability Analysis | 15 | Comprehensive thermal and vibration analyses and/or finite element analyses (FEA) are conducted to address potential failure mechanisms and failure sites. | No thermal or vibration analyses or FEA are conducted. | Design may be modeled. Boundary conditions are determined from higher-level models or measured data. Vibration response may not be measured in multiple locations or in all appropriate axes. Limited FEA may be carried out. Some thermal or vibration objectives will not be met. | Design is modeled for thermal and vibration characteristics. Boundary conditions are determined from higher-level models or measured data. Special items and operating conditions are modeled. Vibration response is measured in multiple locations in all appropriate axes. FEA is performed on structure. All thermal and vibration objectives should be met. |
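The Scorecard's roll-up from element ratings to a 100-point risk score can be illustrated with a small sketch. The article does not give AMSAA's actual weights or normalization, so the point values (low = 1, medium = 2, high = 3), the linear scaling, the "Testing" category name, and the sample ratings below are all assumptions made for illustration. The only behavior taken from the text is that 100 is the highest risk and that "Not Evaluated" elements drop out of the calculation.

```python
# Assumed point values per rating; these are NOT AMSAA's published weights.
RISK_POINTS = {"low": 1, "medium": 2, "high": 3}


def risk_score(ratings):
    """Scale ratings to 0-100 (100 = highest risk), skipping 'not evaluated'."""
    points = [RISK_POINTS[r] for r in ratings if r != "not evaluated"]
    if not points:
        return None  # nothing was evaluated, so no score can be computed
    lowest = min(RISK_POINTS.values())
    highest = max(RISK_POINTS.values())
    mean = sum(points) / len(points)
    return 100.0 * (mean - lowest) / (highest - lowest)


# Hypothetical ratings for two categories (the real Scorecard has eight).
scorecard = {
    "Reliability Analysis": ["high", "medium", "medium", "not evaluated"],
    "Testing": ["low", "medium", "low"],
}

category_scores = {name: risk_score(r) for name, r in scorecard.items()}
overall = risk_score([r for ratings in scorecard.values() for r in ratings])

for name, score in category_scores.items():
    print(f"{name}: {score:.1f}")
print(f"Overall: {overall:.1f}")
```

Under this assumed scheme, a program rated high risk on every evaluated element scores 100 and one rated low risk on every element scores 0.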
The AMSAA Reliability Scorecard can be used for systems composed primarily of hardware, as well as those composed of both hardware and software. The AMSAA Software Reliability Scorecard was developed recently to evaluate reliability programs for software-intensive systems. Both Scorecards allow for identification of risks associated with achieving the reliability requirement, and they highlight critical activities that a program should execute to increase the likelihood of reliability success.
Continuing with the notional AoA, the second technique is executed to identify the risks associated with achieving the reliability requirement and the cost and schedule impacts associated with mitigating those risks. According to the completed Scorecard assessment, the program has an overall risk rating of 56, which is in the medium-risk range. The assessment indicates that the developer committed minimal resources toward Design for Reliability (DfR) activities such as Failure Modes, Effects, and Criticality Analysis (FMECA), Finite Element Analysis, and thermal and vibration analysis. Based on knowledge from previous defense acquisition programs, it is estimated that this program would need to dedicate roughly 18 months to effectively execute these DfR efforts and incorporate the necessary design changes into the proposed system prior to entering formal, system-level reliability growth testing.
Therefore, if program management decides to push forward with the current design, the AMSAA Reliability Scorecard indicates that the technical risks associated with achieving the reliability requirement are medium. However, if program management is willing to make a strong commitment to executing the appropriate DfR best-practices and is willing to incur an 18-month schedule delay, then the technical risks could be mitigated.
Reliability Growth Planning Curve
Reliability growth planning addresses program schedules, amount of testing, resources available, and the realism of achieving and demonstrating the reliability requirement. To plan for and manage reliability growth, programs develop a Reliability Growth Planning Curve (RGPC) and establish the necessary supporting activities. One of the reliability growth planning models commonly used by the Army and DoD is AMSAA’s Planning Model based on Projection Methodology (PM2). PM2 is an Excel-based mathematical model used to formulate a detailed reliability growth plan for a complex system under development. The plan is represented in the form of a system-level RGPC that incorporates the reliability requirement, test schedule, and management’s corrective action strategy. If an RGPC using PM2 has not already been developed for the system, one should be developed using realistic planning parameters.
When analyzing reliability for Risk Assessment and AoA, the use of PM2 is particularly beneficial. Additionally, AMSAA’s RGPC Risk Assessment Matrix should be used to assess the risks associated with the planning parameters. The matrix includes 10 elements relating to the RGPC, with low-, medium-, and high-risk criteria associated with each element. The appropriate risk level should be assigned to each of the 10 elements, based on the RGPC and the system’s reliability growth program. If several elements receive medium- or high-risk ratings, the system is unlikely to achieve the reliability goals established by the RGPC. In such cases, a new, more achievable RGPC should be developed so that most of the elements in the risk matrix yield low-risk ratings. The original and new RGPCs should then be compared to determine the estimated schedule impacts associated with the new plan.
The reliability analysis for the notional AoA continues by applying the third technique to gain additional insights into the proposed system’s risks. Figure 3 depicts the vendor’s proposed RGPC for its developmental system, which has been determined to be low risk using the RGPC Risk Assessment Matrix. As indicated by the RGPC, the system is required to enter system-level reliability growth testing with an initial MTBF of 103 hours to have a realistic chance of achieving the 245-hour MTBF goal at the end of Developmental Testing (DT). However, lower-level component testing and reliability block diagram estimates indicate that the system may only have an initial MTBF of 30 hours, which is significantly lower than the planned value of 103 hours.
Figure 3. PM2 Reliability Growth Planning Curve
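The shape of a planning curve like the one in Figure 3 can be approximated with a short sketch. The article does not give the PM2 equations, so the formula below is an assumption: it follows the PM2-Continuous form as commonly presented in reliability growth handbooks, and it should be verified against the AMSAA tool before any real use. The 103-hour initial MTBF and 245-hour goal come from the notional AoA; the management strategy (`MS`), average fix effectiveness (`MU_D`), and total test hours are hypothetical.

```python
def pm2_mtbf(t, m_initial, m_goal, total_test_hours, ms, mu_d):
    """Planned MTBF at cumulative test time t (assumed PM2-Continuous form)."""
    growth_fraction = 1.0 - m_initial / m_goal
    if ms * mu_d <= growth_fraction:
        # The goal sits at or above the growth potential m_initial / (1 - ms * mu_d).
        raise ValueError("goal MTBF is at or beyond the growth potential")
    beta = growth_fraction / (total_test_hours * (ms * mu_d - growth_fraction))
    return m_initial / (1.0 - ms * mu_d * beta * t / (1.0 + beta * t))


# Hypothetical planning parameters (not from the article).
MS = 0.95            # fraction of the initial failure rate addressed by fixes
MU_D = 0.80          # average fix effectiveness
TEST_HOURS = 5000.0  # total planned reliability growth test time

for t in (0.0, 1250.0, 2500.0, 5000.0):
    mtbf = pm2_mtbf(t, 103.0, 245.0, TEST_HOURS, MS, MU_D)
    print(f"t = {t:6.0f} h  planned MTBF = {mtbf:6.1f} h")
```

By construction the curve starts at the initial MTBF and reaches the goal at the end of the planned test time. With these assumed parameters, calling the function with a 30-hour initial MTBF raises the error, because the 245-hour goal would exceed the growth potential.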
By identifying the system’s low likelihood of “getting on the curve,” it can be concluded that the program’s current plan yields high risks. To mitigate these risks, a more realistic RGPC should be developed that incorporates the expected initial MTBF of 30 hours. However, according to the RGPC Risk Assessment Matrix, the goal MTBF in DT (245 hours) should be no more than 3 times the initial MTBF (30 hours). Here the ratio is 245/30, or roughly 8.2, well above that limit. Therefore, the current 30-hour initial MTBF is too low to generate a realistic RGPC. In order to mitigate the high risks and satisfy the criteria in the RGPC Risk Assessment Matrix, it is critical for the program to achieve the planned 103-hour initial MTBF. To accomplish this, program management must be dedicated to conducting a major DfR effort that includes substantial redesign of one or more subsystems in order to mitigate large classes of failure modes. This is the only way for the system to “get on the curve.”
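The ratio criterion from the RGPC Risk Assessment Matrix is simple enough to check directly. The sketch below uses only the figures quoted in the text (a 245-hour DT goal, initial MTBFs of 30 and 103 hours, and the 3-to-1 limit); the function and constant names are illustrative.

```python
MAX_GOAL_TO_INITIAL_RATIO = 3.0  # limit cited from the RGPC Risk Assessment Matrix


def goal_to_initial_ratio(goal_mtbf, initial_mtbf):
    """Ratio of the DT goal MTBF to the initial MTBF."""
    return goal_mtbf / initial_mtbf


def minimum_initial_mtbf(goal_mtbf, max_ratio=MAX_GOAL_TO_INITIAL_RATIO):
    """Smallest initial MTBF that keeps the goal within the ratio limit."""
    return goal_mtbf / max_ratio


GOAL_MTBF = 245.0  # hours, DT goal from the notional AoA

for initial in (30.0, 103.0):
    ratio = goal_to_initial_ratio(GOAL_MTBF, initial)
    verdict = "meets" if ratio <= MAX_GOAL_TO_INITIAL_RATIO else "exceeds"
    print(f"Initial MTBF {initial:5.1f} h: ratio {ratio:.2f} {verdict} the 3x limit")

print(f"Minimum initial MTBF for a 245 h goal: {minimum_initial_mtbf(GOAL_MTBF):.1f} h")
```

The 30-hour estimate gives a ratio of about 8.2, while the planned 103-hour value gives about 2.4, which is why reaching the planned initial MTBF matters so much. This check covers only one of the matrix's 10 elements.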
Using the insights gained from techniques 1 through 3, the following conclusions can be made thus far:
- The 148-hour MTBF requirement is appropriate for the system.
- The developer did not dedicate the appropriate resources toward DfR activities; performing those activities now would add an estimated 18 months to the schedule.
- For the program to have a low-risk plan, an initial MTBF of at least 103 hours is needed. Therefore, the 18-month investment in the DfR activities identified by the Scorecard is essential.
Impact of Reliability on O&S Costs
The fourth and final technique is to examine the impact of the reliability requirement on test duration and O&S life-cycle costs. According to the June 2010 Office of the Secretary of Defense (OSD) Memo State of Reliability, “Sustainment costs have 5 to 10 times more impact on total life-cycle costs than do Research, Development, Test, and Evaluation (RDT&E) costs. Poor reliability leads to higher sustainment costs for replacement spares, maintenance, repair parts, facilities, staff, etc.” Achieving a higher MTBF requires additional test time for DfR activities and DT reliability growth test events. However, the associated payoff of the additional testing is not just improved system reliability, but also reduced O&S costs for the life cycle of the system.
When conducting reliability analysis for Risk Assessment and AoA, a sensitivity analysis on the reliability requirements should quantify the financial impact that various levels of system reliability will have on the program’s life-cycle costs. To estimate the O&S costs, the Selected Essential-Item Stock for Availability Method (SESAME)-based Consumption, Holding, Repair, and Transportation (COHORT) model can be used. The model provides cost analyses by using existing consumable- and repairable-part input data that are tailored to a particular system. COHORT computes the expected life-cycle costs of the enterprise’s supply and maintenance system that will be supporting the weapon system/end item throughout its useful life. For Risk Assessments and AoAs, the cost and schedule impacts of reliability testing are important, but equally important are the O&S life-cycle costs associated with the system’s reliability.
To complete the reliability analysis for the notional AoA, the fourth technique is utilized to determine the impact that lowering the MTBF requirement (or achieving a lower MTBF goal) would have on DfR and DT duration and on O&S life-cycle costs. As shown in the top row of Figure 4, if the proposed system had an MTBF requirement of only 103 hours, no system-level reliability growth testing would be needed, as long as the system undergoes the 18 months of previously mentioned DfR activities. The O&S life-cycle costs associated with the 103-hour MTBF would be about $1.5 billion.
Figure 4. Requirements Sensitivity Analysis
If, on the other hand, the system’s MTBF requirement remained at its current value of 148 hours, the O&S costs would be $1.0 billion. However, 18 months of DfR and 5 months of system-level reliability growth testing would be needed, for a total of 23 months. If the system’s MTBF requirement increased to 225 hours, and again to 300 hours, the O&S costs would be further reduced. However, achieving these higher MTBF requirements would require program management to commit additional time to conduct DfR activities and system-level reliability growth testing.
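The trade-off between schedule and O&S cost can be tabulated with a minimal sketch. Only the two requirement levels whose figures are stated in the text (103 and 148 hours) are included; the 225- and 300-hour cases are omitted because their durations and costs are not given here. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequirementOption:
    mtbf_hours: float
    dfr_months: int
    growth_test_months: int
    os_cost_billion: float

    @property
    def total_months(self):
        """Combined DfR and system-level reliability growth test duration."""
        return self.dfr_months + self.growth_test_months


# Figures as stated in the text for the notional AoA.
options = [
    RequirementOption(mtbf_hours=103, dfr_months=18, growth_test_months=0, os_cost_billion=1.5),
    RequirementOption(mtbf_hours=148, dfr_months=18, growth_test_months=5, os_cost_billion=1.0),
]

baseline = options[0]
for opt in options:
    extra_months = opt.total_months - baseline.total_months
    savings = baseline.os_cost_billion - opt.os_cost_billion
    print(f"MTBF {opt.mtbf_hours:.0f} h: {opt.total_months} months total, "
          f"O&S ${opt.os_cost_billion:.1f}B, "
          f"+{extra_months} months vs. baseline, saves ${savings:.1f}B")
```

Read this way, keeping the 148-hour requirement costs 5 additional months of testing relative to a 103-hour requirement and reduces estimated O&S life-cycle costs by about $0.5 billion.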
Many military systems struggle to achieve their reliability requirements, resulting in decreased system availability, increased life-cycle costs, and schedule delays. The proposed approach for Risk Assessment and AoA includes four techniques for identifying and assessing a program’s reliability risks. Whether used individually or collectively, these techniques can inform decision makers and improve defense acquisition.
Elements of this approach have been used to support the Armored Multi-Purpose Vehicle (AMPV) AoA, Bradley Cost Benefit Analysis, and the Deployable Force Protection Radar Study. This approach also is being incorporated into the Ground Combat Vehicle (GCV) AoA.