
INTEGRATING ROOT CAUSE ANALYSIS METHODOLOGIES

Leith Hitchcock

CPEng CMRP, PALL Corporation

Abstract: Many Root Cause Analysis (RCA) methodologies have specific applications and limitations, and
in some cases, particularly complex machinery investigations, they can be combined and enhanced for better results.
Typical methodologies that can be combined effectively include Kepner Tregoe, Causal Tree Analysis
(Apollo), Fault Tree Analysis, Logic Tree Analysis, Barrier Analysis, and Human Performance Evaluation.

The difficulty with many RCA methodologies available today is that, by themselves, they may not
result in complete and efficient analyses or effectively implemented solutions for complex machinery
problems. Most methodologies also need expert facilitation to achieve results. Many have widespread
general application yet can be further enhanced for asset-specific investigations.
Furthermore, they can be combined for a more effective and efficient approach.

Using Life Cycle Management principles, the concept that all root causes can be classified into one of five
categories (Design, Manufacturing, Installation, Operation and Maintenance) can be used to enhance data
gathering, problem identification, causal tree selection and management, and solution design. This concept
is applicable to both logical (KT) and graphical (CT) methodologies.

Key Words: RCA, Fault Tree Analysis, Barrier Analysis, Kepner Tregoe, Life Cycle
Management, Solution Design, KT, CT.

1 THE PREPARATORY PHASE


The following flowchart is adapted from Reference 1 and outlines the key sequence of steps in a machinery RCA approach:
[Flowchart; box labels as recovered from the figure.
Process steps: Notification of failure; Clarify the event; Preserve physical evidence; Gather support documentation;
Interview all involved personnel; Develop sequence-of-events diagram; Request assistance from reliability engineering
group; Evaluate failed components; Test system dynamics; Develop probable root cause(s); Verify cause by testing;
Define potential corrective action(s); Prepare cost-benefit analysis; Prepare report and recommendations; Submit
recommendations for approval; Implement corrective actions; Test to verify correction; File documents for future use.
Decision points (Yes/No): Is root cause evident? Is root cause confirmed? Is there a cost-effective solution?
Are recommendations approved?
Root cause categories: Design based; Manufacturing based; Installation induced; Operation induced; Maintenance induced.]

Figure 1. General Machinery Investigation Process (1)


The conceptual refinement is to drive the five basic root cause categories of Design, Manufacturing, Installation, Operation
and Maintenance (DMIOM) into the RCA process via specific focussed areas of investigation.

WCEAM 2006 Paper 106 Page 1


2 THE BASIC TECHNIQUES
The following table is adapted from the US Government’s DOE RCA Document (2) and outlines the applications and
limitations of some of the generally used techniques:
Method: Cause & Effect Chart Analysis, Fault Tree Analysis, Logic Tree, Fishbone Diagram
  When to use: Use for multi-faceted problems with long or complex causal factor chain. Fault or Logic Tree can be used
    when the problem involves logic or control issues.
  Advantages: Provides visual display of analysis process. Identifies probable contributors to the condition. Fault and
    Logic Tree capable of analysing logic issues with control systems or processes.
  Disadvantages: Time consuming & requires familiarity with process to be effective. Fault and Logic Trees can be
    extremely complex.
  Remarks: Requires a broad perspective of the event to identify unrelated problems. Helps to identify where deviations
    occurred from acceptable methods. May need specialists in control systems and logic for Fault or Logic Tree methods.

Method: Change Analysis
  When to use: Use when cause is obscure. Especially useful in evaluating equipment failures.
  Advantages: Simple 4 step process.
  Disadvantages: Limited value because of the danger of accepting wrong, “obvious” answer.
  Remarks: A singular problem technique that can be used in support of a larger investigation. All Root Causes may not
    be identified.

Method: Barrier Analysis
  When to use: Use to identify barrier & equipment failures & procedural or administrative problems.
  Advantages: Provides systematic approach.
  Disadvantages: Requires familiarity with process to be effective.
  Remarks: This process is based on the MORT Hazard / Target concept.

Method: Management Oversight & Risk Tree Analysis (MORT)
  When to use: Use when there is a shortage of experts to ask the right questions & whenever the problem is a recurring
    one. Helpful in solving programmatic problems.
  Advantages: Can be used with limited prior training. Provides a list of questions for specific control & management
    factors.
  Disadvantages: May only identify areas of cause, not specific causes.
  Remarks: If this process fails to identify problem areas, seek additional help or use Cause & Effect analysis.

Method: Human Performance Evaluation
  When to use: Use whenever people have been identified as being involved in the problem cause.
  Advantages: Thorough analysis. Looks at systemic and human aspects to failure.
  Disadvantages: None if the process is closely followed.
  Remarks: Requires training.

Method: Kepner-Tregoe Problem Solving & Decision Making
  When to use: Use for major concerns where all aspects need thorough analysis. Can be used as a general framework.
  Advantages: Highly structured approach, focuses on all aspects of the occurrence & problem resolution. Disciplined
    solution development process.
  Disadvantages: More comprehensive than may be needed.
  Remarks: Requires training.

Table 1. Method Applicability Table (2)

3 USING KEPNER-TREGOE AS A FRAMEWORK


Although the Kepner-Tregoe (KT) (3) methodology is comprehensive, this degree of detail is often required for complex or
extensive machinery failure investigations and warranty claim work. Using KT as a framework allows facilitators to better
acquire and prepare information before carrying out methodologies such as Causal Tree(3) or Fault Tree(4). This is important
when the sequence of events, failure mechanism relationships and failure mode interrelationships must be understood in detail.
Proper data acquisition before commencing the RCA also improves the effectiveness and efficiency of the process.
The diagram below in Figure 3 shows how KT can be used as a framework, with other graphical
techniques inserted for more detailed failure mechanism/logic analysis.
1. Incident Analysis
   Step: Incident Analysis.
   Incident capture & assessment for RCA.

2. Problem Analysis
   Steps: Problem Statement; Data Collection.
   Define the problem & gather supporting data. Establish RCA teams.

3. Technique Selection
   Step: Select RCA Technique.
   Select appropriate RCA technique/s to analyse the problem.

4. Technique Application
   Techniques: Cause & Effect Charting; Fault Tree Analysis; Fish Bone Diagram; Change Analysis; Barrier Analysis;
   MORT Analysis; Human Performance Evaluation.
   Steps: Cause Evaluation; Root Cause Selection.
   Apply the RCA technique/s to generate possible causes. Check which of these have supporting facts to extract
   probable causes. Verify against the problem parameters to find the root cause/s.

5. Solution Development
   Steps: Criteria Selection; Solution Generation; Solution Evaluation; Solution Selection.
   Define the criteria that the solution must satisfy. Generate alternative solutions and evaluate them against the
   criteria. Select the solution/s that best meet the criteria with minimal risk of creating new problems.

6. Solution Management
   Steps: Solution Implementation; Solution Monitoring.
   Implement the agreed solutions and monitor their success in solving the problem.

7. Root Cause Analysis Management

Fig. 3. A diagrammatic approach to using KT as a framework for RCA methodology integration.



The KT methodology, however, can be further improved by driving the DMIOM concept through the data acquisition,
problem analysis, and solution design areas of RCA. One way to do this is to carry out the question challenge for each
DMIOM category. An additional improvement is to expand the data acquisition/problem analysis phase with the basic
elements from the change analysis methodology to give an additional “deviation” and “effect/consequence” dimension to the
analysis as can be seen below in Table 2.

For each DMIOM category        IS        IS NOT        Deviation        Effect/Consequence
Identity (WHAT)
Location (WHERE)
Timing (WHEN)
Extent (SIGNIFICANCE)
Answer the following questions and fill in the chart:
IS…
Identity - What item specifically has trouble? What is wrong with it?
Location - Where on the item did it happen? Where was the item located?
Timing - When did it happen - time, before / after, point in cycle?
Extent - When it happens how much is affected? Any pattern?
IS NOT…
Identity - Are there similar items? How are they affected?
Location - What parts are unaffected? Are others having trouble?
Timing - What are other likely times? Is it happening then too?
Extent - Is some portion consistently not involved? Is this usual?

Table 2. The expanded KT methodology with DMIOM category and change analysis embedded.
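
One way to picture the expanded chart is as a small data structure: one IS / IS NOT / Deviation / Effect row per
question dimension, repeated for each DMIOM category. The Python sketch below is illustrative only. The function
names (blank_worksheet, record, unanswered) and the sample bearing entry are assumptions introduced here; they are
not part of the KT methodology or taken from any investigation.

```python
# Illustrative sketch only: all names and the sample entry are hypothetical.
DMIOM_CATEGORIES = ("Design", "Manufacturing", "Installation", "Operation", "Maintenance")
DIMENSIONS = ("Identity", "Location", "Timing", "Extent")
COLUMNS = ("IS", "IS NOT", "Deviation", "Effect/Consequence")


def blank_worksheet():
    """Build an empty chart: category -> dimension -> column -> answer text."""
    return {
        category: {dim: {col: "" for col in COLUMNS} for dim in DIMENSIONS}
        for category in DMIOM_CATEGORIES
    }


def record(worksheet, category, dimension, column, answer):
    """Fill one cell, rejecting keys that are not part of the chart."""
    if category not in DMIOM_CATEGORIES:
        raise KeyError(f"unknown DMIOM category: {category}")
    if dimension not in DIMENSIONS or column not in COLUMNS:
        raise KeyError(f"unknown cell: {dimension} / {column}")
    worksheet[category][dimension][column] = answer


def unanswered(worksheet):
    """List (category, dimension, column) for every cell that is still empty."""
    return [
        (category, dim, col)
        for category, dims in worksheet.items()
        for dim, cols in dims.items()
        for col, text in cols.items()
        if not text.strip()
    ]


worksheet = blank_worksheet()
# Hypothetical entry, for illustration only.
record(worksheet, "Maintenance", "Identity", "IS", "Drive-end bearing runs hot")
print(len(unanswered(worksheet)))  # number of cells the team has yet to answer
```

Listing the unanswered cells gives the facilitator a simple check that the question challenge has been carried out
for every DMIOM category before technique selection begins.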

4 HUMAN ERROR
Human Performance Evaluation has a key role in linking solutions to DMIOM categories where there is human
involvement. Human error is prevalent in machinery failures, and making solutions permanent requires a focus on both the
systemic and the human dimensions.
A systematic view of human error is outlined in Figure 4, which details the key areas for investigation into human and
systemic error.

Substandard acts
    Unintended actions
        Slips: attention failures
        Lapses: memory failures
        (Plan of action satisfactory, but action deviated from intention in some unintended way.)
    Intended actions
        Mistakes
            Rule-based mistakes: misapplication of good rule; application of bad rule.
            Knowledge-based mistakes: no ready-made solution; new situation tackled by thinking out answer from scratch.
        Violations
            Routine violations: habitual deviation from regulated practices.
            Exceptional violations: non-routine infringement apparently dictated by local circumstances.
            Acts of sabotage

Figure 4. Human Error Sources


The analysis of human error as part of an asset RCA provides a necessary link to Incident/Accident investigation
techniques used in health, safety, and environmental incident investigations where such techniques as Barrier Analysis (2&5) and
MORT (2&6) are used.



5 MODIFYING CAUSAL TREE METHODS
Causal Tree analysis can often become difficult if a complex problem is not initially broken down into separate areas of
analysis. This in itself poses a problem from a facilitation viewpoint in that sometimes the process can end up with several
teams investigating the same fault tree. The classic example of this from a machinery viewpoint is “lubrication failure”.
The author first used a matrix system in 1990 to regain control over a complex turbine investigation with great success. The
system basically splits a complex investigation into several logical areas of analysis and each area is given a sequential
number. The causal tree sheets (when using the Apollo method (7) for example) are then numbered and each fault tree is laid
over an alphanumeric matrix such that each individual failure mode (Post-it note) can be identified using sheet number and
Cartesian coordinate.
When simultaneous analyses are carried out, the facilitator ensures that a common cause, once found, is analysed only
once, either within one of the fault trees or as a separate tree. Whenever this tree is referenced, the initiating cause is marked
“TO Sheet X, Row Y, Column Z” and the analysis tree is marked “FROM Sheet A, Row B, Column C”. The reason for this
becomes clear as the analysis progresses and key failure modes become referenced multiple times. A failure mode that
receives multiple references is either a Root Cause or a Significant Contributing Cause and must be dealt with by the solution.
Quite often these multiply referenced failure modes become the focus for quick fixes or interim solutions because they
influence multiple other failure mechanisms, as is the case for “lubrication failure” in machinery investigations.
Further modifications can be made by incorporating “and, or, if” logic indicators on the causal tree branches; however, this
can complicate the process unnecessarily. Such indicators are particularly useful for conditional “if” review,
where an event won’t take place until a preceding condition has been met.
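
The sheet/row/column referencing described in this section can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the author's tooling: the names Cell, CrossRef, label and
multiply_referenced are hypothetical, rows are assumed to be lettered and columns numbered, and the sample
cross-references are invented for the example.

```python
from collections import Counter
from typing import NamedTuple


class Cell(NamedTuple):
    """Position of one failure mode (Post-it note) on a numbered sheet."""
    sheet: int
    row: str     # assumption: rows are lettered
    column: int  # assumption: columns are numbered


class CrossRef(NamedTuple):
    """A link from an initiating cause to the tree where it is analysed."""
    source: Cell  # initiating cause; carries the "TO <target>" mark
    target: Cell  # analysis tree; carries the "FROM <source>" mark


def label(prefix, cell):
    """Format a TO/FROM mark in the style used on the sheets."""
    return f"{prefix} Sheet {cell.sheet}, Row {cell.row}, Column {cell.column}"


def multiply_referenced(cross_refs, minimum=2):
    """Return (cell, count) pairs for targets referenced at least `minimum` times."""
    counts = Counter(ref.target for ref in cross_refs)
    return [(cell, n) for cell, n in counts.most_common() if n >= minimum]


# Hypothetical cross-references, for illustration only.
lubrication = Cell(sheet=3, row="A", column=1)
seal_wear = Cell(sheet=2, row="D", column=4)
cross_refs = [
    CrossRef(Cell(1, "C", 2), lubrication),
    CrossRef(Cell(2, "B", 5), lubrication),
    CrossRef(Cell(4, "A", 3), lubrication),
    CrossRef(Cell(1, "E", 1), seal_wear),
]
for cell, n in multiply_referenced(cross_refs):
    print(label("TO", cell), "is referenced", n, "times")
```

Counting references per target is what surfaces candidate Root Causes or Significant Contributing Causes: any cell
returned by multiply_referenced is one the solution would need to address.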

6 CONCLUSION
Like any maintenance activity the different root cause analysis methodologies should be viewed as tools in a toolbox and as
such any RCA program should not rely on just one method in isolation but rather have expertise in several key methodologies
that can be deployed to suit the type of RCA required.
This concept of deploying a task-oriented method is the general principle behind the US Government’s DOE document.
When such techniques are used for machinery root cause analysis investigations, users should consider further refinements
as outlined in this paper in order to improve the effectiveness and efficiency of investigations. Such processes and modifications
should, however, be based on the criticality of the failed asset, the risk associated with its failure, and the return on investment
expected from the RCA.

7 REFERENCES:

[1] Mobley, R. Keith. Root cause failure analysis (1999). Butterworth-Heinemann, Woburn MA, USA. ISBN 0-7506-7158-0.
[2] DOE-NE-STD-1004-92, DOE Guideline, Root Cause Analysis Guidance Document (1992), US Department of
Energy, Office of Nuclear Energy, Washington DC, USA.
[3] C.H. Kepner and B.B. Tregoe. The New Rational Manager (1981). Princeton Research Press, Princeton, NJ,
USA.
[4] BS5760: Part 7: 1991, Reliability of systems, equipment and components, Part 7. Guide to fault tree analysis.
[5] Barrier Analysis (1995). Technical Research and Analysis Centre, Scientech, Inc. Idaho Falls, ID, USA.
SCIE-DOE-01-TRAC-29-95.
[6] N.W. Knox and R.W. Eicher (1983). MORT Users Manual, SSDC-4, Rev.2, System Safety Development Centre,
EG&G, Idaho, ID, USA.
[7] Gano, Dean L. (1999). Apollo Root Cause Analysis. Apollonian Publications, Yakima, Washington, USA.
ISBN 1-883677-01-7.

