International Society of Air Safety Investigators Presentation
“Taking personal responsibility for the safety of each traveler and every crew member.
Acting with an empowered "can do" spirit to continuously improve our airline.”
Jean-Pierre Dagon, Director Corporate Safety for AirTran Airways
About the author: Jean-Pierre Dagon is a native of Switzerland (Geneva) and is the Director Corporate Safety for AirTran Airways. He has been employed with Northwest Airlines as Manager Flight Operational Quality Assurance, and with Flight Safety Texas, San Antonio Learning Center, as Director of Pilot Standards. Prior to 1995 he was with Crossair LTD, Basel, Switzerland, as Head of Pilot Training.
He is a B717-200 Captain and rated on DC-9 (B717), SF-340, and SA-2000. He holds a B.B.A., Aviation Management option, from Boise State University (cum laude) and has attended the College of Geneva (Switzerland). He is a member of AIAA and a corporate member of ISASI and the Flight Safety Foundation.
This article was adapted, with permission, from the author's presentation entitled "The practical use of Root Cause Analysis system (RCA) using REASON®: A building block, for accident/incident investigations" presented at the ISASI (International Society of Air Safety Investigators) 2003 seminar in Washington, D.C., USA, August 2003.
Root Cause Analysis with REASON®
Root Cause Analysis (RCA) is commonly used in engineering and reliability programs, but is not always emphasized in accident/incident investigations. RCA can lead to changes in procedures, processes, manuals, oversight and training.
The defense-in-depth strategy is common to all safety prevention doctrine. The Swiss Cheese Theory commonly illustrates successive layers of protection, one behind the other, each guarding against the possible breakdown of the one in front. According to the originator of the theory, Dr. James Reason, each layer has weaknesses and gaps akin to a Swiss cheese. The Swiss Cheese metaphor is best represented by a moving picture, with each defensive layer coming in and out of the frame according to local conditions.
The theory holds that these holes are created by a combination of active and latent failures. Active failures are errors or violations committed at the sharp end of the system. Latent failures stem from poor design, shortfalls in training, or inadequate tools and equipment, and may be present for years before they combine with local circumstances and active failures to penetrate the system’s many defensive layers.
As such, the rare conjunction of a set of holes in successive defenses allows hazards to come into damaging contact with people and assets, which is how Dr. Reason defines the accident trajectory. To date, however, accident/incident investigations often point to causal factors (i.e., bringing forth the facts) but leave it to the recipient of the report to determine root causes.
This approach offers an opportunity to examine root causes and bring forth some measurable indicators of the likelihood of recurrence. It may offer an avenue to the question: “What latent conditions led to the accident?”
RCA using REASON®
In Root Cause Analysis, REASON® recognizes three basic elements that build causal patterns:
A change or changes: An action that triggered another step in a problem. The initial change comes from the problem statement. For example: Aircraft ship number 123, Flt 455's left wing collided with a parked fuel truck. This is a change: something happened that caused the end result, a collision.
A condition: A state of being that existed within the environment over some period of time, e.g., it was dark; the ramp was wet; the fuel truck was parked in the safety zone; the pilot’s scan was poor.
An inaction: Anything that could have or should have occurred to prevent the next step in the problem, but did not. The inaction is akin to allowing the chain of events to continue unchallenged. For example: The pilot did not stop when confusing marshalling signals were present. The airplane was allowed to continue with a high rate of descent. The flight crew did not react to a GPWS pull-up command, and so forth.
A set of facts identifies all of the factors that are essential for one step to occur within one chain within the REASON® model. As a person lists the component factors that explain why a particular step in the event occurred, a set of factors is built. Each set must contain only factors that are necessary to explain the consequence of that set, and nothing more.
Certain rules have to be met in order for the system to work. A set is a group of factors that causally account for the next higher step (their consequent) in the model.
1. There can be only one change in a set (a group of answers that explains one cause, or one change for any level).
2. Change is produced by change.
3. Inactions are always brought about by inactions; therefore, you cannot have a change answer in an inaction set (Something didn’t get done, or didn’t happen, either due to a lack of plan, or the plan did not work).
4. Conditions can occur in any set, but it is not necessary to have a condition in every set.
There are two types of conditions: those that are brought about by change, and those that are brought about by a lack of change. The REASON® software will ask you to designate which type of condition you are dealing with before its "Advice Area" activates the questions for the set. (Note: The software stringently enforces these four rules.)
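The four rules can be pictured as a small validity check on one set. The following Python sketch is illustrative only and is not the REASON® software's implementation: the function name `check_set`, the string labels, and the reading of rule 2 (a change must have a change in the set that explains it) are assumptions made for this example. Sets whose consequent is a condition are only checked against rule 1 here, because the handling of the two condition types is not spelled out above.

```python
from collections import Counter

# Hypothetical labels for the three basic elements described in the article.
CHANGE, CONDITION, INACTION = "change", "condition", "inaction"


def check_set(consequent_kind, factor_kinds):
    """Return a list of rule violations for one set of factors.

    consequent_kind: kind of the step that the set explains.
    factor_kinds: kinds of the factors listed in the set.
    """
    counts = Counter(factor_kinds)
    problems = []
    # Rule 1: there can be only one change in a set.
    if counts[CHANGE] > 1:
        problems.append("rule 1: more than one change in the set")
    # Rule 2 (as read here): a change is explained by a set containing a change.
    if consequent_kind == CHANGE and counts[CHANGE] == 0:
        problems.append("rule 2: a change needs a change in its set")
    # Rule 3: an inaction is explained by an inaction, never by a change.
    if consequent_kind == INACTION:
        if counts[CHANGE] > 0:
            problems.append("rule 3: change found in an inaction set")
        if counts[INACTION] == 0:
            problems.append("rule 3: an inaction needs an inaction in its set")
    # Rule 4: conditions are allowed but optional in any set, so nothing to check.
    return problems


# A set explaining a change, loosely modeled on the collision example:
# several conditions, one change (the aircraft taxied in), one inaction.
collision_set = [CONDITION, CONDITION, CHANGE, INACTION]
print(check_set(CHANGE, collision_set))          # no violations
print(check_set(INACTION, [CHANGE, INACTION]))   # rule 3 violation
```

The check returns every violation instead of stopping at the first, so an investigator building a set would see all of the problems in it at once.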
Building steps for RCA
The building steps start with a change (which may be a summary of the incident broken down into simple building blocks). That change is developed with a set of factors that contributed to the initial event; this allows the investigator to retrace the steps that brought about the changes. In this process, one will find repeating patterns, which can be looped and thus connected to one factor that accounts for several of the event sets leading to the accident. The process is basically structured around a pyramid:
Upstream risk analysis-Upstream (top of the pyramid) accounts for the critical steps just prior to the event (the last links of the event chain). As you move down, the values become smaller, which lends a predominance of weight to the top of the model.
Downstream analysis-A longer chain of events, as analyzed downstream (or at the bottom of the pyramid), would indicate a bigger problem, for there were many opportunities to break the chain of events, yet these opportunities were either ignored or unknown. It is likely that latent effects would be best described by downstream analysis, whereas active failures would be more consequential at the top of the pyramid. If one assumes a single level of events causing the accident, then one has a typical active failure model (a virtual impossibility).
Differing fixes
The engineering fix versus organizational fix approach offers an alternative to eventual costly engineering changes, which may not be necessary given the propensity for the event to recur. Engineering safety brings forth a comprehensive and permanent fix; however, it can have drawbacks:
·It can be impractical or hard to market to the industry at large. Example: Consider an initiative to equip passenger aircraft with aft-facing seats. Although used extensively in the military, a proposal for aft-facing passenger seats could be interesting if one considers the flying public’s likely distaste for flying “backwards.”
·It can introduce new threats because of the fix itself. An example is the automation introduced in modern jets, which is intended to alleviate workload and monitor parameters. If the automation fails, or is programmed incorrectly, it relies on the intuitive knowledge of a pilot who may not at first be cognizant of the failure, which could lead to a catastrophe. Example: the Air Inter A-320 crash in Strasbourg, where a vertical speed of 3.3 (as in 3300 fpm down) [VS/HDG combined mode] may have been left, or erroneously selected by the pilot in command (PIC), in lieu of the desired flight path angle of 3.3° [TRK/FPA mode], leading to a controlled flight into terrain against Mt. St. Odile.
·It brings forth a bulldozer approach to level an ant hill: a disproportionate fix for a single and remote possibility of failure.
Root Cause Analysis is a process designed to discover both an engineering solution and organizational alternatives. In the REASON® system these controls can be compared for effectiveness for prevention of a certain event. This effectiveness comparison coupled with an understanding of the propensity for the specific event to recur provides decision makers with important information to aid them in deciding whether engineered controls are preferred.
Root causes division
A root cause can be categorized in the following hierarchy:
A. Management Level Action Required--Management Principles must be considered first, to ensure a policy is in place, is enforced, and controls are established. Management-level statements included are:
Management did not:
·communicate this requirement;
·designate that this policy apply to this specific situation;
·establish a means to monitor compliance with this policy;
·communicate how they were monitoring for compliance;
·enforce the policy when an infraction was found;
·establish a policy to control this.
The point at which a statement can be affirmed as true is the point of breakdown in the Organizational Principles of Control. If the statements are not applicable, the next step is analyzed.
B. Supervision Level Action Required--At this point, the REASON® software offers Supervision Principles to consider in each of the following statements:
·Supervision did not communicate what was wanted.
·Supervision did not provide the things necessary in order to comply with policy.
·Supervision did not follow the policy in the past.
·Supervision did not enforce the policy in the past.
The point at which a statement can be affirmed as true is the point of breakdown in the Organizational Principles of Control.
After looking at a failure at the management level, filtering down to the supervisory level, the individual performance may be examined:
C. Individual Level Action Required
·The Individual's incorrect action is now acceptable and the policy can be changed.
·The Individual's incorrect behavior can be modified.
·The Individual's incorrect behavior cannot be changed, and he must be removed from that particular environment.
The software is diligent in giving this as a last-resort option, stating: “selecting an individual root cause (RC) is a serious and rare decision. Using the RC wizard will help to avoid missing the systemic portion of a RC where the individual(s) share responsibility.” Often, an organization will resort to disciplinary action at the expense of finding the systemic problem behind an incident/accident, thus going against the accepted proposition that individuals for the most part have an innate desire for self-preservation and, in high-consequence environments, seldom create intentional accidents.
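The management, supervision, individual ordering above can be read as a top-down walk that stops at the first statement affirmed as true. The Python sketch below illustrates that reading only; the names `LEVELS` and `point_of_breakdown` are hypothetical, the statement wording is abbreviated from the lists above, and the actual RC wizard in the REASON® software may work differently.

```python
# Statements per level, in the order the article presents them (wording abbreviated).
LEVELS = [
    ("management", [
        "did not communicate this requirement",
        "did not designate that this policy apply to this specific situation",
        "did not establish a means to monitor compliance with this policy",
        "did not communicate how they were monitoring for compliance",
        "did not enforce the policy when an infraction was found",
        "did not establish a policy to control this",
    ]),
    ("supervision", [
        "did not communicate what was wanted",
        "did not provide the things necessary in order to comply with policy",
        "did not follow the policy in the past",
        "did not enforce the policy in the past",
    ]),
    ("individual", [
        "incorrect action is now acceptable and the policy can be changed",
        "incorrect behavior can be modified",
        "incorrect behavior cannot be changed",
    ]),
]


def point_of_breakdown(is_true):
    """Walk the levels top-down and return the first affirmed (level, statement).

    is_true: callable taking (level, statement) and returning True when the
    investigator affirms that statement. Returns None if nothing is affirmed.
    """
    for level, statements in LEVELS:
        for statement in statements:
            if is_true(level, statement):
                return level, statement
    return None


# Hypothetical answers: a supervision statement and an individual statement
# are both true; the walk reports the supervision one because it comes first.
affirmed = {
    ("supervision", "did not enforce the policy in the past"),
    ("individual", "incorrect behavior can be modified"),
}
result = point_of_breakdown(lambda level, statement: (level, statement) in affirmed)
print(result)
```

Placing the individual level last mirrors the point made above: an individual root cause is reached only after the management and supervision statements have been considered.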
Application of RCA
Here is a practical application of Root Cause Analysis using REASON® in an actual case of simple ground damage.
Problem Statement--Aircraft # 123 arrived at Destination as Flight 455 from Philadelphia on 14 June with 59 customers and a crew of five, and was assigned to Gate C-3. The ramp crew was at another gate and not in position for an arrival at C-3, but ran to their positions when notified of the waiting aircraft. As the aircraft moved forward into the gate it struck an unattended fuel hydrant truck left inside the containment zone, damaging the leading edge and underside of the left wing.
Figure 1: Leading edge slats 4 and 5 damaged.
Figure 2: Detailed view of the damage, with fuel truck.
[REASON® is a trademark of DECISION Systems, Inc., located in Longview, Texas, not to be confused with Dr. James Reason - Editor]
Narration obtained by the Reason®-software--Because the fuel vendor's supervision did not enforce the policy of parking fuel trucks in designated areas only and the individual(s) did not comply on their own with the established business process, the fueler did not park the vehicle in a designated parking area. Additionally, because the customer service organization did not establish a policy to advise fuel company personnel about the importance of safety zone lines, the fuel company did not stress to its truck drivers the importance of not parking in safety zones. So, the fueler was not attentive when he parked the vehicle.
Also, because the fuel vendor did not establish a policy to park vehicles only in designated parking spots, the fuel company did not have a prohibition against parking in the safety zones for office business. Since the fueler was not attentive when he parked the vehicle, and because the fuel company did not have a prohibition against parking in the safety zones for office business, the fueler did not park outside of the safety zone line. So, when the fueler parked the truck to deliver a bill to the fuel vendor's office, and because the fueler did not park the vehicle in a designated parking area, and because the fueler did not park outside of the safety zone line, a fuel truck was parked in the safety zone.
Moreover, because the management did not establish a policy to repaint the lines periodically due to wear, the safety zone line was not visible from the marshaller's position. As the ramp was wet, and since the safety zone line was not visible from the marshaller's position, and because the fueler did not park outside of the safety zone line, the fuel truck's position relative to the safety zone line was unclear to the marshaller.
In addition, because the 3C gate required a high angle turn, the pilot had to turn more than a 135 degree angle to park. So, the pilot's scan was poor. Furthermore, because the marshaller did not have adequate on the job experience, she did not follow her training.
Then, because the Customer Service Organization did not monitor the marshalling policy and the individual(s) did not comply on their own with the established business process, the marshaller did not follow established signal procedures but instead used her wands to signal to her wing-walkers.
Meantime, the marshaller was under stress, and the marshalling agent was not following her training, and ramp supervision did not enforce the illustration of hand signals (SP6720.3) and the individual(s) did not comply on their own with the established business process, the marshaller did not follow procedure in communicating with wing walkers. Then, because the marshaller needed the wing walkers in position to guide the aircraft in, the marshaller was trying to communicate the need for the wing walkers to get in position by using her wands.
Additionally, as several ramp workers were sick that day, and the customer service organization did not monitor the staffing level to insure adequacy and the individual(s) did not comply on their own with the established business process, the ramp did not have adequate staffing that day. Consequently, as the wing walkers were busy onloading a cargo bin at an adjacent gate, the wing walkers could not take their position in a timely fashion. Since the marshaller was trying to communicate the need for the wing walkers to get in position by using her wands, and because the wing walkers could not take their position in a timely fashion, the marshalling agent did not stay in position with her wands crossed. Since the marshalling agent was using her wands to signal to her wing-walkers, and because the marshalling agent did not stay in position with her wands crossed, the marshaller's crossing signal was not constant.
Also, because the pilot in command (PIC) would not comply with the policy requiring safe practices when unclear signals are received, the PIC did not follow safe practices. Since the marshaller's crossing signal was not constant, and because the PIC did not follow safe practices, the PIC did not stop the aircraft, when confusing signals were received. As the fuel truck height was above the wing's leading edge, and since a fuel truck was parked in the safety zone, and since the fuel truck's position to the safety zone line was unclear to the marshaller, and since the pilot's scan was poor, and because Flt 455 taxied into gate C-3, and because the PIC did not stop the aircraft, when confusing signals were received, aircraft 123, Flt 455's left wing collided with a parked fuel truck.
Case interpretation
REASON Analysis of this investigation shows that it is valid to compare the identified root causes to each other, given a calculated reliability of 100%. This event contains a typical mix of both conditions and actions.
The fuel vendor's supervision has the opportunity to enforce the policy of parking fuel trucks in designated areas only and the individual(s) did not comply on their own with the established business process.
In terms of preventing this problem, this is the 7th best option, removing 9 per cent of this model.
The customer service organization has the opportunity to establish a policy to advise fuel company personnel on the importance of safety zone lines.
This is the best prevention option. It eliminates 22 per cent of this problem.
The fuel vendor has the opportunity to establish a policy to park vehicles in designated parking spots only.
Preventing this root cause is the 2nd best option, and will deal with 22% of the causes that produced this problem.
The management has the opportunity to establish a policy to repaint the lines periodically due to wear.
This action, the 8th best option, will remove 7 per cent of this problem.
The customer service organization has the opportunity to monitor the marshalling policy and the individual(s) did not comply on their own with the established business process.
This option is the 5th best available option. It will remove 13 per cent of this problem.
Ramp supervision has the opportunity to enforce the illustration of hand signals (sp6720.3) and the individual(s) did not comply on their own with the established business process.
This prevention opportunity is the 4th best, eliminating 14 per cent of the process that produced this problem.
The customer service organization has the opportunity to monitor the staffing level to insure adequacy and the individual(s) did not comply on their own with the established business process.
In terms of preventing this problem, this is the 3rd best option, removing 15 per cent of this model.
The PIC has the opportunity to comply with the policy requiring safe practices when unclear signals are received.
This is the 6th best prevention option. It eliminates 12 per cent of this problem.
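The ranking of the eight prevention options can be reproduced by sorting the reported percentages. The Python sketch below uses only the figures quoted above; the short option labels are paraphrases, and how REASON® computes the percentages or breaks the tie between the two 22 per cent options is not described here, so the sketch simply keeps the order in which they are listed.

```python
# (option, reported percentage of the model removed), in the order listed above.
options = [
    ("Fuel vendor supervision: enforce designated-area parking", 9),
    ("Customer service: policy to advise fuel personnel on safety zone lines", 22),
    ("Fuel vendor: policy to park only in designated spots", 22),
    ("Management: policy to repaint the lines periodically", 7),
    ("Customer service: monitor the marshalling policy", 13),
    ("Ramp supervision: enforce the illustration of hand signals", 14),
    ("Customer service: monitor the staffing level", 15),
    ("PIC: comply with safe practices on unclear signals", 12),
]

# Python's sort is stable, so equal percentages keep their listed order.
ranked = sorted(options, key=lambda option: option[1], reverse=True)

for rank, (label, percent) in enumerate(ranked, start=1):
    print(f"{rank}. {percent:>2} per cent  {label}")

total = sum(percent for _, percent in options)
print("Sum of reported percentages:", total)
```

The reported percentages add up to 114, so they are evidently not mutually exclusive slices of the model; a root cause that sits under more than one branch would be one way for such overlap to arise.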
Tree model explained
The following tree model illustrates a complete Root Cause Analysis of the aforementioned example. Changes are dark squares. Conditions are grey circles. Inactions are grey rounded squares. Root causes often happen as a result of inaction.
A level is best described as a collection of events occurring horizontally; hence, a set. Note that there is only one change per level. As the tree model builds up, consider the bottom as the flat portion of the pyramid. To understand how REASON® prioritizes actions (most effective action) to affect the outcome, one can look at the identical branches on the right, under the 2nd and 3rd conditions (1st level of the model) following the initial change.
Because these branches are duplicated, eliminating their shared root cause eliminates 22 per cent of the model; hence the interpretation: "The customer service organization has the opportunity to establish a policy to advise fuel company personnel on the importance of safety zone lines"--This is the best prevention option. It eliminates 22 per cent of this problem.
However, also note that an active failure at the first level was an immediate contributor to the accident, as seen in the far right-hand corner of the 1st level: the pilot in command did not stop when signals were confusing (last line of defense). Hence this interpretation: "The PIC has the opportunity to comply with the policy requiring safe practices when unclear signals are received"--This is the 6th best prevention option. It eliminates 12 per cent of this problem.
By removing the PIC from the picture, we do not remove the conditions that exist, or could exist, for this accident to recur. Conditions are still present for another opportunity to damage an airplane. In the above two examples we see a brief overview of a latent failure (systemic issue) and an active failure (individual failing to stop) as contributors to this event.
REASON® tree diagram of the event. Each branch represents an opportunity to eliminate a root cause. The process of elimination produces the interpretation of the case.
REASON® Summary Sheet Chart
The Model is: Closed
Quantification Reliability: 100.0%
Total Relative Causal Stress: 29.3690
Total Proper Causal Stress: 79
Causal Stress TTP: 3.7176
Total Relative Generating Causality: 22.4190
Total Proper Generating Causality: 54
Generating Causality TTP: 3.5030
Summary Sheet interpreted
Since the model contains no Insufficient Data, it is 100 percent reliable (according to our inputs). The raw numbers include Proper Causal Stress, the value of all changes, inactions, and conditions, and Proper Generating Causality, the value of all changes and inactions (we subtract the existing conditions). Proper means an equal weight at every level. Relative means the importance assigned depends on the level of the model at which a factor occurs: the closer the factor is to the outcome (i.e., the top of the pyramid), the heavier the weight. The Causal Stress TTP (Tendency Toward Process) is interpreted to mean the relative number of all factors (including conditions) divided by the proper number of factors (discounting the level at which each occurs). Generating Causality TTP is interpreted to mean the same, but we discount the existing conditions.
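As a rough check on this reading, the Causal Stress TTP on the summary sheet can be recovered as the relative total divided by the proper total. The Python sketch below is an inference from the printed figures, not the documented REASON® formula: the function name `ttp` is hypothetical, and the scale factor of 10 is an assumption chosen because it reproduces the printed value.

```python
def ttp(relative_total, proper_total, scale=10.0):
    """Relative total divided by proper total, scaled (the scale is an assumption)."""
    if proper_total <= 0:
        raise ValueError("proper_total must be positive")
    return scale * relative_total / proper_total


# Figures from the summary sheet: Total Relative and Total Proper Causal Stress.
causal_stress_ttp = ttp(29.3690, 79)
print(f"Causal Stress TTP: {causal_stress_ttp:.4f}")  # prints 3.7176
```

Applying the same ratio to the Generating Causality totals as printed (22.4190 and 54) gives about 4.15 instead of the listed 3.5030, so the sketch is only checked against the Causal Stress figures.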
TTP interpretation
The Tendency Toward Process (TTP) number is a metric calculated within the REASON® software that indicates the amount of “causal stress” present within a specific event model. Given that the discovered corrective actions are not put in place, TTP indicates how quickly and/or frequently the organization could anticipate a recurrence of the same event. In many ways it is a measure of the potentiality of recurrence.
TTP is charted in a numeric range of 0-10. TTP scores of around 3.0 are normal. This particular case had a 3.7 TTP, which is slightly elevated yet indicates that the event is not prone to recur quickly or frequently. In the REASON® software, TTP indicates the degree of quickness and frequency related to an event’s recurrence, and there are several reasons why it should be contemplated when prioritizing events for corrective action.
Often, after an organization experiences an incident with serious consequences, decision makers proceed into the decision-making phase of corrective action with a mindset that often defaults to putting in engineered controls, even if solutions dealing with the organizational system seem to be equally effective.
Engineered solutions are indeed often effective; yet they are often the most costly options available for dealing with an event. Engineered solutions are sometimes quick: once they are put in place, and if designed correctly, they provide instant protection. Yet, if the TTP is low for an incident, the need for an expedient correction is not as great. Very often a discovered fix in the organizational system can be both more effective and more cost effective than the engineered solution.
Some unwanted events tend to happen over and over due to the repetitive nature of the specific business process associated with them. An example of this is the business process of boarding passengers on a plane, which is an extremely repetitive process. Due to its repetitive nature, if any problems exist in the “boarding passenger business process” one can expect those problems to happen again and again.
Such problem events tend to have high TTP scores in REASON®. Repetitious events, such as those noted, match well with the inherent advantages of engineered solutions because such solutions “dummy-proof” the affected business processes, eliminating the need to rely on people induced corrections. Relying on organizational systems and people to deal with voluminously repetitive problem issues is not going to be as consistent a control for these problems as will be an engineered solution.
But often the events encountered (serious or otherwise) are the exceptional, infrequent events associated with business processes that are not as repetitive. These types of events tend to have low TTP numbers. Thus, the TTP metric itself can serve as an indicator that assists a decision maker in deciding between engineered solutions and/or fixes in the organizational system that often are as preventative and more cost effective.
In the “Aircraft Truck” case study, the TTP is 3.7, which is just slightly above normal. This score would tend to indicate that organizational fixes would be just as prudently chosen as any discovered engineered solutions.
REASON’s benefits
The Root Cause approach to incident/accident investigation using the REASON® software offers an additional facet to the accident investigation. It may assist in looking at a systemic (organizational) failure leading to an accident; it also may help to answer the systemic “why” of an accident, complementing the “how” and “when.” Hopefully this approach will provide additional weight to recommendations following investigations. In particular, RCA can give an approach to risk analysis, offering insight into the likelihood of recurrence of an event, and encourage sharing of “best practices” in the industry, in terms of procedures, processes, and gained knowledge. This tool also affords a framework for Root Cause Analysis investigations. Finally, it pre-empts the old-fashioned approach of "remove the cause and the problem ceases to exist."