The role of information management in preventing major disasters

This paper was prepared in 2015 for presentation to information management and public administration conferences and seminars. It was also published in information management professional journals in Australia, New Zealand and the United Kingdom.

Juanita Stuart

Introduction

When reflecting upon some of the most notable disasters that have happened world-wide, it is disappointing how often poor information management practices contributed to the cause. Managing information well is just good business sense.

All of these precious lives (and their grieving families and friends) were affected by preventable, predictable disasters. Let’s learn from these mistakes and not let them happen again.

The principles of information management come from two sources. The first is Archives New Zealand (2000), which says information is to be:

  • Complete - Includes structural and contextual information, the process that produced it and other linked documents
  • Comprehensive - Records the whole business process
  • Adequate - Fit for purpose for which it is being kept
  • Accurate - Correctly reflects what was communicated, decided or done
  • Authentic - It is what it purports to be
  • Useable - Identifiable, retrievable, accessible, and available when needed
  • Tamper-proof - Security maintained to prevent unauthorised access, destruction, alteration or removal

Secondly, information management is about (adapted from Ross et al., 1996 and Johansson & Hollnagel, 2007):

  • Systems: Computer systems as well as systems such as file classification and security classification that apply to both hard and soft copy.
  • Business processes: An orderly sequence of tasks; in essence, prompting people to do the right thing at the right time.
  • People: The people having the skills to do the task and understanding why they are doing it, so they do the task willingly.

Ideally, these three should be well balanced. Whenever one of the three is underperforming, the other two have to take up the slack.

With those information management principles in mind, let’s look at some disastrous events.

Pike River Coal Mine Explosion

On 19 November 2010, an explosion occurred at Pike River coal mine taking the lives of 29 men and injuring two survivors.

About an hour and a half after the explosion, two men walked out of the mine, telling of how they were injured by the explosion and how they passed out as they slowly made their way out (a 2 km walk) because the air was unbreathable.

The explosion had severed all the electrical monitoring equipment connected to the above-ground control room. An electrician was sent into the mine to try to repair the damaged electrical connections. He turned back without going in very far because the air was unbreathable.

Several air samples were taken, indicating the mine was still on fire. A borehole was drilled into the heart of the mine, reaching pit bottom five days after the first explosion. Air samples taken through that hole brought hope that the air might be breathable, enabling people to enter the mine, and that perhaps some of the men might still be alive if they had been able to reach some of the “self-rescuer” breathing devices.

However, the experts in coal mining and mine rescues saw the CCTV footage of the explosion at the portal and believed no one could have survived the blast. A few hours after the air samples were taken at pit bottom, a second explosion occurred, dashing all hope of any survivors, as it was many times worse than the first. There were two further explosions.

Some coal mines emit methane gas from the coal seam. Pike River was one of these. Hydro mining (like water blasting) produces more methane than dynamiting out the coal. If methane is kept diluted with good air ventilation, it is less likely to ignite. However if there isn’t good ventilation and the gas can concentrate, it is easily ignitable. Methane is lighter than air so it rises, leaving the breathable air at floor level and the methane concentration at the ceiling.

The equipment that took air quality readings showed numerous times throughout October and November that the methane concentration was at dangerous levels. These readings didn’t go to the managers, nor were they circulated more widely. They were not reported in daily production or weekly operations meetings, nor through the deputies’ production reporting system (as demonstrated later). They were not reported to the regulator as the regulations required.
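
To illustrate the kind of business process that was missing, here is a minimal sketch in Python. The threshold, names and recipients are hypothetical, not Pike River’s actual systems; the point is simply that a dangerous reading should be escalated automatically rather than stopping at the sensor.

    # Hypothetical sketch: automatic escalation of dangerous methane readings.
    # METHANE_ALARM_PERCENT, Reading and notify are illustrative names only.
    from dataclasses import dataclass
    from datetime import datetime

    METHANE_ALARM_PERCENT = 1.25   # assumed danger threshold, for illustration

    @dataclass
    class Reading:
        sensor_id: str
        timestamp: datetime
        methane_percent: float

    def notify(recipient: str, reading: Reading) -> None:
        # Stand-in for paging, email or a line in a statutory report.
        print(f"ALERT to {recipient}: {reading.sensor_id} at "
              f"{reading.timestamp:%Y-%m-%d %H:%M} read {reading.methane_percent}% methane")

    def route(reading: Reading) -> None:
        # Every dangerous reading goes to people who can act on it,
        # and is kept for the operations meetings and the regulator.
        if reading.methane_percent >= METHANE_ALARM_PERCENT:
            for recipient in ("shift supervisor", "mine manager", "regulator file"):
                notify(recipient, reading)

    route(Reading("sensor-heading-5", datetime(2010, 10, 21, 14, 30), 2.6))

Nothing about such a rule is technically difficult; what was missing was the decision that readings above the threshold must always reach someone who could respond.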

At management level, there were committee meetings which had action sheets recording the person responsible for each action and the expected completion date. If an action was simple, it generally was done. However, actions required of some departments were routinely left undone, as were actions that required coordination between several departments.

[Image] Extracts from Dene Murphy’s 21 October 2010 deputies production report.
Image: Extracts from Dene Murphy’s 21 October 2010 deputies production report (Royal Commission, 2012, vol. 2, p. 105).

The image above is an example of a deputies’ production report: this one was completed by Dene Murphy on 21 October 2010, less than a month before the explosion. On average, Mr Murphy put in one of these reports per day for two years, so you can see the frustration he was feeling in the language he used to express himself. This was Pike River’s system for identifying matters that presented safety risks for the employees.

There were also many other reports about incidents, accidents and hazards. However, there was no business process for sorting and classifying them and passing the concerns on to a manager who could or would do something about them. Many of these were just thrown away. Those not tossed became a large accumulation; in fact the Royal Commission analysed 1083 of them.

[Image] ID tag board.
Image: ID tag board (Macfie, 2013, after p. 198)

Amongst the victims of the poor information management were the families of all the mine workers. Pike River’s system for identifying who was underground and who wasn’t was not fully complied with. At the time of the explosion, the company could not quickly identify who was affected by it. Over the 20 hours it took to ascertain exactly who was underground, the families were distressed.

Reflecting back on the information management principles outlined at the beginning, how could they have made a difference at Pike River? There was no management information system: vital information was not brought together, summarised and analysed for executive managers. The key information on health and safety incidents was available but was not handled systematically and therefore did not receive a response. The information was not usable because the business processes didn’t route it to the right person who could or would respond to it appropriately. The system for recording who was in the mine at any given moment didn’t work, so that information was inaccurate.

One of the most unforgivable information management mistakes was the constant false information from senior management about how well things were going, while those close to the operations knew things weren’t going well at all.

Herald of Free Enterprise Ferry capsize

The next tragedy to explore is the capsizing of the ferry Herald of Free Enterprise between Dover and Bruges-Zeebrugge on 8 March 1987, killing 193 people.

[Image] Townsend Thoresen ferry capsized.
Image: Townsend Thoresen ferry (Wikimedia, n.d.).

On that fateful day, 650 passengers were on board. The doors to the car deck were left open. Water entered the ship at the car deck and caused it to capsize. The open doors would not in themselves have caused the ship to capsize, because a sister ship had made the crossing with her doors open without incident.

This ship did not normally do the Dover to Bruges-Zeebrugge run, and the pier and the ship’s decks didn’t match each other. The drawbridge could only reach one deck, so vehicles could only load onto that one deck. The ship had to fill its forward ballast tanks to lower itself in the water so that cars could use the drawbridge and load onto it. The ship was due to be modified during its next refit, scheduled for later that year, to overcome this limitation.

Most ships are divided into watertight compartments below the waterline so that in the event of flooding, the water will be confined to one compartment, keeping the ship afloat. However the car deck was open with no dividers.

Normal practice was for an assistant boatswain to close the ferry doors before dropping moorings. Usually the first officer would remain on deck to ensure they were closed before returning to the wheelhouse.

On 8 March, they were running behind time. The captain was under pressure from the owner of the ferry (Townsend Thoresen) to be on time. The ship was designed for quick acceleration. The weather was calm. Although there was a high spring tide, the water was shallow, especially with the ship lowered by the ballast.

The first officer returned to the wheelhouse before the ship dropped its moorings (which should not have happened, but commonly did). He trusted the assistant boatswain to close the doors. However, the assistant boatswain had gone to his cabin and taken a nap. The captain presumed the doors were closed; he couldn’t see them from the wheelhouse. The doors were held by massive hydraulic rams, so they couldn’t open by themselves or by water pressure. There was no warning system in the wheelhouse to alert the captain if the doors were open.
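
Purely as an illustration (the sensor names and the check are hypothetical, not the ferry’s actual equipment), a departure check driven by door sensors would have replaced the captain’s presumption with information he could see from the wheelhouse:

    # Hypothetical sketch: a bridge indicator / departure check fed by door sensors.
    class DoorSensor:
        def __init__(self, name: str):
            self.name = name
            self.closed = False   # the sensed position of the door, not a presumption

        def report(self) -> str:
            return f"{self.name}: {'CLOSED' if self.closed else 'OPEN'}"

    def clear_to_sail(doors, ballast_dumped: bool) -> bool:
        # The bridge sees the sensed door state and the ballast state
        # instead of presuming them.
        problems = [d.report() for d in doors if not d.closed]
        if not ballast_dumped:
            problems.append("forward ballast still on board")
        for p in problems:
            print("HOLD DEPARTURE:", p)
        return not problems

    bow_doors = [DoorSensor("inner bow door"), DoorSensor("outer bow door")]
    clear_to_sail(bow_doors, ballast_dumped=False)   # lists the open doors and the ballast hold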

In the hurry to get away without falling further behind time, they neglected to dump the ballast. The ship was in shallow water and was going too fast. It got only 91 metres from shore.

Reflecting back on the information management principles, how could they have made a difference for the passengers on the Herald of Free Enterprise? The ship’s captain acted on the inaccurate presumption that the doors were closed; he did not have access, from the wheelhouse, to accurate information telling him they were still open.

The business process broke down in two ways:

  • The process of the boatswain informing the captain broke down as the captain did not get the information about the doors before dropping moorings.
  • The process for the crew to dump the ballast before dropping moorings broke down.

There were also human errors: it doesn’t help when a key crew member is asleep in his cabin, or when a captain is in such a hurry to keep to the schedule that he neglects to complete the checks before dropping moorings. In this case the checks were closing the doors and dumping the ballast.

Black Hawk helicopters shot down by U.S. Air Force F-15 fighter aircraft

The next tragedy to explore is the ‘friendly fire’ shooting down of two Black Hawk helicopters by two U.S. Air Force F-15 fighter aircraft on 14 April 1994, killing 26 people: 10 military personnel and 16 high-ranking civilian VIPs.

[Image] U.S. Army’s Black Hawk.
Image: U.S. Army’s Black Hawk (Wikimedia, n.d.).

The U.S. Army helicopters and the two F-15s were over the Iraqi no-fly zone. From the time the pilots first noticed the helicopters on their radars to the time they were shot down was a total of seven minutes. This was peace-time: the war was over. They were there with Operation Provide Comfort to provide humanitarian relief for the suffering of Kurdish and other refugees after the Persian Gulf War.

Integration between military services

The USAF leaders had failed to adequately integrate U.S. Army helicopters into the overall Operation Provide Comfort air operations. In the military you must communicate up the ranks, then a designated person communicates across to the other service, then the communication goes down the ranks. The military staff whose job it was to communicate between the Air Force and the Army during the war had been stood down and not replaced for these peace-time humanitarian operations.

Different radio frequencies and technology

The helicopter pilots did not change to the radio frequency required in the no-fly zone; they remained on the en route frequency. However, the commander of the operation had made an exception about the radio frequency to be used by the helicopters in order to mitigate a different safety concern, so the helicopter pilots were simply following orders when they didn’t switch to the no-fly zone frequency. The helicopters were using older-technology radios and didn’t have the newer jam-resistant radios the F-15 pilots had; the two types of radios couldn’t communicate with each other. However, the F-15s were equipped with both radio technologies, and the pilots knew to switch to the older radio technology, as they did for other types of friendly aircraft such as Turkish and Syrian aircraft.

[Image] U.S. Air Force’s F-15.
Image: U.S. Air Force’s F-15 (Wikimedia, n.d.).

Mistaken identity

The F-15 pilots stated they thought the helicopters were Iraqi Mil Mi-24 “Hind” helicopters. (The Hinds were painted light brown and desert tan, whereas the Black Hawks were painted green camouflage.) The Black Hawk helicopters had large American flags on the two fuel tanks, on each side door, on the nose and on the belly. The weather was fair and clear with good visibility. However, the F-15s were required to fly above 10,000 feet while the helicopters had to remain below 400 feet.

The F-15s were to know who was in the no-fly zone at all times and accurately identify them as friendly or not. They were to do two passes over the enemy to confirm the identification, which they did not do because that would have put them too close to the ground. They attempted a fly-over, remaining 300 feet above and 1,000 feet off to the side of the helicopters.

During the investigation, when this scenario was recreated with F-15s and Black Hawks, the test pilots could not see the American flags and could not identify the Black Hawks. It was difficult even to see two green helicopters against a green background in that area of Iraq. There is no way they could have met the identification requirements. It would also have been impossible to identify Hinds as Iraqi (enemy) rather than Hinds used by Syria and Turkey (friendly). Even if it had been an enemy helicopter, they were not allowed to shoot if the helicopter had no hostile intent (such as being lost, in distress, on a medical mission or being flown by pilots who were defecting). Even if they had been enemy Hind helicopters, they were of no threat to the F-15s or the AWACS aircraft.

[Image] U.S. Air Force’s AWACS.
Image: U.S. Air Force’s AWACS (Wikimedia, n.d.).

Air traffic control

The Air Force’s Airborne Warning and Control System (AWACS) crew (the air traffic controllers in the sky) failed to intervene but watched silently. This team turned over every 6-8 weeks, and the accident happened on day one of the new team, the first time they had worked together. They had minimal and inadequate training, so there were two instructors on board. It was their job to warn the fighters about any friendly aircraft the fighters were targeting. The helicopters reported their departure, flight route, and destinations by radio to the AWACS. This report was acknowledged.

In fact, the helicopters had three communications with the AWACS before being shot down. The F-15 lead pilot radioed the AWACS saying they were engaging enemy aircraft, checked the radio frequency again (the frequency the helicopters couldn’t access) and then shot them down. They did not radio the commander, therefore acting without the command approval required in peace-time operations.

To keep the enemy from knowing all their movements and therefore targeting them, the Army used code names for geographic place names. The Air Force used to use the same codes but had stopped using them. When the Army radioed the AWACS, they used the code names. The Army didn’t know that the Air Force rookies had no idea what the code names meant. So when the Army radioed their locations and so on, the AWACS acknowledged it but did not communicate it any further.

The Air Force personnel (both the F-15 pilots and the AWACS staff) operated under a minimum-communication policy, using abbreviated messages and showing a reluctance to ask for clarification where there was potential for miscommunication.

There were two AWACS operators: one for inside the no-fly zone and one for outside it. They were confused as to who was responsible for tracking aircraft in the boundary area between the two. Neither one controlled the Black Hawk helicopters.

The F-15s were tasked to clear the area of any hostile aircraft. They received a list of all scheduled coalition aircraft missions for that day, which the two pilots reviewed before take-off. The helicopters were on the list, but it did not give take-off times, routes or flight durations for them, as helicopters have to be flexible and can’t operate to rigid schedules. The Black Hawks were not told of the F-15s and therefore had no idea they were there. The helicopters went into the area before the F-15 pilots could ensure the area was sanitised: an official exception had been made for the Black Hawks. In fact it was accepted practice for them to go into the area before it was sanitised, and this was frequently done. The F-15 pilots weren’t told of this exception, even though telling them about such known exceptions was the very purpose of the AWACS instructors being with them.

The identification friend or foe (IFF) system had not functioned to identify the helicopters to the F-15 pilots. On 14 April, the helicopters flew in from Turkey and reported their entry into the no-fly zone by radio on the en route frequency. They landed at the Military Coordination Center. “Friendly helicopter” tags were added to the radar scopes and the helicopters were displaying identification friend or foe signals. The radar symbol was suspended after the helicopters disappeared from the scopes upon landing. They picked up passengers (the VIPs) and lifted off for another Iraqi city 190 km away. “Friendly helicopter” symbols were visible on the radar screens. The Army and the Air Force were using different IFF codes (the helicopters used the code they were commanded to use); they had not been told that there were separate IFF codes for that geographical area.

Rules of engagement

The ground-based Mission Director maintained constant communication links with all the air-space operations and with the Air Force commander on the ground. He kept the Operation commander informed of anything happening that required his approval. Before anyone shot at anyone, this communication was to occur; the rules of engagement during peace time certainly required these levels of communication. The written guidance was clear, but there was controversy over how it should be implemented and who had decision-making authority. The approval to shoot that the F-15 pilots were required to receive had not been given. The commander who gave such approvals assumed the F-15s would ask his approval before they fired. The F-15 pilots believed they didn’t need approval when an imminent threat was involved (but the helicopters weren’t a threat). In those seven minutes, they didn’t check whether the conditions of “no hostile intent” applied.

Communication difficulties

The Army helicopters suffered interruptions in radio transmission due to terrain. The Joint Tactical Information Distribution Center (which provides the ground with a picture of airspace occupants) told the Mission Director there were no airspace occupants, which was inaccurate information.

At 10.20am the lead pilot reported they were on station. Usually at this time the AWACS would give them a “picture” of any aircraft in the area. No information was provided to the F-15 pilots at this time, although the Black Hawks had already checked in with the AWACS on three separate occasions. The lead F-15 pilot twice reported unsuccessful attempts to identify radar contacts they were receiving, but in response the AWACS did not inform them about the presence of Black Hawks in the area.

Reflecting back on the information management principles, how could they have made a difference for the passengers and crew on the Black Hawk helicopters?

Information was incomplete:

  • The code names for places that were once used by both the Army and Air Force were no longer used by the Air Force.
  • The F-15 crew had a culture of minimal communication that easily caused misunderstandings when no one would dare indicate they didn’t understand.

Information was inaccurate:

  • The F-15 crew didn’t check the insignia on the sides and front of the helicopters and misidentified the Black Hawks as hostile.
  • The Mission Director got incorrect information about the state of the airspace.
  • The commander thought procedures were being followed: that the helicopters were being tracked and that the F-15 pilots were receiving the helicopters’ flight schedules.
  • Army pilots were given wrong information about IFF codes.

Information was unusable:

  • The F-15 crew knew the helicopters were to be in the area; they just didn’t know when or where they were going.
  • The F-15 crew didn’t know the commander had told the helicopters not to change radio frequencies, so they searched a different frequency.
  • The helicopter flight plans were distributed to the F-16 pilots (who fly at lower altitudes) but not to the F-15 pilots. Information about flights was not distributed to all those who needed to know.
  • Black Hawks were allowed to enter the no-fly zone before the fighters swept the area. F-15 and AWACS crews were not told of this exception.

Systems intended to enable information sharing did not function as intended:

  • The “friend or foe” system only works when aircraft are at higher altitudes, not when they are close to landing.
  • All aircraft in the no-fly zone should have been able to communicate with each other effectively.

Business processes intended to enable information sharing did not function as intended:

  • In the Air Force, whoever’s job it was to integrate Army helicopters into the overall Operation Provide Comfort air operations didn’t do it.
  • The required tracking by AWACS of all aircraft in the no-fly zone wasn’t done, nor were the fighters made aware of their location.
  • The Combined Forces Air Component (Air Force) and the Military Coordination Center (Army) should have been better able to communicate and coordinate with each other.
  • The Air Force operates to fixed, rigid schedules, whereas the Army requires flexible scheduling. There was no timely, detailed flight information on planned helicopter activities, and there were no procedures for dealing with last-minute changes in helicopter flight plans.

People were relied upon to have knowledge and skills and yet they did not function as intended:

  • The F-15 crew knew the helicopters were to be in the area, yet didn’t identify their target well before shooting.
  • The rules of engagement were misunderstood and not followed. More adequate training may have helped.
  • F-15 pilots were not reminded to use the older radio frequency for “friendly” aircraft.
  • The AWACS crews were inadequately trained, and the instructors on board didn’t share their knowledge with the trainees appropriately.
  • There were no procedures, guidance or training for AWACS to control helicopters.

Tangiwai Railway derailment

The next disaster to explore is the Tangiwai Railway derailment on 24 December 1953 killing 151 people.

[Image] Train crossing the rail bridge at Tangiwai.
Image: The rail bridge at Tangiwai (Jigsaw Entertainment Ltd., n.d.). This photo was taken about 30 hours before the disaster.

There were 285 people on the train.

From 1859 there were a number of recorded lahars, so the damage lahars can do was well known. In 1925, the railway bridge at Tangiwai was weakened by a lahar: a civil engineer’s report describing the damage to pier 4 said it was tilted half an inch (12 mm) and the track above had bulged to the same extent, and the pier was also scoured out at its foundation. Before the railway bridge was first built, the engineers of the time documented that it was the wrong place to build a railway and that Tangiwai was the wrong place to have a bridge. It seemed risky to place two state highways and the railway line so close to three active volcanoes.

Overseas experts came to New Zealand in 1945, warning of the likelihood of a lahar. A mountain guide had warned that the crater lake was rising; the officials laughed at him. In 1951, some men who often canoed on the crater lake noticed the rising lake level and began to record soundings showing it was rising at half an inch a day. One of them wrote a letter to the geological scientists warning of the risks. They were ignored.

At about 8pm on that fateful evening, there was an earth tremor on Mt Ruapehu that was felt as far away as Waiouru. The seismograph at the Chateau recorded the vibration. The outlet of the crater lake (consisting only of ash) collapsed, starting the lahar. Lahars travel at about 12 miles (about 19 km) an hour.

The train was running on time. While it was sitting at Taihape at 8.30pm, the roaring noise of the lahar could be heard. The noise grew increasingly louder until at 9pm it was “a terrific roar”. At 10.06pm the train had not yet left Waiouru. No one stopped the train, even though the mountain’s “terrific roar” had been audible for over an hour.

The lahar struck the bridge at 10.15pm; the train went into the river at 10.21pm. The lives of 151 people were lost for lack of a phone call to stop the train.

Reflecting back to the information management principles, what went wrong?

  • The history of lahars didn’t inform the right people.
  • The numerous reports and letters that were written about the risks and conditions didn’t inform the right people, who could or would do something about it. The people who were informed chose to do nothing: the damaged bridge wasn’t repaired, and those monitoring the seismograph didn’t raise an alarm.
  • The people who heard the mountain roaring didn’t inform the right people.
  • There was no business process put in place after the numerous warnings to enable the train to be stopped.

China Airlines Flight 140 crash

The next disaster to explore is the crash of China Airlines Flight 140 on 26 April 1994, killing 264 people.

Flight 140 was approaching Nagoya, Japan, to land. The take-off/go-around button had been pushed as usual, activating the autopilot. Two bursts of thrust were applied in quick succession and the airplane went nose-up into a steep climb. Airspeed dropped quickly, the plane stalled and the nose dropped. The captain tried to pull back the control column but was unsuccessful because the autopilot could not be overridden. So when the pilot wanted to take control, he couldn’t.

Problems with the flight control computer software (which didn’t allow the captain to override the autopilot) had been identified and a service bulletin had been released. The “fix” had been available since September 1993 (six months earlier). However, because the computer problem had not been labelled a “cause” of the previous incidents, the modification was labelled “recommended” rather than “mandatory”. China Airlines was going to fix the flight computers the next time they needed repairs.

Reflecting on the information management principles, the information that categorised the computer “fix” as discretionary was inaccurate.

Union Carbide Bhopal chemical plant explosion

The next disaster to explore is the Union Carbide plant explosion in Bhopal on 2-3 December 1984. Conservative estimates say there were 8,000 fatalities, 10,000 permanent disabilities and 558,125 injuries. Even now, 30 years later, 150,000 survivors are still struggling with serious medical conditions.

[Image] Union Carbide, Bhopal.
Image: Union Carbide, Bhopal (Wikimedia, n.d.).

This chemical plant produced methyl isocyanate (MIC), an intermediate used to make a garden pesticide.

On 2-3 December 1984, MIC was released into the air. There are differing versions of what happened, so to stay away from the controversial areas this report will focus on what the various sides agree on. A relatively new worker was assigned to wash out some pipes and filters which were clogged. The pipe-washing operation should have been supervised by the second-shift supervisor, but that position had been eliminated in a cost-cutting effort. MIC produces large amounts of heat when in contact with water, and the worker properly closed the valves to isolate the MIC tanks from the pipes and filters being washed. However, somehow water entered the MIC tanks, causing the explosion. The relief valve opened, venting MIC into the air, and the wind carried the MIC into the population around the plant.

It is not uncommon for a company to turn off safety devices, such as refrigeration units, to save money. The operating manual specified that the refrigeration unit must be operating whenever MIC was in the system: the chemical has to be maintained at a temperature no higher than 5° Celsius to avoid uncontrolled reactions. A high-temperature alarm was to sound if the MIC reached 11°. The refrigeration unit was turned off, however, and the MIC was usually stored at nearly 20°. The plant management accordingly adjusted the alarm threshold from 11° to 20°, and tank temperature readings were taken less often, thus eliminating the possibility of an early warning of rising temperatures.
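
As a small illustration of what moving that threshold did (the temperatures come from the account above; the code itself is only a hypothetical sketch), raising the alarm setting to match routine practice removed the early-warning margin entirely:

    # Hypothetical sketch: the effect of raising a high-temperature alarm threshold.
    # (The operating manual required the MIC to be kept at 5 °C or below.)
    def alarm_fires(tank_temp_c: float, threshold_c: float) -> bool:
        return tank_temp_c >= threshold_c

    ORIGINAL_THRESHOLD_C = 11.0
    RAISED_THRESHOLD_C = 20.0    # moved to match the routine storage temperature

    drifting_temp = 12.0         # a tank already well above the manual's 5 °C limit
    print(alarm_fires(drifting_temp, ORIGINAL_THRESHOLD_C))   # True - an early warning
    print(alarm_fires(drifting_temp, RAISED_THRESHOLD_C))     # False - silence until 20 °C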

Other protection devices at the plant had inadequate design thresholds. For example, the vent scrubber (which neutralises MIC with caustic soda), even had it worked, was designed to neutralise only small quantities of gas at fairly low pressures and temperatures. The pressure of the escaping gas during the accident exceeded the scrubber’s design by nearly two and a half times (several gas scrubbers had been out of service for five months; only one was operating on the day), and the temperature of the escaping gas was at least 80° Celsius more than the scrubber could handle.

Similarly, the flare tower (which was supposed to burn off released vapour) was totally inadequate to deal with the estimated 40 tons of MIC that escaped during the accident. (Even when working, it could only handle a quarter of the gas that leaked, and it was out of order at the time of the incident.) Any leak could go unnoticed for a long time.

In addition, there was a water curtain (spraying water on the gas to knock it down to limit the geographic area contaminated). The MIC was vented from the vent stack 108 feet above the ground, well above the height the water could be sprayed due to inadequate water pressure. The water curtain reached only 40 to 50 feet above the ground. The water jets could reach as high as 115 feet, but only if operated individually.

The water would not have caused the severe explosion had the refrigeration unit not been disconnected and drained of Freon, or had the gauges been properly working and monitored, or had various steps been taken at the first smell of MIC instead of being put off until after the tea break, or had the scrubber been in service, or had the water sprays been designed to go high enough to douse the emissions, or had the flare tower been working and been of sufficient capacity to handle a large excursion.

A safety audit two years earlier by a team from Union Carbide had noted many safety problems at the plant, including several involved in the accident. The manager said all the identified issues had been corrected; however, the quality of the corrections must have been poor, as the deficiencies remained.

Alarms at the plant sounded 20 to 30 times a week for various purposes, so an actual alert could not be distinguished from routine events or practice alerts. Ironically, the warning siren was not turned on until two hours after the MIC leak was detected (and after most of the injuries had occurred), and it was then turned off after only five minutes, which was company policy.

As cost-cutting measures, the maintenance and operating personnel were cut in half. Maintenance procedures were severely cut back and the shift relieving system was suspended: if no replacement showed up at the end of a shift, the following shift went unmanned. Staff complained about the cuts through their union but were ignored.

Staff were commanded to deviate from the proper safety regulations. Seventy percent of them were fined for refusing to deviate. “There was widespread belief among employees that the management had taken drastic and imprudent measures to cut costs and that attention to details that ensure safe operation were absent” (Lihou, 1990).

When the danger during the release became known, many employees ran from the contaminated area of the plant, totally ignoring the buses that were sitting idle ready to evacuate workers and nearby residents. Plant workers had only a bare minimum of emergency equipment – a shortage of oxygen masks, for example, was discovered after the accident started – and they had almost no knowledge or training about how to handle non-routine events.

The surrounding community was not warned of the dangers, before or during the release, or informed of the simple precautions that could have saved them from lethal exposure, such as putting a wet cloth over the face and closing the eyes. If the community had been alerted and provided with this simple information, many (if not most) lives would have been saved and injuries prevented.

As controversy raged over who to blame, Leveson simply observes that it was an accident waiting to happen: “Given the overall state of the Bhopal Union Carbide plant and its operation, if the action of inserting the slip disk had not been left out of the pipe washing operation that December day in 1984, something else would have triggered an accident” (Leveson, 2011, p. 28).

Reflecting on information management principles, the information was not comprehensive from the time they stopped logging temperatures, leaving no records of necessary data. Information is not useable when there are multiple alarms going off frequently: it becomes information overload, making the needed information unidentifiable. The system for providing a warning siren was turned off. Business processes intended to enable information sharing did not function as intended:

  1. They acted contrary to the operating manual by allowing temperatures to rise higher than the manual allowed.
  2. Safety audits documented problems, yet none of the recommended changes were made. The information didn’t reach those who could and would do something.
  3. There was no process for informing people around the facility on how to simply protect themselves from harm.

Turkish Airlines Flight 981 crash

The next disaster to explore is Turkish Airlines Flight 981 on 3 March 1974, killing 346 people. The cargo doors of DC-10s could come open during flight.

The design flaw in the cargo door was first discovered in 1969 and was left uncorrected. In June 1972 (two years before this accident), Flight 96 near Windsor, Ontario had its cargo door open during flight. The decompression caused part of the passenger floor to collapse, and the control cables that ran through it were severed. The pilot had trained himself to fly and safely land the plane without those controls, so no disaster occurred. He was also helped by the plane carrying a light load. The pilot recommended that every DC-10 pilot be trained in the flying technique that saved him, his passengers, the crew and the aircraft. The FAA investigators, the National Transportation Safety Board and the engineers all recommended changes in the design to prevent cargo doors opening during flight. However, McDonnell Douglas attributed the Windsor incident to “human error on the part of the baggage handler responsible for closing the cargo compartment door”. The door could be improperly closed and yet made to appear (by the warning system in the cockpit) to be properly closed.

Two years later, when the Turkish Airlines plane was flying to Paris loaded to the maximum, the same thing happened as at Windsor. Due to the heaviness of the load, the pilots were not able to establish any control with the cables severed.

Other factors:

  • A support plate for the handle linkage of the cargo door had not been installed, although this work had been documented as completed.
  • The warning notices around the cargo door were written in Turkish and English. The baggage handler was fluent in three languages, none of which was Turkish or English.
  • The plane had a small indicator window that allowed baggage handlers to visually inspect that the pins were in the correct position. This plane had the window. However this baggage handler hadn’t been trained about the purpose of the window and (of course) couldn’t read the instructions.
  • An investigator found that the pins on the cargo door had been filed down because the baggage handlers had trouble closing the door. After the pins were filed down, they were able to close the door effortlessly. That caused the door to only be able to withstand 15 psi of pressure, whereas it was designed to withstand 300 psi.
  • McDonnell Douglas issued a service bulletin to change the door latch. Three months later, Turkish Airlines ordered this plane; another three months after that, the plane was delivered. So McDonnell Douglas hadn’t fixed the problem on the planes on its assembly line.

Again McDonnell Douglas blamed the baggage handler; however, this time the FAA “ordered” modifications to all DC-10s to eliminate the hazard.

Reflecting on information management principles, some of the information was inaccurate: the warning system signalled to the pilot that the door was properly closed when it was not, and there was false information about the support plate being installed when it was not. Accurate information was inaccessible to the baggage handler because it was in Turkish and English, languages he did not know. The business process at McDonnell Douglas was inadequate in that it had issued a service bulletin to fix the known problem, but delivered a plane six months later that did not have the “fix” applied.

Piper Alpha oil/gas rig explosion

The next disaster to explore is the Piper Alpha explosion on 6 July 1988, killing 165 people plus another two from the rescue team. Piper Alpha was an oil and gas rig in the North Sea off the British coast, operated by Occidental Petroleum (Caledonia) Ltd.

[Image] Piper Alpha oil gas rig explosion.
Image: Piper Alpha (Wikimedia, n.d.).

There were 226 people on the rig. Two information disasters occurred in this scenario: what caused the initial explosion, and the issues around rescuing the staff from the rig.

This rig had pipeline connections to two other rigs (the Tartan and the Claymore), which were a few kilometres away, and two further pipelines: one to the Orkney Islands for crude oil and one to the MCP-01 platform for gas compression. There was a maintenance schedule under which valves were replaced every 24 months. In July 1988, there were quite a few known leaks around the rig, and a number of pipes and pumps had been shut down for maintenance. At about 9:45 p.m. about six alarms went off from a wide variety of locations, and safety devices were triggered that shut down the functions they were connected to. Although it wasn’t unusual for all those functions to be shut down, if they are all shut down at the same time no power is being generated, resulting in a full black-out. With no power, it is much harder to restart, so they wanted to avoid that problem.

Pump A was down for a major overhaul of a pressure safety valve. Not all the “lead operators” (i.e. managers) knew that; some thought it was only down for routine maintenance. As one “lead operator” went off duty and the next came on, the status of the valve, and therefore of the pump, wasn’t communicated. It was, however, recorded on the written “permit to work”, which said the pump was “not ready and must not be switched on under any circumstances”.

When Pump B had a triggered shutdown and they couldn’t restart it, they wanted to get Pump A running as soon as possible, because the entire power supply of the offshore construction work depended on these pumps. The “lead operator” who authorised Pump A to be started seemed not to know that the valve was in the maintenance repair shop. He saw a “permit to work” saying the pump was to be overhauled (work which had not been started), but he didn’t find the “permit to work” about the removed valve, as it was held in a different location. The “permits” were sorted by location and not cross-referenced. So Pump A was started without its valve, causing enough condensate (natural gas liquids) to leak out to explode as soon as it found a source of ignition.
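
As an illustration of the cross-referencing the permit system lacked (the permit numbers and the index are hypothetical), indexing permits by the equipment they affect, and not only by the location where they were raised, would have surfaced both Pump A permits together:

    # Hypothetical sketch: permits to work indexed by equipment tag, so that
    # related permits surface together wherever they were raised.
    from collections import defaultdict

    class PermitIndex:
        def __init__(self):
            self.by_equipment = defaultdict(list)

        def add(self, permit_id: str, equipment: str, location: str, detail: str):
            self.by_equipment[equipment].append((permit_id, location, detail))

        def open_permits_for(self, equipment: str):
            return self.by_equipment[equipment]

    permits = PermitIndex()
    permits.add("PTW-101", "Pump A", "pump room", "major overhaul - work not started")
    permits.add("PTW-102", "Pump A", "maintenance shop",
                "pressure safety valve removed - must not be switched on under any circumstances")

    # Anyone authorising a restart of Pump A now sees both permits.
    for permit in permits.open_permits_for("Pump A"):
        print(permit)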

Over the next few minutes, this first explosion caused a chain of events that led to several other fires and explosions, such as a pool of oil catching fire and a riser pipe bursting. The operator threw the switch to shut down all the pipes and pumps. The fire would have burnt out had it not been fed with oil and gas from the Tartan and Claymore platforms; the managers of those platforms didn’t have permission from the Occidental control centre to shut down at the first emergency call.

There were issues with the “permit to work” system. As mentioned before, the permits were not looked at as shifts turned over, and those completing the forms did not put enough detail down; they would be filled in with minimal information, in a hurry.

“There were always times when it was a surprise when you found out some things that were going on” (Clark in Cullen, 1990,  vol. 1, pp. 121, 197).

The forms were not cross-referenced to give the complete picture of how each was interconnected with the others. So one permit would say Pump A was down for maintenance, but would not be connected to the other permit that said the valve had been removed from the pump and was in the maintenance repair shop.

Regarding the disaster of the rescue activities, the procedure was that the lead operator was to announce over the loudspeakers what to do to evacuate. He failed to do this. When many of the men became aware that a state of emergency existed, they did what they had been taught in drills to do: go up to the helipad and wait for a helicopter to rescue them. However, in this case that wasn’t the best thing to do. Of the six helicopters flying in the area, none could land on the helipad because there was too much wind, fire and smoke obscuring visibility. Going up also put those men higher, so that those who jumped into the sea were at greater risk of the impact killing them. Drills also taught them to go to the lifeboat stations, but the fire prevented that.

As the first explosion occurred at 10pm, the night shift was on duty and the rest of the men were either in the TV room, the cinema or the accommodation block sleeping. The accommodation block was believed to be completely fireproof (it wasn’t). With the first explosion, the common exits were unusable due to smoke. Only those who were very familiar with the rig knew of another way out that avoided the smoke and got them down to the lower levels, low enough that if they jumped into the sea they could survive the impact. Many of the men stayed in the accommodation block, waiting for instructions that never came. They believed staying in the accommodation block was the safest thing to do, and they didn’t follow the men who were familiar with the rig. Hours later, the whole accommodation block went into the sea, and when it was recovered, the post-mortems showed that the occupants had died of smoke inhalation, not drowning.

“These examples serve to demonstrate that the operating staff had no commitment to working to the written procedure; and that the procedure was knowingly and flagrantly disregarded” (Cullen, 1990, vol. 1, p. 193).

Reflecting on information management principles, the information was incomplete and not comprehensive: the permit to work system didn’t supply the comprehensive, complete information it was intended to. The lead operator had the information on what to do in case of an emergency (or should have had it) and did not tell anyone else. There was an inadequate business process for telling the Tartan and Claymore platforms to stop pumping oil and gas to Piper Alpha.

American Airlines Flight 965 crash

The next disaster to explore is American Airlines Flight 965 on 20 December 1995, killing 159 people and seriously injuring four.

The flight was late leaving Miami, and upon approaching Cali, Colombia, the pilot was cleared to take a more direct approach to land, to help make up some time. To land at Cali, the pilot would enter ROZO into the automatic flight management system. However, the pilot’s approach charts just gave R, so that is what he entered. Colombia had duplicated the identifier, and the plane’s computer interpreted the R as standing for ROMEO, which the automatic flight management system placed near Bogota, not Cali. So instead of coming in for a landing, the plane, thinking it needed to go to Bogota, pulled up, turned and smashed into the nearby mountains. The first officer disengaged the autopilot and the captain attempted to climb clear of the mountain. Neither pilot remembered to disengage the previously deployed speed brakes, which prevented the climb.
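
As a purely illustrative sketch (the beacon table and the logic are hypothetical, not the actual flight management system), an ambiguity check on the identifier would refuse to map a one-letter entry silently onto a beacon hundreds of kilometres away:

    # Hypothetical sketch: flag an ambiguous beacon identifier instead of
    # resolving it silently. The beacon table is illustrative only.
    import math

    BEACONS = {
        # identifier: list of (name, latitude, longitude)
        "R": [("ROMEO (near Bogota)", 4.70, -74.13), ("ROZO (near Cali)", 3.59, -76.40)],
    }

    def resolve(identifier: str, aircraft_lat: float, aircraft_lon: float):
        candidates = BEACONS.get(identifier, [])
        if len(candidates) <= 1:
            return candidates[0] if candidates else None
        # Ambiguous: choose the nearest candidate (a rough planar distance is
        # fine for an illustration) and require crew confirmation.
        nearest = min(candidates,
                      key=lambda c: math.hypot(c[1] - aircraft_lat, c[2] - aircraft_lon))
        print(f"'{identifier}' is ambiguous ({len(candidates)} matches); "
              f"nearest is {nearest[0]} - confirm before accepting.")
        return nearest

    resolve("R", aircraft_lat=3.8, aircraft_lon=-76.3)   # near Cali: nearest match is ROZO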

Other factors:

  • The co-pilot would normally have verified the name of the waypoint (ROZO), but didn’t do so this time.
  • The automatic flight management system didn’t give the pilot feedback that entering R would cause the system to home in on the beacon for Bogota rather than the closest beacon, nor that it had registered ROMEO instead of ROZO.
  • Pilots flying into South America were not warned about duplicate beacon identifiers, nor about what the plane would do if the wrong identifier was entered.
  • The company that supplied the flight management system’s navigation data (Jeppesen-Sanderson) didn’t inform airlines of the differences between the navigation information provided in the databases and the approach charts.
  • There is no international standard providing unified criteria for the databases used in flight management systems.
  • Eleven months before the accident, an internal memo within Jeppesen-Sanderson said: “It could cause a large incident if these problems in the flight support system are left unresolved. They must meet customer needs by all means, NOW” (Jeppesen-Sanderson memo, January 1995).

Reflecting on information management principles, the information on the approach charts was inaccurate in prompting the pilot to enter R instead of ROZO into the computer. Information was not accessible: warnings about duplicate beacon identifiers, and about what the plane would do if the wrong identifier was entered, did not reach the pilots. The supplier of the flight management system’s navigation data (Jeppesen-Sanderson) did not inform airlines of the differences between the navigation information provided in the databases and the approach charts.

The business process broke down in two ways:

  1. A known problem had been left unresolved.
  2. An identified need to have an international standard had been left unaddressed.

Three Mile Island partial nuclear meltdown

The final disaster to explore is Three Mile Island, which occurred on 28 March 1979.

[Image] Three Mile Island power plant.
Image: Three Mile Island (Wikimedia, n.d.).

This accident was a partial nuclear meltdown that occurred in one of the two nuclear reactors in Dauphin County, Pennsylvania. Although no lives were taken on the day, over the following two years, there was a noticeable rise in mortality rates of the very young and the elderly (Three Mile Island, 2014).

From the perspective of the staff on duty, the accident was “unexpected, incomprehensible, uncontrollable and unavoidable”. It began with a typical maintenance mistake: equipment in the non-nuclear secondary system (such as safety interlocks) was not returned to the operational mode. This was followed by a stuck-open pilot-operated relief valve in the primary system, which allowed large amounts of nuclear reactor coolant to escape.

The mechanical failures were compounded by the initial failure of plant operators to recognize the situation as a loss-of-coolant accident, due to inadequate training and human factors. In particular, a hidden indicator light in the control room led an operator to manually override the automatic emergency cooling system of the reactor, because he mistakenly believed that there was too much coolant water in the reactor and that this was causing the steam pressure release.

This accident occurred precisely because the operators did follow the predetermined instructions provided to them in their training.

An indicator misleadingly showed only that a discharge valve had been ordered closed, not that it had actually closed. In fact, the valve was blocked in an open position. The valve was not equipped with a valve stem position monitor, so the control room operator knew only that a signal had gone to the valve for it to close, not whether it had actually done so.
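
As an illustrative sketch (the class and names are hypothetical, not the plant’s actual instrumentation), the distinction the display blurred is between the command sent to the valve and the position actually sensed at the valve:

    # Hypothetical sketch: keep "commanded" and "sensed" valve state separate,
    # and show the operator which of the two the indicator is based on.
    class ReliefValve:
        def __init__(self):
            self.commanded_closed = False
            self.sensed_closed = None   # None = no stem position monitor fitted

        def command_close(self):
            self.commanded_closed = True   # the close signal has been sent...
            # ...whether the valve actually moved is a separate measurement.

        def indicator(self) -> str:
            if self.sensed_closed is None:
                return "CLOSE COMMANDED (actual position unknown)"
            return "CLOSED" if self.sensed_closed else "STILL OPEN"

    valve = ReliefValve()
    valve.command_close()
    print(valve.indicator())     # what TMI's light really meant: position unknown
    valve.sensed_closed = False  # with a stem position monitor, the stuck-open valve is visible
    print(valve.indicator())     # STILL OPEN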

The shift supervisor testified at the Three Mile Island hearings that the control room never had fewer than 52 alarms lit. During the TMI incident, more than a hundred alarm lights were lit on the control board, each signaling a different malfunction but providing little information about sequencing or timing. So many alarms occurred at TMI that the computer printouts were running hours behind the events and, at one point, the printer jammed, losing valuable information. Operators commonly suppress alarms, destroying historical information, when they need real-time alarm information for current decisions. Too many alarms can cause confusion and a lack of confidence, and can elicit exactly the wrong response, interfering with the operator’s ability to rectify the problems causing the alarms.

In the nine years before the TMI incident, eleven of those valves had stuck open at other plants, and only a year before, a sequence of events similar to those at TMI had occurred at another U.S. plant. Nothing had been done about correcting them.

While the Nuclear Regulatory Commission (NRC) collected an enormous amount of information on the operating experience of plants, the data were not consistently analyzed until after the Three Mile Island accident. The engineering firm for TMI had no formal procedures to analyze ongoing problems at plants it had built or to review the reports filed with the NRC.

The information needed to prevent TMI was available, including the prior incidents at other plants, recurrent problems with the same equipment at TMI, and engineers’ critiques that operators had been taught to do the wrong thing in specific circumstances, yet nothing had been done to incorporate this information into operating practices.

Reflecting on information management principles, the staff were deciding what actions to take based on inaccurate information: the indicator showed that a discharge valve had been ordered closed, not that it had actually closed. Information is unusable when there is too much of it in such a short period that it is impossible to take it all in, in this case even for the computer to print it all out. The information that could have prevented this accident existed, just not where it could be used. There was no business process to analyse the available data that would have identified a safety issue. A known problem had been left unresolved. Training also contributed to the problem, as the staff had been trained to do the wrong thing in these situations.

References

Aircraft Accident Investigation Commission. (1996). Aircraft Accident investigation report 96-5. Ministry of Transport. Japan.

Archives New Zealand. (2000). Recordkeeping framework. Wellington, New Zealand: Archives New Zealand. Statutory Regulatory Group.

Ayres, R.U., & Predeep, K.R. (1987). Bhopal: Lessons for technological decision-makers. Technology in Society 9: 19-45.

BBC. (6 March 1987). On this day: Hundreds trapped as car ferry capsizes. Available 2 April 2015. http://news.bbc.co.uk/onthisday/hi/dates/stories/march/6/newsid_2515000/2515923.stm

BBC. (8 October 1987). On this day: Zeebrugge disaster was no accident. Available 2 April 2015. http://news.bbc.co.uk/onthisday/hi/dates/stories/october/8/newsid_2626000/2626265.stm

Bogard, William. 1989. The Bhopal Tragedy. Boulder, Colo.: Westview Press.

Chisti, A. 1986. Dateline Bhopal. New Delhi: Concept.

Eddy, P., Potter, E., & Page, B. (1976). Destination disaster. New York: Quadrangle/Times Books.

Jigsaw Entertainment Ltd. (n.d.). New Zealand disasters : the truth about Tangiwai railway disaster [video]. Auckland : Jigsaw Entertainment Ltd.

Johansson, B. and Hollnagel, E. (2007). Pre-requisites for large scale coordination. Cognition, Technology & Work 9:5-13.

Kemeny, J.G. (1980). Saving American democracy: the lessons of Three Mile Island. Technology Review (June-July): 65-75.

Ladd, J. (1987, January). Bhopal: an essay on moral responsibility and civic virtue. Department of Philosophy, Brown University, Rhode Island.

Leveson, N.G. (1995). Safeware: system safety and computers. Boston: Addison Wesley.

Leveson, N.G. (2011). Engineering a safer world : systems thinking applied to safety. Cambridge, Mass.: MIT Press.

Lihou, D.A. 1990. Management styles – The effects of loss prevention. In Safety and Loss Prevention in the Chemical and Oil Processing Industries, ed. C.B. Ching, 147-156. Rugby, UK: Institution of Chemical Engineers.

Macfie R. (2013). Tragedy at Pike River mine : how and why 29 men died. Wellington: AWA Press.

Nakao, M. (n.d.). Crash of American Airlines Boeing. Institute of Engineering Innovation, School of Engineering, The University of Tokyo. Available 2 April 2015. http://www.sozogaku.com/fkd/en/cfen/CA1000293.html

Perrow, C. (1986). The habit of courting disaster. The Nation (October): 346-356.

Perrow, C. (1999). Normal accidents: living with high-risk technology. Princeton, N.J.: Princeton University Press.

Piper, J.L. (2001). Chain of events: the government cover-up of the Black Hawk incident and the friendly fire death of Lt. Laura Piper. London: Brasseys.

Ross, J.W., Beath, C.M. and Goodhue, D.L. (1996, Fall). Develop long-term competitiveness through IT assets. Sloan Management Review, pp. 31-42.

Royal Commission on the Pike River Coal Mine Tragedy. (2012, Oct). Vols. 1 & 2. Wellington, NZ. www.pikeriver.royalcommission.govt.nz [Chairperson: Graham Panckhurst; Commissioners: Stewart Bell, David Henry]

Sheen, B. (1987). Herald of Free Enterprise Report Marine Accident Investigation Branch. Department of Transport (originally Report of Court no 8074 Formal Investigation). London: HMSO.

Three Mile Island: site of America’s worst nuclear accident. Available 25 November 2014.

Tragedy that stunned a nation. (1989). [Information boards at the road-side park at Tangiwai]

U.K. Department of Energy. (1990). The Public inquiry into the Piper Alpha disaster (Hon Lord W. Douglas Cullen presiding). London: Department of Energy.

Wikimedia Foundation, Inc. (n.d.). American Airlines Flight 965. Available 2 April 2015. http://en.wikipedia.org/wiki/American_Airlines_Flight_965

Wikimedia Foundation, Inc. (n.d.). Bhopal disaster. Available 15 April 2015. http://en.wikipedia.org/wiki/Bhopal_disaster

Wikimedia Foundation, Inc. (n.d.). China Airlines Flight 140. Available 2 April 2015. http://en.wikipedia.org/wiki/China_Airlines_Flight_140

Wikimedia Foundation, Inc. (n.d.). 1994 Black Hawk shootdown incident. Available 2 April 2015. http://en.wikipedia.org/wiki/1994_Black_Hawk_shootdown_incident

Wikimedia Foundation, Inc. (n.d.). Three Mile Island. Available 28 April 2015. http://en.wikipedia.org/wiki/Three_Mile_Island_accident

Wikimedia Foundation, Inc. (n.d.). Turkish Airlines Flight 981. Available 2 April 2015. http://en.wikipedia.org/wiki/Turkish_Airlines_Flight_981