Alarming discoveries: improving operator effectiveness
By Martin Hollender, Joan Evans, Thomas-Christian Skovholt and Roy Tanner
Tuesday, 05 July, 2016
Alarm management standards such as IEC 62682 and ISA 18.2 emphasise the importance of life cycle support in alarm management.
Ahead of a recent simulation exercise at Star City near Moscow, British astronaut Tim Peake was asked what the greatest challenge during the simulation was. He replied, “The most difficult thing to deal with is multiple failures”.[1] The same holds for industrial facilities using distributed control systems, where alarm floods remain one of the biggest challenges. To get alarm floods under control, alarm-related design knowledge from the early life cycle phases needs to be easily accessible in the operational phase, when additional information becomes available, so that decisions about advanced alarming methods such as alarm suppression can be made with confidence. Good management-of-change and life cycle support make it possible to keep the alarm system consistent with the changing reality in the plant and allow continuous improvement. To help, alarm management standards such as IEC 62682 and ISA 18.2 emphasise the importance of life cycle support in alarm management.
Although the need for effective alarm management is now generally recognised, accidents such as the one at the DuPont plant in Belle, West Virginia, in 2010[2] show that even acknowledged safety leaders like DuPont still have deficiencies. Since software-configurable distributed control systems (DCSs) entered the mainstream, alarms could be added at little or no cost to the end user. Unfortunately, this has led to poor-quality alarm systems in which far too many alarms are configured. A classic example is the explosion at the Texaco refinery in Milford Haven in 1994,[3] where the two operators received 275 alarms in the last 11 minutes before the explosion. This is now recognised as characteristic of an overloaded alarm system, which makes it impossible for operators to maintain proper awareness of the situation and to diagnose and correct it. Such alarm systems are neither useful nor acceptable, and incidents like this led to the development of systematic alarm management approaches, first documented in the EEMUA 191 guideline published in 1999.
Ten years later, the ISA 18.2 standard added a life cycle approach to alarm management similar to the life cycle approach already well established in the safety community with ISA 84 and IEC 61511. Simply put: ensuring safe operation and useful alarms needs ongoing effort.
IEC 62682 (published in 2014),[4] the first international standard for alarm management, is based on ISA 18.2 (Table 1). It emphasises the importance of systematic life cycle management. IEC 62682 requires, for example, that all information used to design alarms (safety studies, equipment specifications, etc) be systematically captured and documented. Later, during plant operations, additional information can supplement or revise the original design decisions. Such a revision requires that all information on which the original decision was based is available and fully understood, to avoid potentially hazardous side effects of the change.
Figure 1 captures the essence of IEC 62682 and can be used to develop and maintain an alarm system compliant with the requirements of IEC 62682 and good industry practice.
Alarm philosophy
The first step in the project life cycle is the alarm philosophy. The alarm philosophy is the plan for how alarms are to be managed for the site. It defines:
- roles and responsibilities
- alarm requirements
- work processes and procedures to deliver agreed requirements
IEC 62682, among others, provides useful guidance on the content and structure of an appropriate alarm philosophy.
However, the challenge is not in the authoring of the document, but in its application to the project life cycle. It is necessary to focus on the translation of alarm management principles into concrete project activities and deliverables while communicating the impact of alarm requirements to the extended project team.
This is crucial to ensuring that the purpose and design intent of each alarm are identified and documented during project reviews such as hazard and operability studies (HAZOP), layer of protection analysis (LOPA) and piping and instrumentation diagram (P&ID) reviews.
As this alarm design information becomes available, the project team decides how and where alarm-related data will be stored and managed. For this purpose, IEC 62682 adopts the concept of a master alarm database, defined as ‘an authorised list of rationalised alarms and associated attributes’.
Rationalisation
IEC 62682[5] reminds us that in the rationalisation phase of the alarm life cycle, the following need to be identified for every alarm:
- recommended operator action
- consequence of inaction or incorrect action
- probable cause of alarm
Having this information available during operation leads to more consistent operator actions and helps inexperienced operators build up their knowledge base and confidence. Where existing facilities are being revamped, operations staff are the most reliable source of this information. For new plants, the full definition of required alarms is more challenging, relying heavily on design and vendor data to define the required alarm configuration.
As well as capturing alarm requirements and design data, an effective alarm database system should be able to export operator response data.
Ready access to this data through an online help facility is seen as particularly important for critical alarms (highly managed alarms, in IEC 62682 terms[6]) and is increasingly expected by safety regulators. Plants already using such a system also report that it is a very effective operator support tool.
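To make the rationalisation fields concrete, the sketch below models one master alarm database entry as a Python dataclass, checks that cause, consequence and operator action are all documented, and renders them as plain help text for the operator. It is a minimal illustration under assumptions: the field names, the tag FI-101.LO, the process details and the text format are all hypothetical and are not taken from IEC 62682 or any vendor's alarm database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlarmRecord:
    """One entry in a hypothetical master alarm database; field names are illustrative."""
    tag: str                      # hypothetical DCS alarm tag
    priority: str                 # e.g. "high", "medium", "low"
    setpoint: float
    highly_managed: bool          # flags alarms needing extra administrative control
    probable_cause: str
    consequence_of_inaction: str
    operator_action: str
    design_source: str            # traceability, e.g. the HAZOP node that called for the alarm

def is_rationalised(rec: AlarmRecord) -> bool:
    """True only if cause, consequence and operator action are all documented."""
    return all(text.strip() for text in
               (rec.probable_cause, rec.consequence_of_inaction, rec.operator_action))

def operator_help_text(rec: AlarmRecord) -> str:
    """Render the operator response data as plain text for an online help page."""
    lines = [
        f"Alarm: {rec.tag} (priority: {rec.priority})",
        f"Probable cause: {rec.probable_cause}",
        f"Consequence of inaction: {rec.consequence_of_inaction}",
        f"Operator action: {rec.operator_action}",
    ]
    if rec.highly_managed:
        lines.insert(1, "HIGHLY MANAGED ALARM")  # make the alarm class visible at a glance
    return "\n".join(lines)

# Hypothetical example entry.
rec = AlarmRecord(
    tag="FI-101.LO", priority="high", setpoint=5.0, highly_managed=True,
    probable_cause="Feed pump P-101 tripped or suction strainer blocked",
    consequence_of_inaction="Loss of cooling flow to reactor R-1",
    operator_action="Start standby pump P-101B; check strainer differential pressure",
    design_source="HAZOP node 3",
)
print(operator_help_text(rec))
```

Keeping a design_source field alongside the operator guidance is one way to preserve the traceability back to safety studies that a later revision depends on.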
Continuous efforts
Moving into the operations phase, life cycle management is a central part of IEC 62682 and ISA 18.2 and has also been integrated into the third edition of EEMUA 191. Alarm management requires continuous efforts to maintain good practice and ensure consistency.
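One routine part of this continuous effort is checking that the alarm settings actually running in the DCS still match the master alarm database. A minimal sketch of such an audit follows, assuming both sources can be exported as a mapping from tag to attributes; the tags, attribute names and export format are hypothetical.

```python
from typing import Dict, List, Tuple

Attributes = Dict[str, object]  # attribute name -> value, e.g. {"setpoint": 5.0}

def audit(mad: Dict[str, Attributes],
          dcs: Dict[str, Attributes]) -> List[Tuple[str, str, object, object]]:
    """Compare live DCS alarm attributes against the master alarm database (MAD).

    Returns (tag, attribute, mad_value, dcs_value) for every deviation.
    Tags present in only one source are reported with attribute "<missing>".
    """
    findings = []
    for tag in sorted(mad.keys() | dcs.keys()):
        if tag not in dcs:
            findings.append((tag, "<missing>", "in MAD", None))
            continue
        if tag not in mad:
            findings.append((tag, "<missing>", None, "in DCS"))  # unauthorised alarm
            continue
        for attr in sorted(mad[tag].keys() | dcs[tag].keys()):
            want, have = mad[tag].get(attr), dcs[tag].get(attr)
            if want != have:
                findings.append((tag, attr, want, have))
    return findings

# Hypothetical exports from the two systems.
mad = {
    "FI-101.LO": {"setpoint": 5.0, "priority": "high", "delay_s": 10},
    "TI-205.HI": {"setpoint": 180.0, "priority": "medium", "delay_s": 0},
}
dcs = {
    "FI-101.LO": {"setpoint": 5.0, "priority": "high", "delay_s": 10},
    "TI-205.HI": {"setpoint": 195.0, "priority": "medium", "delay_s": 0},
    "PI-310.LO": {"setpoint": 1.2, "priority": "low", "delay_s": 5},
}
for finding in audit(mad, dcs):
    print(finding)
# ('PI-310.LO', '<missing>', None, 'in DCS')
# ('TI-205.HI', 'setpoint', 180.0, 195.0)
```

Each finding is then either corrected in the DCS (enforcement) or, if the change was intended, taken through management of change so that the database is updated.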
Today, many plants have their alarm rate well under control, with low average rates during normal operation. However, alarm floods are frequently still a challenge.
Figure 2 shows the alarm rate of a petrochemical plant over half a year. Although the average alarm rate is below one alarm per 10 minutes and therefore well under control, there are occasional floods of more than 100 alarms per 10 minutes, and smaller floods of about 20 alarms per 10 minutes occur quite regularly.
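Statistics like those in Figure 2 can be derived from an alarm journal by counting alarms in consecutive 10-minute windows. The sketch below assumes the journal is available as a list of annunciation timestamps; the flood threshold of 10 alarms per 10 minutes is a parameter (a benchmark commonly used in alarm management guidance), and the sample data is invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

WINDOW = timedelta(minutes=10)

def alarm_rate_profile(timestamps: Iterable[datetime], origin: datetime,
                       end: datetime) -> List[Tuple[datetime, int]]:
    """Count alarms in each fixed 10-minute window in [origin, end), including empty windows."""
    # Map every timestamp to the start of the window that contains it.
    counts = Counter(origin + ((ts - origin) // WINDOW) * WINDOW
                     for ts in timestamps if origin <= ts < end)
    profile = []
    start = origin
    while start < end:
        profile.append((start, counts.get(start, 0)))
        start += WINDOW
    return profile

def summarise(profile: List[Tuple[datetime, int]],
              flood_threshold: int = 10) -> Dict[str, float]:
    """Average rate, peak rate and number/share of windows at or above the flood threshold."""
    rates = [count for _, count in profile]
    flood_windows = sum(1 for r in rates if r >= flood_threshold)
    return {
        "average_per_10min": sum(rates) / len(rates),
        "peak_per_10min": max(rates),
        "flood_windows": flood_windows,
        "percent_windows_in_flood": 100.0 * flood_windows / len(rates),
    }

# Invented example: six hours with two isolated alarms and one burst of 25 alarms.
origin = datetime(2016, 1, 1, 0, 0)
end = origin + timedelta(hours=6)
alarms = [origin + timedelta(minutes=5), origin + timedelta(hours=2, minutes=30)]
alarms += [origin + timedelta(hours=4, seconds=10 * i) for i in range(25)]
print(summarise(alarm_rate_profile(alarms, origin, end)))
```

Here the average stays below one alarm per 10 minutes (27 alarms over 36 windows) even though one window holds a flood of 25, which is exactly why the average alone can hide the problem. Fixed windows are simple and match the way Figure 2 reports rates; a flood that straddles a window boundary can be split across two windows, so a sliding window gives a more conservative count.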
Unfortunately, these floods often occur during the most demanding phases when operators most need support (during start-up or shutdown, for example). Alarm flood scenarios include:
- alarm floods generated because process sections are shut down (such as low-flow alarms after a pump stops), are running in a different operating mode (eg, cleaning) or have instruments being calibrated. These alarms become a problem when they coincide with a process problem and important alarms are buried in a flood of unnecessary ones.
- alarm floods along the causal chain following a process upset. A single root cause can generate many consequential alarms. The first alarm in the alarm list might not be the alarm closest to the root cause — depending on the process dynamics and how thresholds are configured, secondary and misleading alarms might show up first.
Such alarm floods cannot be avoided just by choosing good configuration values for limits, hysteresis or delay timers. Advanced alarming techniques like hiding (called suppression by design in IEC 62682) and grouping come into play. Modern distributed control systems can provide advanced alarming, including alarm grouping, hiding (dynamic suppression) and alarm shelving (time-limited, operator-driven suppression).
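The difference between hiding (suppression by design) and shelving can be sketched as a filter that decides whether an alarm is presented to the operator. In this minimal illustration, which is not any particular DCS's implementation, a low-flow alarm is hidden while a hypothetical pump-stopped status is true, and shelving is clamped to an assumed maximum duration so that it always expires. The tags and the 8-hour limit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class AlarmFilter:
    """Hypothetical presentation filter combining two techniques:
    - suppression by design: hide an alarm while a named equipment condition holds
    - shelving: operator-initiated, time-limited suppression
    """
    # alarm tag -> equipment status tag that, when True, suppresses the alarm
    suppress_when: Dict[str, str] = field(default_factory=dict)
    # alarm tag -> time (seconds) at which the shelf expires
    shelved_until: Dict[str, float] = field(default_factory=dict)
    max_shelve_s: float = 8 * 3600  # hypothetical limit set in the alarm philosophy

    def shelve(self, tag: str, duration_s: float, now: float) -> None:
        # Clamp the requested duration so shelving can never be open-ended.
        self.shelved_until[tag] = now + min(duration_s, self.max_shelve_s)

    def should_annunciate(self, tag: str, states: Dict[str, bool], now: float) -> bool:
        condition = self.suppress_when.get(tag)
        if condition is not None and states.get(condition, False):
            return False  # suppressed by design while the condition holds
        expiry = self.shelved_until.get(tag)
        if expiry is not None:
            if now < expiry:
                return False  # still shelved
            del self.shelved_until[tag]  # shelf expired: alarm returns automatically
        return True

f = AlarmFilter(suppress_when={"FI-101.LO": "P-101.STOPPED"})
states = {"P-101.STOPPED": True}
print(f.should_annunciate("FI-101.LO", states, now=0.0))          # False: pump stopped
states["P-101.STOPPED"] = False
print(f.should_annunciate("FI-101.LO", states, now=0.0))          # True
f.shelve("TI-205.HI", duration_s=24 * 3600, now=0.0)              # clamped to 8 h
print(f.should_annunciate("TI-205.HI", states, now=3600.0))       # False
print(f.should_annunciate("TI-205.HI", states, now=8 * 3600.0))   # True
```

In a real system, suppressed and shelved alarms would still be recorded and listed so that operators can review them; the sketch only decides annunciation.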
Balanced risk
When addressing alarm floods, the challenge is to strike a balance between the potential risks of suppressing an alarm during a particular scenario and the need to address peaks in the alarm rate during abnormal conditions. These risks are best mitigated through a combination of an effective alarm management toolset and a robust management of change (MOC) process that includes the appropriate level of review and approval.
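How the appropriate level of review scales with risk is defined in each site's alarm philosophy. The sketch below shows one way such a review matrix could be encoded; the roles and rules are entirely hypothetical and are chosen only to illustrate that suppression changes, and changes to high-priority or highly managed alarms, attract additional approvals.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class AlarmChange:
    tag: str
    change_type: str      # e.g. "setpoint", "priority", "suppression", "delete"
    priority: str         # "high", "medium", "low"
    highly_managed: bool

def required_approvals(change: AlarmChange) -> List[str]:
    """Hypothetical review matrix; real roles and rules come from the site alarm philosophy."""
    approvers = ["operations supervisor"]          # every change gets a basic review
    if change.change_type in ("suppression", "delete"):
        approvers.append("process engineer")       # removing visibility of an alarm
    if change.priority == "high" or change.highly_managed:
        approvers.append("alarm management lead")
    if change.highly_managed:
        approvers.append("process safety engineer")
    return approvers

print(required_approvals(AlarmChange("FI-101.LO", "suppression", "high", True)))
print(required_approvals(AlarmChange("TI-205.HI", "setpoint", "low", False)))
```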
Initial (prospective) rationalisation reviews may have identified candidates for basic alarm suppression, such as alarm groups to be masked when equipment is out of service. Later alarm flood studies during the operations phase will seek to go further and draw on the full range of available alarm system data, such as:
- operator comments on alarm responses
- detailed alarm analysis data
- current alarm attributes from the alarm database
Combining all this in a single toolset facilitates the identification of potential alarm suppression scenarios based on analysis of actual plant data. With the need for manual, ad hoc analysis removed, the potential for human error in deducing cause and effect is greatly reduced and conclusions can be based on much larger data sets — extending over several years if appropriate. Once a particular scenario has been identified, reviewed and confirmed, the toolset can then be used to explore whether there are other instances in which the same logic can be applied. Integration between the alarm database and the DCS enables continuous alarm optimisation, enforcement and monitoring over time.
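A simple way to find candidate consequential alarms in historical data is co-occurrence analysis: for every occurrence of a trigger event, such as a pump trip, record which alarms follow within a time window, and keep those that follow in most occurrences. The sketch below assumes an event log of (timestamp, tag) pairs; the tags, the 5-minute window and the 80% ratio are hypothetical, and this is a generic technique, not the toolset's own method. Co-occurrence does not prove causation, so each candidate still needs engineering review and MOC approval before any suppression is configured.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str]  # (timestamp, alarm or event tag)

def consequential_candidates(events: List[Event], trigger_tag: str,
                             window: timedelta = timedelta(minutes=5),
                             min_ratio: float = 0.8) -> Dict[str, float]:
    """Tags that followed the trigger within `window` in at least `min_ratio` of its occurrences."""
    events = sorted(events)
    triggers = [ts for ts, tag in events if tag == trigger_tag]
    if not triggers:
        return {}
    hits: Dict[str, int] = defaultdict(int)
    for t0 in triggers:
        # A set, so a chattering alarm counts only once per trigger occurrence.
        followers = {tag for ts, tag in events
                     if tag != trigger_tag and t0 < ts <= t0 + window}
        for tag in followers:
            hits[tag] += 1
    return {tag: n / len(triggers) for tag, n in sorted(hits.items())
            if n / len(triggers) >= min_ratio}

# Invented log: three pump trips, each followed by the same two alarms,
# plus one unrelated alarm after the first trip only.
base = datetime(2016, 1, 1, 8, 0)
events: List[Event] = []
for day in range(3):
    t0 = base + timedelta(days=day)
    events += [(t0, "P-101.TRIP"),
               (t0 + timedelta(seconds=30), "FI-101.LO"),
               (t0 + timedelta(seconds=90), "PI-102.LO")]
events.append((base + timedelta(minutes=2), "TI-205.HI"))
print(consequential_candidates(events, "P-101.TRIP"))
# {'FI-101.LO': 1.0, 'PI-102.LO': 1.0}
```

Once a pattern like this is confirmed, the same function can be run against other trigger tags to look for further instances where similar suppression logic might apply.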
This approach has been of proven value in a number of cases, including:
- identification of consequential alarms following a particular shutdown
- critical event analysis, highlighting event triggers with potential for early operator response (intervention) and mitigation of equipment shutdown/plant upset
The main benefits are achieved through a life cycle toolset providing a framework for continuous improvement and include:
- reduced production trips
- reduced legislative risk — safer, more environmentally robust operations
- improved operator effectiveness
Figure 3 shows how it was possible to reduce the average alarm rate in a Rashid Petroleum Company (Rashpetco) offshore gas production plant. This resulted in a reduction of plant trips from 25 down to six per year. As each trip is associated with significant costs, the overall savings are substantial.
Insight achieved
Alarm management is an area of increasing concern to regulators, other public bodies and the public at large, who are pushing for evidence of a life cycle approach and continuous improvement, resulting in safer plant operations. With IEC 62682, the best practice in alarm management is finally available as an international standard.
References
[1] Shukman, D 2015, Tim Peake: British astronaut’s training nears end, BBC News, <http://www.bbc.com/news/science-environment-34788169>
[2] Smith, S 2011, Did DuPont Prioritize Cost Over Safety at Belle, W.Va., Facilities? Chemical Safety Board Investigation Indicates It Did, EHS Today, July 2011.
[3] Health and Safety Executive 1997, The explosion and fires at the Texaco Refinery, Milford Haven, 24 July 1994, Health and Safety Executive, Norwich.
[4] International Electrotechnical Commission 2014, IEC 62682: Management of Alarm Systems for the Process Industries.
[5] Ibid, section 6.2.1, Table 3, p36.
[6] Ibid, section 6.2.9, p38.