Improving alarm management with ISA-18.2: Part 1
Thursday, 06 February, 2014
Poor alarm management is one of the leading causes of unplanned downtime and has been a major contributor to some of the worst industrial accidents on record. Changing the practices and procedures used in the plant has become easier and more important with the release of the ISA-18.2 standard, which provides a blueprint for creating a safer and more productive plant.
In difficult economic times, focusing on operational excellence is key to short-term survival and to future growth. However, poor alarm management is a major barrier to reaching operational excellence. It has been known to cause unplanned downtime, which can cost from $10,000 to $1 million per hour for facilities that run around the clock. It also affects the safety of a plant and its personnel, having played a significant part in some of the major incidents of the last decade, which resulted in injury, loss of life, equipment and property damage, fines and damage to company reputations.
In June 2009 the standard ANSI/ISA-18.2-2009, ‘Management of Alarm Systems for the Process Industries’, was released. In this two-part article we review ISA-18.2 and describe how it affects end users, suppliers, integrators and consultants.
Introducing the ISA-18.2 standard
ISA-18.2 provides a framework for the successful design, implementation, operation and management of alarm systems in a process plant. It builds on the work of other standards and guidelines such as EEMUA 191, NAMUR NA 102, and ASM (the Abnormal Situation Management Consortium). Alarm management is not a ‘do once’ activity - rather it is a process that requires continual attention. Consequently, the basis of the standard is to follow a life-cycle approach as shown in Figure 1.
The connection between poor alarm management and process safety accidents was one of the motivations for the development of ISA-18.2. Both OSHA and the HSE have identified the need for improved industry practices to prevent these incidents. Consequently, ISA-18.2 is expected to be “recognised and generally accepted good engineering practice” (RAGAGEP) by both insurance companies and regulatory agencies. As such, it becomes the expected minimum practice.
Common alarm management problems
The ISA-18.2 standard is quite specific in defining an alarm:
Alarm: An audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a response.
This clear definition of an alarm is helpful in understanding an alarm’s intended purpose and how misapplication can lead to problems. One of the most important principles of alarm management is that an alarm requires a response. This means if the operator does not need to respond to an alarm (because unacceptable consequences do not occur), then that particular alarm is probably unnecessary. Following this cardinal rule will help eliminate many potential alarm management issues. The recommendations in the standard provide the blueprint for eliminating and preventing the most common alarm management problems, such as those shown in Table 1.
Alarm management problem | Cause(s) |
Alarms are generated which are ignored by the operator. | Nuisance alarms (chattering alarms and fleeting alarms), faulty hardware, redundant alarms, cascading alarms, incorrect alarm settings, alarms have not been rationalised. |
When alarms occur, operators do not know how to respond. | Lack of training and insufficient alarm-response procedures. |
Minor plant upsets generate a large number of alarms. | Average alarm load is too high. Redundant alarms, cascading alarms, alarms have not been rationalised. |
The alarm display is full of alarms, even when there is nothing wrong. | Nuisance alarms (chattering alarms and fleeting alarms), faulty hardware, redundant alarms, cascading alarms, incorrect alarm settings, alarms have not been rationalised. |
Some alarms are present on the alarm display continuously for long periods of time (>24 hours). | Corrective action is ineffective, equipment is broken or out of service, change in plant conditions. |
During an upset, operators are flooded with so many alarms that they do not know which ones are the most important. | Incorrect prioritisation of alarms. Not using advanced alarm techniques (eg, state-based alarming). |
Alarm settings are changed from one operator to the next. | Lack of management of change procedures. |
Following the ISA-18.2 standard
Philosophy (Phase A)
The first phase of the alarm management life cycle focuses on the development of an alarm philosophy document. This document establishes the standards for how your company or site will address all aspects of alarm management - including design, operations and maintenance. It should contain the rules for classifying and prioritising alarms, for using colour to indicate an alarm in the HMI, and for managing changes to the configuration. It should also establish key performance benchmarks, such as the acceptable alarm load for the operator (average number of alarms per hour). For new plants, the alarm philosophy should be fully defined and approved before commissioning. Roles and responsibilities for those involved in the management of alarms should also be clearly defined.
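As a rough illustration of the alarm-load benchmark, the sketch below computes the average number of alarms per hour from a list of alarm timestamps. The function name, the sample data and the target of six alarms per hour are assumptions made for illustration; each site would set its own target in the philosophy document.

```python
from datetime import datetime, timedelta


def average_alarms_per_hour(alarm_times, period_start, period_end):
    """Average number of annunciated alarms per hour over a reporting period."""
    hours = (period_end - period_start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("period_end must be after period_start")
    # Count only the alarms that fall inside the reporting period.
    in_period = [t for t in alarm_times if period_start <= t < period_end]
    return len(in_period) / hours


# Hypothetical data: one alarm every 20 minutes over a 12-hour shift.
shift_start = datetime(2014, 2, 6, 6, 0)
shift_end = shift_start + timedelta(hours=12)
alarms = [shift_start + timedelta(minutes=20 * i) for i in range(36)]

TARGET_PER_HOUR = 6.0  # assumed benchmark, not a value taken from this article

load = average_alarms_per_hour(alarms, shift_start, shift_end)
print(f"{load:.1f} alarms/h, within target: {load <= TARGET_PER_HOUR}")
```

Reporting the same figure per operator console, rather than per plant, keeps the benchmark tied to what one person actually has to handle.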
Identification and rationalisation (Phases B and C)
In the second part of the alarm management life cycle, potential alarms are identified. There are many different sources for identifying potential alarms including P&IDs, operating procedure reviews, process hazards analysis (PHA), HAZOPs, incident investigations and quality reviews.
Next, these candidate alarms are rationalised, which means each one is evaluated with a critical eye to justify that it meets the requirements of being an alarm.
- Does it indicate an abnormal condition?
- Does it require an operator action?
- Is it unique (or are there other alarms that indicate the same condition)?
Alarms that pass this screening are further analysed to define their attributes (such as limit, priority, classification and type). Alarm priority should be set based on the severity of the consequences and the time to respond. Classification identifies groups of alarms with similar characteristics (eg, environmental or safety) and common requirements for training, testing, documentation or data retention. Safety alarms coming from a safety instrumented system (SIS) are typically classified as ‘highly managed alarms’. These alarms should receive special treatment particularly when it comes to viewing their status in the HMI.
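The screening questions and the priority rule above can be sketched as a small lookup. The severity and response-time categories, and the priority assigned to each combination, are hypothetical; a real grid would come from the site's alarm philosophy document.

```python
def passes_screening(abnormal, requires_action, unique):
    """A candidate is kept as an alarm only if all three questions are answered yes."""
    return abnormal and requires_action and unique


# Hypothetical priority grid: rows are consequence severity, columns are the
# time available to respond. The assignments are illustrative only.
PRIORITY_MATRIX = {
    "minor":  {"long": "low",    "medium": "low",    "short": "medium"},
    "major":  {"long": "low",    "medium": "medium", "short": "high"},
    "severe": {"long": "medium", "medium": "high",   "short": "high"},
}


def assign_priority(severity, time_to_respond):
    """Look up the priority for a severity and response-time category."""
    try:
        return PRIORITY_MATRIX[severity][time_to_respond]
    except KeyError:
        raise ValueError(
            f"unknown category: {severity!r} / {time_to_respond!r}"
        ) from None


# A candidate that merely duplicates another alarm is screened out.
print(passes_screening(abnormal=True, requires_action=True, unique=False))
# A severe consequence with little time to respond gets the top priority.
print(assign_priority("severe", "short"))
```

Keeping the grid in one table, rather than deciding alarm by alarm, is what makes priorities consistent across the plant.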
Alarm attributes (settings) are documented in a master alarm database, which also records important details discussed during rationalisation - the cause, consequence, recommended operator response and the time to respond for each alarm. This information is used during many phases of the life cycle. For example, many plant operations and engineering teams are afraid to eliminate an existing alarm because it was “obviously put there for a reason”. With the master alarm database, one can look back years afterward and see why a specific alarm was created (and evaluate whether it should remain).
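A minimal sketch of what one entry in a master alarm database might hold is shown below. The field names, the tag LT-101 and all of the values are invented for illustration; the standard does not prescribe this layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    """One rationalised alarm: its settings plus the reasoning behind it."""
    tag: str
    alarm_type: str
    limit: float
    units: str
    priority: str
    classification: str
    cause: str
    consequence: str
    operator_response: str
    time_to_respond_min: float


def add_record(db, record):
    """Store a record keyed by (tag, alarm type), refusing silent overwrites."""
    key = (record.tag, record.alarm_type)
    if key in db:
        raise ValueError(f"alarm {key} is already documented")
    db[key] = record


master_alarm_db = {}
add_record(master_alarm_db, AlarmRecord(
    tag="LT-101",            # hypothetical tag
    alarm_type="high",
    limit=80.0,
    units="%",
    priority="high",
    classification="environmental",
    cause="Outlet valve stuck closed or feed rate too high",
    consequence="Tank overflows to the bund",
    operator_response="Reduce feed and check the outlet valve position",
    time_to_respond_min=10.0,
))

# Years later, the reason the alarm exists can still be looked up.
record = master_alarm_db[("LT-101", "high")]
print(record.cause)
print(record.consequence)
```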
Documentation about an alarm’s cause and consequence can be invaluable to the operator who must diagnose the problem and determine the best response. The system should allow the alarm rationalisation information to be entered directly into the configuration so that it is part of the control system database and so that it can be made available to the operator online through the HMI.
One of the major benefits of conducting a rationalisation is determining the minimum set of alarm points that are needed to keep the process safe and under control. Too many projects follow an approach where the practitioner enables all of the alarms that are provided by the DCS, whether they are needed or not, and sets them to default limits of 10, 20, 80 and 90% of range. A typical analog indicator can have six or more different alarms configured (eg, high-high, high, low, low-low, bad quality, rate-of-change, etc), making it easy to end up with significantly more alarm points than are needed. To prevent the creation of nuisance alarms and alarm overload conditions, it is important to enable only those alarms that are called for after completing a rationalisation. Thus an analog indicator, for example, may have only a single alarm condition enabled (such as a high-level alarm).
Detailed design (Phase D)
Poor design and configuration practices are a leading cause of alarm management issues. For example, in many control rooms more than 50% of standing alarms are for motors (pumps, fans, etc) that are not running. Following the recommendations in the standard can go a long way towards eliminating such issues.
During the detailed design phase, the information contained in the master alarm database (such as alarm limit and priority) is used to configure the system. Alarm settings should be copied and pasted or imported from the master alarm database directly into the control system configuration to prevent configuration errors. Spreadsheet-style engineering tools can help speed the process, especially if they allow editing attributes from multiple alarms simultaneously. If the control system configuration supports the addition of user-defined fields, it may be capable of fulfilling the role of the master alarm database itself.
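Where settings cannot be imported automatically, a comparison between the master alarm database and the configured system can catch transcription errors. The sketch below assumes both have been exported as dictionaries keyed by tag and alarm type; the function name, the tags and the data layout are hypothetical.

```python
def find_mismatches(master, configured):
    """List every setting where the configuration differs from the master database."""
    mismatches = []
    for key, expected in master.items():
        actual = configured.get(key)
        if actual is None:
            # The alarm is documented but was never configured.
            mismatches.append((key, "missing", expected, None))
            continue
        for attr, value in expected.items():
            if actual.get(attr) != value:
                mismatches.append((key, attr, value, actual.get(attr)))
    return mismatches


# Hypothetical exports from the master alarm database and the control system.
master = {
    ("LT-101", "high"): {"limit": 80.0, "priority": "high"},
    ("FT-201", "low"): {"limit": 10.0, "priority": "medium"},
}
configured = {
    ("LT-101", "high"): {"limit": 90.0, "priority": "high"},
}

for key, attr, expected, actual in find_mismatches(master, configured):
    print(key, attr, "expected:", expected, "configured:", actual)
```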
Following the recommendations for alarm deadbands and on/off delays from the standard (shown in Table 2) can help prevent nuisance alarms during operation. A study by the ASM found that the use of on/off delays in combination with other configuration changes was able to reduce the alarm load on the operator by 45-90% [2].
Signal type | Deadband (% of range) | Delay time (on/off) |
Flow rate | 5% | 15 seconds |
Level | 5% | 60 seconds |
Pressure | 2% | 15 seconds |
Temperature | 1% | 60 seconds |
The alarm deadband (hysteresis) is the change in signal from the alarm setpoint that is necessary to clear the alarm. Configuring it is easier with a system that displays the settings of multiple alarms at the same time and allows them to be edited in bulk. This capability also makes it easy to review and update the settings once the system has been operating, as the standard recommends. Similar tools and procedures can be used to configure the on/off delay, which is the time that a process measurement must remain in the alarm (or normal) state before the alarm is annunciated (or cleared).
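The interaction of deadband and on/off delay can be sketched as follows for a high alarm on a level signal, using the Table 2 values for level (5% of range, 60 seconds). The class, its field names and the sample readings are assumptions for illustration and do not represent any particular control system's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighAlarm:
    setpoint: float      # alarm limit, in engineering units
    deadband: float      # signal must fall this far below the setpoint to clear
    on_delay_s: float    # time above the setpoint before the alarm is annunciated
    off_delay_s: float   # time below the clear level before the alarm is cleared
    active: bool = False
    _pending_since: Optional[float] = None

    def update(self, value, now_s):
        """Process one sample and return whether the alarm is active."""
        if not self.active:
            wants_change = value >= self.setpoint
            delay = self.on_delay_s
        else:
            wants_change = value <= self.setpoint - self.deadband
            delay = self.off_delay_s

        if not wants_change:
            # The condition did not persist, so the timer restarts.
            self._pending_since = None
        else:
            if self._pending_since is None:
                self._pending_since = now_s
            if now_s - self._pending_since >= delay:
                self.active = not self.active
                self._pending_since = None
        return self.active


# Level signal with a 0-100% range: 5% deadband, 60 s on and off delay.
alarm = HighAlarm(setpoint=80.0, deadband=5.0, on_delay_s=60.0, off_delay_s=60.0)

# Hypothetical (time in seconds, level in %) samples.
samples = [(0, 81.0), (30, 79.0), (60, 82.0), (120, 83.0),
           (180, 78.0), (240, 74.0), (300, 73.0)]
states = [alarm.update(value, t) for t, value in samples]
print(states)
```

In this run the brief excursion at 0 seconds never annunciates because it does not last for the on delay, and the dip to 78% at 180 seconds does not clear the alarm because it stays inside the deadband.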
The design of the HMI is critical for enabling the operator to detect, diagnose and respond to an alarm within the appropriate time frame. The proper use of colour, text and patterns directly affects the operator’s performance. Since 8-12% of the male population is colourblind, it is important to follow the design recommendations shown in Table 3 to ensure that changes in alarm state (normal, acknowledged, unacknowledged, suppressed) are easily detected.
Alarm state | Audible indication | Visual indication: colour | Visual indication: symbol | Visual indication: blinking |
Normal | No | No | No | No |
Unacknowledged (new) alarm | Yes | Yes | Yes | Yes |
Acknowledged alarm | No | Yes | Yes | No |
Return to normal state indication | No | Optional | Optional | Optional |
Unacknowledged latched alarm | Yes | Yes | Yes | Yes |
Acknowledged latched alarm | No | Yes | Yes | No |
Shelved alarm | No | Optional | Optional | No |
Designed suppression alarm | No | Optional | Optional | No |
Out of service alarm | No | Optional | Optional | No |
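One way to sanity-check an HMI design against Table 3 is to encode the table and test whether two states can be told apart without relying on colour. The dictionary below transcribes Table 3; the checking function and its rule (a cue only counts when it is definitely present in one state and definitely absent in the other) are assumptions made for this sketch.

```python
# Table 3 transcribed as a lookup: "yes", "no" or "optional" for each cue.
TABLE_3 = {
    "normal":                  {"audible": "no",  "colour": "no",       "symbol": "no",       "blinking": "no"},
    "unacknowledged":          {"audible": "yes", "colour": "yes",      "symbol": "yes",      "blinking": "yes"},
    "acknowledged":            {"audible": "no",  "colour": "yes",      "symbol": "yes",      "blinking": "no"},
    "return_to_normal":        {"audible": "no",  "colour": "optional", "symbol": "optional", "blinking": "optional"},
    "unacknowledged_latched":  {"audible": "yes", "colour": "yes",      "symbol": "yes",      "blinking": "yes"},
    "acknowledged_latched":    {"audible": "no",  "colour": "yes",      "symbol": "yes",      "blinking": "no"},
    "shelved":                 {"audible": "no",  "colour": "optional", "symbol": "optional", "blinking": "no"},
    "designed_suppression":    {"audible": "no",  "colour": "optional", "symbol": "optional", "blinking": "no"},
    "out_of_service":          {"audible": "no",  "colour": "optional", "symbol": "optional", "blinking": "no"},
}

NON_COLOUR_CUES = ("audible", "symbol", "blinking")


def distinguishable_without_colour(state_a, state_b):
    """True if some non-colour cue is definitely on in one state and off in the other."""
    a, b = TABLE_3[state_a], TABLE_3[state_b]
    return any({a[cue], b[cue]} == {"yes", "no"} for cue in NON_COLOUR_CUES)


print(distinguishable_without_colour("normal", "unacknowledged"))
print(distinguishable_without_colour("normal", "shelved"))
```

Under this rule a shelved alarm is only guaranteed to stand out from the normal state if the optional symbol is actually implemented.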
Symbols and faceplates provided with the system should comply with recommendations of ISA-18.2. Figure 2 shows an example where the unacknowledged alarm state can be clearly distinguished from the normal state by using both colour (yellow box) and symbol (the letter ‘W’). This ensures that even a colourblind operator can detect the alarm. The out-of-service state is also clearly indicated.
The standard recommends that the HMI should make it easy for the operator to navigate to the source of an alarm (single click) and provide powerful filtering capability within an alarm summary display.
Advanced alarming techniques can improve performance by ensuring that operators are presented with alarms only when they are relevant. Additional layers of logic, programming or modelling are configured to modify alarm attributes or suppression state dynamically. One method described in ISA-18.2 is state-based alarming, where alarm attributes are modified based on the operating state of the plant or a piece of equipment.
State-based alarming can be applied to many situations. It can suppress a low-flow alarm from the operator when it is caused by the trip of an associated pump. It can mask alarms coming from a unit or area that is shut down. In batch processes it can change which alarms are presented to the operator based on the phase (eg, running, hold, abort) or based on the recipe.
One of the most challenging times for an operator is dealing with the flood of alarms that occur during a major plant upset. When a distillation column crashes, tens to hundreds of alarms may be generated. To help the operator respond quickly and correctly, the system should be able to hide all but the most significant alarms during the upset. For example, logic in the controller can determine the state of the column. The state parameter could then be used to determine which alarms should be presented to the operator based on a pre-configured state matrix.
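A pre-configured state matrix of this kind can be sketched as a simple lookup. The alarm names, the three operating states and the choice of which alarms stay visible are hypothetical; in practice they would be decided during rationalisation.

```python
# Hypothetical state matrix: for each alarm, whether it is presented to the
# operator in each operating state of the column.
STATE_MATRIX = {
    "FIC-201.low_flow":      {"running": True, "upset": False, "shutdown": False},
    "PIC-202.high_pressure": {"running": True, "upset": True,  "shutdown": False},
    "LIC-203.high_level":    {"running": True, "upset": True,  "shutdown": True},
}


def alarms_to_present(active_alarms, state):
    """Filter active alarms by the current operating state.

    Alarms or states missing from the matrix are presented by default, so a
    configuration gap never hides an alarm.
    """
    return [
        alarm for alarm in active_alarms
        if STATE_MATRIX.get(alarm, {}).get(state, True)
    ]


active = ["FIC-201.low_flow", "PIC-202.high_pressure",
          "LIC-203.high_level", "TI-204.high_temp"]

print(alarms_to_present(active, "running"))
print(alarms_to_present(active, "upset"))
print(alarms_to_present(active, "shutdown"))
```

Defaulting to presentation for anything not listed is a deliberate design choice in this sketch: suppression should come only from an explicit entry in the matrix.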
In Part 2
In Part 2 of this article we will examine phases E through J of the alarm management life cycle (Implementation through to Audit).
References
1. ANSI/ISA-18.2-2009, Management of Alarm Systems for the Process Industries, www.isa.org
2. Zapata R and Andow P, Reducing the Severity of Alarm Floods, www.controlglobal.com
3. EEMUA 191 (2007), Alarm Systems: A Guide to Design, Management and Procurement, Edition 2, The Engineering Equipment and Materials Users Association, www.eemua.co.uk
4. Abnormal Situation Management Consortium, www.asmconsortium.net
5. NAMUR (Interessengemeinschaft Automatisierungstechnik der Prozessindustrie), www.namur.de