Alarm management blunders: avoiding 12 costly mistakes

Matrikon Asia-Pacific
By Michael Marvan, P.Eng. (Alberta), Product Mgr, Matrikon, Inc.
Thursday, 07 September, 2006


Ineffective alarm systems pose a serious risk to safety, the environment and plant profitability. Too often, alarm system effectiveness is unknowingly undermined by poorly configured alarms. Static alarm settings that can't adapt to dynamic plant conditions, along with many other nuisances, result in alarm floods that overwhelm operators just when they most need concise direction.

Operators and engineers in the process control industry have become increasingly aware of the value that alarm management solutions offer. Alarm systems are the primary tool for identifying abnormal situations and helping plant personnel take timely, appropriate action to move the process back to operational targets.

As alarm management solutions become more common, our understanding of the factors that impede their success has grown. If you're thinking of undertaking an alarm management solution, or if you have already started one, the following information, based on lessons learned, can help drive your project to success.

The overall structure of a successful alarm management project is fundamentally the same across industries, regardless of plant size.

1. Benchmark and evaluate current performance: this is the time to identify your biggest alarm system problems and your biggest opportunities for improvement.

2. Develop an alarm philosophy document: this critical document clearly outlines key concepts and governing rules for your alarm strategy such as what constitutes an alarm and what risk categories pertain to your site operations. The philosophy also outlines: roles and responsibilities; management procedural changes; and project goals, such as target alarm rates. There is good news for those who find it difficult to compile the philosophy document. Templates are available that do most of the work for you. All you are required to do is include your specific metrics and situation.

3. Alarm rationalisation: first, target and eliminate the top 20 to 30 bad actors to significantly improve alarm loading. Then, perform an alarm system configuration review to ensure priorities convey consistent urgency to the operator.

4. Implementation: control system reconfiguration makes the intentions of alarm rationalisation a reality by eliminating nuisance alarms at their source.

5. Continuous improvement: routine performance monitoring helps to identify new opportunities for improvement, such as dynamic alarm strategies.

6. Maintenance: integrate alarm management practices into plant workflow to sustain optimised plant performance over the long term.
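The bad-actor targeting in step 3 can be illustrated with a short sketch. It assumes the alarm journal has already been exported as a plain list of tag names, one entry per alarm occurrence; the tag names and the function name `top_bad_actors` are hypothetical and not taken from any particular product.

```python
from collections import Counter

def top_bad_actors(alarm_tags, n=20):
    """Rank tags by alarm count and report each tag's cumulative share of the load."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for tag, count in counts.most_common(n):
        cumulative += count
        ranked.append((tag, count, cumulative / total))
    return ranked

# Hypothetical journal extract: one tag name per alarm occurrence.
journal = ["FIC101.PVHI"] * 6 + ["LIC205.PVLO"] * 3 + ["TI310.PVHI"]
for tag, count, share in top_bad_actors(journal, n=2):
    print(f"{tag:12s} {count:3d} alarms, cumulative {share:.0%}")
```

The cumulative share shows how much of the total alarm load would disappear if the listed tags were fixed, which helps decide how far down the list to go.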

Now that we have defined the correct execution path, let's take a look at the recent lessons learned by industry.

Blunder #1: Poor project management

Poor planning, system design, resource allocation, scheduling, or expectation management can destroy the success of any project. Alarm management is no exception. This may seem obvious; unfortunately, this is where common sense is most often neglected. The single most important alarm management activity is planning: detailed, systematic, team-involved plans are the foundation for project success.

Blunder #2: Using the wrong tools

Alarm and event archiving and the correct analysis tools must be used to ensure that time spent on problem correction delivers the maximum return. All alarms should be reviewed in due course to ensure consistent priorities, but it is inefficient, costly and irresponsible to correct minor nuisances when problems remain that pose serious risk to plant safety.

Beyond simple analysis, tools that enable automatic change control, punch-list generation and project tracking are available. Give forethought to how alarm information will be put to use once it is in a repository. Although these tasks can be performed without special software tools, it is not practical to do so. The effort often becomes so daunting that alarm management initiatives can collapse under the weight of their own logistics. It is best to do away with paper trails for change control and spreadsheets posing as master alarm databases. Use the right tools.

Blunder #3: Neglecting to benchmark

Benchmarking is vital to any serious improvement initiative. If you don't measure your current performance, you won't be able to accurately determine your progress. The first step is to keep track of alarm rates for several weeks in order to get a baseline measurement. Once that's done, assess how your plant's current alarm levels measure up to industry standards.
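As a rough illustration of a baseline measurement, the sketch below counts alarms in fixed time windows for a single operator position and reports the average and peak. The 10-minute window, the function names and the sample timestamps are all assumptions chosen for the example; use whatever window and targets your chosen standard and alarm philosophy specify.

```python
import math
from datetime import datetime, timedelta

def alarms_per_window(timestamps, start, end, window=timedelta(minutes=10)):
    """Count alarms in consecutive fixed windows covering [start, end)."""
    n_windows = math.ceil((end - start) / window)
    counts = [0] * n_windows
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // window] += 1
    return counts

def summarise(counts):
    """Average and peak alarms per window, or None if there are no windows."""
    if not counts:
        return None
    return {"average": sum(counts) / len(counts), "peak": max(counts)}

# Hypothetical alarm timestamps for one operator console.
start = datetime(2006, 9, 7, 8, 0)
end = datetime(2006, 9, 7, 8, 30)
alarms = [
    datetime(2006, 9, 7, 8, 1),
    datetime(2006, 9, 7, 8, 2),
    datetime(2006, 9, 7, 8, 15),
    datetime(2006, 9, 7, 8, 29, 59),
]
counts = alarms_per_window(alarms, start, end)
summary = summarise(counts)
print(counts, summary)
```

Run over several weeks of data rather than 30 minutes, the same two numbers give a baseline against which later progress can be judged.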

To get a quick snapshot of where your plant ranks according to EEMUA standards, Matrikon has posted an automated calculator on its website at www.matrikon.com/plantperformance.

When you have finished benchmarking and assessing your current performance, you can start identifying opportunities for improvement. Below are the key questions you need to answer when performing this assessment. Note that this checklist is in order of importance:

  1. Is the dynamic (real-time) alarm load acceptable for all operators?
  2. Does the dynamic alarm prioritisation meet industry standards?
  3. What are the troublesome tags on the system during steady-state operation?
  4. How does the configured DCS alarm count compare to standards (alarms per tag)?
  5. What does the configured alarm distribution look like compared to standards?
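For the last question in the checklist, a comparison of the configured priority distribution against a target might look like the following sketch. The priority names and the target shares are placeholders invented for the example, not figures from any standard; substitute the values set out in your own alarm philosophy.

```python
from collections import Counter

def priority_distribution(configured_priorities):
    """Share of configured alarms at each priority level."""
    counts = Counter(configured_priorities)
    total = sum(counts.values())
    return {priority: count / total for priority, count in counts.items()}

def compare_to_target(actual, target):
    """Actual minus target share for every priority found in either mapping."""
    priorities = sorted(set(actual) | set(target))
    return {p: actual.get(p, 0.0) - target.get(p, 0.0) for p in priorities}

# Hypothetical export of configured alarm priorities, one entry per alarm.
configured = ["low"] * 5 + ["high"] * 3 + ["emergency"] * 2
# Placeholder target shares; replace with your philosophy document's figures.
target = {"low": 0.80, "high": 0.15, "emergency": 0.05}

gaps = compare_to_target(priority_distribution(configured), target)
for priority, gap in gaps.items():
    print(f"{priority:10s} {gap:+.0%} versus target")
```

A positive gap means more alarms are configured at that priority than the target allows, which is a prompt to review them during rationalisation.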

Blunder #4: "Stop philosophising and get it done!"

Failing to establish and document best practices is a recipe for disaster. In order to get consistent results you have to create guidelines for performing alarm rationalisation. These include a project-specific alarm philosophy with a methodology and rules for setting alarms, an alarm review to build commitment and consolidate training, and an audit process to ensure that the philosophy is consistently applied. The guidelines clearly define the criteria for legitimate alarms and the setting of their priorities. They are the backbone of an 'alarm philosophy' document, which acts as a corporate standard to guide your entire organisation's alarm management initiatives.

Blunder #5: Cutting resource corners

It is disturbingly common for companies to try to exclude the most important resource from rationalisation meetings: the panel operator. Panel operators are the end user and the primary stakeholder in alarm optimisation. If you exclude the panel operator from the rationalisation process, the project will fail.

The following reality is based on unpleasant site experience. Instrument technicians, automation engineers, process engineers and field operators are not panel operators. Please pay attention: the only person who can be the 'panel operator' is an experienced panel operator. This person fights alarms and unit problems day in and day out, and that knowledge is invaluable during the rationalisation process.

Alarm rationalisation is the process of applying operational experience to alarm system design. Although operators are the most important participants in this process, they cannot carry this burden alone. Without a facilitator who is familiar with alarm rationalisation, your rationalisation project will take longer than it should, yield poor results and have to be repeated.

Finally, alarm rationalisation requires an engineering review prior to implementation. This is required to ensure results are consistent with Hazard and Operability Studies (HAZOP) and Safety Integrity Level (SIL) studies. The process, unit, or contact engineer plays this role.

Blunder #6: Establishing the easiest or cheapest connection

Collecting alarm data in an optimal fashion is system specific. The easiest way is often not the best way. Be sure to answer the following questions:

  • Does the analysis package need to present information to the operator in real time or are existing alarm visualisation tools adequate to manage plant upsets?
  • Is the plant hierarchy represented consistently and intuitively within the control system and the alarm management system?
  • Is redundant alarm data collection required to meet regulatory or corporate policy compliance?
  • Are all required events such as 'return to normal', 'operator actions', and 'system messages' included in the chosen connection method?
  • Are all required fields available in the data? Can priorities be distinguished? Can audible and suppressed alarms be distinguished? Can setpoint changes be discerned from output changes? Can absolute alarms be separated from deviation alarms? If gaps exist, what other sub-system(s) can be referenced to close them?
  • Are basic alarm and event archiving and analysis adequate to meet my objectives, or do I need to establish a connection with the control system configuration database?
  • How likely is the connection strategy to function with control system upgrades?
  • How much maintenance is required to keep the system running?
  • Does one option provide advantages over another and vice versa? Should more than one connection be used for each area?
  • Do I only want to view this data at the plant level, or would corporate comparisons between sites benefit my operations?
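Several of the questions above can be checked against a sample of collected data before committing to a connection method. The sketch below assumes each event arrives as a dictionary; the field names, event type names and the function `audit_sample` are hypothetical and would need mapping to whatever your control system actually emits.

```python
# Hypothetical names; map these to what your control system actually emits.
REQUIRED_EVENT_TYPES = {"alarm", "return_to_normal", "operator_action", "system_message"}
COMMON_FIELDS = {"timestamp", "tag", "event_type"}
ALARM_FIELDS = {"priority", "audible"}

def audit_sample(events):
    """Report event types never seen and required fields missing in a sample."""
    seen_types = {e.get("event_type") for e in events}
    missing_fields = set()
    for e in events:
        required = set(COMMON_FIELDS)
        if e.get("event_type") == "alarm":
            required |= ALARM_FIELDS
        missing_fields |= {f for f in required if e.get(f) is None}
    return {
        "missing_event_types": sorted(REQUIRED_EVENT_TYPES - seen_types),
        "missing_fields": sorted(missing_fields),
    }

# Hypothetical sample captured through a candidate connection.
sample = [
    {"timestamp": "2006-09-07T08:00:00", "tag": "FIC101", "event_type": "alarm", "priority": "high"},
    {"timestamp": "2006-09-07T08:03:00", "tag": "FIC101", "event_type": "return_to_normal"},
    {"timestamp": "2006-09-07T08:01:00", "tag": "FIC101", "event_type": "operator_action"},
]
report = audit_sample(sample)
print(report)
```

Any gap the report shows is a cue to ask which other sub-system could supply the missing information, or whether a different connection is needed.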

Don't restrict connectivity to legacy strategies if they do not meet current needs. What worked in the past may no longer be the best solution. However, do not make things unnecessarily complex. Decide what you want to accomplish and then choose the simplest method that meets all of your needs. If the collection strategy becomes overly complex then it will be hard to maintain, and ultimately your entire alarm management strategy will suffer.

Blunder #7: Failing to automate

Good technology makes life easier. Its purpose is to relieve people of dangerous, repetitive tasks, freeing them to intervene when the automated system requires guidance. When intervention is required, software should make problem assessment and diagnosis easy so as to free the user's time to fix the problem.

Although task accountability is necessary for successful alarm management, staff are more likely to use reliable technologies that are available on demand to make their jobs easier.

Blunder #8: Only tracking alarms

A common mistake is failing to track all of the data required. Only tracking alarms is not enough! Alarm rationalisation requires more than one type of data. For example, when an alarm occurs you need to know if an operator actually responded to it. Tracking operator actions is an effective way to identify control problems and automation opportunities, and to audit the effectiveness of your alarm strategy. If the operator did not respond, there is a good chance that the alarm is a nuisance alarm. Examine the ratio of operator actions to audible process alarms in order to identify poor alarm strategies. The de facto standard "every alarm requires operator intervention" demands that this ratio be at least one.
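A minimal sketch of this check follows. It assumes events are dictionaries with hypothetical `type`, `tag`, `time` and `audible` keys, and it makes the simplifying assumption that an operator 'responds' to an alarm by acting on the same tag within a fixed window; in practice the response is often on a related tag, so treat the unanswered list as candidates for review rather than proof of a nuisance alarm.

```python
from datetime import datetime, timedelta

def action_to_alarm_ratio(events):
    """Operator actions divided by audible alarms, or None if there are no alarms."""
    alarms = [e for e in events if e["type"] == "alarm" and e.get("audible")]
    actions = [e for e in events if e["type"] == "operator_action"]
    return len(actions) / len(alarms) if alarms else None

def unanswered_alarms(events, window=timedelta(minutes=10)):
    """Audible alarms with no operator action on the same tag within the window."""
    actions = [e for e in events if e["type"] == "operator_action"]
    unanswered = []
    for alarm in events:
        if alarm["type"] != "alarm" or not alarm.get("audible"):
            continue
        deadline = alarm["time"] + window
        answered = any(
            a["tag"] == alarm["tag"] and alarm["time"] <= a["time"] <= deadline
            for a in actions
        )
        if not answered:
            unanswered.append(alarm)
    return unanswered

# Hypothetical event sample.
t0 = datetime(2006, 9, 7, 8, 0)
events = [
    {"type": "alarm", "tag": "FIC101", "time": t0, "audible": True},
    {"type": "operator_action", "tag": "FIC101", "time": t0 + timedelta(minutes=2)},
    {"type": "alarm", "tag": "TI310", "time": t0 + timedelta(minutes=5), "audible": True},
    {"type": "alarm", "tag": "LI400", "time": t0 + timedelta(minutes=6), "audible": False},
]
ratio = action_to_alarm_ratio(events)
ignored = [a["tag"] for a in unanswered_alarms(events)]
print(ratio, ignored)
```

Non-audible alarms are excluded from both calculations so that journal-only entries do not distort the ratio.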

Other data to track includes operator actions, such as controller setpoint and mode changes, and system errors. If a controller's mode or output is repeatedly changed, it is a clear sign the loop needs fixing. If action data is coupled with controller performance data, the loop's problems can be quickly diagnosed, saving time. If a controller's setpoint is frequently changed and the controller has no supervisory control, then the automation engineer must ask "Why not?" Installing new automation strategies can free the operator to focus on pushing limits rather than maintaining process stability. In addition, process variable history is important for determining some dead-band alarm settings or for performing the engineering reviews prior to implementation.
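Spotting controllers whose mode or output is repeatedly changed can be sketched as a simple count per tag. The `kind` values, the threshold of five interventions and the function name are hypothetical; a sensible threshold depends on the length of the review period and on your site's normal operating practice.

```python
from collections import Counter

def frequent_interventions(actions, kinds=("mode", "output"), threshold=5):
    """Controllers with at least `threshold` operator changes of the given kinds."""
    counts = Counter(a["tag"] for a in actions if a["kind"] in kinds)
    return {tag: n for tag, n in counts.most_common() if n >= threshold}

# Hypothetical operator action records for one review period.
actions = (
    [{"tag": "FIC101", "kind": "mode"}] * 3
    + [{"tag": "FIC101", "kind": "output"}] * 3
    + [{"tag": "TIC200", "kind": "setpoint"}] * 9
    + [{"tag": "LIC205", "kind": "mode"}] * 2
)
loops_to_review = frequent_interventions(actions)
supervisory_candidates = frequent_interventions(actions, kinds=("setpoint",))
print(loops_to_review, supervisory_candidates)
```

The first call flags loops that may need fixing; the second, using setpoint changes only, flags controllers where supervisory control might be worth asking about.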

Blunder #9: Treating all data the same

Audible alarms are not the same as non-audible alarms. Many control systems continue to send alarms to the journals when alarms are not audible. Failure to separate this data creates an inaccurate picture of alarm system performance and may lead personnel to think the situation is worse than it is. Moreover, this may waste time by falsely indicating alarm problems.

Blunder #10: Assuming users will read the manuals

I confess to not reading my motherboard manual the last time I bought a computer. Nor did I read the instructions for my television, DVD player, microwave, and certainly not the 1800-page operating system help files. I know you're guilty too. The easiest way to undermine effective alarm management is to implement a solution without giving personnel the hands-on training they need. This point is perhaps best illustrated with a real-world example:

A large petrochemical plant went to great efforts to improve its alarm system performance through alarm rationalisation. Once the new settings were designed, changes were uploaded to the control system over the span of two months. Training was provided throughout this period.

Joe, a veteran operator with 21 years of experience, was entitled to five weeks of vacation per year. Shift rotations at the company normally consisted of four weeks on and one week off. Joe had recently earned some time-in-lieu by working some shifts for a co-worker. With these factors combined, Joe decided to take two months off. Guess when?

On Joe's first day back, there was a compressor trip. This caused a single emergency priority alarm to be sent to the control system. Joe was accustomed to assessing the plant's state based on the rate of alarms. He naturally assumed things were running quite smoothly: he had only a single alarm in nearly 30 minutes! His delayed intervention escalated the upset to an unnecessary plant shutdown.

Effective operator training ensures that operators know what needs to be done, when and how. Remember, team-involved plans are the only foundation for project success. If you are unable to provide effective operator training in-house, there are companies that specialise in third-party training.

Blunder #11: Overhauling the whole system at once

In line with proper training, implementation should be staged. If all changes happen at once, the implementation becomes so complicated that it may never get done. Recognising this prior to rationalisation will help personnel break the execution into easy steps. Staging also enables operations to become accustomed to the changes gradually, thus improving the chances of success.

Blunder #12: Having no accountability

Failing to assign roles and responsibility is the most common and most deadly oversight in an alarm management project. I advocate resolving this by encouraging 'accountability through visibility'. In other words, make sure everyone has access to their peers' data. This will motivate your plant personnel to work together and prove they run the 'tightest ship'. Some sites may make excuses and complain, but in the end they will improve plant operations to avoid repeated corporate humiliation. This sounds harsh, but it works.

It is best to define maintenance tasks and assign responsibility for them at an early project stage, such as during the project plan design. Keep these assignments simple, both on paper and in day-to-day practice, so that support for them is sustained. Assigning them early also gives personnel an opportunity to participate in system installation and verification, and they will be more likely to use the new technologies because they have ownership from participating in the initial configuration.

Conclusion

Alarm management solutions can significantly improve plant safety, reliability and profitability, but will only succeed if they are implemented properly. If you follow the recommended project methodology, and if you avoid the common mistakes we've examined throughout this paper, you will have an effective and successful alarm management project that will make your personnel more productive and your plant run more reliably.
