July 2024: During my 24+ years in alarm management, I have collaborated with various companies on their distributed control systems (DCS) across the United States and in 20 other countries. Although every system is different, there are more commonalities than you might imagine. I am consistently asked what my favorite and least favorite control systems to work on are. My answer is always the same: “My favorite system is the one I just finished, for obvious reasons, and my least favorite is the one I’m working on right now.” This is because all alarm management systems have issues; naturally, those issues differ from system to system. That is why I felt it was important to discuss how to prevent the five most common industrial alarm management issues.
Avoiding Unnecessary and Misused Alarms for Effective Industrial Alarm System Management
One tenet of alarm management is that alarms should be used only for abnormal situations. I cannot tell you how many times I have found alarms configured on systems for things that should never have an alarm. Some of these were obviously designed for convenience. A typical example of a convenience alarm is a low-temperature alarm on an ambient sensor located just outside the control room door. Although there are a few circumstances when this could be necessary (e.g., an extremely low ambient temperature could adversely affect the viscosity of a process fluid), most of the time when I have encountered this type of convenience alarm, its only purpose was to let the operator know it was cold outside. Once, a senior operator in upstate New York actually told me that without the alarm, he wouldn’t know whether to put on a coat. The alarm was removed.
Another relatively common misuse of industrial alarm systems is a system timer alarm set to go off thirty (30) to sixty (60) minutes before the end of the shift to remind personnel to fill out shift changeover paperwork before going home. Where I have found these, the alarms had descriptions like “Time for Turnover Paperwork” or “Call-in Reading to Foreman.” In one case, the description was “Wake Up and Pack Up to Go Home.” There, not only was the alarm removed from the system, but the tag was removed as well, and the person it applied to was told to buy an alarm clock. Ultimately, avoiding unnecessary or misused alarms will improve your industrial alarm system’s effectiveness.
Ensuring Operator Action — Proper Alarm Criteria and the Use of Alert Systems
Another principle of alarm management is that every alarm requires an operator action. When designing an alarm philosophy, one of the steps is to build a matrix of the time available to respond (how much time the operator has to act to avoid the consequences) versus the severity of the consequences, as shown in Table 1 below.
As you can see in the table above, if there are no consequences or the time available is more than thirty (30) minutes, the parameter does not qualify as an alarm. The operator may still need to know that an instrument has reached a certain point, but that alone does not make it an alarm. This creates a dilemma for points that support operations: they do not meet the qualifications of an alarm, yet operators still need to view or access them to run the unit efficiently.
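The qualification rule above can be written as a small check. This is a minimal sketch, assuming a hypothetical four-level severity scale (NONE, MINOR, MAJOR, SEVERE) and treating exactly 30 minutes as qualifying. Because Table 1 itself is not reproduced here, the sketch only decides whether a point qualifies as an alarm; it does not assign a priority.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical severity scale; a real alarm philosophy defines its own.
    NONE = 0
    MINOR = 1
    MAJOR = 2
    SEVERE = 3


# From the text: more than 30 minutes available does not qualify.
MAX_RESPONSE_MINUTES = 30


def qualifies_as_alarm(time_available_min: float, severity: Severity) -> bool:
    """True only if there is a consequence AND the operator has
    30 minutes or less to act to avoid it."""
    if severity is Severity.NONE:
        return False  # no consequence: not an alarm
    return time_available_min <= MAX_RESPONSE_MINUTES


print(qualifies_as_alarm(10, Severity.MAJOR))   # True
print(qualifies_as_alarm(45, Severity.SEVERE))  # False: too much time available
print(qualifies_as_alarm(5, Severity.NONE))     # False: no consequence
```

Points that fail this check are candidates for an alert rather than an alarm.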
For items that do not qualify as alarms, there should be a separate mechanism to inform the operator, such as an alert system. I have encountered many types of alert systems, and there are numerous ways to implement them. One of the most common is to set up alerts as a separate “priority” on the DCS that has no visual or audible actions tied to it, so that alerts go to a separate screen designated just for them. The operators will have to become accustomed to checking that screen multiple times during a shift; however, these alerts should not be time-critical (e.g., requiring a response in under one hour or having the potential to become a HIGH priority). If an alert has the potential to become a HIGH priority, it should be re-engineered so that the time available is 30 minutes or less, which qualifies it as an alarm.
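The “alert as a separate priority” approach can be sketched as a per-priority configuration table. This is an illustration only: the field names, screen names, and tag names are hypothetical, and the audible/visual settings shown for EMERGENCY, HIGH, and LOW are assumptions, since actual DCS configuration differs by vendor. The point is that the ALERT entry has no audible or visual action and routes to its own screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityConfig:
    """Hypothetical per-priority settings; not any vendor's actual schema."""
    name: str
    audible: bool   # sound the horn?
    visual: bool    # flash on the main alarm summary?
    display: str    # screen the item is routed to


PRIORITIES = {
    "EMERGENCY": PriorityConfig("EMERGENCY", True, True, "alarm_summary"),
    "HIGH": PriorityConfig("HIGH", True, True, "alarm_summary"),
    "LOW": PriorityConfig("LOW", True, True, "alarm_summary"),
    # Alerts: a separate "priority" with no audible or visual action,
    # routed to a dedicated screen the operator checks during the shift.
    "ALERT": PriorityConfig("ALERT", False, False, "alert_screen"),
}


def route(tag: str, priority: str) -> str:
    """Describe where an item goes and how it is annunciated."""
    cfg = PRIORITIES[priority]
    actions = []
    if cfg.audible:
        actions.append("horn")
    if cfg.visual:
        actions.append("flash")
    action_text = "+".join(actions) if actions else "silent"
    return f"{tag} -> {cfg.display} ({action_text})"


print(route("TI-101", "ALERT"))  # TI-101 -> alert_screen (silent)
print(route("LI-200", "HIGH"))   # LI-200 -> alarm_summary (horn+flash)
```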
Implementing Effective Single Alarms for Each Cause or Action | Industrial Alarm Management
Creating a single alarm for each cause or corrective action is another doctrine of effective industrial alarm management. In other words, you should not have to be told more than once to do something. This issue most often occurs with multiple levels of alarming (e.g., High (H) & High-High (HH) or Low (L) & Low-Low (LL)). Below are examples of multiple-level alarming used correctly and incorrectly.
Correct use of multiple levels of alarming example:
A tank is ten feet in height and will overflow at that ten-foot level. There is a high-level alarm (H) set at nine feet with a HIGH priority to notify the operator to take action to stop the level rise. There is a high-high-level alarm (HH) at the do-not-exceed height of 9.5 feet, with a LOW priority and a corresponding automated action that stops filling the tank. The alarm at nine feet notifies the operator that action is needed. The alarm at 9.5 feet notifies the operator that the action taken was not effective and that the DCS (or, in some cases, the safety instrumented system (SIS)) has shut the process down to avoid overfilling the tank.
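The correct two-level arrangement above can be sketched as follows. The setpoints and priorities come from the example; the `fill_valve_open` flag is a hypothetical stand-in for whatever final element the DCS or SIS actually drives, and the sketch ignores deadbands, latching, and alarm delays that a real configuration would include.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Setpoints from the example, in feet; the tank overflows at 10.0 ft.
H_SETPOINT_FT = 9.0   # HIGH priority: the operator must act to stop the rise
HH_SETPOINT_FT = 9.5  # LOW priority: automation has already stopped the fill


@dataclass
class Tank:
    level_ft: float
    fill_valve_open: bool = True  # hypothetical final element for the interlock


def evaluate(tank: Tank) -> List[Tuple[str, str]]:
    """Return the active (alarm, priority) pairs and apply the HH interlock."""
    active: List[Tuple[str, str]] = []
    if tank.level_ft >= H_SETPOINT_FT:
        active.append(("H", "HIGH"))
    if tank.level_ft >= HH_SETPOINT_FT:
        # Automated action first; the LOW-priority alarm only reports that
        # the operator's response was not effective and the fill was stopped.
        tank.fill_valve_open = False
        active.append(("HH", "LOW"))
    return active


tank = Tank(level_ft=9.2)
print(evaluate(tank), tank.fill_valve_open)  # [('H', 'HIGH')] True
tank.level_ft = 9.6
print(evaluate(tank), tank.fill_valve_open)  # [('H', 'HIGH'), ('HH', 'LOW')] False
```

Each level tells the operator something different: the first asks for action, the second reports that automation has taken over.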
Incorrect use of multiple levels of alarming example:
A client had a 40-foot naphtha tank with a high-high alarm set at 39 feet with an emergency priority and a high alarm set at 38 feet with a high priority. There were no automated shutdown systems on this tank, and during operations, they overfilled the tank and had a loss of containment (LOC) incident. In an attempt to remedy this issue, the client contacted the DCS vendor and had custom code written to add a high-high-high (HHH) alarm at 39 feet with an emergency priority, a high-high alarm at 38 feet with an emergency priority, and a high alarm at 37 feet with a high priority.
Much to the client’s chagrin, this attempted fix left the problem unresolved, and once again they overfilled the tank and had another loss of containment (LOC) incident. This cycle repeated several times until they performed an alarm rationalization project. At the beginning of this project, the client’s setup was:
High-High-High-High-High (HHHHH) Alarm at 39 ft with an EMERGENCY priority
High-High-High-High (HHHH) Alarm at 38 ft with an EMERGENCY priority
High-High-High (HHH) Alarm at 37 ft with an EMERGENCY priority
High-High (HH) Alarm at 36 ft with an EMERGENCY priority
High (H) Alarm at 35 ft with a HIGH priority
Not only was this bad practice for industrial alarm system management, but the operators became so numb to the alarms that they were ignoring them and setting themselves up to run the tank over again.
The findings of their alarm rationalization study recommended reverting to the original two (2) alarms and adding an automated shutdown at 39.5 feet, with a LOW-priority alarm to notify the operator that control has been taken away and that an automated shutdown has occurred.
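One rough way to screen for the “told more than once” problem is to flag, on a single tag, operator-action levels (those with no automated action) that share a priority, since repeating the same priority adds no new instruction. This is a hypothetical screening heuristic, not a substitute for rationalization and not the method the client used. The “before” data is the five-level setup listed above. The “after” data is the two original alarms plus the 39.5-foot automated shutdown; the name “Shutdown” is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class AlarmLevel(NamedTuple):
    name: str
    setpoint_ft: float
    priority: str
    automated_action: bool  # does reaching this level trigger a shutdown?


def flag_repeated_levels(levels: List[AlarmLevel]) -> Dict[str, List[str]]:
    """Screening heuristic: operator-action levels on one tag that share a
    priority are flagged as likely repeats of the same instruction."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for level in sorted(levels, key=lambda item: item.setpoint_ft):
        if not level.automated_action:
            groups[level.priority].append(level.name)
    return {prio: names for prio, names in groups.items() if len(names) > 1}


before = [
    AlarmLevel("H", 35, "HIGH", False),
    AlarmLevel("HH", 36, "EMERGENCY", False),
    AlarmLevel("HHH", 37, "EMERGENCY", False),
    AlarmLevel("HHHH", 38, "EMERGENCY", False),
    AlarmLevel("HHHHH", 39, "EMERGENCY", False),
]
after = [
    AlarmLevel("H", 38, "HIGH", False),
    AlarmLevel("HH", 39, "EMERGENCY", False),
    AlarmLevel("Shutdown", 39.5, "LOW", True),
]

print(flag_repeated_levels(before))  # {'EMERGENCY': ['HH', 'HHH', 'HHHH', 'HHHHH']}
print(flag_repeated_levels(after))   # {}
```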
Preventing DCS Alarm Floods with Advanced Suppression Techniques
Another common issue in industrial alarm system management is DCS alarm floods (commonly defined as more than ten alarms in ten minutes). A leading cause of alarm floods is the failure to configure advanced alarming techniques such as suppression. Many newer DCSs now have some form of suppression built in; however, this feature is often underutilized.
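The “more than ten alarms in ten minutes” threshold can be monitored with a sliding window. This is a minimal sketch, assuming alarm timestamps in seconds that arrive in non-decreasing order and counting every annunciated alarm on one console; the class name is hypothetical and does not correspond to a built-in DCS feature.

```python
from collections import deque
from typing import Deque


class FloodDetector:
    """Sliding-window flood check: a flood is declared when more than
    `max_alarms` alarms occur within `window_s` seconds."""

    def __init__(self, max_alarms: int = 10, window_s: float = 600.0) -> None:
        self.max_alarms = max_alarms
        self.window_s = window_s
        self._times: Deque[float] = deque()

    def record(self, t: float) -> bool:
        """Record an alarm at time t (seconds); return True while flooding."""
        self._times.append(t)
        # Drop alarms that have fallen out of the ten-minute window.
        while self._times and t - self._times[0] >= self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_alarms


det = FloodDetector()
flags = [det.record(30.0 * i) for i in range(11)]  # 11 alarms in 5 minutes
print(flags[-1])  # True: the 11th alarm inside the window marks a flood
```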
Automated suppression is when the DCS automatically disables (suppresses) an alarm’s audible and visual indicators and sends the alarm to an event log or journal instead. Suppression can help prevent alarm floods in multiple ways; one is that it allows a single indication of an issue to be alarmed while hiding all the related alarms that the issue causes.
An example of this would be a compressor trip. When the compressor is running, it has numerous alarms configured and enabled, such as the run status, high & low suction pressure, high & low discharge pressure, bearing temperatures, and vibrations — just to name a few. If the discharge pressure goes high while the compressor is running, it can be a big issue. You may have a plug downstream or someone may have accidentally closed a wrong valve. These things need to be taken care of quickly. However, if the compressor shuts down without suppression configured, the result each time will be a run status alarm along with alarms for the high suction pressure, low discharge pressure, all the bearing vibrations as it spools down, and potentially many other alarms. Typically, the only alarm needed is the run status alarm because if the compressor shuts down, a good operator knows that all of these secondary issues are due to the shutdown. If they are allowed to alarm, they become a distraction and hindrance to the mitigation of the issue.
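The compressor example amounts to state-based suppression. The sketch below assumes hypothetical alarm names and a simple rule: while the compressor is not running, its consequential alarms are sent to the event log instead of being annunciated, and only the run status alarm reaches the operator. Real implementations often add release delays after a restart; those are omitted here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names for the consequential alarms expected after a trip.
SUPPRESS_WHEN_STOPPED = {
    "HIGH_SUCTION_PRESSURE",
    "LOW_DISCHARGE_PRESSURE",
    "BEARING_VIBRATION",
    "BEARING_TEMPERATURE",
}


@dataclass
class AlarmHandler:
    compressor_running: bool = True
    annunciated: List[str] = field(default_factory=list)
    event_log: List[str] = field(default_factory=list)

    def raise_alarm(self, alarm: str) -> None:
        if not self.compressor_running and alarm in SUPPRESS_WHEN_STOPPED:
            # Suppressed: no horn or flash, but still recorded in the journal.
            self.event_log.append(alarm)
        else:
            self.annunciated.append(alarm)

    def trip(self) -> None:
        # The run status alarm is the one indication the operator needs.
        self.compressor_running = False
        self.raise_alarm("RUN_STATUS")


handler = AlarmHandler()
handler.trip()
for alarm in ["HIGH_SUCTION_PRESSURE", "LOW_DISCHARGE_PRESSURE", "BEARING_VIBRATION"]:
    handler.raise_alarm(alarm)
print(handler.annunciated)  # ['RUN_STATUS']
print(handler.event_log)    # ['HIGH_SUCTION_PRESSURE', 'LOW_DISCHARGE_PRESSURE', 'BEARING_VIBRATION']
```

While the compressor is running, the same alarms still annunciate normally, so a genuine high discharge pressure is not hidden.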
Enhancing DCS Security — The Importance of Firewalls and Controlled Internet Access
Lastly, the largest issue in industrial alarm system management — which thankfully is seen less and less these days — is the lack of firewalls between the DCS and the outside world. Ideally, a control system would be “air-gapped” in order to minimize the possibility of introducing intrusions or viruses. However, this is not always possible. Typically, the DCS will be protected by firewalls, and often, those firewalls will be in their own layer between the control system and the rest of the company assets. The firewalls will only have a minimum number of obscure ports opened, and those ports will only allow one-way (outbound) traffic. This helps to minimize potential hijacking and infections.
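The firewall policy described above (deny by default, a few open ports, outbound traffic only) can be expressed as a simple rule check. This is a conceptual sketch, not firewall configuration: the port number is hypothetical, and it ignores connection state (a real stateful firewall would also pass replies to permitted outbound sessions unless a one-way device such as a data diode is used).

```python
from typing import NamedTuple

# Hypothetical allow-list; actual ports are site-specific and kept few.
ALLOWED_OUTBOUND_PORTS = frozenset({50123})


class Packet(NamedTuple):
    direction: str  # "inbound" (toward the DCS) or "outbound" (leaving it)
    dst_port: int


def permit(pkt: Packet) -> bool:
    """Default deny: only outbound traffic on allow-listed ports passes."""
    return pkt.direction == "outbound" and pkt.dst_port in ALLOWED_OUTBOUND_PORTS


print(permit(Packet("outbound", 50123)))  # True
print(permit(Packet("inbound", 50123)))   # False: no inbound traffic
print(permit(Packet("outbound", 443)))    # False: port not on the allow-list
```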
The most egregious example of not having firewalls that I have encountered was a few years ago on a project outside the US. One of the client’s complaints was how slowly their DCS was running, and they were asking for suggestions on how to improve it. Upon entering the control room, my colleague and I were greeted by what is inarguably the nicest control room I’ve ever seen. The room was brightly lit and immaculately clean. The two (2) main operator stations were laid out in a huge arc in the middle of the room, with sixteen (16) monitors each. Perpendicular to them on the right end sat the foreman’s station with four (4) monitors. In the back left corner was the utilities operator station with another twelve (12) monitors, and dead center on the front wall were eight (8) 55” monitors networked together to form two (2) giant screens, each two screens high by two screens wide. It was impressive, to say the least, until I realized that the giant screen on the right had more flashing red alarms than I have fingers to count. No one was paying attention to them because the operator on the left was using his giant screen to play an online video game. That’s right: the DCS had a direct connection to the internet. My first suggestion was to disable the internet connection and establish firewalls, and the second was to delete all software not required for business from the system. Amazingly, within a week of implementing the suggestions, the system speed had more than doubled.
While there are many more issues that could be discussed, these are the five most common issues that stand out in my career. Does your plant suffer from any of these issues or others not mentioned here?
The Takeaway | Common Industrial Alarm Management Issues
Addressing the most common industrial alarm management issues is crucial for ensuring operational efficiency, safety, and system reliability. By avoiding unnecessary and misused alarms, setting proper alarm criteria, implementing single alarms for each cause or corrective action, preventing alarm floods through advanced suppression techniques, and securing the DCS with firewalls and controlled internet access, companies can significantly enhance their alarm management systems. These ISA-approved best practices not only streamline operations but also empower operators to respond effectively to true emergencies, thereby minimizing risks and maintaining optimal system performance. Implementing these strategies will lead to a more robust and responsive alarm management framework, ultimately contributing to the overall success and safety of industrial operations. If your alarm system issues have you scratching your head, the experts at aeSolutions are always available to help identify and mitigate your industrial alarm system problems.
About the author: Burt Ward is a Senior Principal Specialist with a strong background in both operations and digital control systems. His experience includes over 24 years of Alarm Management projects conducted both remotely and onsite around the world.