

  • How to keep the alarm management lifecycle evergreen

    Updated April 2026 - It is commonly touted that once a plant rationalizes its alarms, it has completed the alarm management lifecycle. Nothing could be further from the truth. So what can an organization do to keep the alarm lifecycle alive and evergreen? Alarm management is the collection of processes and practices for determining, documenting, designing, operating, monitoring, and maintaining alarm systems. It is characterized by design principles including hardware and software design, good engineering practices, and human factors. Tying the alarm management lifecycle into process safety management and other existing work processes will help ensure it remains evergreen and delivers the intended benefits. While the integration of these activities will look different for each company, time has shown that success comes most easily when the management of change process, testing, and training activities have been integrated into what is already being accomplished. The alarm management lifecycle is essentially a circle; there is no beginning or ending. There are different places an organization may choose to enter it, but the overall lifecycle process never really ends. An organization may have developed a philosophy, rationalized alarms, and implemented them, but that does not mean it has “completed” alarm management. As processes and equipment evolve and change (e.g., removing or introducing equipment, changing flow rates, changing chemicals, etc.), different steps of the lifecycle come back into importance. The goal of alarm management should be to keep the lifecycle updated and evergreen. Integrating the alarm management, functional safety, and cybersecurity lifecycles is a key to success and will help avoid costly rework. There are similarities in all three lifecycles (e.g., the assess, implement, and operate & maintain phases; management of change; testing and training requirements; etc.).
The process hazards analysis (PHA) feeds the other lifecycles. When assessing items in cybersecurity, one is considering scenarios first identified in PHAs. The same is true in alarm management when an alarm is used as a protection layer. A change in one lifecycle may, and most likely will, impact all three lifecycles. Something as minor as altering a chattering alarm (e.g., because its setpoint was too close to a shutdown value) will impact the alarm, the master alarm database, the other lifecycles, and many different process safety information documents. If normalization of deviation is allowed (i.e., not tracking and reviewing the impact of what are believed to be minor changes), alarms will eventually become unrationalized, and things will revert to their original, unmanaged state. To learn more about the ISA 18.2 standard and how to keep the alarm management lifecycle evergreen, read the full paper “Breathing life into the alarm management lifecycle”.

  • How to Prevent the Five Most Common Industrial Alarm Management Issues

    Updated April 2026 — During my 24+ years in alarm management, I have collaborated with various companies on their distributed control systems (DCS) across the United States and throughout 20 other countries. Although every system is different, there are more commonalities than you might imagine. I am consistently asked what my favorite and least favorite control systems are to work on. My answer is always the same: “My favorite system is the one I just finished, for obvious reasons, and my least favorite is the one I’m working on right now.” This is because all alarm management systems have issues, but naturally, these issues are different from system to system. That is why I felt it was important to discuss how to prevent the five most common industrial alarm management issues.

Avoiding Unnecessary and Misused Alarms for Effective Industrial Alarm System Management

One tenet of alarm management is that alarms will only be used for abnormal situations. I cannot tell you the number of times that I have found alarms configured on systems for things that should never have an alarm. Some of these were obviously designed for convenience. A typical example of a convenience alarm is a low-temperature alarm on an ambient sensor located just outside the control room door. Although there are a few circumstances when this could be necessary (e.g., an extremely low ambient temperature could adversely affect the viscosity of a process fluid), most of the times that I have encountered this type of convenience alarm, it is simply to let the operator know if it is cold outside. Once, a senior operator in upstate New York actually told me that without the alarm, he wouldn’t know whether he should put on a coat. The alarm was removed.
Another relatively common misuse of industrial alarm systems occurs when a system timer alarm is set up thirty (30) to sixty (60) minutes before the end of the shift in order to remind personnel to fill out shift changeover paperwork before going home. In situations where I have found these, the alarms have descriptions like “Time for Turnover Paperwork” or “Call-in Reading to Foreman.” In one of these cases, the description was “Wake Up and Pack Up to Go Home.” In this case, not only was the alarm removed from the system, but the tag was removed as well, and the person this applied to was told to buy an alarm clock. Ultimately, avoiding unnecessary or misused alarms will improve your industrial alarm system’s effectiveness.

Ensuring Operator Action — Proper Alarm Criteria and the Use of Alert Systems

Another principle of alarm management is that every alarm requires an operator action. When designing an alarm philosophy, one of the steps is to determine the time to respond (how much time is available to take action to avoid the consequences) vs. the severity of consequences matrix, as shown in Table 1 below.

Table 1 - Alarm Priority Determination - aeSolutions

As you can see in the table above, if there are no consequences or the time available is more than thirty (30) minutes, the parameter does not qualify to be an alarm. Although the operator may need to know that an instrument has reached a certain point, that does not mean it should necessarily be an alarm. This condition can cause concern when these points support operations and do not meet the necessary qualifications of an alarm but still need to be viewed or accessed as part of operational efficiency. For those items that do not qualify as an alarm, there should be a separate mechanism to inform the operator (e.g., an alert system). I have encountered many types of alert systems, and there are numerous ways to implement them.
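The qualification rule above (no consequences, or more than thirty minutes available, means the point should be an alert rather than an alarm) can be sketched as a simple lookup. The cutoffs, severity labels, and priority mapping below are illustrative assumptions, not the actual contents of Table 1, which each site's alarm philosophy defines for itself:

```python
def classify_point(time_to_respond_min, severity):
    """Classify a configured point per a time-vs-severity matrix.

    Returns a priority string for qualifying alarms, or "ALERT" for
    points that belong on a separate alert mechanism instead.
    The cutoffs and priorities here are illustrative placeholders.
    """
    if severity == "none" or time_to_respond_min > 30:
        return "ALERT"  # does not qualify as an alarm
    if time_to_respond_min <= 5:
        # little time available: escalate the priority
        return {"minor": "HIGH", "major": "EMERGENCY", "severe": "EMERGENCY"}[severity]
    # between 5 and 30 minutes available
    return {"minor": "LOW", "major": "HIGH", "severe": "HIGH"}[severity]

print(classify_point(45, "major"))   # more than 30 minutes -> "ALERT"
print(classify_point(20, "severe"))  # qualifies -> "HIGH"
```

A real rationalization exercise would derive these cells from the site's consequence categories, but the shape of the decision is the same.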
One of the most common is to set up an alert as a separate “priority” on the DCS that has no visual or audible actions tied to it. This will result in the alerts going to a separate screen designated just for them. The operators will have to become accustomed to checking the screen multiple times during a shift; however, these alerts should not be short-time critical (e.g., <1 hour or the potential to be a HIGH priority). If the alert has the potential to be a HIGH priority, then it should be re-engineered to the point that the time available is 30 minutes or less.

Implementing Effective Single Alarms for Each Cause or Action | Industrial Alarm Management

Creating a single alarm for each cause or corrective action is another doctrine of effective industrial alarm management. In other words, you should not have to be told more than once to do something. This issue most often occurs with multiple levels of alarming (e.g., High (H) & High-High (HH) or Low (L) & Low-Low (LL)). Below is an example of multiple-level alarming being used correctly and incorrectly.

Correct use of multiple levels of alarming example: A tank is ten feet in height and will overflow at that ten-foot level. There is a high-level alarm (H) set at nine feet with a HIGH priority to notify the operator to take action, stopping the level rise. There is a high-high level alarm (HH) at the do-not-exceed height of 9.5 feet, with a LOW priority and a corresponding automated action that stops filling the tank. The alarm located at nine feet notifies the operator that action is needed. The alarm at 9.5 feet notifies the operator that the action taken was not effective and the DCS — or in some cases the safety instrumented system (SIS) — has shut the process down to avoid over-filling the tank.
Incorrect use of multiple levels of alarming example: A client had a 40-foot naphtha tank with a high-high alarm set at 39 feet, designated with EMERGENCY priority, and a high alarm set at 38 feet, designated with HIGH priority. There were no automated shutdown systems on this tank, and during operations, they overfilled the tank and had a loss of containment (LOC) incident. In an attempt to remedy this issue, the client contacted the DCS vendor and had custom code written to add a high-high-high (HHH) alarm at 39 feet with an EMERGENCY priority, a high-high alarm at 38 feet with an EMERGENCY priority, and a high alarm at 37 feet with a HIGH priority. Much to the chagrin of the client, this attempted resolution left their problem unresolved, and once again, they overfilled the tank and had a subsequent loss of containment (LOC) incident. This cycle repeated several times until they performed an alarm rationalization project. At the beginning of this project, the client’s setup was:

High-High-High-High-High (HHHHH) Alarm at 39 ft with an EMERGENCY priority
High-High-High-High (HHHH) Alarm at 38 ft with an EMERGENCY priority
High-High-High (HHH) Alarm at 37 ft with an EMERGENCY priority
High-High (HH) Alarm at 36 ft with an EMERGENCY priority
High (H) Alarm at 35 ft with a HIGH priority

Not only was this bad practice for industrial alarm system management, but the operators became so numb to the alarms that they were ignoring them and setting themselves up to run the tank over again. The alarm rationalization study findings suggested reverting to the original two (2) alarms and adding an automated shutdown at 39.5 feet with a LOW priority alarm to notify the operator that control has been taken away from the operator and that an automated shutdown has occurred.
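The correct two-level scheme from the ten-foot tank example (H at nine feet prompting operator action, HH at 9.5 feet announcing the automated trip) can be sketched as follows; the function name and return shape are illustrative, not taken from any DCS:

```python
def evaluate_tank_level(level_ft):
    """Two-level alarm scheme for a 10 ft tank that overflows at 10 ft.

    H at 9.0 ft (HIGH priority): the operator must act to stop the rise.
    HH at 9.5 ft (LOW priority): the DCS/SIS has already tripped the fill,
    so the alarm only informs the operator that control was taken away.
    Returns (active_alarms, auto_shutdown).
    """
    alarms = []
    auto_shutdown = False
    if level_ft >= 9.0:
        alarms.append(("H", "HIGH"))
    if level_ft >= 9.5:
        alarms.append(("HH", "LOW"))
        auto_shutdown = True  # automated action stops filling the tank
    return alarms, auto_shutdown

print(evaluate_tank_level(9.2))  # ([('H', 'HIGH')], False)
print(evaluate_tank_level(9.7))  # ([('H', 'HIGH'), ('HH', 'LOW')], True)
```

Note that each alarm maps to exactly one distinct response, which is what the single-alarm-per-action principle requires.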
Preventing DCS Alarm Floods with Advanced Suppression Techniques

Another common issue in industrial alarm system management is the prevention of DCS alarm floods (e.g., having more than ten alarms in ten minutes). A leading cause of alarm floods is the failure to configure advanced alarming techniques such as suppression. Many newer DCS systems now have some form of suppression built into them; however, this feature is often underutilized. Automated suppression is when the DCS automatically disables (suppresses) an alarm’s audible and visual indicators and sends the alarm to an event log or journal instead. Suppression can be used to mitigate alarm flooding in multiple ways; one way is that it allows a single indication of an issue to be alarmed while hiding all the similar alarms the issue causes. An example of this would be a compressor trip. When the compressor is running, it has numerous alarms configured and enabled, such as the run status, high & low suction pressure, high & low discharge pressure, bearing temperatures, and vibrations — just to name a few. If the discharge pressure goes high while the compressor is running, it can be a big issue. You may have a plug downstream, or someone may have accidentally closed a wrong valve. These things need to be taken care of quickly. However, if the compressor shuts down without suppression configured, the result each time will be a run status alarm along with alarms for the high suction pressure, low discharge pressure, all the bearing vibrations as it spools down, and potentially many other alarms. Typically, the only alarm needed is the run status alarm because if the compressor shuts down, a good operator knows that all of these secondary issues are due to the shutdown. If they are allowed to alarm, they become a distraction and a hindrance to mitigating the issue.
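The compressor example amounts to simple state-based suppression: when the run-status alarm is active, the known consequential alarms are journaled rather than annunciated. The tag names and grouping below are hypothetical, and a real DCS would implement this inside its own suppression facility:

```python
# Alarms that are expected consequences of a compressor shutdown.
# These tag names are hypothetical placeholders.
CONSEQUENTIAL_ALARMS = {
    "suction_press_hi", "suction_press_lo",
    "discharge_press_lo", "brg_temp_hi", "brg_vib_hi",
}

def annunciate(raised_alarms):
    """Split raised alarms into (shown, journaled).

    If the run-status alarm is active (the compressor tripped), the
    consequential alarms are suppressed to the event journal so only
    the meaningful run-status alarm reaches the operator.
    """
    if "run_status" in raised_alarms:
        shown = [a for a in raised_alarms if a not in CONSEQUENTIAL_ALARMS]
        journaled = [a for a in raised_alarms if a in CONSEQUENTIAL_ALARMS]
    else:
        shown, journaled = list(raised_alarms), []
    return shown, journaled

# Trip: only the run-status alarm reaches the operator.
print(annunciate(["run_status", "suction_press_hi", "brg_vib_hi"]))
# Running normally: a high discharge pressure alarms as usual.
print(annunciate(["discharge_press_hi"]))
```

The design point is that suppression is conditional: the same consequential alarms still annunciate normally whenever the compressor is running.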
Enhancing DCS Security — The Importance of Firewalls and Controlled Internet Access

Lastly, the largest issue in industrial alarm system management — which thankfully is seen less and less these days — is the lack of firewalls between the DCS and the outside world. Ideally, a control system would be “air-gapped” in order to minimize the possibility of introducing intrusions or viruses. However, this is not always possible. Typically, the DCS will be protected by firewalls, and often, those firewalls will be in their own layer between the control system and the rest of the company assets. The firewalls will only have a minimum number of obscure ports opened, and those ports will only allow one-way (outbound) traffic. This helps to minimize potential hijacking and infections. The most egregious example of missing firewalls that I have encountered was a few years ago on a project outside the US. One of the client’s complaints was how slow their DCS was running, and they were asking for suggestions on how to improve it. Upon entering the control room, my colleague and I were greeted by what is inarguably the nicest control room I’ve ever seen. The room was brightly lit and immaculately clean. The two (2) main operator stations were laid out in a huge arc in the middle of the room with sixteen (16) monitors each. Sitting perpendicular on the right end was the foreman’s station with four (4) monitors. In the back left corner was the utilities operator station with another twelve (12) monitors, and dead center of the front wall were eight (8) 55” monitors networked together to make two (2) giant screens, each two screens high by two screens wide. It was impressive, to say the least — until I realized that the giant screen on the right had more flashing red alarms than I have fingers to count. No one was paying attention to them because the operator on the left was using his giant screen to play an online video game.
That’s right — the DCS had a direct connection to the internet. My first suggestion was to disable the internet connection and establish firewalls; the second was to delete all non-business-required software from the system. Amazingly, within a week of implementing the suggestions, the system speed had more than doubled. While there are many more issues that could be discussed, these are the five most common issues that stand out in my career. Does your plant suffer from any of these issues or others not mentioned here?

The Takeaway | Common Industrial Alarm Management Issues

Addressing the most common industrial alarm management issues is crucial for ensuring operational efficiency, safety, and system reliability. By avoiding unnecessary and misused alarms, setting proper alarm criteria, implementing single alarms for each cause or corrective action, preventing alarm floods through advanced suppression techniques, and securing the DCS with firewalls and controlled internet access, companies can significantly enhance their alarm management systems. These ISA-aligned best practices not only streamline operations but also empower operators to respond effectively to true emergencies, thereby minimizing risks and maintaining optimal system performance. Implementing these strategies will lead to a more robust and responsive alarm management framework, ultimately contributing to the overall success and safety of industrial operations. If your alarm system issues have you scratching your head, the experts at aeSolutions are always available to help identify and mitigate your industrial alarm system problems.

About the author: Burt Ward is a Senior Principal Specialist with a strong background in both operations and digital control systems. His experience includes over 24 years of alarm management projects conducted both remotely and onsite around the world.

  • Lessons Learned on SIL Verification and SIS Conceptual Design

    Updated April 2026 - Written by aeSolutions Technical Team - There are many critical activities and decisions that take place prior to and during the Safety Integrity Level (SIL) verification and other conceptual design phases of projects conforming to ISA84 & ISA/IEC 61511. These activities and decisions introduce either opportunities to optimize or obstacles that impede project flow, depending on when and how they are managed. Implementing Safety Instrumented System (SIS) projects that support the long-term viability of the process safety lifecycle requires treating SIS engineering as a discipline in its own right — one that receives from, and feeds to, other engineering disciplines. This paper will examine lessons learned within the SIS engineering discipline and between engineering disciplines that help or hinder SIS project execution in achieving the long-term viability of the safety lifecycle. Avoiding these pitfalls can allow your projects to achieve the intended risk reduction and conformance to the ISA/IEC 61511 safety lifecycle, while avoiding the costs and delays of late-stage design changes. Alternate execution strategies will be explored, as well as the risks of moving forward when limited information is available. Click here to view the complete whitepaper.

Topics include: IEC 61511, ISA/IEC 61511, Safety Instrumented Systems (SIS), Independent Protection Layers (IPL), Functional Safety Assessment (FSA), Safety Requirement Specification (SRS), Safety Lifecycle, Functional Safety Management Plan (FSMP), Project Execution Plan (PEP), SIS Front-End Loading (SIS FEL), Layer of Protection Analysis (LOPA), SIL Verification

  • What is a Stage 1 FSA & How Can It Help Discover Critical SIS Flaws?

    Updated April 2026 — Imagine discovering a critical flaw in your safety system design before your plant goes operational. This scenario, while nerve-wracking, underscores the importance of early intervention in the design phase. When developing a Safety Instrumented System (SIS), it’s crucial to ensure that the hardware and software meet the practical needs identified from the initial hazard and risk assessment. That’s the purpose of a functional safety assessment (FSA). How can a Stage 1 FSA help discover critical SIS flaws? FSAs, as defined by IEC 61511, provide a five-stage, evidence-based investigation to judge the functional safety achieved by one or more SIS and/or other protection layers. Stages 1 through 3 of the FSA encompass the SIS from its original concept through design, construction, and commissioning. Stage 1 specifically takes place after the hazard and risk assessments have been completed and before detailed design work begins, which can help with the early identification of design flaws and safety issues. Here’s what to expect in the Stage 1 FSA process, along with recommendations for a successful outcome.

What Are the Goals of a Stage 1 FSA?

A well-executed FSA reduces the likelihood of safety incidents. For FSA Stage 1, the primary goal is to verify that the safety requirements specification (SRS) accurately reflects the needs identified during the hazard and risk assessments. Does what’s on paper reflect the scenario in which the SIS must operate in the real world? Will the SIS actually mitigate the risks identified in the hazard and risk assessment? By ensuring thorough verification of the SRS at this early stage, Stage 1 FSAs help prevent costly modifications and delays later in the project lifecycle. Proper planning leads to smoother project execution, reducing downtime and increasing overall efficiency.
The deliverable for a Stage 1 FSA includes a comprehensive report that presents findings, recommendations, and general observations. It is a good idea to loop in stakeholders to review this deliverable together to align on opportunities to course-correct and on next steps. Key personnel include process engineers, control engineers, operations supervisors, and site leadership.

What is the Anticipated Time, Cost, and ROI of a Stage 1 FSA?

The effort for executing an FSA is minimal relative to the overall project. The initial cost of performing a Stage 1 FSA includes expenses related to document reviews, stakeholder interviews, and detailed analyses. The expense is minimal compared to the total project cost. The duration of a Stage 1 FSA can vary based on the project's size and complexity, typically involving several days of document reviews and interviews with key personnel. Failing to conduct a thorough Stage 1 FSA can lead to incomplete or incorrect safety requirements. This oversight can result in costly modifications, delays, and potentially catastrophic failures once the system is operational. These issues often incur far higher costs than the initial FSA investment. A Stage 1 FSA can help surface the following issues:

●     Incomplete risk assessments
●     Failure to capture safety requirements
●     Insufficient detail in the preliminary assessments
●     Inadequate stakeholder engagement

Conducting a Stage 1 FSA allows for early identification of design flaws and safety issues, which are less expensive to address in the design phase than during or after construction.

What is the FSA Stage 1 Process?
The FSA Stage 1 process typically consists of the following steps:

●     Hazard and risk assessment verification
●     Verification of safety requirements specification
●     Operational readiness

Table: Process Steps for FSA Stage 1

Hazard and Risk Assessment Verification
●     Review of Hazard Analysis: Ensure that all potential hazards have been identified and assessed.
●     Risk Assessment Validation: Confirm that the risk assessments accurately reflect the potential consequences and likelihood of identified hazards.

Verification of Safety Requirements Specification
●     Document Review: Verify that the SRS accurately captures all safety requirements derived from the hazard and risk assessments.
●     Design Verification: Ensure that the proposed SIS design addresses all identified safety requirements and mitigates the associated risks.
●     Cross-Functional Collaboration: Engage with multiple stakeholders to verify the SRS and ensure it reflects the input and expertise of all relevant parties.

Operational Readiness
●     Stakeholder Engagement: Confirm that all relevant stakeholders, including process engineers, control engineers, and operations personnel, are involved in the development and review of the SRS.

What Are the Renewable Energy Implications?

As renewable energy sources reach maturity in the market, the nature of hazards and associated risks changes, with new unknowns and limited data. Consider the unique explosion and flammability risks of hydrogen, which is relatively new to the market. The hazard and risk assessments for hydrogen facilities must account for these unique dangers. Similarly, large-scale battery storage systems, essential for renewable energy, can suffer from thermal runaway leading to fires and explosions. Wind and solar farms present risks such as electrical hazards, mechanical failures, and environmental impacts. It is critical that the team conducting the FSA understands these unique hazards.
    Conclusion

Stage 1 FSAs help prevent hazardous events and protect both personnel and assets. Engaging the team actively and addressing potential issues proactively can significantly enhance the effectiveness of Stage 1 FSAs, ensuring the safety and reliability of industrial operations. If you have any questions about your scenario, aeSolutions is here to provide support. Our team of industry experts is available to help navigate even the most unique challenges.

  • FSA Stages - What They Are and Why We Do Them

    Updated April 2026 - A Functional Safety Assessment (FSA) is defined by the IEC 61511 standard as an “investigation, based on evidence, to judge the functional safety achieved by one or more SIS and/or other protection layers.” The ultimate goal of an FSA is to make the team confident that their instrumented safety system will reliably achieve the risk reduction needed. While many organizations understand the importance of FSAs, not everyone realizes the significant advantages of conducting one, especially when initiated earlier in the design process. Starting the assessment early allows for more thorough safety considerations and ensures safety measures are ingrained in the project from the beginning.

Why Do You Conduct Functional Safety Assessments?

The primary motivation is to ensure the Safety Instrumented Functions being implemented actually address the hazards for which they are designed. It might seem routine, but a Functional Safety Assessment is not just a box to check in your development process; it's a powerful tool that can enhance your organization’s safety, compliance, and cost-efficiency. The benefits include:

Safety Assurance: The primary and most critical reason for conducting FSAs is to ensure the safety of people, property, and the environment. By identifying and addressing potential hazards, we can prevent accidents and reduce the impact of failures.

Standard and Regulatory Compliance: Conducting FSAs helps organizations comply with applicable standards and regulations, reducing the risk of legal and financial repercussions.

Cost Reduction: While implementing safety measures can require an initial investment, it often leads to long-term cost savings. Preventing accidents and failures can significantly reduce downtime, repair costs, and potential liability claims.

Innovation and Competitive Advantage: Functional safety assessments can drive innovation by pushing engineers and developers to create more robust and reliable systems.
FSA Stages

The standard requires 5 stages of FSAs to be performed over the lifetime of a SIS at key phases of the project lifecycle:

Stage 1 – After the hazard and risk assessment has been carried out, the required protection layers have been identified, and the SRS has been developed
Stage 2 – After the SIS has been designed (typically after Factory Acceptance Testing)
Stage 3 – After the installation, pre-commissioning, and final validation of the SIS have been completed, and operation and maintenance procedures have been developed (typically during the Pre-Startup Safety Review)
Stage 4 – After gaining experience with the operation and maintenance of the system
Stage 5 – After modification and prior to decommissioning of a SIS

These stages are sequentially depicted in Figure 7 of ANSI/ISA-61511-1-2018 - Safety Lifecycle Phases and FSA Stages: https://blog.isa.org/hs-fs/hubfs/Imported_Blog_Media/ANSI-ISA-84_00_0-1-2004-IES-61511-Mod-Safety-Life-Cycle.jpg

A typical Stage 1 FSA compares the content of the SRS to the hazardous scenarios outlined in the risk assessment. For example, Stage 1 will review whether the IPLs are truly independent, whether the SIF will protect against the stated hazard, etc. A Stage 2 FSA is completed after the detailed engineering is finished and reviews the detailed design against the SRS. Identifying and rectifying safety issues at the initial stages of development is significantly more cost-effective than addressing them later in the process or, worse, post-construction. In summary, it’s most cost-effective to assess the design while it is still on paper. Late-stage changes can be expensive, lead to project delays, and sometimes even necessitate a complete redesign. In addition to the practical benefits, by addressing safety concerns from the outset, you foster a proactive approach to safety that can be carried forward into future projects, enhancing overall safety awareness and practices.
An FSA Stage 3 is done after installation, commissioning, and validation are complete, typically during the Pre-Startup Safety Review. A Stage 3 reviews the work done during the installation and pre-commissioning phases and ensures the installed system matches the design package. There is now a greater emphasis on FSAs in the standard than previously. IEC 61511 formerly only required the FSA Stage 3 before the introduction of hazards to the process. With the latest version of the standard, FSA Stages 1, 2, and 3 are now required. If the project has advanced beyond the design phase, Stages 1 and 2 can be done concurrently with the Stage 3. By performing FSAs early in your project's lifecycle, you reduce risks and demonstrate your commitment to safety and quality. While these stages of FSAs are a requirement of the 61511 standard, they deliver significant value beyond compliance, providing meaningful advancements toward protecting people and assets.

Related: How About a Stage Zero Functional Safety Assessment (FSA)? Don’t Dismiss Stage 4 of an SIS Functional Safety Assessment!

  • The Use of Bayesian Networks in Functional Safety

    Functional Safety & Bayesian Networks

Functional safety engineers follow the ISA/IEC 61511 standard and perform calculations based on random hardware failures. These result in low failure probabilities, which are then combined with similarly low failure probabilities for other safety layers to show that the overall probability of an accident is extremely low (e.g., 1E-5/yr). Unfortunately, such numbers are based on frequentist assumptions and cannot be proven. Looking at actual accidents caused by control and safety system failures shows that accidents are not caused by random hardware failures. Accidents are typically the result of steady and slow normalization of deviation (a.k.a. drift). It’s up to management to control these factors. However, Bayes’ theorem can be used to update our prior belief (the initial calculated failure probability) based on observing other evidence (e.g., the effectiveness of the facility’s process safety management process). The results can be dramatic. For example, assuming a safety instrumented function with a risk reduction factor of 5,000 (i.e., SIL 3 performance) and a process safety management program with 99% effectiveness results in the function actually having a risk reduction factor of just 98 (i.e., essentially the borderline between SIL 1 and SIL 2). The key takeaway is that the focus of functional safety should be on effectively following all the steps in the ISA/IEC 61511 safety lifecycle and the requirements of the OSHA PSM regulation, not the math or certification of devices. Both documents were essentially written in blood through lessons learned the hard way by many organizations. To learn more about the use of Bayesian networks in functional safety, read the full paper here. Click here to view the complete whitepaper
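The worked example above (an RRF of 5,000 degraded to about 98 by a 99%-effective PSM program) follows from a simple application of the law of total probability. The pessimistic assumption below — that an ineffective PSM program leaves the function unavailable on demand — is made for illustration and is not stated in the paper:

```python
def effective_rrf(design_rrf, psm_effectiveness, pfd_if_psm_ineffective=1.0):
    """Update a SIF's risk reduction factor for PSM effectiveness.

    Total probability: PFD = PFD_design * P(PSM effective)
                           + PFD_if_ineffective * P(PSM ineffective).
    Assumes (pessimistically, for illustration) that an ineffective PSM
    program leaves the function unavailable on demand.
    """
    pfd = (1.0 / design_rrf) * psm_effectiveness \
          + pfd_if_psm_ineffective * (1.0 - psm_effectiveness)
    return 1.0 / pfd

# SIL 3 hardware (RRF 5,000) behind a 99%-effective PSM program:
print(round(effective_rrf(5000, 0.99)))  # 98
```

Note how the result is dominated by the 1% chance of an ineffective program, not by the hardware numbers, which is exactly the paper's point.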

  • Taking credit for unplanned shutdowns as a Proof Test

    By Keith Brumbaugh (CFSE, PE) and aeSolutions Technical Team - Updated April 2026 - This blog post will examine the concept of taking proof test credit for an unplanned shutdown in order to delay a Safety Instrumented Function (SIF) proof testing deadline. If scheduled outages go according to plan, this is unnecessary; however, when an outage gets postponed, credit for the unplanned trip may be needed to confirm the SIF still achieves its target risk reduction. SIFs are required to be proof tested at specific intervals (expressed in months or years) in order to justify the calculated probability of failure on demand. Proof tests are performed to detect dangerous covert failures, which can render the SIF inoperable when it is most needed during a hazardous event. These proof tests are given a specified amount of coverage, expressed as a percentage of the dangerous failures detected vs. total failures (detected and undetected). A proof test is typically undertaken during scheduled plant outages (for example, a turnaround). Unfortunately, the timing of an outage often shifts due to external circumstances. If the calculated SIF proof test interval is equal to the outage timing, then delaying an outage could result in a SIF that no longer meets its calculated probability of failure on demand. If the delay is long enough, the SIF could fall below its performance target. This could result in the plant operating with an unmitigated risk gap. The concept of taking credit for an unplanned shutdown boils down to the fact that during an unplanned shutdown, all devices will typically trip and move to their safe state. This would apply to almost any SIF’s final elements (typically a valve or a pump). Using valves as an example, many SIF valves are fail-closed: if the air is vented from the actuator, or if the power is removed, the valve should close.
If a final element is able to transition from the operating state to the safe state, and the transition can be proven, this is proof of the final element’s ability to function on demand. This actuation can be assigned appropriate coverage credit, and the credit can be applied to satisfy part of the SIF proof testing requirements, allowing for a delay in the full proof test.   What devices can we take credit for? When determining what devices to credit in a trip, we need to examine what sensors, logic solvers, and final elements were involved. The first question we want to answer is what caused the trip: the SIF sensor or something else? For the logic solver, we need to determine how the trip was commanded. For the final element, we need to figure out what moved (or stopped moving). Typically SIF sensors will not be demanded during an unplanned shutdown. These devices are monitoring for a process upset. Unless the source of the unplanned shutdown was due to a process excursion involving the actual SIF, then the SIF sensor will be reading normal during the trip. Consequently, there would be no proof of the successful function of the sensor. Fortunately, this is not typically an issue as sensors are rarely the driving factor in a SIL calculation. For final elements such as valves, the valve body can almost always receive credit as long as it moved. The actuator, solenoid, and positioner will need a closer look, as well as the mechanism performing the trip of the valve. The user needs to consider what form of actuator and solenoid (or other positioner) was involved in the trip. This particularly makes a difference when a smart SIL-certified positioner is used rather than a solenoid. If the SIS logic did not demand the trip, it is possible the solenoid never moved and thus would not receive credit. 
On the other hand, when a valve uses a SIL-certified positioner, these are often driven to 0% during a shutdown by either the SIS logic solver, or even requested by the BPCS logic solver. Solenoids and positioners operate differently, so moving a positioner is not the same as breaking the circuit of a solenoid. The same concepts apply to other types of final elements. For example, for equipment driven by a motor, we need to figure out if the motor was stopped by the SIF relay or a BPCS relay.   How much credit can we take? The next important question we need to answer is how much coverage credit we can take. Crediting the equivalent of a full stroke proof test is not recommended for an unplanned shutdown. In SIL calculations for valves, varying amounts of credit are given depending on whether you are performing a full stroke test or a partial stroke test, with the amount of credit determined by the robustness of the test. For example, a full stroke proof test could provide 90% proof test coverage, particularly if a leak test is performed. A partial stroke test might give 60% credit for moving the valve a minimal amount closed and then back open within a few seconds. As can be reasoned, the partial stroke would detect only a subset of the failures that would be detected by the full stroke proof test. Because the partial stroke test only strokes the valve a portion of the total travel possible (and doesn’t fully close it), the partial stroke test would tell nothing about the integrity of the valve seat and associated leakage. The amount of credit possible due to an unplanned trip will not be the same as a full stroke proof test credit. The practitioner would need to examine what portion of failures would be detected during an unplanned trip (much like the partial stroke test). For example, the practitioner might assume the valve moved from the unsafe state to the safe state during the shutdown, but this would need to be proven. 
They might look to see if there is valve position feedback, including possibly a physical valve inspection at the time of the trip. If the practitioner does not have any indication that the valve moved, then it’s not possible to say the valve actually did. It is possible some other equipment brought the process to the safe state independent of the valve. Without feedback of the actual valve position, the practitioner will never know if the valve actually moved. For motor driven equipment, positive indication of motor stoppage should be examined. For other types of final elements, such as electrostatic precipitators, credit for an unplanned trip requires verification by other means.   Other Considerations Finally, we should confirm our devices are still operating within their design parameters (e.g., have they exceeded their manufacturer-recommended replacement interval?). Useful life is typically provided by the device vendor and has various connotations, one of which is how long a device’s failure rates are considered valid. If useful life is exceeded, the device may no longer have the same failure rate assumed in the SIL calculation. Useful life is typically longer than the proof test interval, and it becomes more relevant to this discussion as the device ages. If the useful life will be expended by the next planned test, and the credit for the unplanned shutdown will push the turnaround beyond the useful life, then the device should be replaced during the unplanned shutdown. In summary, credit can be taken for an unplanned shutdown, but there must be careful consideration of the circumstances and justification. A primary concern in this process is that over-crediting the test can lead to non-conservative results and additional risk. The practitioner must understand the mechanics of the unplanned shutdown to ensure appropriate credit is taken.
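A minimal sketch of how partial credit might enter an average-PFD model. The failure rate, coverage, and intervals below are hypothetical placeholders; a real assessment would use the site's SIL verification tool and vendor data.

```python
# Simplified average-PFD model for a single final element, where an unplanned
# trip is credited as a partial proof test with limited coverage. Failures
# detectable by the trip are cleared at each trip; the remainder persist
# until the full proof test. All numbers are illustrative assumptions.

LAMBDA_DU = 1e-6        # dangerous undetected failure rate, per hour (assumed)
HOURS_PER_YEAR = 8760

def pfd_avg(full_interval_yr: float, trip_interval_yr: float,
            trip_coverage: float) -> float:
    """Split lambda_DU into a trip-detectable fraction (tested at the trip
    interval) and a remainder (tested only at the full proof test)."""
    t_full = full_interval_yr * HOURS_PER_YEAR
    t_trip = trip_interval_yr * HOURS_PER_YEAR
    return (trip_coverage * LAMBDA_DU * t_trip / 2
            + (1 - trip_coverage) * LAMBDA_DU * t_full / 2)

# Full proof test slips from 5 to 7 years; a trip at year 5 gets 60% coverage:
no_credit = pfd_avg(full_interval_yr=7, trip_interval_yr=7, trip_coverage=0.0)
with_credit = pfd_avg(full_interval_yr=7, trip_interval_yr=5, trip_coverage=0.60)
print(f"{no_credit:.2e} vs {with_credit:.2e}")
```

The model shows the direction of the effect: the credited trip lowers the average PFD relative to simply stretching the test interval, but it never recovers the full-stroke value because the uncovered fraction still waits for the complete proof test.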

  • Integrating PHA LOPA Outputs into Effective SIS Engineering

    Updated April 2026 - We can help you pick up your PHA/LOPA, which may have been put to the side, and provide the services to set you up for the safety system design phase. Standard SIS deliverables include the review of specifications, confirming that the SIL levels are what they should be, and that the proof testing procedures are correctly documented to support your regular testing intervals.  Transcript: "aeSolutions has a full suite of offerings in the safety lifecycle, from the PHA LOPA aspect in the upstream design all the way through into the detailed engineering phase. Our group, SIS engineering (Safety Instrumented Systems), sits right in the middle between the two: you have the PHA LOPA upstream and you've got the detailed design downstream. We take the PHA LOPA outputs and massage them a little bit to get them into a more meaningful list of safety functions, for example. And then we can take that through the conceptual design phase, where it goes into SIL calculations and SRS's and cause and effects, and gets that into a package that can ultimately be handed downstream into the design phase. Everything we do here is developing standard SIS deliverables by making sure that all the specifications are correct, all the safety integrity levels are what they should be, and all the proof testing procedures are correctly documented, so that our clients can have regular testing intervals with the necessary equipment that they need to be testing. One of the things that we've seen a lot of our clients do is they'll do the PHA LOPA and then they'll take that information and essentially file it away and do very little else with it. And one of the challenges that we've seen is the SIS needs to be designed against that document, and so we can take that document, either from an internal study or from a client, and help pick it up and do the rest of the upstream engineering on it. 
We identify what your safety functions look like and how many sensors you have, and get that into a more defined safety function that can ultimately be handed off to the design team. The other aspect that we run into is that a lot of times clients will do the front end engineering all the way through, you know, for example an SRS data sheet, but then they don't do things with it. Ultimately the intention behind that document is not only to use it as an operating manual for your safety function; you also want to use it as the guiding document in the design phase, so that you ensure that everything you're doing in the design matches what you intended it to do on the front end."    PHA LOPA Process Safety

  • How Taking Credit for Planned and Unplanned Shutdowns Can Help You Achieve Your SIL Targets

    by Keith Brumbaugh, P.E., CFSE Achieving Safety Integrity Level (SIL) targets can be difficult when proof test intervals approach turnaround intervals of five years or more. However, some process units have planned shutdowns, and predictable unplanned shutdowns, multiple times a year. During these shutdowns, it may be possible to document that the safety devices functioned properly. This can be incorporated into SIL verification calculations to show that performance targets can now be met without incorporating expensive fault tolerance, online testing schemes, etc. This can result in considerable cost savings for an operating unit. The problem If a process plant is following the ANSI/ISA 84.00.01 process safety lifecycle (i.e., ISA 84) or similar, as part of the allocation of safety functions to protection layers phase, a SIL assessment (e.g., a Layers of Protection Analysis (LOPA)) would be undertaken to assign Safety Integrity Level (SIL) targets to a Safety Instrumented Function (SIF). A scenario could occur in the design and engineering phase of the ISA 84 safety lifecycle, when performing the SIL verification calculations, in which the team discovers the SIFs do not meet their performance targets. Assuming the calculation was done properly using valid data and assumptions, something would need to change in order to meet or exceed the required performance targets. This issue could occur in a greenfield plant when first designing a SIF, but is more likely to be discovered during a revalidation cycle of a brownfield plant. Click here to view the complete whitepaper
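The interval problem described above can be sketched with the standard rare-event approximation PFDavg ≈ λDU·T/2 for a simple one-out-of-one function, classified against the IEC 61511 demand-mode SIL bands. The failure rate below is a hypothetical placeholder; a real SIL verification uses the full equations and actual device data.

```python
# Sketch: how stretching the proof test interval toward a long turnaround
# can push a function out of its SIL band. LAMBDA_DU is an assumed value.

LAMBDA_DU = 2e-6          # dangerous undetected failures per hour (assumed)
HOURS_PER_YEAR = 8760

def sil_band(pfd_avg: float) -> int:
    """SIL band from the demand-mode PFDavg ranges (SIL n: 10^-(n+1) to 10^-n)."""
    for sil, upper in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd_avg < upper:
            return sil
    return 0  # worse than SIL 1

for years in (1, 2, 5):
    pfd = LAMBDA_DU * years * HOURS_PER_YEAR / 2   # PFDavg ~ lambda_DU * T / 2
    print(f"{years} yr interval: PFDavg {pfd:.1e}, SIL {sil_band(pfd)}")
```

With this assumed failure rate, the function achieves SIL 2 on a one-year test interval but only SIL 1 at two years or more, which is exactly the situation where documented shutdown credits (or added fault tolerance) become attractive.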

  • Reducing Systematic Failures - Process Safety Management PSM

    Updated April 2026 - Written by aeSolutions Technical Team - Some companies implement intermediate tasks during the analysis and design stages of an IEC/ISA 61511 lifecycle project with names such as “IPL Select” or “LOPA Reconciliation”. The result of such studies is often a “refinement” of the control and/or safety system. Examples include identifying additional final elements to avoid the hazard, eliminating the use of shared instrumentation between protection layers, addressing response time issues, and assessing control system protection layers for full independence of a function against the initiating event and other protection layers. The benefit of such studies is that it’s easier and less expensive to make necessary changes to systems while the design is still on paper. It’s very expensive, and in some cases not even possible, to make design changes after systems have been installed. In the end, it all boils down to people. It is imperative that all personnel be competent in their roles within the safety lifecycle. New people entering the industry need an opportunity to learn, and they need training, mentoring, and reviews of their work in order to prevent systematic failures from creeping in and causing accidents. To read more examples of systematic failures throughout the lifecycle, and to learn how to reduce them, read the full paper “Methodologies in Reducing Systematic Failures of Wired IPLs”. Process Safety & Risk Management Industrial Safety Instrumented Systems

  • Methodologies in Reducing Systematic Failures of Wired IPLs

    by aeSolutions Technical Team & Tab Vestal Updated April 2026 - The history of high consequence incidents in industry reveals that most accidents were the result of systematic failures, not hardware failures. However, a higher degree of focus in engineering is often on the quantifiable failures of hardware. Process safety risk gaps are often closed or reduced by several types of Independent Protection Layers (IPLs). Two common types are Safety Instrumented Functions (SIFs) and Basic Process Control System (BPCS) functions. The SIFs typically reside within a SIL-rated programmable logic controller, and their achieved quantitative performance is calculated based on random hardware failures of the SIF hardware components. Conversely, BPCS protective layers are assigned generic industry-accepted probability of failure credits. These generic probabilities are conservatively assigned and consider unquantifiable human-induced systematic failures. In either case, the likelihood of systematic failures can be reduced by recognizing design, specification, maintenance, and operations activities that are potential sources, and applying measures to prevent or reduce them. By reducing systematic failures, you reduce the risk in the industrial process and increase confidence in meeting the intended integrity requirements. This technical paper discusses the common sources of systematic failures and measures to prevent or mitigate them. Topics Included in Whitepaper: Systematic failure, random hardware failure, Independent Protection Layer, IPL, SIF, SIS, BPCS, common cause, Human Factor Analysis, SIL Verification Click here to view the complete whitepaper

  • LOPA Independent Protection Layers- Common Pitfalls in IPL Selection

    Updated April 2026 - Those who work in high hazard industries are familiar with the OSHA Process Safety Management (PSM) requirements for routine Process Hazard Analyses (PHA) for their processes. Hazard and Operability (HAZOP) and Layer of Protection Analysis (LOPA) are recognized methods for PHA. LOPA is widely used as a semi-quantitative method to identify, assess, and improve the most effective safeguards for higher consequence scenarios identified in a qualitative HAZOP study. One of the important products of a LOPA is a list of Independent Protection Layers (IPL). When correctly identified, IPLs are devices, systems, and actions that are capable of preventing a hazard scenario from proceeding to the undesired consequence. In layman’s terms, they are the “best” and most effective of the safeguards that were identified in the HAZOP for specific scenarios and initiating events. The core attributes for safeguards to qualify as IPLs are well-known and include criteria such as:
  • Independent of the initiating event and of other protection layers
  • Specific to the hazard
  • Functional, dependable, and reliable (including routine testing)
  • Auditable
  • Secure
  • Subject to management of change
There are many reputable sources for training for the HAZOP and LOPA methods. Many organizations also have good internal guidance on this subject. But what happens when inadequate guidance, training, or discipline for the correct use of LOPA and identification of IPLs is present? You might be surprised at how often safeguards not meeting the core attributes are specified as IPLs in industry. It’s easy to find advice detailing the complexities of proper IPL selection and management, but without a facilitator well-versed in the basics of IPL selection, LOPA teams can get off on the wrong foot. The Challenges Many companies and LOPA practitioners employ excellent practices to identify and validate IPLs during LOPA. 
However, it is surprisingly common for significant IPL selection errors to be encountered during externally facilitated revalidation PHAs, audits, and other types of process safety reviews. IPL concerns of the following types can occur in LOPA studies if initial selection or follow-up IPL validation is not as it should be:
  • Use of two or more relief devices, all taken with two or more IPL credits. Relief devices are often a highly effective safeguard. However, they are subject to concerns that should at times limit the credit taken, including use in services where "pluggage" or other common cause failures are credible, engineering assumptions on sizing that are not as the PHA team assumed, poor-quality or no routine inspections, and other issues.
  • Use of instrumentation whose failsafe failure modes are opposite of that assumed by the PHA team, which may result in an unrecognized IPL failure to the dangerous mode.
  • Selection of one facet of an IPL, such as a BPCS alarm, without recognition that other facets are also needed for a complete IPL, such as alarm prioritization and management, training in the specific alarm response, an operating procedure, and proper field instrument functional testing.
  • Selection of a BPCS alarm and operator response as an IPL, without confirming that sufficient time is present before hazard development to evaluate and respond effectively to the alarm.
  • Selection of IPLs with insufficient independence from the initiating cause of a hazardous scenario, or insufficient independence from another IPL for the same scenario. A classic example is selection of an instrument to alarm or interlock on a process condition that could be initiated by a failure of that same instrument.
  • Crediting design pressure and temperature ratings; both are equipment attributes that should normally be taken into account in identifying the scenario consequences, not credited as an IPL. 
Building Confidence Improperly selected and validated IPLs can result in high hazard scenarios that have far less risk reduction in place than you think you have. Implementing a systematic process to properly vet your IPL candidates for the core attributes is strongly recommended. Engaging experienced PHA/LOPA facilitators and having the right team during the meeting is the first step in proper IPL selection. Further validation of IPLs to confirm they meet the defined criteria can be time consuming but also goes a long way toward increasing your confidence in your most important safeguards for higher consequence scenarios in highly hazardous chemical processes.
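The risk arithmetic at stake can be sketched as follows. All frequencies, PFDs, and the target criterion below are hypothetical example values; the point is only that a wrongly granted second credit changes the mitigated scenario frequency by two orders of magnitude.

```python
# Illustrative LOPA arithmetic: the mitigated frequency is the initiating
# event frequency multiplied by the PFD of each credited IPL, so every
# improperly granted credit moves the result by the credited factor.

def mitigated_frequency(initiating_freq: float, ipl_pfds: list[float]) -> float:
    """Scenario frequency after applying independent protection layers."""
    f = initiating_freq
    for pfd in ipl_pfds:
        f *= pfd
    return f

TARGET = 1e-4  # tolerable frequency per year (assumed risk criterion)

# One properly validated relief device at PFD 0.01:
f_one = mitigated_frequency(0.1, [0.01])        # 1e-3/yr: gap remains
# Improperly crediting a second, non-independent relief device:
f_two = mitigated_frequency(0.1, [0.01, 0.01])  # 1e-5/yr: gap "closed" on paper
print(f_one > TARGET, f_two < TARGET)
```

If the two relief devices share a common plugging mechanism, the second PFD factor is not independent and the paper result understates the true frequency, which is precisely the kind of gap a systematic IPL vetting process is meant to catch.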

bottom of page