by John Cusimano
A refinery attempted to upgrade its almost 10-year-old process control network (PCN) switches in one unit during a planned maintenance window. The new switches were updated models made by the same manufacturer as the legacy switches. Workers moved the configuration files over, double- and triple-checked everything, installed the new switches, and brought the unit back up and running. Unfortunately, six other process units then experienced outages and loss of view/loss of control across the refinery. Downtime totaled seven hours.
The workers attempted to troubleshoot the issue but ultimately decided to roll back to the previous state and reinstall the legacy switches. While this restored the network and operations, it meant the facility couldn’t upgrade the legacy switches. Until the root cause was determined, they had to accept the operational risk that if any of those switches failed, they wouldn’t be able to replace them without causing the same outages.
After the unsuccessful upgrade attempt, PCN network experts from aeSolutions were brought in to conduct further analysis and determine the root cause. A detailed plan was developed to remediate the findings. The update was conducted during a maintenance window. The entire cutover to the new switches lasted less than five minutes. Plant operations observed no impact to the control systems while the remediation was being conducted. The refinery PCN is now fully operational and performing better than it has in years.
The story of this refinery is not uncommon. Most networks in industrial applications (e.g., process control, SCADA and manufacturing) have been expanded, extended and transformed over the last few decades to support the growing needs of operations. During that evolution, the networks accumulate misconfigurations and vulnerabilities that can eventually lead to a costly unplanned downtime incident. Furthermore, because the work is often done piecemeal over a long period of time, it is highly unlikely that anyone has performed a holistic assessment of the network, looking for vulnerabilities, misconfigurations and other issues that could affect network performance, reliability and security.
It doesn’t have to be this way. Having an expert come in to ferret out the issues and apply industry best practices can help avoid an incident and improve network performance.