Securing SCADA field systems — Part 2
Part 1 of this article introduced the standards efforts relating to best practices for securing SCADA systems, along with some key practices in protecting remote control systems. In Part 2, we discuss physical protection of remote RTU systems and highlight best practices for dealing with system failure.
As discussed in Part 1, remote sites have numerous characteristics that differ significantly from those of the enterprise zone or control centre zones, so we are focusing on the control zones.
Addressing RTU physical threats — prevention
Following are measures to physically secure the RTU installations in your SCADA system:
- The best practice for RTU location is to place it in a physically secure area. Risk is significantly decreased if the RTU is installed in a location with access control.
- Keep information about RTU locations secured. Risk is also significantly decreased if as few people as possible know the location of the RTU in the first place.
- Power and network cabling should be kept secure and out of sight. Information on their routing and termination locations should be secured.
In case of a main power failure, the RTU should include adequate battery backup to continue all operations for a time you determine. This time depends on how long you feel it could take to restore mains power. Note that this does not mean how long it could take for operators to find out about the problem. The alarm system must inform operators of a main power failure immediately — we will cover that more in the next section on monitoring and detection. Typical RTU backup times are between eight and 72 hours — the latter taking three-day holiday weekends into consideration.
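As a rough illustration of turning a chosen backup time into a battery size, the sketch below estimates the required capacity from an average load current. The function name, the 0.5 A load and the 0.8 usable-capacity fraction are assumptions made for this example only; a real design should use the battery supplier's data.

```python
def required_capacity_ah(load_amps: float, backup_hours: float,
                         usable_fraction: float = 0.8) -> float:
    """Estimate battery capacity (Ah) needed for a target backup time.

    usable_fraction is an assumed allowance for depth of discharge and ageing.
    """
    if load_amps <= 0 or backup_hours <= 0:
        raise ValueError("load and backup time must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return load_amps * backup_hours / usable_fraction


# Hypothetical RTU drawing 0.5 A on average, for the 8 h and 72 h cases.
for hours in (8, 72):
    print(f"{hours} h backup: {required_capacity_ah(0.5, hours):.1f} Ah")
```

The calculation scales linearly with backup time, so designing for a three-day weekend rather than eight hours multiplies the required capacity by nine.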
The backup batteries should be secured inside a locked cabinet with ventilation. For outdoor locations, the most appropriate rating is NEMA 3R or IP14. You must periodically maintain the batteries on a schedule provided by the battery supplier. You can expect a maximum of a five-year lifetime from lead acid cell batteries but you should check them at least once per year. In areas in which temperatures are often at the extremes of the operating range, battery lifetime is significantly reduced. The RTU should continually monitor the batteries and set an alarm if they lose their charge. If their condition is in doubt, replace the batteries.
Include line filters and surge suppression on the power input. Whether accidental or deliberate, and whether or not the RTU is battery-backed, power problems should not take the RTU out.
Always keep RTU cabinet doors closed and secured. Once the door is opened, it is just too easy to cause any number of problems. If the RTU is not in a physically secure area, then you must keep keypads, pushbuttons and switches secured. Users should have to open up a door that is secured by access control (which could be as simple as a key lock) in order to access these devices.
Of course, this is all easy to say but what do you do about an existing installation? In most cases, it has been feasible to secure the room or building in which the RTU is located. In cases where this has been impossible, it was better to secure the RTU inside a locked cabinet or put a gate around it. Ideally, both the room and the RTU enclosure are secured. However, you may have to settle for one or the other.
Finally, be on the alert for innovative methods of disabling the RTU. In some industries, computer equipment has been disabled through the use of fire extinguishers, other chemical spray, excessive dust or sand, flooding, sprinkler systems, radio interference and surges on wiring. Vulnerability assessments must include such scenarios, even though they would likely be far down the list in terms of risk. Best practices in terms of locating and physically securing the RTU should prevent these problems.
Addressing RTU physical threats — monitoring and detection
Following are measures to monitor and detect physical threats to the RTU installations in your SCADA system:
- The RTU should detect entry into the physically secure zone via an access control device, that is, when a door or gate is opened, and alert operators via an alarm.
- Clear or reset the alarm when the door closes. If the user forgets to close the door, the original alarm, set upon opening of the door, should continue to be displayed as a live alarm. As a further provision, you can consider escalating that alarm after a certain time.
- The RTU should continually monitor main power and report an alarm on main power failure.
- The RTU must be able to report that a user has plugged a handheld device or PC into the local port — or gained access via Bluetooth or other local wireless link. This could be an alarm but some users simply log it as an event.
- Log an event when the user signs on by entering a password.
- Log an event for each value change the user makes. Operators must be aware that value changes are being made, locally.
- Log an event when the user signs off and either log an event or clear/reset the alarm when the user unplugs the handheld device or PC. If the user forgets to sign off, the RTU should automatically do this after a set time.
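The door alarm behaviour in the list above (set on opening, kept live while the door stays open, escalated after a delay, cleared on closing) can be sketched as a small state tracker. This is a minimal illustration, not any vendor's implementation; the class name, the state strings and the 15-minute default delay are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoorAlarm:
    """Tracks a door-open alarm; names and timeout are illustrative only."""
    escalate_after_s: float = 900.0      # assumed 15-minute escalation delay
    opened_at: Optional[float] = None    # time the door was first seen open

    def update(self, door_open: bool, now: float) -> str:
        """Return 'clear', 'alarm' or 'escalated' for the current door state."""
        if not door_open:
            self.opened_at = None        # door closed: clear/reset the alarm
            return "clear"
        if self.opened_at is None:
            self.opened_at = now         # alarm is set when the door opens
        if now - self.opened_at >= self.escalate_after_s:
            return "escalated"           # door has been left open too long
        return "alarm"                   # alarm stays live while door is open


alarm = DoorAlarm(escalate_after_s=600.0)
print(alarm.update(True, now=0.0))     # door opens
print(alarm.update(True, now=300.0))   # still open, below the delay
print(alarm.update(True, now=600.0))   # still open, delay reached
print(alarm.update(False, now=650.0))  # door closed
```

The same pattern could cover the local port: treat "device plugged in" as the open condition and use the timeout to trigger the automatic sign-off.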
Coordinate the alarms with operating procedures. These procedures should include schedules for site visits and ways to keep operators informed regarding them. Don’t disable alarms just because operators know that a site visit is taking place. Keeping alarming active reinforces procedures and allows the alarms to be kept in a history.
The RTU should not only report alarms over the SCADA network on a priority basis, it should also keep a date- and time-stamped record of all alarms and events locally in memory. The memory must be non-volatile: RAM must be backed up by a battery, whereas flash memory, which does not require battery backup, is now being used more often.
Many of today’s RTU products incorporate data logging capability, including maintenance of an alarm/event log. In the gas flow computer business, this is known as the ‘audit trail’.
One problem with an alarm/event log is a ‘noisy’ alarm condition whose recurring messages fill it up. Not only is this very annoying but, worse, meaningful messages drop out and are permanently lost. In most cases, it is simple to automatically filter out these transitions or disable the alarming characteristic of the misbehaving input.
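One simple way to keep a noisy input from flooding the log is a hold-off window per point, sketched below. This is an assumption-laden illustration: the `EventLog` class, its capacity and the 60-second hold-off are invented for the example, and a real RTU would typically offer its own filtering or deadband settings.

```python
from collections import deque


class EventLog:
    """Fixed-size alarm/event log with a simple chatter filter (illustrative)."""

    def __init__(self, capacity: int = 1000, holdoff_s: float = 60.0):
        self.entries = deque(maxlen=capacity)  # oldest entries drop when full
        self.holdoff_s = holdoff_s             # assumed repeat-suppression window
        self._last_logged = {}                 # point name -> time of last entry
        self.suppressed = {}                   # point name -> suppressed count

    def record(self, timestamp: float, point: str, message: str) -> bool:
        """Log the event unless the same point logged within the hold-off."""
        last = self._last_logged.get(point)
        if last is not None and timestamp - last < self.holdoff_s:
            self.suppressed[point] = self.suppressed.get(point, 0) + 1
            return False
        self._last_logged[point] = timestamp
        self.entries.append((timestamp, point, message))
        return True


log = EventLog(capacity=3, holdoff_s=60.0)
for t in (0.0, 5.0, 10.0, 15.0):
    log.record(t, "LT-101", "level high")   # hypothetical chattering input
log.record(20.0, "DOOR", "door open")
print(len(log.entries), log.suppressed)
```

Counting suppressed repeats, rather than silently discarding them, keeps evidence of the misbehaving input without letting it push meaningful messages out of the log.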
The alarm/event log is an excellent backup in case of problems with the SCADA host or network, which could cause alarm reports and event logs to be lost. Typically, it allows the user to access all such information, locally. In addition, many RTUs will allow the audit trail, as well as historical averages and totals, to be transmitted to the SCADA host once communication is restored.
You have seen that many of the security tactics in this section involve use of the RTU for alarm reporting. Please be aware that a common problem with SCADA alarm systems is that engineers are tempted to define too many points as alarms. These quickly become ‘nuisance’ alarms, which are ignored. You should avoid this situation because the alarm system should never lose credibility with operators for any reason. It can also create a situation in which an operator can be easily overloaded with alarms and overlook an important development. It is even possible that a security violation can occur because operators are decoyed by a deliberate overload.
Your alarm system design should define alarm points as sparingly as possible and it should use alarm management as a further measure to reduce the quantity of alarms generated from any process or zone.
Finally, for remote site security, using the RTU to report alarms for fire, smoke, water spray or water flooding is also very feasible. The RTU can also be put in the security loop through interfaces with access control devices and video cameras.
Design practices in case of failures
Best practice system design calls for provisions in case of various failures (or breaches) of the SCADA system.
In case the host computer or network fails, the RTU should independently monitor and control the process. Remote processes, today, should not depend on the availability or performance of the network.
The RTU should continue operating even if the network is jammed or one or more ports are kept busy. While this would amount to a denial-of-service attack on the RTU, we have seen many cases in which the SCADA network was simply overloaded. The multitasking kernels in today’s RTUs prioritise tasks and allow the measurement and control functions to continue even with heavy activity on the network.
You should also consider a redundant network. Competition in the communications industry has resulted in decreasing pricing for hardware that includes cellular radio, licensed radio, spread spectrum radio and wireless ethernet. I know some users will scoff at this because they’ve found that selecting even one network is difficult enough! But increasingly, users are installing redundant SCADA networks. Most SCADA software will automatically switch over to a standby network if the primary network fails. At the RTU, the standby network uses a separate communication port that is not affected by problems on the primary network port.
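To make the switchover idea concrete, here is a minimal sketch of host-side link selection that changes to the other network after a run of failed polls. It is not how any particular SCADA package works; the class name, the link labels and the threshold of three failures are assumptions for illustration.

```python
class LinkSelector:
    """Chooses between two links based on consecutive poll failures (sketch)."""

    def __init__(self, fail_threshold: int = 3):
        self.fail_threshold = fail_threshold  # assumed failures before switching
        self.active = "primary"
        self._failures = 0

    def report_poll(self, ok: bool) -> str:
        """Record one poll result and return the link to use for the next poll."""
        if ok:
            self._failures = 0
            return self.active
        self._failures += 1
        if self._failures >= self.fail_threshold:
            # Switch to the other link and start counting afresh.
            self.active = "standby" if self.active == "primary" else "primary"
            self._failures = 0
        return self.active


selector = LinkSelector(fail_threshold=3)
for result in (True, False, False, False, True):
    print(result, "->", selector.report_poll(result))
```

Requiring several consecutive failures avoids flapping between networks on a single missed poll.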
To detect tampering with process equipment, you can use sanity limits or sanity condition tables to validate commands or process conditions. Even though no RTU includes expert system software, you can still put your expertise in the RTU program, whatever the programming language. If you know that all three influent pumps shouldn’t be on when the settling basin is at four metres, put that in the RTU. Maybe the RTU should know that the chlorinator shouldn’t be set on maximum when the flow is only 1.5 ML/day.
Your first reaction might be that this would add too much complexity to the RTU but some languages make the programming almost as easy as making the statement. If access control is violated and someone manually changes a process equipment setting, the RTU could detect it and report an alarm.
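The two examples above can be written as a small sanity condition table. The sketch below is in Python purely for illustration, since RTU languages vary; the tag names are hypothetical, and reading "at four metres" as "at or above 4 m" and "only 1.5 ML/day" as "at or below 1.5 ML/day" is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Process = Dict[str, float]

# Each rule pairs a description with a predicate that is True when the
# condition is violated. Tag names and thresholds are illustrative assumptions.
SANITY_RULES: List[Tuple[str, Callable[[Process], bool]]] = [
    ("All three influent pumps on with settling basin at or above 4 m",
     lambda p: p["pumps_running"] >= 3 and p["basin_level_m"] >= 4.0),
    ("Chlorinator at maximum with flow at or below 1.5 ML/day",
     lambda p: p["chlorinator_pct"] >= 100.0 and p["flow_ml_per_day"] <= 1.5),
]


def check_sanity(process: Process) -> List[str]:
    """Return the description of every violated sanity condition."""
    return [desc for desc, violated in SANITY_RULES if violated(process)]


snapshot = {
    "pumps_running": 3,
    "basin_level_m": 4.1,
    "chlorinator_pct": 60.0,
    "flow_ml_per_day": 1.2,
}
for problem in check_sanity(snapshot):
    print("SANITY ALARM:", problem)
```

Keeping the rules in a table means a new piece of process knowledge is one added line, and each violation carries a description the operator can act on.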
Finally, best practices for system design call for provisions in case of RTU failure, regardless of security issues. Upon failure, what happens to the control outputs, with or without power, is a basic design issue. If power remains available, many devices allow selection of a ‘safe mode’ for the outputs. Process equipment continues to run in a reasonable manner. You also need a separate provision to cover the case in which the RTU fails and all power is lost. Equipment that runs using backup power must have a ‘safe’ default setting.
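A ‘safe mode’ selection of this kind is often just a per-output table: hold the last value or drive a predefined one. The sketch below shows the idea only; the output names, the chosen values and the de-energise default are hypothetical, and the loss-of-all-power case still has to be handled by the equipment itself.

```python
from typing import Dict, Union

HOLD_LAST = "hold_last"

# Hypothetical per-output safe-mode configuration.
SAFE_MODE: Dict[str, Union[str, float]] = {
    "influent_pump_1": 0.0,        # stop the pump
    "chlorinator_pct": HOLD_LAST,  # keep the last commanded value
    "inlet_valve_pct": 25.0,       # move to a fixed conservative position
}


def apply_safe_mode(last_outputs: Dict[str, float]) -> Dict[str, float]:
    """Return the values to drive once the controller is declared failed."""
    safe = {}
    for name, last_value in last_outputs.items():
        setting = SAFE_MODE.get(name, 0.0)   # assumed default: de-energise
        safe[name] = last_value if setting == HOLD_LAST else float(setting)
    return safe


print(apply_safe_mode({
    "influent_pump_1": 1.0,
    "chlorinator_pct": 40.0,
    "inlet_valve_pct": 80.0,
}))
```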
Many users have rock-solid procedures for activity at the sites in response to any failure or security breach in the SCADA system. You need to be in this category.
Conclusion
Today, information that is widely available, and products and technologies that are now on the market, allow SCADA system operators to install and maintain very secure systems.
Utilities need to be well aware of NERC CIP, which requires compliance in your planning, processes and procedures. Meanwhile, ANSI/ISA-99 is a work-in-progress. Part I, which is now available, establishes important ‘common ground’ in definitions of security-related concepts, assets, risks, threats and vulnerabilities.
Users, today, can assess threats, both physical and cyber related, and implement measures for detection as well as prevention of intrusions and attacks in their SCADA systems.
*Kevin L Finnan is vice-president, marketing for CSE-Semaphore.
CSE-Semaphore
www.cse-semaphore.com