Understanding industrial network redundancy

Control Logic Pty Ltd

By Adam Rickards, Application Engineer, Control Logic
Friday, 16 July, 2021

I’m often asked what the ‘best redundancy protocol’ is for industrial networks, usually by someone who wants an easy answer so that they can move forward confident of achieving the best outcome.

However, I think the question itself prompts an interesting line of thought about how we should choose the redundancy protocols for our industrial networks. Perhaps a better question would be: “How do we design the network to meet the requirements of the application(s) inside it?”

In the simplest terms, the way to go about this is to understand the application’s communication requirements. A typical PLC communicates cyclically on a fixed timer and allows a set number of faults (or retries) before it stops the process to protect the integrity of the process or the safety of individuals. Knowing this, we can choose a redundancy protocol that recovers the network before the PLC stops the process, so that the process keeps running through a small network issue.
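As a rough illustration of that reasoning, the time the network has to recover can be estimated from the PLC’s cycle time and the number of missed cycles it tolerates. The sketch below assumes the device rides through the allowed number of consecutive missed cycles and stops the process on the next one. The function name and the example figures (a 10 ms cycle and 3 retries) are hypothetical and not taken from any particular PLC.

```python
def recovery_budget_ms(cycle_time_ms: float, allowed_missed_cycles: int) -> float:
    """Longest network outage (ms) the application can ride through.

    Assumption: the device tolerates `allowed_missed_cycles` consecutive
    missed updates and stops the process on the next one.
    """
    if cycle_time_ms <= 0 or allowed_missed_cycles < 0:
        raise ValueError("cycle time must be positive and retries non-negative")
    return cycle_time_ms * allowed_missed_cycles


# Hypothetical example: 10 ms cycle, 3 missed cycles tolerated.
budget = recovery_budget_ms(10, 3)
print(f"Network must recover within {budget} ms")
```

Real devices may count retries differently or add their own watchdog margins, so the vendor’s documentation should be the source for the actual figures.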

Each standardised redundancy protocol has its own benefits over the others, so let’s briefly touch on the common standardised protocols.

RSTP (Rapid Spanning Tree Protocol) offers a typical recovery time of 0–20 ms per node or switch and allows any network architecture you want. That is an advantage in some unique situations (such as underground mining, where power for some areas comes from another area), because additional redundant paths can cover multiple points of failure. The disadvantage is that recovery can sometimes take much longer if the root bridge (master switch) fails. This makes the recovery time hard to predict, so you cannot guarantee that the process will not be hindered.

MRP (Media Redundancy Protocol) is a ring-based protocol that tolerates a single point of failure, with a guaranteed worst-case recovery time of 500 ms or 200 ms (configurable). Because the architecture must be a ring it may not suit every case, but its predictability makes it very easy to use in an industrial network.

HSR (High-availability Seamless Redundancy) is also a ring-based protocol that tolerates a single point of failure, with a guaranteed 0 ms recovery time. Note that specialised hardware is required to support HSR, which can add cost to the application.

PRP (Parallel Redundancy Protocol) uses a dual bus architecture that tolerates a single point of failure, also with a guaranteed 0 ms recovery time. Some hardware must be duplicated and specialised units that support PRP are required, so the additional cost can be quite large.

DLR (Device Level Ring) is another ring-based protocol that tolerates a single point of failure, with a guaranteed worst-case recovery time of as little as 3 ms. The architecture is a loop or ring, so it may not suit every case, and the hardware costs more than typical MRP hardware.
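The figures quoted above can be collected into a simple lookup to shortlist protocols against a recovery budget. This is only a sketch: the dictionary and function names are hypothetical, the values are the ones given in this article rather than vendor or standards data, MRP is entered at its faster 200 ms setting, and the 30 ms budget is an illustrative assumption. RSTP is marked as having no guaranteed worst case, as described above.

```python
from typing import Optional

# Worst-case recovery times (ms) as quoted in this article.
# None means the article gives no guaranteed worst case.
WORST_CASE_RECOVERY_MS: dict[str, Optional[float]] = {
    "RSTP": None,   # typical 0-20 ms per switch, but not guaranteed
    "MRP": 200.0,   # configurable: 500 ms or 200 ms
    "HSR": 0.0,
    "PRP": 0.0,
    "DLR": 3.0,     # "as little as 3 ms"
}


def candidate_protocols(budget_ms: float) -> list[str]:
    """Protocols whose guaranteed worst-case recovery is no longer than the budget."""
    return [
        name
        for name, worst_case in WORST_CASE_RECOVERY_MS.items()
        if worst_case is not None and worst_case <= budget_ms
    ]


# Hypothetical budget of 30 ms.
print(candidate_protocols(30))
```

A shortlist like this covers recovery time only. Architecture constraints and hardware cost, as noted for each protocol, still have to be weighed separately.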

With the application’s requirements reviewed and some knowledge of the standardised options available, it’s just a matter of ensuring that whichever protocol is selected for redundancy is compatible and meets those requirements.

Control Logic Application Engineer Adam Rickards is a passionate technology and communications professional with over 15 years’ experience in industrial networking, spanning design, implementation and the investigation of complex faults in existing networks.

Image: ©stock.adobe.com/au/navintar