High availability is an important consideration during network design and deployment, and almost all network vendors support a range of high-availability features.
The objective of this article is to describe the Junos best practices required to keep downtime to a minimum during fail-over.
The Routing Engine (RE), or control plane, is the brain of a Junos-based device and runs all of its management functions. Most Junos-based devices offer redundant Routing Engines, either by default or through explicit configuration (for example, in a Virtual Chassis). Only one Routing Engine can be active at a time (Active-Active MC-LAG is an exception and is beyond the scope of this blog). The mere presence of a second Routing Engine adds no high-availability benefit until certain features are configured.
- Graceful Routing Engine Switchover (GRES). GRES synchronizes the kernel and chassis daemon state between the master and backup Routing Engines. If the master Routing Engine fails, the Packet Forwarding Engine (PFE) simply reconnects to the new master Routing Engine (the one that was the backup before the fail-over).
GRES is configured with the following command:
set chassis redundancy graceful-switchover
The effect of this configuration can be checked on the backup RE:
show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
Peer state: Steady State
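The readiness check above can also be scripted. The following is a minimal Python sketch, assuming the output of show system switchover has already been captured as text (for example over SSH) and contains the four fields shown above. The names parse_switchover and gres_ready are illustrative and not part of any Junos tooling, and the expected values are simply copied from the sample output.

```python
# Sample text copied from the "show system switchover" output above.
SAMPLE = """\
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
Peer state: Steady State
"""

# Assumption: these four values indicate a backup RE that is ready to take over.
EXPECTED = {
    "Graceful switchover": "On",
    "Configuration database": "Ready",
    "Kernel database": "Ready",
    "Peer state": "Steady State",
}


def parse_switchover(text):
    """Turn 'Key: Value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def gres_ready(text):
    """Return True only if every expected field has its expected value."""
    fields = parse_switchover(text)
    return all(fields.get(key) == value for key, value in EXPECTED.items())


if __name__ == "__main__":
    print("GRES ready:", gres_ready(SAMPLE))
```

A check like this is best run against the backup RE shortly before planned maintenance, since a switchover triggered while the databases are still synchronizing may not be graceful.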
If GRES is not enabled and the primary Routing Engine fails, the kernel daemon restarts on the new master Routing Engine, followed by the chassis daemon, and traffic is dropped continuously throughout this process.
- Nonstop Active Routing (NSR). As described above, GRES synchronizes only the kernel and chassis daemon state between the primary and backup Routing Engines; it does not synchronize the routing daemon (rpd). After a fail-over, the new primary Routing Engine starts rpd from scratch and all peerings have to be re-established. Nonstop Active Routing avoids this scenario.
If we compare fig 1 for GRES with the figure below for NSR, we can see that rpd is now also present on the backup Routing Engine.
Use the following commands to enable NSR:
set system commit synchronize
set routing-options nonstop-routing
The result can be verified with:
run show task replication
Stateful Replication: Enabled
RE mode: Master
Protocol Synchronization Status
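The same kind of check can be applied to NSR. This is a rough Python sketch, assuming the text of show task replication has been captured and that its header lines use the "Key: Value" layout shown above. The per-protocol rows that follow "Protocol Synchronization Status" are not parsed here, and the function names are hypothetical.

```python
# Sample text copied from the "show task replication" output above.
SAMPLE = """\
Stateful Replication: Enabled
RE mode: Master
Protocol Synchronization Status
"""


def replication_summary(text):
    """Collect 'Key: Value' header lines; lines without a value are skipped."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            summary[key.strip()] = value.strip()
    return summary


def nsr_active(text):
    """True if stateful replication is reported as enabled."""
    return replication_summary(text).get("Stateful Replication") == "Enabled"


if __name__ == "__main__":
    print(replication_summary(SAMPLE))
    print("NSR active:", nsr_active(SAMPLE))
```

The sketch only confirms that replication is switched on; whether each routing protocol has finished synchronizing still has to be read from the protocol rows in the real output.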
- Nonstop Bridging (NSB). NSB is very similar to NSR, but works for Layer 2 protocols such as xSTP, LLDP, LLDP-MED and LACP. With NSB enabled, l2cpd, the daemon that controls Layer 2 protocols in Junos, also runs on the backup RE.
set protocols layer2-control nonstop-bridging
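To confirm that none of the statements covered in this article has been left out, a device's configuration can be audited offline. The sketch below is one possible approach in Python, assuming the configuration has been exported in set format (for example with show configuration | display set) and saved as text. The REQUIRED table lists only the statements named in this article, and missing_statements is a hypothetical helper.

```python
# Statements taken from this article, grouped by the feature they enable.
REQUIRED = {
    "GRES": ["set chassis redundancy graceful-switchover"],
    "NSR": [
        "set system commit synchronize",
        "set routing-options nonstop-routing",
    ],
    "NSB": ["set protocols layer2-control nonstop-bridging"],
}


def missing_statements(config_text):
    """Return {feature: [missing statements]} for a set-format configuration."""
    # Normalize whitespace so extra spaces do not cause false negatives.
    present = {" ".join(line.split()) for line in config_text.splitlines()}
    report = {}
    for feature, statements in REQUIRED.items():
        missing = [s for s in statements if s not in present]
        if missing:
            report[feature] = missing
    return report


if __name__ == "__main__":
    example = (
        "set chassis redundancy graceful-switchover\n"
        "set routing-options nonstop-routing\n"
    )
    for feature, statements in missing_statements(example).items():
        print(feature, "is missing:", "; ".join(statements))
```

An empty result means all four statements are present. It does not prove that the backup RE is synchronized, which is what the show commands above are for.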