High availability is an important consideration during the network design and deployment stages, and almost all network vendors support various high availability features.

The objective of this article is to describe the Junos best practices required to achieve minimum downtime in fail-over scenarios.

The Routing Engine, or control plane, is the brain of a Junos-based device and runs all the management functions. Most Junos-based devices offer redundant Routing Engines (either by default or through explicit configuration, as in a Virtual Chassis). Only one Routing Engine can be active at a time (with the exception of Active-Active MC-LAG, which is beyond the scope of this blog). The mere presence of a second Routing Engine in a Junos device adds no high availability benefit unless certain features are configured.

  •  Graceful Routing Engine Switchover (GRES). GRES synchronizes the kernel and chassis daemon state between the master and backup Routing Engines. If the master Routing Engine fails, the Packet Forwarding Engine (PFE) simply reconnects to the new master Routing Engine (which was the backup before the fail-over).

Preparing for a Graceful Routing Engine Switchover

 

Graceful Routing Engine Switchover Process

GRES can be enabled with the following configuration command:

set chassis redundancy graceful-switchover

The effect of this configuration can be verified on the backup RE:

{backup:1}

show system switchover

fpc1:————————————————

Graceful switchover: On

Configuration database: Ready

Kernel database: Ready

Peer state: Steady State

If GRES is not enabled and the primary Routing Engine fails, the kernel daemon restarts on the new master Routing Engine, followed by the chassis daemon, and traffic is dropped continuously during this whole process.

  • Nonstop Active Routing (NSR). As described above, GRES only synchronizes the kernel and chassis daemon state between the primary and backup Routing Engines; it does not synchronize the routing daemon (rpd). When the primary Routing Engine fails, the new primary starts the routing daemon from scratch and all peerings have to be re-established. Nonstop Active Routing helps to avoid this scenario.

 

If we compare fig 1 for GRES with the figure below for NSR, we can see that rpd is now also running on the backup Routing Engine.

Nonstop Active Routing Switchover Preparation Process

Nonstop Active Routing During a Switchover

 

Use the following commands to enable and verify NSR:

set system commit synchronize
set routing-options nonstop-routing
The results can be verified with:

run show task replication

Stateful Replication: Enabled

RE mode: Master

Protocol Synchronization Status

OSPF Complete

  • Nonstop Bridging (NSB). NSB is very similar to NSR, but works for Layer 2 protocols such as xSTP, LLDP, LLDP-MED, and LACP. With NSB enabled, l2cpd, the daemon that controls Layer 2 protocols in Junos, is also started on the backup RE.

set protocols layer2-control nonstop-bridging
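
Putting the pieces together, the commands from this article can be applied in a single commit. This is a minimal sketch, assuming a device with two Routing Engines that supports all three features. NSR and NSB both depend on GRES, so the GRES statement has to be part of the configuration as well. The NSB hierarchy may differ on some platforms and releases, so treat the last line as an assumption to check against the documentation for your device.

```
set chassis redundancy graceful-switchover
set system commit synchronize
set routing-options nonstop-routing
set protocols layer2-control nonstop-bridging
```

After the commit, the show system switchover and show task replication commands shown earlier can be used to confirm that the backup Routing Engine is synchronized before relying on a fail-over.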