EVPN Based Data Center Interconnect – Juniper Design Options and Config Guide

1       Data Center Interconnect (DCI)

DCI was always a challenge in the days of VPLS and other vendor-specific Layer 2 extension technologies. The main challenge was how and where to integrate Layer 2 and Layer 3. For example, VPLS does offer Layer 2 extension between two DCs, but the question remained where to configure the Layer 3 gateways and how a Virtual Machine (VM) could keep its ARP entry for the gateway if the VM moves from one DC to the other.

EVPN answers these questions: we can create a MAC-VRF along with an Integrated Routing and Bridging (IRB) interface for a VLAN, and that IRB interface can also be referenced under a standard L3 VRF if L3 extension is required between the DCs. Thus EVPN allows us to combine L2 and L3 at the L3 VTEP layer. Furthermore, we can configure the same “virtual-gateway” for a VLAN on all L3 VTEPs, which allows a VM to keep its ARP entry for the gateway when it moves from one DC to another.
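As a minimal sketch (interface names, the instance name, and the addresses are illustrative and not taken from the lab topology), an IRB carrying a virtual gateway can be referenced in an L3 VRF as follows:

interfaces {
    irb {
        unit 10 {
            family inet {
                address 10.10.1.2/24 {
                    virtual-gateway-address 10.10.1.254;   # same VIP configured on every L3 VTEP
                }
            }
        }
    }
}

routing-instances {
    TENANT-L3 {                          # hypothetical L3 VRF name
        instance-type vrf;
        interface irb.10;                # IRB of the MAC-VRF VLAN referenced under the L3 VRF
        route-distinguisher 172.172.1.1:100;
        vrf-target target:100:100;
    }
}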

 

1.1       Option 1 

If DCI Option 1 is selected for Layer 2 extension between the DCs, a “Collapsed IP CLOS” is recommended in each Data Center. One leaf node from each DC can be selected as the DC gateway node, and its loopback IP can be advertised to the other DC through the existing L3 VPN. Once the loopback IP address of the remote DC gateway leaf node is reachable on the local DC gateway leaf node, over-the-top EVPN-VxLAN can be configured in the usual manner.
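A minimal sketch of the over-the-top overlay session between the two DC gateway leaf nodes could look like the following (the group name and loopback addresses are illustrative, and a single overlay AS shared by both DCs is assumed; with different overlay AS numbers an EBGP multihop session would be used instead):

protocols {
    bgp {
        group OTT-DCI {                      # hypothetical group name
            type internal;                   # assumes both DCs share the same overlay AS
            local-address 172.172.1.5;       # local DC gateway leaf loopback (illustrative)
            family evpn {
                signaling;
            }
            neighbor 172.172.101.5;          # remote DC gateway leaf loopback (illustrative)
        }
    }
}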

If, on the other hand, only Layer 3 connectivity needs to be extended from a DC to another site, the Layer 3 gateway subnets configured on the leaf nodes must be advertised toward the core layer over the overlay BGP session. Once the Layer 3 gateway subnets are available at the core layer, they can be advertised toward the PE router using any dynamic routing protocol. On the PE router, these routes can then be advertised to any remote site through the L3 VPN.
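For the Layer 3-only case, the core-to-PE advertisement could be sketched as below, assuming the core has already learned the tenant gateway subnets over the overlay (the policy name, subnet, peer address, and AS number are illustrative):

policy-options {
    policy-statement EXPORT-DC-SUBNETS {              # hypothetical policy name
        term tenant-subnets {
            from {
                route-filter 10.10.1.0/24 orlonger;   # illustrative tenant gateway subnet
            }
            then accept;
        }
        term reject-rest {
            then reject;
        }
    }
}

protocols {
    bgp {
        group to-PE {
            export EXPORT-DC-SUBNETS;
            peer-as 65100;                            # hypothetical PE AS
            neighbor 192.168.100.1;                   # hypothetical PE-facing link address
        }
    }
}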

 DCI-1

 

1.2       Option 2

The control plane and data plane sequence of events is as follows:

  • VxLAN gateways are configured at the core layer, so VxLAN-encapsulated packets arrive at the core layer.
  • The core layer de-encapsulates the VxLAN packets and forwards VLAN-tagged packets to the PE router.
  • On the PE router, the VLAN-tagged interfaces (connected to the Data Center core layer) are configured under a MAC-VRF.
  • The PE routers participate in an MP-iBGP session with EVPN signaling and MPLS-based forwarding.
  • ARP entries arriving from the core layer on the PE router for each VLAN are shared with the remote PE through the EVPN control plane.
  • Once the remote PE receives the EVPN Type 2 routes, it converts them back into MAC entries and shares them with the attached core layer as VLAN-tagged packets.
  • The remote core layer shares the received Ethernet frames with the leaf layer through the IP-CLOS (EVPN-VxLAN signaling and forwarding plane).
  • Servers connected in both Data Centers can then communicate over the MPLS-based data plane running in the service provider network, as their ARP entries have already been exchanged through the EVPN control plane.

 

DCI-2

IP-CLOS configuration has been covered in the IP-CLOS section above; only the PE router and core router uplink configuration is explained here. The snippet below shows the Core-1 uplinks and MAC-VRF configuration; the Core-2 uplink and MAC-VRF are configured in the same way. Each core router has one uplink connected to the PE router. In order to present both core devices as a single device to the PE router, the aggregate interfaces on the core routers are configured with the same LACP system ID, and on the PE router a single aggregate link is configured for the interfaces connected to the core layer.

interfaces {

xe-0/0/4 {

gigether-options {

802.3ad ae0;

}

}


ae0 {

description Connected-with-PE;

flexible-vlan-tagging;

encapsulation flexible-ethernet-services;

aggregated-ether-options {

lacp {

active;

system-id 00:55:00:44:55:00;

}

}

unit 10 {

family bridge {

interface-mode trunk;

vlan-id-list 10;

}

}

unit 20 {

family bridge {

interface-mode trunk;

vlan-id-list 20;

}

}

}

}

routing-instances {

tenat-1 {

vtep-source-interface lo0.0;

instance-type virtual-switch;

interface ae0.10;

interface ae0.20;

route-distinguisher 172.172.1.5:1;

vrf-import EVPN-IMPORT;

vrf-target target:1:10;

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

vni-options {

vni 1000 {

vrf-target target:10:1000;

}

vni 2000 {

vrf-target target:10:2000;

}

}

multicast-mode ingress-replication;

}

}

bridge-domains {

BD-10 {

vlan-id 10;

routing-interface irb.10;

vxlan {

vni 1000;

}

}

BD-20 {

vlan-id 20;

routing-interface irb.20;

vxlan {

vni 2000;

}

}

}

}

}

The configuration snippet below is taken from PE1 (left-side Data Center); the PE on the other side is configured in the same way.

interfaces {

xe-0/0/0 {

gigether-options {

802.3ad ae0;

}

}

xe-0/0/1 {

gigether-options {

802.3ad ae0;

}

}

ae0 {

flexible-vlan-tagging;

encapsulation flexible-ethernet-services;

aggregated-ether-options {

lacp {

active;

}

}

unit 10 {

family bridge {

interface-mode trunk;

vlan-id-list 10;

}

}

unit 20 {

family bridge {

interface-mode trunk;

vlan-id-list 20;

}

}

}

A MAC-VRF needs to be configured on the PE routers; it is slightly different from the MAC-VRF configured in the 5-Stage or 3-Stage IP-CLOS sections. The PE routers use MPLS-based forwarding, so no VxLAN encapsulation configuration is required here.

routing-instances {

EVPN-MPLS {

instance-type virtual-switch;

interface ae0.10;

interface ae0.20;

route-distinguisher 173.173.173.1:1;

vrf-target {

import target:10:1000;

export target:10:1000;

}

protocols {

evpn {

extended-vlan-list [ 10 20 ];

}

}

bridge-domains {

BD-10 {

vlan-id 10;

}

BD-20 {

vlan-id 20;

}

}

}

}
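The PE-to-PE control plane itself (MP-iBGP with EVPN signaling over an MPLS data plane) is assumed to already be in place; a minimal sketch, with the remote PE loopback and the label protocol chosen purely for illustration, could be:

protocols {
    bgp {
        group PE-EVPN {                      # hypothetical group name
            type internal;
            local-address 173.173.173.1;     # PE1 loopback (matches the RD above)
            family evpn {
                signaling;
            }
            neighbor 173.173.173.2;          # remote PE loopback (illustrative)
        }
    }
    mpls {
        interface all;
    }
    ldp {
        interface all;                       # LDP chosen for illustration; RSVP or SR would also work
    }
}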

1.3       Option 3

The control plane and data plane sequence of events is as follows:

DCI-3

  • VxLAN gateways are configured at the core layer, so VxLAN-encapsulated packets arrive at the core layer.
  • The core layer de-encapsulates the VxLAN packets and forwards VLAN-tagged packets to the PE router.
  • On the PE router, the VLAN-tagged interfaces (connected to the Data Center core layer) are configured under a MAC-VRF.
  • The PE routers participate in an MP-iBGP session with EVPN signaling.
  • ARP entries arriving from the core layer on the PE router for each VLAN are shared with the remote PE through the EVPN control plane.
  • Once the remote PE receives the EVPN Type 2 routes, it converts them back into MAC entries and shares them with the attached core layer as VLAN-tagged packets.
  • The remote core layer shares the received Ethernet frames with the leaf layer through the IP-CLOS (EVPN-VxLAN signaling and forwarding plane).
  • Servers connected in both Data Centers can then communicate over the VxLAN-based data plane running in the service provider network, as their ARP entries have already been exchanged through the EVPN control plane.
  • Since the forwarding plane in the service provider core network is based on VxLAN, the PE routers are configured with EVPN signaling and VxLAN forwarding.

It is possible to simulate the core and PE router in the same MX device by creating two separate MAC-VRFs; Ethernet packets from the core MAC-VRF can be extended to the PE MAC-VRF by using a logical tunnel (lt) interface. In this scenario, the Virtual Network Identifier (VNI) used for each VxLAN must be different in the two MAC-VRFs, as the same VNI cannot be used in more than one MAC-VRF. A minimal sketch of such a logical tunnel pair is shown below.
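In the sketch, the FPC/PIC location, unit numbers, and VLAN ID are illustrative; one unit would be referenced in the core MAC-VRF and its peer unit in the PE MAC-VRF:

chassis {
    fpc 0 {
        pic 0 {
            tunnel-services {
                bandwidth 10g;               # creates the lt- interface on this PIC
            }
        }
    }
}

interfaces {
    lt-0/0/0 {
        unit 10 {
            encapsulation vlan-bridge;
            vlan-id 10;
            peer-unit 11;                    # back-to-back with unit 11
        }
        unit 11 {
            encapsulation vlan-bridge;
            vlan-id 10;
            peer-unit 10;
        }
    }
}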

The 5-Stage IP-CLOS configuration is not covered here, as it is covered in detail in the IP-CLOS section, and the extension of Ethernet frames from the core layer to the PE router has already been covered in the “DCI Option 2” section above. Here we cover only the MAC-VRF configuration of the PE router, to show how the VxLAN-based forwarding plane is configured in the service provider network. There is no significant difference between the MAC-VRF configuration on the PE router and on the core layer, except for the VNI values (for the reason explained in the paragraph above).

routing-instances {

tenaet1 {

vtep-source-interface lo0.0;

instance-type virtual-switch;

interface ae0.10;            # aggregate link toward the core router carrying VLAN-tagged Ethernet frames

interface ae0.20;

route-distinguisher 173.173.173.1:1;

vrf-import EVPN-VXLAN;

vrf-target target:1:100;

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 10000 20000 ];

vni-options {

vni 10000 {

vrf-target target:100:10000;

}

vni 20000 {

vrf-target target:100:20000;

}

}

multicast-mode ingress-replication;

}

}

bridge-domains {

BD-10 {

vlan-id 10;

vxlan {

vni 10000;

ingress-node-replication;

}

}

BD-20 {

vlan-id 20;

vxlan {

vni 20000;

ingress-node-replication;

}

}

}

}

}

The aggregate interface (carrying VLAN-tagged Ethernet packets) toward the core layer is given below.

interfaces {

ae0 {

flexible-vlan-tagging;

encapsulation flexible-ethernet-services;

aggregated-ether-options {

lacp {

active;

}

}

unit 10 {

encapsulation vlan-bridge;

family bridge {

interface-mode trunk;

vlan-id-list 10;

}

}

unit 20 {

encapsulation vlan-bridge;

family bridge {

interface-mode trunk;

vlan-id-list 20;

}

}

}

}

 

The MAC-VRF and uplink configuration for the core layer is given below.

routing-instances {

tenat1 {

vtep-source-interface lo0.0;

instance-type virtual-switch;

interface ae0.10;            # uplink interface carrying VLAN-tagged packets toward the PE router

interface ae0.20;

route-distinguisher 172.172.1.3:1;

vrf-import EVPN-IMPORT;

vrf-target target:1:10;

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

vni-options {

vni 1000 {

vrf-target target:10:1000;

}

vni 2000 {

vrf-target target:10:2000;

}

}

multicast-mode ingress-replication;

}

}

bridge-domains {

BD-10 {

vlan-id 10;

routing-interface irb.10;

vxlan {

vni 1000;

ingress-node-replication;

}

}

BD-20 {

vlan-id 20;

routing-interface irb.20;

vxlan {

vni 2000;

ingress-node-replication;

}

}

}

}

}

Each core router has one uplink connected to the PE router. In order to present both core devices as a single device to the PE router, the aggregate interfaces on the core routers are configured with the same LACP system ID.

interfaces {

xe-0/0/4 {

gigether-options {

802.3ad ae0;

}

}


ae0 {

description Connected-with-PE;

flexible-vlan-tagging;

encapsulation flexible-ethernet-services;

aggregated-ether-options {

lacp {

active;

system-id 00:55:00:44:55:00;

}

}

unit 10 {

family bridge {

interface-mode trunk;

vlan-id-list 10;

}

}

unit 20 {

family bridge {

interface-mode trunk;

vlan-id-list 20;

}

}

}

}

1.4      Option 4

Option 4 is used when dark fiber is available between the two DCs; MP-iBGP with EVPN signaling and a VxLAN forwarding plane can then be configured directly between them. As there is no involvement of the service provider network, it is entirely our own choice where to terminate the dark fiber for the Data Center interconnect: either between two border leaf nodes (one from each DC) or between the PE routers.

If leaf nodes are selected as the Data Center gateways, the configuration is covered in the “DCI Option 1” section above. If PE routers are selected as the DC gateways, the “DCI Option 3” section above covers the design considerations and configuration guidelines.

DCI-4

1.5        Conclusion

Selecting the appropriate DCI model needs deliberate consideration. The real challenge is the handling of EVPN routing entries in the RIB/FIB of the inter-DC gateway nodes: an EVPN Type 2 route (MAC+IP) for each host inside a DC is shared with the other DC, and each EVPN Type 2 route appears in the routing table with a /304 prefix length. The volume of EVPN Type 2 routes that needs to be shared between two web-scale Data Centers for Layer 2 DCI is enormous and can seriously degrade the performance of the inter-DC gateway nodes. EVPN Type 5 routes provide a solution to this challenge: with Type 5 routes we do not use the same subnet for a VxLAN in each DC, so Layer 2 extension is not required between the DCs. In this case, only IP subnets need to be shared between the DCs using the EVPN control plane, and there is no need to share EVPN Type 2 routes between the DCs. EVPN Type 5 route implementation is not covered in this document.

Juniper IP-CLOS (EVPN-VxLAN) Data Center – Design Options and Config Guide

1        Overview

IP-CLOS provides a scalable option for large-scale Data Centers for hosting providers or the Infrastructure as a Service (IaaS) model. The IP-CLOS model consists of spine and leaf layer switches, where the leaf layer switches provide direct connectivity to Bare Metal Servers (BMS), hypervisor-based servers, or other network devices (e.g. firewalls, load balancers) for the services layer. Each leaf device is connected to all spine devices through high-speed links; the connectivity between spine and leaf is IP based, thus offering ECMP (equal-cost multipath) load balancing over the IP links.

The question arises why we need an IP-CLOS based Data Center. The main and primary reason is to remove the upper limit on the maximum number of VLANs. In a switching-based Data Center, whether traditional 3-tier (Core, Distribution & Access) or modern (spine-and-leaf switching fabric or flat switching fabric, e.g. Juniper Virtual Chassis Fabric and Juniper QFabric), we still have an upper limit of 4096 VLANs inside a single Data Center. In an IP-CLOS based Data Center, VLAN values are not significant: once traffic is received at the leaf layer from servers or external network devices, it is encapsulated into VxLAN packets and identified by a Virtual Network Identifier (VNI). The VxLAN header uses 24 bits to represent the VNI (2^24 = 16,777,216), so practically we can use about 16 million VxLANs inside a Data Center.

The second reason to choose an IP-CLOS based Data Center is the use of MP-BGP with an EVPN control plane for the exchange and learning of routes (Ethernet reachability converted into EVPN routes), instead of flooding and learning MAC addresses in the forwarding plane, which seriously impacts network performance when the number of servers grows significantly. Broadcast, Unknown Unicast and Multicast (BUM) traffic always has an impact on network device performance, but in IP-CLOS BUM traffic is handled in the control plane.

The third reason to choose an IP-CLOS based Data Center is, again, the use of MP-BGP as the control plane. As we know, BGP is a stable and very scalable protocol, so an IP-CLOS based Data Center is not limited to physical boundaries but can spread across multiple geographical locations and still be treated as a single Data Center.

The fourth reason to choose an IP-CLOS based Data Center is the use of Ethernet VPN (EVPN) in the control plane. EVPN offers the option of server active-active multi-homing to multiple access switches (without a direct connection between the access switches), and it also offers a combination of Layer 2 and Layer 3 traffic (which is very beneficial for Data Center Interconnect), whereas traditional Data Center Interconnect techniques (e.g. VPLS) have limitations on combining L2 and L3 operation.

Detailed functionality and architecture of EVPN and VxLAN are not in the scope of this document; rather, this document discusses design options for IP-CLOS based Data Centers and provides working configurations for each option under consideration. All configurations have been tested extensively using Juniper vMX and vQFX virtual appliances.

 

2        Solution Components

2.1     Underlay Network

Links between the leaf and spine layers are configured with IP addresses, with the specific purpose of providing a transport network for the overlay networks. As a best practice, a supernet should be selected for the IP addressing of the spine-leaf links, with a contiguous /31 subnet on each link. EBGP is the best dynamic routing protocol for an IP fabric due to its stability and support for equal-cost multipath (ECMP) load balancing. Each leaf node has one EBGP neighborship with each spine node.
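As an illustration (the interface names are assumptions; the addressing follows the /31 scheme and the neighbor addresses used in the EBGP examples later in this document), the Leaf-1 underlay interfaces could look like:

interfaces {
    xe-0/0/0 {
        description to-Spine-1;
        unit 0 {
            family inet {
                address 192.168.0.1/31;      # Spine-1 side is 192.168.0.0/31
            }
        }
    }
    xe-0/0/1 {
        description to-Spine-2;
        unit 0 {
            family inet {
                address 192.168.0.3/31;      # Spine-2 side is 192.168.0.2/31
            }
        }
    }
    lo0 {
        unit 0 {
            family inet {
                address 172.172.1.1/32;      # leaf loopback, later used as the VTEP source
            }
        }
    }
}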

Leaf nodes need to reach each other's loopback IP addresses to establish VxLAN tunnels (explained in later sections). Suppose Leaf-1 (Figure 2.1) advertises its loopback address to both spine devices through its EBGP sessions; the spine devices will further advertise the Leaf-1 loopback address to Leaf-2. Thus Leaf-2 will have two copies of the Leaf-1 loopback address. By configuring the multipath knob on the leaf nodes, along with a load-balancing policy exported to the forwarding table, we can ensure traffic load sharing over the multiple leaf-spine links whenever one leaf node reaches another leaf node's loopback address.

The next important question is the EBGP AS number design. There are two basic options:

  • Using a different AS number for each device at every layer is the simplest option in terms of configuration, but it is difficult to manage if the leaf and spine devices are large in number.
  • Using the same AS number for all devices at the same layer reduces the complexity of managing a large pool of AS numbers, but adds some complexity to the configuration.

2.2     Overlay Network

The overlay network provides the functionality needed for server-to-server communication, whether the servers are connected to the same leaf node or to different leaf nodes. The overlay network uses the underlay network as its transport infrastructure.

2.3     EVPN

Ethernet VPN (EVPN) is a new industry standard that offers extension of Layer 2 networks over Layer 3 transport networks. The EVPN address family can be enabled within the MP-iBGP configuration stanza, and the remaining details are handled by Junos. For understanding purposes, the different types of EVPN routes are briefly explained here; a detailed discussion of EVPN functionality is out of scope for this document. In simple words, once Ethernet packets arrive on the access ports of the leaf layer, they are converted into EVPN routes and shared with the other leaf nodes through the MP-BGP control plane. The types and functions of EVPN routes are briefly discussed below.

  • Type 1 routes identify a LAN segment by using a 10-octet Ethernet Segment Identifier (ESI). A LAN segment is defined as a set of two or more ports from different leaf nodes connected to the same end device/server.
  • Type 2 routes transport MAC/IP addresses over the control plane.
  • Type 3 routes handle multicast/BUM traffic by using an inclusive multicast service interface.
  • Type 4 routes are used to elect a “Designated Forwarder” for a LAN segment. Suppose one server is connected to three leaf devices and the links from all leaf nodes extend the same VLANs to the server; the question is which leaf node will handle BUM traffic arriving for a VLAN. Type 4 routes come to the rescue, and each LAN segment elects one leaf node as the Designated Forwarder per VNI.
  • Type 5 routes carry inter-subnet traffic, where the same VNI but different IP subnets are used (typically for inter-DC traffic).

2.4     VxLAN

Access ports connected to server ports are usually configured as access or trunk ports, allowing the VLANs required by the specific server. On each leaf node, VLAN traffic is converted to VxLAN traffic. This conversion is simply the encapsulation of Layer 2 traffic into a Layer 3 UDP header, where each VLAN is mapped to a specific Virtual Network Identifier (VNI) that identifies which VxLAN the packet belongs to. The UDP-encapsulated (VxLAN) traffic carries the Ethernet packets from the source VTEP to the destination VTEP. Once the VxLAN packets reach the destination, they are de-encapsulated back into Ethernet and handled by the usual Ethernet packet processing. How do VxLAN and EVPN relate? If we recall the service provider arena, customer routes (L2/L3 VPN) are shared between PEs through the MP-BGP control plane, and in the forwarding plane a unified MPLS-based forwarding plane handles all transit traffic between the PEs. In IP-CLOS, EVPN handles the control plane functionality and VxLAN handles the forwarding plane functionality.

2.5     VTEP

Devices that perform the conversion of VLAN to VxLAN traffic, or vice versa, are known as Virtual Tunnel End Points (VTEPs). A device is known as a Layer 2 VTEP if it only performs VLAN-to-VxLAN conversion, and as a Layer 3 VTEP if it also allows inter-VxLAN communication, which involves Layer 3 gateways. All leaf devices are necessarily configured as L2 VTEPs, while L3 VTEP placement is a matter of choice (discussed in detail in a later section). Leaf nodes (whether L2 or L3 VTEPs) can be compared to PE routers if we relate the IP-CLOS based network to a service provider network, and the servers/network devices connected to the leaf nodes can be termed CE devices.

2.6     BGP Extended Communities

Since MP-BGP is used in the overlay network to transport EVPN routes through the control plane, the BGP route-target extended community is used to export and import EVPN routes. Different variants of the BGP route-target extended community are discussed in the configuration section.

3        Solution Options

3.1     3-Stages IP CLOS

3Stages-IP-CLOS-MX

                                                                              Figure (2.1)

Bare metal servers, hypervisor-based servers, or network devices are connected to the leaf layer. The Layer 3 gateway for each VLAN/VNI is configured at the spine layer; thus servers (on different VLANs) connected either to the same or to different leaf nodes are three hops away from each other.

3.2     5-Stages IP CLOS  

 5Stages-IP-CLOS

                                                                                   

A fabric layer is added above the spine layer to provide inter-POD or inter-Data Center connectivity. The Layer 3 gateways for each VLAN/VNI are configured at the fabric layer; thus five hops are involved in inter-server communication (between different VLANs/VNIs).

3.3     Collapsed IP CLOS

C-IP-Fabric

Physical connectivity for the collapsed IP-CLOS is similar to the 3-Stage or 5-Stage IP-CLOS; the major difference is the placement of the Layer 3 gateways for each VLAN/VNI, which are on the leaf layer.

  • In all IP-CLOS solutions, intra-VLAN/VNI communication between two servers connected to the same leaf node is handled on that leaf node.
  • In the 3-Stage or 5-Stage designs, inter-VLAN/inter-VNI communication happens at the spine or fabric layer, respectively.
  • In the collapsed IP-Fabric solution, inter-VLAN/inter-VNI communication between two servers connected to the same leaf node happens on that same leaf node.

 

4        Underlay Configuration

As explained above, the sole purpose of the underlay network is to provide IP connectivity between the spine and leaf nodes and to redistribute the leaf/spine loopback IP addresses to each other. This loopback reachability is used to form the overlay networks. EBGP is best suited to build the underlay network; the IP-CLOS type (3-Stage, 5-Stage, or Collapsed IP-Fabric) does not have a significant impact on the EBGP design, but the AS numbering has a significant impact on the underlay configuration.

4.1     Different AS Number for Each Device

The first option is to use a different AS number for each device at every layer. This is the simplest option in terms of configuration, but it is a bit difficult to maintain the pool of AS numbers, especially if the number of devices is large. The configuration is the same for the 3-Stage, 5-Stage, or Collapsed IP Fabric if each device at each layer uses a different AS number.

3Stages-DIFFERENT-AS

4.1.1         Spine Configuration

protocols {

bgp {

group underlay {

export underlay;

local-as 65002;

multipath multiple-as;                                                       

neighbor 192.168.0.1 {

peer-as 65000;

description to-Leaf1;

}

neighbor 192.168.0.5 {

peer-as 65001;

description to-Leaf2;

}

}

}

}

The local-as knob allows a BGP group to use an autonomous system number different from the global one, and multipath multiple-as allows EBGP to select multiple routes (received from different ASes) for an NLRI and install them in the routing table.

 

4.1.2         Leaf Configuration

The configuration snippet below represents the Leaf-1 underlay configuration; the same configuration is used by all remaining leaf devices, with, of course, the local-as number changed accordingly on each leaf node.

protocols {

bgp {

group underlay {

export underlay;

local-as 65000;

multipath multiple-as;                        

neighbor 192.168.0.0 {

peer-as 65002;

description to-Spine1;

}

neighbor 192.168.0.2 {

peer-as 65003;

description to-Spine2;

 

}

}

}

}

4.2     Same AS Number at Same Layer

5Stages-IP-CLOS

  • Points to be considered at the leaf/core layer: since the leaf and core layers use the same AS number, routes received on a leaf from other leaf nodes through the spines will not be installed as active routes, because the leaf node finds its own AS number in the AS path of the Network Layer Reachability Information (NLRI). To allow this, we need to add the local-as “loops” configuration knob.
  • Points to be considered at the spine layer: a spine device receives an NLRI from a leaf/core node and will not re-advertise that NLRI to the other leaf/core nodes, because of the BGP rule that an NLRI is not advertised to an EBGP peer whose AS number already appears in the AS path. To overcome this, we need to configure the advertise-peer-as knob under the EBGP configuration at the spine layer.

4.2.1         Spine Configuration

protocols {

bgp {

group to-Leaf {

advertise-peer-as;

export underlay;

local-as 65001;

multipath;

neighbor 192.168.0.1 {

peer-as 65000;

description to-Leaf1;

}

neighbor 192.168.0.5 {

peer-as 65000;

description to-Leaf2;

}

}

 

group to-Core {

advertise-peer-as;

export underlay;

local-as 65001;

multipath;

neighbor 192.168.0.9 {

peer-as 65002;

description to-Core-1;

}

neighbor 192.168.0.11 {

peer-as 65002;

description to-Core-2;

}

}

}

}

4.2.2         Leaf Configuration

protocols {

bgp {

group underlay {

export underlay;

local-as 65000 loops 2;

multipath;

neighbor 192.168.0.0 {

description to-Spine-1;

peer-as 65001;

}

neighbor 192.168.0.2 {

description to-Spine-2;

peer-as 65001;

}

}

}

}

 

4.2.3         Core Layer Configuration

protocols {

bgp {

group underlay {

export underlay;

local-as 65002 loops 2;

multipath;

neighbor 192.168.0.8 {

description to-Spine1;

peer-as 65001;

}

neighbor 192.168.0.12 {

description to-Spine2;

peer-as 65001;

}

}

}

}

5    Common Configuration for Underlay

5.1     Loopback Re-Distribution into EBGP

policy-options {

policy-statement underlay {

term 1 {

from {

protocol direct;

route-filter 0.0.0.0/0 prefix-length-range /32-/32;

}

then accept;

}

}

}

5.2     Load balancing Policy

 

Configuring a load-balancing policy and applying it as an export policy to the forwarding table enables the forwarding table to install all active next hops for an NLRI and to load-balance egress traffic.

policy-options {

policy-statement lb {

then {

load-balance per-packet;

}

}

}

routing-options {

forwarding-table {

export lb;

}

}

6        Overlay Configuration

6.1     iBGP and Route Reflector Design

As per iBGP best design practice, the two spine devices are configured as route reflectors and the leaf nodes as route reflector clients. EVPN signaling is mandatory to enable the transport of EVPN routes through the control plane.

6.1.1         Spine MP-iBGP

 

routing-options {

autonomous-system 10;

}

protocols {

bgp {

group overlay {

type internal;

local-address 172.172.100.1;

family evpn {

signaling;

}

multipath;

cluster 0.0.0.1;

neighbor 172.172.1.1;

neighbor 172.172.2.1;

}

}

}

The globally configured AS value is used for the overlay MP-iBGP.

6.1.2         Leaf MP-iBGP

routing-options {

autonomous-system 10;

}

protocols {

bgp {

group overlay {

type internal;

family evpn {

signaling;

}

multipath;

neighbor 172.172.100.1;

neighbor 172.172.200.1;

}

}

}

7        L2-VTEP Configuration

7.1     VLAN to VxLAN Conversion

vlans {

vlan-10 {

vlan-id 10;

vxlan {

vni 1000;

ingress-node-replication;

}

}

vlan-20 {

vlan-id 20;

vxlan {

vni 2000;

ingress-node-replication;

}

}

}

The VNI must be unique within the VxLAN domain; ingress-node-replication defines how BUM traffic is handled.

7.2     Server Access Port

xe-0/0/2 {

esi {

00:11:22:33:44:55:aa:bb:cc:dd;

all-active;

}

unit 0 {

family ethernet-switching {

interface-mode trunk;

vlan {

members 10;

}

}

}

}

 

xe-0/0/3 {

unit 0 {

family ethernet-switching {

interface-mode trunk;

vlan {

members 20;

}

}

}

}

The ESI value defines an Ethernet segment and enables an end device/server to be active-active multi-homed to multiple leaf nodes. Configuring an ESI value causes implicit configuration of esi-export/esi-import policies for the advertisement and acceptance of EVPN Type 4 routes, using the ESI value as a BGP extended community.

7.3    EVPN Protocols and Virtual Switch

Under the protocols evpn configuration hierarchy, the VxLAN encapsulation, the VNI values for each VxLAN, and the VNI-specific route-target communities are defined. Route-target communities are discussed in detail in a later section.

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

multicast-mode ingress-replication;

vni-options {

vni 1000 {

vrf-target export target:10:1000;

}

vni 2000 {

vrf-target export target:10:2000;

}

}

}

}

Under the switch-options configuration hierarchy, the VTEP source interface is defined, which is always lo0.0. Besides the VTEP source interface, the route-distinguisher is also defined, which uniquely identifies the EVPN routes. The vrf-import and vrf-target statements are discussed in detail in a later section. In the QFX 5110 and 10K series we do not have the option to define multiple virtual switches, but in Juniper MX Series routers we can define multiple virtual switches (which helps maintain multi-tenancy).

switch-options {

vtep-source-interface lo0.0;

route-distinguisher 172.172.2.1:10;

vrf-import evpn-import;

vrf-target target:10:1;

}

8        Route Target Community

When EVPN routes are advertised through the control plane, a BGP route-target extended community is attached to the routes of each VNI. A receiving VTEP matches its own vrf-target community against all EVPN routes received from remote peers and accepts and installs into the bgp.evpn.0 routing table only those routes whose route-target extended community matches its own.

8.1     Single Route-Target Policy for All VNIs

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [1000 2000];

multicast-mode ingress-replication;

}

}

switch-options {

vtep-source-interface lo0.0;

route-distinguisher 172.172.1.1:10;

vrf-target target:10:1;

}

The vrf-target statement adds implicit export and import policies, which attach the BGP extended community to all outgoing EVPN routes (except Type 4) and import into the bgp.evpn.0 routing table all incoming routes that match the community value. This method has serious implications for scalability: even if a leaf node is not interested in the routes of a specific VNI, it will still receive all EVPN routes for wanted and unwanted VNIs because of the single vrf-target statement.

8.2     Per VNI Route Target Policy

The vrf-target statement is defined per VNI, which causes a unique BGP extended community to be advertised for each VNI.

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

multicast-mode ingress-replication;

vni-options {

vni 1000 {

vrf-target export target:10:1000;

}

vni 2000 {

vrf-target export target:10:2000;

}

}

}

}

The configuration snippet above only adds the BGP extended community to Type 2 and Type 3 routes; for Type 1 routes we still need the vrf-target statement under the switch-options configuration hierarchy. A vrf-import policy also needs to be configured explicitly, accepting all required VNI vrf-target values. The use of the import policy allows us to control manually which VNI routes are imported into a specific leaf node.

switch-options {

vtep-source-interface lo0.0;

route-distinguisher 172.172.2.1:10;

vrf-import evpn-import;

vrf-target target:10:1;

}

policy-options {

policy-statement evpn-import {

term 1 {

from community vni-1000;

then accept;

}

term 2 {

from community vni-2000;

then accept;

}

 

term 3 {

from community type-1;

then accept;

}

}

community type-1 members target:10:1;

community vni-1000 members target:10:1000;

community vni-2000 members target:10:2000;

}

8.3     Auto VRF-Target Policy Generation

There is another point of consideration: if thousands of VNIs need to be configured, then configuring per-VNI vrf-target export/import policies becomes a cumbersome task. This can be avoided by generating the vrf-target automatically. Type 1 routes still need explicit export and import vrf-target policies.

 

protocols {

evpn {

encapsulation vxlan;

extended-vni-list all;

multicast-mode ingress-replication;

}

}

policy-options {

policy-statement evpn-import {

term 1 {

from community type-1;

then accept;

}

}

community type-1 members target:10:1;

}

switch-options {

vtep-source-interface lo0.0;

route-distinguisher 172.172.1.1:10;

vrf-import evpn-import;

vrf-target {

target:10:1;

auto;

}

}

9        L3- VTEP Configuration

Inter-VxLAN communication requires an L3 gateway for each VxLAN and is hardware dependent. The following Juniper product lines support inter-VxLAN communication:

  • Juniper QFX 5110 switches equipped with Broadcom Trident II Plus chipset
  • Juniper QFX 10K series switches equipped with Juniper Q5 Chipset
  • EX 9200 Series switches equipped with Juniper One chipset
  • MX Series Router equipped with Juniper Trio Chipset

9.1     Collapsed IP CLOS 

C-IP-Fabric

QFX 10K and QFX 5110 series switches are ideal for the leaf layer and support inter-VxLAN communication. One important consideration for the L3 gateway configuration is to keep the ARP entry for the gateway valid inside a virtual machine even if the VM moves its location. Two methods are available to achieve this goal:

  • Configure the same IP address and MAC address for a given IRB (integrated routing and bridging) interface on each leaf.
  • Configure the “virtual-gateway” statement under the IRB interface hierarchy and define the same IP address on each leaf. The virtual-gateway statement makes all devices use the same MAC address, and the leaf devices configured with the virtual gateway synchronize with each other (gateway MAC and IP address) through the EVPN control plane.

Both methods have their pros and cons. The “virtual-gateway” statement definitely gives ease of configuration, as we do not have to configure a MAC address manually on each leaf device. However, there is a limit on how many leaf devices can be configured with the virtual-gateway statement (a maximum of 64), and it involves additional overhead in the EVPN control plane for synchronization of the virtual gateway MAC and IP addresses.

vlans {

vlan-10 {

vlan-id 10;

l3-interface irb.10;

vxlan {

vni 1000;

}

}

vlan-20 {

vlan-id 20;

l3-interface irb.20;

vxlan {

vni 2000;

}

}

}

interfaces {

irb {

unit 10 {

family inet {

address 10.10.1.1/24 {

                    virtual-gateway-address 10.10.1.254;

}

}

}

unit 20 {

family inet {

address 20.20.1.1/24 {

virtual-gateway-address 20.20.1.254;

}

}

}

}

}

The second option is to use the same MAC address for a given IRB interface on all leaf devices.

interfaces {

irb {

unit 10 {

        family inet {

address 10.10.1.1/24;

}

mac 00:00:44:00:55:10;

}

unit 20 {

family inet {

address 20.20.1.1/24;

}

mac 00:00:44:00:55:20;

}

}

}

With the static MAC configuration option, MAC/IP address synchronization among leaf devices through the EVPN control plane is not required. In this case, we need to configure an additional knob under the protocols evpn configuration hierarchy: “default-gateway do-not-advertise”.
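A minimal sketch of that knob:

protocols {
    evpn {
        default-gateway do-not-advertise;    # gateway MAC/IP sync over EVPN is not needed with static IRB MACs
    }
}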

9.2     3-Stage IP-CLOS

3Stages-IP-CLOS-MX

In the 3-Stage IP-CLOS, the L3 gateways are configured on the spine devices; QFX 10K series switches or MX series routers are ideal boxes for the spine layer. As discussed earlier, we have the option either to configure the same static MAC entry on the IRB interfaces of each spine device or to use the virtual-gateway-address statement.

The configuration snippet below is taken from an MX series router (version 16.1R3); in MX series routers we can create multiple virtual switches, in contrast to QFX series switches where we cannot.

routing-instances {

tenat1-sw {

vtep-source-interface lo0.0;

instance-type virtual-switch;

route-distinguisher 172.172.100.1:1;

vrf-import evpn-import;

vrf-target target:1:10;

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

multicast-mode ingress-replication;

}

}

bridge-domains {

BD-10 {

vlan-id 10;

routing-interface irb.10;

vxlan {

vni 1000;

ingress-node-replication;

}

}

BD-20 {

vlan-id 20;

routing-interface irb.20;

vxlan {

vni 2000;

ingress-node-replication;

}

}

}

}

}

 

policy-options {

policy-statement evpn-import {

term 1 {

from community type-1;

then accept;

}

term 2 {

from community vni-1000;

then accept;

}

term 3 {

from community vni-2000;

then accept;

}

}

community type-1 members target:1:10;

community vni-1000 members target:10:1000;

community vni-2000 members target:10:2000;

}

9.3     5-Stages IP-CLOS

5Stages-IP-CLOS

 

The overlay network (MP-BGP) for the 5-Stage IP-CLOS needs deliberate consideration:

  • The gateway for each VxLAN is configured at the core/fabric layer.
  • Leaf devices need an MP-iBGP neighborship (with family evpn signaling enabled) with the core layer.
  • Two or more spine devices can be configured as route reflectors for scalability.
  • Core/fabric devices need an MP-iBGP neighborship with the leaf devices.
  • Keeping scalability in view, it is recommended that the core/fabric devices also be configured as route reflector clients, so that a single MP-iBGP overlay network is established between the leaf, spine, and core/fabric layers.

9.3.1          Leaf Layer Configuration

The leaf node configuration is the same as described in the “L2-VTEP Configuration” section above.

9.3.2          Spine Layer Configuration

protocols {

bgp {

group overlay {

type internal;

local-address 172.172.100.1;

family evpn {

signaling;

}

cluster 0.0.0.1;

multipath;

neighbor 172.172.1.1 {

description Leaf-1;

}

neighbor 172.172.2.1 {

description Leaf-2;

}

neighbor 172.172.0.1 {

description Core-1;

}

neighbor 172.172.0.2 {

description Core-2;

}

}

}

}

9.3.3          Core Layer 

 

routing-options {

autonomous-system 10;

}

 

protocols {

bgp {

group overlay {

type internal;

local-address 172.172.0.1;

family evpn {

signaling;

}

multipath;

neighbor 172.172.100.1 {

description Spine-1;

}

neighbor 172.172.200.1 {

description Spine-2;

}

}

}

}

 

routing-instances {

tenat1-sw {

vtep-source-interface lo0.0;

instance-type virtual-switch;

route-distinguisher 172.172.100.1:1;

vrf-import evpn-import;

vrf-target target:1:10;

protocols {

evpn {

encapsulation vxlan;

extended-vni-list [ 1000 2000 ];

multicast-mode ingress-replication;

}

}

bridge-domains {

BD-10 {

vlan-id 10;

routing-interface irb.10;

vxlan {

vni 1000;

ingress-node-replication;

}

}

BD-20 {

vlan-id 20;

routing-interface irb.20;

vxlan {

vni 2000;

ingress-node-replication;

}

}

}

}

}

policy-options {

policy-statement evpn-import {

term 1 {

from community type-1;

then accept;

}

term 2 {

from community vni-1000;

then accept;

}

term 3 {

from community vni-2000;

then accept;

}

}

community type-1 members target:1:10;

community vni-1000 members target:10:1000;

community vni-2000 members target:10:2000;

}

10    Conclusion

IP-CLOS with EVPN-VxLAN definitely offers a solution for the next-generation data center; however, it needs a lot of deliberate consideration for the proper design, configuration, and operation & maintenance of an IP-CLOS based data center. The underlay and overlay BGP and the L3 VTEP configuration are the major factors that differentiate the various IP-CLOS design options.

 

Multistage MC-LAG in Data Center

1       Executive Summary

Compute virtualization and converged infrastructure have introduced tremendous changes in Data Center networks. Traditional network design (core, aggregation, and access layers), coupled with Spanning Tree Protocol for the management of Layer 2 loops, simply cannot meet the requirements of virtual machine mobility and the elephant flows required by modern applications. All major network vendors have collaborated and brought new technologies to solve modern-day Data Center challenges. Three-tier traditional networks are being replaced with flat switching fabrics or scalable IP fabrics.

2       Multi-Chassis LAG, A Solution

Multi-Chassis Link Aggregation Group (MC-LAG) is another solution, besides switching fabric and IP fabric, where access devices or servers can have active-active connectivity and traffic load sharing on links connected to two different network devices. The basic idea is to remove the effects of Spanning Tree Protocol and to offer an active-active topology with redundancy for safe link and device fail-over.

In this solution paper we discuss how to design a Data Center network for a small to medium organization with a collapsed core architecture (core and aggregation layers combined into a single layer), with active-active multi-homing between the servers and the access layer switches and active-active multi-homing between the access and core layer network devices. This completely removes spanning tree from the Data Center, while all switches have active control and forwarding planes with end-to-end device and link level redundancy.

The question arises: why do we need MC-LAG when other high-availability solutions (e.g. Juniper Virtual Chassis or Cisco VSS) already exist? Out of several reasons, a few important ones are listed below:

  • Juniper Virtual Chassis or Cisco VSS depends on a specific type of merchant chipset (usually supplied by Broadcom); the Virtual Chassis feature may not be supported, or may not be stable, on custom chipsets (e.g. Juniper One used in the EX 9200, Q5 used in the Juniper QFX 10k, and the Trio chipset used in Juniper MX routers).
  • Virtual Chassis offers only one active control plane with multiple forwarding planes, while MC-LAG offers not only an active-active forwarding plane but also an active-active control plane on both MC-LAG peers.
  • MC-LAG is a good choice when we are not deploying a greenfield Data Center and need to upgrade either the core or the access layer switches in a production data center, or need to integrate switches from multiple vendors at different layers.

 

3       Reference Topology

 

 MC-lag

Note: Multi-stage MC-LAG is highly scalable; the maximum number of leaf devices depends on the number of ports available on the spine or core nodes.

 

4       Connectivity Description

4.1       Server to Access Switches Connectivity

The server has dual links connected to two separate leaf devices (access layer switches). Both server links participate in the topology in active-active mode, although they are connected to two separate switches. In order to prevent Layer 2 loops between the server and the access switches, a Multi-Chassis LAG (MC-LAG) is configured on Leaf 1 and Leaf 2, and the server does not know that it is connected to two separate devices.

4.2       Leaf to Leaf Connectivity

Leaf 1 and Leaf 2 run the Inter-Chassis Control Protocol (ICCP) to exchange control states and for configuration synchronization checks. Moreover, the inter-chassis link spans all VLANs between the two leaf devices in order to exchange forwarding plane state. Ae0 is used to span all VLANs between Leaf 1 and Leaf 2.

4.3       Access Layer to Core Layer

Each leaf device is connected to each core device, thus forming cross connectivity between the leaf and core devices. A single MC-LAG is configured between the core devices and the leaf devices, thus providing an all-active link topology within the Data Center.

4.4       Server VLAN Gateways

The Layer 3 interfaces for all server VLANs are configured at the core layer; however, the question remains how to provide a single gateway for a VLAN, as both core switches have a separate Layer 3 interface for each VLAN. VRRP comes to our rescue here, and each VLAN is configured with a virtual IP (VIP) address. A problem still remains, since VRRP can have only one active gateway, but Juniper provides the option to configure active-active MC-LAG, where both gateway nodes can accept and process traffic.

4.5        Core Layer to DC-Edge

The core layer needs connectivity with the service provider network (DC Edge/PE device) for the exchange of data with other Data Centers or for access to the internet. MC-LAG is also configured between the DC core layer and the service provider PE router. With VRRP over IRB and active-active MC-LAG, both core nodes can form a dynamic routing relationship with the PE router and can thus exchange routing information with the service provider network (using OSPF/BGP). The service provider PE router sees both core nodes as two separate next hops and can load-balance traffic on the links connected to Core 1 and Core 2.

4.6       All Active-Active Links

All links, from the server to the leaf nodes, from the leaf nodes to the core nodes, and from the core nodes to the service provider PE router, actively participate in the topology, leaving no link unutilized. VRRP over IRB at the core layer and Juniper active-active MC-LAG enable both core nodes to process and load-balance Layer 3 traffic coming either from the service provider DC-Edge router or from the servers through the leaf nodes.

5       Configuration

5.1       Leaf -1

set system host-name Leaf-1

set chassis aggregated-devices ethernet device-count 3

set interfaces ae0 aggregated-ether-options lacp active                                 #Inter Chassis link

set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk

set interfaces ae0 unit 0 family ethernet-switching vlan members 10-12    #data vlans

set interfaces ae0 unit 0 family ethernet-switching vlan members 254       #VLAN carrying ICCP traffic

set interfaces xe-0/0/0 ether-options 802.3ad ae0

set vlans vl10 vlan-id 10

set vlans vl11 vlan-id 11

set vlans vl12 vlan-id 12

set vlans vl254 vlan-id 254

set vlans vl254 l3-interface irb.100

set interfaces irb unit 100 family inet address 10.10.1.1/30    # MC-LAG peers establish OSPF neighborship over this IRB

set interfaces lo0 unit 0 family inet address 2.2.2.1/32             #ICCP session will be established over lo0.0 IP

set protocols ospf area 0.0.0.0 interface irb.100

set protocols ospf area 0.0.0.0 interface lo0.0

set protocols iccp local-ip-addr 2.2.2.1                                         #ICCP configuration

set protocols iccp peer 2.2.2.2 session-establishment-hold-time 50

set protocols iccp peer 2.2.2.2 redundancy-group-id-list 1 #service ID will be used here

set protocols iccp peer 2.2.2.2 liveness-detection minimum-interval 500

set multi-chassis multi-chassis-protection 2.2.2.2 interface ae0

set switch-options service-id 1

set interfaces ae2 description to-Server-1

set interfaces ae2 aggregated-ether-options lacp active

set interfaces ae2 aggregated-ether-options lacp periodic fast

set interfaces ae2 aggregated-ether-options lacp system-id 00:00:00:00:00:11 #must match on MC-LAG peers

set interfaces ae2 aggregated-ether-options lacp admin-key 11                           # must match on MC-LAG peers

set interfaces ae2 aggregated-ether-options mc-ae mc-ae-id 11                           # must match on MC-LAG peers

set interfaces ae2 aggregated-ether-options mc-ae redundancy-group 1            # must match on MC-LAG peers

set interfaces ae2 aggregated-ether-options mc-ae chassis-id 0                            # must differ on MC-LAG peers

set interfaces ae2 aggregated-ether-options mc-ae mode active-active                 #always active-active

set interfaces ae2 aggregated-ether-options mc-ae status-control active           #only one MC-LAG peer is configured as status-control active

set interfaces xe-0/0/1 ether-options 802.3ad ae2

set interfaces ae1 description to-Core-Layer

set interfaces ae1 aggregated-ether-options lacp active

set interfaces ae1 aggregated-ether-options lacp periodic fast

set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:10

set interfaces ae1 aggregated-ether-options lacp admin-key 10

set interfaces ae1 aggregated-ether-options mc-ae mc-ae-id 10

set interfaces ae1 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae1 aggregated-ether-options mc-ae chassis-id 0

set interfaces ae1 aggregated-ether-options mc-ae mode active-active

set interfaces ae1 aggregated-ether-options mc-ae status-control active

set interfaces ae1 unit 0 family ethernet-switching interface-mode trunk

set interfaces ae1 unit 0 family ethernet-switching vlan members 10-12

set interfaces xe-0/0/2 ether-options 802.3ad ae1

set interfaces xe-0/0/2 description Connected-with-Core-1

set interfaces xe-0/0/3 ether-options 802.3ad ae1

set interfaces xe-0/0/3 description Connected-with-Core-2

5.2       Leaf 2

set system host-name Leaf-2

set chassis aggregated-devices ethernet device-count 3

set interfaces ae0 aggregated-ether-options lacp active                                 #Inter Chassis link

set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk

set interfaces ae0 unit 0 family ethernet-switching vlan members 10-12    #data vlans

set interfaces ae0 unit 0 family ethernet-switching vlan members 254       #VLAN carrying ICCP traffic

set interfaces xe-0/0/0 ether-options 802.3ad ae0

set vlans vl10 vlan-id 10

set vlans vl11 vlan-id 11

set vlans vl12 vlan-id 12

set vlans vl254 vlan-id 254

set vlans vl254 l3-interface irb.100

set interfaces irb unit 100 family inet address 10.10.1.2/30   # MC-LAG peers establish OSPF neighborship over this IRB

set interfaces lo0 unit 0 family inet address 2.2.2.2/32            #ICCP session will be established over lo0.0 IP

set protocols ospf area 0.0.0.0 interface irb.100

set protocols ospf area 0.0.0.0 interface lo0.0

set protocols iccp local-ip-addr 2.2.2.2                                         #ICCP configuration

set protocols iccp peer 2.2.2.1 session-establishment-hold-time 50

set protocols iccp peer 2.2.2.1 redundancy-group-id-list 1

set protocols iccp peer 2.2.2.1 liveness-detection minimum-interval 500

set switch-options service-id 1

set multi-chassis multi-chassis-protection 2.2.2.1 interface ae0

set interfaces ae2 description to-Server-1

set interfaces ae2 aggregated-ether-options lacp active

set interfaces ae2 aggregated-ether-options lacp periodic fast

set interfaces ae2 aggregated-ether-options lacp system-id 00:00:00:00:00:11

set interfaces ae2 aggregated-ether-options lacp admin-key 11

set interfaces ae2 aggregated-ether-options mc-ae mc-ae-id 11

set interfaces ae2 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae2 aggregated-ether-options mc-ae chassis-id 1

set interfaces ae2 aggregated-ether-options mc-ae mode active-active

set interfaces ae2 aggregated-ether-options mc-ae status-control standby #Must differ on MC-LAG peers

set interfaces xe-0/0/1 ether-options 802.3ad ae2

set interfaces ae1 description to-Core-Layer

set interfaces ae1 aggregated-ether-options lacp active

set interfaces ae1 aggregated-ether-options lacp periodic fast

set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:10

set interfaces ae1 aggregated-ether-options lacp admin-key 10

set interfaces ae1 aggregated-ether-options mc-ae mc-ae-id 10

set interfaces ae1 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae1 aggregated-ether-options mc-ae chassis-id 1

set interfaces ae1 aggregated-ether-options mc-ae mode active-active

set interfaces ae1 aggregated-ether-options mc-ae status-control standby

set interfaces ae1 unit 0 family ethernet-switching interface-mode trunk

set interfaces ae1 unit 0 family ethernet-switching vlan members 10-12

set interfaces xe-0/0/2 ether-options 802.3ad ae1

set interfaces xe-0/0/2 description Connected-with-Core-1

set interfaces xe-0/0/3 ether-options 802.3ad ae1

set interfaces xe-0/0/3 description Connected-with-Core-2

5.3       Core-1

set system host-name Core-1

set chassis aggregated-devices ethernet device-count 3

set interfaces ae0 aggregated-ether-options lacp active #Inter Chassis link

set interfaces ae0 unit 0 family bridge interface-mode trunk

set interfaces ae0 unit 0 family bridge vlan-id-list 10-12

set interfaces ae0 unit 0 family bridge vlan-id-list 254

set interfaces ae0 unit 0 family bridge vlan-id-list 200

set interfaces xe-0/0/0 gigether-options 802.3ad ae0

set interfaces xe-0/0/1 gigether-options 802.3ad ae0

set bridge-domains bd10 vlan-id 10

set bridge-domains bd10 routing-interface irb.10

set bridge-domains bd11 vlan-id 11

set bridge-domains bd11 routing-interface irb.11

set bridge-domains bd12 vlan-id 12

set bridge-domains bd12 routing-interface irb.12

set bridge-domains bd200 vlan-id 200

set bridge-domains bd200 routing-interface irb.200

set bridge-domains bd254 vlan-id 254

set bridge-domains bd254 routing-interface irb.254

#layer 3 interface for each VLAN , VRRP configured to provide VIP for each subnet

set interfaces irb unit 10 family inet address 1.1.10.1/24 vrrp-group 10 virtual-address 1.1.10.1

set interfaces irb unit 10 family inet address 1.1.10.1/24 vrrp-group 10 priority 255

set interfaces irb unit 10 family inet address 1.1.10.1/24 vrrp-group 10 accept-data

set interfaces irb unit 11 family inet address 1.1.11.1/24 vrrp-group 11 virtual-address 1.1.11.1

set interfaces irb unit 11 family inet address 1.1.11.1/24 vrrp-group 11 priority 255

set interfaces irb unit 11 family inet address 1.1.11.1/24 vrrp-group 11 accept-data

set interfaces irb unit 12 family inet address 1.1.12.1/24 vrrp-group 12 virtual-address 1.1.12.1

set interfaces irb unit 12 family inet address 1.1.12.1/24 vrrp-group 12 priority 255

set interfaces irb unit 12 family inet address 1.1.12.1/24 vrrp-group 12 accept-data

#ICCP Configuration

set interfaces irb unit 254 family inet address 100.100.100.1/30

set interfaces lo0 unit 0 family inet address 1.1.1.1/32

set protocols ospf area 0.0.0.0 interface irb.254 #OSPF on the IRB (VLAN 254) toward Core-2

set protocols ospf area 0.0.0.0 interface lo0.0

set protocols iccp local-ip-addr 1.1.1.1

set protocols iccp peer 1.1.1.2 session-establishment-hold-time 50

set protocols iccp peer 1.1.1.2 redundancy-group-id-list 1 #Must match service-ID value

set protocols iccp peer 1.1.1.2 liveness-detection minimum-interval 800

set multi-chassis multi-chassis-protection 1.1.1.2 interface ae0

set switch-options service-id 1

set interfaces ae1 description to-Leaf

set interfaces ae1 aggregated-ether-options lacp active

set interfaces ae1 aggregated-ether-options lacp periodic fast

set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:01

set interfaces ae1 aggregated-ether-options lacp admin-key 1

set interfaces ae1 aggregated-ether-options mc-ae mc-ae-id 1

set interfaces ae1 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae1 aggregated-ether-options mc-ae chassis-id 0

set interfaces ae1 aggregated-ether-options mc-ae mode active-active

set interfaces ae1 aggregated-ether-options mc-ae status-control active

set interfaces ae1 unit 0 family bridge interface-mode trunk

set interfaces ae1 unit 0 family bridge vlan-id-list 10-12

set interfaces xe-0/0/2 gigether-options 802.3ad ae1

set interfaces xe-0/0/2 description to-Leaf-1

set interfaces xe-0/0/3 gigether-options 802.3ad ae1

set interfaces xe-0/0/3 description to-Leaf-2

set interfaces ae2 description to-DC-Edge

set interfaces ae2 aggregated-ether-options lacp active

set interfaces ae2 aggregated-ether-options lacp periodic fast

set interfaces ae2 aggregated-ether-options lacp system-id 00:00:00:00:00:02

set interfaces ae2 aggregated-ether-options lacp admin-key 2

set interfaces ae2 aggregated-ether-options mc-ae mc-ae-id 2

set interfaces ae2 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae2 aggregated-ether-options mc-ae chassis-id 0

set interfaces ae2 aggregated-ether-options mc-ae mode active-active

set interfaces ae2 aggregated-ether-options mc-ae status-control active

set interfaces ae2 unit 0 family bridge interface-mode access

set interfaces ae2 unit 0 family bridge vlan-id 200

set interfaces xe-0/0/5 gigether-options 802.3ad ae2

#IRB.200 will be used to form dynamic routing with DC-Edge (BGP in our case)

 set interfaces irb unit 200 family inet address 200.200.200.2/29 vrrp-group 200 virtual-address 200.200.200.1

set interfaces irb unit 200 family inet address 200.200.200.2/29 vrrp-group 200 priority 200

set interfaces irb unit 200 family inet address 200.200.200.2/29 vrrp-group 200 accept-data

 #An IRB that runs dynamic routing always needs a static ARP entry; the MAC address of the opposite core device's IRB is used to bind the static ARP. "show interfaces irb" on the opposite MC-LAG peer device can be used to get that MAC.

 set interfaces irb unit 200 family inet address 200.200.200.2/29 arp 200.200.200.3 l2-interface ae0.0

set interfaces irb unit 200 family inet address 200.200.200.2/29 arp 200.200.200.3 mac 00:05:86:94:9b:f0

#EBGP Configuration with DC-Edge

set protocols bgp group DC peer-as 65000

set protocols bgp group DC local-as 65001

set protocols bgp group DC neighbor 200.200.200.4 local-address 200.200.200.2

 #iBGP Configuration with Core-2

set protocols bgp group iBGP type internal

set protocols bgp group iBGP local-address 200.200.200.2

set protocols bgp group iBGP peer-as 65001

set protocols bgp group iBGP local-as 65001

set protocols bgp group iBGP neighbor 200.200.200.3

#Exporting server subnets to DC-Edge

set protocols bgp group DC export to-bgp

set policy-options policy-statement to-bgp term 1 from protocol direct

set policy-options policy-statement to-bgp term 1 from route-filter 10.10.20.0/24 exact  #reject the fxp0 management subnet from being advertised

set policy-options policy-statement to-bgp term 1 then reject

set policy-options policy-statement to-bgp term 2 from protocol direct

set policy-options policy-statement to-bgp term 2 from route-filter 0.0.0.0/0 prefix-length-range /24-/24

set policy-options policy-statement to-bgp term 2 then accept

 

5.4       Core-2

set system host-name Core-2

set chassis aggregated-devices ethernet device-count 3

set interfaces ae0 aggregated-ether-options lacp active #Inter Chassis link

set interfaces ae0 unit 0 family bridge interface-mode trunk

set interfaces ae0 unit 0 family bridge vlan-id-list 10-12

set interfaces ae0 unit 0 family bridge vlan-id-list 254

set interfaces ae0 unit 0 family bridge vlan-id-list 200

set interfaces xe-0/0/0 gigether-options 802.3ad ae0

set interfaces xe-0/0/1 gigether-options 802.3ad ae0

 

set bridge-domains bd10 vlan-id 10

set bridge-domains bd10 routing-interface irb.10

set bridge-domains bd11 vlan-id 11

set bridge-domains bd11 routing-interface irb.11

set bridge-domains bd12 vlan-id 12

set bridge-domains bd12 routing-interface irb.12

set bridge-domains bd200 vlan-id 200

set bridge-domains bd200 routing-interface irb.200

set bridge-domains bd254 vlan-id 254

set bridge-domains bd254 routing-interface irb.254

#Layer 3 interface for each VLAN; VRRP is configured to provide the VIP for each subnet

set interfaces irb unit 10 family inet address 1.1.10.2/24 vrrp-group 10 virtual-address 1.1.10.1

set interfaces irb unit 10 family inet address 1.1.10.2/24 vrrp-group 10 priority 200

set interfaces irb unit 10 family inet address 1.1.10.2/24 vrrp-group 10 accept-data

set interfaces irb unit 11 family inet address 1.1.11.2/24 vrrp-group 11 virtual-address 1.1.11.1

set interfaces irb unit 11 family inet address 1.1.11.2/24 vrrp-group 11 priority 200

set interfaces irb unit 11 family inet address 1.1.11.2/24 vrrp-group 11 accept-data

set interfaces irb unit 12 family inet address 1.1.12.2/24 vrrp-group 12 virtual-address 1.1.12.1

set interfaces irb unit 12 family inet address 1.1.12.2/24 vrrp-group 12 priority 200

set interfaces irb unit 12 family inet address 1.1.12.2/24 vrrp-group 12 accept-data

#ICCP Configuration

set interfaces irb unit 254 family inet address 100.100.100.2/30

set interfaces lo0 unit 0 family inet address 1.1.1.2/32

set protocols ospf area 0.0.0.0 interface irb.254 #OSPF on IRB.254 (ICL VLAN) toward Core-1

set protocols ospf area 0.0.0.0 interface lo0.0

set protocols iccp local-ip-addr 1.1.1.2

set protocols iccp peer 1.1.1.1 session-establishment-hold-time 50

set protocols iccp peer 1.1.1.1 redundancy-group-id-list 1 #Must match service-ID value

set protocols iccp peer 1.1.1.1 liveness-detection minimum-interval 800

set multi-chassis multi-chassis-protection 1.1.1.1 interface ae0

set switch-options service-id 1

set interfaces ae1 description to-Leaf

set interfaces ae1 aggregated-ether-options lacp active

set interfaces ae1 aggregated-ether-options lacp periodic fast

set interfaces ae1 aggregated-ether-options lacp system-id 00:00:00:00:00:01

set interfaces ae1 aggregated-ether-options lacp admin-key 1

set interfaces ae1 aggregated-ether-options mc-ae mc-ae-id 1

set interfaces ae1 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae1 aggregated-ether-options mc-ae chassis-id 1

set interfaces ae1 aggregated-ether-options mc-ae mode active-active

set interfaces ae1 aggregated-ether-options mc-ae status-control standby

set interfaces ae1 unit 0 family bridge interface-mode trunk

set interfaces ae1 unit 0 family bridge vlan-id-list 10-12

set interfaces xe-0/0/2 gigether-options 802.3ad ae1

set interfaces xe-0/0/2 description to-Leaf-1

set interfaces xe-0/0/3 gigether-options 802.3ad ae1

set interfaces xe-0/0/3 description to-Leaf-2

set interfaces ae2 description to-DC-Edge

set interfaces ae2 aggregated-ether-options lacp active

set interfaces ae2 aggregated-ether-options lacp periodic fast

set interfaces ae2 aggregated-ether-options lacp system-id 00:00:00:00:00:02

set interfaces ae2 aggregated-ether-options lacp admin-key 2

set interfaces ae2 aggregated-ether-options mc-ae mc-ae-id 2

set interfaces ae2 aggregated-ether-options mc-ae redundancy-group 1

set interfaces ae2 aggregated-ether-options mc-ae chassis-id 1

set interfaces ae2 aggregated-ether-options mc-ae mode active-active

set interfaces ae2 aggregated-ether-options mc-ae status-control standby

set interfaces ae2 unit 0 family bridge interface-mode access

set interfaces ae2 unit 0 family bridge vlan-id 200

set interfaces xe-0/0/5 gigether-options 802.3ad ae2

#IRB.200 will be used to form dynamic routing with DC-Edge (BGP in our case)


set interfaces irb unit 200 family inet address 200.200.200.3/29 vrrp-group 200 virtual-address 200.200.200.1

set interfaces irb unit 200 family inet address 200.200.200.3/29 vrrp-group 200 priority 100

set interfaces irb unit 200 family inet address 200.200.200.3/29 vrrp-group 200 accept-data

 #An IRB that participates in dynamic routing always needs a static ARP entry; the MAC address of the opposite core device's IRB is used to bind the static ARP. Use "show interfaces irb" on the MC-LAG peer device to obtain that MAC.

set interfaces irb unit 200 family inet address 200.200.200.3/29 arp 200.200.200.2 l2-interface ae0.0

set interfaces irb unit 200 family inet address 200.200.200.3/29 arp 200.200.200.2 mac 00:05:86:72:fb:f0

 #EBGP Configuration with DC-Edge

 set protocols bgp group DC1 local-address 200.200.200.3

set protocols bgp group DC1 export to-bgp

set protocols bgp group DC1 peer-as 65000

set protocols bgp group DC1 local-as 65001

set protocols bgp group DC1 neighbor 200.200.200.4

 #iBGP Configuration with Core-1

set protocols bgp group iBGP type internal

set protocols bgp group iBGP peer-as 65001

set protocols bgp group iBGP local-as 65001

set protocols bgp group iBGP neighbor 200.200.200.2 local-address 200.200.200.3

#Exporting server subnets to DC-Edge

set policy-options policy-statement to-bgp term 1 from protocol direct

set policy-options policy-statement to-bgp term 1 from route-filter 10.10.20.0/24 exact

set policy-options policy-statement to-bgp term 1 then reject

set policy-options policy-statement to-bgp term 2 from protocol direct

set policy-options policy-statement to-bgp term 2 from route-filter 0.0.0.0/0 prefix-length-range /24-/24

set policy-options policy-statement to-bgp term 2 then accept

6       Verifications

6.1       Leaf 1

  root@Leaf-1> show iccp

 Redundancy Group Information for peer 2.2.2.2

TCP Connection       : Established

Liveliness Detection : Up

Redundancy Group ID          Status

1                           Up

Client Application: lacpd

Redundancy Group IDs Joined: 1

Client Application: MCSNOOPD

Redundancy Group IDs Joined: None

Client Application: l2ald_iccpd_client

Redundancy Group IDs Joined: 1

 root@Leaf-1> show lacp interfaces ae0                #Inter Chassis link

Aggregated interface: ae0

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/0                  Current   Fast periodic Collecting distributing

 

root@Leaf-1> show lacp statistics interfaces ae0   #Inter Chassis link

Aggregated interface: ae0

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/0                1301        1287            0            0

root@Leaf-1> show lacp interfaces ae1

Aggregated interface: ae1

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/2       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/2     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/2                  Current   Fast periodic Collecting distributing

xe-0/0/3                  Current   Fast periodic Collecting distributing

root@Leaf-1> show lacp statistics interfaces ae1

#ae1 member interfaces are connected to two different core devices and are actively sending/receiving LACP packets; the same can be verified on Leaf-2.

Aggregated interface: ae1

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/2                 323        1291            0            0                         #Interface connected to Core-1

xe-0/0/3                 128        1289            0            0                         #Interface connected to Core-2

root@Leaf-1> show lacp interfaces ae2   #Interface connected with Server-1

Aggregated interface: ae2

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/1                  Current   Fast periodic Collecting distributing

root@Leaf-1> show lacp statistics interfaces ae2

#ae2 member interface connected to the server is sending/receiving LACP packets; the same can be verified on Leaf-2.

Aggregated interface: ae2

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/1                  58        1287            0            0

6.2       Leaf 2

root@Leaf-2> show iccp

Redundancy Group Information for peer 2.2.2.1

TCP Connection       : Established

Liveliness Detection : Up

Redundancy Group ID          Status

1                           Up

Client Application: lacpd

Redundancy Group IDs Joined: 1

Client Application: MCSNOOPD

Redundancy Group IDs Joined: None

Client Application: l2ald_iccpd_client

Redundancy Group IDs Joined: 1

root@Leaf-2> show lacp interfaces ae0 #Inter chassis link

Aggregated interface: ae0

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/0                  Current   Fast periodic Collecting distributing

 

root@Leaf-2> show lacp statistics interfaces ae0

Aggregated interface: ae0

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/0                1271        1296            0            0

 

root@Leaf-2> show lacp interfaces ae1 #link connected to Core-1 and Core-2

Aggregated interface: ae1

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/2       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/2     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/2                  Current   Fast periodic Collecting distributing

xe-0/0/3                  Current   Fast periodic Collecting distributing

root@Leaf-2> show lacp statistics interfaces ae1

#ae1 member interfaces are connected to two different core devices and are actively sending/receiving LACP packets; the same can be verified on Leaf-1.

Aggregated interface: ae1

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/2                 113        1296            0            0

xe-0/0/3                 306        1296            0            0

root@Leaf-2> show lacp interfaces ae2 # link connected to server

Aggregated interface: ae2

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/1                  Current   Fast periodic Collecting distributing

 

root@Leaf-2> show lacp statistics interfaces ae2

#ae2 member interface connected to the server is sending/receiving LACP packets; the same can be verified on Leaf-1.

Aggregated interface: ae2

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/1                  44        1296            0            0

6.3       Core-1

root@Core-1> show iccp

Redundancy Group Information for peer 1.1.1.2

TCP Connection       : Established

Liveliness Detection : Up

Redundancy Group ID          Status

1                           Up

 

Client Application: lacpd

Redundancy Group IDs Joined: 1

Client Application: l2ald_iccpd_client

Redundancy Group IDs Joined: 1

root@Core-1> show lacp interfaces ae0 #Inter Chassis link

Aggregated interface: ae0

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/0                  Current   Fast periodic Collecting distributing

xe-0/0/1                  Current   Fast periodic Collecting distributing

 root@Core-1> show lacp statistics interfaces ae0

Aggregated interface: ae0

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/0                 183         378            0            0

xe-0/0/1                 182         373            0            0

 root@Core-1> show lacp interfaces ae1 #Link connected with Leaf-1 and Leaf-2

Aggregated interface: ae1

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/2       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/2     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/2                  Current   Fast periodic Collecting distributing

xe-0/0/3                  Current   Fast periodic Collecting distributing

root@Core-1> show lacp statistics interfaces ae1

#ae1 member interfaces are connected with two different leaf devices and are actively sending/receiving LACP packets. The same can be verified on Core-2 and the corresponding leaf devices.

Aggregated interface: ae1

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/2                 348         377            0            0

xe-0/0/3                 345         375            0            0

root@Core-1> show lacp interfaces ae2 #Link connected with DC-Edge /PE Router

Aggregated interface: ae2

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/5       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/5     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/5                  Current   Fast periodic Collecting distributing

 root@Core-1> show lacp statistics interfaces ae2

#ae2 is connected to the DC-Edge/PE router. The same interface on Core-2 is also sending/receiving LACP packets.

Aggregated interface: ae2

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/5                 324         375            0            0

root@Core-1> show bgp summary

Groups: 2 Peers: 2 Down peers: 0

Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending

inet.0

2          0          0          0          0          0

Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped…

200.200.200.3         65001         12          8       0       0        2:53 0/1/1/0              0/0/0/0 #iBGP with Core-2

200.200.200.4         65000         14         14       0       0        5:16 0/1/1/0              0/0/0/0 #eBGP with DC-Edge

6.4       Core-2

root@Core-2> show iccp

Redundancy Group Information for peer 1.1.1.1

TCP Connection       : Established

Liveliness Detection : Up

Redundancy Group ID          Status

1                           Up

Client Application: lacpd

Redundancy Group IDs Joined: 1

Client Application: l2ald_iccpd_client

Redundancy Group IDs Joined: 1

root@Core-2> show lacp interfaces ae0  #Inter Chassis Link

Aggregated interface: ae0

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/1       Actor    No   Yes    No   No   No   Yes     Fast    Active

xe-0/0/1     Partner    No   Yes    No   No   No   Yes     Fast   Passive

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/0                  Current   Fast periodic Collecting distributing

xe-0/0/1            Port disabled     No periodic           Detached

root@Core-2> show lacp statistics interfaces ae0

Aggregated interface: ae0

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/0                 246         250            0            0

xe-0/0/1                 247         246            0            0

 

root@Core-2> show lacp interfaces ae1 #Link connected with Leaf-1 and Leaf-2

Aggregated interface: ae1

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/2       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/2     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/3     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/2                  Current   Fast periodic Collecting distributing

xe-0/0/3                  Current   Fast periodic Collecting distributing

root@Core-2> show lacp statistics interfaces ae1

#ae1 member interfaces are connected with two different leaf devices and are actively sending/receiving LACP packets. The same can be verified on Core-1 and the corresponding leaf devices.

Aggregated interface: ae1

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/2                 228         247            0            0

xe-0/0/3                 226         247            0            0

root@Core-2> show lacp interfaces ae2 #Link connected with DC-Edge /PE Router

Aggregated interface: ae2

LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity

xe-0/0/5       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active

xe-0/0/5     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active

LACP protocol:        Receive State  Transmit State          Mux State

xe-0/0/5                  Current   Fast periodic Collecting distributing

root@Core-2> show lacp statistics interfaces  ae2

#ae2 is connected to the DC-Edge/PE router. The same interface on Core-1 is also sending/receiving LACP packets.

Aggregated interface: ae2

LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx

xe-0/0/5                 244         249            0            0


root@Core-2> show bgp summary

Groups: 2 Peers: 2 Down peers: 0

Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending

inet.0

2          0          0          0          0          0

Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped…

200.200.200.2         65001         12          8       0       0        2:53 0/1/1/0              0/0/0/0 #iBGP with Core-1

200.200.200.4         65000         14         14       0       0        5:16 0/1/1/0              0/0/0/0 #eBGP with DC-Edge

7       Multi-Chassis LAG Important Terms and Concepts

7.1       Inter Chassis Control Protocol

The MC-LAG peers use the Inter-Chassis Control Protocol (ICCP) to exchange control information and coordinate with each other to ensure that data traffic is forwarded properly. ICCP replicates control traffic and forwarding states across the MC-LAG peers and communicates the operational state of the MC-LAG members. It uses TCP as a transport protocol and requires Bidirectional Forwarding Detection (BFD) for fast convergence. Because ICCP uses TCP/IP to communicate between the peers, the two peers must be connected to each other. ICCP messages exchange MC-LAG configuration parameters and ensure that both peers use the correct LACP parameters. The ICCP configuration parameters are as under (a consolidated configuration sketch follows the parameter descriptions):-

7.1.1         Local-IP-Address

IP address configured on the local MC-LAG member that will be used to establish the ICCP session with the MC-LAG peer device (the lo0 address is recommended for ICCP peer establishment).

7.1.2         Peer- IP

ICCP peer IP address configured on the peer MC-LAG member that will be used to establish the ICCP session with the local MC-LAG device (the lo0 address is recommended for ICCP peer establishment).

7.1.3         Session-Establishment-Hold-Time

A value of 50 seconds is recommended for faster ICCP connection establishment between MC-LAG peers.

7.1.4         Redundancy-Group-ID-List

It must be the same on both MC-LAG peers and is referenced in the MC-AE configuration; it must match the value configured under switch-options service-id.

7.1.5         Liveness-Detection Minimum-Interval

BFD session timer used to detect failure of the MC-LAG peer.

7.1.6         Liveness-Detection Multiplier

This multiplier is used along with liveness-detection minimum-interval to detect failure of the ICCP peer; the default value is 3.

7.1.7         Back up liveness Detection

Determines whether a peer is up or down by exchanging keepalive messages over the management link between the two Inter-Chassis Control Protocol (ICCP) peers.
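To tie these knobs together, a minimal ICCP sketch is shown below. The addresses and timers follow the Core-1/Core-2 example in section 5; the liveness-detection multiplier and backup-liveness-detection lines are additional illustrative knobs (the backup peer IP is a placeholder for the peer's management address), and the exact hierarchy should be confirmed against your Junos release.

set protocols iccp local-ip-addr 1.1.1.1
set protocols iccp peer 1.1.1.2 session-establishment-hold-time 50
set protocols iccp peer 1.1.1.2 redundancy-group-id-list 1   #must match switch-options service-id
set protocols iccp peer 1.1.1.2 liveness-detection minimum-interval 800
set protocols iccp peer 1.1.1.2 liveness-detection multiplier 3
set protocols iccp peer 1.1.1.2 backup-liveness-detection backup-peer-ip <peer-management-IP>   #placeholder address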

7.2       Inter Chassis Link (ICL)

The ICL is used to forward data traffic across the MC-LAG peers; it should be an aggregate link, and its member interfaces should come from different line cards. The ICL must carry all data VLANs between the MC-LAG peers. Optionally, ICCP traffic can also traverse the same links that are being used as the ICL.

7.2.1         Hold Time

The hold-time down value (at the [edit interfaces interface-name] hierarchy level) for the inter-chassis link on the peer with the status-control standby configuration should be set higher than the ICCP BFD timeout. This configuration prevents data traffic loss by ensuring that when the router or switch with the status-control active configuration goes down, the router or switch with the status-control standby configuration does not go into standby mode.
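As an illustrative sketch only: with the ICCP BFD timers used in this document (800 ms x 3 = 2400 ms), the ICL hold-time on the peer configured with status-control standby could be set to any value above 2400 ms, for example:

set interfaces ae0 hold-time up 0 down 3000   #value is illustrative; must exceed the ICCP BFD timeout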

 

7.3       Multi Chassis Control Aggregated Link (MC-AE)

One interface from each MC-LAG peer is connected to the downstream or upstream network device or compute machine. The connected device does not know that it is connected to two different devices; rather, it treats the link as a normal aggregate link and continues to load balance traffic over the LAG member interfaces.

7.3.1         LACP System-ID

Must be the same on both MC-LAG peers but unique from other MC-AE interfaces in the MC-LAG configuration. It is the LACP system ID transmitted to the upstream or downstream connected device from both MC-LAG peers; links from both peers carrying the same system ID are treated as members of the same LAG.

7.3.2         LACP ADMIN-KEY

The LACP admin-key must be the same on both MC-LAG peers but unique from other MC-AE interfaces in the MC-LAG configuration.

7.3.3         MC-ae-ID

The mc-ae-id must be the same on both MC-LAG peers but unique from other MC-AE interfaces in the MC-LAG configuration.

7.3.4         MC-ae Redundancy-Group

Must be the same on both MC-LAG peers and must match the redundancy-group value configured under ICCP.

7.3.5         MC-ae Chassis-ID

Specify the chassis ID for Link Aggregation Control Protocol (LACP) to calculate the port number of MC-LAG physical member links. Values can be 0 or 1.

7.3.6         MC-ae MODE

The mc-ae mode is active-active in this topology; it ensures that both MC-LAG peers actively forward data even though VRRP is master on only one MC-LAG peer.

7.3.7         MC-ae Status-Control

The mc-ae status-control defines the state of the MC-AE interface when the ICL goes down. It must be active on one MC-LAG peer and standby on the other peer.

7.3.8         Prefer-Status-Control-Active

The prefer-status-control-active statement can be configured with the status-control standby configuration to prevent the LACP MC-LAG system ID from reverting to the default LACP system ID on ICCP failure. This configuration option should be used only if it can be ensured that ICCP session will not go down unless the router or switch is down.
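A hedged sketch of how this is typically applied on the peer whose MC-AE interfaces carry status-control standby is shown below; the events iccp-peer-down hierarchy is assumed from common Junos usage and should be verified for your platform and release.

set interfaces ae1 aggregated-ether-options mc-ae events iccp-peer-down prefer-status-control-active   #assumed hierarchy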

7.4       Service-ID

The switch service ID is used to synchronize applications, IGMP, ARP, and MAC learning across MC-LAG members (configuration hierarchy edit switch-options).

7.5       Multi-Chassis-Protection

Multi chassis protection must be configured on one interface (ICL) for each peer. If the Inter-Chassis Control Protocol (ICCP) connection is up and the inter-chassis link (ICL) comes up, the peer configured as standby brings up the multi chassis aggregated Ethernet interfaces shared with the peer (configuration hierarchy edit multi-chassis).

7.6       ARP-l2-Validate

Enables periodic checking of ARP Layer 3 addressing and MAC Layer 2 addressing tables, and fixes entries if they become out of sync among MC-LAG peers (configuration hierarchy edit interfaces irb).
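Based on the hierarchy referenced above, a minimal sketch (applied on both MC-LAG peers):

set interfaces irb arp-l2-validate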

7.7       VRRP over IRB

An Integrated Routing and Bridging (IRB) interface for each VLAN needs to be configured on the MC-LAG peers where Layer 3 routing is required among VLANs. VRRP is configured to provide a single gateway for each VLAN/subnet. VRRP on both the master and backup devices should be configured with the "accept-data" knob.

7.8       Static ARP

Static ARP entries are configured for all IRB interfaces that need to participate in dynamic routing peering. The MAC for the static ARP can be obtained on the opposite MC-LAG peer by using the command "show interfaces irb", and the IP address is the real IP of the IRB interface on the opposite MC-LAG peer.

 

 

Starting SDN Learning Journey- Through Open vSwitch

openvswitch.png

Introduction

Software Defined Networking (SDN) is no longer a new topic, but many network/system engineers still find it hard to figure out where to start learning SDN. Many SDN solutions exist in the market, each with its pros and cons. The objective of this blog is to give an idea about SDN basics to engineers who want to start their SDN learning curve.

Reference topology

  • 2 x Ubuntu host (14.04 LTS) each with multiple NICs
  • Open vSwitch installed in each host and 1 instance created.
  • Virtual Box installed in each host, vBox will be used to host guest virtual machines (VM-A & VM-B)

Topology Description

Open vSwitch (e.g br0) in each host will have following interfaces:-

  • A tap interface which will be used to bind guest VM to Open vSwitch
  • Eth1 of each host will be added to Open vSwitch
  • IP address / sub netmask for Eth1 of each host will be configured on Open vSwitch itself (br0)
  • Guest VM eth1 will be configured with an IP/subnet mask different from the host IP/subnet mask
  • VXLAN / GRE will be configured on each host (by using host IP addresses)

Step by Step setting up Lab

It is assumed Ubuntu 14.04 is installed in each host and host machine has connectivity to the internet.

 

  • Install Open vSwitch in each host (apt-get install openvswitch-switch)
  • Create an instance of Open vSwitch (ovs-vsctl add-br br0)
  • Add a tap interface in each host (ip tuntap add dev tap0 mode tap ; ifconfig tap0 up)
  • Add the relevant ports to Open vSwitch
    • ovs-vsctl add-port br0 tap0
    • ovs-vsctl add-port br0 eth1
  • Assign the IP address to Open vSwitch br0 (the IP address of the host Ethernet eth1)
    • ifconfig eth1 0
    • ifconfig br0 192.168.100.1 netmask 255.255.255.0
    • ifconfig br0 up
    • ifconfig tap0 up
  • Install VirtualBox in each host (apt-get install virtualbox)
  • Create the tunnel interface

ovs-vsctl add-port br0 GRE -- set interface GRE type=gre options:remote_ip=192.168.100.x

or, for VXLAN:

ovs-vsctl add-port br0 VX -- set interface VX type=vxlan options:remote_ip=192.168.100.x

(.x is remote host IP)
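For example, assuming Host-1 keeps 192.168.100.1 on br0 (as configured above) and Host-2 uses 192.168.100.2 (an assumed address for illustration), the VXLAN tunnel would be created as follows:

# On Host-1 (br0 = 192.168.100.1)
ovs-vsctl add-port br0 VX -- set interface VX type=vxlan options:remote_ip=192.168.100.2
# On Host-2 (br0 = 192.168.100.2, assumed)
ovs-vsctl add-port br0 VX -- set interface VX type=vxlan options:remote_ip=192.168.100.1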

  • Check the configuration/ status of ports assigned to Open-vSwitch
    • ovs-vsctl show
7d7c9778-ac43-443c-82d9-1efdbcf3ba0e

Bridge "br0"

Port "br0"

Interface "br0"

type: internal

Port "tap0"

Interface "tap0"

Port GRE

Interface GRE

type: gre

options: {remote_ip="192.168.100.1"}

Port "eth1"

Interface "eth1"

ovs_version: "2.0.2"

 

 

  • Start VirtualBox and create a VM using any tiny Linux image
  • In the VM settings, set the network adapter to bridged mode and select the interface "tap0"
  • Assign an IP address to each guest VM (e.g. 172.172.1.1/24 to VM-A and 172.172.1.2/24 to VM-B)
  • Start a ping from one guest VM to the other guest VM
  • Check the MAC table on Open vSwitch (root@ubuntu:~# ovs-appctl fdb/show br0)
port  VLAN  MAC                Age

    1     0  00:0c:29:45:15:06    1

    3     0  00:0c:29:dd:ac:b2    1

    2     0  08:00:27:5e:55:54    1 (VM-B hosted on Host-2,  MAC address learned on local tap interface)

    3     0  08:00:27:04:36:64    1 ( VM-A hosted on Host-1, MAC learned on GRE tunnel interface)

  • Check the ports assigned to Open vSwitch (root@ubuntu:~# ovs-ofctl show br0)
1(eth1): addr:00:0c:29:dd:ac:b2

config:     0

state:      0

current:    1GB-FD COPPER AUTO_NEG

advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG

supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG

speed: 1000 Mbps now, 1000 Mbps max

 2(tap0): addr:2e:c1:d7:25:5e:18

config:     0

state:      LINK_DOWN

current:    10MB-FD COPPER

speed: 10 Mbps now, 0 Mbps max

3(GRE): addr:e6:da:71:24:70:bb

config:     0

state:      0

speed: 0 Mbps now, 0 Mbps max

 LOCAL(br0): addr:00:0c:29:dd:ac:b2

config:     0

state:      0

speed: 0 Mbps now, 0 Mbps max

Conclusion

Traffic initiated from a guest VM reaches Open vSwitch through the tap interface, is encapsulated in the tunnel (GRE/VXLAN), and is sent to the remote host. The above snippet, taken from Host-2, shows that the MAC address of VM-A (the guest VM on Host-1) has been learned on the GRE interface (port 3), while the local VM's MAC address is learned on the tap interface (port 2).

References:

http://openvswitch.org/

Contrail Integration with Bare Metal Devices via EVPN-VxLAN

In this blog we will discuss how to integrate Bare metal devices with Juniper Contrail (SDN Controller) by using EVPN-VXLAN.

My earlier blogs on Contrail can be viewed at the links Blog-1 and Blog-2.

Reference Topology  

evpn-vxlan

Problem statement: "A guest VM spawned inside the SDN environment needs to communicate with a bare metal device (same subnet or different subnet; here we will discuss the former use case only)."

Solution: "An EVPN-based control plane will be established between the MX router and the Contrail Controller to exchange ARP entries between them; a VxLAN-based forwarding plane will be configured for communication between guest VMs and bare metal devices."

Solution components:-

  1. Contrail GUI
    • RED network 2.2.2.0/24 is configured and VMs are spawned using the OpenStack "Horizon" web GUI (not covered in this article)
    • Configure VxLAN as 1st encapsulation method under “Encapsulation Priority Order” go to Configure then  Infrastructure  then Global Config and click edit button.
    • Select VxLAN Identifier Mode as “user configured”
    • Configure VxLAN ID & Route target community  for the desired network
      • Go to Configure  then Networking  then Networks, select the desired network and click edit
      • Add the VxLAN ID (in this example 2000 is used for RED subnet) under Advanced Option
      • Add the route target community which is 200:1 for the RED network
  2. MX Router
    • A routing instance "RED-EVPN" with instance-type virtual-switch is configured
    • Protocol EVPN configured inside routing-instance.
    • Bridge domain 200 and VxLAN VNI ID 2000 under EVPN routing instances
    • Physical port connected with Bare metal devices configured as access port with domain member 200.
    • Route target community on  Virtual-Switch routing instance must match with route target community assigned to RED sub net in Contrail GUI
    • Forwarding plane functionality needs valid route pointing to compute nodes in inet.3 routing table of MX gateway. (If we recall VPN stuff in Junos, next-hop lookup for VRF is always in inet.3. )
    • There are two methods to achieve this  goal:-
      • Adding route in inet.3 RIB dynamically through GRE tunnel.
      • Adding route in inet.3 RIB statically.
    • I chose to go with both methods at the same time for the following reasons:-
      • Consider that we need to push external routes toward Contrail from the MX gateway; in this case an MPLS-over-GRE forwarding plane will be required, which depends on the dynamic GRE tunnel.
      • At the same time I need the EVPN-VXLAN extension from Contrail to the MX gateway for integration of the bare metal devices connected to the MX router; for this use case I wanted to avoid using the GRE tunnel in the forwarding plane, so adding a static route to the inet.3 RIB helped.

Configuration 

admin@ER> show configuration routing-instances RED-EVPN
vtep-source-interface lo0.101;
instance-type virtual-switch;
interface ge-1/1/9.0;                   ##interface connected with bare metal device##
route-distinguisher 192.168.240:1;
vrf-target target:200:1;
protocols {
    evpn {
        encapsulation vxlan;
        extended-vni-list 2000;
        multicast-mode ingress-replication;
    }
}
bridge-domains {
    vlan200 {
        domain-type bridge;
        vlan-id 200;
        routing-interface irb.200;     ##Optional
        vxlan {
            vni 2000;
            ingress-node-replication;
        }
    }
}

admin@ER> show configuration routing-instances RED

##VRF configuration is optional; this use case is already covered in my earlier blog

instance-type vrf;
interface irb.200;
interface lo0.1;
route-distinguisher 200:1;
vrf-target target:200:1;       ##Route target is matching with virtual switch route target
vrf-table-label;

admin@ER> show configuration interfaces ge-1/1/9
unit 0 {
    family bridge {
        interface-mode access;
        vlan-id 200;
    }
}

admin@ER> show configuration routing-options dynamic-tunnels to-contrail
source-address 101.101.101.101;
gre;
destination-networks {
    192.168.243.0/24;
}
admin@ER> show configuration routing-options rib inet.3
static {
    route 192.168.243.0/24 receive;
}
admin@ER> show route table inet.3

inet.3: 5 destinations, 6 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, – = Last Active, * = Both
192.168.243.0/24 *[Static/5] 1d 02:04:03 ##static route added in inet.3 , will be used for EVPN-VXLAN
Receive
[Tunnel/300] 1d 02:02:07
Tunnel
192.168.243.50/32 *[Tunnel/300] 1d 02:02:07 ##Dynamic routes added in inet.3 , will be used for MPLS over GRE forwarding plane
> via gr-0/0/0.32770
192.168.243.51/32 *[Tunnel/300] 1d 02:02:07
> via gr-0/0/0.32769
192.168.243.52/32 *[Tunnel/300] 10:58:22

bgp-summary

red-evpn

bdg-add

mac

Deep Dive- Contrail Data Center Interconnect

In the previous blog we discussed, at a high level, Juniper Contrail Data Center Interconnect and how to connect physical servers with servers deployed inside an SDN environment. In this blog we will take a deep dive into both scenarios and discuss in detail the configuration options, control plane, and data plane operations involved in both options:-

Next blog: Contrail Integration with Bare Metal Devices via EVPN-VxLAN

picture1

The following components are included in the reference topology:-

  1. 1 x MX-5 will be configured as Data Center Edge Router
  2. Contrail Control Node
  3. Compute 51 (which has 1 x vRouter)
  4. Compute 52 (Which has 1 x vRouter)
  5. MP-iBGP will be configured by the Contrail Control Node between itself and all vRouters.
  6. The Contrail node will act as Route Reflector (RR) and all vRouters will act as clients to the RR.
  7. Each vRouter will establish a GRE tunnel (for data plane forwarding) with every other vRouter.
  8. MX-5 (Data Center Edge Router) will also establish MP-iBGP peering with the Contrail Control node and will establish GRE tunnels with all vRouters and Contrail.

Now let's recall the iBGP forwarding rules and correlate them to our environment:-

  1. All vRouter which are RR  clients will transmit routes only to RR.
  2. RR will receive the routes from any of the client and will transmit received routes to all clients (except the vRouter from where the routes came) and to all non-client iBGP neighbor (which is MX-5 here)
  3. MX-5 will transmit routes to Contrail Control Node (RR) and these routes will be subsequently transmitted by RR to all clients.
  4. Full mesh of GRE tunnels will be established between MX-5 and all vRouters which will be used for forwarding plane.

picture2

We have created 2 x tenants (Red and Blue) inside the virtual Data Center. The Red tenant VMs also need to talk with a DB server which is installed on a physical compute machine.

  1. 1.1.1.0/24 subnet will be used for Red tenants VMs, route-target value 100:1.
  2. 2.2.2.0/24 subnet will be used for Blue tenant VMs, route-target value 200:1.
  3. IP address for physical Data base server which needs to communicate with Red tenant is 13.13.13.13.

untitled

The above snippet shows the BGP configuration on MX-5 and the session state with the Contrail Control Node. We can see several routing tables showing the number of prefixes received and accepted.

untitled6

untitled8

The above snippets show the following:-

  1. The dynamic GRE tunnel configuration; in the absence of MPLS LSPs inside the Data Center, GRE tunnels will be used to populate the inet.3 routing table.
  2. The RED routing instance configuration with its route-target community (which must match the route-target community configured on the Contrail Control Node for the RED subnet).
  3. Interface lo0.1 here represents the database server. vrf-table-label causes auto-export of interface routes into BGP after adding the route-target community; without this statement we would need to configure a routing policy to redistribute interface routes into BGP.
  4. The inet.3 routing table shows addresses for the Contrail Control node and the 2 x compute nodes, which means a full mesh of GRE tunnels has been established for data plane forwarding. A minimal text sketch of this configuration follows this list.
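Since the snippets above are screenshots, a minimal text sketch of the equivalent MX-side configuration is reproduced below. The tunnel source, destination network, and instance names follow the addressing used elsewhere in this document, while the route distinguisher is an illustrative value; treat this as a sketch rather than the exact lab configuration.

set routing-options dynamic-tunnels to-contrail source-address 101.101.101.101
set routing-options dynamic-tunnels to-contrail gre
set routing-options dynamic-tunnels to-contrail destination-networks 192.168.243.0/24
set routing-instances RED instance-type vrf
set routing-instances RED interface lo0.1                  #lo0.1 represents the database server
set routing-instances RED route-distinguisher 100:1        #illustrative RD
set routing-instances RED vrf-target target:100:1          #RED tenant route target
set routing-instances RED vrf-table-label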

Now let's come to the actual routes received from the Contrail Control Node for the RED VPN:-

untitled7

We can see that 3 x routes have been received from 192.168.243.50, which is the Contrail Control Node. Now let's see the details of route 1.1.1.3/32:-

untitled2

From the above snippet the following can be concluded:-

  1. Route 1.1.1.3/32 is learned from 192.168.243.50 (which is Contrail Control Node)
  2. Protocol Next-hop is 192.168.243.52 (which is compute 52 address) so it means RED VM with address 1.1.1.3/32 is located on 192.168.243.52 compute node.
  3. We can also see next hop interface is gr-0/0/0.32771  and with label operation Push (label value of 24).
  4. Route target community value is target:100:1 which depicts RED tenant.

untitled9

The above snippet shows the database server route being advertised to the Contrail Control Node with route-target community 100:1. The Contrail Control Node will re-advertise this route to the vRouters on compute 51 and compute 52. On each vRouter the route-target community will be checked, and if it matches any VRF the route will be accepted and installed into that particular VRF.

Now consider a scenario  where we need L2 & L3 Data Center Interconnect (DCI) for RED tenant.

picture3

 

L3 DCI has never been a challenge, and the requirement can be met through IP/VPN or many other ways (IPsec, GRE, etc.). The real challenge lies in L2 DCI, as enterprises have traditionally depended either on dark fiber or on L2 services (VPLS etc.) from a service provider. Both solutions involve additional cost.

Now we will look at how Ethernet VPN (EVPN) can help us with L2 DCI; in particular I will focus on the fact that with EVPN we have no dependency on the service provider network (except Layer 3 IP connectivity). We will configure a GRE tunnel between the Data Center Edge routers; once the GRE tunnel is established we will configure MP-eBGP between both DCs (with family inet-vpn unicast and evpn signaling).

dci

This simple piece of configuration solves the big problem involved in L2 DCI; the real enabler is MP-BGP, and by simply adding EVPN signaling we achieved our target.
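Since the configuration itself is shown only as an image, a hedged sketch of the DC-1 side follows. The group name, remote AS, and multihop usage are assumptions for illustration; the essential point is enabling both family inet-vpn unicast and family evpn signaling on the MP-eBGP session that rides over the GRE/IP interconnect (the local and peer loopback addresses match the dynamic-tunnel configuration shown later in this post).

set protocols bgp group DC2-DCI type external
set protocols bgp group DC2-DCI multihop                     #eBGP between loopbacks over GRE
set protocols bgp group DC2-DCI local-address 101.101.101.101
set protocols bgp group DC2-DCI family inet-vpn unicast
set protocols bgp group DC2-DCI family evpn signaling
set protocols bgp group DC2-DCI peer-as 200                  #assumed remote AS for DC-2
set protocols bgp group DC2-DCI neighbor 10.10.10.10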

 

evpn

The above snippet shows the bgp.evpn.0 routing table; now let's explore the entries:-

3:192.168.243.51:1::0::192.168.243.51/304   (3 : <RD> :: <VLAN-ID> :: <ROUTER-ID> /304)

The first digit in the prefix shows the type of EVPN route; type 3 is the Inclusive Multicast Ethernet Tag route, and it represents the multicast tunnel over which BUM (broadcast, unknown unicast, and multicast) traffic will be forwarded to this particular vRouter (compute machine). 192.168.243.51 identifies compute node 51, and :1 identifies one of the 2 VRFs created on compute 51 (RED or BLUE). Let's see more detail about this route to discover further information.

3:192.168.243.51:1::0::192.168.243.51/304 (1 entry, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 192.168.243.51:1
PMSI: Flags 0x80: Label 16: Type INGRESS-REPLICATION 192.168.243.51
Next hop type: Indirect
Address: 0x26ffb74
Next-hop reference count: 8
Source: 192.168.243.50
Protocol next hop: 192.168.243.51
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 100 Peer AS: 100
Age: 1d 15:02:05 Metric2: 0
Validation State: unverified
Task: BGP_100.192.168.243.50+56806
Announcement bits (1): 1-BGP_RT_Background
AS path: ? (Originator)
Originator ID: 192.168.243.51
Communities: target:100:8000002 target:200:1 unknown iana 30c unknown iana 30c unknown iana 30c unknown type 8071 value 64:5
Import Accepted
Localpref: 100
Router ID: 192.168.243.50

PMSI (Provider Multicast Service Interface) indicates that a tunnel is created on compute node 192.168.243.51 to handle BUM traffic, and target:200:1 identifies the BLUE tenant. The ultimate meaning of this entry is that all BUM traffic originated by or destined for compute node 192.168.243.51 for the BLUE tenant will be handled through this PMSI. So each vRouter (compute machine) will create a PMSI tunnel for each VRF/tenant to handle BUM traffic (broadcast, multicast, and unknown unicast traffic).

2:192.168.243.51:1::0::02:15:1d:44:12:30/304 (1 entry, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 192.168.243.51:1
Next hop type: Indirect
Address: 0x26ffb74
Next-hop reference count: 8
Source: 192.168.243.50
Protocol next hop: 192.168.243.51
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 100 Peer AS: 100
Age: 1d 15:10:55 Metric2: 0
Validation State: unverified
Task: BGP_100.192.168.243.50+56806
Announcement bits (1): 1-BGP_RT_Background
AS path: ? (Originator)
Originator ID: 192.168.243.51
Communities: target:100:8000002 target:200:1 unknown iana 30c unknown iana 30c unknown iana 30c unknown type 8004 value 64:7a1203 unknown type 8071 value 64:5
Import Accepted
Route Label: 24
ESI: 00:00:00:00:00:00:00:00:00:00
Localpref: 100
Router ID: 192.168.243.50

2:192.168.243.51:1::0::02:15:1d:44:12:30/304 (Type 2 EVPN routes carry MAC/IP advertisements). The route is advertised by 192.168.243.51 (which is compute 51), and target:200:1 shows this route belongs to the BLUE tenant. The next entry shows the MAC and IP address of this particular BLUE tenant VM advertised by compute 51.

2:192.168.243.51:1::0::02:15:1d:44:12:30::2.2.2.3/304 (1 entry, 1 announced)
*BGP Preference: 170/-101
Route Distinguisher: 192.168.243.51:1
Next hop type: Indirect
Address: 0x26ffb74
Next-hop reference count: 8
Source: 192.168.243.50
Protocol next hop: 192.168.243.51
Indirect next hop: 0x2 no-forward INH Session ID: 0x0
State: <Active Int Ext>
Local AS: 100 Peer AS: 100
Age: 1d 15:15:56 Metric2: 0
Validation State: unverified
Task: BGP_100.192.168.243.50+56806
Announcement bits (1): 1-BGP_RT_Background
AS path: ? (Originator)
Originator ID: 192.168.243.51
Communities: target:100:8000002 target:200:1 unknown iana 30c unknown iana 30c unknown iana 30c unknown type 8004 value 64:7a1203 unknown type 8071 value 64:5
Import Accepted
Route Label: 24
ESI: 00:00:00:00:00:00:00:00:00:00
Localpref: 100
Router ID: 192.168.243.50 

Now coming to the MP-eBGP configuration between DC-1 and DC-2, first we will configure a dynamic GRE tunnel between the two DCs so that MP-BGP routes can resolve their next hops through the inet.3 routing table.

root@ER> show configuration routing-options dynamic-tunnels to-DC2
source-address 101.101.101.101;
gre;
destination-networks {
    10.10.10.10/32;
}

root@ER> show route table inet.3

inet.3: 5 destinations, 6 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, – = Last Active, * = Both

10.10.10.10/32 *[Tunnel/300] 1d 15:38:48
> via gr-0/0/0.32773
[Tunnel/300] 1d 15:38:56
Tunnel

dc2
I have simulated DC-2 on the same MX-5 using a separate logical system. The above snippet shows the BGP configuration for DC-2, and the BGP summary for DC-2 also shows that the different routing tables are populated with the relevant NLRI (Network Layer Reachability Information) received from DC-1 through MP-BGP.

evpn1

And finally bgp.evpn.0 on DC-2 is populated with relevant EVPN NLRI transmitted from DC-1 by using MP-BGP (family evpn signaling).

 

Data Center Interconnect for Juniper Contrail (SDN Controller)

 

Juniper Contrail is a Software Defined Networking (SDN) controller which automates network provisioning in a virtual Data Center. In a traditional server/hypervisor environment there is still a need to configure and allow VLANs on the Data Center switch ports connected with servers, which involves inordinate delays due to lengthy "Change Process" approvals and dependencies on many teams. Modern data centers cannot afford such delays, as a delay in service provisioning means loss of revenue.

The scope of this blog is to discuss:-

  1. How physical servers can talk with servers deployed inside SDN environment.
  2. Layer 2 & Layer 3 Data Center Interconnect (DCI) solution between two enterprise Data Centers (DCs)

contrail

The above diagram shows the architecture of Contrail. A quick overview of Contrail's inner workings is described below; please follow the link for in-depth reading on Contrail (http://www.opencontrail.org/opencontrail-architecture-documentation/)

  1. The Contrail control node acts as the central brain.
  2. Contrail installs an instance of a vRouter on each compute node.
  3. Each vRouter on a compute node creates a separate VRF (Virtual Routing and Forwarding table) for each subnet for which Virtual Machines are created.
  4. Full mesh MP-iBGP is configured by Contrail between itself and all vRouters; overlay tunnels (MPLS over GRE, MPLS over UDP, or VXLAN) can be used to carry data plane traffic.
  5. The Contrail Control node acts as Route Reflector in the whole MP-iBGP domain.

 

Requirement 1:- A database server which is installed on a physical compute machine needs to communicate with application servers which are installed inside the SDN environment.

Solution: 

  1. Since the Contrail Control node is acting as Route Reflector (RR) for all vRouters running inside the virtual environment, MP-iBGP will be configured between the Contrail Controller and the gateway device.
  2. A VRF will be configured on the physical gateway with an appropriate route-target community, which should match the route-target community added on Contrail for the particular subnet.
  3. VPN routes always look in the inet.3 (Junos) routing table for next-hop resolution, and inet.3 is normally populated through routes learned from MPLS LSPs (RSVP/LDP). Inside the Data Center it is not possible to signal MPLS LSPs, particularly with Contrail, so we will configure dynamic GRE tunnels. These tunnels will populate inet.3 on the physical gateway and thus fulfill the requirement that the VPN next hop be present in inet.3.
  4. Physical servers which need to talk with the virtual world must have interface routes on the gateway device (which means an RVI/IRB must be configured on the gateway device for this particular VLAN). The RVI/IRB interface will be placed inside the VRF, and the statement "vrf-table-label" (the gateway is a Juniper MX router) will be configured inside the VRF. It causes automatic export of the interface routes from that particular VRF after adding the route-target community.
  5. The SDN controller will receive the physical server routes through MP-iBGP and will re-advertise them to all vRouters.
  6. Each vRouter will check the route-target community on the received routes to find any VRF with a matching route-target and subsequently install the received routes into the respective VRF.
  7. In this way end-to-end communication between the SDN environment server and the physical server will take place.

Requirement 2: L2 and L3 Data Center Interconnect is required between two DCs, each with its own SDN controller.

Solution:

  1. L3 DCI is pretty simple, and the requirement can be fulfilled through a traditional IP/VPN connection from a service provider.
  2. Enterprises traditionally depend on dark fiber or on service providers for L2 extension (VPLS etc.) between enterprise Data Centers; both options involve considerable cost.
  3. We will discuss a cost-effective solution for L2 DCI that depends neither on dark fiber nor on service provider solutions (VPLS etc.).
  4. Between the two Data Centers we just need IP connectivity; using that IP connectivity, GRE-based dynamic tunnels will be configured between the DCs.
  5. MP-eBGP will be configured between the DCs with families inet-vpn unicast and evpn signaling.
  6. Ethernet VPN is a technique which enables MP-BGP to carry MAC, ARP, and VLAN information, thus allowing Layer 2 extension between Data Centers. Follow this link to learn more about EVPN: (https://tgregory.org/2016/06/04/evpn-in-action-1/)

dci

This blog only discussed the high-level architecture for DCI and the gateway for the SDN environment; for a deep dive into the topic please read my other blog.