Alexis Dacquay – a CCIE with over 10 years of experience in the networking industry. In the past he has designed, deployed, and supported large corporate LAN/WAN networks. For the last 4 years he has specialised in high-performance data centre networking to satisfy the needs of cloud providers, web 2.0, big data, HPC, HFT, and any other enterprise for which a high-performing network is critical to the business. Originally from Bretagne, and privately a huge fan of Polish cuisine.
Topic of Presentation: Architectures for Universal Data Centre Networks, topologies and overlays
Language: English
Abstract: Network integration with single- and multi-hypervisor virtualization environments.
PLNOG 13: Alexis Dacquay: Architectures for Universal Data Centre Networks, topologies and overlays
1. Architecture for Universal DC Networks
Topologies & Overlays
Alexis Dacquay (ad@arista.com)
2. Universal architecture
§ One design, tunable for workload, deterministic any-to-any performance
§ Integrated detailed telemetry for real-time visibility
§ Based on open standards, to avoid a technology cul-de-sac
§ Simple to design, capacity plan, scale and troubleshoot
§ Open management tools/techniques
§ Enables continuous innovation and “pay as you grow” scale
3. Cloud thinking in a nutshell
Infrastructure specific to specific apps → Applications abstracted from infrastructure
Vertically integrated, proprietary stacks → Open technologies, maximum generalisation
Vendor lock-in, forklift refreshes → Best-of-breed, continuous innovation
Multiple management domains → Homogeneous, universal automation
Complex and custom architectures → Simple, repeatable and scalable architectures
IT becomes the service provider
4. Cloud Architecture: Universal Platform connecting the cloud
[Diagram: one universal platform connecting all workloads – cloud, IP storage, VM farms, big data, VDI, web 2.0, HFT, HPC, and legacy]
5. Cloud Architecture: End to End Automation
[Diagram: end-to-end automation spanning the network and the hypervisor]
6. Progressive technology adoption
Evolution not Revolution
• Gradual shift/unification of the skills base
• Phase-out of legacy applications
From: manual configuration, ad-hoc bash/Perl scripting
To: physical + virtual cloud orchestration, with automated provisioning and monitoring (Puppet, Chef, Ansible, other IT frameworks)
DevOps, meet NetOps!
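To make "automated provisioning" concrete, here is a minimal sketch that pushes a VLAN to a switch through Arista's JSON-RPC eAPI; the hostname, credentials and VLAN number are hypothetical placeholders, and a real deployment would drive this from Puppet/Chef/Ansible rather than a bare script.

```python
# Minimal provisioning sketch via Arista eAPI (JSON-RPC over HTTPS).
# Host, credentials and VLAN below are hypothetical placeholders.
import requests

def run_cmds(host, user, password, cmds):
    """POST a runCmds JSON-RPC call to the switch's /command-api endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": 1,
    }
    resp = requests.post(f"https://{host}/command-api", json=payload,
                         auth=(user, password), verify=False)  # lab only
    resp.raise_for_status()
    return resp.json()

# Provision VLAN 2000 on a leaf instead of configuring it by hand:
run_cmds("leaf1.example.net", "admin", "admin",
         ["enable", "configure", "vlan 2000", "name tenant-blue"])
```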
7. Importance of the Underlay Network
8. Clos Principles – Avoiding Suboptimal Designs
§ Multi-tier designs
- Non-equal performance and unequal hop counts (3 hops for some paths, 5 for others)
- Cumulative oversubscription can be high: per-tier ratios multiply, so 4:1 at one tier and 8:1 at the next compound to 32:1 end to end
§ The right physical topology…
- Physical architecture: Clos leaf/spine
- Consistent any-to-any latency/throughput
- Consistent performance for all racks
- A fully non-blocking architecture if required
- Simple scaling by adding new racks (see the capacity sketch below)
Spine layer: 10GbE/40GbE/100GbE, Layer 2/3
Leaf layer: 1GbE/10GbE/40GbE, Layer 2/3
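The oversubscription and scaling claims above reduce to simple port arithmetic; a sketch with illustrative, not product-specific, port counts:

```python
# Sketch of the port arithmetic behind leaf/spine capacity planning.
# All port counts and speeds are illustrative assumptions.

def oversubscription(host_ports, host_gbps, uplinks, uplink_gbps):
    """Leaf oversubscription = host-facing bandwidth / uplink bandwidth."""
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)

def max_hosts(spine_ports, num_spines, host_ports_per_leaf, uplinks_per_leaf):
    """Each leaf consumes uplinks_per_leaf / num_spines ports on every
    spine, so spine port density caps the number of leaves."""
    leaves = spine_ports // (uplinks_per_leaf // num_spines)
    return leaves * host_ports_per_leaf

print(oversubscription(48, 10, 4, 40))  # 48x10G down, 4x40G up -> 3.0 (3:1)
print(max_hosts(32, 4, 48, 4))          # four 32-port spines -> 1536 hosts
print(4 * 8)                            # tiers multiply: 4:1 then 8:1 -> 32:1
```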
9. Data Centre Transport
§ Active-active Layer 2 topologies are possible without new protocols
- The various multi-chassis link aggregation (MLAG) implementations use the known and trusted standard LACP protocol
- Achieved without new hardware or any new operational challenges
- But at large scale it faces the same challenges as the new protocols: VLAN and MAC explosion
- Layer 2 can scale to some level (considering only port counts) without requiring new protocols or hardware
Layer 2 Leaf-Spine – MLAG Design
L2-only Clos topology scaling depends on the devices' density/port count (with oversubscription):
[Chart: "10G nodes: scale with Arista leaf/spine design, L2 MLAG" – maximum 10GbE node counts per leaf/spine combination (7124 or 7050 leaves with 7050-52, 7050-64, 7504, 7508, 7504 Gen2, or 7508 Gen2 spines), ranging from 600 up to 27,264 interconnected 10GbE nodes; device scale from 48 x 10G ports to well over 1,000 x 10G ports. The arithmetic behind the bars is sketched below.]
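The bar values in that chart follow from the same port arithmetic; a generic sketch with hypothetical port densities, not data-sheet figures:

```python
# Sketch: maximum 10G node count in an L2 MLAG leaf/spine design.
# Port densities below are hypothetical, not data-sheet figures.

def mlag_scale(spine_ports, leaf_host_ports, uplinks_per_leaf):
    """Two spines form the MLAG pair; each leaf splits its uplinks across
    both, consuming uplinks_per_leaf / 2 ports on each spine."""
    leaves = spine_ports // (uplinks_per_leaf // 2)
    return leaves * leaf_host_ports

# e.g. 384-port spines, 48 x 10G host ports and 8 uplinks per leaf:
print(mlag_scale(384, 48, 8))  # -> 96 leaves * 48 = 4608 nodes (oversubscribed)
```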
But the Layer 2 approach only targets the VM-mobility challenge. What about Layer 2 scale, multi-tenancy, simplicity, and big-data environments?
10. Data Centre Transport
§ Build a Clos fabric
- Add new protocols to widen the scope of VM mobility
- TRILL-based or IEEE 802.1aq (SPB) solutions
- An L3 routing (IS-IS) model for active-active forwarding
§ Issues with large L2 networks
- Can introduce MAC address explosion issues
- VLANs are limited (4K), with no overlapping VLANs
- New hardware, with potential interop issues
- New protocols for the core/backbone: the unknown, bringing new operational and troubleshooting challenges
A physical Clos topology widens the scope of VM mobility with a new Layer 2 technology, creating a single large L2 domain.
11. For scale, the industry has converged on a Layer 3 infrastructure
§ Segmented Layer 3 design
- Routed traffic at the top of the rack
- OSPF/BGP between leaf and spine (a numbering sketch follows below)
- Proven and trusted protocols for scale
- Mature open standards for interoperability
- Minimises the size of the Layer 2 domain
- Reduces the size of the fault and broadcast domains
- A standard, scalable model for virtualised and non-virtualised solutions
Data Centre Transport
[Diagram: four racks, each its own subnet/VLAN (A–D) and Layer 2 domain, with Layer 3 between leaf and spine; the scope of VM mobility is restricted to within the rack]
Utilise tried and proven protocols and management tools
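As referenced above, the leaf/spine BGP design is repeatable enough to be generated; a sketch that derives a per-leaf ASN and /31 point-to-point addressing plan, where all ASNs and prefixes are illustrative assumptions:

```python
# Sketch: generating a repeatable eBGP plan for a leaf/spine underlay.
# ASNs and prefixes are illustrative assumptions, not a vendor scheme.
import ipaddress

SPINE_ASN = 65000
FABRIC_P2P = list(ipaddress.ip_network("172.16.0.0/16").subnets(new_prefix=31))

def leaf_plan(leaf_id, num_spines):
    """One private ASN per leaf, one /31 point-to-point link per spine."""
    links = []
    for spine in range(num_spines):
        p2p = FABRIC_P2P[leaf_id * num_spines + spine]
        links.append({"peer": f"spine{spine + 1}", "peer_asn": SPINE_ASN,
                      "spine_ip": str(p2p[0]), "leaf_ip": str(p2p[1])})
    return {"asn": 65100 + leaf_id, "links": links}

print(leaf_plan(leaf_id=3, num_spines=4))
```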
12. Data Centre Transport
§ VM mobility for compute optimisation and resilience
- For stateful vMotion/live migration, the VM's IP address must be preserved after the move
- This ensures zero disruption to any client communicating with the apps residing on the moved VM
- To preserve the IP address, a VM can thus only be moved to an ESXi host residing in the same subnet/VLAN
[Diagram: VM 128.218.10.4 moving between two hosts in VLAN 10, subnet 128.218.10.0/24, keeping its address]
VM mobility and virtualisation place a requirement on the physical network.
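That subnet constraint can be stated in a few lines; a tiny sketch using the slide's addressing (the second subnet is an assumed "other rack"):

```python
# Sketch: why the VM must stay within its subnet (slide's addressing;
# the second subnet is an assumed "other rack").
import ipaddress

vm_ip = ipaddress.ip_address("128.218.10.4")
same_rack = ipaddress.ip_network("128.218.10.0/24")
other_rack = ipaddress.ip_network("128.218.20.0/24")

print(vm_ip in same_rack)   # True  -> clients keep reaching the VM
print(vm_ip in other_rack)  # False -> the preserved IP would be unroutable
```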
13. Overlay - VXLAN Overview
§ What is an overlay network?
- Abstracts the virtualised environment from the physical topology
- Constructs L2 tunnels across the physical infrastructure
- Tunnels provide connectivity between physical and virtual endpoints
§ Physical infrastructure
- Transparent to the overlay technology
- Allows the building of an L3 infrastructure
- The physical network provides the bandwidth and scale for the communication
- Removes the scaling constraints of the physical network from the virtual one
[Diagram: a VXLAN network over a Layer 3 leaf/spine – logical Layer 2 tunnels across the physical infrastructure]
14. VXLAN as Overlay Network
15. VXLAN Refresher
§ A standardised overlay technology for encapsulating Layer 2 traffic on top of an IP fabric
[Diagram: Host 1 and Host 2 in VNI 5000, attached to VTEP A and VTEP B across the IP fabric – Layer 2 at each edge, Layer 2 over Layer 3 in the middle]
16. VXLAN Components
§ The VTEP encapsulates the Ethernet frame in a VXLAN header (see the sketch below)
- A 24-bit VNI identifier defines the VXLAN Layer 2 domain of the frame (8-byte VXLAN header)
- UDP header: the source port is a hash of the inner Ethernet header, destination port = 4789 (8 bytes)
- This allows load-balancing across an ECMP IP fabric, which needs no VXLAN awareness
- IP header with the source and destination addresses of the local and remote VTEPs (20 bytes)
- Ethernet header with the local VTEP MAC and the default router MAC (14 bytes, plus 4 optional)
The Layer 3 network core forwards packets based only on the outer IP/UDP information. The 50-byte VXLAN tunnel header wraps the original Ethernet frame:
- Outer Ethernet: next-hop MAC, local VTEP MAC
- Outer IP: remote (destination) VTEP IP, local VTEP IP
- UDP: source port hashed from the inner frame for entropy across the ECMP network, destination port 4789
- VXLAN: 24-bit VNI
- Original frame: remote host MAC, local host MAC, host IPs
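As a sanity check on the byte counts above, a short sketch that packs the 8-byte VXLAN header from RFC 7348 and sums the 50 bytes of encapsulation overhead:

```python
# Sketch: the 8-byte VXLAN header (RFC 7348) and the 50-byte overhead.
import struct

def vxlan_header(vni: int) -> bytes:
    """Flags byte 0x08 (I bit), 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    assert 0 <= vni < 2 ** 24
    return struct.pack("!II", 0x08 << 24, vni << 8)

assert len(vxlan_header(5000)) == 8
overhead = {"outer Ethernet": 14, "outer IPv4": 20, "UDP": 8, "VXLAN": 8}
print(sum(overhead.values()))  # -> 50 bytes (54 with the optional dot1q tag)
```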
17. VXLAN Control Plane - Unicast
§ Head-end replication (HER) mode
- Removes the reliance on a multicast control plane for flooding and MAC learning
- VTEPs are configured with a "flood list" of the remote VTEPs within the VNI; broadcast/multicast traffic is replicated to the configured VTEP list for the VNI (simulated in the sketch below)
VTEP flood lists are manually configured on each VTEP for each VNI:
- VTEP-1, VNI 2000 → VTEP-3, VTEP-4
- VTEP-3, VNI 2000 → VTEP-1, VTEP-4
- VTEP-4, VNI 2000 → VTEP-1, VTEP-3
1. BUM* traffic is received locally on a VTEP (e.g. VTEP-1)
2. The VTEP creates a unicast frame for each VTEP in the flood list of the specific VNI
3. A separate unicast goes on the wire for each VTEP in the VNI (e.g. one to VTEP-3, one to VTEP-4)
4. The receiving VTEP learns the inner MAC and maps it to the outer source IP (the remote VTEP)
* BUM = Broadcasts, Unknown unicasts, Multicasts
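A few lines of Python capture the HER behaviour on this slide; the flood lists mirror the configuration shown, and the frame content is a placeholder:

```python
# Sketch: head-end replication from a per-VNI flood list, as configured
# on this slide. VTEP names are labels; the frame is a placeholder.
flood_list = {
    ("VTEP-1", 2000): ["VTEP-3", "VTEP-4"],
    ("VTEP-3", 2000): ["VTEP-1", "VTEP-4"],
    ("VTEP-4", 2000): ["VTEP-1", "VTEP-3"],
}

def flood_bum(local_vtep, vni, frame):
    """One unicast VXLAN copy on the wire per remote VTEP in the VNI."""
    return [(remote, vni, frame) for remote in flood_list[(local_vtep, vni)]]

for copy in flood_bum("VTEP-1", 2000, b"<broadcast ARP request>"):
    print(copy)  # two separate unicasts: to VTEP-3 and to VTEP-4
```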
18. Network Virtualization - Capabilities
Bare metal – hardware VTEPs enable bare-metal servers to connect to virtualised workloads
Storage – hardware VTEPs can encap/decap at line rate for 10/40/100Gb high-performance storage
Services – VTEPs integrate with physical (HW VTEP) and virtual instances of network services
VMs – VTEPs can support VMs across multiple versions of virtualisation platforms
19. Network Virtualization - Optimization/Simplification
Overlay: VNI 2000 - 150.100.100.x/24
• VTEPs can automate VXLAN/MAC learning
• Ideally, automated provisioning of new workloads, segments, and tenants (service chaining)
• Integration with orchestrators automates many of the labour-intensive network workflows
20. Overlay Network
With a Layer 2-only service, the tenant networks are abstracted from the IP fabric – the SP cloud model.
[Diagram: Layer 2-only overlay over an ECMP IP fabric]
- Hardware VTEPs announce only their loopbacks in OSPF; Spine1's routing table therefore holds only rack subnets and VTEP loopbacks:
  10.10.10.0/24 → ToR1, 10.10.20.0/24 → ToR2, 10.10.30.1/32 → ToR3, 10.10.40.1/32 → ToR4
- VTEPs at 10.10.10.1, 10.10.20.1, 10.10.30.1 and 10.10.40.1; the default gateway is reached over ECMP
- Virtual servers in VLAN 10 (192.168.10.4–.6, .9) and VLAN 20 (192.168.20.4–.6) are mapped to VNIs; VLAN translation on the VTEP lets a physical (bare-metal) server in VLAN 100/200 (192.168.10.6, 192.168.20.7) join the same segments
- Tenant default gateways per VRF (VRF-1, VRF-2) serve VNI-100, VNI-200 and VNI-300
21. Overlay Network
§ The overlay network provides transparency
- Scalable Layer 2 services across a Layer 3 transport
- Decouples the requirements of the virtualised environment from the constraints of the physical network
- Tenant networks are transparent to the transport, for Layer 3 scale
- Multi-tenancy with a 24-bit tenant ID (over 16 million segments versus 4K VLANs) and overlapping VLANs
- The network becomes a flexible bandwidth platform
[Diagram: VNIs 1000, 2000 and 3000 as transparent L2 services in the overlay network, riding the physical infrastructure as a Layer 3 transport]
Scalable, multi-tenant Layer 2 services transparent to the Layer 3 transport network.
23. VXLAN Deployment Solutions
Hardware VTEP – small-scale DC and DCI solution, no multicast requirement*
• Manually configured or automated VTEP endpoints
• Traffic flooded via HER distribution
• Flow-based MAC learning
• No need for multicast in the IP fabric: unicast only
• Suitable for DCI solutions and small-scale intra-DC solutions, due to the manual configuration

Software VTEP – automated VXLAN, virtual only
• Automated VTEP endpoints
• A network virtualisation controller configures the virtual endpoints
• For virtual switches only
• Also supports protocols other than VXLAN (e.g. GRE)
• No communication between virtual and physical equipment

HW + SW VTEP – automation and integration with a third-party controller and Cloud Management Platform
• Automated VTEP endpoints
• Driven by an orchestrator (Cloud Management Platform)
• The CMP integrates with a third-party network virtualisation controller (NSX, Nuage, PLUMgrid, etc.) or uses dedicated drivers (e.g. OpenStack)
• MAC address learning between software and hardware VTEPs
• VNI provisioning via a centralised controller
• A solution for scalable DCs with HW-to-SW VTEP automation
*check the integration roadmap with your vendors of choice
24. VXLAN Integration
[Diagram: integration stack]
- Cloud Management Platform ↔ Network Virtualisation Controller: network information exchanged via APIs, XMPP, MP-BGP, etc.
- Network Virtualisation Controller ↔ network (software and hardware): port config, MAC, VLAN, VXLAN (VNI, VTEPs) pushed via OVSDB, OpenFlow, APIs, etc.
- The network comprises virtual switches and the physical IP fabric
HW+SW networks operate with:
• Head-end replication (direct or proxy)
• BUM traffic to remote VTEPs (SW and HW)
• HW and SW MAC learning on connected VNIs (or pre-provisioned)
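OVSDB, one of the mechanisms named above, is plain JSON-RPC (RFC 7047); a minimal sketch that asks a VTEP which databases it serves. The address is hypothetical; 6640 is a commonly used OVSDB port, and a hardware VTEP typically exposes the hardware_vtep schema.

```python
# Sketch: OVSDB (RFC 7047) is plain JSON-RPC; list the databases a
# hardware VTEP serves. Address is hypothetical; 6640 is a common port.
import json, socket

def ovsdb_list_dbs(host, port=6640):
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(json.dumps({"method": "list_dbs",
                                 "params": [], "id": 0}).encode())
        return json.loads(sock.recv(65536))

# A hardware VTEP typically serves the "hardware_vtep" schema:
print(ovsdb_list_dbs("vtep1.example.net"))
```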