1. The End of the Router?
Hannes Gredler
hannes@rtbrick.com
2. • Past: Director Product Management with Juniper
• 18 years work experience @ Juniper, Cisco,
Tellium, Infinera & Tata Consultancy
• MS (ECE), Rutgers NJ, MBA IIM Ahmedabad
• Software expertise in platform infrastructure,
applications, OO and component software
• Global PLM exposure with market segments
using advanced routing technologies
• 2 Patents, 3 Publications
• Past: Distinguished Engineer with Juniper
• 19 Years working experience, developing,
deploying and supporting Network Software
• Expertise: BGP, Link-state IGPs, MPLS
• 20+ Patents
• 20+ Proposed Standards, RFCs
• http://www.arkko.com/tools/allstats/hannesgredler.html
• IETF WG chair (IS-IS)
Exploring the Wild, Wild West
Pravin Bhandarkar
CEO & Founder
Hannes Gredler
CTO & Founder
3. The “path of the more”
- Chassis Based
- Proprietary Base OS
- Closed databases
- One/few Management planes
- Hardware Optimized
- Curated Software Release
- Waterfall Development
- Black-box tests
The “path of the less”
- Pizza Box Based
- Linux (ONL, Ubuntu)
- noSQL Type Value Stores
- Many Management planes
- Developer Optimized
- micro-Service Architecture
- Continuous Integration
- Integration/Acceptance tests
4. BUILD A SYSTEM OF LITTLE BRICKS
• Microservice architecture / UNIX pipeline model
• Small pieces of software, serving a unique purpose
• Easy transfer of state from one stage to next
• Every node is a filter / transformer
[Diagram: I/O Handler → Input policy → BestPath Selection → Output policy → I/O Handler]
iod -p bgp -h 192.168.1.1 | bgp_policy_i -f customerA-in | bgp_best_path_default | bgp_policy_o -f core-out | iod -p bgp -h 192.168.1.2
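As a sketch of the filter/transformer idea (a hypothetical illustration, not the actual rtbrick tooling), a pipeline stage can be a small process that parses state objects from the previous stage, applies its one purpose, and re-serializes the survivors for the next stage:

```python
import json

def policy_in(route):
    # Hypothetical input policy: drop routes whose next hop is in 10.0.0.0/8.
    if route["next_hop"].startswith("10."):
        return None
    return route

def run_stage(lines):
    # One pipeline stage: parse each JSON object coming from the previous
    # stage, apply the policy, and re-serialize survivors for the next stage.
    out = []
    for line in lines:
        route = policy_in(json.loads(line))
        if route is not None:
            out.append(json.dumps(route))
    return out
```

Because each stage only speaks a serialized state format on its input and output, stages compose exactly like UNIX pipeline commands.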
7. • Prevailing resiliency model
• Large units, 1:1 box redundancy, hard-state coupling, bi-polar availability
• Failure is the exception
• Swarm-based resiliency model
• Small units, network redundancy, soft-state coupling, gradual degradation
• Failure is the norm
RESILIENCY
13. Net-Centric 2016
• Issue: lack of abstraction / model
• Every present management protocol is manually bolted onto some implementation of some state-keeping entity
• All bolt-on management protocols are condemned to lag (and hence fail)
MANAGEMENT PLANE (1)
17. •Bit-optimized, binary protocols from the 80s
• Emphasis on state-representation rather than state-flow
• Micro-optimization thinking creeping through system design
• Re-inventing the wheel for every protocol
• FSM, In-memory-DB, Serializer/De-serializer, Flow-Control
• No sense of parallelism, horizontal scaling and fault-domains
CONTROL-PLANE
18. Feature Development time
ON PRODUCTIVITY
• Only 5% of time writing Application code
• Lots of repetitive tasks (re-inventing the wheel)
• IMDB, HA, config processing and UI management carry a perpetual cost
[Diagram: development time split across In-Memory Database, Config Parser, diff processing (“figure out what to do”), Proto/IPC Parser, UI/Mgmt, HA, and transfer to SQA / bug triaging / early support cycle, with App Logic only a small slice]
19. Hardware vs. Dev-Time optimization?
[Diagram, today: Vendor (Development, SQA-Test) hands off to Service Provider (Acceptance-Test, Deploy); Time to Revenue (TTR) O(months)]
[Diagram, proposed: Vendor & Service Provider share one Dev → Test → Deploy loop; Time to Revenue (TTR) O(days)]
21. Database centric / Distributed Data Store
bds://local/bgp.neighbor
bds://local/isis.adj
bds://local/isis.lsdb.l2
bds://217.160.181.216/bgp.rib-in
PUBSUB
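The pub/sub idea behind such a data store can be sketched in a few lines (a toy model with invented names; the real BDS API is not shown here): consumers subscribe on a table path, and every insert is fanned out to them:

```python
class DataStore:
    """Toy table store with publish/subscribe, in the spirit of bds:// paths."""

    def __init__(self):
        self.tables = {}       # path -> list of objects
        self.subscribers = {}  # path -> list of callbacks

    def subscribe(self, path, callback):
        self.subscribers.setdefault(path, []).append(callback)

    def publish(self, path, obj):
        # Insert the object, then fan it out to every subscriber of the table.
        self.tables.setdefault(path, []).append(obj)
        for cb in self.subscribers.get(path, []):
            cb(obj)

store = DataStore()
adjacencies = []
store.subscribe("local/isis.adj", adjacencies.append)
store.publish("local/isis.adj", {"neighbor": "router-b", "state": "up"})
```

The producer never knows who consumes its state, which is what makes the components independently replaceable.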
25. • No true programmability
• Architecture centered around IP / MPLS and ACLs thereof
• Hard-coded pipelines
• No notion of “code”-like branching, nesting, looping
• Forwarding tables oversized
• Every prefix may carry 100% of link speed. Really?
• No support for hardware/software hybrids
• Lack of host-path bandwidth
DATA-PLANE
27. • All features packaged at compile time
• Slows down everybody else
• Black-box testing
• Test matrix grows N^2
[Diagram: statically compiled monolithic NOS bundling IS-IS, BGP, RSVP, LDP, Netflow, Sflow, OSPF, TRILL, STP, PIM, L3VPN, L2VPN, SR]
CURATED RELEASE MODEL (1)
28. • Fix: plugin architecture
• Load plugins at runtime
• Load (& pay for) what you need
• Faster test qualification
• Only a small subset of the test matrix executed
[Diagram: modular NOS with core infra (DB, IPC, PKG) and IS-IS, BGP, SR, Netflow as dynamically loaded libraries]
CURATED RELEASE MODEL (2)
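The plugin idea can be sketched as follows (a toy registry in Python; a real NOS would load shared libraries via dlopen or a package manager, and the class name here is invented):

```python
import importlib

class Nos:
    """Toy modular NOS: protocol plugins are registered at runtime."""

    def __init__(self):
        self.plugins = {}

    def load_plugin(self, name, module_path=None):
        # In a real system this would dlopen() a shared object shipped as its
        # own package; here we simply import a Python module by name.
        module = importlib.import_module(module_path or name)
        self.plugins[name] = module
        return module
```

Only the plugins actually loaded need to be qualified together, which is what shrinks the test matrix.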
30. The Lindy effect is a theory of the life expectancy of non-perishable things, like a technology or an idea: every additional day of survival may imply a longer remaining life expectancy.
THE LINDY EFFECT (1)
Dear <name>, thanks for giving me the opportunity to present my upcoming venture, called ‘rtbrick’.
As the name ‘brick’ implies, we want to build modular, scalable routing software targeted at a disintegrated network market.
As a side effect this also deviates from the prevailing industry resiliency model.
Because there can always be more than one worker for a given state-keeping entity, failure of a component is no longer the exception but the norm: new processes join the swarm, old ones leave, and so on.
If the system is built from small components that may fail at any time, and merely degrades when one of them does, the result is a very robust system.
An engineer from the 90s is obsessed with performance at the expense of abstraction and re-usability.
Therefore a lot of the major state-keeping modules are kept together and often compiled to run in one process. There is a high price to pay for the ease of state-sharing between components:
all state-handling entities become one large, fragile fault-domain.
Furthermore, multiple CPU cores can only be leveraged by making the code even more brittle, using multithreading inside that single process.
Let's start with the management plane:
The observation is that every management protocol (whether it is SNMP, NETCONF or anything else) constantly trails the original feature or functionality.
Why is this? The issue is a general lack of abstraction across all state-keeping components inside a router.
Because of this, every management method (add/change/delete/show) has to be manually coded and bolted on top of every state-keeping entity; worse, this has to be repeated for every management method in every management protocol.
Even assuming just one management method per management protocol, the dilemma is clear: one has to code every connector between a management method and a state-keeping entity.
The anticipation is that the number of such connections in a modern router is on the order of thousands.
So the fix for this is actually straightforward: rather than bolting every management method onto its state-keeping entity, one first stores all state in a fast in-memory database.
Management methods are then implemented simply by routing them to the correct element in the database; done right, this can even happen at runtime, in the field, by the network operator. More importantly, the any-management-method to any-state-keeper full mesh has been broken down into a star, which is a much smaller problem to tackle: one no longer writes management methods, only generic connectors, one per management protocol.
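The star topology can be sketched like this (all names hypothetical): every state keeper writes into one central store, and a single generic connector per management protocol routes any management method against it:

```python
class StateStore:
    """Central in-memory store that all state keepers write into."""

    def __init__(self):
        self.state = {}  # (table, key) -> object

    def add(self, table, key, obj):
        self.state[(table, key)] = obj

    def delete(self, table, key):
        self.state.pop((table, key), None)

    def show(self, table):
        return {k: v for (t, k), v in self.state.items() if t == table}

def generic_connector(store, method, table, key=None, obj=None):
    # One connector per management protocol, not one per state keeper:
    # it just routes the management method to the store.
    if method == "add":
        store.add(table, key, obj)
    elif method == "delete":
        store.delete(table, key)
    elif method == "show":
        return store.show(table)
```

A new management protocol now costs one connector instead of one bolt-on per state-keeping entity.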
Which gets us to the state-keeping part of the control-plane ...
I could argue that the protocols we deal with here are bit-optimized binary protocols from the 80s, but again, per the Lindy effect, these protocols will be around for quite some time.
Perhaps my biggest point about control-plane protocol implementation is that many implementors fall into the trap of internalizing IETF specifications, which put more emphasis on the "representation" of state than on how state should flow through the system.
As such, implementors often miss the parts common to all protocols: protocol parsers, hierarchical protocol encoders, in-memory databases, state compression, and so on.
Furthermore, there is no sense of parallelizing operations, and that has an important impact on both resiliency and even utilization of all CPU cores of a given system.
Most routing stacks I have seen are C/C++ based, and the size of the software module (> 1.5M LoC) has exceeded any practical reason.
Furthermore, there is no CASE tooling such as model-based code generation; everything is hand-crafted.
An unsolved problem for all but one vendor (Alcatel-Lucent TiMOS) is that they do not utilize more than one CPU core for a given protocol. In other words, the software gets slower every year.
The story on utilizing more than one CPU core is not convincing: most vendors need to fundamentally rewrite their software to take advantage of the new normal.
Even mobile phones today have 4 cores or more.
But the worst thing is the lack of whitebox testing and the split responsibility between software developer and software tester.
Today's testing is black-box testing: the entire router is deployed in a heavy-weight lab or in a heavy-weight (Type-1 hypervisor) virtual environment, and test scripts generate external stimulus and check internal state transitions.
The practical problem is that the person doing the development is not the one running the test, which leads to interfacing issues when, e.g., the code is of bad quality. This interfacing between parties translates into additional latency.
The inherent promise of DevOps is to reduce the time-to-revenue of new network functionality, which today is on the order of months.
Using our system, new network functionality can be developed, tested and rolled out in a few days.
The core of the system is a distributed data store which holds every piece of state in the system. Whether it is interface information, IS-IS adjacencies, BGP RIBs or forwarding tables, every state is stored in the back store.
The Brick Data Store (BDS) is a model-driven indexing and data-replication vehicle.
BDS has been designed for speed and consistency: data insertion can progress as fast as 1M updates per second per CPU core.
The back store is configured by defining tables, objects and their attributes, akin to an SQL server, using a JSON config file.
The back store also allows one to quickly locate the originator of an object and to generate local replicas of that data for local (in-situ) processing.
BDS is fully horizontally scalable: if there are large tables (for example RIB-ins), it can shard the workload across a set of worker processes, all without changing a single line of code.
This database-centric design allows us to do all the cool HA things, like live software upgrades and restarting components without loss of service. Furthermore, this design paradigm greatly minimizes the amount of boiler-plate code one has to write to develop new networking code.
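Two of the ideas above, a JSON table definition and hash-based sharding across workers, can be illustrated as follows (the schema fields and function names are invented for illustration, not the actual BDS schema):

```python
import json

# Hypothetical table definition in the spirit of a JSON-configured back store.
SCHEMA = json.loads("""
{
  "table": "bgp.rib-in",
  "key": "prefix",
  "attributes": ["prefix", "next_hop", "as_path"]
}
""")

def shard_for(key, num_workers):
    # Pick a worker for an object key; resharding only changes num_workers,
    # not the application code that inserts objects.
    return hash(key) % num_workers

def insert(shards, obj, num_workers):
    # Validate against the declared attributes, then place in a shard.
    assert set(obj) <= set(SCHEMA["attributes"])
    shards.setdefault(shard_for(obj[SCHEMA["key"]], num_workers), []).append(obj)
```

Because the sharding function lives in the store rather than in the application, scaling out is a configuration change, not a code change.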
Now let's come to the data plane. The main issue I see here is the lack of true programmability: everything existing is centered around the three data-plane protocols, and whenever you want to add some new combination (e.g. MPLS context tables), you need a re-spin of the ASIC. It would be desirable to treat a data-plane forwarding path as flexibly as code, with all the usual concepts like branching, looping and subroutines.
I also think most current data-planes are overdone (read: too expensive); a lookup hierarchy with hot, cold and software-served prefixes would again speed up getting new functionality into the network.
My last point of criticism is the concept of a system-wide "release".
The prevailing model is a statically compiled binary with all features, even the ones that you do not need or want. The main disadvantage is that you need to test the full feature matrix and cannot take any shortcuts, as you do not know which features will be activated.
A full-release test cycle may take several weeks to qualify.
In any modern software system you have a wide variety of software packages and patches, and usually a good package manager that keeps the zoo of combinations under control. Why can we not package state-keeping entities as small packages that interact with each other via some extensible state-exchange protocol?
Web architectures with JSON as their state-sharing vehicle come to mind.
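A sketch of JSON as the state-sharing vehicle between two small packages (the component and function names are invented for illustration):

```python
import json

def isis_export_adjacency(neighbor, state):
    # Producer: a hypothetical IS-IS package serializes an adjacency as JSON.
    return json.dumps({"proto": "isis", "neighbor": neighbor, "state": state})

def topology_import(message, topology):
    # Consumer: a separate topology package deserializes and stores it.
    # Unknown fields are simply ignored, which keeps the exchange extensible.
    msg = json.loads(message)
    topology[msg["neighbor"]] = msg["state"]
    return topology
```

Either side can be repackaged or upgraded independently, as long as both keep speaking the same self-describing state format.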
Something called the Lindy effect applies to all technologies and ideas.
It says that the longer a technology or idea has been around (and proven useful), the longer it is ultimately going to stay around. Call it human inertia or whatever; it is a very empirical effect.
Automobiles are a good example of the Lindy effect:
the concept of an engine-powered carriage on 4 wheels that gets one or more persons from A to B has been around now for more than 100 years.
What has changed is all the components inside: transmission, fuel, autonomous piloting software, entertainment systems, and how these cars are engineered and designed.
What has not changed is the original idea (and also the favorite color of buyers).
If Lindy is right, the core idea of the automobile is going to be around for another 100 years.
If Lindy is right, the core idea of the router is going to be around for another 30 years; however, just like the automobile, a lot has to change in its components.
How can the networking industry evolve? I think we are right now witnessing an industry transformation that has happened in other verticals as well. In the 60s the car industry was in control of its designs and everything between steering wheel, engine and tires.
The car industry figured out that it could not innovate fast enough, and hence opened its eco-system to suppliers, each specialized in their domain.
Today they are just owners of the brand and the integration process; pretty much all moving parts come from suppliers.
Opening the full eco-system of components to suppliers would mean that the three planes (management, control and data plane), respectively steering wheel, engine and tires, get designed and implemented using best-in-class suppliers in each segment.
I am convinced that if the brand owners in the networking space do not open up their eco-systems, then for sure their competition will.
If that happens, then innovation is coming back again, and the answer to "have we reached the end of the router?" is hopefully going to be "No".
Thank you !