PLNOG14: SteelCentral NPM Solution - Tomasz Winiarski
1. Copyright 2014 Riverbed Inc. Confidential.1
Tomasz Winiarski
Solutions Architect EE - CISSP, CISA, CSSK, CRISC
SteelCentral NPM Solution
Let’s talk about visibility. The network is rarely the culprit.
Typical process for troubleshooting an issue
• User calls Helpdesk about the network being slow.
• Helpdesk opens a ticket and sends it to a network engineer.
• Network engineer checks his tools, says it’s not in the network, and forwards the ticket to the server team.
• Server team says, my servers are fine, pass to app team.
• App team says, our apps are fine, pass back to Helpdesk.
• Helpdesk calls the user and says they don’t see any issues.
Proactive Problem Identification
[Figure: MTTR timeline of events (application slows, call to help desk), comparing find-and-fix time with SteelCentral analytics and without analytics. With analytics, alerts to problems arrive sooner than users report them.]
Automates analysis of performance changes to provide proactive notification of issues
• Continuously learns network / application behavior to identify abnormal changes
• Focuses on errors and responsiveness issues that reflect deteriorating user experience
• Provides contextual evidence to streamline diagnosis
Business Intelligence, Not Data
• Service Dashboards
• Dependency Mapping
• Analytics
Out-of-the-box Dashboard Templates
• Simplify out-of-the-box deployment
• Increase adoption by more and different types of users
• Improve information sharing & collaboration
• Enable faster identification of trends and issues with graphical views
Application Intelligence via DPI
• Detect which applications are traversing through your network
• Recognize popular business and recreational applications
• No special configuration required – built into SteelHead & NetShark
• Gain insight into app type, usage - who used it, when, from where
• Easy to apply QoS rules based on NetProfiler data
• Quantify bandwidth usage of business critical apps
• Identify real-time apps that need more bandwidth
Extended QoS Visibility
• Understand whether per-application quality of service settings in SteelHead are meeting expectations, using the new shaping and enforcement reporting in SteelCentral NetProfiler
• Identify traffic flow per class/user and drill into the areas that matter (applications/users/IPs per class)
• Detect under- or over-provisioned rules
VoIP Call Quality Visibility
• Monitor VoIP with at-a-glance dashboard views
– % RTP packet loss (RTP: Real-time Transport Protocol)
– Jitter
– MOS (mean opinion score)
– R-Factor
• Gain insight into branch-to-branch or peer-to-peer traffic
Comprehensive VoIP Reporting
Site to Site
QoS
VoIP quality metrics
• Jitter
• Packet loss
• MOS
• R-Factor
Virtualization Leader
Only vendor to support all forms of virtualization monitoring
Server Virtualization
• VMware ESXi
Desktop/App Virtualization
• Citrix XenApp
• VMware View
Network Virtualization
• VMware NSX
SteelCentral Dashboards
RPM Dashboards
Web Analyzer, AppInternals, AppResponse, NetProfiler
• Web-based dashboards blend data from multiple solutions to provide end-to-end service monitoring
– Increase adoption by more and different users
– Improve information sharing & collaboration
– Identify trends and issues faster
• Customizable, role-specific views
– Executive dashboards, level-1 support, tier-n troubleshooting, app support, and more
• Contextual drill-down from enterprise views to root cause analysis
Cascade analytics allow you to respond to IT performance problems faster, even before the user calls to complain.
Without analytics, IT would only start looking for the problem when the user calls the help desk.
With Cascade analytics, as soon as an application metric goes outside its normal range, IT is alerted and can start looking for the problem right away. It may even be fixed before it starts impacting user performance.
In addition, each analytics alert provides contextual evidence that further accelerates the triage process.
Behavioral analytics are the key to accelerating problem resolution. Cascade can track dozens of metrics such as response time, throughput and number of network connections, and alert the administrator as soon as any of them goes outside the normal range. Analytics are completely automatic and dynamic – you don’t need to set any hard-coded thresholds. Cascade can even detect daily and weekly behavior patterns.
Analytics tracked:
• Connections: active connections, connection bandwidth, new connections
• Efficiency: number of TCP resets, TCP retransmission bandwidth
• User Experience: average app throughput per connection, average connection duration, response time
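The idea of learning a per-metric baseline instead of setting fixed thresholds can be illustrated with a minimal sketch. This is not Cascade's actual algorithm, which these notes do not describe; it assumes hourly samples, a mean and standard deviation kept per hour-of-week slot (so daily and weekly patterns get separate baselines), and a hypothetical `tolerance` of three standard deviations. The function names are illustrative only.

```python
from collections import defaultdict
from statistics import mean, stdev

HOURS_PER_WEEK = 168  # one baseline slot per hour of the week


def build_baseline(samples):
    """Learn (mean, stdev) per hour-of-week slot.

    samples: iterable of (hour_index, value), where hour_index counts
    hours since an arbitrary start (assumed format, for illustration).
    """
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour % HOURS_PER_WEEK].append(value)
    # A slot needs at least two samples before a deviation can be computed.
    return {slot: (mean(vals), stdev(vals))
            for slot, vals in buckets.items() if len(vals) >= 2}


def is_abnormal(baseline, hour, value, tolerance=3.0):
    """Flag a value that leaves the learned range for its time slot."""
    slot = hour % HOURS_PER_WEEK
    if slot not in baseline:
        return False  # nothing learned yet for this slot
    mu, sigma = baseline[slot]
    # Guard against a zero deviation when all history values were equal.
    return abs(value - mu) > tolerance * max(sigma, 1e-9)
```

With four weeks of history, a connection count of 100 can be normal at 09:00 and still raise an alert at 03:00, which a single static threshold cannot express.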
Dependency Mapping - You can’t monitor what you don’t know about
Uses real-time and historical traffic flows to automatically identify all components involved in delivering an application service to the end user
Discovers across all tiers of a multi-tier app, including load balancers
Fast, accurate, complete, easy to use and keep up to date
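As a rough illustration of discovery from traffic flows, the sketch below builds a client-to-server dependency map from flow records and walks it from a front-end host to collect every tier behind it. The record layout `(client, server, server_port)` and both function names are assumptions made for this example; the product's discovery wizard is not documented in these notes.

```python
from collections import defaultdict, deque


def build_dependency_map(flows):
    """Map each client to the (server, port) pairs it was seen talking to.

    flows: iterable of (client, server, server_port) tuples
    (an assumed, simplified flow record).
    """
    deps = defaultdict(set)
    for client, server, port in flows:
        deps[client].add((server, port))
    return deps


def discover_service(deps, front_end):
    """Breadth-first walk from the front end through all back-end tiers."""
    seen = {front_end}
    queue = deque([front_end])
    while queue:
        host = queue.popleft()
        for server, _port in deps.get(host, ()):
            if server not in seen:
                seen.add(server)
                queue.append(server)
    return seen
```

Starting the walk at a load balancer returns the web, application, database and DNS hosts reachable from it, while hosts that only appear in unrelated flows stay out of the service definition.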
Analytics enable proactive outage avoidance.
Cascade service dashboards provide a quick view into the end-to-end health of an application or service and enable a top-down approach to troubleshooting. Application services are created using the discovery wizard which automates the process of mapping transactions to their underlying infrastructure, so that service definitions are always accurate and up to date. So when we say that we’re monitoring service health, we mean we are monitoring ALL of the components involved in delivering the service to the end user: users, web servers, load balancers, application servers, authentication and DNS servers, databases and the links between them. That’s true end-to-end service visibility. And this is a significant differentiator for Cascade.
NOC – apps (HTTP, SSL, SIP, SMB, H.323), top hosts, top apps, top ports, apps (social), top app servers
WAN – Optimized WAN, Optimized LAN, Non-optimized WAN, Non-optimized LAN, Top network interfaces
Service Dashboard – Service health, service health by location, Service map, location map, current service events
VoIP Call Quality and Usage – Avg MOS, Avg Jitter, % RTP Loss Packets, Traffic Volume, Host Group Pairs
VoIP Quality of Service – DSCPs (EF)
Response Time – Response Composition Chart, Server Delay for Top App Servers, Server Delay for Hosts using App WEB, Top Host Groups by Net RTT, Top Host Groups by Server Delay
Cascade Profiler enables IT managers to determine how VoIP services are performing in conjunction with data resources in order to make effective capacity planning and optimization decisions, and improve end user experience. Supported VoIP quality metrics include mean opinion score (MOS), R-Factor, packet loss and jitter.
Jitter: The inter-packet arrival variation that results from variable delay in packet transmissions; excessive jitter results in “early” and “late” packet delivery and discards at the receiving jitter buffer.
Packet Loss: The number of VoIP packets that are discarded by the network due to congestion or packet corruption.
MOS and R-Factor describe perceived VoIP quality.
MOS scores range from 1 to 5, where 1 is the lowest and 5 is the highest perceived quality.
R-Factor scores range from 0 to 100, where a higher score means better quality.
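To make the relationship between these metrics concrete, here is a small sketch of two standard calculations: the R-Factor to MOS conversion from the ITU-T G.107 E-model, and the running interarrival jitter estimate from RFC 3550. Whether Cascade Profiler computes its values exactly this way is not stated in these notes, so treat this as an illustration only. Note that the E-model conversion tops out at 4.5 rather than 5. The function names and the millisecond units are assumptions.

```python
def r_factor_to_mos(r):
    """Convert an R-Factor (0-100) to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R; clamp it.
    return max(1.0, mos)


def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate as defined in RFC 3550.

    Both lists hold per-packet times in the same unit (ms assumed here).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference between receiver spacing and sender spacing.
        d = ((arrival_times[i] - arrival_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        # Exponential smoothing with gain 1/16, per the RFC.
        jitter += (abs(d) - jitter) / 16
    return jitter
```

Packets sent every 20 ms and received every 20 ms produce a jitter of zero; any variation in arrival spacing raises the estimate gradually rather than all at once.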
Supported VoIP protocols include:
Cisco SCCP “Skinny”
SIP
H.323
Virtualization brings with it great promise of flexibility, cost savings, and security. Most importantly it allows IT infrastructure to dynamically adapt to the needs of applications. This value comes with a cost in the form of increased complexity for Network and IT operations.
Cascade 10.0 provides network and IT operations a complete solution for managing network and application performance across all major virtualization areas.
This series of screenshots shows Cascade’s top-down troubleshooting in action and how service dashboards, discovery and analytics all play a role in accelerating the triage process.
The top-down approach is much quicker and more cost-effective than inspecting one interface at a time, as many competing products require.
Solving a tough problem may involve digging deep into the packet level. However, some tough problems require a wide view: for example, seeing where else the problem has occurred can point to its cause.
Cascade
• Visibility helps maintain performance
– Troubleshoot performance
– Validate QoS
– Monitor branch experience
• Cascade is the only NPM solution with packet capture integrated directly into Steelhead appliances
• Cascade ensures Steelhead optimization decisions are performing as expected
Steelhead
• WAN optimization accelerates thick traffic
– Application acceleration
– Transport optimization
– Data streamlining
• QoS improves thin traffic performance
– Accurate classification
– Bandwidth reservation
– Effective prioritization
• Continuous Shark packet capture built in
– Steelhead transforms into a virtual packet capture device
– Deep application-level visibility
– Integrated reporting of critical optimized traffic (CIFS, PCoIP, VDI) for ensuring QoS targets