How does Netflix stay on top of the operations of its Internet service with millions of users and billions of metrics? With Atlas, its own massively distributed, large-scale monitoring system. Come learn how Netflix built Atlas with multiple processing pipelines using Amazon S3 and Amazon EMR to provide low-latency access to billions of metrics while supporting query-time aggregation along multiple dimensions.
10. A Word About Me …
• About 20 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1599 days (4y:4m:15d)
• Before at Netflix: Service Delivery in the IT/Ops, troubleshooter, Builder of Python Things[tm]
• Current role: Cloud Monitoring
• We build platforms
• Sometimes we make them easy to use
Friday, November 15, 13
21. A Word About Netflix …
Freedom and Responsibility Culture
• Optimize speed of innovation
Constrain availability
Cost will be what cost will be
• Hire smart (experienced) people
Get out of their way
• Anti-process bias
30. A Word About Netflix …
Technology and Operations
• Service Oriented Architecture
• Decentralized Operations. You:
  • Build
  • Test
  • Deploy
  • Set up alerting and monitoring
  • Wake up at 2AM
36. A Word About Netflix …
Technology and Operations
• AWS-based for 100% of streaming*
• Huge expansion
• Customer Growth
• New markets
• Metrics
43. In the Old Days …
Our Old Alerting System
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system
Copyright: State Library of Victoria Collections. CC Attribution 2.0 License
Copyright USAID Microlinks. CC Attribution 2.0 License
Copyright: http://www.flickr.com/photos/s_w_ellis
CC Attribution 2.0 License
52. In the Old Days …
Our Old Telemetry System
• Spare-time effort by a lone sysadmin
• Loved by developers
• Custom TCP protocol
• RRD file back-end storage
• Mostly Perl
• Datacenter-bound (and limited)
• Starting to falter under metrics growth
Copyright: http://www.flickr.com/photos/acme
CC Attribution 2.0 License
57. Speaking of Growth
By way of comparison
• Every person in the world
• twice
• Every smartphone in the world
• ten times
59. So We Built Something Better
UI Layer Fronts Multiple Systems
[Diagram: a UI layer in front of Atlas, Epic, and CloudWatch]
Copyright: http://www.flickr.com/photos/76651030@N02/
CC Attribution 2.0 License
60. So We Built Something Better
Clear Regional Separation
• And aggregation
[Diagram: the UI over Atlas, Epic, and CloudWatch, with a global tier above the regional tiers us-east-1, us-west-1, us-west-2, and eu-west-1]
61. So We Built Something Better
Localized Node/Metric Identification
Before: “Here’s a metric!” / “I think you’re Bob”
Now: “I’m Bob. Here’s a metric!” / “OK!”
81. So We Built Something Better
What’s a Metric?
• com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
• 256 characters aren’t enough!
• This is better:
nf.ami       ami-aa5166ef
nf.app       wp
nf.cluster   wp-batch
nf.asg       wp-batch-v163
nf.country   us
nf.node      i-097c0e52
nf.region    us-west-1
nf.zone      us-west-1b
class        nccp
type         request
uiversion    UI_169_mid
action       authorization
devtype      101
clver        PHL_0AB
geo          us
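The difference between one long dotted name and a set of tags can be sketched in a few lines of Python. This is only an illustration of the idea, not Atlas code: the tag names and values come from the slide, while the `matches` and `total` helpers, the second series, and the numeric values are invented for the example.

```python
# Illustrative only: a metric identified by a flat set of key/value tags.
# Tag names and values are taken from the slide.
slide_tags = {
    "nf.ami": "ami-aa5166ef",
    "nf.app": "wp",
    "nf.cluster": "wp-batch",
    "nf.asg": "wp-batch-v163",
    "nf.country": "us",
    "nf.node": "i-097c0e52",
    "nf.region": "us-west-1",
    "nf.zone": "us-west-1b",
    "class": "nccp",
    "type": "request",
    "uiversion": "UI_169_mid",
    "action": "authorization",
    "devtype": "101",
    "clver": "PHL_0AB",
    "geo": "us",
}

# Hypothetical second series from another node; only two tags differ.
other_tags = {**slide_tags, "nf.node": "i-00000000", "nf.zone": "us-west-1c"}

# (tags, value) pairs; the values are made up.
datapoints = [(slide_tags, 12.0), (other_tags, 30.0)]


def matches(tags, wanted):
    """Return True if every wanted key/value pair is present in tags."""
    return all(tags.get(key) == value for key, value in wanted.items())


def total(points, wanted):
    """Sum the values of all points whose tags match the wanted pairs."""
    return sum(value for tags, value in points if matches(tags, wanted))


# Any tag, or combination of tags, can be used to slice the data.
cluster_total = total(datapoints, {"nf.cluster": "wp-batch", "action": "authorization"})
zone_total = total(datapoints, {"nf.zone": "us-west-1b"})
print(cluster_total, zone_total)
```

With a dotted name, each of these slices would need its own pre-agreed naming scheme; with tags, the selection is just a filter on whichever keys matter for the question at hand.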
91. So We Built Something Better
Powerful queries
• Make the complex possible
• Make the simple … sort of hard
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum&e=now-5m&s=e-3h
http://atlas/api/v1/graph?q=nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,name,employeeinfo_api,:eq,:and,:sum,(,nf.zone,),:by&e=now-5m&s=e-3h
http://atlas/api/v1/graph?q=sps,nf.cluster,(,nccp-legacy,
nccp-modern,),:in,nccprt,(,NCCPLicense,
com_netflix_streaming_nccp_request_license,),:in,:and,stat,
SuccessfulRequests,:eq,:and,device.rollup,3ds,:eq,:and,:sum,:set,entering_trough,sps,:get,1h,:offset,0.95,:mul,sps,:get,:gt,:set,smoothed,sps,:get,
10,0.1,0.02,:des,:set,low_volume,smoothed,:get,-0.005,:mul,0.1,:add,:set,mid_volume,smoothed,:get,-0.00125,:mul,0.1,:add,:set,base,0.06,:set,min_pct,
1,smoothed,:get,20,:lt,low_volume,:get,:mul,smoothed,:get,80,:lt,mid_volume,:get,:mul,:add,entering_trough,:get,0.05,:mul,:add,base,:get,:add,:sub,
10,0.1,0.02,:des,:set,sps,:get,$(device.rollup)SPS,:legend,min_pct,:get,smoothed,:get,:mul,lowerbound,:legend,sps,:get,min_pct,:get,smoothed,:get,:mul,:lt,
5,:rolling-count,2,:ge,:vspan,60,:alpha,$(device.rollup),:legend
Copyright: Kurt Moerman
CC Attribution 2.0 License
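The `q=` parameter in the two shorter URLs reads as a comma-separated, postfix (stack) expression: operands are pushed, and words starting with `:` consume what is on the stack. The toy evaluator below is a sketch of that reading and nothing more. It is not the Atlas implementation, it handles only `:eq`, `:and`, `:sum`, `:by`, and parenthesized lists, it ignores the `e` and `s` time parameters, and the `parse` and `evaluate` names and all sample data are assumptions made for the example.

```python
def parse(query):
    """Evaluate a comma-separated stack expression into (predicate, group_keys)."""
    stack = []
    list_starts = []  # stack depths at which an open "(" was seen
    for token in query.split(","):
        if token == "(":
            list_starts.append(len(stack))
        elif token == ")":
            start = list_starts.pop()
            items = stack[start:]
            del stack[start:]
            stack.append(items)  # everything since "(" becomes one list
        elif token == ":eq":
            value = stack.pop()
            key = stack.pop()
            stack.append(lambda tags, k=key, v=value: tags.get(k) == v)
        elif token == ":and":
            right = stack.pop()
            left = stack.pop()
            stack.append(lambda tags, a=left, b=right: a(tags) and b(tags))
        elif token == ":sum":
            stack.append((stack.pop(), []))  # sum everything, no grouping yet
        elif token == ":by":
            keys = stack.pop()
            predicate, _ = stack.pop()
            stack.append((predicate, keys))
        else:
            stack.append(token)  # plain operand: a tag key or value
    (result,) = stack
    return result


def evaluate(query, series):
    """Sum matching series, grouped by the keys given to :by (if any)."""
    predicate, keys = parse(query)
    totals = {}
    for tags, value in series:
        if predicate(tags):
            group = tuple(tags.get(k) for k in keys)
            totals[group] = totals.get(group, 0.0) + value
    return totals


# Invented sample data: (tags, value) pairs.
common = {"nf.app": "employeeinfo", "name": "employeeinfo_api"}
series = [
    ({**common, "nf.region": "us-west-1", "nf.zone": "us-west-1a"}, 3.0),
    ({**common, "nf.region": "us-west-1", "nf.zone": "us-west-1b"}, 4.0),
    ({**common, "nf.region": "us-west-1", "nf.zone": "us-west-1b"}, 5.0),
    ({**common, "nf.region": "us-east-1", "nf.zone": "us-east-1a"}, 100.0),
]

base = ("nf.region,us-west-1,:eq,nf.app,employeeinfo,:eq,:and,"
        "name,employeeinfo_api,:eq,:and,:sum")
overall = evaluate(base, series)
by_zone = evaluate(base + ",(,nf.zone,),:by", series)
print(overall)
print(by_zone)
```

Appending `(,nf.zone,),:by` to the first query is the only difference between the two URLs on the slides, which matches the "query-time aggregation along multiple dimensions" described in the abstract: the grouping is chosen when the question is asked, not when the metric is published.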
101. So We Built Something Better
Ridiculous Read Volume:
• Engage
• Graphs and Dashboards
• Alerting
• Automated Canaries
• Capacity Analytics
• Special Projects
• BI
110. So We Built Something Better
[Architecture diagram, built up over slides 102 to 110. Labels shown: global endpoint, regional endpoint, backend instances, client instance, publish cluster, poller cluster, Amazon S3. In the last builds, markers labeled "m" appear next to the publish cluster and a backend.]
122. That Sounds Great!
Surely there are no problems
•Speed is hard
•Speed at volume is harder
•We looked at spinning disks
•Memory’s the way to go
•m2.4xlarge
•This is operational data
•People want it available, fast
•Operations have short memories
20,160 m2.4xlarge
$32,094,720 upfront
$8,005,939/month
per region
with no redundancy
Copyright: http://www.flickr.com/photos/lainetrees/
CC Attribution 2.0 License
Copyright: http://www.flickr.com/photos/amenk/
CC Attribution 2.0 License
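To get a feel for the numbers on the slide (20,160 m2.4xlarge instances, $32,094,720 upfront, $8,005,939 per month, per region, with no redundancy), the arithmetic can be spelled out. The per-instance figures and the first-year total below are simply derived from those three numbers; they are not stated in the talk, and the calculation assumes nothing beyond 12 equal monthly charges.

```python
# Figures from the slide (per region, no redundancy).
instances = 20_160
upfront_total = 32_094_720   # USD
monthly_total = 8_005_939    # USD per month

# Derived values: plain division and addition, nothing from a price list.
upfront_per_instance = upfront_total / instances
monthly_per_instance = monthly_total / instances
first_year_total = upfront_total + 12 * monthly_total

print(f"upfront per instance: ${upfront_per_instance:,.2f}")
print(f"monthly per instance: ${monthly_per_instance:,.2f}")
print(f"first year, one region: ${first_year_total:,}")
```

Keeping everything in memory at full granularity would therefore cost on the order of $128 million in the first year for a single region, before any redundancy.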
132. That Doesn’t Sound Great!
•If only we could reduce it …
•“Reduce”? Get it? Get it?
•Our granularity is two-dimensional
[Figure axes: Dimensionality (tags) and Step size (time)]
•We can reduce on either dimension
•Some tags make sense for very rapid reduction
•Hystrix
•nf.node
•Sometimes a lot (vhs)
•Sometimes a little (Cassandra)
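Reducing along the tag dimension can be pictured as dropping one tag, such as nf.node, and combining the series that become indistinguishable. The sketch below assumes the values are counts that can simply be summed; the `drop_tag` helper, the node IDs, the second cluster name, and the values are all invented for illustration and say nothing about how Atlas does this internally.

```python
from collections import defaultdict


def drop_tag(series, tag):
    """Sum the values of series that become identical once `tag` is removed."""
    reduced = defaultdict(float)
    for tags, value in series:
        # A frozenset of the remaining pairs is a hashable identity for the series.
        key = frozenset((k, v) for k, v in tags.items() if k != tag)
        reduced[key] += value
    return [(dict(key), value) for key, value in reduced.items()]


# Invented per-node data: three series, two of them in the same cluster.
per_node = [
    ({"name": "requests", "nf.cluster": "wp-batch", "nf.node": "i-aaaa"}, 10.0),
    ({"name": "requests", "nf.cluster": "wp-batch", "nf.node": "i-bbbb"}, 15.0),
    ({"name": "requests", "nf.cluster": "wp-api", "nf.node": "i-cccc"}, 7.0),
]

per_cluster = drop_tag(per_node, "nf.node")
totals_by_cluster = {tags["nf.cluster"]: value for tags, value in per_cluster}
print(totals_by_cluster)
```

The more distinct values a tag has, the more series collapse when it is dropped, which is one way to read "sometimes a lot, sometimes a little".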
142. A Reductive Approach
•For a series of values, reduce and keep:
•minimum
•maximum
•total
•count
•Example:
•3,5,9,14,20: min 3, max 20, tot 51, count 5
•Allows for sense of scale
•Allows for arbitrary further reduction w/o loss of precision
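The reason these four values allow further reduction without losing precision is that each of them can be merged: the minimum of minimums, the maximum of maximums, the sum of totals, and the sum of counts give the same answer as reducing the raw values in one pass. A minimal sketch, using the example series from the slide; the `Rollup` class and its method names are hypothetical, not taken from Atlas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollup:
    """Summary of a run of values: enough to merge again later."""
    minimum: float
    maximum: float
    total: float
    count: int

    @classmethod
    def of(cls, values):
        values = list(values)
        return cls(min(values), max(values), sum(values), len(values))

    def merge(self, other):
        """Combine two summaries as if their raw values had been reduced together."""
        return Rollup(
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
            self.count + other.count,
        )

    @property
    def mean(self):
        # The average is derived from total and count, so it survives merging.
        return self.total / self.count


# The example from the slide, reduced in one pass ...
whole = Rollup.of([3, 5, 9, 14, 20])
# ... and reduced in two pieces that are merged afterwards.
merged = Rollup.of([3, 5]).merge(Rollup.of([9, 14, 20]))

print(whole)
print(merged == whole, whole.mean)
```

The same merge works whether the pieces are adjacent time steps being rolled up into a larger step or series being combined after a tag is dropped. Statistics that cannot be rebuilt from these four values are outside what this sketch covers.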
162. Reduction: Benefits
•Indefinite storage in Amazon S3
•Fear of commitment achievement: Unlocked
•Can be aggressive about hiding metrics
•High granularity for special days
•Automated for regular operations*
•Not in critical path for visibility SLA
•Firewalls accidental metric explosions
•Huge efficiency gains
Copyright: http://www.flickr.com/photos/dr_pete/
CC Attribution 2.0 License
176. Previews
•Self-service for special requests
•Different instance types
•cr1.8xlarge
•hi1.4xlarge
•Multi-tiered metric visibility
Copyright: http://www.flickr.com/photos/creativealan/
CC Attribution 2.0 License
193. And a Last Word About Costs
•Priorities Reminder
•Speed of Innovation
•Availability
•Cost
•Never intended to lower costs
•Cloud migration
•Additional features
•Massive Performance
196. Please give us your feedback on this presentation (BDT302)
Thank You