Introduction to the Netflix Cloud Architecture Tutorial - discusses the why and what of cloud, including the thinking behind Netflix's choice of AWS and the product features that Netflix runs in the cloud.
Cloud Architecture Tutorial - Why and What (1 of 3)
1. Cloud Architecture Tutorial
How Netflix Built a Scalable Java oriented PaaS running on AWS
Part 1 of 3
QCon London, March 5th, 2012
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
2. Tutorial Abstract – Set Context
• Starting with the usual questions: “Why Netflix, why cloud, why AWS?”
• This tutorial explains which business models and applications benefit most from cloud, what to look for in a cloud provider, and how the traditional enterprise computing marketplace is being disrupted.
• Moving on to the next question: “What can run in the cloud?” a step by step approach to cloud migration is described, along with a varied set of use cases for both customer facing and internal web services, big data analytics and bulk computation. Cloud migration starts by moving developers to work on cloud using “boot camp” training sessions, then after building out the initial core platform, the first applications are launched.
• The real meat of the tutorial comes when we look at how to construct an application with a host of important properties: elastic, dynamic, scalable, agile, fast, cheap, robust, durable, observable, secure. Over the last three years Netflix has figured out cloud based solutions with these properties, deployed them globally at large scale and refined them into a global Java oriented Platform as a Service. The PaaS is based on low cost open source building blocks such as Apache Tomcat, Apache Cassandra, and Memcached. Components of this platform are in the process of being open-sourced by Netflix, so that other companies can get a start on building their own customized PaaS that leverages advanced features of AWS and supports rapid agile development.
• The architecture is described in terms of anti-patterns - things to avoid in the datacenter to cloud transition. A scalable global persistence tier based on Cassandra provides a highly available and durable underpinning. Lessons learned will cover solutions to common problems, availability and robustness, observability. Attendees should leave the tutorial with a clear understanding of what is different about cloud architectures, why, what and how to make the transition, and a set of flexible and scalable open source building blocks that can be used to construct their own cloud platform.
3. Presentation vs. Tutorial
• Presentation
– Short duration, focused subject
– One presenter to many anonymous audience
– A few questions at the end
• Tutorial
– Time to explore in and around the subject
– Tutor gets to know the audience
– Discussion, rat-holes, “bring out your dead”
4. Tutorial Sections
Intro: Who are you, what are your questions?
Part 1 - Why use cloud, what runs in the cloud
Part 2 - Platform component architecture
Part 3 - Running in the cloud
5. Adrian Cockcroft
• Director, Architecture for Cloud Systems, Netflix Inc.
– Previously Director for Personalization Platform
• Distinguished Availability Engineer, eBay Inc. 2004-7
– Founding member of eBay Research Labs
• Distinguished Engineer, Sun Microsystems Inc. 1988-2004
– 2003-4 Chief Architect High Performance Technical Computing
– 2001 Author: Capacity Planning for Web Services
– 1999 Author: Resource Management
– 1995 & 1998 Author: Sun Performance and Tuning
– 1996 Japanese Edition of Sun Performance and Tuning
• SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
• More
– Twitter @adrianco
– Blog http://perfcap.blogspot.com
– Presentations at http://www.slideshare.net/adrianco
6. Attendee Introductions
• Who are you, where do you work
• Why are you here today, what do you need
• “Bring out your dead”
– Do you have a specific problem or question?
– One sentence elevator pitch
8. Netflix Inc.
With more than 23 million streaming members in the United States, Canada, Latin America, the United Kingdom and Ireland, Netflix, Inc. is the world's leading internet subscription service for enjoying movies and TV series.
Source: http://ir.netflix.com
9. What kind of Cloud?
• Software as a Service – SaaS
– Replaces in house applications
– Targets end users
• Platform as a Service – PaaS
– Replaces in house operations functions
– Targets developers
• Infrastructure as a Service – IaaS
– Replaces in house datacenter capacity
– Targets developers and ITops
10. What Netflix Did
• Moved to SaaS
– Corporate IT – Workday etc.
– Tools – PagerDuty, AppDynamics, Elastic MapReduce
• Built our own PaaS <- today’s focus
– Customized to make our developers productive
– When we started, we had little choice
• Moved incremental capacity to IaaS
– No new datacenter space since 2008 as we grew
– Moved our streaming apps to the cloud
14. Data Center
Netflix could not build new datacenters fast enough
Capacity growth is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, Xbox
International – Canada, Latin America, UK/Ireland
15. Out-Growing Data Center
http://techblog.netflix.com/2011/02/redesigning-netflix-api.html
Chart labels: 37x Growth Jan 2010-Jan 2011; Datacenter Capacity
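To put the 37x growth figure in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the growth compounded at a steady monthly rate over the 12 months; the slide does not say how the growth was distributed, and the function name is hypothetical.

```python
def monthly_growth_factor(total_factor: float, months: int) -> float:
    """Return the constant per-month multiplier that compounds to total_factor.

    Assumption (not from the slides): growth is spread evenly, so
    monthly ** months == total_factor.
    """
    if total_factor <= 0 or months <= 0:
        raise ValueError("total_factor and months must be positive")
    return total_factor ** (1.0 / months)


if __name__ == "__main__":
    factor = monthly_growth_factor(37.0, 12)
    # Under the steady-compounding assumption this is about 1.35,
    # i.e. roughly 35% more load every month.
    print(f"implied monthly growth factor: {factor:.2f}")
```

Under that assumption, load grows by roughly a third every month, which is one way to read "could not build new datacenters fast enough".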
16. Netflix.com is now ~100% Cloud
A few small back end data sources still in progress
All international product is cloud based
USA specific logistics remains in the Datacenter
Working on SOX, PCI as scope starts to include AWS
17. Netflix Choice was AWS with our own platform and tools
Unique platform requirements and extreme scale, agility and flexibility
18. Leverage AWS Scale “the biggest public cloud”
AWS investment in features and automation
Use AWS zones and regions for high availability, scalability and global deployment
19. But isn’t Amazon a competitor?
Many products that compete with Amazon run on AWS
We are a “poster child” for the AWS Architecture
Netflix is one of the biggest AWS customers
Co-opetition - competitors are also partners
20. Could Netflix use another cloud?
Would be nice, we use three interchangeable CDN Vendors
But no-one else has the scale and features of AWS
You have to be this tall to ride this ride…
Maybe in 2-3 years?
21. We want to use clouds, we don’t have time to build them
Public cloud for agility and scale
We use electricity too, but don’t want to build our own power station…
AWS because they are big enough to allocate thousands of instances per hour when we need to
22. Amazon Cloud Terminology Reference
See http://aws.amazon.com/
This is not a full list of Amazon Web Services features
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
– Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
– Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
– Reserved Instances – pre-paid to reduce cost for long term usage
– Availability Zone – datacenter with own power and cooling hosting cloud instances
– Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Store (network disk filesystem that can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter
• IAM – Identity and Access Management (fine grain role based security keys)
23. AWS and the Seven Dwarfs
• Public Cloud Alternatives to AWS
– Far fewer features, much smaller scale
– Less mature APIs, many variants of APIs
– Some have additional features or performance
• Private Cloud Alternatives
– Often harder to build and run than you think
– Without scale and multi-tenancy, much higher costs
– Often driven by ITops needs rather than developers
24. Some Alternative Public Clouds
IaaS that you could build your own PaaS architecture on
• OpenStack Based
– Rackspace
– HP Cloud
– ATT Cloud
• GoGrid
• Joyent – Solaris in the cloud
• Memset – UK based
25. What about other PaaS?
• CloudFoundry – Open Source by VMware
– Developer-friendly, easy to get started
– Missing scale and some enterprise features
• Rightscale
– Widely used to abstract away from AWS
– Creates its own lock-in problem…
• AWS is growing into this space
– We didn’t want a vendor between us and AWS
– We wanted to build a thin PaaS, that gets thinner
26. Enterprise Market Disruption
• Enterprise Computing Vendors $$$$$$
– IBM, HP, Dell, Oracle, EMC, NetApp…
– CIO/ITOps integrates and provisions
– Traditional, and moving towards private clouds
• IaaS Vendors Sell Directly to Developers $
– Bypassing ITOps with stealth cloud based projects
– Bypassing enterprise vendor supply chain
– Low margin, low friction, dollar at a time
27. What Runs in the Cloud?
Step by Step Netflix Product Transition
28. Netflix Deployed on AWS
• 2009 Content – Video Masters, EC2, S3, CDNs
• 2009 Logs – S3, EMR Hadoop, Hive, Business Intelligence
• 2010 Play – DRM, CDN routing, Bookmarks, Logging
• 2010 WWW – Sign-Up, Search, Movie Choosing, Ratings
• 2010 API – Metadata, Device Config, TV Movie Choosing, Social Facebook
• 2011 CS – International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
29. Movie Encoding farm (2009)
• Tens of thousands of videos
• Thousands of EC2 instances
• Encoding apps on Windows/Linux
• ~100 files per video
• Petabytes of S3
• Content Delivery Networks
“Netflix is one of the largest customers of the biggest CDNs Level3, Akamai and Limelight”
Diagram: Content – Video Masters, EC2, S3, CDNs
30. Cloud Encoding Pipeline
Diagram (stage labels): Studios; Netflix; Movie Master, by Tape or Network Upload; S3; Encode to Mezzanine files on S3; Encode to ~100 files on S3; Copy to CDN Origin; Stream to TV
Licensed content is provided to Netflix via Post Production Companies
Many formats are reduced to a single high quality mezzanine format on S3
Individual formats and speeds etc. are encoded in ~100 files
Many formats for older and newer hardware and various game consoles
Many speeds from mobile through standard and high definition
Subtitles for many languages, still frames etc.
Static files are copied to each Content Delivery Network’s “origin server”
CDNs migrate files to “edge servers” near the end user as needed
Files stream to PC/Mac/iPad or TV over HTTP using “range get” to move chunks
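As a rough illustration of the “range get” idea (not Netflix's player code), the sketch below splits a file into fixed-size byte ranges and builds the HTTP Range header value a client would send for each chunk. The function name, file size and chunk size are hypothetical; only the `bytes=start-end` syntax, with an inclusive end offset, comes from the HTTP specification.

```python
from typing import Iterator


def range_headers(file_size: int, chunk_size: int) -> Iterator[str]:
    """Yield HTTP Range header values that cover file_size bytes in order.

    Each value has the form "bytes=start-end", where both offsets are
    zero-based and the end offset is inclusive, as HTTP defines it.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < file_size:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, file_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # Hypothetical 10-byte file fetched in 4-byte chunks.
    for header in range_headers(10, 4):
        print("Range:", header)
```

For the 10-byte example this yields bytes=0-3, bytes=4-7 and bytes=8-9. Because each chunk is an independent request against a static file, a player can fetch chunks from whichever CDN edge server is nearest.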
31. Netflix EC2 Instances per Account (summer 2010, production is much higher now…)
Chart labels: “Many Thousands”; Content Encoding; Test and Production; Log Analysis; “Several Months”
32. Hadoop - Elastic Map-Reduce (2009)
• Web Access Logs
• Streaming Service Logs
• Terabytes per day scale
• Easy Hadoop via Amazon EMR
• Hive SQL “Data Mart”
• Gateway to Datacenter BI with Pig
See www.slideshare.net/Netflix for more details
Diagram: Logs – S3, EMR Hadoop, Hive, Business Intelligence
33. Streaming Service Back-end (early 2010)
• PC/Mac Silverlight Player Support
• Highly available “play button”
• DRM Key Management
• Generate route to stream on CDN
• Lookup bookmark for user/movie
• Update bookmark for user/movie
• Log quality of service
Diagram: Play – DRM, CDN routing, Bookmarks, Logging
34. Web site, a page at a time (through 2010 and 2011)
• Clean presentation layer rewrite
• Search auto-complete
• Search backend and landing page
• Movie and genre choosing
• Star ratings and recommendations
• Similar movies
• Page by page to 100% of views
(Account signup partially migrated in 2011)
Diagram: WWW – Signup, Search, Movie Choosing, Ratings
35. API for TV devices and iPhone etc. (2010)
• Public API: developer.netflix.com
• Interfaces to everything else
• TV Device Configuration
• Personalized movie choosing
• Facebook integration
• See presentations on slideshare
“Netflix is an API for streaming to TVs (we also do DVDs and a web site)”
Diagram: API – Metadata, Device Config, TV Movie Choosing, Social Facebook
36. Customer Service Tools (2011)
• Support external contract CS
– Canada – French Canadian
– Latin America – Spanish/Portuguese
– Future flexibility worldwide
• Needed CS tools for Partners
– Rewrote for cloud architecture
– Migrated from Oracle to Cassandra
– Web based SOA tools
Diagram: CS – International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
37. Takeaway
Netflix has built and deployed a scalable global Platform as a Service.
Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
End of Part 1 of 3