Netflix Cloud Architecture
Velocity Conference, June 14, 2011
Adrian Cockcroft
@adrianco #netflixcloud
http://slideshare.net/adrianco
acockcroft@netflix.com
  
Who, Why, What
Netflix in the Cloud
Cloud Challenges and Learnings (Ignite)
Systems and Operations Architecture
  
Netflix Inc.
With more than 23 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows.
International Expansion
We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012.
Source: http://ir.netflix.com
  
Unlimited streaming for $7.99/month, large and growing catalog of movies and TV
  
Adrian Cockcroft
•  Director, Architecture for Cloud Systems, Netflix Inc.
      –  Previously Director for Personalization Platform
•  Distinguished Availability Engineer, eBay Inc. 2004-7
      –  Founding member of eBay Research Labs
•  Distinguished Engineer, Sun Microsystems Inc. 1988-2004
      –  2003-4 Chief Architect High Performance Technical Computing
      –  2001 Author: Capacity Planning for Web Services
      –  1999 Author: Resource Management
      –  1995 & 1998 Author: Sun Performance and Tuning
      –  1996 Japanese Edition of Sun Performance and Tuning
             •  SPARC & Solaris
  
Why is Netflix Talking about Cloud?
  
Netflix is Path-finding
The Cloud ecosystem is evolving very fast
Share with and learn from the cloud community

We want to use clouds, not build them
Cloud technology should be a commodity
Public cloud and open source for agility and scale
  
Why Use Cloud?
For Better Business Agility
For Unpredictable Business Growth
  
Data Center
Netflix could not build new datacenters fast enough
Capacity growth is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, XBox
  
23 Million Customers
2011-Q1 year/year customers +69%
[Chart: customer count in millions, vertical axis from 0 to 25]
Source: http://ir.netflix.com
  
Out-Growing Data Center
http://techblog.netflix.com/2011/02/redesigning-netflix-api.html
[Chart: 37x Growth Jan 2010-Jan 2011, plotted against Datacenter Capacity]
  
Netflix.com is now ~100% Cloud
Account sign-up is currently being moved to cloud
All international product will be cloud based
USA specific logistics remains in the Datacenter
  
Leverage AWS Scale
“the biggest public cloud”
AWS investment in tooling and automation
Use many AWS zones for high availability, scalability
AWS skills are most common on resumes…
  
Leverage AWS Feature Set
“the market leader”
EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDB, VPC…
http://aws.amazon.com/jp
  
“The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
Werner Vogels
Amazon CTO
  
We want to use clouds, we don’t have time to build them
Public cloud for agility and scale
AWS because they are big enough to allocate thousands of instances per hour when we need to
  
Netflix EC2 Instances per Account
(summer 2010, production is much higher now…)
[Chart: instance counts for the Content Encoding, Test and Production, and Log Analysis accounts; scale labels “Many Thousands” and “Several Months”]
  
Netflix Deployed on AWS
Content          Logs                     Play           WWW               API
Video Masters    S3                       DRM            Sign-Up           Metadata
EC2              EMR Hadoop               CDN routing    Search            Device Config
S3               Hive                     Bookmarks      Movie Choosing    TV Movie Choosing
CDN              Business Intelligence    Logging        Ratings           Mobile iPhone
  
Cloud Encoding Pipeline
[Diagram: Movie Studios -> Master Tapes -> Netflix Network Upload -> S3 Master -> Encode Mezzanine -> S3 Mezzanine -> Encode to 50+ files -> S3 files -> Copy to CDN Origin -> CDN Origin -> Stream to TV]

Licensed content is provided to Netflix as high quality master tapes
Many formats are reduced to a single high quality mezzanine format on S3
Individual formats and speeds are encoded in over 50 combinations
     Many formats for older and newer hardware and various game consoles
     Many speeds from mobile through standard and high definition
Static files are copied to each Content Delivery Network’s “origin server”
CDNs migrate files to “edge servers” near the end user
Files stream to PC/Mac/iPad or TV over HTTP using “range get” to move chunks
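
The “range get” mentioned above is the HTTP Range request header, which lets a client fetch one slice of a static file at a time. As a rough sketch (the chunk size and the helper name `byte_ranges` are illustrative assumptions, not details from the talk), the header values needed to cover a file in fixed-size chunks could be computed like this:

```python
def byte_ranges(file_size: int, chunk_size: int) -> list[str]:
    """Return HTTP Range header values that cover a file in fixed-size chunks."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, file_size, chunk_size):
        # Both ends of an HTTP byte range are inclusive.
        end = min(start + chunk_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    # Hypothetical 2500-byte file fetched in 1000-byte chunks.
    for header in byte_ranges(file_size=2500, chunk_size=1000):
        print("Range:", header)
```

A player built this way can switch to a different encoding between chunks, since each request is independent of the previous one.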
  
Cloud Architecture
Ignite!
  
Product Trade-off
User Experience          Implementation
Consistent Experience    Development complexity
Low Latency              Operational complexity
  
Netflix Cloud Goals
•  Faster
     –  Lower latency than the equivalent datacenter web pages and API calls
     –  Measured as mean and 99th percentile
     –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
     –  Avoid needing any more datacenter capacity as subscriber count increases
     –  No central vertically scaled databases
     –  Leverage AWS elastic capacity effectively
•  Available
     –  Substantially higher robustness and availability than datacenter services
     –  Leverage multiple AWS availability zones
     –  No scheduled down time, no central database schema to change
•  Productive
     –  Optimize agility of a large development team with automation and tools
     –  Leave behind complex tangled datacenter code base (~8 year old architecture)
     –  Enforce clean layered interfaces and re-usable components
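
The “Faster” goal above is measured as mean and 99th percentile latency. The slides do not say how the percentile is computed; the sketch below assumes the nearest-rank definition and uses made-up sample values, mainly to show why both numbers are tracked: a few slow requests barely move the mean but dominate the tail.

```python
import math


def mean(samples):
    """Arithmetic mean of a non-empty list of latencies."""
    return sum(samples) / len(samples)


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: mostly fast, two slow outliers.
    latencies_ms = [10] * 98 + [50, 500]
    print("mean:", mean(latencies_ms))
    print("p99:", percentile(latencies_ms, 99))
```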
  
Old Datacenter vs. New Cloud Arch
Central SQL Database          Distributed Key/Value NoSQL
Sticky In-Memory Session      Shared Memcached Session
Chatty Protocols              Latency Tolerant Protocols
Tangled Service Interfaces    Layered Service Interfaces
Instrumented Code             Instrumented Service Patterns
Fat Complex Objects           Lightweight Serializable Objects
Components as Jar Files       Components as Services
  
The Central SQL Database
•  Datacenter has a central database
   –  Everything in one place is convenient until it fails
   –  Customers, movies, history, configuration
•  Schema changes require downtime

Anti-pattern impacts scalability, availability
  
The Distributed Key-Value Store
•  Cloud has many key-value data stores
    –  More complex to keep track of, do backups etc.
    –  Each store is much simpler to administer  DBA
    –  Joins take place in java code
•  No schema to change, no scheduled downtime
•  Latency for Memcached vs. Oracle vs. SimpleDB
    –  Memcached is dominated by network latency <1ms
    –  Oracle for simple queries is a few milliseconds
    –  SimpleDB has replication and REST overheads >10ms
  
The Sticky Session
•  Datacenter Sticky Load Balancing
   –  Efficient caching for low latency
   –  Tricky session handling code
   –  Middle tier load balancer has issues in practice
•  Encourages concentrated functionality
   –  one service that does everything

Anti-pattern impacts productivity, availability
  
The Shared Session
•  Cloud Uses Round-Robin Load Balancing
    –  Simple request-based code
    –  External shared caching with memcached
•  More flexible fine grain services
    –  Works better with auto-scaled instance counts
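
To illustrate why round-robin balancing works once session state lives in an external shared cache, here is a minimal sketch. The `SharedCache` class is an in-process stand-in for memcached, and the instance names, `route` helper and session layout are all hypothetical; none of this is Netflix code.

```python
import itertools


class SharedCache:
    """In-process stand-in for an external memcached cluster."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class Instance:
    """A stateless service instance: all session state goes to the shared cache."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache

    def handle(self, session_id, item):
        session = self.cache.get(session_id) or []
        session = session + [item]  # build a new list rather than mutating shared state
        self.cache.set(session_id, session)
        return self.name, session


cache = SharedCache()
instances = [Instance(f"instance-{n}", cache) for n in range(3)]
round_robin = itertools.cycle(instances)


def route(session_id, item):
    """Send each request to the next instance, regardless of which one served the user before."""
    return next(round_robin).handle(session_id, item)
```

Because no instance holds session state of its own, instances can be added or removed by an auto scaler without losing any user's session.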
  
Chatty Opaque and Brittle Protocols
•  Datacenter service protocols
    –  Assumed low latency for many simple requests
•  Based on serializing existing java objects
    –  Inefficient formats
    –  Incompatible when definitions change

Anti-pattern causes productivity, latency and availability issues
  
Robust and Flexible Protocols
•  Cloud service protocols
   –  JSR311/Jersey is used for REST/HTTP service calls
   –  Custom client code includes service discovery
   –  Support complex data types in a single request
•  Apache Avro
   –  Evolved from Protocol Buffers and Thrift
   –  Includes JSON header defining key/value protocol
   –  Avro serialization is half the size and several times faster than Java serialization, more work to code
  
Persisted Protocols
•  Persist Avro in Memcached
   –  Save space/latency (zigzag encoding, half the size)
   –  Less brittle across versions
   –  New keys are ignored
   –  Missing keys are handled cleanly
•  Avro protocol definitions
   –  Can be written in JSON or generated from POJOs
   –  It’s hard, needs better tooling
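
Two ideas from this slide can be sketched in a few lines. Zigzag encoding maps signed integers to unsigned ones so that values near zero, positive or negative, fit in a single variable-length byte; Avro uses this scheme for its int and long types. The `read_record` helper is only an illustration of the “new keys are ignored, missing keys are handled cleanly” behaviour using plain dictionaries and assumed defaults; it is not the Avro library API.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint(z: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_record(stored: dict, reader_defaults: dict) -> dict:
    """Keep only the fields the reader knows; fill absent ones from its defaults."""
    return {key: stored.get(key, default) for key, default in reader_defaults.items()}


if __name__ == "__main__":
    print(varint(zigzag_encode(-64)))  # small negative value, one byte
    # Hypothetical record written by a newer version with an extra field.
    print(read_record({"id": 7, "added_later": True}, {"id": 0, "title": ""}))
```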
  
Tangled Service Interfaces
•  Datacenter implementation is exposed
   –  Oracle SQL queries mixed into business logic
•  Tangled code
   –  Deep dependencies, false sharing
•  Data providers with sideways dependencies
   –  Everything depends on everything else

Anti-pattern affects productivity, availability
  
Untangled Service Interfaces
•  New Cloud Code With Strict Layering
    –  Compile against interface jar
    –  Can use spring runtime binding to enforce
•  Service interface is the service
    –  Implementation is completely hidden
    –  Can be implemented locally or remotely
    –  Implementation can evolve independently
  
Untangled Service Interfaces
Two layers:
•  SAL - Service Access Library
    –  Basic serialization and error handling
    –  REST or POJO’s defined by data provider
•  ESL - Extended Service Library
    –  Caching, conveniences
    –  Can combine several SALs
    –  Exposes faceted type system (described later)
    –  Interface defined by data consumer in many cases
  
Service Interaction Pattern
Sample Swimlane Diagram
  
Service Architecture Patterns
•  Internal Interfaces Between Services
   –  Common patterns as templates
   –  Highly instrumented, observable, analytics
   –  Service Level Agreements – SLAs
•  Library templates for generic features
   –  Instrumented Netflix Base Servlet template
   –  Instrumented generic client interface template
   –  Instrumented S3, SimpleDB, Memcached clients
  
[Diagram: “Service Request Instruments Every Step in the call”. A request cycle between CLIENT and SERVICE with a timestamp recorded at each step:
   CLIENT request start timestamp, request end timestamp
   Client outbound serialize start timestamp
   Client outbound serialize end timestamp
   Client network send timestamp
   Service network receive timestamp
   Service inbound serialize start timestamp
   Service inbound serialize end timestamp
   SERVICE execute request start timestamp, execute request end timestamp
   Service outbound serialize start timestamp
   Service outbound serialize end timestamp
   Service network send timestamp
   Client network receive timestamp
   Inbound deserialize start timestamp
   Inbound deserialize end timestamp]
  
Boundary Interfaces
•  Isolate teams from external dependencies
   –  Fake SAL built by cloud team
   –  Real SAL provided by data provider team later
   –  ESL built by cloud team using faceted objects
•  Fake data sources allow development to start
   –  e.g. Fake Identity SAL for a test set of customers
   –  Development solidifies dependencies early
   –  Helps external team provide the right interface
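
The fake-SAL idea can be sketched as an interface with a stand-in implementation behind it. Everything named here (`IdentitySAL`, `FakeIdentitySAL`, `IdentityESL`, the customer fields) is hypothetical and chosen only to mirror the slide's vocabulary; the real libraries were Java and are not shown in the talk.

```python
from abc import ABC, abstractmethod


class IdentitySAL(ABC):
    """Service Access Library interface: the only thing consumers compile against."""

    @abstractmethod
    def get_customer(self, customer_id: str) -> dict:
        ...


class FakeIdentitySAL(IdentitySAL):
    """Fake data source with a small fixed test set, used until the real SAL exists."""

    def __init__(self):
        self._customers = {
            "test-1": {"id": "test-1", "name": "Test Customer One"},
            "test-2": {"id": "test-2", "name": "Test Customer Two"},
        }

    def get_customer(self, customer_id: str) -> dict:
        if customer_id not in self._customers:
            raise LookupError(f"unknown customer {customer_id}")
        return dict(self._customers[customer_id])  # copy so callers cannot mutate the test set


class IdentityESL:
    """Extended Service Library: adds caching and conveniences on top of any SAL."""

    def __init__(self, sal: IdentitySAL):
        self._sal = sal
        self._cache = {}

    def display_name(self, customer_id: str) -> str:
        if customer_id not in self._cache:
            self._cache[customer_id] = self._sal.get_customer(customer_id)
        return self._cache[customer_id]["name"]
```

When the data provider team later ships the real SAL, it is passed to `IdentityESL` in place of the fake and the consumer code stays unchanged.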
  
One Object That Does Everything
•  Datacenter uses a few big complex objects
    –  Movie and Customer objects are the foundation
    –  Good choice for a small team and one instance
    –  Problematic for large teams and many instances
•  False sharing causes tangled dependencies
    –  Unproductive re-integration work

Anti-pattern impacting productivity and availability
  
An Interface For Each Component
•  Cloud uses faceted Video and Visitor
    –  Basic types hold only the identifier
    –  Facets scope the interface you actually need
    –  Each component can define its own facets
•  No false-sharing and dependency chains
    –  Type manager converts between facets as needed
    –  video.asA(PresentationVideo) for www
    –  video.asA(MerchableVideo) for middle tier
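
The slide gives only the call shape `video.asA(...)`, so the following is a guess at how a faceted type system might hang together, written in Python rather than the original Java. The `TypeManager` registry, the facet fields and the builder functions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationVideo:
    """Facet for the web tier: only what a page needs to render."""
    video_id: str
    title: str


@dataclass(frozen=True)
class MerchableVideo:
    """Facet for the middle tier: only what merchandising needs."""
    video_id: str
    rank: int


class TypeManager:
    """Maps each facet class to a function that builds it from an identifier."""

    def __init__(self):
        self._builders = {}

    def register(self, facet_cls, builder):
        self._builders[facet_cls] = builder

    def convert(self, video_id, facet_cls):
        return self._builders[facet_cls](video_id)


class Video:
    """Basic type: holds only the identifier and defers everything else to facets."""

    def __init__(self, video_id, type_manager):
        self.video_id = video_id
        self._type_manager = type_manager

    def as_a(self, facet_cls):
        return self._type_manager.convert(self.video_id, facet_cls)


# Each component registers only the facets it cares about (placeholder builders).
types = TypeManager()
types.register(PresentationVideo, lambda vid: PresentationVideo(vid, title=f"Title for {vid}"))
types.register(MerchableVideo, lambda vid: MerchableVideo(vid, rank=1))
video = Video("v-42", types)
```

A component that calls `video.as_a(PresentationVideo)` never sees merchandising fields, which is the point of the “no false-sharing” bullet.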
  
Software Architecture Patterns
•  Object Models
   –  Basic and derived types, facets, serializable
   –  Pass by reference within a service
   –  Pass by value between services
•  Computation and I/O Models
   –  Service Execution using Best Effort
   –  Common thread pool management
  
Netflix Systems Architecture
[Diagram. AWS EC2: Front End Load Balancer, Discovery Service, API Proxy, API etc., Load Balancer, Component Services, API, memcached, SQS. AWS Storage: EBS, S3, SimpleDB. Netflix Data Center: Oracle (three instances), Replication.]
  
Database Migration
•  Why SimpleDB?
    –  No DBA’s in the cloud, Amazon hosted service
    –  Work started two years ago, fewer viable options
    –  Worked with Amazon to speed up and scale SimpleDB
•  Alternatives?
    –  Rolling out Cassandra as “upgrade” from SimpleDB
    –  Need several options to match use cases well
•  Detailed NoSQL and SimpleDB Advice
    –  Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
    –  Blog - http://practicalcloudcomputing.com/
    –  Download Paper PDF - http://bit.ly/bhOTLu
  
Cloud Operations
Model Driven Architecture
Capacity Planning & Monitoring
  
Tools and Automation
•  Developer and Build Tools
     –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
     –  Builds, creates .war file, .rpm, bakes AMI and launches
•  Custom Netflix Application Console
     –  AWS Features at Enterprise Scale (hide the AWS security keys!)
     –  Auto Scaler Group is unit of deployment to production
•  Open Source + Support
     –  Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
•  Monitoring Tools
     –  Keynote – service monitoring and alerting
     –  Custom metric collection and alerting under development
     –  Datastax OpsCenter – Cassandra Monitoring
     –  AppDynamics – Developer focus for cloud http://appdynamics.com
  
Model Driven Architecture
•  Datacenter Practices
   –  Lots of unique hand-tweaked systems
   –  Hard to enforce patterns
•  Model Driven Cloud Architecture
   –  Perforce/Ivy/Jenkins based builds for everything
   –  Every production instance is a pre-baked AMI
   –  Every application is managed by an Autoscaler

Every change is a new AMI
  
High Availability Zones
•  Each zone is a separate datacenter
    –  Private power, cooling, network connections
    –  Located close together for low latency
•  ASG Instances are distributed over 3 zones
•  Data written to one zone appears in all zones
•  Netflix survived total failure of one zone (!)
    –  Increase capacity of existing zones by 50%
    –  Small or zero downtime
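
The 50% figure follows from spreading the same load over two zones instead of three: each surviving zone goes from carrying 1/3 of the load to 1/2, and (1/2) / (1/3) = 1.5. A small sketch of that arithmetic, assuming load is spread evenly across zones (the function name is illustrative):

```python
from fractions import Fraction


def extra_capacity_needed(zones: int, failed: int = 1) -> Fraction:
    """Fractional capacity increase each surviving zone needs to absorb the failed zones' load."""
    if not 0 <= failed < zones:
        raise ValueError("failed must be between 0 and zones - 1")
    return Fraction(zones, zones - failed) - 1


if __name__ == "__main__":
    print(extra_capacity_needed(3))  # three zones, one lost
```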
  
Cross Region Backups
•  Data is backed up into a different cloud region
    –  Different AWS S3 account, encrypted for security
    –  Additional archives created on a different vendor
•  Restore to a new region
    –  Create model driven architecture
    –  Send traffic to new region
  
Model Driven Implications
•  Automated “Least Privilege” Security
   –  Tightly specified security groups
   –  Fine grain IAM keys to access AWS resources
   –  Performance tools security and integration
•  Model Driven Performance Monitoring
   –  Hundreds of instances appear in a few minutes…
   –  Tools have to “garbage collect” dead instances
  
Netflix App Console
Auto Scale Group Configuration
  
Learnings
•  Datacenter oriented tools don’t work
     –  Ephemeral instances
     –  High rate of change
     –  Need too much hand-holding and manual setup
•  Many Cloud Tools Don’t Scale for Enterprise
     –  Too many tools are “Startup” oriented
     –  Built our own tools for 1000’s of instances
     –  Drove vendors to be dynamic, scale, add APIs
•  Un-modified Datacenter Apps are Fragile
     –  Too many datacenter oriented assumptions
     –  We re-wrote our code base!
     –  (We re-write it continuously anyway)
  
Capacity Planning & Monitoring
  
Capacity Planning in Clouds
(a few things have changed…)
•  Capacity is expensive
•  Capacity takes time to buy and provision
•  Capacity only increases, can’t be shrunk easily
•  Capacity comes in big chunks, paid up front
•  Planning errors can cause big problems
•  Systems are clearly defined assets
•  Systems can be instrumented in detail
•  Depreciate assets over 3 years (reservations!)
  
Monitoring Issues
•  Problem
   –  Too many tools, each with a good reason to exist
   –  Hard to get an integrated view of a problem
   –  Too much manual work building dashboards
   –  Tools are not discoverable, views are not filtered
•  Solution
   –  Get vendors to add deep linking URLs and APIs
   –  Integration “portal” ties everything together
   –  Underlying dependency database
   –  Dynamic portal generation, relevant data, all tools
  
Data Sources
External Testing
   •  External URL availability and latency alerts and reports – Keynote
   •  Stress testing - SOASTA
Request Trace Logging
   •  Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
   •  Generic HTTP – AppDynamics service tier aggregation, end to end tracking
Application logging
   •  Tracers and counters – log4j, tracer central, Chukwa to DataOven
   •  Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference
JMX Metrics
   •  Application specific real time – Nimsoft, AppDynamics, Epic
   •  Service and SLA percentiles – Nimsoft, AppDynamics, Epic, logged to DataOven
Tomcat and Apache logs
   •  Stdout logs – S3 – DataOven, Nimsoft alerting
   •  Standard format Access and Error logs – S3 – DataOven, Nimsoft Alerting
JVM
   •  Garbage Collection – Nimsoft, AppDynamics
   •  Memory usage, call stacks, resource/call - AppDynamics
Linux
   •  system CPU/Net/RAM/Disk metrics – AppDynamics, Epic, Nimsoft Alerting
   •  SNMP metrics – Epic, Network flows – boundary.com
AWS
   •  Load balancer traffic – Amazon Cloudwatch, SimpleDB usage stats
   •  System configuration - CPU count/speed and RAM size, overall usage - AWS
  
AppDynamics
How to look deep inside your cloud applications
•  Automatic Monitoring
   –  Base AMI bakes in all monitoring tools
   –  Outbound calls only – no discovery/polling issues
   –  Inactive instances removed after a few days
•  Incident Alarms (deviation from baseline)
   –  Business Transaction latency and error rate
   –  Alarm thresholds discover their own baseline
   –  Email contains URL to Incident Workbench UI
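
The slide does not describe how AppDynamics derives its baselines, so the following is only a generic illustration of a “deviation from baseline” alarm: the threshold is learned from recent history instead of being configured by hand. The three-sigma rule, the function name and the sample data are assumptions.

```python
import statistics


def is_anomalous(history, latest, sigmas=3.0):
    """Flag a sample that deviates from the historical mean by more than `sigmas` standard deviations."""
    if not history:
        raise ValueError("history must not be empty")
    baseline = statistics.mean(history)
    spread = statistics.pstdev(history)
    return abs(latest - baseline) > sigmas * spread


if __name__ == "__main__":
    # Hypothetical business transaction latencies in milliseconds.
    recent_latency_ms = [100, 102, 98, 101, 99]
    print(is_anomalous(recent_latency_ms, 101))
    print(is_anomalous(recent_latency_ms, 150))
```

Because the baseline comes from the data, the same check works unchanged for a transaction that normally takes 5 ms and one that normally takes 500 ms.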
  
Using AppDynamics
(simple example from early 2010)
Point Finger and Assess Impact
(an async S3 write was slow, no big deal)
  
Monitoring Summary
•  Broken datacenter oriented tools are a big problem
•  Integrating many different tools
     –  They are not designed to be integrated
     –  We have “persuaded” vendors to add APIs
•  If you can’t see deep inside your app, you’re :(
  
Wrap Up

Implications for IT Operations
•  Cloud is run by developer organization
    –  Our IT department is Amazon Cloud
    –  Forming “Cloud Operations Reliability Eng” team

•  Traditional IT Roles are going away
    –  Don’t need SA, DBA, Storage, Network admins
    –  Database Engineering Team runs SDB/Cassandra

Next Few Years…
•  “System of Record” moves to Cloud (now)
      –  Master copies of data live only in the cloud, with backups
      –  Cut the datacenter to cloud replication link, turn off Oracle databases

•  International Expansion – Global Clouds (later in 2011)
      –  Rapid deployments to new markets

•  Cloud Standardization?
      –  Cloud features and APIs should be a commodity, not a differentiator
      –  Differentiate on scale and quality of service
      –  Competition and scale drive cost down
      –  Higher resilience and scalability

      We would prefer to be an insignificant customer in a giant cloud

Takeaway

Netflix is path-finding the use of public AWS
cloud to replace in-house IT for non-trivial
applications with hundreds of developers and
thousands of systems.

                    acockcroft@netflix.com
            http://www.linkedin.com/in/adriancockcroft
                    @adrianco #netflixcloud

Amazon Cloud Terminology
                                   See http://aws.amazon.com/jp for Japanese
                               This is not a full list of Amazon Web Service features

•    AWS – Amazon Web Services (common name for Amazon cloud)
•    AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•    EC2 – Elastic Compute Cloud
       –    Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
       –    Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
       –    Reserved Instances – pre-paid to reduce cost for long term usage
       –    Availability Zone – datacenter with own power and cooling hosting cloud instances
       –    Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
•    ASG – Auto Scaling Group (instances booting from the same AMI)
•    S3 – Simple Storage Service (http access)
•    EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
•    RDB – Relational Data Base (managed MySQL master and slaves)
•    SDB – Simple Data Base (hosted http based NoSQL data store)
•    SQS – Simple Queue Service (http based message queue)
•    SNS – Simple Notification Service (http and email based topics and messages)
•    EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•    ELB – Elastic Load Balancer
•    EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•    VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
•    IAM – Identity and Access Management (fine grain role based security keys)

Contenu connexe

Tendances

금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...Amazon Web Services Korea
 
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)オラクルエンジニア通信
 
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!Recruit Lifestyle Co., Ltd.
 
Securing Prometheus. Lessons Learned from OpenShift.pdf
Securing Prometheus. Lessons Learned from OpenShift.pdfSecuring Prometheus. Lessons Learned from OpenShift.pdf
Securing Prometheus. Lessons Learned from OpenShift.pdfJesús Ángel Samitier
 
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...Amazon Web Services Japan
 
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~Yuki Ando
 
データ収集の基本と「JapanTaxi」アプリにおける実践例
データ収集の基本と「JapanTaxi」アプリにおける実践例データ収集の基本と「JapanTaxi」アプリにおける実践例
データ収集の基本と「JapanTaxi」アプリにおける実践例Tetsutaro Watanabe
 
ぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでShinichi Takahashi
 
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門Shuji Kikuchi
 
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019Shota Tsuge
 
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpacesAmazon Web Services Japan
 
AWS エンジニア育成における効果的なトレーニング活用のすすめ
AWS エンジニア育成における効果的なトレーニング活用のすすめAWS エンジニア育成における効果的なトレーニング活用のすすめ
AWS エンジニア育成における効果的なトレーニング活用のすすめTrainocate Japan, Ltd.
 
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続Amazon Web Services Japan
 
AWS Black Belt Online Seminar AWS上のJenkins活用方法
AWS Black Belt Online Seminar AWS上のJenkins活用方法AWS Black Belt Online Seminar AWS上のJenkins活用方法
AWS Black Belt Online Seminar AWS上のJenkins活用方法Amazon Web Services Japan
 
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon ConnectAmazon Web Services Japan
 
AWSで実現するバックアップとディザスタリカバリ
AWSで実現するバックアップとディザスタリカバリAWSで実現するバックアップとディザスタリカバリ
AWSで実現するバックアップとディザスタリカバリAmazon Web Services Japan
 
20211209 Ops-JAWS Re invent2021re-cap-cloud operations
20211209 Ops-JAWS Re invent2021re-cap-cloud operations20211209 Ops-JAWS Re invent2021re-cap-cloud operations
20211209 Ops-JAWS Re invent2021re-cap-cloud operationsAmazon Web Services Japan
 
Amazon S3 고급 활용 기법 - AWS Summit Seoul 2017
Amazon S3 고급 활용 기법  - AWS Summit Seoul 2017Amazon S3 고급 활용 기법  - AWS Summit Seoul 2017
Amazon S3 고급 활용 기법 - AWS Summit Seoul 2017Amazon Web Services Korea
 
20220409 AWS BLEA 開発にあたって検討したこと
20220409 AWS BLEA 開発にあたって検討したこと20220409 AWS BLEA 開発にあたって検討したこと
20220409 AWS BLEA 開発にあたって検討したことAmazon Web Services Japan
 

Tendances (20)

금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
금융권 최신 AWS 도입 사례 총정리 – 신한 제주 은행, KB손해보험 사례를 중심으로 - 지성국 사업 개발 담당 이사, AWS / 정을용...
 
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)
はじめてのOracle Cloud Infrastructure (Oracle Cloudウェビナーシリーズ: 2021年6月16日)
 
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!
既存Redshift/ETLからSpectrum/Glueへの移行を徹底解明!
 
Securing Prometheus. Lessons Learned from OpenShift.pdf
Securing Prometheus. Lessons Learned from OpenShift.pdfSecuring Prometheus. Lessons Learned from OpenShift.pdf
Securing Prometheus. Lessons Learned from OpenShift.pdf
 
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
20180704(20190520 Renewed) AWS Black Belt Online Seminar Amazon Elastic File ...
 
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
 
データ収集の基本と「JapanTaxi」アプリにおける実践例
データ収集の基本と「JapanTaxi」アプリにおける実践例データ収集の基本と「JapanTaxi」アプリにおける実践例
データ収集の基本と「JapanTaxi」アプリにおける実践例
 
ぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまで
 
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門
[AKIBA.AWS] AWS Elemental MediaConvertから学ぶコーデック入門
 
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019
Well-Architectedな組織を
実現するためのチャレンジ - なぜ、CA W-Aを作ろうと思ったのか - #jawsdays 2019
 
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
 
AWS エンジニア育成における効果的なトレーニング活用のすすめ
AWS エンジニア育成における効果的なトレーニング活用のすすめAWS エンジニア育成における効果的なトレーニング活用のすすめ
AWS エンジニア育成における効果的なトレーニング活用のすすめ
 
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続
20200219 AWS Black Belt Online Seminar オンプレミスとAWS間の冗長化接続
 
AWS Black Belt Online Seminar AWS上のJenkins活用方法
AWS Black Belt Online Seminar AWS上のJenkins活用方法AWS Black Belt Online Seminar AWS上のJenkins活用方法
AWS Black Belt Online Seminar AWS上のJenkins活用方法
 
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
 
AWSで実現するバックアップとディザスタリカバリ
AWSで実現するバックアップとディザスタリカバリAWSで実現するバックアップとディザスタリカバリ
AWSで実現するバックアップとディザスタリカバリ
 
20211209 Ops-JAWS Re invent2021re-cap-cloud operations
20211209 Ops-JAWS Re invent2021re-cap-cloud operations20211209 Ops-JAWS Re invent2021re-cap-cloud operations
20211209 Ops-JAWS Re invent2021re-cap-cloud operations
 
Amazon S3 고급 활용 기법 - AWS Summit Seoul 2017
Amazon S3 고급 활용 기법  - AWS Summit Seoul 2017Amazon S3 고급 활용 기법  - AWS Summit Seoul 2017
Amazon S3 고급 활용 기법 - AWS Summit Seoul 2017
 
AWS Black Belt online seminar 2017 Snowball
AWS Black Belt online seminar 2017 SnowballAWS Black Belt online seminar 2017 Snowball
AWS Black Belt online seminar 2017 Snowball
 
20220409 AWS BLEA 開発にあたって検討したこと
20220409 AWS BLEA 開発にあたって検討したこと20220409 AWS BLEA 開発にあたって検討したこと
20220409 AWS BLEA 開発にあたって検討したこと
 

Similaire à Netflix Velocity Conference 2011

Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
Razorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qconYiwei Ma
 
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...Amazon Web Services
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
Building Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit CanberraBuilding Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit CanberraAmazon Web Services
 
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Amazon Web Services
 
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017Amazon Web Services
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon Web Services
 
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Laure Vergeron
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaAmazon Web Services
 
AWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingAWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingNeel Sendas
 
2016 AWS Media & Entertainment Cloud Symposium - New York, NY: May 18, 2016
2016 AWS Media & Entertainment Cloud Symposium - New York, NY:  May 18, 20162016 AWS Media & Entertainment Cloud Symposium - New York, NY:  May 18, 2016
2016 AWS Media & Entertainment Cloud Symposium - New York, NY: May 18, 2016Amazon Web Services
 

Similaire à Netflix Velocity Conference 2011 (20)

Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
Razorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - IntroductionRazorfish Technology Summit 2012 - Introduction
Razorfish Technology Summit 2012 - Introduction
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Building Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit CanberraBuilding Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit Canberra
 
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Austin 2017
 
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017
Deploy Deep Learning Models on Amazon ECS - DevDay Los Angeles 2017
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
Amazon on Amazon: How Amazon Designs Chips on AWS (MFG305) - AWS re:Invent 2018
 
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
Zenko & MetalK8s @ Dublin Docker Meetup, June 2018
 
Deep Learning on ECS
Deep Learning on ECSDeep Learning on ECS
Deep Learning on ECS
 
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 AustraliaServing Media From The Edge - Miles Ward - AWS Summit 2012 Australia
Serving Media From The Edge - Miles Ward - AWS Summit 2012 Australia
 
AWS IoT: From Testing to Scaling
AWS IoT: From Testing to ScalingAWS IoT: From Testing to Scaling
AWS IoT: From Testing to Scaling
 
2016 AWS Media & Entertainment Cloud Symposium - New York, NY: May 18, 2016
2016 AWS Media & Entertainment Cloud Symposium - New York, NY:  May 18, 20162016 AWS Media & Entertainment Cloud Symposium - New York, NY:  May 18, 2016
2016 AWS Media & Entertainment Cloud Symposium - New York, NY: May 18, 2016
 

Plus de Adrian Cockcroft

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Adrian Cockcroft
 

Plus de Adrian Cockcroft (20)

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
 

Dernier

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 

Netflix Velocity Conference 2011

  • 1. Netflix Cloud Architecture
       Velocity Conference, June 14, 2011
       Adrian Cockcroft
       @adrianco #netflixcloud http://slideshare.net/adrianco
       acockcroft@netflix.com
  • 2. Who, Why, What
       Netflix in the Cloud
       Cloud Challenges and Learnings (Ignite)
       Systems and Operations Architecture
  • 3. Netflix Inc.
       With more than 23 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows.
       International Expansion
       We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012.
       Source: http://ir.netflix.com
  • 4. Unlimited streaming for $7.99/month, large and growing catalog of movies and TV
  • 5. Adrian Cockcroft
       • Director, Architecture for Cloud Systems, Netflix Inc.
           – Previously Director for Personalization Platform
       • Distinguished Availability Engineer, eBay Inc. 2004-7
           – Founding member of eBay Research Labs
       • Distinguished Engineer, Sun Microsystems Inc. 1988-2004
           – 2003-4 Chief Architect High Performance Technical Computing
           – 2001 Author: Capacity Planning for Web Services
           – 1999 Author: Resource Management
           – 1995 & 1998 Author: Sun Performance and Tuning
           – 1996 Japanese Edition of Sun Performance and Tuning
               • SPARC & Solaris ( )
  • 6. Why is Netflix Talking about Cloud?
  • 7. Netflix is Path-finding
       The Cloud ecosystem is evolving very fast
       Share with and learn from the cloud community
  • 8. We want to use clouds, not build them
       Cloud technology should be a commodity
       Public cloud and open source for agility and scale
  • 9. Why Use Cloud?
       For Better Business Agility
       For Unpredictable Business Growth
  • 10. Data Center
       Netflix could not build new datacenters fast enough
       Capacity growth is accelerating, unpredictable
       Product launch spikes - iPhone, Wii, PS3, XBox
  • 11. 23 Million Customers
       2011-Q1 year/year customers +69%
       [Chart: customer count in millions, vertical axis from 0 to 25]
       Source: http://ir.netflix.com
  • 12. Out-Growing Data Center
       http://techblog.netflix.com/2011/02/redesigning-netflix-api.html
       [Chart: 37x growth, Jan 2010-Jan 2011, plotted against datacenter capacity]
  • 13. Netflix.com is now ~100% Cloud
       Account sign-up is currently being moved to cloud
       All international product will be cloud based
       USA specific logistics remains in the Datacenter
  • 14. Leverage AWS Scale
       “the biggest public cloud”
       AWS investment in tooling and automation
       Use many AWS zones for high availability, scalability
       AWS skills are most common on resumes…
  • 15. Leverage AWS Feature Set
       “the market leader”
       EC2, S3, SDB, SQS, EBS, EMR, ELB, ASG, IAM, RDB, VPC…
       http://aws.amazon.com/jp
  • 16. “The cloud lets its users focus on delivering differentiating business value instead of wasting valuable resources on the undifferentiated heavy lifting that makes up most of IT infrastructure.”
       Werner Vogels, Amazon CTO
  • 17. We want to use clouds, we don’t have time to build them
       Public cloud for agility and scale
       AWS because they are big enough to allocate thousands of instances per hour when we need to
  • 18. Netflix EC2 Instances per Account
       (summer 2010, production is much higher now…)
       [Chart: instance counts on a scale of “Many Thousands” over “Several Months”; series: Content Encoding, Test and Production, Log Analysis]
  • 19. Netflix Deployed on AWS
       Content: Video Masters, EC2, S3, CDN
       Logs: S3, EMR Hadoop, Hive, Business Intelligence
       Play: DRM, CDN routing, Bookmarks, Logging
       WWW: Sign-Up, Search, Movie Choosing, Ratings
       API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
  • 20. Cloud Encoding Pipeline
       [Diagram: movie studio master tapes, network upload to Netflix, master and mezzanine files on S3, encode to 50+ formats, copy to CDN origin, stream to TV]
       Licensed content is provided to Netflix as high quality master tapes
       Many formats are reduced to a single high quality mezzanine format on S3
       Individual formats and speeds are encoded in over 50 combinations
           Many formats for older and newer hardware and various game consoles
           Many speeds from mobile through standard and high definition
       Static files are copied to each Content Delivery Network’s “origin server”
       CDNs migrate files to “edge servers” near the end user
       Files stream to PC/Mac/iPad or TV over HTTP using “range get” to move chunks
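The “range get” mentioned on this slide refers to HTTP requests that carry a `Range` header, so a player fetches a file in byte chunks. A minimal Python sketch of how chunk boundaries map to header values follows; the function name, file size and chunk size are illustrative assumptions, not values from the talk.

```python
def range_headers(file_size, chunk_size):
    """Yield one HTTP Range header value per chunk of a file (hypothetical helper)."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < file_size:
        # The end offset in a Range header is inclusive.
        end = min(start + chunk_size, file_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


# Illustrative numbers only: a 2500-byte file fetched in 1000-byte chunks.
headers = list(range_headers(file_size=2500, chunk_size=1000))
print(headers)
```

Each value would be sent as the `Range` header of a separate GET request, which lets a client switch files or bit rates between chunks.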
  • 22. Product Trade-off
       User Experience: Consistent Experience, Low Latency
       Implementation: Development complexity, Operational complexity
  • 23. Netflix Cloud Goals
       • Faster
           – Lower latency than the equivalent datacenter web pages and API calls
           – Measured as mean and 99th percentile
           – For both first hit (e.g. home page) and in-session hits for the same user
       • Scalable
           – Avoid needing any more datacenter capacity as subscriber count increases
           – No central vertically scaled databases
           – Leverage AWS elastic capacity effectively
       • Available
           – Substantially higher robustness and availability than datacenter services
           – Leverage multiple AWS availability zones
           – No scheduled down time, no central database schema to change
       • Productive
           – Optimize agility of a large development team with automation and tools
           – Leave behind complex tangled datacenter code base (~8 year old architecture)
           – Enforce clean layered interfaces and re-usable components
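The “Faster” goal is measured as both mean and 99th percentile. The sketch below, with made-up latency samples and a nearest-rank percentile definition chosen for illustration, shows why tracking the two together is useful: a few slow requests move the mean only slightly while the tail tells a different story.

```python
import math


def mean(samples):
    return sum(samples) / len(samples)


def percentile(samples, pct):
    """Nearest-rank percentile; one of several common definitions."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [12] * 98 + [40, 900]

print(mean(latencies_ms))            # average over all requests
print(percentile(latencies_ms, 99))  # latency that 99% of requests stay at or below
print(max(latencies_ms))             # worst case
```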
  • 24. Old Datacenter vs. New Cloud Arch
       Old Datacenter                  New Cloud
       Central SQL Database            Distributed Key/Value NoSQL
       Sticky In-Memory Session        Shared Memcached Session
       Chatty Protocols                Latency Tolerant Protocols
       Tangled Service Interfaces      Layered Service Interfaces
       Instrumented Code               Instrumented Service Patterns
       Fat Complex Objects             Lightweight Serializable Objects
       Components as Jar Files         Components as Services
  • 25. The Central SQL Database
       • Datacenter has a central database
           – Everything in one place is convenient until it fails
           – Customers, movies, history, configuration
       • Schema changes require downtime
       Anti-pattern impacts scalability, availability
  • 26. The Distributed Key-Value Store
       • Cloud has many key-value data stores
           – More complex to keep track of, do backups etc.
           – Each store is much simpler to administer
           – Joins take place in java code
       • No schema to change, no scheduled downtime
       • Latency for Memcached vs. Oracle vs. SimpleDB
           – Memcached is dominated by network latency <1ms
           – Oracle for simple queries is a few milliseconds
           – SimpleDB has replication and REST overheads >10ms
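“Joins take place in java code” means the application combines records from separate key-value stores itself. The idea can be sketched as follows, written in Python for brevity with plain dictionaries standing in for the stores; the store names and record shapes are hypothetical.

```python
# Plain dicts stand in for two independent key-value stores (hypothetical data).
ratings_store = {"c1": [("m1", 5), ("m2", 3), ("m9", 4)]}
movie_store = {"m1": {"title": "Movie One"}, "m2": {"title": "Movie Two"}}


def ratings_with_titles(customer_id):
    """Application-side join: look up each rated movie by key."""
    rows = []
    for movie_id, stars in ratings_store.get(customer_id, []):
        movie = movie_store.get(movie_id)
        if movie is None:
            # No foreign keys to enforce integrity, so tolerate missing records.
            continue
        rows.append((movie["title"], stars))
    return rows


print(ratings_with_titles("c1"))
```

The cost is one lookup per joined record, which is why the per-store latencies listed on the slide matter.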
  • 27. The Sticky Session
       • Datacenter Sticky Load Balancing
           – Efficient caching for low latency
           – Tricky session handling code
           – Middle tier load balancer has issues in practice
       • Encourages concentrated functionality
           – one service that does everything
       Anti-pattern impacts productivity, availability
  • 28. The Shared Session
       • Cloud Uses Round-Robin Load Balancing
           – Simple request-based code
           – External shared caching with memcached
       • More flexible fine grain services
           – Works better with auto-scaled instance counts
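A rough Python sketch of the shared-session idea: every instance is stateless and reads session state from an external cache, so a round-robin balancer can send each request to a different instance. The dictionary below only stands in for memcached, and the class and variable names are assumptions made for illustration.

```python
import itertools

# Stand-in for an external memcached cluster shared by all instances.
shared_sessions = {}


class Instance:
    """A stateless service instance: no session data is kept on the object."""

    def __init__(self, name):
        self.name = name

    def handle(self, session_id, item):
        session = shared_sessions.get(session_id, [])
        session = session + [item]
        shared_sessions[session_id] = session
        return self.name, session


instances = [Instance("a"), Instance("b"), Instance("c")]
round_robin = itertools.cycle(instances)

# Three requests from the same user land on three different instances.
results = [next(round_robin).handle("s1", item) for item in ["x", "y", "z"]]
print(results)
```

Because no instance owns the session, instances can be added or removed by an auto scaler without losing user state.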
  • 29. Chatty Opaque and Brittle Protocols
       • Datacenter service protocols
           – Assumed low latency for many simple requests
       • Based on serializing existing java objects
           – Inefficient formats
           – Incompatible when definitions change
       Anti-pattern causes productivity, latency and availability issues
  • 30. Robust and Flexible Protocols
       • Cloud service protocols
           – JSR311/Jersey is used for REST/HTTP service calls
           – Custom client code includes service discovery
           – Support complex data types in a single request
       • Apache Avro
           – Evolved from Protocol Buffers and Thrift
           – Includes JSON header defining key/value protocol
           – Avro serialization is half the size and several times faster than Java serialization, more work to code
  • 31. Persisted Protocols
       • Persist Avro in Memcached
           – Save space/latency (zigzag encoding, half the size)
           – Less brittle across versions
           – New keys are ignored
           – Missing keys are handled cleanly
       • Avro protocol definitions
           – Can be written in JSON or generated from POJOs
           – It’s hard, needs better tooling
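The zigzag encoding mentioned above maps signed integers to unsigned ones so that values of small magnitude, positive or negative, fit in few bytes once written as variable-length integers. This standalone Python sketch shows the mapping for 64-bit values; it illustrates the technique and is not the Avro library's own code.

```python
def zigzag_encode(n):
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z):
    return (z >> 1) ^ -(z & 1)


def varint(z):
    """Write an unsigned integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


for value in (0, -1, 1, -2, 300):
    encoded = varint(zigzag_encode(value))
    print(value, zigzag_encode(value), len(encoded))
```

Without the zigzag step a small negative number would have its high bits set and need the maximum number of bytes.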
  • 32. Tangled Service Interfaces
       • Datacenter implementation is exposed
           – Oracle SQL queries mixed into business logic
       • Tangled code
           – Deep dependencies, false sharing
       • Data providers with sideways dependencies
           – Everything depends on everything else
       Anti-pattern affects productivity, availability
  • 33. Untangled Service Interfaces
       • New Cloud Code With Strict Layering
           – Compile against interface jar
           – Can use spring runtime binding to enforce
       • Service interface is the service
           – Implementation is completely hidden
           – Can be implemented locally or remotely
           – Implementation can evolve independently
  • 34. Untangled Service Interfaces
       Two layers:
       • SAL - Service Access Library
           – Basic serialization and error handling
           – REST or POJO’s defined by data provider
       • ESL - Extended Service Library
           – Caching, conveniences
           – Can combine several SALs
           – Exposes faceted type system (described later)
           – Interface defined by data consumer in many cases
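The two layers can be pictured with a small Python sketch. All class and method names here are hypothetical; the slide names only the SAL and ESL roles, and the real libraries were Java.

```python
class RatingsSAL:
    """Service Access Library: basic access and error handling for one provider."""

    def __init__(self, backend):
        self._backend = backend
        self.calls = 0  # counts backend calls so the caching effect is visible

    def get_ratings(self, customer_id):
        self.calls += 1
        try:
            return dict(self._backend[customer_id])
        except KeyError:
            return {}  # unknown customer: degrade to an empty result


class TitlesSAL:
    def __init__(self, backend):
        self._backend = backend

    def get_title(self, movie_id):
        return self._backend.get(movie_id, "unknown title")


class RatedTitlesESL:
    """Extended Service Library: combines several SALs and adds caching."""

    def __init__(self, ratings_sal, titles_sal):
        self._ratings = ratings_sal
        self._titles = titles_sal
        self._cache = {}

    def rated_titles(self, customer_id):
        if customer_id not in self._cache:
            ratings = self._ratings.get_ratings(customer_id)
            self._cache[customer_id] = {
                self._titles.get_title(movie_id): stars
                for movie_id, stars in ratings.items()
            }
        return self._cache[customer_id]
```

The consumer calls only `rated_titles`; which providers sit behind it, and whether they are local or remote, stays hidden.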
  • 35. Service Interaction Pattern
       Sample Swimlane Diagram
  • 36. Service Architecture Patterns
       • Internal Interfaces Between Services
           – Common patterns as templates
           – Highly instrumented, observable, analytics
           – Service Level Agreements – SLAs
       • Library templates for generic features
           – Instrumented Netflix Base Servlet template
           – Instrumented generic client interface template
           – Instrumented S3, SimpleDB, Memcached clients
  • 37. Service Request Instruments Every Step in the call
       [Diagram: timestamps recorded on the CLIENT and SERVICE sides of a call]
       CLIENT
           – Request start timestamp, request end timestamp
           – Client outbound serialize start timestamp, client outbound serialize end timestamp
           – Client network send timestamp
           – Client network receive timestamp
           – Inbound deserialize start timestamp, client inbound deserialize end timestamp
       SERVICE
           – Service network receive timestamp
           – Service inbound serialize start timestamp, service inbound serialize end timestamp
           – Execute request start timestamp, execute request end timestamp
           – Service outbound serialize start timestamp, service outbound serialize end timestamp
           – Service network send timestamp
  • 38. Boundary Interfaces
       • Isolate teams from external dependencies
           – Fake SAL built by cloud team
           – Real SAL provided by data provider team later
           – ESL built by cloud team using faceted objects
       • Fake data sources allow development to start
           – e.g. Fake Identity SAL for a test set of customers
           – Development solidifies dependencies early
           – Helps external team provide the right interface
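As an illustration of the fake SAL approach, the Python sketch below defines an interface, a fake implementation backed by a fixed test set, and consumer code that depends only on the interface. The names `IdentitySAL`, `FakeIdentitySAL` and `greeting` are invented for this example.

```python
from abc import ABC, abstractmethod


class IdentitySAL(ABC):
    """The boundary interface that consumers compile against (hypothetical)."""

    @abstractmethod
    def lookup(self, customer_id):
        """Return a customer record, or None if the customer is unknown."""


class FakeIdentitySAL(IdentitySAL):
    """Fake data source with a fixed test set, usable before the real SAL exists."""

    def __init__(self):
        self._customers = {"test-1": {"name": "Test One"}}

    def lookup(self, customer_id):
        return self._customers.get(customer_id)


def greeting(identity, customer_id):
    # Consumer code sees only the interface, so the real SAL can replace the fake later.
    customer = identity.lookup(customer_id)
    return "Hello, " + customer["name"] if customer else "Hello, guest"


print(greeting(FakeIdentitySAL(), "test-1"))
```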
  • 39. One Object That Does Everything
       • Datacenter uses a few big complex objects
           – Movie and Customer objects are the foundation
           – Good choice for a small team and one instance
           – Problematic for large teams and many instances
       • False sharing causes tangled dependencies
           – Unproductive re-integration work
       Anti-pattern impacting productivity and availability
  • 40. An Interface For Each Component
       • Cloud uses faceted Video and Visitor
           – Basic types hold only the identifier
           – Facets scope the interface you actually need
           – Each component can define its own facets
       • No false-sharing and dependency chains
           – Type manager converts between facets as needed
           – video.asA(PresentationVideo) for www
           – video.asA(MerchableVideo) for middle tier
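The facet idea can be approximated in a few lines of Python. Only the `asA` call and the facet names `PresentationVideo` and `MerchableVideo` come from the slide; the methods on each facet are invented here, and the slide's type manager is reduced to a direct constructor call.

```python
class Video:
    """Basic type: holds only the identifier."""

    def __init__(self, video_id):
        self.video_id = video_id

    def as_a(self, facet_type):
        # Stand-in for the type manager: build the requested facet from the id.
        return facet_type(self.video_id)


class PresentationVideo:
    """Facet owned by the web tier (methods are hypothetical)."""

    def __init__(self, video_id):
        self.video_id = video_id

    def box_art_path(self):
        return f"/boxart/{self.video_id}.jpg"


class MerchableVideo:
    """Facet owned by the middle tier (methods are hypothetical)."""

    def __init__(self, video_id):
        self.video_id = video_id

    def row_key(self):
        return f"merch:{self.video_id}"


video = Video(42)
print(video.as_a(PresentationVideo).box_art_path())
print(video.as_a(MerchableVideo).row_key())
```

Each component depends only on the facet it defines, so a change to one facet does not ripple into the others.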
  • 41. Software Architecture Patterns
       • Object Models
           – Basic and derived types, facets, serializable
           – Pass by reference within a service
           – Pass by value between services
       • Computation and I/O Models
           – Service Execution using Best Effort
           – Common thread pool management
  • 43. API
       [Diagram: API deployment on AWS EC2. Labels: Front End Load Balancer, Discovery Service, API Proxy, Load Balancer, API etc., Component Services, memcached, SQS, Replication, EBS, S3 and SimpleDB (AWS Storage), Oracle (Netflix Data Center)]
  • 44. Database Migration
       • Why SimpleDB?
           – No DBAs in the cloud, Amazon hosted service
           – Work started two years ago, fewer viable options
           – Worked with Amazon to speed up and scale SimpleDB
       • Alternatives?
           – Rolling out Cassandra as “upgrade” from SimpleDB
           – Need several options to match use cases well
       • Detailed NoSQL and SimpleDB Advice
           – Sid Anand - QConSF Nov 5th – Netflix’s Transition to High Availability Storage Systems
           – Blog - http://practicalcloudcomputing.com/
           – Download Paper PDF - http://bit.ly/bhOTLu
  • 45. Cloud Operations
       Model Driven Architecture
       Capacity Planning & Monitoring
  • 46. Tools and Automation
       • Developer and Build Tools
           – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
           – Builds, creates .war file, .rpm, bakes AMI and launches
       • Custom Netflix Application Console
           – AWS Features at Enterprise Scale (hide the AWS security keys!)
           – Auto Scaler Group is unit of deployment to production
       • Open Source + Support
           – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
       • Monitoring Tools
           – Keynote – service monitoring and alerting
           – Custom metric collection and alerting under development
           – Datastax OpsCenter – Cassandra Monitoring
           – AppDynamics – Developer focus for cloud http://appdynamics.com
  • 47. Model Driven Architecture
       • Datacenter Practices
           – Lots of unique hand-tweaked systems
           – Hard to enforce patterns
       • Model Driven Cloud Architecture
           – Perforce/Ivy/Jenkins based builds for everything
           – Every production instance is a pre-baked AMI
           – Every application is managed by an Autoscaler
       Every change is a new AMI
  • 48. High Availability Zones
       • Each zone is a separate datacenter
           – Private power, cooling, network connections
           – Located close together for low latency
       • ASG Instances are distributed over 3 zones
       • Data written to one zone appears in all zones
       • Netflix survived total failure of one zone (!)
           – Increase capacity of existing zones by 50%
           – Small or zero downtime
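The 50% figure follows from spreading the same load over fewer zones: with three equal zones each carries one third of the load, and after one fails each of the two survivors must carry one half, which is 1.5 times its previous share. A small Python check of that arithmetic, assuming load is spread evenly across zones:

```python
from fractions import Fraction


def extra_capacity_needed(zones, failed=1):
    """Fractional capacity increase per surviving zone to carry the full load."""
    survivors = zones - failed
    if survivors <= 0:
        raise ValueError("at least one zone must survive")
    # Each zone's share grows from 1/zones to 1/survivors of the total load.
    return Fraction(zones, survivors) - 1


print(extra_capacity_needed(3))  # three zones, one lost
print(extra_capacity_needed(2))  # two zones, one lost
```

With only two zones the survivors would need to double, so a third zone lowers the headroom that has to be found during a failure.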
  • 49. Cross Region Backups
       • Data is backed up into a different cloud region
           – Different AWS S3 account, encrypted for security
           – Additional archives created on a different vendor
       • Restore to a new region
           – Create model driven architecture
           – Send traffic to new region
  • 50. Model Driven Implications
       • Automated “Least Privilege” Security
           – Tightly specified security groups
           – Fine grain IAM keys to access AWS resources
           – Performance tools security and integration
       • Model Driven Performance Monitoring
           – Hundreds of instances appear in a few minutes…
           – Tools have to “garbage collect” dead instances
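One way a monitoring tool can “garbage collect” dead instances is to keep a last-seen time per instance and drop entries that have been silent for longer than a time-to-live. The Python sketch below assumes instances report in with heartbeats; the class name, method names and TTL value are illustrative, not taken from any tool named in the talk.

```python
class InstanceRegistry:
    """Tracks instances by last heartbeat and expires the silent ones."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._last_seen = {}

    def heartbeat(self, instance_id, now):
        # New instances register themselves simply by reporting in.
        self._last_seen[instance_id] = now

    def collect_garbage(self, now):
        dead = [i for i, seen in self._last_seen.items() if now - seen > self._ttl]
        for instance_id in dead:
            del self._last_seen[instance_id]
        return sorted(dead)

    def live(self):
        return sorted(self._last_seen)


registry = InstanceRegistry(ttl_seconds=60)
registry.heartbeat("i-aaa", now=0)
registry.heartbeat("i-bbb", now=0)
registry.heartbeat("i-bbb", now=90)       # i-bbb keeps reporting, i-aaa was terminated
print(registry.collect_garbage(now=100))  # instances silent for more than 60 seconds
print(registry.live())
```

Time is passed in explicitly so the behaviour is easy to test; a real tool would read a clock.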
  • 52. Auto Scale Group Configuration
  • 53. Learnings
       • Datacenter oriented tools don’t work
           – Ephemeral instances
           – High rate of change
           – Need too much hand-holding and manual setup
       • Many Cloud Tools Don’t Scale for Enterprise
           – Too many tools are “Startup” oriented
           – Built our own tools for 1000’s of instances
           – Drove vendors to be dynamic, scale, add APIs
       • Un-modified Datacenter Apps are Fragile
           – Too many datacenter oriented assumptions
           – We re-wrote our code base!
           – (We re-write it continuously anyway)
  • 54. Capacity Planning & Monitoring
  • 55. Capacity Planning in Clouds
       (a few things have changed…)
       • Capacity is expensive
       • Capacity takes time to buy and provision
       • Capacity only increases, can’t be shrunk easily
       • Capacity comes in big chunks, paid up front
       • Planning errors can cause big problems
       • Systems are clearly defined assets
       • Systems can be instrumented in detail
       • Depreciate assets over 3 years (reservations!)
  • 56. Monitoring Issues
       • Problem
           – Too many tools, each with a good reason to exist
           – Hard to get an integrated view of a problem
           – Too much manual work building dashboards
           – Tools are not discoverable, views are not filtered
       • Solution
           – Get vendors to add deep linking URLs and APIs
           – Integration “portal” ties everything together
           – Underlying dependency database
           – Dynamic portal generation, relevant data, all tools
  • 57. Data Sources
       External Testing
           • External URL availability and latency alerts and reports – Keynote
           • Stress testing - SOASTA
       Request Trace Logging
           • Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
           • Generic HTTP – AppDynamics service tier aggregation, end to end tracking
       Application logging
           • Tracers and counters – log4j, tracer central, Chukwa to DataOven
           • Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference
       JMX Metrics
           • Application specific real time – Nimsoft, AppDynamics, Epic
           • Service and SLA percentiles – Nimsoft, AppDynamics, Epic, logged to DataOven
       Tomcat and Apache logs
           • Stdout logs – S3 – DataOven, Nimsoft alerting
           • Standard format Access and Error logs – S3 – DataOven, Nimsoft Alerting
       JVM
           • Garbage Collection – Nimsoft, AppDynamics
           • Memory usage, call stacks, resource/call - AppDynamics
       Linux
           • System CPU/Net/RAM/Disk metrics – AppDynamics, Epic, Nimsoft Alerting
           • SNMP metrics – Epic, Network flows – boundary.com
       AWS
           • Load balancer traffic – Amazon Cloudwatch, SimpleDB usage stats
           • System configuration - CPU count/speed and RAM size, overall usage - AWS
  • 58. AppDynamics
       How to look deep inside your cloud applications
       • Automatic Monitoring
           – Base AMI bakes in all monitoring tools
           – Outbound calls only – no discovery/polling issues
           – Inactive instances removed after a few days
       • Incident Alarms (deviation from baseline)
           – Business Transaction latency and error rate
           – Alarm thresholds discover their own baseline
           – Email contains URL to Incident Workbench UI
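The slide does not say how the baseline is computed. As a generic illustration of alarming on deviation from a learned baseline, and explicitly not AppDynamics' actual algorithm, the Python sketch below flags a value that lies more than a chosen number of standard deviations from the mean of recent history; the threshold and sample values are assumptions.

```python
import statistics


def is_anomalous(history, value, sigmas=3.0):
    """Flag a value that deviates from the baseline learned from history."""
    baseline = statistics.mean(history)
    spread = statistics.pstdev(history)
    # Guard against a zero spread when all historical samples are identical.
    return abs(value - baseline) > sigmas * max(spread, 1e-9)


# Made-up transaction latencies in milliseconds.
history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 103))  # close to the baseline
print(is_anomalous(history, 150))  # far outside the usual range
```

No fixed threshold is configured per transaction; the alarm level follows whatever each transaction's normal behaviour turns out to be.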
  • 59. Using AppDynamics
       (simple example from early 2010)
  • 60. Point Finger and Assess Impact
       (an async S3 write was slow, no big deal)
  • 61. Monitoring Summary
       • Broken datacenter oriented tools are a big problem
       • Integrating many different tools
           – They are not designed to be integrated
           – We have “persuaded” vendors to add APIs
       • If you can’t see deep inside your app, you’re ☹
  • 63. Implications for IT Operations
       • Cloud is run by developer organization
           – Our IT department is Amazon Cloud
           – Forming “Cloud Operations Reliability Eng” team
       • Traditional IT Roles are going away
           – Don’t need SA, DBA, Storage, Network admins
           – Database Engineering Team runs SDB/Cassandra
  • 64. Next Few Years…
       • “System of Record” moves to Cloud (now)
           – Master copies of data live only in the cloud, with backups
           – Cut the datacenter to cloud replication link, turn off Oracle databases
       • International Expansion – Global Clouds (later in 2011)
           – Rapid deployments to new markets
       • Cloud Standardization?
           – Cloud features and APIs should be a commodity not a differentiator
           – Differentiate on scale and quality of service
           – Competition and scale drives cost down
           – Higher resilience and scalability
       We would prefer to be an insignificant customer in a giant cloud
  • 65. Takeaway
       Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.
       acockcroft@netflix.com
       http://www.linkedin.com/in/adriancockcroft
       @adrianco #netflixcloud
  • 66. Amazon Cloud Terminology
       See http://aws.amazon.com/jp for Japanese
       This is not a full list of Amazon Web Service features
       • AWS – Amazon Web Services (common name for Amazon cloud)
       • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
       • EC2 – Elastic Compute Cloud
           – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
           – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
           – Reserved Instances – pre-paid to reduce cost for long term usage
           – Availability Zone – datacenter with own power and cooling hosting cloud instances
           – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
       • ASG – Auto Scaling Group (instances booting from the same AMI)
       • S3 – Simple Storage Service (http access)
       • EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
       • RDB – Relational Data Base (managed MySQL master and slaves)
       • SDB – Simple Data Base (hosted http based NoSQL data store)
       • SQS – Simple Queue Service (http based message queue)
       • SNS – Simple Notification Service (http and email based topics and messages)
       • EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
       • ELB – Elastic Load Balancer
       • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
       • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
       • IAM – Identity and Access Management (fine grain role based security keys)