The Netflix Open Source Platform
September 26th, 2012
Adrian Cockcroft, Ruslan Meshenberg
@adrianco @rusmeshenberg #netflixcloud
http://www.linkedin.com/in/adriancockcroft
http://www.linkedin.com/in/ruslanmeshenberg

What Netflix Did
•  Moved to SaaS
   –  Corporate IT – OneLogin, Workday, Box, Evernote…
   –  Tools – Pagerduty, AppDynamics, Elastic MapReduce
•  Built our own PaaS
   –  Customized to make our developers productive
   –  When we started, we had little choice
•  Moved incremental capacity to IaaS
   –  No new datacenter space since 2008 as we grew
   –  Moved our streaming apps to the cloud

Why Use Cloud?
Things we don’t do

Netflix Choice was AWS with our own platform and tools
   Unique platform requirements and extreme scale, agility and flexibility
Leverage AWS Scale
   “the biggest public cloud”
   AWS investment in features and automation
Use AWS zones and regions for high availability, scalability and global deployment

What about other PaaS?
•  CloudFoundry – Open Source by VMWare
   –  Developer-friendly, easy to get started
   –  Missing scale and some enterprise features
•  Rightscale
   –  Widely used to abstract away from AWS
   –  Creates its own lock-in problem…
•  AWS is growing into this space
   –  We didn’t want a vendor between us and AWS
   –  We wanted to build a thin PaaS that gets thinner

What do developers care about?

Keeping up with Developer Trends (year in production at Netflix)
•  Big Data/Hadoop – 2009
•  AWS Cloud – 2009
•  Application Performance Management – 2010
•  Integrated DevOps Practices – 2010
•  Continuous Integration/Delivery – 2010
•  NoSQL – 2010
•  Platform as a Service; Fine grain SOA – 2010
•  Social coding, open development/github – 2011

AWS-specific feature dependence…

Portability vs. Functionality
•  Portability – the Operations focus
   –  Avoid vendor lock-in
   –  Support datacenter based use cases
   –  Possible operations cost savings
•  Functionality – the Developer focus
   –  Less complex test and debug, one mature supplier
   –  Faster time to market for your products
   –  Possible developer cost savings

Portable PaaS
•  Portable IaaS Base – some AWS compatibility
   –  Eucalyptus – AWS licensed compatible subset
   –  CloudStack – Citrix Apache project
   –  OpenStack – Rackspace, Cloudscaling, HP etc.
•  Portable PaaS
   –  VMWare Cloud Foundry – run it yourself in your DC
   –  AppFog and Stackato – Cloud Foundry/Openstack
   –  Vendor options: Rightscale, Enstratus, Smartscale

Functional PaaS
•  IaaS base – all the features of AWS
   –  Very large scale, mature, global, evolving rapidly
   –  ELB, Autoscale, VPC, SQS, EIP, EMR, DynamoDB etc.
   –  Large files (TB) and multipart writes in S3
•  Functional PaaS – Netflix added features
   –  Very large scale, mature, flexible, customizable
   –  Asgard console, Monkeys, Big data tools
   –  Cassandra/Zookeeper data store automation

Developers choose Functional
   Don’t let the roadie write the set list!
(yes you do need all those guitars on tour…)

Freedom and Responsibility
•  Developers leverage cloud to get freedom
   –  Agility of a single organization, no silos
•  But now developers are responsible
   –  For compliance, performance, availability etc.

   “As far as my rehab is concerned, it is within my ability to change and change for the better.” – Eddie Van Halen

Amazon Cloud Terminology Reference
   See http://aws.amazon.com/. This is not a full list of Amazon Web Services features.
•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
   –  Range of virtual machine types m1, m2, c1, cc, cg, with varying memory, CPU and disk configurations
   –  Instance – a running computer system. Ephemeral: when it is de-allocated nothing is kept.
   –  Reserved Instances – pre-paid to reduce cost for long term usage
   –  Availability Zone – datacenter with own power and cooling hosting cloud instances
   –  Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (http access)
•  EBS – Elastic Block Store (network disk that can be mounted on an instance as a filesystem)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  DynamoDB/SDB – DynamoDB and SimpleDB (hosted http based NoSQL datastores; DynamoDB replaces SDB)
•  SQS – Simple Queue Service (http based message queue)
•  SNS – Simple Notification Service (http and email based topics and messages)
•  EMR – Elastic MapReduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
•  DirectConnect – secure pipe from AWS VPC to external datacenter
•  IAM – Identity and Access Management (fine grain role based security keys)

What Runs in the Cloud?
   Step by Step Netflix Product Transition
Non-Member Web Site
Member Web Site
Content Delivery Service
Netflix APIs
Streaming Device API
Netflix Ready Devices
   From: May 2008
   To: May 2010
Current Architectural Patterns for Availability
•  Isolated Services
   –  Resilient Business logic
•  Three Balanced Availability Zones
   –  Resilient to Infrastructure outage
•  Triple Replicated Persistence
   –  Durable distributed Storage
•  Isolated Regions
   –  US and EU don’t take each other down

Isolated Services
   Test With Chaos Monkey, Latency Monkey
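Latency Monkey makes a service slow or failing on purpose so that its callers have to prove they cope. The Python sketch below only illustrates that idea under stated assumptions and is not the Simian Army code: `inject_faults`, `InjectedFault` and the probability parameters are hypothetical names, and `sleep` is injectable so a delay can be recorded instead of actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised when the wrapper simulates a server-side error."""


def inject_faults(handler: Callable[[], T], *, latency_prob: float,
                  delay_s: float, error_prob: float, rng: random.Random,
                  sleep: Callable[[float], None] = time.sleep) -> Callable[[], T]:
    """Wrap a request handler so that some calls fail and some are delayed."""
    def wrapped() -> T:
        if rng.random() < error_prob:
            raise InjectedFault("injected server-side error")
        if rng.random() < latency_prob:
            sleep(delay_s)  # simulate a slow response before answering
        return handler()
    return wrapped


# Usage: always delay by 0.5 s and never fail; record the delay instead of sleeping.
recorded = []
slow_ok = inject_faults(lambda: "ok", latency_prob=1.0, delay_s=0.5,
                        error_prob=0.0, rng=random.Random(7), sleep=recorded.append)
print(slow_ok(), recorded)  # ok [0.5]
```

Passing the random generator and the sleep function in keeps the injector deterministic in tests while still behaving realistically with the defaults.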
Three Balanced Availability Zones
   Test with Chaos Gorilla
   (Diagram: Load Balancers in front of Zone A, Zone B and Zone C, each zone holding Cassandra and EVCache replicas)

Triple Replicated Persistence
   Cassandra maintenance drops individual replicas
   (Diagram: Load Balancers in front of Zone A, Zone B and Zone C, each zone holding Cassandra and EVCache replicas)
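One common way to reason about three replicas in three zones, assumed here rather than stated on the slides, is majority quorums: with N = 3 replicas and reads and writes that each need 2 acknowledgements (R + W > N), losing one zone still leaves a quorum, while losing two does not. This is a toy Python model of that arithmetic, not Cassandra's implementation; `Replica`, `write`, `read` and the zone names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ZONES = ("zone-a", "zone-b", "zone-c")  # hypothetical names, one replica per zone


@dataclass
class Replica:
    zone: str
    up: bool = True
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # key -> (timestamp, value)


def quorum(n: int) -> int:
    """Smallest majority of n replicas (2 of 3)."""
    return n // 2 + 1


def write(replicas: List[Replica], key: str, value: str, ts: int) -> bool:
    """Apply the write to every live replica; report success only with a quorum of acks."""
    acks = 0
    for r in replicas:
        if r.up:
            r.data[key] = (ts, value)
            acks += 1
    return acks >= quorum(len(replicas))


def read(replicas: List[Replica], key: str) -> Optional[str]:
    """Ask live replicas; require a quorum and return the newest value seen."""
    answers = [r.data.get(key) for r in replicas if r.up]
    if len(answers) < quorum(len(replicas)):
        raise RuntimeError("quorum not available")
    found = [a for a in answers if a is not None]
    return max(found, key=lambda a: a[0])[1] if found else None


# Usage: lose one zone and keep serving.
replicas = [Replica(z) for z in ZONES]
write(replicas, "user:1", "v1", ts=1)
replicas[0].up = False  # zone-a outage
print(write(replicas, "user:1", "v2", ts=2), read(replicas, "user:1"))  # True v2
```

Real systems also repair the replica that missed writes when it comes back; the sketch leaves that out.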
  
Isolated Regions
   (Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of its own Zone A, Zone B and Zone C, each zone holding Cassandra replicas)

Failure Modes and Effects
Failure Mode            Probability    Mitigation Plan
Application Failure     High           Automatic degraded response
AWS Region Failure      Low            Wait for region to recover
AWS Zone Failure        Medium         Continue to run on 2 out of 3 zones
Datacenter Failure      Medium         Migrate more functions to cloud
Data store failure      Low            Restore from S3 backups
S3 failure              Low            Restore from remote archive

Observed Regional Failures
•  Power Outages
   –  Platform survives any one zone outage
   –  Two recent zone outages, one OK, one triggered a bug
•  Router Bug Takes Region Offline
   –  A few minutes of no network traffic, then recovered
   –  AWS has redesigned routes to be per zone
•  Control Plane Overload Affects Entire Region
   –  Consequence of other outages
   –  We lose control of our infrastructure

Netflix Deployed on AWS
•  Content (2009) – Content Management; EC2 Encoding; S3 (Petabytes); CDNs, ISPs, Customers (Terabits)
•  Logs (2009) – S3 (Terabytes); EMR; Hive & Pig; Business Intelligence
•  Play (2010) – DRM; CDN routing; Bookmarks; Logging
•  WWW (2010) – Sign-Up; Search; Movie Choosing; Ratings
•  API (2010) – Metadata; Device Config; TV Movie Choosing; Social Facebook
•  CS (2011) – International CS lookup; Diagnostics & Actions; Customer Call Log; CS Analytics

Cloud Architecture Patterns
   Where do we start?

Datacenter to Cloud Transition Goals
•  Faster
   –  Lower latency than the equivalent datacenter web pages and API calls
   –  Measured as mean and 99th percentile
   –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
   –  Avoid needing any more datacenter capacity as subscriber count increases
   –  No central vertically scaled databases
   –  Leverage AWS elastic capacity effectively
•  Available
   –  Substantially higher robustness and availability than datacenter services
   –  Leverage multiple AWS availability zones
   –  No scheduled down time, no central database schema to change
•  Productive
   –  Optimize agility of a large development team with automation and tools
   –  Leave behind complex tangled datacenter code base (~8 year old architecture)
   –  Enforce clean layered interfaces and re-usable components
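The "Faster" goal is measured as mean and 99th percentile latency. A small sketch, assuming latencies are collected as a list of milliseconds and using the nearest-rank percentile definition (one of several in common use), shows why both numbers are tracked: a small fraction of slow requests barely moves the mean but dominates the p99.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    return {"mean": mean(latencies_ms), "p99": percentile(latencies_ms, 99)}


# 98 fast requests and 2 slow ones: the mean looks fine, the p99 does not.
print(summarize([10] * 98 + [1000, 1000]))  # {'mean': 29.8, 'p99': 1000}
```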
  
Netflix Datacenter vs. Cloud Arch
   Datacenter → Cloud
   Central SQL Database → Distributed Key/Value NoSQL
   Sticky In-Memory Session → Shared Memcached Session
   Chatty Protocols → Latency Tolerant Protocols
   Tangled Service Interfaces → Layered Service Interfaces
   Instrumented Code → Instrumented Service Patterns
   Fat Complex Objects → Lightweight Serializable Objects
   Components as Jar Files → Components as Services
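The session row of this comparison is easy to show in code. With sticky in-memory sessions a user is pinned to one server and loses state when it dies; with a shared session store any instance can serve any request. In this hypothetical sketch a plain dict stands in for memcached, and `SessionStore` and `WebInstance` are illustrative names, not a Netflix API.

```python
import uuid
from typing import Dict, Optional


class SessionStore:
    """Stand-in for a shared memcached cluster: a dict keyed by session id."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, sid: str) -> Optional[dict]:
        return self._data.get(sid)

    def set(self, sid: str, value: dict) -> None:
        self._data[sid] = value


class WebInstance:
    """Stateless web instance: session state lives only in the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        sid = uuid.uuid4().hex
        self.store.set(sid, {"user": user})
        return sid

    def whoami(self, sid: str) -> Optional[str]:
        session = self.store.get(sid)
        return session["user"] if session else None


# Usage: log in via instance "a", then serve the next request from instance "b".
store = SessionStore()
a, b = WebInstance("a", store), WebInstance("b", store)
sid = a.login("alice")
print(b.whoami(sid))  # alice
```

Because no instance keeps session state of its own, the load balancer can route each request anywhere, and losing an instance loses no sessions.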
  
Availability and Resilience

Chaos Monkey
•  Computers (Datacenter or AWS) randomly die
   –  Fact of life, but too infrequent to test resiliency
•  Test to make sure systems are resilient
   –  Allow any instance to fail without customer impact
•  Chaos Monkey hours
   –  Monday-Friday 9am-3pm random instance kill
•  Application configuration option
   –  Apps now have to opt-out from Chaos Monkey
Responsibility and Experience
•  Make developers responsible for failures
   –  Then they learn and write code that doesn’t fail
•  Use Incident Reviews to find gaps to fix
   –  Make sure it’s not about finding “who to blame”
•  Keep timeouts short, fail fast
   –  Don’t let cascading timeouts stack up
•  Make configuration options dynamic
   –  You don’t want to push code to tweak an option
Resilient Design – Circuit Breakers
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
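The linked post describes Netflix's approach in detail. As a rough, hypothetical illustration of the pattern only (not Netflix's code), a circuit breaker counts consecutive failures, stops calling a broken dependency for a cool-down period, and serves a fallback in the meantime, which is one way to get the "automatic degraded response" listed for application failures. The threshold, cool-down and injectable clock are assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Closed until failure_threshold consecutive failures, then open (fail fast
    to the fallback) for reset_after_s seconds, then half-open: one trial call
    either closes the circuit again or re-opens it."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any], fallback: Callable[[], Any]) -> Any:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            return fallback()  # open: do not touch the failing dependency
        try:
            result = fn()  # closed, or a half-open trial call
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])
calls = []


def broken_dependency() -> str:
    calls.append("called")
    raise IOError("dependency down")


print([breaker.call(broken_dependency, lambda: "degraded") for _ in range(3)], len(calls))
# ['degraded', 'degraded', 'degraded'] 2  (the third call was short-circuited)
```

Failing fast to a fallback also keeps callers from stacking up timeouts while the dependency is down.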
  
Distributed Operational Model
•  Developers
   –  Provision and run their own code in production
   –  Take turns to be on call if it breaks (pagerduty)
   –  Configure autoscalers to handle capacity needs
•  DevOps and PaaS (aka NoOps)
   –  DevOps is used to build and run the PaaS
   –  PaaS constrains Dev to use automation instead
   –  PaaS puts more responsibility on Dev, with tools
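Developers configure their own autoscalers. The rule below is a hypothetical proportional policy, used only to illustrate the kind of arithmetic involved and not an AWS Auto Scaling API: resize the group so that average CPU moves toward a target, clamped to the group's minimum and maximum size.

```python
import math


def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Size the group so average CPU moves toward target_cpu, within [min_size, max_size]."""
    if current <= 0:
        return min_size
    # Same total load spread over more (or fewer) instances.
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))


print(desired_capacity(10, avg_cpu=90, target_cpu=60, min_size=3, max_size=50))  # 15
print(desired_capacity(10, avg_cpu=10, target_cpu=60, min_size=3, max_size=50))  # 3
```

Rounding up biases the group toward slightly too much capacity rather than too little.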
  
What’s Left for Corp IT?
•  Corporate Security and Network Management
   –  Billing and remnants of streaming service back-ends in DC
•  Running Netflix’s DVD Business
   –  Tens of Oracle instances
   –  Hundreds of MySQL instances
   –  Thousands of VMWare VMs
   –  Zabbix, Cacti, Sumologic, Puppet, Chef
•  Employee Productivity
   –  Building networks and WiFi
   –  SaaS OneLogin SSO Portal
   –  Evernote Premium, Safari Online Bookshelf, Dropbox for Teams
   –  Google Enterprise Apps, Workday HCM/Expense, Box.com
   –  Many more SaaS migrations coming…
(Image: Corp WiFi Performance)

Netflix Organization
   DevOps Org Reporting into Product Group, not ITops
   Netflix Cloud Platform Team
•  Cloud Ops Reliability Engineering – Alert Routing, Incident Lifecycle, PagerDuty
•  Architecture – Future planning, Security Arch, Efficiency, AWS VPC, Hyperguard, Powerpoint ☺
•  Build Tools and Automation – Perforce, Jenkins, Artifactory, JIRA, Base AMI, Bakery, Netflix App Console, AWS API
•  Platform and Persistence Engineering – Platform jars, Cassandra, Key store, Zookeeper, AWS Instances
•  Cloud Performance – Benchmarking, JVM GC Tuning, Wiresharking, AWS Instances
•  Cloud Solutions – Monitoring, Monkeys, Entrypoints, Cassandra, AWS Instances

Netflix Open Source Strategy
•  Steadily release PaaS Components git-by-git
•  Source at github.com/netflix – we build from it…
•  Intros and techniques at techblog.netflix.com
Give back to Apache licensed OSS community
Lead the Best Practices
Motivate, retain, hire top engineers
“Peer Pressure” code cleanup
External contributions
Clean Code is Re-usable
•  Use by other teams and projects inside Netflix

Timeline
http://netflix.github.com

Simian Army (Chaos Monkey)
   http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
Asgard
   http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Astyanax, Priam, Curator, Exhibitor

Active Pipeline

Instance creation
   Bakery & Build tools: Base AMI + Application Code → Image baked
   Asgard, Odin scripts: ASG / Instance started
   Autoscaling: Instance Running

Runtime
   Application initializing: Governator, Async logging, Servo
   Registering, configuration: Eureka, Archaius, Entrypoints
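Eureka's runtime role is service registration and discovery. A minimal, hypothetical sketch of the lease idea follows: instances register, renew with heartbeats, and drop out of lookups once renewals stop. The 90-second lease, the injectable clock and all names are assumptions for illustration, not Eureka's API.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Instances register and heartbeat; an entry whose lease has not been
    renewed within lease_s seconds is no longer returned by lookups."""

    def __init__(self, lease_s: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_s = lease_s
        self.clock = clock
        self._leases: Dict[Tuple[str, str], float] = {}  # (app, instance) -> last renewal

    def register(self, app: str, instance: str) -> None:
        self._leases[(app, instance)] = self.clock()

    renew = register  # a heartbeat simply refreshes the lease

    def instances(self, app: str) -> List[str]:
        now = self.clock()
        return sorted(inst for (a, inst), seen in self._leases.items()
                      if a == app and now - seen <= self.lease_s)


now = [0.0]  # fake clock so the example is deterministic
registry = ServiceRegistry(lease_s=90.0, clock=lambda: now[0])
registry.register("API", "i-1")
registry.register("API", "i-2")
now[0] = 60.0
registry.renew("API", "i-1")  # only i-1 keeps sending heartbeats
now[0] = 120.0
print(registry.instances("API"))  # ['i-1']
```

Expiring on missed heartbeats, rather than waiting for an explicit de-registration, means an instance killed by Chaos Monkey disappears from lookups on its own.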
  
Runtime, Cont’d
   Calling other services: Astyanax, Curator, NIWS LB, REST client, Dependency Command
   Managing service: Priam, Exhibitor, Explorers
   Resiliency aids: Chaos Monkey, Latency Monkey, Janitor Monkey, Cass JMeter

Open Source Projects
Legend: Github / Techblog, Apache Contributions, Techblog Post, Coming Soon
•  Priam – Cassandra as a Service
•  Exhibitor – Zookeeper as a Service
•  Servo and Autoscaling Scripts
•  Astyanax – Cassandra client for Java
•  Curator – Zookeeper Patterns
•  Honu – Log4j streaming to Hadoop
•  CassJMeter – Cassandra test suite
•  EVCache – Memcached as a Service
•  Circuit Breaker – Robust service pattern
•  Cassandra Multi-region EC2 datastore support
•  Eureka / Discovery – Service Directory
•  Asgard – AutoScaleGroup based AWS console
•  Aegisthus – Hadoop ETL for Cassandra
•  Archaius – Dynamic Properties Service
•  Chaos Monkey – Robustness verification
•  Explorers
•  EntryPoints
•  Latency Monkey – Server-side latency/error injection
•  Governator – Library lifecycle and dependency injection
•  Janitor Monkey
•  Odin – Workflow orchestration
•  REST Client + mid-tier LB
•  Bakeries and AMI
•  Async logging
•  Configuration REST endpoints
•  Build dynaslaves

Repeat after me…

Roadmap for 2012
•  More resiliency and improved availability
•  More automation, orchestration
•  “Hardening” the platform, code clean-up
•  Lower latency for web services and devices
•  IPv6 – now running in prod, rollout in process
•  More open sourced components
•  See you at AWS Re:Invent in November…

Takeaway
   Netflix has built and deployed a scalable global Platform as a Service.
   Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
   http://github.com/Netflix
   http://techblog.netflix.com
   http://slideshare.net/Netflix
   http://www.linkedin.com/in/adriancockcroft
   http://www.linkedin.com/in/ruslanmeshenberg
   @adrianco @rusmeshenberg #netflixcloud

ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 

En vedette

Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Sourceaspyker
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)Amazon Web Services
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open SourceAll Things Open
 
Aw some day_essentials3.2ish_072214
Aw some day_essentials3.2ish_072214Aw some day_essentials3.2ish_072214
Aw some day_essentials3.2ish_072214Amazon Web Services
 
Operational Insight: Concepts and Examples
Operational Insight: Concepts and ExamplesOperational Insight: Concepts and Examples
Operational Insight: Concepts and Examplesroyrapoport
 
Cloud Native: A dose of reality
Cloud Native: A dose of realityCloud Native: A dose of reality
Cloud Native: A dose of realityDonnie Berkholz
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Sid Anand
 

En vedette (14)

Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
AWS re:Invent 2016: Another Day in the Life of a Netflix Engineer (DEV209)
 
Culture
CultureCulture
Culture
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
 
Aw some day_essentials3.2ish_072214
Aw some day_essentials3.2ish_072214Aw some day_essentials3.2ish_072214
Aw some day_essentials3.2ish_072214
 
Operational Insight: Concepts and Examples
Operational Insight: Concepts and ExamplesOperational Insight: Concepts and Examples
Operational Insight: Concepts and Examples
 
Cloud Native: A dose of reality
Cloud Native: A dose of realityCloud Native: A dose of reality
Cloud Native: A dose of reality
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!
 

Similaire à SV Forum Platform Architecture SIG - Netflix Open Source Platform

Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSAcquia
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformSudhir Tonse
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsMark Slingsby
 
AWS 101 - An Introduction to the Amazon Cloud
AWS 101  - An Introduction to the Amazon CloudAWS 101  - An Introduction to the Amazon Cloud
AWS 101 - An Introduction to the Amazon CloudCloudHesive
 
CloudStack-Developer-Day
CloudStack-Developer-DayCloudStack-Developer-Day
CloudStack-Developer-DayKimihiko Kitase
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarKamesh Pemmaraju
 
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...IndicThreads
 
Cloud Architecture: Patterns and Best Practices
Cloud Architecture: Patterns and Best PracticesCloud Architecture: Patterns and Best Practices
Cloud Architecture: Patterns and Best PracticesSascha Möllering
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Amazon Web Services
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack NetworkingChiradeep Vittal
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram Chinta
 
Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014Miguel Zuniga
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAcquia
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...IndicThreads
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
What are clouds made from
What are clouds made fromWhat are clouds made from
What are clouds made fromJohn Garbutt
 

Similaire à SV Forum Platform Architecture SIG - Netflix Open Source Platform (20)

Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWS
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Web Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud PlatformWeb Scale Applications using NeflixOSS Cloud Platform
Web Scale Applications using NeflixOSS Cloud Platform
 
AWS Distilled
AWS DistilledAWS Distilled
AWS Distilled
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web Apps
 
AWS 101 - An Introduction to the Amazon Cloud
AWS 101  - An Introduction to the Amazon CloudAWS 101  - An Introduction to the Amazon Cloud
AWS 101 - An Introduction to the Amazon Cloud
 
CloudStack-Developer-Day
CloudStack-Developer-DayCloudStack-Developer-Day
CloudStack-Developer-Day
 
104 meets cloud
104 meets cloud104 meets cloud
104 meets cloud
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
 
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...
Current State of Affairs – Cloud Computing - Indicthreads Cloud Computing Con...
 
Cloud Architecture: Patterns and Best Practices
Cloud Architecture: Patterns and Best PracticesCloud Architecture: Patterns and Best Practices
Cloud Architecture: Patterns and Best Practices
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20
 
DevOpsCon Cloud Workshop
DevOpsCon Cloud Workshop DevOpsCon Cloud Workshop
DevOpsCon Cloud Workshop
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1
 
Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and Hosting
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
What are clouds made from
What are clouds made fromWhat are clouds made from
What are clouds made from
 

Dernier

COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 

Dernier (20)

COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 

SV Forum Platform Architecture SIG - Netflix Open Source Platform

  • 1. The Netflix Open Source Platform
    September 26th, 2012
    Adrian Cockcroft, Ruslan Meshenberg
    @adrianco @rusmeshenberg #netflixcloud
    http://www.linkedin.com/in/adriancockcroft
    http://www.linkedin.com/in/ruslanmeshenberg
  • 2. What Netflix Did
    •  Moved to SaaS
       –  Corporate IT – OneLogin, Workday, Box, Evernote…
       –  Tools – PagerDuty, AppDynamics, Elastic MapReduce
    •  Built our own PaaS
       –  Customized to make our developers productive
       –  When we started, we had little choice
    •  Moved incremental capacity to IaaS
       –  No new datacenter space since 2008 as we grew
       –  Moved our streaming apps to the cloud
  • 5. Netflix Choice was AWS with our own platform and tools
    Unique platform requirements and extreme scale, agility and flexibility
  • 6. Leverage AWS Scale
    “the biggest public cloud”
    AWS investment in features and automation
    Use AWS zones and regions for high availability, scalability and global deployment
  • 7. What about other PaaS?
    •  Cloud Foundry – Open Source by VMware
       –  Developer-friendly, easy to get started
       –  Missing scale and some enterprise features
    •  RightScale
       –  Widely used to abstract away from AWS
       –  Creates its own lock-in problem…
    •  AWS is growing into this space
       –  We didn’t want a vendor between us and AWS
       –  We wanted to build a thin PaaS that gets thinner
  • 8. What do developers care about?
  • 9. Keeping up with Developer Trends
       Trend                                   In production at Netflix
    •  Big Data/Hadoop                         2009
    •  AWS Cloud                               2009
    •  Application Performance Management      2010
    •  Integrated DevOps Practices             2010
    •  Continuous Integration/Delivery         2010
    •  NoSQL                                   2010
    •  Platform as a Service; Fine grain SOA   2010
    •  Social coding, open development/github  2011
  • 10. AWS-specific feature dependence…
  • 11. Portability vs. Functionality
    •  Portability – the Operations focus
       –  Avoid vendor lock-in
       –  Support datacenter based use cases
       –  Possible operations cost savings
    •  Functionality – the Developer focus
       –  Less complex test and debug, one mature supplier
       –  Faster time to market for your products
       –  Possible developer cost savings
  • 12. Portable PaaS
    •  Portable IaaS Base – some AWS compatibility
       –  Eucalyptus – AWS licensed compatible subset
       –  CloudStack – Citrix Apache project
       –  OpenStack – Rackspace, Cloudscaling, HP etc.
    •  Portable PaaS
       –  VMware Cloud Foundry – run it yourself in your DC
       –  AppFog and Stackato – Cloud Foundry/OpenStack
       –  Vendor options: RightScale, Enstratus, Smartscale
  • 13. Functional PaaS
    •  IaaS base – all the features of AWS
       –  Very large scale, mature, global, evolving rapidly
       –  ELB, Autoscale, VPC, SQS, EIP, EMR, DynamoDB etc.
       –  Large files (TB) and multipart writes in S3
    •  Functional PaaS – Netflix added features
       –  Very large scale, mature, flexible, customizable
       –  Asgard console, Monkeys, Big data tools
       –  Cassandra/Zookeeper data store automation
  • 14. Developers choose Functional
    Don’t let the roadie write the set list!
    (yes you do need all those guitars on tour…)
  • 15. Freedom and Responsibility
    •  Developers leverage cloud to get freedom
       –  Agility of a single organization, no silos
    •  But now developers are responsible
       –  For compliance, performance, availability etc.
    “As far as my rehab is concerned, it is within my ability to change and change for the better.” – Eddie Van Halen
  • 16. Amazon Cloud Terminology Reference
    See http://aws.amazon.com/
    This is not a full list of Amazon Web Services features
    •  AWS – Amazon Web Services (common name for Amazon cloud)
    •  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
    •  EC2 – Elastic Compute Cloud
       –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
       –  Instance – a running computer system. Ephemeral: when it is de-allocated, nothing is kept.
       –  Reserved Instances – pre-paid to reduce cost for long term usage
       –  Availability Zone – datacenter with own power and cooling hosting cloud instances
       –  Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
    •  ASG – Auto Scaling Group (instances booting from the same AMI)
    •  S3 – Simple Storage Service (http access)
    •  EBS – Elastic Block Store (network disk that can be mounted on an instance)
    •  RDS – Relational Database Service (managed MySQL master and slaves)
    •  DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
    •  SQS – Simple Queue Service (http based message queue)
    •  SNS – Simple Notification Service (http and email based topics and messages)
    •  EMR – Elastic MapReduce (automatically managed Hadoop cluster)
    •  ELB – Elastic Load Balancer
    •  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
    •  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
    •  DirectConnect – secure pipe from AWS VPC to external datacenter
    •  IAM – Identity and Access Management (fine grain role based security keys)
  • 17. What Runs in the Cloud?
    Step by Step Netflix Product Transition
  • 22. Streaming Device API
    [Chart: Netflix Ready Devices, May 2008 to May 2010]
  • 23. Current Architectural Patterns for Availability
    •  Isolated Services
       –  Resilient Business logic
    •  Three Balanced Availability Zones
       –  Resilient to Infrastructure outage
    •  Triple Replicated Persistence
       –  Durable distributed Storage
    •  Isolated Regions
       –  US and EU don’t take each other down
  • 24. Isolated Services
    Test With Chaos Monkey, Latency Monkey
  • 25. Three Balanced Availability Zones
    Test with Chaos Gorilla
    [Diagram: Load Balancers in front of Zone A, Zone B and Zone C; each zone holds Cassandra and EVCache replicas]
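The three balanced zones, together with the "continue to run on 2 out of 3 zones" mitigation, come down to simple arithmetic. The Python sketch below illustrates that arithmetic under stated assumptions and is not Netflix code: the zone names and the round-robin placement helper are hypothetical. It spreads instances evenly across three zones, reports how much capacity survives a whole-zone outage, and derives the highest steady-state utilization that still leaves room to absorb that loss.

```python
from collections import Counter

# Hypothetical zone names; real names depend on the AWS region.
ZONES = ["zone-a", "zone-b", "zone-c"]


def place_instances(count, zones=ZONES):
    """Round-robin placement: zone sizes never differ by more than one."""
    return [zones[i % len(zones)] for i in range(count)]


def capacity_after_zone_loss(placement, failed_zone):
    """Fraction of instances still running when one whole zone is lost."""
    if not placement:
        return 0.0
    lost = Counter(placement)[failed_zone]
    return (len(placement) - lost) / len(placement)


def max_safe_utilization(zone_count):
    """Highest steady-state utilization that still absorbs a one-zone loss.

    With n balanced zones, losing one moves its load onto the remaining n - 1,
    so each instance must normally run at no more than (n - 1) / n.
    """
    return (zone_count - 1) / zone_count


if __name__ == "__main__":
    placement = place_instances(9)
    print(dict(Counter(placement)))                       # {'zone-a': 3, 'zone-b': 3, 'zone-c': 3}
    print(capacity_after_zone_loss(placement, "zone-b"))  # 0.6666666666666666
    print(max_safe_utilization(3))                        # 0.6666666666666666
```

With three zones, each zone therefore needs roughly 50% headroom over its normal share, which is the price of surviving a zone outage without customer impact.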
  • 26. Triple Replicated Persistence
    Cassandra maintenance drops individual replicas
    [Diagram: Load Balancers in front of Zone A, Zone B and Zone C; each zone holds Cassandra and EVCache replicas]
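Triple replication is what allows Cassandra maintenance to take a single replica offline without failing reads or writes. A minimal sketch of the standard Dynamo-style quorum arithmetic follows. The slide does not say which consistency levels Netflix uses, so the read and write levels here are assumptions for illustration. With N = 3 replicas a quorum is 2, and because R + W > N, every quorum read overlaps the most recent quorum write even while one replica is down.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(n, r, w):
    """True when every read set must share a replica with every write set (R + W > N)."""
    return r + w > n


def replicas_that_can_fail(n, required_acks):
    """How many replicas may be unavailable while an operation still succeeds."""
    return n - required_acks


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    print(q)                             # 2
    print(read_overlaps_write(n, q, q))  # True: quorum reads see quorum writes
    print(read_overlaps_write(n, 1, 1))  # False: reading and writing at ONE can return stale data
    print(replicas_that_can_fail(n, q))  # 1: one replica may be down for maintenance
```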
  • 27. Isolated Regions
    [Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of its own Zone A, Zone B and Zone C; every zone holds Cassandra replicas]
  • 28. Failure Modes and Effects
    Failure Mode           Probability   Mitigation Plan
    Application Failure    High          Automatic degraded response
    AWS Region Failure     Low           Wait for region to recover
    AWS Zone Failure       Medium        Continue to run on 2 out of 3 zones
    Datacenter Failure     Medium        Migrate more functions to cloud
    Data store failure     Low           Restore from S3 backups
    S3 failure             Low           Restore from remote archive
  • 29. Observed Regional Failures
    •  Power Outages
       –  Platform survives any one zone outage
       –  Two recent zone outages, one OK, one triggered a bug
    •  Router Bug Takes Region Offline
       –  A few minutes of no network traffic, then recovered
       –  AWS has redesigned routes to be per zone
    •  Control Plane Overload Affects Entire Region
       –  Consequence of other outages
       –  We lose control of our infrastructure
  • 30. Netflix Deployed on AWS
    Content (2009): Content Management, EC2 Encoding, S3 Petabytes
    Logs (2009): S3 Terabytes, EMR, Hive & Pig, Business Intelligence
    Play (2010): DRM, CDN routing, Bookmarks, Logging
    WWW (2010): Sign-Up, Search, Movie Choosing, Ratings
    API (2010): Metadata, Device Config, TV Movie Choosing, Social Facebook
    CS (2011): International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
    CDNs, ISPs, Terabits, Customers
  • 31. Cloud Architecture Patterns
    Where do we start?
  • 32. Datacenter to Cloud Transition Goals
    •  Faster
        –  Lower latency than the equivalent datacenter web pages and API calls
        –  Measured as mean and 99th percentile
        –  For both first hit (e.g. home page) and in-session hits for the same user
    •  Scalable
        –  Avoid needing any more datacenter capacity as subscriber count increases
        –  No central vertically scaled databases
        –  Leverage AWS elastic capacity effectively
    •  Available
        –  Substantially higher robustness and availability than datacenter services
        –  Leverage multiple AWS availability zones
        –  No scheduled downtime, no central database schema to change
    •  Productive
        –  Optimize agility of a large development team with automation and tools
        –  Leave behind a complex, tangled datacenter code base (~8-year-old architecture)
        –  Enforce clean layered interfaces and re-usable components
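Averages hide tail latency, which is why the "Faster" goal is measured as both mean and 99th percentile. A small sketch computing both, using the nearest-rank percentile definition (one of several common definitions); the sample data is made up for illustration.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Mean and 99th percentile latency for one kind of request."""
    return {"mean": mean(latencies_ms), "p99": percentile(latencies_ms, 99)}


if __name__ == "__main__":
    # Made-up sample: 98 fast responses and two slow outliers.
    sample = [10] * 98 + [200, 500]
    print(summarize(sample))  # mean of 16.8 ms looks fine; p99 of 200 ms exposes the tail
```

Computing the pair separately for first hits and in-session hits, as the slide suggests, just means calling `summarize` on each set of samples.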
  • 33. Netflix Datacenter vs. Cloud Arch
    Datacenter | Cloud
    Central SQL Database | Distributed Key/Value NoSQL
    Sticky In-Memory Session | Shared Memcached Session
    Chatty Protocols | Latency Tolerant Protocols
    Tangled Service Interfaces | Layered Service Interfaces
    Instrumented Code | Instrumented Service Patterns
    Fat Complex Objects | Lightweight Serializable Objects
    Components as Jar Files | Components as Services
  • 35. Chaos Monkey
    •  Computers (Datacenter or AWS) randomly die
        –  Fact of life, but too infrequent to test resiliency
    •  Test to make sure systems are resilient
        –  Allow any instance to fail without customer impact
    •  Chaos Monkey hours
        –  Monday-Friday 9am-3pm random instance kill
    •  Application configuration option
        –  Apps now have to opt out from Chaos Monkey
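The Chaos Monkey rules on this slide (weekday working hours, a random instance, a per-application opt-out) can be sketched as selection logic. This is a hypothetical illustration, not the released Simian Army code: the group names and the `groups` data layout are assumptions, and actually terminating the chosen instance through the AWS API is left out.

```python
import random
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical layout: group name -> {"opt_out": bool, "instances": [instance ids]}
Groups = Dict[str, dict]


def in_chaos_window(now: datetime, start_hour: int = 9, end_hour: int = 15) -> bool:
    """True Monday-Friday from start_hour (inclusive) to end_hour (exclusive)."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victim(groups: Groups, now: datetime,
                rng: Optional[random.Random] = None) -> Optional[Tuple[str, str]]:
    """Choose one random instance from a random group that has not opted out.

    Returns (group, instance_id), or None outside chaos hours or if nothing is eligible.
    """
    if not in_chaos_window(now):
        return None
    rng = rng if rng is not None else random.Random()
    eligible = [name for name, g in groups.items() if not g["opt_out"] and g["instances"]]
    if not eligible:
        return None
    group = rng.choice(eligible)
    return group, rng.choice(groups[group]["instances"])


if __name__ == "__main__":
    groups = {
        "api": {"opt_out": False, "instances": ["i-1", "i-2", "i-3"]},
        "billing": {"opt_out": True, "instances": ["i-9"]},  # opted out
    }
    print(pick_victim(groups, datetime(2012, 9, 26, 10, 30)))  # a random "api" instance
    print(pick_victim(groups, datetime(2012, 9, 29, 10, 30)))  # None: Saturday
```

Restricting kills to working hours means the owning engineers are around to notice and fix whatever breaks.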
  • 36. Responsibility and Experience
    •  Make developers responsible for failures
        –  Then they learn and write code that doesn’t fail
    •  Use Incident Reviews to find gaps to fix
        –  Make sure it’s not about finding “who to blame”
    •  Keep timeouts short, fail fast
        –  Don’t let cascading timeouts stack up
    •  Make configuration options dynamic
        –  You don’t want to push code to tweak an option
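A dynamic configuration option is read at call time rather than fixed at deploy time, so an operator can, for example, shorten a timeout without pushing code. A minimal sketch, assuming an in-memory `ConfigSource` standing in for a remote property store; `ConfigSource`, `DynamicProperty`, and the key name are hypothetical and are not the API of Archaius or any other library.

```python
import threading
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class ConfigSource:
    """Thread-safe in-memory stand-in for a remote property store (hypothetical)."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._values.get(key)

    def set(self, key: str, value: str) -> None:
        with self._lock:
            self._values[key] = value


class DynamicProperty(Generic[T]):
    """Looks up the current value on every get(), so a change applies without a code push."""

    def __init__(self, source: ConfigSource, key: str, default: T,
                 parse: Callable[[str], T]) -> None:
        self._source = source
        self._key = key
        self._default = default
        self._parse = parse

    def get(self) -> T:
        raw = self._source.get(self._key)
        if raw is None:
            return self._default
        try:
            return self._parse(raw)
        except ValueError:
            # A malformed value falls back to the default instead of breaking callers.
            return self._default


if __name__ == "__main__":
    config = ConfigSource()
    # Hypothetical key: a short client timeout that operators can tune at runtime.
    timeout_ms = DynamicProperty(config, "api.client.timeoutMs", default=250, parse=int)
    print(timeout_ms.get())  # 250 (default)
    config.set("api.client.timeoutMs", "100")
    print(timeout_ms.get())  # 100, picked up without a redeploy
    config.set("api.client.timeoutMs", "oops")
    print(timeout_ms.get())  # 250, invalid value ignored
```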
  • 37. Resilient Design – Circuit Breakers
    http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
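The linked post describes the approach in depth; the core state machine of a circuit breaker can be sketched as below. This is a generic consecutive-failure breaker with hypothetical names and thresholds, not Netflix's implementation. The optional fallback plays the role of an automatic degraded response, and the injectable clock only exists to make the example deterministic.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open and no fallback was given."""


class CircuitBreaker:
    """Generic consecutive-failure circuit breaker (a sketch, not Netflix's implementation).

    closed:    calls pass through; consecutive failures are counted.
    open:      calls fail fast (or use the fallback) until reset_timeout_s has elapsed.
    half-open: the next call is a trial; success closes the circuit, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if fallback is not None:
                return fallback()  # degraded response without touching the failing dependency
            raise CircuitOpenError("circuit open")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            if fallback is not None:
                return fallback()
            raise
        # Success (including a half-open trial) resets the breaker.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open and restart the reset timer


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])

    def flaky() -> str:
        raise OSError("dependency down")

    for _ in range(2):
        print(breaker.call(flaky, fallback=lambda: "cached default"))
    print(breaker.state)                         # open
    now[0] = 10.0
    print(breaker.state)                         # half-open
    print(breaker.call(lambda: "fresh result"))  # fresh result
    print(breaker.state)                         # closed
```

Pairing the breaker with short timeouts keeps a slow dependency from tying up callers while failures accumulate.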
  • 38. Distributed Operational Model
    •  Developers
        –  Provision and run their own code in production
        –  Take turns to be on call if it breaks (PagerDuty)
        –  Configure autoscalers to handle capacity needs
    •  DevOps and PaaS (aka NoOps)
        –  DevOps is used to build and run the PaaS
        –  PaaS constrains Dev to use automation instead
        –  PaaS puts more responsibility on Dev, with tools
  • 39. What’s Left for Corp IT?
    •  Corporate Security and Network Management
        –  Billing and remnants of streaming service back-ends in DC
    •  Running Netflix’s DVD Business
        –  Tens of Oracle instances
        –  Hundreds of MySQL instances
        –  Thousands of VMware VMs
        –  Zabbix, Cacti, Sumo Logic, Puppet, Chef
    •  Employee Productivity
        –  Building networks and WiFi
        –  SaaS OneLogin SSO Portal
        –  Evernote Premium, Safari Online Bookshelf, Dropbox for Teams
        –  Google Enterprise Apps, Workday HCM/Expense, Box.com
        –  Many more SaaS migrations coming…
    [Chart: Corp WiFi Performance]
  • 40. Netflix Organization
    DevOps Org Reporting into Product Group, not ITops
    Netflix Cloud Platform Team
    •  Teams: Cloud Ops Reliability Engineering; Build Tools and Automation; Platform and Persistence Engineering; Cloud Performance; Cloud Solutions; Architecture
    •  Areas shown: Perforce, Jenkins, Platform jars, Cassandra, Future planning, Artifactory, JIRA, Benchmarking, Monitoring, Alert Routing, Key store, Security Arch, Monkeys, Incident Lifecycle, Base AMI, Bakery, ZooKeeper, JVM GC Tuning, Efficiency, Netflix App Console, Wiresharking, Entrypoints, AWS VPC, PagerDuty, Hyperguard, AWS API, PowerPoint ☺
    •  Running on: AWS Instances
  • 41. Netflix Open Source Strategy
    •  Steadily release PaaS components git-by-git
    •  Source at github.com/netflix – we build from it…
    •  Intros and techniques at techblog.netflix.com
  • 42. Give back to the Apache-licensed OSS community
  • 43. Lead the Best Practices
  • 44. Motivate, retain, hire top engineers
  • 47. Clean Code is Re-usable
    •  Used by other teams and projects inside Netflix
  • 50. Simian Army (Chaos Monkey)
    http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
  • 52. Astyanax, Priam, Curator, Exhibitor
  • 54. Instance Creation
    •  Image baked: Bakery & Build tools, Base AMI, Application Code
    •  ASG / Instance started: Asgard, Odin
    •  Instance Running: Autoscaling scripts
  • 55. Runtime
    Application initializing, registering, configuration
    Components: Governator, Eureka, Async logging, Archaius, Entrypoints, Servo
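Registering at runtime lets other services discover an instance, and a registry has to notice when an instance dies without saying goodbye. The sketch below shows the general lease-based idea: instances register, renew with heartbeats, and drop out of lookups when renewals stop. `ServiceRegistry` and its methods are hypothetical, not Eureka's API, and the 90-second lease is only an illustrative default.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """In-memory, lease-based service registry (a sketch, not Eureka's API).

    Instances register, then renew their lease with periodic heartbeats. An
    instance whose lease has not been renewed within lease_duration_s no longer
    appears in lookups, so clients stop routing to instances that died silently.
    """

    def __init__(self, lease_duration_s: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_duration_s = lease_duration_s
        self._clock = clock
        self._last_renewal: Dict[Tuple[str, str], float] = {}  # (app, instance_id) -> time

    def register(self, app: str, instance_id: str) -> None:
        self._last_renewal[(app, instance_id)] = self._clock()

    def renew(self, app: str, instance_id: str) -> bool:
        """Heartbeat; returns False if the instance is unknown and should re-register."""
        key = (app, instance_id)
        if key not in self._last_renewal:
            return False
        self._last_renewal[key] = self._clock()
        return True

    def cancel(self, app: str, instance_id: str) -> None:
        """Clean deregistration, e.g. during shutdown."""
        self._last_renewal.pop((app, instance_id), None)

    def instances(self, app: str) -> List[str]:
        """Instance ids of app whose lease is still current, sorted for stable output."""
        now = self._clock()
        return sorted(
            instance_id
            for (name, instance_id), renewed in self._last_renewal.items()
            if name == app and now - renewed < self.lease_duration_s
        )


if __name__ == "__main__":
    now = [0.0]  # fake clock for a deterministic example
    registry = ServiceRegistry(lease_duration_s=90.0, clock=lambda: now[0])
    registry.register("api", "i-1")
    registry.register("api", "i-2")
    now[0] = 60.0
    registry.renew("api", "i-1")  # only i-1 keeps heartbeating
    now[0] = 100.0
    print(registry.instances("api"))  # ['i-1']: i-2's lease expired
```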
  • 56. Runtime, Cont’d
    •  Calling other services: Astyanax, Curator, NIWS LB, REST client, Dependency Command
    •  Managing service: Priam, Exhibitor, Cass JMeter, Explorers
    •  Resiliency aids: Chaos Monkey, Latency Monkey, Janitor Monkey
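When calling other services, a client-side load balancer such as NIWS LB picks a server for each request from a list of known instances. A minimal round-robin sketch that skips servers marked down; the class, its methods, and the server addresses are hypothetical, and a real client would refresh its server list from discovery and track health on its own.

```python
import itertools
import threading
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Client-side round-robin load balancer that skips servers marked down (hypothetical)."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        self._down: Set[str] = set()
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def mark_down(self, server: str) -> None:
        with self._lock:
            self._down.add(server)

    def mark_up(self, server: str) -> None:
        with self._lock:
            self._down.discard(server)

    def choose(self) -> str:
        """Next server in rotation, trying each server at most once per call."""
        with self._lock:
            n = len(self._servers)
            for _ in range(n):
                server = self._servers[next(self._counter) % n]
                if server not in self._down:
                    return server
        raise RuntimeError("no servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.1:7001", "10.0.0.2:7001", "10.0.0.3:7001"])
    print([lb.choose() for _ in range(3)])  # each server once, in order
    lb.mark_down("10.0.0.2:7001")           # e.g. after repeated connection failures
    print([lb.choose() for _ in range(4)])  # rotation skips the down server
```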
  • 57. Open Source Projects
    Legend: GitHub / Techblog, Apache Contributions, Techblog Post, Coming Soon
    •  Priam: Cassandra as a Service
    •  Exhibitor: ZooKeeper as a Service
    •  Servo and Autoscaling Scripts
    •  Astyanax: Cassandra client for Java
    •  Curator: ZooKeeper Patterns
    •  Honu: Log4j streaming to Hadoop
    •  CassJMeter: Cassandra test suite
    •  EVCache: Memcached as a Service
    •  Circuit Breaker: Robust service pattern
    •  Cassandra Multi-region EC2 datastore support
    •  Eureka / Discovery: Service Directory
    •  Asgard: AutoScaleGroup based AWS console
    •  Aegisthus: Hadoop ETL for Cassandra
    •  Archaius: Dynamic Properties Service
    •  Chaos Monkey: Robustness verification
    •  Explorers
    •  EntryPoints
    •  Latency Monkey: Server-side latency/error injection
    •  Governator: Library lifecycle and dependency injection
    •  Janitor Monkey
    •  Odin: Workflow orchestration
    •  REST Client + mid-tier LB
    •  Bakeries and AMI
    •  Async logging
    •  Configuration REST endpoints
    •  Build dynaslaves
  • 59. Roadmap for 2012
    •  More resiliency and improved availability
    •  More automation, orchestration
    •  “Hardening” the platform, code clean-up
    •  Lower latency for web services and devices
    •  IPv6 – now running in prod, rollout in progress
    •  More open sourced components
    •  See you at AWS re:Invent in November…
  • 60. Takeaway
    Netflix has built and deployed a scalable global Platform as a Service.
    Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
    http://github.com/Netflix
    http://techblog.netflix.com
    http://slideshare.net/Netflix
    http://www.linkedin.com/in/adriancockcroft
    http://www.linkedin.com/in/ruslanmeshenberg
    @adrianco @rusmeshenberg #netflixcloud