Lorenzo Alberton
                      @lorenzoalberton


Scaling Teams,
Processes and
Architectures
               Managing growth




            London Scalability Group,
Innovation Warehouse, 16th April 2012
                                         1
Scalability Is About...

                  People




     Processes             Technology

                                        2
People
Staffing, Roles, Management, Teams




                                    3
Staffing

  Never compromise.
  Only hire people smarter than you.

  Hire people who can fit the company culture.
  Promote fun in your working environment.

  Beware of toxic people.

          http://www.earthrangers.com/content/wildwire/toxic_spill.jpg           4
Team Size and Structure
  Too small: micromanaging managers, overworked team members, can’t accomplish much
  Too big:   poor communication, low morale, low productivity

                                  CTO
  functional
    matrix
                PM        PM            PM            PM

   Proj 1       PM        Designer      Developer     Tester
   Proj 2       PM        Designer      Developer     Tester
   Proj 3       PM        Designer      Developer     Tester
   Proj 4       PM        Designer      Developer     Tester
                          Designers     Developers    Testers
                                                                     5
Processes




            6
Why are processes critical?
 Improve management of teams and employees
 Standardise actions in repetitive tasks
 Reduce mundane decisions to focus on grander ideas
 Allow the team to react quickly to crisis
 Determine system capacity and scalability needs

                           Challenge




    right amount           right process           right time
                                                                7
Determining Headroom

  [Diagram: Capacity vs. Current Load]

  Why?
  - Planning the annual budget
  - Hiring plan
  - Prioritisation
                                           8
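
The gap between capacity and current load can be estimated with simple arithmetic. The sketch below is one possible formulation, assuming headroom is the usable share of maximum capacity minus current load minus projected growth. The function name, the 50% ideal-usage factor and the sample numbers are illustrative assumptions, not figures from the talk.

```python
def headroom(max_capacity, current_load, ideal_usage=0.5, projected_growth=0.0):
    """Return the spare capacity left after current load and expected growth.

    ideal_usage is the fraction of maximum capacity we are willing to use
    (an assumed safety margin); all quantities share one unit, e.g. req/s.
    """
    usable = max_capacity * ideal_usage
    return usable - current_load - projected_growth


# Hypothetical numbers: 10,000 req/s maximum, 3,000 req/s today,
# 1,000 req/s of growth expected over the planning period.
spare = headroom(10_000, 3_000, ideal_usage=0.5, projected_growth=1_000)
print(f"headroom: {spare:.0f} req/s")
```

A negative result would suggest the system is already past its comfortable limit, which is the kind of signal that feeds budget, hiring and prioritisation decisions.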
Controlling Change: Determine Risk




          http://dilbert.com/strips/comic/2008-05-08/   9
Risk Management
            Risk is cumulative




      Determine limits and tolerance
                                       10
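
Because risk is cumulative, several individually safe changes can add up to an unsafe release window. A minimal sketch, assuming each change gets a numeric risk score and the team agrees on a per-day limit; the scores, the limit and the change list are all hypothetical.

```python
from collections import defaultdict

# Hypothetical changes scheduled for release: (day, description, risk score).
changes = [
    ("Mon", "schema migration", 5),
    ("Mon", "new cache layer", 4),
    ("Mon", "copy tweak", 1),
    ("Tue", "config change", 2),
]

DAILY_RISK_LIMIT = 8  # assumed tolerance; each team picks its own


def risk_per_day(changes):
    """Sum the risk scores of all changes scheduled on the same day."""
    totals = defaultdict(int)
    for day, _description, risk in changes:
        totals[day] += risk
    return dict(totals)


def days_over_limit(changes, limit):
    """Return the days whose cumulative risk exceeds the agreed limit."""
    return [day for day, total in risk_per_day(changes).items() if total > limit]


print(risk_per_day(changes))
print(days_over_limit(changes, DAILY_RISK_LIMIT))
```

Here Monday's three changes total 10, above the assumed limit of 8, so one of them would be moved to another day.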
Load / Stress Testing
                 Load testing
                 - identify, document and eliminate
                   bottlenecks through a strictly controlled
                   process of measurement and analysis
                 - measure system’s response and stability
                 - verify the app can meet the desired
                   performance objectives (SLA)

                 Stress testing
                 - determine the app’s stability when
                   subjected to above-normal loads
                 - verify the app’s behaviour when close
                   to the breaking point
                 - test the application’s recoverability
                   (negative testing)

                                                             11
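
A load test along these lines can be sketched with the standard library alone. This is an illustration, not a replacement for a real load-testing tool: `fake_request` stands in for a call to the system under test, and the 95th-percentile objective of 0.5 s is an assumed SLA.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func):
    """Run func once and return how long it took, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def load_test(func, requests=50, concurrency=5):
    """Call func `requests` times from `concurrency` threads; return latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(func), range(requests)))


def fake_request():
    """Stand-in for a real request to the system under test (assumption)."""
    time.sleep(0.001)


SLA_P95_SECONDS = 0.5  # hypothetical performance objective

latencies = load_test(fake_request)
p95 = percentile(latencies, 95)
print(f"p95 = {p95 * 1000:.1f} ms, within SLA: {p95 <= SLA_P95_SECONDS}")
```

Raising `requests` and `concurrency` well beyond normal levels, and watching where latencies or errors climb, turns the same harness into a rough stress test.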
Barrier Conditions


              Code reviews
              Manual and automated QA processes
              Performance and stress testing
              Release documentation checks (runbook)
              Dev, Test, Stage and Live environments
              Instrumentation checks



      Protection from significant failures
                                                       12
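
Barrier conditions can be treated as a gate that a release must pass in full. A minimal sketch, assuming each condition is reduced to a pass/fail flag; the check names and their values are hypothetical.

```python
# Hypothetical outcome of each barrier condition for one release candidate.
barrier_checks = {
    "code review approved": True,
    "automated QA passed": True,
    "performance and stress tests passed": True,
    "runbook updated": False,
    "instrumentation in place": True,
}


def failed_barriers(checks):
    """Return the names of the barrier conditions that have not been met."""
    return [name for name, passed in checks.items() if not passed]


def can_release(checks):
    """A release may proceed only when every barrier condition is met."""
    return not failed_barriers(checks)


print(can_release(barrier_checks))
print(failed_barriers(barrier_checks))
```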
Technology
Architecting Scalable Solutions




                                  13
Architectural Principles

  - N + 1 design
  - Design for rollback
  - Design to be disabled
  - Design to be monitored
  - Design for multiple live sites
  - Use mature technology
  - Asynchronous design
  - Stateless systems
  - Buy when non core
                                                14
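
As an illustration of one of these principles, designing a feature to be disabled, the sketch below guards a non-essential feature behind a switch so the page still renders when it is turned off. The flag store, the feature and the function names are hypothetical.

```python
# Hypothetical in-memory switchboard; a real system might read these flags
# from a config service so they can be flipped without a deploy.
feature_enabled = {"recommendations": True}


def compute_recommendations(user_id):
    """Stand-in for an expensive, non-essential feature (assumption)."""
    return [f"item-for-{user_id}"]


def render_home_page(user_id):
    """Build the page, skipping the optional feature when it is switched off."""
    page = {"user": user_id, "recommendations": []}
    if feature_enabled.get("recommendations", False):
        page["recommendations"] = compute_recommendations(user_id)
    return page


print(render_home_page("alice"))        # feature on
feature_enabled["recommendations"] = False
print(render_home_page("alice"))        # feature off, page still renders
```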
Stateless, Asynchronous Systems




    http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg   15
Fault Isolative Structures

  Increase availability
  Limit impact of failures
  Easier debugging

  First:
  - Functions causing repetitive problems
  - Natural layout or topology of the site
                                                      16
Caching for Performance and Scale

  Object Caches
  - Usually serialized (marshalling / unmarshalling)
  - get() / set() / replace()
  - APC, Memcached

  Application Caches
  - Proxy caches
  - Reverse proxy caches
  - HTTP headers
  - ISP/Uni proxies, Squid, Varnish, mod_cache

  CDNs
  - Multiple locations / backbones
  - CNAME entries
  - Akamai, Coral, Limelight...
                                                                17
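
The object-cache column can be illustrated with a cache-aside read. This is a sketch under stated assumptions: `ObjectCache` is a tiny in-process stand-in rather than the Memcached or APC API, JSON plays the role of the marshalling step, and `load_user_from_db` is a hypothetical slow lookup.

```python
import json


class ObjectCache:
    """Tiny in-process stand-in for an object cache such as Memcached.

    Values are stored serialized, mirroring the marshalling step that
    out-of-process caches need. This class is illustrative only.
    """

    def __init__(self):
        self._store = {}

    def get(self, key):
        raw = self._store.get(key)
        return None if raw is None else json.loads(raw)

    def set(self, key, value):
        self._store[key] = json.dumps(value)

    def replace(self, key, value):
        """Overwrite the value only if the key is already cached."""
        if key not in self._store:
            return False
        self.set(key, value)
        return True


db_reads = 0


def load_user_from_db(user_id):
    """Hypothetical slow lookup that the cache is protecting."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id):
    """Cache-aside read: try the cache, fall back to the database, then store."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = load_user_from_db(user_id)
        cache.set(key, user)
    return user


cache = ObjectCache()
get_user(cache, 42)
get_user(cache, 42)
print(db_reads)  # the second call is served from the cache
```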
Managing “Big Data”

  The more storage... the more storage management:
  - storage costs
  - people and software
  - power and space
  - processing power
  - backup time and costs

  Evaluate data retention policy
  Consider multi-tiered storage
  Distribute data / work (Hadoop, M/R)
                                                18
Monitoring: Measure Everything


                                                      StatsD



 1. Is there a problem?    User experience / Business metrics monitors

 2. Where is the problem? System monitors (threshold - variance)

 3. What is the problem?   Application monitors

                Keep Signal vs. Noise ratio high
                                                                         19
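
Since the slide names StatsD, here is a minimal sketch of emitting metrics in its plain-text line protocol over UDP using only the standard library. The metric names and the mapping of metric types to the three questions are illustrative assumptions; the host is assumed to be a local StatsD daemon, and the packets are simply dropped if none is listening.

```python
import socket


def statsd_packet(name, value, metric_type):
    """Format one metric in the StatsD line protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Fire-and-forget a metric over UDP; 8125 is StatsD's default port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(name, value, metric_type), (host, port))
    finally:
        sock.close()


# Hypothetical metric names, one per monitoring question on the slide.
send_metric("business.signups", 1, "c")          # counter: is there a problem?
send_metric("system.db.connections", 87, "g")    # gauge: where is the problem?
send_metric("app.search.latency", 320, "ms")     # timer: what is the problem?
```

Because UDP is fire-and-forget, instrumenting code this way adds very little overhead and cannot block the application if the collector is down.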
DataSift Architecture
         Some Architecture Pr0n




                                  20
DataSift Architecture


                               SOA - loosely coupled,
                               independently scalable
                                services. Simple APIs




                                                      example


   http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html   21
SOA - Scale Each Component




                             22
Our Stack
Languages: C++, PHP, Java, Scala, Ruby, Node.js
Storage: MySQL, HBase
Cache: Memcached, APC, Redis
Queues: ZeroMQ, Kafka, Redis
Development/Deployment: Git, Jenkins CI, RPM, Chef
Monitoring: StatsD + Graphite, Zenoss



Secret recipe: amazing people and working environment


                                                        23
Messaging
ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast)

       Internal communication: pass messages to the next processing
       stage, control events, monitoring


Kafka/Redis: PUSH-PULL with persistence

       Internal message / workload buffering and distribution

Node.js: WebSockets / HTTP Streaming

       Message delivery (output)

                                                                      24
0mq PUSH-PULL (workload distribution)

                                  Consumer 1




                                  Consumer 2




                                  Consumer 3


              [Round-Robin-ish]

                                           25
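
The workload-distribution idea can be simulated in plain Python. This sketch does not use the ZeroMQ API; it only shows the round-robin hand-off between one producer and several consumers. Real PUSH sockets also take connection state and queue limits into account, which may be why the slide says "Round-Robin-ish". The job names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def distribute(messages, consumers):
    """Hand each message to the next consumer in turn (round-robin)."""
    assignments = defaultdict(list)
    for message, consumer in zip(messages, cycle(consumers)):
        assignments[consumer].append(message)
    return dict(assignments)


work = [f"job-{n}" for n in range(7)]
result = distribute(work, ["Consumer 1", "Consumer 2", "Consumer 3"])
for consumer, jobs in result.items():
    print(consumer, jobs)
```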
0mq PUB-SUB (High Availability)

                                      Listener 1


Publisher 1

                                     Listener 2


Publisher 2
                                      Listener 3



              [Broadcast]   [Dynamic Subscriptions]

                                                   26
0mq PUB-SUB (High Availability)


                              DC 1
Publisher 1




Publisher 2

                              DC 2




                                     27
Internal “Firehose”

  [Diagram: publishers X, Y and Z feed a central Data Bus. Subscribers
  (Alice’s timeline, John’s Inbox, System Monitor, Fred’s Followers,
  Tech Blog Feed) subscribe to the topics they need, e.g. "subscribe to
  topic X", "subscribe to topic Y".]
                                                             28
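
The topic-based routing in this picture can be sketched as an in-process publish/subscribe bus. This is an illustration only, assuming exact-match topic names; it is not the DataSift implementation nor the ZeroMQ API, and the class and variable names are hypothetical.

```python
from collections import defaultdict


class DataBus:
    """In-process stand-in for the data bus: routes messages by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all current subscribers of topic."""
        for callback in self._subscribers.get(topic, []):
            callback(message)


alice_timeline, system_monitor = [], []

bus = DataBus()
bus.subscribe("X", alice_timeline.append)
bus.subscribe("X", system_monitor.append)
bus.subscribe("Y", system_monitor.append)

bus.publish("X", "update from X")
bus.publish("Y", "update from Y")
bus.publish("Z", "nobody is listening")  # dropped: no subscribers for Z

print(alice_timeline)
print(system_monitor)
```

Publishers never need to know who is listening, so new consumers can be added by subscribing to a topic without touching the producers.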
Instrumentation




     https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd   29
We’re Hiring!




http://datasift.com/whoweare/jobs
                                30
References
                       M. L. Abbott, M. T. Fisher,
                       “The Art Of Scalability”,
                       Addison Wesley
                       http://theartofscalability.com/




http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
http://www.slideshare.net/postwait/scalable-internet-architecture
http://bit.ly/IJKwuc
http://agile.dzone.com/news/approaches-organizational
https://bitly.com/vCSd49


                                                               31
Lorenzo Alberton
                  @lorenzoalberton




   Thank you!
       lorenzo@alberton.info
http://www.alberton.info/talks


                   Questions?


                                     32

Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 

Dernier (20)

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 

Scaling Teams, Processes and Architectures

  • 1. Scaling Teams, Processes and Architectures: Managing growth. Lorenzo Alberton (@lorenzoalberton). London Scalability Group, Innovation Warehouse, 16th April 2012.
  • 2. Scalability Is About... People, Processes, Technology.
  • 4. Staffing: Never compromise. Only hire people smarter than you. (Image: http://www.earthrangers.com/content/wildwire/toxic_spill.jpg)
  • 5. Staffing: Hire people who can fit the company culture. Promote fun in your working environment.
  • 6. Staffing: Beware of toxic people.
  • 7-9. Team Size and Structure. Too small: micromanaging managers, overworked team members, can't accomplish much. Too big: poor communication, low morale, low productivity. Functional structure: CTO; PMs, Designers, Developers, Testers. Matrix structure: Proj 1 to Proj 4, each with a PM, a Designer, a Developer and a Tester.
  • 11-15. Why are processes critical? Improve management of teams and employees; standardise actions in repetitive tasks; reduce mundane decisions to focus on grander ideas; allow the team to react quickly to crisis; determine system capacity and scalability needs. Challenge: right amount, right process, right time.
  • 16-17. Determining Headroom: capacity vs. current load. Why? Capacity planning, annual budget, hiring plan, prioritisation.
  • 18-19. Controlling Change: Determine Risk. (http://dilbert.com/strips/comic/2008-05-08/)
  • 20. Risk Management: Risk is cumulative. Determine limits and tolerance.
  • 21. Load / Stress Testing. Load testing: identify, document and eliminate bottlenecks through a strictly controlled process of measurement and analysis; measure the system's response and stability; verify the app can meet the desired performance objectives (SLA). Stress testing: determine the app's stability when subjected to above-normal loads; verify the app's behaviour when close to the breaking point; test the application's recoverability (negative testing).
  • 22. Barrier Conditions: code reviews; manual and automated QA processes; performance and stress testing; release documentation checks (runbook); Dev, Test, Stage and Live environments; instrumentation checks; protection from significant failures.
  • 25-33. Architectural Principles: N + 1 design; design for rollback; design to be disabled; design to be monitored; design for multiple live sites; use mature technology; asynchronous design; stateless systems; buy when non-core.
  • 34. Stateless, Asynchronous Systems. (Image: http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg)
  • 36-39. Fault Isolative Structures: increase availability; limit impact of failures; easier debugging. First: functions causing repetitive problems; natural layout or topology of the site.
  • 40-43. Caching for Performance and Scale. Object caches: usually serialized (marshalling / unmarshalling); get() / set() / replace(); APC, Memcached. Application caches: proxy caches; reverse proxy caches; HTTP headers; ISP/Uni proxies; Squid, Varnish, mod_cache. CDNs: multiple locations / backbones; CNAME entries; Akamai, Coral, Limelight...
  • 44-46. Managing “Big Data”. The more storage, the more storage management: storage costs; people and software; power and space; processing power; backup time and costs. Evaluate data retention policy. Consider multi-tiered storage. Distribute data/work (Hadoop, M/R).
  • 48-50. Monitoring: Measure Everything (StatsD). 1. Is there a problem? User experience / business metrics monitors. 2. Where is the problem? System monitors (threshold - variance). 3. What is the problem? Application monitors. Keep the signal-to-noise ratio high.
  • 51. DataSift Architecture: Some Architecture Pr0n.
  • 52-54. DataSift Architecture: SOA, with loosely coupled, independently scalable services and simple APIs (example). Source: http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 55. SOA - Scale Each Component.
  • 56-57. Our Stack. Languages: C++, PHP, Java, Scala, Ruby, Node.JS. Storage: MySQL, HBase. Cache: Memcached, APC, Redis. Queues: ZeroMQ, Kafka, Redis. Development/Deployment: GIT, Jenkins CI, RPM, Chef. Monitoring: StatsD + Graphite, Zenoss. Secret recipe: amazing people and working environment.
  • 58. Messaging. ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast); internal communication: pass messages to the next processing stage, control events, monitoring. Kafka/Redis: PUSH-PULL with persistence; internal message / workload buffering and distribution. Node.js: WebSockets / HTTP Streaming; message delivery (output).
  • 59. 0mq PUSH-PULL (workload distribution): Consumer 1, Consumer 2, Consumer 3 [Round-Robin-ish].
  • 60. 0mq PUB-SUB (High Availability): Publisher 1, Publisher 2; Listener 1, Listener 2, Listener 3 [Broadcast] [Dynamic Subscriptions].
  • 61. 0mq PUB-SUB (High Availability): Publisher 1, Publisher 2; DC 1, DC 2.
  • 62. Internal “Firehose” [diagram: publishers and subscribers connected through a Data Bus, subscribing to topics X, Y and Z; labels: Alice's timeline, John's Inbox, System Monitor, Fred's Followers, Tech Blog Feed].
  • 63. Instrumentation. (https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd)
  • 65. References: M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley; http://theartofscalability.com/; http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth; http://www.slideshare.net/postwait/scalable-internet-architecture; http://bit.ly/IJKwuc; http://agile.dzone.com/news/approaches-organizational; https://bitly.com/vCSd49
  • 66. Thank you! Questions? Lorenzo Alberton, @lorenzoalberton, lorenzo@alberton.info, http://www.alberton.info/talks
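
The "Determining Headroom" slides (16-17) compare capacity with current load. As a rough illustration only, the comparison can be sketched in Python. Nothing below comes from the talk: the function names, the `ideal_usage` fraction (the share of maximum capacity you are willing to run at) and the linear-growth assumption are all hypothetical simplifications.

```python
def headroom(max_capacity: float, current_load: float, ideal_usage: float = 0.5) -> float:
    """Capacity left before the system exceeds its ideal usage level.

    ideal_usage is an assumed safety factor in (0, 1], not a value from the slides.
    """
    if not 0 < ideal_usage <= 1:
        raise ValueError("ideal_usage must be in (0, 1]")
    return max_capacity * ideal_usage - current_load


def months_of_headroom(max_capacity: float, current_load: float,
                       monthly_growth: float, ideal_usage: float = 0.5) -> float:
    """Months until headroom runs out, assuming load grows linearly (an assumption)."""
    remaining = headroom(max_capacity, current_load, ideal_usage)
    if remaining <= 0:
        return 0.0  # already at or past the ideal usage level
    if monthly_growth <= 0:
        return float("inf")  # load is flat or shrinking, so headroom never runs out
    return remaining / monthly_growth


if __name__ == "__main__":
    # Example figures are made up: 10,000 req/s capacity, 3,000 req/s current load.
    print(headroom(10_000, 3_000))                 # requests/s of headroom
    print(months_of_headroom(10_000, 3_000, 500))  # months at +500 req/s per month
```

A figure like this could feed the uses listed on the slide (capacity planning, annual budget, hiring plan, prioritisation), though real growth is rarely linear and planned optimisation work would change the estimate.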

Editor's notes

  1.
  2. Let’s start by focusing on the true foundation: people and process, without which true scalability cannot be built. People are the most important element of scalability, as without people there are no processes and no technology.
  3.
  4-11. Leadership is the influencing of an organisation to accomplish a specific objective (down to personal characteristics + skills + experience + actions). Look for solid software engineers with a good understanding of CS topics, and exceptional devops. Create a fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke. Focus and dedication.
  12-19. Overlapping responsibilities create wasted effort and value-destroying conflicts. Key scale-related responsibilities for any organisation include: setting measurable goals; staffing the team with the appropriate skills; defining and implementing a scalable architecture.
  20. There’s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person’s primary discipline).\n
  21. There’s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person’s primary discipline).\n
  22. There’s a link between organisational structure and scalability, it has a big impact in personal productivity.\nThe goal is to minimise the friction caused by organisational or team boundaries, without limiting the throughput, and at the same time making innovation and the work flow easy.\nThe team can be organised into 2 structures:- functional (employees divided by their primary function; homogeneity, simplicity of responsibilities, adherence to standards; Drawbacks: no single project owner, poor cross-functional communication).\n- matrix (similar, but with a second dimension that includes a new management structure; better communication, project ownership; Drawbacks: multiple bosses, distraction from a person’s primary discipline).\n
  47. Processes serve three general purposes:
- they augment the management of teams and employees
- they standardise employees’ actions while performing repetitive tasks
- they free employees from mundane daily decisions so they can concentrate on grander ideas.
  55. To navigate your way out of the woods, you need to know the point from which you are starting.
Headroom is the amount of free capacity that exists within your system before you start having problems such as performance degradation or an outage. Because your app is a system made of many different components, such as a database, a firewall and application servers, you first need to understand the headroom of each of them to truly understand the headroom of the whole.
1) Identify the major components. 2) Identify the responsible team. 3) Determine usage and capacity. 4) Determine the growth rate. Work together for a better analysis and to find the best solution.
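The four steps can be turned into a rough per-component calculation. The sketch below is only an illustration under stated assumptions: the `Component` fields, the linear growth model, the 50% `ideal_usage` target and all the numbers are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str              # step 1: a major component
    owner: str             # step 2: the responsible team
    max_capacity: float    # step 3: e.g. requests per second it can sustain
    current_usage: float   # step 3: same unit as max_capacity
    monthly_growth: float  # step 4: expected extra usage per month (assumed linear)


def headroom(c, ideal_usage=0.5, months=12):
    """Free capacity left after `months` of growth, while keeping usage
    below `ideal_usage` (an assumed fraction of maximum capacity)."""
    usable = ideal_usage * c.max_capacity
    projected = c.current_usage + c.monthly_growth * months
    return usable - projected


def months_until_exhausted(c, ideal_usage=0.5):
    """Months before usage reaches the usable capacity."""
    if c.monthly_growth <= 0:
        return float("inf")
    usable = ideal_usage * c.max_capacity
    return max(0.0, (usable - c.current_usage) / c.monthly_growth)


# Hypothetical figures for one component.
db = Component("database", "DBA team", 10_000, 3_000, 200)
print(headroom(db))                # -400.0: short of capacity within a year
print(months_until_exhausted(db))  # 10.0
```

Running the same calculation for every component shows which one runs out of headroom first, which is where the teams should focus together.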
  68. The intent of change management is to limit the impact of changes by controlling their release into the production environment and logging them as they are introduced.
Ways to assess the risk of a change:
- Gut feeling / finger in the air: relies on experience and innate ability; fast, but not accurate.
- Semaphore (traffic light): assign a risk level of green/yellow/red to each small component, then assign an overall colour; methodical, repeatable, documentable, accurate, and no longer reliant on a single person.
- Better: Failure Mode and Effect Analysis.
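A minimal sketch of the traffic-light method, followed by the score commonly used in Failure Mode and Effect Analysis. The aggregation rule (a change is as risky as its riskiest component) and the example components are assumptions made for illustration; the talk does not say how the overall colour is derived. The risk priority number is the usual FMEA product of severity, occurrence and detection ratings.

```python
from enum import IntEnum


class Risk(IntEnum):
    GREEN = 1
    YELLOW = 2
    RED = 3


def overall_risk(component_risks):
    """Overall colour of a change, given a colour per component.

    Assumption: the change is as risky as its riskiest component.
    """
    return max(component_risks.values(), default=Risk.GREEN)


def risk_priority_number(severity, occurrence, detection):
    """FMEA-style score: each factor is rated 1 (best) to 10 (worst)."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return severity * occurrence * detection


# Hypothetical change touching two components.
change = {"schema migration": Risk.YELLOW, "API handler": Risk.GREEN}
print(overall_risk(change).name)       # YELLOW
print(risk_priority_number(7, 3, 4))   # 84
```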
  69. Risk is cumulative. You might want to set limits on the amount of risk you are willing to allow at a particular time of day or level of customer volume.
Also consider the human factor, i.e. how much risk a person can tolerate within a certain time frame.
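One way to picture a cumulative risk limit is a budget per time window that each scheduled change draws from. This is a sketch under assumptions: the `RiskBudget` class, the point values and the limit of 3 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RiskBudget:
    limit: int                                    # assumed maximum risk points per window
    used: int = 0
    accepted: list = field(default_factory=list)

    def try_schedule(self, change, risk_points):
        """Accept the change only if the window stays within its limit."""
        if self.used + risk_points > self.limit:
            return False
        self.used += risk_points
        self.accepted.append(change)
        return True


# Hypothetical peak-traffic window that tolerates very little risk.
peak_hours = RiskBudget(limit=3)
print(peak_hours.try_schedule("config tweak", 1))      # True
print(peak_hours.try_schedule("schema migration", 3))  # False: would exceed the limit
print(peak_hours.try_schedule("CSS fix", 2))           # True
print(peak_hours.used, peak_hours.accepted)            # 3 ['config tweak', 'CSS fix']
```

The same structure could hold a separate, lower limit per person to reflect the human factor.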
  70. The purpose of load testing is to identify, document and eliminate bottlenecks in the system through a strictly controlled process of measurement and analysis. Load testing means putting load or user demand on a system to measure its response and stability, and to verify that the app can meet the desired performance objectives (SLA: service level agreement).
- Establish success criteria (concurrent usage, response time, ...)
- Establish the test environment (as close as possible to the production environment)
- Define the tests (Pareto rule: 20% - 80%) to cover different things (endurance, most used, most visible, different components)
- Identify what needs to be monitored / what data needs to be collected
- Run, analyse, report to engineers
- Repeat tests and analysis
Stress testing is a process used to determine an application’s stability when subjected to above-normal loads, and to verify its behaviour close to the breaking point.
Positive testing progressively increases the load until it overwhelms the system’s resources.
Negative testing takes away resources such as memory, threads and connections to test the application’s recoverability.
To ensure that the headroom calculations remain accurate, run performance tests on all your releases so that you do not introduce unexpected load increases.
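A sketch of how a success criterion and a positive stress test might be expressed. Everything here is an assumption for illustration: the criterion is a 95th-percentile response time, `measure` is a hypothetical callable that runs the test at a given load and returns response times in milliseconds, and `fake_measure` merely simulates a system whose latency grows with load.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_criteria(response_times_ms, max_p95_ms):
    """Success criterion: 95% of requests answer within max_p95_ms."""
    return percentile(response_times_ms, 95) <= max_p95_ms


def find_breaking_point(measure, start, step, max_load, max_p95_ms):
    """Positive stress test: raise the load step by step and return the
    first level that violates the criterion, or None if none does."""
    load = start
    while load <= max_load:
        if not meets_criteria(measure(load), max_p95_ms):
            return load
        load += step
    return None


def fake_measure(load):
    # Simulated system: every request takes 2 ms per concurrent user.
    return [load * 2] * 20


print(percentile(list(range(1, 101)), 95))                    # 95
print(find_breaking_point(fake_measure, 100, 100, 1000, 500))  # 300
```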
  71. Good processes for promoting systems into the production environment can protect you from significant failures. Effective barrier conditions, coupled with a process and the capability to roll back production changes, are necessary components of any highly available service and are critical to the success of your scalability goals.
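The coupling of barrier conditions with rollback can be sketched as a single promotion step. The `promote` function, the barrier names and the health check are hypothetical; a real pipeline would call its own deployment tooling in place of the list operations used here.

```python
def promote(release, barriers, deploy, rollback, is_healthy):
    """Deploy only if every barrier condition passes; roll back if the
    release turns out to be unhealthy in production."""
    failed = [name for name, check in barriers.items() if not check(release)]
    if failed:
        return "blocked by: " + ", ".join(failed)
    deploy(release)
    if not is_healthy():
        rollback(release)
        return "rolled back"
    return "released"


# Hypothetical run: all barriers pass, but production monitoring reports a problem.
live = []
result = promote(
    "v2",
    barriers={"unit tests": lambda r: True, "load test": lambda r: True},
    deploy=live.append,
    rollback=live.remove,
    is_healthy=lambda: False,
)
print(result, live)  # rolled back []
```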
  73. - N+1 design: ensure that everything you develop has at least one additional instance of that system in the event of failure.
- Design for rollback: building the capability to roll back into an app helps limit the scalability impact of any given release.
- Design to be disabled: being able to switch features off lets you keep the most recent release in production while containing the impact of offending features or functionality.
- Design to be monitored: you want your system to identify when it is behaving differently from normal, in addition to telling you when it is not functioning properly.
- Design for multiple live sites: it usually costs less than operating a hot site and a cold disaster recovery site.
- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.
- Asynchronous design: asynchronous systems tend to be more tolerant of extreme load.
- Stateless systems (if necessary, store state with the end users).
- Buy when non-core.
- Scale out, not up (with commodity hardware; split horizontally in terms of data, transactions and customers).
- Design for any technology, not for a specific product or vendor.
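As an illustration of designing features so they can be disabled, the sketch below wraps an optional feature in a runtime switch. The `FeatureFlags` class, the `recommendations` feature and `render_page` are hypothetical names chosen for the example.

```python
class FeatureFlags:
    """In-memory switches; a real system would read them from shared config."""

    def __init__(self, enabled=None):
        self._enabled = dict(enabled or {})

    def disable(self, name):
        self._enabled[name] = False

    def is_enabled(self, name):
        # Unknown features default to off.
        return self._enabled.get(name, False)


def render_page(flags):
    parts = ["core content"]
    if flags.is_enabled("recommendations"):
        parts.append("recommendations")
    return parts


flags = FeatureFlags({"recommendations": True})
print(render_page(flags))         # ['core content', 'recommendations']
flags.disable("recommendations")  # offending feature switched off, release stays live
print(render_page(flags))         # ['core content']
```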
  74. - N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)\n- Designing the capability to roll back into an app helps limit the scalability impact of any given release.\n- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.\n- Design to be monitored: you want your system to identify when it’s performing differently than it normally operates in addition to telling you when it’s not functioning properly.\n- Design for multiple live sites: it usually costs less than the operation of a hot site and a cold disaster recovery site.\n- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.\n- Asynchronous design: asynchronous systems tend to be more fault tolerant to extreme load.\n- Stateless Systems (if necessary, store state with the end users)\n- Buy when non-core\n- Scale out not up (with commodity hardware; horizontal split in terms of data, transactions and customers).\n- Design for any technology, not for a specific product/vendor\n
  91. Synchronous calls, if used excessively or incorrectly, place an undue burden on the system and prevent it from scaling.
Systems designed to interact synchronously have a higher failure rate than asynchronous ones. Their ability to scale is tied to the slowest system in the chain of communications.
Synchronisation is when two or more pieces of work must happen in a specific order to accomplish a task. Asynchronous coordination between the original method and the invoked method requires a mechanism by which the original method determines when, or if, the called method has completed executing (callbacks). Use timeouts as well, so that callers can recover gracefully should they not receive a response in a timely fashion.
A related problem is stateful versus stateless applications. An application that uses state relies on the current condition of execution as a determinant of the next action to be performed.
There are 3 basic approaches to solving the complexities of scaling an application that uses session data: 1) Avoidance (use no sessions, or sticky sessions, to avoid replication: share-nothing architecture); 2) Decentralisation (store session data in the browser’s cookie, or in a db whose key is referenced by a hash in the cookie); 3) Centralisation (store session data in the db / memcached).
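The "timeout plus graceful recovery" idea in the notes above can be sketched with Python's standard `asyncio` module. `slow_service` is a stand-in for any remote dependency, and the fallback value is an assumption made for the example, not something taken from the talk.

```python
import asyncio


async def slow_service(delay: float) -> str:
    """Stand-in for a remote call that takes `delay` seconds (hypothetical)."""
    await asyncio.sleep(delay)
    return "fresh result"


async def call_with_timeout(delay: float, timeout: float, fallback: str) -> str:
    """Wait for the dependency, but never longer than `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_service(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Recover gracefully instead of tying the caller to the slowest system.
        return fallback
```

Calling `asyncio.run(call_with_timeout(1.0, 0.01, "fallback"))` returns the fallback, because the dependency is slower than the caller is willing to wait.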
  92. You must be able to isolate and limit the effects of failures within any system by segmenting its components. Decouple, decouple, decouple! A swim lane represents both a barrier and a guide: it ensures that swimmers don’t interfere with each other, and it helps guide each swimmer toward their objective with minimal effort. AKA shard.
Swim lanes increase availability by limiting the impact of failures to a subset of functionality, and they make incidents easier to detect, identify and resolve. The fewer things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. Swim lanes should not have lines of communication crossing lane boundaries; communication should always move along the lane.
When designing swim lanes, always address the transactions making the company money first (e.g. Search&Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally, consider the natural layout or topology of the site for opportunities to create swim lanes (e.g. customer boundaries within an app / environment: if you have a tenant who is very busy, assign it its own swim lane; other tenants with low utilisation can all be put into another swim lane).
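The tenant example above (a busy tenant gets its own lane, quiet tenants share one) could be sketched as follows. The function name, the lane naming scheme and the utilisation threshold are all assumptions made for illustration.

```python
from typing import Dict


def assign_swim_lanes(utilisation: Dict[str, float],
                      busy_threshold: float) -> Dict[str, str]:
    """Map each tenant to a lane: dedicated if busy, shared otherwise."""
    lanes: Dict[str, str] = {}
    for tenant, load in utilisation.items():
        if load >= busy_threshold:
            # Busy tenant: isolate it so its failures and load stay contained.
            lanes[tenant] = f"lane-{tenant}"
        else:
            # Low-utilisation tenants can all share a single lane.
            lanes[tenant] = "lane-shared"
    return lanes
```

For instance, with utilisations `{"acme": 0.8, "bob": 0.1}` and a threshold of `0.5`, `acme` lands in `lane-acme` and `bob` in `lane-shared`.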
  97. What is the best way to handle large volumes of traffic? Answer: “Establish the right organisation, implement the right processes and follow the right architectural principles”. Correct, but the best way is not to have to handle it at all. The key to achieving this is the pervasive use of caching. The cache hit ratio is the main measure of a cache’s effectiveness. The cache can be refreshed via a batch job or on a cache miss. When the cache is full, an eviction algorithm (LRU, MRU...) decides which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:
- Object caches: used to store objects for the app to reuse, usually as serialized objects. The app must be aware of them. They form a layer in front of the db / external services. Marshalling is the process of transforming an object into a data format suitable for transmitting or storing.
- Application caches: A) Proxy caches, usually implemented by ISPs, universities or corporations; they cache for a limited number of users and for an unlimited number of sites. B) Reverse proxy caches (the opposite): they cache for an unlimited number of users and for a limited number of applications; the configuration of the specific app determines what can be cached. HTTP headers give much control over caching (Last-Modified, ETag, Cache-Control).
- Content Delivery Networks: they speed up response time, offload requests from your application’s origin server, and usually lower costs. The total capacity of the CDN’s strategically placed servers can yield a higher capacity and availability than the network backbone. The way it works is that you place the CDN’s domain name as an alias for your server by using a canonical name (CNAME) in your DNS entry.
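To make the object-cache vocabulary above concrete (refresh on a cache miss, LRU eviction, hit ratio), here is a small Python sketch built on `collections.OrderedDict`. The `LRUCache` class and its `loader` callback are hypothetical; the loader stands in for the db or external service behind the cache.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Object cache that loads on a miss and evicts the least recently used entry."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], Any]) -> None:
        self._capacity = capacity
        self._loader = loader          # stand-in for the db / external service
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)     # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = self._loader(key)           # refresh on cache miss
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `hits` and `misses` alongside the data is what lets you judge whether the cache is actually keeping traffic away from the origin.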
  104. Logging at scale is useless. Too much noise. Instrumentation is essential.
You need to identify bottlenecks quickly or suffer prolonged and painful outages. The question "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question, "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?", addresses the people and the processes that allowed the event you just had, and every other event for which you didn't have appropriate monitoring.
Designing to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors. The weakness here is that these usually rely on threshold alerts, i.e. checking whether something is behaving outside of our expectations, rather than alerting when it's performing significantly differently than in the past. Finally, you need to identify what the problem is, thanks to application monitoring.
Not all monitoring data is valuable: too much of it only creates noise, while wasting time and resources. It's advisable to save only a summary of the reports over time, to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.
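The difference between a threshold alert and an alert on "performing significantly differently than in the past" can be sketched as below. The three-sigma rule and the function names are assumptions chosen for the example; the notes do not prescribe a particular statistic.

```python
from statistics import mean, stdev
from typing import Sequence


def threshold_alert(value: float, limit: float) -> bool:
    """Classic system monitor: fire only when a fixed limit is exceeded."""
    return value > limit


def deviation_alert(history: Sequence[float], value: float,
                    sigmas: float = 3.0) -> bool:
    """Fire when `value` differs markedly from past behaviour (assumed 3-sigma rule)."""
    if len(history) < 2:
        return False                 # not enough past data to compare against
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return value != mu           # perfectly flat history: any change is unusual
    return abs(value - mu) > sigmas * sd
```

With a history around 100 requests per second, a reading of 130 stays silent under a fixed limit of 500 but is flagged by the deviation check, which is the kind of early signal the notes argue for.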
  118. \n
  119. \n
  120. \n
  121. \n
  122. Use queues and workers to make processes asynchronous, and distribute data to parallel workers. \n
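A minimal sketch of the queue-and-workers pattern, using only the Python standard library. The `run` and `worker` functions and the squaring "task" are hypothetical stand-ins; in a real system the queue would usually be an external message broker and the workers separate processes or machines.

```python
import queue
import threading


def worker(jobs, results):
    """Pull items off the job queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:                  # sentinel: no more work for this worker
            jobs.task_done()
            break
        results.put((item, item * item))  # stand-in for the real, slow task
        jobs.task_done()


def run(items, worker_count=4):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for item in items:
        jobs.put(item)       # the producer returns immediately; work happens asynchronously
    for _ in threads:
        jobs.put(None)       # one sentinel per worker
    jobs.join()              # wait until every queued item has been processed
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


print(run(range(5)))
```

The producer only enqueues work and never waits for an individual task, which is what decouples the request path from the slow processing. Adding workers raises throughput without changing the producer.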
  123. Happy to talk about any of them.\n
  124. \n
  125. \n
  126. \n
  127. Listeners can subscribe to just one or more topics. Different output channels.\nZeroMQ v3: filtering is done on the publisher side.\n
  128. An interesting idea if you have a highly dynamic site / service, with each update affecting several other users / pages, is to have an internal data bus that carries all the information, with updates labelled with topics, and all the services/users subscribing to the relevant topics.\n
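The internal data bus idea can be sketched as an in-process publish/subscribe object. This is only an illustration under stated assumptions: the `DataBus` class, the dotted topic names and the prefix-matching rule are hypothetical choices for this example (prefix matching is loosely modelled on how ZeroMQ subscriptions filter messages), and a real bus would run over the network.

```python
from collections import defaultdict


class DataBus:
    """Tiny in-process bus: updates are labelled with topics, listeners subscribe by prefix."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic prefix -> list of callbacks

    def subscribe(self, topic_prefix, callback):
        self.subscribers[topic_prefix].append(callback)

    def publish(self, topic, payload):
        """Deliver the update to every matching subscriber; return the delivery count."""
        delivered = 0
        for prefix, callbacks in self.subscribers.items():
            if topic.startswith(prefix):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


bus = DataBus()
timeline, audit = [], []

# One service only cares about user 42; another wants every user event.
bus.subscribe("user.42.", lambda topic, payload: timeline.append(payload))
bus.subscribe("user.", lambda topic, payload: audit.append(topic))

bus.publish("user.42.follow", {"by": 7})
bus.publish("user.99.post", {"id": 1001})

print(timeline)  # [{'by': 7}]
print(audit)     # ['user.42.follow', 'user.99.post']
```

The publisher does not need to know which services or pages are affected by an update. It labels the update with a topic, and interested parties receive it through their subscriptions.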
  129. We collect millions of events every second.\nThe importance of people: devops who know what to monitor and how, how to use and write tools, and have 100% dedication.\nWe use different technologies. It’s very easy to set up a new ZeroMQ listener.\nWe use StatsD (from Flickr / Etsy), Zenoss and Graphite.\n
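To give a flavour of how lightweight StatsD-style instrumentation is, the sketch below formats metrics in the plain-text StatsD line format (`name:value|type`, with an optional `|@rate` sample rate) and sends them as UDP datagrams. The helper names, the metric names and the `127.0.0.1:8125` address are assumptions for this example (8125 is the conventional StatsD port); this is not the setup described in the talk.

```python
import socket


def statsd_line(name, value, metric_type, sample_rate=1.0):
    """Format one metric in the StatsD text format, e.g. 'events.received:1|c'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"   # tells the server the metric was sampled
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: the application never waits for the monitoring system."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


print(statsd_line("events.received", 1, "c"))                   # counter
print(statsd_line("events.parse_time", 12, "ms"))               # timer, in milliseconds
print(statsd_line("events.received", 1, "c", sample_rate=0.1))  # sampled counter
```

Calling `send_metric(statsd_line("events.received", 1, "c"))` from application code would emit one counter increment. Because UDP is connectionless, a slow or missing collector does not block the code being measured.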
  130. shameless plug\n
  131. \n
  132. \n