MANAGING GROWTH
SCALING TEAMS, PROCESSES, ARCHITECTURES
Lorenzo Alberton, CTO @ DataSift
MEST, Accra 10 December 2017
LORENZO ALBERTON
Chief Technology Officer, DataSift
http://alberton.info
@lorenzoalberton
SCALABLE ARCHITECTURES http://bit.ly/scaleds
SCALABILITY IS ABOUT…
People
Technology
Processes
TRUE FOUNDATION
PART 1.
PEOPLE
Staffing, Roles,
Management, Teams
CULTURE
➤ Treat people as volunteers (*)
➤ Lead by living the values you
promote
➤ Respect, collaboration
➤ Promote fun in the workplace
➤ Culture of safety at work (**)
(*) Peter Drucker
(**) Google, Project Aristotle
EFFECTIVE TEAMS
PROJECT ARISTOTLE (2012)
Psychological safety: team climate
characterised by interpersonal trust
and mutual respect in which people
are comfortable being themselves.
Feeling free to share the things that
scare us without fear of
recriminations.
Behaviours: conversational turn-taking and empathy.
https://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html
TEAMS VS. INDIVIDUAL CONTRIBUTORS
➤ Beware of toxic people
➤ Value communication and
team work over super-heroes
(*) Sunday afternoon test
STAFFING
Don’t hire
experts
Technologies come and go
Focus more on people with passion
and less on people with specific skills
TEAM SIZE
➤ Never underestimate the
power of a small team
➤ Small teams force alignment
and focus
➤ Bigger teams need an insane
amount of overhead
➤ Parkinson's Law: “Work
expands to fill the time available
for its completion”
➤ Busywork: work that keeps a person busy
but has little value in itself
TEAM STRUCTURE
No artificial boundaries around languages or skills
Try cross-functional teams
(less friction, better end-to-end collaboration, project ownership)
MIDDLE-MANAGEMENT CURSE
Mistakes:
➤ Prematurely re-organise for scale
(deep hierarchy, over-specialisation)
➤ Process managers (factory
mentality) vs Problem solvers
➤ Micromanagement
➤ Non-engineering culture
➤ 1-on-1s as calendar-filler
➤ Not being “on the ground”
➤ Over-confidence in tooling
➤ OTOH, coordination can be hard
PART 2.
PROCESSES
How to make day to day
operations smooth
WHY ARE PROCESSES CRITICAL?
Ease management of teams/projects
Standardise actions in repetitive tasks
Reduce mundane decisions to focus on grander ideas
Allow the team to react quickly to crisis
➤ A process shouldn’t exist for the sake of it
➤ Introduce processes gradually, only keep what works
➤ Don’t put too much confidence in tools alone to fix issues
EXAMPLE PROCESSES
➤ Development methodology
➤ Risk / Benefit analysis
➤ Prioritisation / Planning
➤ Design and code reviews
➤ Evaluating headroom / scale
➤ Load / Stress testing
➤ Test automation
➤ Deployment automation
➤ Release checklists
➤ Risk assessment/management
➤ Blameless postmortems
PROMOTING SYSTEMS TO PROD
➤ Code reviews
➤ Dev, Test, Stage and Live
environments
➤ Manual and automated QA
processes
➤ Performance and stress testing
➤ Release checklists (runbook)
➤ Instrumentation checks
➤ Testing roll-back capability
Protection from significant failures
BARRIER CONDITIONS
DESIGN AND CODE REVIEWS
➤ Promote collaboration
➤ Validate ideas, assess risk, detect
flaws, simplify the solution
➤ Reason about behaviour before
coding
DAILY STAND-UPS
➤ Important for knowledge
sharing, collaboration,
alignment
CONTROLLING CHANGE: RISK ESTIMATION
http://dilbert.com/strips/comic/2008-05-08/
➤ Limit / log the impact of changes
➤ Risk assessment methodologies:
• Gut feeling / finger in the air
• Semaphore method
• Failure Mode and Effect Analysis
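As a rough illustration of the last method: Failure Mode and Effect Analysis is commonly run by scoring each failure mode for severity, likelihood of occurrence and difficulty of detection, then multiplying the three into a Risk Priority Number (RPN). The 1-10 scales, the failure modes and the scores below are assumptions made up for this sketch, not data from the talk.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always caught early) .. 10 (never caught before prod)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three scores.
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes for a planned change.
modes = [
    FailureMode("schema migration locks table", 8, 3, 4),
    FailureMode("config typo in feature flag", 4, 5, 2),
    FailureMode("cache stampede after deploy", 6, 4, 7),
]
ranked = rank_by_risk(modes)
for mode in ranked:
    print(mode.rpn, mode.name)
```

The ranking is only as good as the scores, so it works best as a prompt for discussion rather than as a verdict.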
RISK MANAGEMENT
➤ Risk is cumulative
➤ Determine limits and
tolerance
➤ Stress, long hours, peer
pressure can multiply risk
WHEN/WHAT TO SCALE: DETERMINING HEADROOM
Capacity
Current Load
Why?
Budget plan
Prioritisation
Hiring plan
Determine starting point, remaining capacity, expected demand
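A minimal sketch of a headroom estimate, assuming a single bottleneck metric (for example requests per second), compound monthly growth, and a safe utilisation ceiling of 75%. All three are assumptions for illustration; the function names and numbers are hypothetical.

```python
import math


def headroom(capacity, current_load, ideal_utilisation=0.75):
    """Load that can still be added before passing the safe utilisation ceiling."""
    return capacity * ideal_utilisation - current_load


def months_until_exhausted(capacity, current_load, monthly_growth,
                           ideal_utilisation=0.75):
    """Months of compound growth until the load reaches the safe ceiling."""
    usable = capacity * ideal_utilisation
    if current_load >= usable:
        return 0
    if monthly_growth <= 0:
        return math.inf
    # Solve current_load * (1 + g)^n >= usable for the smallest integer n.
    return math.ceil(math.log(usable / current_load) / math.log(1 + monthly_growth))


# Hypothetical system: 10,000 req/s capacity, 3,000 req/s today, 10% growth a month.
remaining = headroom(10_000, 3_000)
months = months_until_exhausted(10_000, 3_000, 0.10)
print(remaining, months)
```

The number of months left is what feeds the budget, prioritisation and hiring plans listed above.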
LOAD TESTING
➤ Identify, document and
eliminate bottlenecks through
a strictly controlled process of
measurement and analysis
➤ Measure the system’s response
and stability
➤ Verify the app can meet the
desired performance
objectives (SLA)
➤ Establish success criteria, test
environment, tests, what
needs to be monitored, what
data needs to be collected
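The points above can be sketched as a tiny harness: fire a fixed number of concurrent calls, collect latencies, and compare a percentile against the objective. This is a toy under stated assumptions: `operation` stands in for a real request, and the nearest-rank percentile and the thresholds are illustrative choices.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(operation):
    """Run one call and return its latency in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def run_load_test(operation, requests=200, concurrency=20):
    """Issue `requests` calls from `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(operation), range(requests)))


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def meets_objective(latencies, pct, limit_seconds):
    """Success criterion: the chosen percentile stays within the limit."""
    return percentile(latencies, pct) <= limit_seconds


# Hypothetical stand-in for a real request.
latencies = run_load_test(lambda: time.sleep(0.001), requests=50, concurrency=5)
print("p95 within 500 ms:", meets_objective(latencies, 95, 0.5))
```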
STRESS TESTING
➤ Determine the app’s stability
when subjected to above-normal
loads
➤ Verify the app’s behaviour
when close to the breaking
point
➤ Positive testing: progressively
increase load to overwhelm
the system’s resources
➤ Negative testing: take away
resources (memory, threads,
connections) to test the
application’s recoverability
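A toy sketch of positive stress testing: raise the load step by step until the system fails, and record the last healthy level. `ToyService` is a hypothetical stand-in that simply fails above a fixed limit; a real test would drive actual traffic and watch error rates and latency.

```python
class ToyService:
    """Stand-in for the system under test (assumption for this sketch)."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent

    def handle(self, concurrent_requests):
        # True means the service stayed healthy at this load.
        return concurrent_requests <= self.max_concurrent


def find_breaking_point(service, start=10, step=10, ceiling=10_000):
    """Return (last healthy load, first failing load or None)."""
    load = start
    last_healthy = 0
    while load <= ceiling:
        if not service.handle(load):
            return last_healthy, load
        last_healthy = load
        load += step
    return last_healthy, None


service = ToyService(max_concurrent=45)
healthy, failing = find_breaking_point(service)
print(healthy, failing)

# Negative testing in the same toy model: take resources away and re-measure.
service.max_concurrent = 15
degraded = find_breaking_point(service)
```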
PART 3.
TECHNOLOGY
Architecting Robust,
Scalable Solutions
DO NOT SCALE UNTIL YOU CAN’T AVOID IT ANYMORE
➤ “Go meet your people. Do things that don’t scale.” (Paul
Graham to Airbnb’s founders)
➤ Solve for specific problems
➤ Don’t generalise until you have rebuilt something for the third time
➤ Don’t over-engineer the solution
➤ Automate repetitive and error-prone tasks
➤ Avoid complicating things
✴ Phone system
MVP APPROACH
➤ Test ideas before spending a
year building something you
haven’t proven in the market
first
➤ Fake it till you make it
➤ Example: Zappos
ARCHITECTURAL / DESIGN PRINCIPLES
➤ N + 1 nodes
➤ Design for rollback
➤ Design to be disabled (feature flags)
➤ Design to be monitored
➤ Design for multiple live systems/sites
➤ Use mature technology
➤ Asynchronous communications
➤ Stateless systems
➤ Buy when non core
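A minimal sketch of the "to be disabled (feature flags)" principle, assuming an in-memory flag store. The class, the flag name and the `recommendations` function are hypothetical; real systems usually load flags from configuration or a flag service.

```python
class FeatureFlags:
    """In-memory flag store (assumption: flags are simple booleans)."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def is_enabled(self, name):
        # Unknown flags are treated as off, so new code paths are opt-in.
        return self._flags.get(name, False)

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False


def recommendations(user_id, flags):
    """Non-core feature that can be switched off without a deploy."""
    if flags.is_enabled("recommendations"):
        return [f"item-for-{user_id}"]  # placeholder for an expensive call
    return []  # degraded, but the page still renders


flags = FeatureFlags({"recommendations": True})
before = recommendations(42, flags)
flags.disable("recommendations")  # e.g. during an incident
after = recommendations(42, flags)
```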
FAULT-TOLERANT STRUCTURES
➤ Swim lanes: isolate and limit the
impacts of failure within the
system by segmenting pipelines
➤ Barrier and Guide (shard)
➤ Increase availability
➤ Make incidents easier to detect,
identify and resolve

➤ Favour the transactions making
the company money first
➤ Isolate functions causing repetitive
problems (or busy tenants)
➤ Consider the natural layout or
topology of the site
SCALING IN DIFFERENT DIRECTIONS
AKF Scaling Cube, “The Art of Scalability”, M.L.Abbott, M.T.Fisher
x cloning of services and data without any bias
(e.g. more serving nodes in a worker pool where any node can do the work)
y separation of work responsibility by type of data or type of work
(different specialised worker pools)
z separation of work by customer or requestor
(dedicated highly specialised worker pools)
SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS
x cloning of entities or data - unbiased distribution of work
(LB, web site mirror 1, web site mirror 2)
y separation of work by activity or data
(search server, shopping cart server)
z separation of work by person for whom the work is done
(premium site, standard site)
SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS
x mirroring
+ scale transactions
- scale data
y split by service
+ scale isolation
+ scale function data
- scale customer data
z split by need / location / value
+ scale isolation
+ scale customer data
- scale function data
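The three axes can be combined in a single routing decision. The sketch below picks a worker pool by service (y) and customer tier (z), then rotates over the identical clones in that pool (x). The pool layout and node names are hypothetical.

```python
import itertools


class Router:
    """Route a request along the y, z and x axes, in that order."""

    def __init__(self, pools):
        # pools maps (service, tier) to a list of interchangeable nodes.
        self._cycles = {key: itertools.cycle(nodes) for key, nodes in pools.items()}

    def route(self, service, tier):
        # y and z select the pool; x is round-robin over its clones.
        return next(self._cycles[(service, tier)])


router = Router({
    ("search", "premium"): ["search-p1", "search-p2"],
    ("search", "standard"): ["search-s1"],
    ("cart", "standard"): ["cart-s1", "cart-s2"],
})
picked = [router.route("search", "premium") for _ in range(3)]
```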
SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA
x data cloning (replication / clustering) + load balancer
y split different things by service / resource / data affinity
z split similar things by modulus / hash-based lookups
copy 1 copy 2 copy 3
ABC DEF GHI
SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA
x data cloning (replication / clustering) + load balancer
+ easy to implement
+ scale transaction volume
+ useful in case of high read to write ratio
- scale data size and growth
y split different things by service / resource / data affinity
+ fault isolation
+ reduce query time
- more difficult
- data migration
z split similar things by modulus / hash-based lookups
+ uniformly balanced demand
+ fault isolation
+ scale data and transactions
- more costly
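A minimal sketch of the z-axis lookup: hash the key, then take the result modulo the number of shards. The choice of SHA-256 and the key format are assumptions; any stable hash works, but Python's built-in `hash()` for strings is randomised per process and is not suitable here.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical customer keys spread over 4 shards.
placement = {f"customer-{i}": shard_for(f"customer-{i}", 4) for i in range(1000)}
```

With plain modulus, changing `shard_count` remaps most keys, which is one reason this axis is the costly one to operate.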
QUEUES
➤ Asynchronous communication
➤ Workload distribution
➤ Failure isolation
MESSAGE QUEUES AS BUFFERS (ASYNC COMM - DECOUPLING)
source / producer → queue → sink / consumer
Unpredictable load spikes
Load normalisation / smoothing
Batching ⇒ higher throughput
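A single-threaded sketch of the buffering idea: the producer dumps a spike into the queue, and the consumer drains it later at its own pace, in batches. The message names and batch size are made up for illustration.

```python
from collections import deque


def drain_in_batches(buffer, batch_size):
    """Consumer side: pull up to batch_size messages at a time until empty."""
    batches = []
    while buffer:
        size = min(batch_size, len(buffer))
        batches.append([buffer.popleft() for _ in range(size)])
    return batches


buffer = deque()
for i in range(10):  # a spike of 10 messages arrives at once
    buffer.append(f"msg-{i}")

batches = drain_in_batches(buffer, batch_size=4)
print([len(batch) for batch in batches])
```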
WORKLOAD DISTRIBUTION - LOAD BALANCING
Producer: push
Consumer 1: pull
Consumer 2: pull
Consumer 3: pull
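A small sketch of competing consumers using only the standard library: one producer pushes jobs onto a shared queue, and three consumer threads pull whenever they are free. Squaring a number stands in for real work; the worker names are hypothetical.

```python
import queue
import threading


def run_workers(jobs, worker_count=3):
    work = queue.Queue()
    for job in jobs:
        work.put(job)  # producer pushes
    done = queue.Queue()

    def consumer(name):
        while True:
            try:
                job = work.get_nowait()  # each consumer pulls when it is free
            except queue.Empty:
                return
            done.put((name, job * job))  # stand-in for real processing

    threads = [
        threading.Thread(target=consumer, args=(f"consumer-{i + 1}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [done.get() for _ in range(done.qsize())]


results = run_workers(range(10))
```

Which consumer handles which job is not deterministic; only the set of results is.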
MULTIPLEXING
Producer 1, Producer 2, Producer 3
push R4
push R1, R2, R3
push R5, R6
Consumer: pull
fair-queuing: R1, R4, R5, R2, R6, R3
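One way to read the fair-queuing order is as a round-robin over per-producer queues, taking one request from each non-empty queue per pass. The sketch below assumes that interpretation, and the assignment of requests to producers is also an assumption, since the slide text does not pair them.

```python
from collections import deque


def fair_queue(producer_queues):
    """Interleave requests, one per producer per pass, until all are drained."""
    queues = [deque(q) for q in producer_queues]
    order = []
    while any(queues):  # an empty deque is falsy
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


order = fair_queue([["R1", "R2", "R3"], ["R4"], ["R5", "R6"]])
print(order)
```

Under these assumptions the result is the same sequence as on the slide, so a producer with a long backlog cannot starve the others.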
HIGH AVAILABILITY (PUB-SUB / BROADCAST)
Listener 1
Listener 2
Listener 3
[Broadcast]
Publisher 1
Publisher 2
[Dynamic Subscriptions]
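An in-process sketch of broadcast with dynamic subscriptions: every current listener receives each published message, and listeners can join or leave at any time. The `Broker` class is hypothetical; a real deployment would use a message broker rather than direct callbacks.

```python
class Broker:
    def __init__(self):
        self._listeners = {}

    def subscribe(self, name, callback):
        self._listeners[name] = callback

    def unsubscribe(self, name):
        self._listeners.pop(name, None)

    def publish(self, message):
        # Broadcast: every listener subscribed right now gets the message.
        for callback in list(self._listeners.values()):
            callback(message)


inbox1, inbox2, inbox3 = [], [], []
broker = Broker()
broker.subscribe("listener-1", inbox1.append)
broker.subscribe("listener-2", inbox2.append)
broker.publish("a")
broker.subscribe("listener-3", inbox3.append)  # joins late
broker.unsubscribe("listener-1")               # leaves early
broker.publish("b")
```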
BOUND YOUR QUEUE SIZE - APPLY BACK PRESSURE
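A minimal sketch of a bounded queue using Python's standard `queue.Queue`. Once `maxsize` messages are pending, the producer is told to back off instead of the queue growing without limit. The size of 3 and the `try_publish` helper are illustrative assumptions.

```python
import queue

inbox = queue.Queue(maxsize=3)  # bounded: at most 3 pending messages


def try_publish(q, message):
    """Return False when the queue is full so the caller can slow down or shed load."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False


accepted = [try_publish(inbox, n) for n in range(5)]
print(accepted)
```

A blocking `q.put(message, timeout=...)` is the other common choice: it slows the producer down rather than rejecting the message.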
MONITORING
➤ Measure all the things!
➤ Think about what metrics to
track when you design your
app: system/app/user level
➤ Engage with Ops / QA early
on in the design phase
➤ Invest in a good monitoring
solution
➤ Data integrity checks (bucket
analysis, statistical analysis)
➤ Alerting and monitoring
dashboards should be intuitive
LOOK! RIB CAGES!
INTUITIVE MONITORING DASHBOARDS: LIVE HEAT-MAPS
LOOK! MONITORS!
OTHER SCALING TIPS
➤ Use caching aggressively (CDNs,
app & object caches)
➤ Design to scale out horizontally
➤ Simplify scope, design,
implementation: lean == fast
➤ Know latencies
➤ Relax temporal constraints
➤ Discuss and learn from mistakes
➤ Design for fault tolerance,
graceful failure, and resilience
➤ Avoid SPOFs
➤ Avoid or distribute state
➤ Be competent
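As a small illustration of the first tip, an application-level cache can be as simple as memoising a slow lookup. The sketch uses the standard `functools.lru_cache`; `load_profile` and its counter are hypothetical stand-ins for a call to a slow backend.

```python
from functools import lru_cache

calls = {"count": 0}


@lru_cache(maxsize=1024)
def load_profile(user_id):
    calls["count"] += 1  # counts trips to the (hypothetical) slow backend
    return f"user-{user_id}"


load_profile(1)
load_profile(1)  # served from the cache
load_profile(2)
print(calls["count"], load_profile.cache_info().hits)
```

An in-process cache like this is per node and has no expiry, so it suits data that rarely changes.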
REFERENCES
http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
http://www.infoq.com/presentations/Simple-Made-Easy-QCon-London-2012
http://www.slideshare.net/postwait/scalable-internet-architecture
http://bit.ly/IJKwuc
http://agile.dzone.com/news/approaches-organizational
https://bitly.com/vCSd49
M. L. Abbott, M. T. Fisher, “The Art of Scalability”, Addison Wesley
http://theartofscalability.com/
http://alberton.info/talks
@lorenzoalberton
lorenzo@datasift.com
THANK YOU!
/in/lorenzoalberton

Scaling teams, processes and architectures

  • 1. MANAGING GROWTH SCALING TEAMS, PROCESSES, ARCHITECTURES Lorenzo Alberton, CTO @ DataSift MEST, Accra 10 December 2017
  • 2. LORENZO ALBERTON Chief Technology Officer, DataSift http://alberton.info @lorenzoalberton
  • 6. CULTURE ➤ Treat people as volunteers (*) ➤ Lead by living the values you promote ➤ Respect, collaboration ➤ Promote fun in the workplace ➤ Culture of safety at work (**) (*) Peter Drucker (**) Google, Project Aristotle
  • 7. EFFECTIVE TEAMS PROJECTARISTOTLE(2012) Psychological safety: team climate characterised by interpersonal trust and mutual respect in which people are comfortable being themselves. Feeling free to share the things that scare us without fear of recriminations. Behaviours: conversational turn- taking and empathy. https://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html
  • 8. TEAMS VS. INDIVIDUAL CONTRIBUTORS ➤ Beware of toxic people ➤ Value communication and team work over super-heroes (*) Sunday afternoon test
  • 9. STAFFING Don’t hire experts Technologies come and go Focus more on people with passion and less on people with specific skills
  • 10. TEAM SIZE ➤ Never underestimate the power of a small team ➤ Small teams force alignment and focus ➤ Bigger teams need an insane amount of overhead ➤ Parkinson's Law: “Work expands to fill the time available for its completion” work that keeps a person busy but has little value in itself
  • 11. TEAM STRUCTURE No artificial boundaries around languages or skills Try cross-functional teams 
 (less friction, better end to end collaboration, project ownership)
  • 12. MIDDLE-MANAGEMENT CURSE Mistakes: ➤ Prematurely re-organise for scale (deep hierarchy, over- specialisation) ➤ Process managers (factory mentality) vs Problem solvers ➤ Micromanagement ➤ Non-engineering culture ➤ 1-on-1s as calendar-filler ➤ Not being “on the ground” ➤ Over-confidence in tooling ➤ OTOH, coordination can be hard
  • 13. PART 2. PROCESSES How to make day to day operations smooth
  • 14. WHY ARE PROCESSES CRITICAL? Ease management of teams/projects Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis ➤ A process shouldn’t exist for the sake of it ➤ Introduce processes gradually, only keep what works ➤ Don’t put too much confidence in tools alone to fix issues
  • 15. EXAMPLE PROCESSES ➤ Development methodology ➤ Risk / Benefit analysis ➤ Prioritisation / Planning ➤ Design and code reviews ➤ Evaluating headroom / scale ➤ Load / Stress testing ➤ Test automation ➤ Deployment automation ➤ Release checklists ➤ Risk assessment/management ➤ Blameless postmortems
  • 16. PROMOTING SYSTEMS TO PROD BARRIER CONDITIONS: protection from significant failures ➤ Code reviews ➤ Dev, Test, Stage and Live environments ➤ Manual and automated QA processes ➤ Performance and stress testing ➤ Release checklists (runbook) ➤ Instrumentation checks ➤ Testing roll-back capability
  • 17. DESIGN AND CODE REVIEWS ➤ Promote collaboration ➤ Validate ideas, assess risk, detect flaws, simplify the solution ➤ Reason about behaviour before coding DAILY STAND-UPS ➤ Important for knowledge sharing, collaboration, alignment
  • 18. CONTROLLING CHANGE: RISK ESTIMATION http://dilbert.com/strips/comic/2008-05-08/ ➤ Limit / log the impact of changes ➤ Risk assessment methodologies: • Gut feeling / finger in the air • Semaphore method • Failure Mode and Effect Analysis
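One way to make the Failure Mode and Effect Analysis option concrete is the conventional risk priority number (severity × occurrence × detection, each scored 1 to 10). The sketch below is illustrative only: the `FailureMode` class, the example changes and their scores are assumptions, not material from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One way a change could fail (hypothetical example data)."""
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (very unlikely) .. 10 (almost certain)
    detection: int   # 1 (caught immediately) .. 10 (likely to go unnoticed)

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Made-up failure modes for a single release.
modes = [
    FailureMode("schema migration locks table", severity=8, occurrence=3, detection=4),
    FailureMode("config typo in new flag", severity=4, occurrence=5, detection=2),
    FailureMode("cache stampede after deploy", severity=6, occurrence=4, detection=6),
]
ranked = rank_by_risk(modes)
```

The ranking only orders the discussion; the scores themselves remain a team judgement.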
  • 19. RISK MANAGEMENT ➤ Risk is cumulative ➤ Determine limits and tolerance ➤ Stress, long hours, peer pressure can multiply risk
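One possible reading of "risk is cumulative" is that many individually small risks add up within a release. The sketch below assumes the changes fail independently and that each has a known failure probability; both are simplifying assumptions, and the function names and numbers are hypothetical.

```python
import math


def combined_failure_probability(probabilities):
    """P(at least one change fails), assuming independent failures."""
    p_all_ok = math.prod(1.0 - p for p in probabilities)
    return 1.0 - p_all_ok


def within_tolerance(probabilities, tolerance):
    """True if the combined risk stays at or below the agreed tolerance."""
    return combined_failure_probability(probabilities) <= tolerance


# Ten changes, each with an assumed 2% chance of causing an incident.
release = [0.02] * 10
risk = combined_failure_probability(release)  # roughly 0.18
```

Under these assumptions, ten "safe" 2% changes shipped together exceed a 10% tolerance, while five of them stay below it.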
  • 20. WHEN/WHAT TO SCALE: DETERMINING HEADROOM [Diagram: capacity vs. current load] Why? Budget plan, prioritisation, hiring plan. Determine starting point, remaining capacity, expected demand
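A minimal sketch of the headroom arithmetic, assuming linear growth in demand and a safety factor (`ideal_usage`) applied to raw capacity. The formula, parameter names and numbers are illustrative assumptions rather than the speaker's method.

```python
import math


def headroom(capacity, current_load, ideal_usage=1.0):
    """Remaining usable capacity after discounting by a safety factor."""
    return capacity * ideal_usage - current_load


def periods_until_exhausted(capacity, current_load, growth_per_period, ideal_usage=1.0):
    """Whole periods left before load exceeds usable capacity (linear growth)."""
    remaining = headroom(capacity, current_load, ideal_usage)
    if remaining <= 0:
        return 0
    if growth_per_period <= 0:
        return math.inf
    return math.floor(remaining / growth_per_period)


# Hypothetical service: 10,000 req/s capacity, run at no more than 75% of it,
# 6,000 req/s today, growing by 250 req/s per month.
remaining = headroom(10_000, 6_000, ideal_usage=0.75)
months_left = periods_until_exhausted(10_000, 6_000, 250, ideal_usage=0.75)
```

A number like `months_left` is what feeds the budget, prioritisation and hiring plans.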
  • 21. LOAD TESTING ➤ Identify, document and eliminate bottlenecks through a strict controlled process of measurement and analysis ➤ Measure system’s response and stability ➤ Verify the app can meet the desired performance objectives (SLA) ➤ Establish success criteria, test environment, tests, what needs to be monitored, what data needs to be collected
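As a sketch of "establish success criteria", a load test can collect per-request latencies and compare a percentile against the objective. The helper names and the nearest-rank percentile choice are assumptions; a real test would drive a production-like environment rather than a local callable.

```python
import math
import time


def percentile(samples, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies(operation, requests):
    """Call `operation` `requests` times; return per-call latencies in seconds."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def meets_objective(latencies, q, limit):
    """True if the q-th percentile latency is within the agreed limit."""
    return percentile(latencies, q) <= limit


# Made-up latencies in milliseconds, checked against a p95 objective of 120 ms.
samples_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ok = meets_objective(samples_ms, 95, 120)
```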
  • 22. STRESS TESTING ➤ Determine the app’s stability when subjected to above-normal loads ➤ Verify the app’s behaviour when close to the breaking point ➤ Positive testing: progressively increase load to overwhelm the system’s resources ➤ Negative testing: take away resources (memory, threads, connections) to test the application’s recoverability
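The positive-testing idea (ramp the load until something breaks) can be sketched against a stand-in system. `ToyService` and its fixed capacity are hypothetical; a real stress test would target a deployed environment and observe how it degrades.

```python
class ToyService:
    """Stand-in for a real system: fails once load exceeds its capacity."""

    def __init__(self, capacity):
        self.capacity = capacity

    def handle(self, concurrent_requests):
        if concurrent_requests > self.capacity:
            raise RuntimeError("overloaded")
        return "ok"


def find_breaking_point(service, start, step, max_load):
    """Ramp load step by step; return (last load handled, first load that failed)."""
    last_ok = None
    load = start
    while load <= max_load:
        try:
            service.handle(load)
        except RuntimeError:
            return last_ok, load
        last_ok = load
        load += step
    return last_ok, None  # never broke within the tested range


last_ok, first_failure = find_breaking_point(ToyService(capacity=450), 100, 100, 1000)
```

Here the ramp reports that 400 concurrent requests were handled and 500 were not, which brackets the breaking point for a finer-grained second pass.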
  • 24. DO NOT SCALE UNTIL YOU CAN’T AVOID IT ANYMORE ➤ “Go meet your people. Do things that don’t scale.” (Paul Graham to Airbnb’s founders) ➤ Solve for specific problems ➤ Don’t generalise until you have rebuilt something for the 3rd time ➤ Don’t over-engineer the solution ➤ Automate repetitive and error-prone tasks ➤ Avoid complicating things ✴ Phone system
  • 25. MVP APPROACH ➤ Test ideas before spending a year building something you haven’t proven in the market first ➤ Fake it till you make it ➤ Example: Zappos
  • 26. ARCHITECTURAL / DESIGN PRINCIPLES Design: N + 1 nodes; for rollback; to be disabled (feature flags); to be monitored; for multiple live systems/sites. Use mature technology; asynchronous communications; stateless systems; buy when non-core
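"Design to be disabled" can be sketched with a tiny in-memory feature-flag store. The `FeatureFlags` class, the flag name and the `recommendations` function are hypothetical; production systems usually read flags from configuration or a flag service.

```python
class FeatureFlags:
    """Minimal in-memory flag store (illustrative only)."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def is_enabled(self, name):
        # Unknown flags are treated as off, so new code paths stay dark by default.
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)


def recommendations(user_id, flags):
    """Optional feature that can be switched off without a deploy."""
    if not flags.is_enabled("recommendations"):
        return []  # degrade gracefully: the page renders without this widget
    return [f"item-for-{user_id}"]


flags = FeatureFlags({"recommendations": True})
before = recommendations(7, flags)
flags.set("recommendations", False)  # e.g. during an incident
after = recommendations(7, flags)
```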
  • 27. FAULT-TOLERANT STRUCTURES ➤ Swim lanes: isolate and limit the impacts of failure within the system by segmenting pipelines ➤ Barrier and Guide (shard) ➤ Increase availability ➤ Make incidents easier to detect, identify and resolve ➤ Favour the transactions making the company money first ➤ Isolate functions causing repetitive problems (or busy tenants) ➤ Consider the natural layout or topology of the site
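A rough sketch of the swim-lane idea: a busy tenant gets a dedicated lane, so a failure there does not spill into the shared lane. The class names, tenant names and the boolean health switch are all assumptions made for illustration.

```python
class SwimLane:
    """An isolated pipeline; here just a name and a health switch."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"lane {self.name} is down")
        return f"{self.name}:{request}"


class LaneRouter:
    """Send tenants with a dedicated lane there; everyone else shares one."""

    def __init__(self, default_lane, dedicated=None):
        self.default_lane = default_lane
        self.dedicated = dict(dedicated or {})  # tenant -> lane

    def lane_for(self, tenant):
        return self.dedicated.get(tenant, self.default_lane)

    def handle(self, tenant, request):
        return self.lane_for(tenant).handle(request)


shared = SwimLane("shared")
busy = SwimLane("busy-tenant")
router = LaneRouter(shared, {"acme": busy})
busy.healthy = False  # the busy tenant's lane fails
still_served = router.handle("small-co", "GET /")  # other tenants are unaffected
```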
  • 28. SCALING IN DIFFERENT DIRECTIONS AKF Scaling Cube (“The Art of Scalability”, M. L. Abbott, M. T. Fisher). x: cloning of services and data without any bias (e.g. more serving nodes in a worker pool where any node can do the work). y: separation of work responsibility by type of data or type of work (different specialised worker pools). z: separation of work by customer or requestor (dedicated highly specialised worker pools).
  • 29. SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS x: cloning of entities or data, unbiased distribution of work (web site mirror 1 and web site mirror 2 behind a LB). y: separation of work by activity or data (search server, shopping cart server). z: separation of work by person for whom the work is done (premium site, standard site).
  • 30. SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS x: mirroring (+ scale transactions; - scale data). y: split by service (+ scale isolation; + scale function data; - scale customer data). z: split by need / location / value (+ scale isolation; + scale customer data; - scale function data).
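The three axes can be combined in a single routing decision: pick the customer segment (z), then the service (y), then one of the identical clones (x). The sketch below uses round-robin for the x axis; the pool layout and node names are hypothetical.

```python
import itertools


class CubeRouter:
    """Route along z (customer segment), y (service) and x (clone)."""

    def __init__(self, pools):
        # pools maps (segment, service) to a list of identical nodes.
        self._cycles = {key: itertools.cycle(nodes) for key, nodes in pools.items()}

    def route(self, segment, service):
        # z and y select the pool; x is a round-robin pick inside it.
        return next(self._cycles[(segment, service)])


pools = {
    ("premium", "search"): ["premium-search-1", "premium-search-2"],
    ("premium", "cart"): ["premium-cart-1"],
    ("standard", "search"): ["standard-search-1", "standard-search-2"],
    ("standard", "cart"): ["standard-cart-1"],
}
router = CubeRouter(pools)
picks = [router.route("premium", "search") for _ in range(3)]
```

Adding a node to a pool is an x-axis change, adding a service is a y-axis change, and adding a segment is a z-axis change, each without touching the others.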
  • 31. SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA x: data cloning (replication / clustering) + load balancer. y: split different things by service / resource / data affinity. z: split similar things by modulus / hash-based lookups. [Diagram labels: copy 1, copy 2, copy 3; ABC, DEF, GHI]
  • 32. SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA x: data cloning (replication / clustering) + load balancer (+ easy to implement; + scale transaction volume; + useful in case of high read to write ratio; - scale data size and growth). y: split different things by service / resource / data affinity (+ fault isolation; + reduce query time; - more difficult; - data migration). z: split similar things by modulus / hash-based lookups (+ uniformly balanced demand; + fault isolation; + scale data and transactions; - more costly).
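A minimal sketch of a z-axis split using a hash plus modulus lookup. The use of SHA-256, the `ShardedStore` class and the in-memory dictionaries are assumptions chosen to keep the example self-contained; a stable hash is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash and a modulus."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Key-value store split across in-memory shards (illustrative only)."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=3)
for customer in ["alice", "bob", "carol", "dave"]:
    store.put(customer, {"name": customer})
```

Plain modulus sharding remaps most keys when `shard_count` changes, which is one source of the extra cost this axis carries.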
  • 33. QUEUES ➤ Asynchronous communication ➤ Workload distribution ➤ Failure isolation
  • 34. MESSAGE QUEUES AS BUFFERS (ASYNC COMM - DECOUPLING) source / producer → queue → sink / consumer. Unpredictable load spikes ⇒ load normalisation / smoothing. Batching ⇒ higher throughput
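The buffering and batching idea can be sketched with the standard-library `queue` module: a burst is absorbed by the queue and the consumer drains it in batches at its own pace. The `drain_in_batches` helper and the batch size are illustrative assumptions; a real deployment would use a message broker rather than an in-process queue.

```python
import queue


def drain_in_batches(buffer, batch_size):
    """Pull everything currently buffered, grouped into batches of at most batch_size."""
    batches = []
    batch = []
    while True:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # final, partially filled batch
    return batches


buffer = queue.Queue()
for event in range(10):  # a sudden burst from the producer
    buffer.put(event)

batches = drain_in_batches(buffer, batch_size=4)
```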
  • 35. WORKLOAD DISTRIBUTION - LOAD BALANCING The Producer pushes to the queue; Consumer 1, Consumer 2 and Consumer 3 each pull from it
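A sketch of the push/pull arrangement using threads and a shared in-process queue: each consumer pulls the next job whenever it is free, so the load balances itself. The function name, the sentinel-based shutdown and the squaring handler are assumptions for illustration.

```python
import queue
import threading


def run_workers(jobs, worker_count, handler):
    """Push jobs onto one shared queue; each worker pulls the next job when free."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker(name):
        while True:
            job = work.get()
            if job is stop:
                break
            outcome = handler(job)
            with lock:
                results.append((name, job, outcome))

    threads = [
        threading.Thread(target=worker, args=(f"consumer-{i + 1}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(stop)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


results = run_workers(range(20), worker_count=3, handler=lambda n: n * n)
```

Which consumer handles which job is not deterministic; only the fact that every job is processed exactly once is.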
  • 36. MULTIPLEXING Producer 1, Producer 2 and Producer 3 push to the queue (one pushes R1, R2, R3; one pushes R4; one pushes R5, R6); a single Consumer pulls with fair-queuing: R1, R4, R5, R2, R6, R3
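The fair-queuing order on this slide is what a simple round-robin over per-producer queues produces. The sketch below is one minimal way to get it; the function name and list-of-lists input are assumptions.

```python
from collections import deque


def fair_queue(producer_queues):
    """Interleave requests by taking one from each producer in turn."""
    pending = [deque(requests) for requests in producer_queues]
    order = []
    while any(pending):  # an empty deque is falsy
        for requests in pending:
            if requests:
                order.append(requests.popleft())
    return order


order = fair_queue([["R1", "R2", "R3"], ["R4"], ["R5", "R6"]])
```

No single producer can starve the others: the producer with three requests waits its turn between the other two.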
  • 37. HIGH AVAILABILITY (PUB-SUB / BROADCAST) Publisher 1 and Publisher 2 broadcast to Listener 1, Listener 2 and Listener 3; subscriptions are dynamic
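A minimal in-process sketch of broadcast with dynamic subscriptions. The `Broker` class and the topic name are hypothetical; a real system would use a message broker that delivers across processes and hosts.

```python
class Broker:
    """Tiny in-process pub-sub broker (illustrative only)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def unsubscribe(self, topic, callback):
        listeners = self._subscribers.get(topic, [])
        if callback in listeners:
            listeners.remove(callback)

    def publish(self, topic, message):
        """Deliver the message to every current subscriber; return how many got it."""
        listeners = list(self._subscribers.get(topic, []))
        for callback in listeners:
            callback(message)
        return len(listeners)


broker = Broker()
inbox_1, inbox_2 = [], []
listener_1, listener_2 = inbox_1.append, inbox_2.append
broker.subscribe("alerts", listener_1)
broker.subscribe("alerts", listener_2)
broker.publish("alerts", "disk full")      # both listeners receive it
broker.unsubscribe("alerts", listener_2)   # subscriptions can change at runtime
broker.publish("alerts", "disk ok")        # only listener 1 receives it
```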
  • 38. BOUND YOUR QUEUE SIZE - APPLY BACK PRESSURE
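A bounded queue makes back pressure explicit: once the buffer is full, the producer is told to slow down instead of the queue growing without limit. The sketch uses `queue.Queue(maxsize=...)` from the standard library; the `offer` helper and the tiny queue size are assumptions for illustration.

```python
import queue


def offer(buffer, item):
    """Try to enqueue without blocking; return False to signal back pressure."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False


buffer = queue.Queue(maxsize=3)
accepted = [offer(buffer, n) for n in range(5)]  # the last two are rejected
buffer.get_nowait()                               # the consumer catches up
accepted_after_drain = offer(buffer, 99)
```

What the producer does on `False` (retry later, shed load, or propagate the signal upstream) is a design decision the queue itself does not make.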
  • 39. MONITORING ➤ Measure all the things! ➤ Think about what metrics to track when you design your app: system/app/user level ➤ Engage with Ops / QA early on in the design phase ➤ Invest in a good monitoring solution ➤ Data integrity checks (bucket analysis, statistical analysis) ➤ Alerting and monitoring dashboards should be intuitive
  • 40. LOOK! RIB CAGES! INTUITIVE MONITORING DASHBOARDS: LIVE HEAT-MAPS
  • 42. LOOK! MONITORS! INTUITIVE MONITORING DASHBOARDS: LIVE HEAT-MAPS
  • 43. OTHER SCALING TIPS ➤ Use caching aggressively (CDNs, app & object caches) ➤ Design to scale out horizontally ➤ Simplify scope, design, implementation: lean == fast ➤ Know latencies ➤ Relax temporal constraints ➤ Discuss and Learn from mistakes ➤ Design for fault tolerance, graceful failure, and resilience ➤ Avoid SPOFs ➤ Avoid or distribute state ➤ Be competent
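As a small illustration of the caching tip at the application level, the standard-library `functools.lru_cache` memoises an expensive call. The `product_page` function and the call counter are hypothetical; CDN and shared object caches follow the same principle at other layers.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive path really runs


@lru_cache(maxsize=1024)
def product_page(product_id):
    """Pretend-expensive render; cached per product_id."""
    calls["count"] += 1
    return f"<html>product {product_id}</html>"


product_page(42)
product_page(42)  # served from the cache
product_page(7)
info = product_page.cache_info()
```

`cache_info()` exposes hits and misses, which is a convenient metric to feed into monitoring.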