@mmaibaum
DevOps - Nothing Stays The Same
Michael Maibaum
Also, introduction to me. I’m the chief architect at Sky Betting & Gaming and that means some of the stupider things you are about to hear are my fault. But not all of
them as I’ve only been there for just under five years and I didn’t start out as chief architect…
Precis
Sky Betting & Gaming has become one of the largest online operators in the UK, undergoing a period of sustained high growth in customer
numbers, transaction rate, staff size, and number of systems. Five years ago, the company established its first DevOps team, and since then,
DevOps has become a major part of the way Sky Betting & Gaming does things. However, what that means keeps changing. Michael Maibaum
describes how the DevOps function has changed repeatedly over the last few years to help the company continue to move fast and keep systems
operating through organizational and technical challenges.

Originally, the DevOps team was established as a group of like-minded engineers keen to smooth the delivery of software into operations and make
it run better. As the business grew, the engineering teams were split and the accumulated DevOps knowledge distributed into those new groups,
but the team soon found out that some things don’t fit into a distributed function, and that some features of the platform need ownership. As a result, platform
teams were formed to produce products that other teams use. Sky Betting & Gaming’s DevOps experts now come in two categories: those that
directly work in or with individual (product) engineering teams and those that deliver a platform that makes life easier for the rest of the engineering
function.

It is easy to see a narrow definition of DevOps as part of the function of a specific engineering team. However, in the experience of Sky Betting &
Gaming, to achieve a truly effective delivery and operational culture (and indeed, DevOps) once you have hundreds of engineers requires
investment in the platform as a product in and of itself.

Michael outlines the history of DevOps at Sky Betting & Gaming and explains how the company has taken its DevOps philosophy into its vendors
as it takes its first steps into the cloud.
Introducing Sky Betting & Gaming
• One of the top 3 online gambling
operators in the UK
• 3 Categories of product
• Sportsbook
• Free Sports related content
• Gaming
A Diverse Technology Stack
In the Beginning
Sky bought Sports Internet Group in the early 2000s, primarily for its online properties in sports news, but it came with Surrey Sports, originally telebet only, with a bit of online
starting to creep in by
A Story of Change & Growth
[Chart: £50M in 2010 rising to £350M in 2015]
• Business grew slowly for the first 8
years post-acquisition
• Interactive tv was seen as the next big
thing
• Major growth period starts in ~2008
Over the same sort of time frame, 2011-2016, we have gone from ~250 staff to 1200 staff, doubling over the last year.
2008
Interactive TV was seen as the next big thing and a key focus for the company…

As anyone who used interactive TV applications in the mid-2000s can probably testify, that didn’t work out that well… We were focusing on the wrong product (interactive TV) with
little internal capability to evolve products.
Infrastructure & Ops Only
• Tech Team
• No in-house development
• Hosting and operating third-party vendors’ applications
• Waterfall project management and delivery
Small team, focussed on traditional server/system admin skills. Network, Storage, Compute, OS etc

Software delivery by third party vendors, very waterfall project management structure
Focus on the Web
• Increased focus on the web, but still delivered by third party vendor software teams
• Starting to deliver real customer & revenue growth at this point; company profits
start to grow.
Still third-party software, but starting to get more complexity, more services, more customers!
The End of the Beginning
• Business wanted to increase velocity
• More frequent change
• Cheaper to deliver new features
• More control
• Time to bring the user experience in-house
* Skybet begins working on a like-for-like replacement for the 3rd-party provided website. 

* 3rd-party code is present throughout the stack, and making substantial changes to the website is painful and slow. In-house development is seen as the fix for this.

* About 170 staff now, the majority still not technical.

* Platform management is starting to be a problem.
The First Problem
How to improve delivery & reliability from the in-house software teams?
[Diagram: Vendor, Ops, Dev]
* Time passes - development occurs. While technically an agile team, they are doing the initial build-out and this is basically a waterfall model project.

* They realise that they are getting into a situation that they cannot dev their way out of - code is building up, but whether it's the right code, and how they get it out of
the door isn't well understood.
The First Answer - 2011
[Diagram: Delivery Team 1 with a Backlog / WIP / Done / Test / Live board; DevOps Team (2nd line on call); Infrastructure; Service Desk (1st line on call)]
• Kev B suggests Devops practices; not sure about a team, but creates one to launch the ideas. Skybet Devops is born!

• 1 dev and 1 sysadmin loaned to Kev, along with headcount for a few more.

• Work on the website is moving fast but there are still whole epics to do.
DevOps
• Build tooling focussed on developer productivity and system reliability
• First CI pipelines with Jenkins
• Load testing, capacity and scaling (large, peaky events)
* The basic model at this point was that we sat together, but often some members were off with the scrums, supporting their activities. They needed a local root for
test environment work, as development was rapid and a lot of configuration work was needed. Releases were quite difficult.

* Moved to a release a day, Monday to Thursday.

* Analysed whether things that broke on Saturday were linked to releasing on Thursday; there was no correlation, so we started releasing on Fridays as well.

* Then release on demand per ‘scrum’.
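The Thursday-release analysis mentioned in these notes can be sketched as a simple comparison of Saturday incident rates. This is an illustrative sketch only: the `WeekRecord` type, its field names, and the sample weeks are hypothetical, and are not the data or tooling Sky Betting & Gaming actually used.

```python
from dataclasses import dataclass


@dataclass
class WeekRecord:
    """One week of (hypothetical) release and incident history."""
    released_thursday: bool   # was there a release on the Thursday?
    saturday_incident: bool   # did something break on the Saturday?


def saturday_incident_rates(weeks):
    """Return (rate_with_release, rate_without_release) for Saturday incidents."""
    def rate(group):
        # Guard against an empty group so we never divide by zero.
        return sum(w.saturday_incident for w in group) / len(group) if group else 0.0

    with_release = [w for w in weeks if w.released_thursday]
    without_release = [w for w in weeks if not w.released_thursday]
    return rate(with_release), rate(without_release)


# Made-up sample: incidents are equally common with and without a Thursday release.
sample = [
    WeekRecord(True, True), WeekRecord(True, False),
    WeekRecord(False, True), WeekRecord(False, False),
]
with_rate, without_rate = saturday_incident_rates(sample)
print(with_rate, without_rate)
```

If the two rates are close, as in this made-up sample, releasing on Thursday is not obviously linked to Saturday breakage, which is the kind of evidence that would support adding a Friday release.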
Typical Saturday (Bets & Logins)
Centralised DevOps?
• Probably not what you want to aspire to
• But… Can be a good way to start
• Start the cultural shift
• Solve the problem of not enough ‘DevOps’ to go around
DevOps Starting to Work
• Fit in well with increasing emphasis of agile delivery (Scrum, then Kanban)
• Central team provided a concentration of capability and culture
• Demonstrable wins important for adoption
• In-House Dev going well enough that we start work on Sky Vegas ‘in-house’
front end
* Provided clear benefits to the still-small dev groups, so they were motivated to work together
[Charts: Commits/Month, Commits/Release]
March 2012
We go from 'does anyone think we will be using the in-house site for Grand National?' to 'does anyone think we *won't* be using the in-house site?'
* During this time DevOps is starting to get recognised as a force in its own right at Skybet.

* Grand National was a roaring success. We grossly underestimated post-race logins but other than that it was very smooth. Everything worked pretty well and the in-house
dev and DevOps teams had a good day. Traders, too :-)
Soon we had another problem…
How do we manage configuration for Disaster Recovery?
Config Management
ConfigureServer - Many custom perl scripts
Revision control via
something.pl.freds-test
something.pl.bak, something.pl.bak2, something.pl.old,
something.pl.not-sure-what-this-is-but-scared-to-delete-it
Platform Evolution
• Another Centralised Team
• This time born out of infrastructure and the DevOps team
• Created with a specific purpose (fix config management for DR, aka Chef
All the Things)
• This turned out to be hard - At least 1.5 years effort
• Lots of concurrent change, with little effective standardisation
This was at the start of the real growth curve in people and technical estate size. 

This was made much harder by the constant change going on all the time. The advice here is to automate config management early: you won’t get it all right, as you can’t predict the
future, but it is an area worth the time.

We could tell when we were winning: JD rebuilt all 70-odd LAMP web servers a couple at a time. It took all day on a Tuesday (but only one day) and no-one noticed. The
next day he met the CEO outside and was asked how things were. JD said he had just finished a big project, rebuilding all the (LAMP) web servers a couple of days ago. The CEO observed
he didn’t know, then realised he didn’t *need* to know. These changes were becoming safe enough that he didn’t need to care that they were happening.

It wasn’t all plain sailing though: we made quite a few mistakes where we broke far more of the platform, far more quickly, than we could have done with the old tooling.
In this area we were definitely less mature in terms of testing (and making the systems testable) than in our more typical software development.
[Diagram: Infrastructure Code (Chef, Ruby, ServerSpec) and Application Code (PHP, NodeJS, React, Java); labels: Publish, Applications, Release, Configuration, Orchestration]
Chef code is released and applied, and changes are delivered into test and production environments.
With this we now had great power, and that in and of itself caused some problems… For example, we once accidentally upgraded our MongoDB cluster because a team updated the version in our yum repos but the version in the various environments wasn’t pinned, so on the next Chef run the MongoDB cluster
happily upgraded itself… with somewhat less than happy results for our service. We also upgraded MySQL because it was easy, but it was not well enough tested (query cache configuration options had changed).
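One way to catch the unpinned-version problem described above is a check that fails when any package in an environment has no exact version. This is a minimal sketch, assuming a hypothetical mapping of package name to version string; the function name `find_unpinned`, the package names, and the version numbers are illustrative and are not part of Chef or of Sky Betting & Gaming's tooling.

```python
def find_unpinned(packages):
    """Return the sorted names of packages that lack an exact version pin.

    `packages` maps a package name to a version string, or to None / "latest"
    when the environment simply takes whatever the repo currently offers.
    """
    unpinned = []
    for name, version in packages.items():
        if version is None or version.strip().lower() in ("", "latest"):
            unpinned.append(name)
    return sorted(unpinned)


# Hypothetical environment definition: names and versions are made up.
production = {
    "mongodb-server": None,      # unpinned: upgrades on the next run after a repo update
    "mysql-server": "5.5.40",    # pinned to an exact version
    "openssl": "latest",         # also effectively unpinned
}

print(find_unpinned(production))
```

Run as a pre-release gate, a check along these lines would flag the environment before a repo update could trigger an unplanned upgrade.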
The Beginning of the Middle
The birth of tribes
About 300 staff, probably about 100 in the ‘tech team’

The hordes were starting to overwhelm the DevOps

DevOps team becoming a victim of their success, first port of call, massively in demand
DevOps was in danger of becoming Ops
• With teams growing and changing the ways they work, a centralised
DevOps team was increasingly misaligned.
• DevOps engineers were spread out around different teams
* Difficult for the DevOps team to prioritise requirements, and it was becoming a choke point for other teams

* Hard for a single DevOps team to know all the services

* Ways of working/running services starting to diverge

* Technology choices starting to diverge (e.g. MongoDB in bet, not in other areas)

* The rapid pace of change you enable can easily swamp you
Tribes
• Inspired by the Spotify white
paper
• Overall team getting too big
• Sub-divide into autonomous
teams first at main product level
(tribes, e.g. bet/gaming) then
squads within those.
Business is now about 300 people (2013)
Growing Pains - 2013
[Diagram: Core, Gaming, Infra, and Bet tribes, made up of squads that each have their own Backlog / WIP / Done / Test / Live board: Web Experience, Place & Track, Casino, Vegas, Platform Evo, Account, and Infra, alongside Platform Ops, Service Desk (1st line on call), SLM, and Security]
• So we have many squads
• Supporting functions like SLM, service desk and security
Better…
• Alignment with development
• Ownership of Ops issues in squads
• Knowledge of services each ‘DevOp’ was working with
But…
• The ‘DevOps’ were still the first on-call, cross tribe
• Increasingly limited knowledge of other teams’ services
• Team size awkward:
• too many services for individuals to know them all,
• not big enough to populate on-call with the right Ops skills
And…
[Diagram repeated from earlier: Infrastructure Code (Chef, Ruby, ServerSpec) and Application Code (PHP, NodeJS, React, Java); labels: Publish, Applications, Release, Configuration, Orchestration]
If you remember this slide from earlier, there is actually a problem here…
[Diagram: Infrastructure Code (Chef, Ruby, ServerSpec) and Application Code (PHP, NodeJS, React, Java); labels: Publish, Applications, Release, Configuration, Orchestration, now with an Integration/Test/Production stage]
E.g. we accidentally regressed an application config feature switch, released config out of sync with code or code out of sync with config, or changed the base system and
broke things, e.g. when an OpenSSL config update broke a bunch of applications that hadn’t been well tested for compatibility.
[Diagram: Infrastructure Code (Chef, Ruby, ServerSpec) and Application Code (PHP, NodeJS, React, Java), split into Dev and Ops halves; labels: Publish, Applications, Release, Configuration, Orchestration, Integration/Test]
Also, looking at this we can clearly still see a separation between dev and ops here. They’re all part of the same team, but they’re using different tools for different jobs with different deployment pipelines, so the
opportunities for collaboration are limited.
So, hold this thought, we’ll come back to it.
The Middle
of the Middle
aka 2014
Now up to about 400 staff
Tribes have local focus
• Optimising for local concerns
• Delivery of that product
• Improvement of their technology stack
• Improving their processes
• Local service delivery teams
• Bet WebOps team (monitoring and so on)
• Bet SRE team
• Bet Delivery Engineering
Squads taking ownership…
• End to end ownership
• Design, Build, Run, Change, Fix, Retire
• Full support in a team - on call
• There are specialists, but they aren’t the only people that can do
things
Can talk about Core Account taking control of its monitoring, backing out of a somewhat overwhelmed/unloved central service. This allowed them to refactor and improve it,
and do new things like link alarms to PagerDuty (there were too many false alarms in the central service), while still publishing service-level health information to the central service for other
teams to consume.
But…
Should everything be a local concern?
Cross Cutting Platform Features
• What happened to Platform Evolution?…. It evolved into Platform Services
• There is a wider set of ‘PaaS’ like services that would be useful across the business
• Counterbalance ‘everything local’ inefficiencies
• What
• PlatCI - Our CI as a service platform (Jenkins etc.)
• Shared Kafka - Messaging Platform as a Service
• Self Contained Projects - Get rid of the Dev/Ops project and tooling splits
Other examples: repo management, distributed command execution on servers, VMware integration (build, images, DRS rules etc.), network automation (currently around
config/rule management for our Layer 7 load balancers and soon firewall object group membership, with more in the development pipeline).
[Diagram: Cookbook (sbg_myapp) with Build (Jenkins) and Publish steps, orchestrated as Application + Config]
Cookbook (sbg_myapp) contains:
• Infrastructure Code (Chef, Inspec, Custom Resources)
• Application Code (PHP, NodeJS, Java)
• CI Pipeline (Jenkins Pipeline, Chef)
• Integration Tests (Kitchen, Chef)
* Chef recipes don’t just have to be used to write system configuration or install packages. With Test Kitchen and Docker, we can use Chef DSL to perform and test any action inside the container.
* Replacing CI integration bash scripts usually run by Jenkins with Chef DSL run by Test Kitchen makes these scripts testable and version controlled in the same way as Chef cookbooks.
* Developers and operations are now talking a common language, meaning a step change in collaboration.
* This means that we can write Test Kitchen suites that do things such as check out git repositories, execute Mocha tests, run ESLint for Node.js, or install a compiler and build a binary, or do something with Maven. Endless possibilities!
* Jenkins Pipeline is a plugin for Jenkins (maintained by CloudBees) that allows you to define your jobs in a Groovy-based DSL.
* The plugin allows job definitions to be stored and run directly from source control, which means the Jenkins pipeline can also be stored in the same git repository as the application and infrastructure code. We create ‘stub’ Jenkins jobs for each of our services, and
these jobs run Pipeline DSL from the git repository maintained by the service owning team.
pscli
The ‘Glue’ - enables the consistent composition of toolsets in different environments
• Internal Tool
• Written in Go
• Pulls in various ‘tools’ Docker images
• Executes tools in containers, e.g.
• ChefDK
• Terraform
• Packer
• AWS Authentication
• Hashicorp Vault
• Code Generation
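pscli generate cookbook myapp
[Diagram: a {command runner} container working in ~/workspace/myapp, with Git (/git), ChefDK (/opt/chefdk), and Code Generator (/generator) attached via --volumes-from; tool images come from the Docker Registry]
pscli kitchen converge
[Diagram: a {command runner} container running Kitchen Suite A and Kitchen Suite B, with Git (/git) and ChefDK (/opt/chefdk) attached via --volumes-from; tool images come from the Docker Registry]
pscli terraform apply
[Diagram: a {command runner} container with Terraform (/opt/terraform) attached via --volumes-from; the tool image comes from the Docker Registry]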
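The composition shown in the pscli diagrams above can be sketched as building a `docker run` command line. This is a rough illustration in Python rather than pscli itself (which is an internal tool written in Go); the function, the container names, the image name, and the `/workspace` mount point are all assumptions. Only the `docker run` options `--rm`, `--volumes-from`, `-v`, and `-w` are real Docker CLI flags.

```python
def build_docker_command(tool_containers, runner_image, workspace, tool_argv):
    """Build the argv for running a tool inside a command-runner container."""
    argv = ["docker", "run", "--rm"]
    # Attach the volumes exported by each tool container (e.g. /opt/chefdk, /git).
    for container in tool_containers:
        argv += ["--volumes-from", container]
    # Bind-mount the project checkout and make it the working directory.
    argv += ["-v", f"{workspace}:/workspace", "-w", "/workspace"]
    argv.append(runner_image)
    argv += list(tool_argv)
    return argv


command = build_docker_command(
    tool_containers=["tools-git", "tools-chefdk"],   # hypothetical container names
    runner_image="command-runner",                   # hypothetical image name
    workspace="/home/me/workspace/myapp",            # hypothetical checkout path
    tool_argv=["kitchen", "converge"],
)
print(" ".join(command))
```

Keeping each tool in its own image and attaching it with `--volumes-from` means the same tool versions can be composed identically on a laptop and on a CI server, which is the "glue" role the slide describes.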
Vendors
• Bad vendor relationships can cripple progress
• Or they can enable it
• It is in your interest to help them as much as you can
* Vendors are a path dependency and can be some of the biggest barriers to realising your ambitions of efficient, lean, reliable delivery across your organisation
Delivery Partners
• Bad vendor relationships can cripple progress
• Or they can enable it
• It is in your interest to help them as much as you can
* A large part of our application is delivered by a third party software house and part of our journey has been learning how to work with them better. 

* Release automation. 

* Shared test packs (why keep ours ‘secret’ if they can use it to accelerate their work and produce better quality)

* Sharing our work on composing local dev/test environments using docker and pscli - recently had our first release to a test environment that could be tested before it
got there, including testing the chef cookbooks, an expected data set, representative and consistent configuration etc. 

* We’ve workshopped, we’ve shared experience, we’ve told them that deployment time and reliability matters, we’ve sent our agile/lean experts to work with their team.
We’ve built automation for them and evangelised its use. 

* It isn’t always easy but if you get software from a third party and it is a significant part of your application this can be really worthwhile.
Tribes Getting Too Big
• Feeling the pain of growth again
• Bet Tribe bigger than whole tech organisation was 3 years earlier
• Break up of bet tribe into smaller, nested, tribes
• Making more specialist roles closer to each ‘product’ delivery squad (e.g.
SRE as part of a squad)
• Fuelled by
• 30% YOY growth of customers, stakes and traffic
• 80% buyout from Sky by CVC, more investment
Whole org went from around 600 to around 1200 in 12 months (mid 2015-mid 2016), technology hired 200 new staff in 20 weeks.
Two kinds of ‘DevOps’
• People in every delivery team, some of these are DevOps specialists but the
whole team cares about the whole product lifecycle
• People in specialist teams working on shared platform capabilities
• Platform Services - Cross Tribe Services
• Platform Engineering (Big tribes have their own ‘shared’ services)
• Delivery Engineering (Specialists in tribes helping squads optimise
reliability & delivery, especially things like release engineering, CI, etc)
* Are these all DevOps teams? 

* They are all working to improve the ability of the business to deliver value by helping deliver technical products to production - they are all working on tooling and
systems to bring development and ops closer together

* Who cares what they are called?
Path Dependency
• It really matters where you are, and where you are coming from
• At least as much as where you’d like to go to.
• There isn’t a path, because there isn’t an environment (and it changes)
* It was hard to Chef things because we had already got a fairly large estate with little consistency and poor testing. It was even harder because the business was
growing fast (customers, staff and number of services). But we had to do it to have a chance of implementing a reliable, consistent site and establish a real DR
capability. 

* Our initial monolithic Chef organisation made sense with a central team “cheffing all the things” but does cause problems as the teams grow - now we have dozens of
Chef orgs as people gradually split stuff apart 

* The same thing will be true for other organisations. Decisions made in the past strongly influence what is right at any one time

* Minesweeper vs Daemonised Chef
The End?
One of the key points is it really matters where you are, and where you are coming from - at least as much as where you’d like to go to.
There is no End
Except of this talk
http://engineering.skybettingandgaming.com

 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

DevOps - Nothing Stays the Same (With notes)

  • 1. @mmaibaum DevOps - Nothing Stays The Same Michael Maibaum Also, introduction to me. I’m the chief architect at Sky Betting & Gaming and that means some of the stupider things you are about to hear are my fault. But not all of them as I’ve only been there for just under five years and I didn’t start out as chief architect…
  • 2. @mmaibaum Precis Sky Betting & Gaming has become one of the largest online operators in the UK, undergoing a period of sustained high growth in customer numbers, transaction rate, staff size, and number of systems. Five years ago, the company established its first DevOps team, and since then, DevOps has become a major part of the way Sky Betting & Gaming does things. However, what that means keeps changing. Michael Maibaum describes how the DevOps function has changed repeatedly over the last few years to help the company continue to move fast and keep systems operating through organizational and technical challenges. Originally, the DevOps team was established as a group of like-minded engineers keen to smooth the delivery of software into operations and make it run better. As the business grew, the engineering teams were split and the accumulated DevOps knowledge distributed into those new groups, but the team soon found out that things didn’t fit into a distributed function and features of the platform that need ownership. As a result, platform teams were formed to produce products that other teams use. Sky Betting & Gaming’s DevOps experts now come in two categories: those that directly work in or with individual (product) engineering teams and those that deliver a platform that makes life easier for the rest of the engineering function. It is easy to see a narrow definition of DevOps as part of the function of a specific engineering team. However, in the experience of Sky Betting & Gaming, to achieve a truly effective delivery and operational culture (and indeed, DevOps) once you have hundreds of engineers requires investment in the platform as a product in and of itself. Michael outlines the history of DevOps at Sky Betting & Gaming and explains how the company has taken its DevOps philosophy into its vendors as it takes its first steps into the cloud.
  • 3. @mmaibaum Introducing Sky Betting & Gaming • One of the top 3 online gambling operators in the UK • 3 Categories of product • Sportsbook • Free Sports related content • Gaming
  • 5. @mmaibaum In the Beginning Sky bought Sports Internet Group in the early 2000s, primarily for its online properties in sports news, but it came with Surrey Sports, originally telebet only, with a bit of online starting to creep in by
  • 6. @mmaibaum A Story of Change & Growth 2010 2015 £50M £350M • Business grew slowly for the first 8 years post-acquisition • Interactive tv was seen as the next big thing • Major growth period starts in ~2008 Over same sort of time frame, 2011-2016 gone from ~250 staff to 1200 staff. Doubled over the last year.
  • 7. @mmaibaum 2008 interactive tv was seen as the next big thing and a key focus for the company… As anyone who used interactive tv applications in the mid-2000s can probably testify, that didn’t work out that well… Focusing on the wrong product (interactive TV) with little internal capability to evolve products
  • 8. @mmaibaum Infrastructure & Ops Only • Tech Team • No in-house development • Hosting and operating third-party vendors’ applications • Waterfall project management and delivery Small team, focussed on traditional server/system admin skills. Network, Storage, Compute, OS etc Software delivery by third-party vendors, very waterfall project management structure
  • 9. @mmaibaum Focus on the Web • Increased focus on the web, but still delivered by third party vendor software teams • Starting to deliver real customer & revenue growth at this point, company profits start to grow. still third party software, but starting to get more complexity, more services, more customers!
  • 10. @mmaibaum The End of the Beginning • Business wanted to increase velocity • More frequent change • Cheaper to deliver new features • More control • Time to bring the user experience in-house * Skybet begins working on a like-for-like replacement for the 3rd-party provided website. * 3rd party code present throughout the stack and to make substantial changes to the website is painful and slow. In-house development seen as the fix for this. * About 170 staff now, majority not technical still * Platform management starting to be a problem.
  • 11. @mmaibaum The First Problem How to improve delivery & reliability from the in-house software teams? Vendor Ops Dev * Skybet begins working on a like-for-like replacement for the 3rd-party provided website. * 3rd party code present throughout the stack and to make substantial changes to the website is painful and slow. In-house development seen as the fix for this. * Time passes - development occurs. While technically an agile team, they are doing the initial build-out and this is basically a waterfall model project. * They realise that they are getting into a situation that they cannot dev their way out of - code is building up, but whether it's the right code, and how they get it out of the door isn't well understood.
  • 12. Backlog Wip Done Test Live Delivery Team 1 Infrastructure Service Desk 1st Line On Call The First Answer - 2011 DevOps Team 2nd Line On Call • Kev B suggests Devops practices; not sure about a team, but creates one to launch the ideas. Skybet Devops is born! • 1 dev and 1 sysadmin loaned to Kev, along with headcount for a few more. •Work on the website is moving fast but there are still whole epics to do.
  • 13. @mmaibaum DevOps • Build tooling focussed on developer productivity and system reliability • First CI pipelines with Jenkins • Load testing, capacity and scaling (large, peaky events) * The basic model at this point was that we sat together but often some members were off with the scrums, supporting their activities. They needed a local root for test environment work, as development was rapid and a lot of configuration work was needed. Releases quite difficult. * Going to a release a day, mon-thur. * Analysed links to things that broke on Saturday, no correlation with releasing on Thur so started on Fri as well * Then release on demand per ‘scrum’
  • 15. @mmaibaum Centralised DevOps? • Probably not what you want to aspire to • But… Can be a good way to start • Start the cultural shift • Solve the problem of not enough ‘DevOps’ to go around
  • 16. @mmaibaum DevOps Starting to Work • Fit in well with increasing emphasis of agile delivery (Scrum, then Kanban) • Central team provided a concentration of capability and culture • Demonstrable wins important for adoption • In-House Dev going well enough that we start work on Sky Vegas ‘in-house’ front end * provided clear benefits to the still small dev groups so motivated to work together
  • 18. @mmaibaum March 2012 We go from 'does anyone think we will be using the in-house site for Grand National?' to 'does anyone think we *won't* be using the in-house site?' * During this time Devops is starting to get recognised as a force in its own right at Skybet. * Grand National was a roaring success. We grossly underestimated post-race logins but other than that it was very smooth. Everything worked pretty well and the in-house dev and Devops teams had a good day. Traders, too :-)
  • 19. @mmaibaum Soon we had another problem… How do we manage configuration for Disaster Recovery?
  • 20. @mmaibaum Config Management ConfigureServer - Many custom perl scripts Revision control via something.pl.freds-test something.pl.bak, something.pl.bak2, something.pl.old, something.pl.not-sure-what-this-is-but-scared-to-delete-it
  • 21. @mmaibaum Platform Evolution • Another Centralised Team • This time born out of infrastructure and the DevOps team • Created with a specific purpose (fix config management for DR, aka Chef All the Things) • This turned out to be hard - At least 1.5 years effort • Lots of concurrent change, with little effective standardisation This was at the start of the real growth curve in people and technical estate size. Made much harder with constant change going on all the time - advice here is automate early on config management, you won’t get it all right as you can’t predict the future but it is an area worth the time. We could tell when we were winning: JD rebuilt all 70-odd lamp-web servers a couple at a time, it took all day on a Tuesday (but only one day) and no-one noticed. The next day he met CEO outside and was asked how things were, JD said just finished a big project, rebuilt all the (lamp) web-servers a couple of days ago. CEO observed he didn’t know, then realised he didn’t *need* to know. These changes were becoming safe enough that he didn’t need to care that they were happening. It wasn’t all plain sailing though, we made quite a few mistakes where we broke far more of the platform, far more quickly than we could have done with the old tooling- in this area we were definitely less mature in terms of testing (and making the systems testable) than in our more typical software development.
  • 22. Publish Applications Infrastructure Code (Chef, Ruby, ServerSpec) Publish Application Code (PHP, NodeJS, React, Java) Release Configuration Orchestration Chef code is released and applied and changes are delivered into test and production environments With this we now had great power and that in and of itself caused some problems… For example we once accidentally upgraded our MongoDB cluster because a team updated the version in our yum repos but the version in the various environments wasn’t pinned - so on the next chef run, the MongoDB cluster happily upgraded itself… with somewhat less than happy results for our service. Also, a MySQL upgrade was done because it was easy, but it was not well enough tested (query cache configuration options changed)
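The unpinned-version incident described in the notes above can be illustrated with a small sketch. It is not Chef code and not the team's actual tooling; the package names, version numbers and data layout are all made up for illustration. It models the rule "a pinned package stays put, an unpinned one follows the repository" and lists which packages would change on the next configuration run.

```python
# Illustrative sketch only: package names, versions and the dict layout
# are assumptions, not taken from the talk.

def resolve_version(package, pins, repo_latest):
    """Return the version a configuration run would install.

    A pinned package stays on its pin; an unpinned package follows
    whatever the package repository currently offers.
    """
    return pins.get(package, repo_latest[package])


def unpinned_drift(installed, pins, repo_latest):
    """List (package, installed, would_install) for packages that would change."""
    drift = []
    for package, current in sorted(installed.items()):
        target = resolve_version(package, pins, repo_latest)
        if target != current:
            drift.append((package, current, target))
    return drift


# Hypothetical estate: the repo has been updated, the servers have not.
installed = {"mongodb-org": "2.4.9", "mysql-server": "5.5.40"}
repo_latest = {"mongodb-org": "2.6.1", "mysql-server": "5.5.40"}

# Without a pin, the next run would upgrade MongoDB.
print(unpinned_drift(installed, {}, repo_latest))

# With a pin in the environment, nothing changes.
print(unpinned_drift(installed, {"mongodb-org": "2.4.9"}, repo_latest))
```

Running a check like this before a converge is one way to make an implicit upgrade visible; whether to pin per environment or per cookbook is a separate design choice the talk does not specify.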
  • 23. @mmaibaum The Beginning of the Middle The birth of tribes About 300 staff, probably about 100 in the ‘tech team’ The hordes were starting to overwhelm the DevOps DevOps team becoming a victim of their success, first port of call, massively in demand
  • 24. @mmaibaum DevOps was in danger of becoming Ops • With teams growing and changing the ways they work, a centralised devops team increasingly mis-aligned. • DevOps engineers were spread out around different teams * Difficult for DevOps team to prioritise requirements and becoming a choke point for other teams * Hard for a single devOps team to know all the services * Ways of working/running services starting to diverge * technology choices starting to diverge (e.g. MongoDB in bet, not in other areas) * The rapid pace of change you enable can easily swamp you
  • 25. @mmaibaum Tribes • Inspired by the Spotify white paper • Overall team getting too big • Sub-divide into autonomous teams first at main product level (tribes, e.g. bet/gaming) then squads within those. Business is now about 300 people (2013)
  • 26. Core Tribe Gaming Tribe Infra Tribe Bet Tribe Growing Pains - 2013 Backlog Wip Done Test Live Backlog Wip Done Test Live Web Experience Place & Track Squad Platform Ops Service Desk 1st Line On Call Backlog Wip Done Test Live Backlog Wip Done Test Live Casino Squad Vegas Squad Backlog Wip Done Test Live Backlog Wip Done Test Live Platform Evo Account Squad Backlog Wip Done Test Live Infra Squad SLM Security •So we have many squads •Supporting functions like SLM, service desk and security
  • 27. @mmaibaum Better… • Alignment with development • Ownership of Ops issues in squads • Knowledge of services each ‘DevOp’ was working with
  • 28. @mmaibaum But… • The ‘DevOps’ were still the first on-call, cross tribe • Increasingly limited knowledge of other teams’ services • Team size awkward: • too many services for individuals to know them all, • not big enough to populate on-call with the right Ops skills
  • 30. Publish Applications Infrastructure Code (Chef, Ruby, ServerSpec) Publish Application Code (PHP, NodeJS, React, Java) Release Configuration Orchestration if you remember this slide from earlier - there is actually a problem here…
  • 31. Publish Applications Infrastructure Code (Chef, Ruby, ServerSpec) Publish Application Code (PHP, NodeJS, React, Java) Release Configuration Orchestration Integration/Test/Production e.g. accidentally regressed an application config feature switch, released config out of sync with code or code out of sync with config, or changed the base system and broke things e.g. when an OpenSSL config update broke a bunch of applications that hadn’t been well tested for compatibility
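The two failure modes named on this slide (config released out of sync with code, and a feature switch regressed by accident) can be sketched as a pre-release check. This is a minimal illustration under assumed data shapes: the key names, switch names and the idea of an explicit "intended changes" list are hypothetical and do not come from the talk.

```python
# Illustrative sketch only: all key and switch names are hypothetical.

def missing_keys(required_keys, config):
    """Config keys the code expects but the candidate config does not provide."""
    return sorted(set(required_keys) - set(config))


def unexpected_flips(live, candidate, intended):
    """Feature switches whose value changes without being declared as intended."""
    flips = []
    for name in sorted(live):
        if name in candidate and candidate[name] != live[name] and name not in intended:
            flips.append(name)
    return flips


# What the application code about to be released expects to find.
required = ["db_host", "cashout_enabled", "new_betslip_enabled"]

# Current production config and the candidate config about to be released.
live = {"db_host": "db1", "cashout_enabled": True}
candidate = {"db_host": "db1", "cashout_enabled": False}

print(missing_keys(required, candidate))              # code is ahead of config
print(unexpected_flips(live, candidate, intended=[]))  # switch regressed
```

A check of this kind only helps when code and config travel through one pipeline, which is the point the later slides on self-contained projects return to.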
  • 32. Dev Ops Publish Applications Infrastructure Code (Chef, Ruby, ServerSpec) Publish Application Code (PHP, NodeJS, React, Java) Release Configuration Orchestration Integration/Test Also, looking at this we can clearly still see a separation between dev and ops here. They’re all part of the same team, but they’re using different tools for different jobs with different deployment pipelines, so the opportunities for collaboration are limited. so, hold this thought, we’ll come back to it.
  • 33. @mmaibaum The Middle of the Middle aka 2014 now up to about 400 staff
  • 34. @mmaibaum Tribes have local focus • Optimising for local concerns • Delivery of that product • Improvement of their technology stack • Improving their processes • Local service delivery teams • Bet WebOps team (monitoring and so on) • Bet SRE team • Bet Delivery Engineering
  • 35. @mmaibaum Squads taking ownership… • End to end ownership • Design, Build, Run, Change, Fix, Retire • Full support in a team - on call • There are specialists, but they aren’t the only people that can do things Can talk about Core Account taking control of its monitoring, backing out of the somewhat-overwhelmed/unloved central service. This allowed them to refactor and improve it, and do new things like link alarms to PagerDuty (too many false alarms in the central service) - but still publish service level health information to the central service for other teams to consume
  • 37. @mmaibaum Cross Cutting Platform Features • What happened to Platform Evolution?…. It evolved into Platform Services • There is a wider set of ‘PaaS’ like services that would be useful across the business • Counterbalance ‘everything local’ inefficiencies • What • PlatCI - Our CI as a service platform (Jenkins etc), • Shared Kafka - Messaging Platform as a Service • Self Contained Projects - Get rid of the Dev/Ops tooling projects/tooling splits other examples, repo management, distributed command execution on servers, VMWare integration (build, images, DRS rules etc) network automation (currently around config/rule management for our Layer 7 load balancers and soon firewall object group membership - more in the development pipeline)
  • 38. Orchestration Application + Config Build Jenkins Publish Cookbook (sbg_myapp) •Infrastructure Code (Chef, Inspec, Custom Resources) •Application Code (PHP, NodeJS, Java) •CI Pipeline (Jenkins Pipeline, Chef) •Integration Tests (Kitchen, Chef) * Chef recipes don’t just have to be used to write system configuration or install packages. With Test Kitchen and Docker, we can use Chef DSL to perform and test any action inside the container. * Replacing CI integration bash scripts usually run by Jenkins with Chef DSL run by Test Kitchen makes these scripts testable and version controlled in the same way as Chef cookbooks. * Developers and operations are now talking a common language, meaning a step change in collaboration. * This means that we can write Test Kitchen suites that do things such as check out git repositories, execute Mocha tests, run ESLint for Node.js, or install a compiler and build a binary, or do something with Maven. Endless possibilities! * Jenkins Pipeline (a plugin for Jenkins maintained by CloudBees) that allows you to configure your jobs as a Groovy-based DSL. * The plugin allows job definitions to be stored and run directly from source control, which means the Jenkins pipeline can also be stored in the same git repository as the application and infrastructure code. We create ‘stub’ Jenkins jobs for each of our services, and these jobs run Pipeline DSL from the git repository maintained by the service owning team.
  • 39. pscli The ‘Glue’ - enables the consistent composition of toolsets in different environments • Internal Tool • Written in Go • Pulls in various ‘tools’ Docker images • Executes tools in containers, e.g. • ChefDK • Terraform • Packer • AWS Authentication • Hashicorp Vault • Code Generation
  • 40. pscli generate cookbook myapp Git --volumes-from /git /opt/chefdk ChefDK --volumes-from Docker Registry Code Generator --volumes-from /generator {command runner} ~/workspace/myapp
  • 41. pscli kitchen converge Git --volumes-from /git {command runner} Kitchen Suite A Kitchen Suite B /opt/chefdk ChefDK --volumes-from Docker Registry
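The two pscli diagrams above show tool containers (Git, ChefDK, a code generator) whose volumes are attached to a command runner with `--volumes-from`. The sketch below shows how such a `docker run` command line could be composed. It is not pscli, which is an internal Go tool whose implementation is not shown in the talk; the container names, image name and workspace path are hypothetical. Only standard `docker run` flags (`--rm`, `--volumes-from`, `-v`, `-w`) are used.

```python
import shlex

# Illustrative sketch only: container names, the runner image and the
# workspace path are hypothetical and not taken from pscli itself.

def runner_command(tool_containers, workspace, runner_image, command):
    """Build the argv for a command-runner container.

    Each tool container contributes its volumes via --volumes-from, and
    the developer's workspace is bind-mounted as the working directory.
    """
    argv = ["docker", "run", "--rm"]
    for name in tool_containers:
        argv += ["--volumes-from", name]
    argv += ["-v", f"{workspace}:/workspace", "-w", "/workspace", runner_image]
    argv += list(command)
    return argv


argv = runner_command(
    tool_containers=["tool-git", "tool-chefdk"],
    workspace="/home/dev/workspace/myapp",
    runner_image="example/command-runner",
    command=["kitchen", "converge"],
)

# Print the composed command instead of executing it.
print(shlex.join(argv))
```

Keeping each tool in its own image and composing them at run time means the same toolset versions can be used on a laptop and in CI, which matches the "consistent composition of toolsets in different environments" goal stated on the pscli slide.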
  • 43. @mmaibaum Vendors • Bad vendor relationships can cripple progress • Or they can enable it • It is in your interest to help them as much as you can * A path dependency and some of the biggest barriers to realising your ambitions of efficient, lean, reliable delivery across your organisation
  • 44. @mmaibaum Delivery Partners • Bad vendor relationships can cripple progress • Or they can enable it • It is in your interest to help them as much as you can * A large part of our application is delivered by a third party software house and part of our journey has been learning how to work with them better. * Release automation. * Shared test packs (why keep ours ‘secret’ if they can use it to accelerate their work and produce better quality) * Sharing our work on composing local dev/test environments using docker and pscli - recently had our first release to a test environment that could be tested before it got there, including testing the chef cookbooks, an expected data set, representative and consistent configuration etc. * We’ve workshopped, we’ve shared experience, we’ve told them that deployment time and reliability matters, we’ve sent our agile/lean experts to work with their team. We’ve built automation for them and evangelised its use. * It isn’t always easy but if you get software from a third party and it is a significant part of your application this can be really worthwhile.
  • 45. @mmaibaum Tribes Getting Too Big • Feeling the pain of growth again • Bet Tribe bigger than whole tech organisation was 3 years earlier • Break up of bet tribe into smaller, nested, tribes • Making more specialist roles closer to each ‘product’ delivery squad (e.g. SRE as part of a squad) • Fuelled by • 30% YOY growth of customers, stakes and traffic • 80% buyout from Sky by CVC, more investment Whole org went from around 600 to around 1200 in 12 months (mid 2015-mid 2016), technology hired 200 new staff in 20 weeks.
  • 46. @mmaibaum Two kinds of ‘DevOps’ • People in every delivery team, some of these are DevOps specialists but the whole team cares about the whole product lifecycle • People in specialist teams working on shared platform capabilities • Platform Services - Cross Tribe Services • Platform Engineering (Big tribes have their own ‘shared’ services) • Delivery Engineering (Specialists in tribes helping squads optimise reliability & delivery, especially things like release engineering, CI, etc) * Are these all DevOps teams? * They are all working to improve the ability of the business to deliver value by helping deliver technical products to production - they are all working on tooling and systems to bring development and ops closer together * Who cares what they are called.
  • 47. @mmaibaum Path Dependency • It really matters where you are, and where you are coming from • At least as much as where you’d like to go to. • There isn’t a path, because there isn’t an environment (and it changes) * It was hard to Chef things because we had already got a fairly large estate with little consistency and poor testing. It was even harder because the business was growing fast (customers, staff and number of services). But we had to do it to have a chance of implementing a reliable, consistent site and establish a real DR capability. * Our initial monolithic Chef organisation made sense with a central team “cheffing all the things” but does cause problems as the teams grow - now we have dozens of Chef orgs as people gradually split stuff apart * The same thing will be true for other organisations. Decisions made in the past strongly influence what is right at any one time * Minesweeper vs Daemonised Chef
  • 48. @mmaibaum The End? One of the key points is it really matters where you are, and where you are coming from - at least as much as where you’d like to go to.
  • 49. @mmaibaum There is no End Except of this talk http://engineering.skybettingandgaming.com