Service Management
in a DevOps World
By Helen Beal
A Guide to Evolving Light-Weight Service Management
Processes for Value Streams Using DevOps Principles
Table of Contents
The Origins of Services
Where ITSM Has Been Painful
Value Stream Centric Service Management
Identifying Current Condition: Value Stream Mapping
1.0 The DevOps Approach to Change
1.1 Long Term Vision: Lightweight, peer-reviewed change
1.2 Current Condition: Change Advisory/Approval Boards
1.3 Next Target Conditions
1.3.1 Reduce Batch Size
1.3.2 Classify Changes and Make Them Visible
1.3.3 Automate the Change ‘Checklist’
1.3.4 Limit the Blast Radius: Canary Testing/Deployment
1.3.5 Address System Dependencies
1.4 Example Experiments
2.0 The DevOps Approach to Release
2.1 Long Term Goal: Teams autonomously release on demand (CD)
2.2 Current Condition: Release weekends, calendars and managers
2.3 Next Target Conditions
2.3.1 Define Release and Deploy and Reduce Batch Size
2.3.2 Defer the Release Management Role to the Team
2.3.3 Increase the Availability of Release Slots
2.3.4 Automate the Release ‘Checklist’ and Deployment Process
2.3.5 Limit the Deployment Blast Radius: Blue/Green Deployments
2.3.6 Reducing the Route to Live, Leveraging Cloud
3.0 The DevOps Approach to Security
3.1 Long Term Goal: Checks in the IDE
3.2 Current Condition: Pen tests in Prod
3.3 Next Target Conditions
3.3.1 Shifting Left and Automation
3.3.2 DevSecOps Culture and Behaviors
3.3.3 Customer Feedback and Bug Bounties
3.3.4 Software is Like Milk, Not Wine
3.3.5 Belt and Braces
3.4 Example Experiments
4.0 The DevOps Approach to Support
4.1 Long Term Goal: You build it, you own it, and/or swarming
4.2 Current Condition: 3 tiers
4.3 Next Target Conditions
4.3.1 Arrange Around Products
4.3.2 Automating Support: ChatOps and Bots
4.3.3 Automating Support: Knowledge and Self-Service
4.3.4 Telemetry Everywhere and Viewability/Observability
4.3.5 CICD and Intelligent Risk Management
4.4 Example Experiments
5.0 The DevOps Approach to Incidents
5.1 Long Term Goal: ChatOps in/across teams
5.2 Current Condition: War rooms, incident managers
5.3 Next Target Conditions
5.3.1 Blameless Retrospectives and Experimentation
5.3.2 Reframing Failure, Safety Culture and The Andon Cord
5.3.3 Automation: Site Reliability and Chaos Engineering
5.4 Example Experiments
Conclusion: Flow & Value Stream Management
Further Reading
The Origins of Services
The origins of DevOps lie in agile system administration
and the recognition that whilst software development
teams were taking advantage of agile methodologies
to become more responsive to change and uncertainty,
the IT Operations people were not.
Sometimes they were even oblivious to what was happening on
the other side of the ‘wall of confusion’ and painful tensions and
misunderstandings occurred between the two technology teams: IT
Ops guys grumbled about developers wanting administrator access to
production machines, the developers moaned that IT Ops guys took too
long to provision environments, and releasing an update was always
a ‘hair on fire’ moment that frequently resulted in blame games and
mostly happened at weekends.
The battle between change and stability seemed as if it would rage on.
But DevOps principles have taught us how to balance throughput and
reliability without compromise to either. DevOps is not just practiced by
the ‘born on the web’ behemoths but by an ever increasing number of
traditional enterprises, those who in the past have embraced IT Service
Management (ITSM) approaches to service delivery.
Whilst some organizations consciously or unconsciously drive their
DevOps evolution from their development teams, there always
comes a time where they seek to understand how to optimize service
management activities as part of the end-to-end technology delivery
value stream. Whilst development is about agile and IT Ops is about
ITSM, we can use lean tools to marry the two and create lightweight,
“just-enough” processes that allow both teams to work at the same
cadence.
DevOps has evolved to focus on the end-to-end optimization of the
value stream, accelerating flow from idea to value realization. How we
handle and manage key technology delivery services changes when our
primary goals are to optimize the flow of value and system integrity.
Where ITSM Has Been Painful
Traditional ITSM processes, whilst designed for all the right reasons
(to protect us, to improve our predictability and to enable common
understanding), are frequently accused of being onerous and of
blocking the flow of value from idea to realisation to the customer.
In the past, when we have experienced an issue or problem, a typical
response has been to add a control; this is why large enterprises that
have operated for a significant amount of time are often bogged
down in bureaucracy - layers of process that have built up over time.
Additionally, these types of organizations have evolved system and
organizational designs (not necessarily by intention) that contain large
numbers of (sometimes unknown) dependencies, exacerbating the
sense, and actuality, of fragility.
DevOps seeks to improve sustainable working practices and reduce
workplace burnout and stress. It remodels the ways of working
to improve velocity, consistency and predictability, visualizing the
flow of work and removing constraints. Our focus moves from
managing dependencies, to breaking them to create loosely coupled
organizational and technology systems that allow us to build, test and
deploy in small increments.
ITSM has, inadvertently, caused some key constraints which have
led to working practices that frustrate people because they slow
people down. But these same working practices were introduced to
avoid catastrophic failures caused by chaos and unknown or unseen
dependencies.
Examples include the Change Advisory Board, which in many organizations
has morphed into the Change Approval Board (the difference is subtle,
but palpable). Such boards add wait times to value streams and are often
perceived to be adding no real value to the customer experience. The
change and release calendars and checklists are often similarly reviled
and not valued.
Painful working practices related to these are release weekends and
nights, the subsequent war room when a large batch release goes bad,
and project-centric cultures typified by meetings, irregular demand
flows (feast and famine) and spiralling technical debt.
Value Stream Centric Service Management
Approaching service management activities such as
change, release, security, support and incident with a
DevOps hat on changes the way we work to allow for
improved adaptability whilst not forgetting what we’ve
learned about ensuring customer experience.
Underpinning this is the principle of little and often;
more frequent inspection and smaller work packages
allowing us to receive feedback more often to course
correct more frequently and regularly.
The following table summarizes how these activities change in a value
stream centric world and describes a midway step to consider as an
organization transitions from one capability to another.
Activity | Current Condition | Next Target Condition | Long Term Vision
Change | CABs | Typifying change, automating checklists | Teams peer review their own changes
Release | Release weekends, calendars & managers | More frequent, automated releases | Teams autonomously release on demand
Security | Pen tests in prod | Automatic scanning in CI | Checks in IDE
Support | 3 tiers | ChatOps, automated customer feedback | You build it, you own it, and/or swarming
Incident | War rooms, incident managers | Healthy retrospectives | ChatOps in/across teams
It uses the lean improvement kata approach where we first look at the
long term vision, seek to understand the current condition and identify
the next target condition. Organizations should seek to experiment using
the Deming PDCA (Plan-Do-Check-Act) cycle.
Each of these five areas (change, release, security, support and
incident) is explored in detail in this paper. A key to achieving the
desired capabilities is the focus on breaking dependencies to allow for
loosely coupled systems and structures; Conway’s Law tells us that any
organization will design systems that mirror its communication and
organization structure.
Compare how an organization with many large silos that pass off to
each other creates monolithic ‘big balls of mud’ with how an
organization with small, autonomous teams creates a microservices
architecture where components are loosely connected and can be
changed, tested and deployed independently of one another.
“Organizations which design systems … are
constrained to produce designs which are
copies of the communication structures of
these organizations.”
- Melvin Conway
Identifying Current Condition: Value Stream Mapping
In DevOps the realization of value is the core focus of teams; the
definition of done moves from “I did my job” to “the customer has
received value”. When an organization is in transition from a traditional
waterfall way of working to a more adaptable, agile way of working, it
can be hard to see what changes should be made and how. Using a
lean tool, Value Stream Mapping, is a highly effective way of reaching
understanding of the current condition and consensus on the activities
needed to make improvement.
A value stream is anything that delivers a product
or a service and is made up of several activities or
processes that start when an idea presents itself and
is complete when the customer receives the value
derived from that idea.
Value Stream Mapping requires that a group of people, with
representation from each activity or process in the value stream, share
the same physical space while they visually collaborate to map how the
activities connect together, how long each takes and how long each step
waits for the next to start, thus calculating the cycle time for value
delivery.
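The arithmetic behind such a map can be sketched in a few lines. This is only an illustration: the step names and hours are invented, the `Step` class and `summarize` function are hypothetical helpers rather than part of any value stream mapping tool, and "flow efficiency" is a label assumed here for the share of elapsed time spent working.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float   # time spent actively working on the item
    wait_hours: float   # time the item waits before the next step starts


def summarize(steps):
    """Total the working and waiting time and find the longest wait."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    longest = max(steps, key=lambda s: s.wait_hours)
    return {
        "elapsed_hours": work + wait,
        "work_hours": work,
        "flow_efficiency": work / (work + wait),
        "longest_wait": longest.name,
    }


# Invented numbers: a value stream that queues for a weekly change board.
stream = [
    Step("develop", work_hours=16, wait_hours=8),
    Step("test", work_hours=8, wait_hours=24),
    Step("change approval", work_hours=1, wait_hours=120),
    Step("release", work_hours=4, wait_hours=0),
]
summary = summarize(stream)
print(summary)
```

With these made-up figures most of the elapsed time is waiting rather than working, which is the kind of finding a mapping session tends to make visible.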
It’s often the first time a particular group of people have been in a room
together and provides a qualitative and quantitative time diagnostic
of the value stream. It’s here that people begin to fully appreciate how
and why particular processes, for example change, cause delays in
the delivery of value and work together to imagine, measure and plan
improvements.
The improvements are based on principles around queuing and batch
size: when we map a value stream we can see how large batch sizes
create queues which consequently increase our lead and cycle time. It
highlights how risk is reduced when we make our work packages smaller
as we receive faster feedback. This will directly impact architectural
decisions and how we seek to reduce the route to live.
1.0 The DevOps Approach to Change
“There’s a right way to handle the change approval
process, and it leads to improvements in speed and
stability and reductions in burnout. Heavyweight
change approval processes, such as change approval
boards, negatively impact speed and stability.”
- 2019 Accelerate State of DevOps Report
There are other problems with change approval boards too - they are
often seen as a ‘checklist’ exercise, performed by people who have no
real understanding of the nature or impact of the change itself. Having
a change calendar constrains teams from being able to release their
changes when they are ready or they want to and inevitably slows
them down.
1.1 Long Term Vision: Lightweight, peer-reviewed change
A lightweight change process is a peer-reviewed change process that is
owned by the team. Changes are so small and so frequent they only take
a short time to be checked, approved and released. All of the testing has
already happened (it is automated in our continuous delivery pipeline)
and we have good test data and production-like test environments to
deploy to. Our systems are sufficiently decoupled.
1.2 Current Condition: Change Advisory/Approval Boards
As part of a value stream mapping exercise, the team asks if each step or
activity in the value stream is ‘value-adding’, i.e. does work happen here
that directly creates value for the customer? For the change approval
board the answer is never yes, so we should seek to remove this step,
whilst ensuring that the purpose of the activity (the protection of our
services from failure) is not lost.
1.3 Next Target Conditions
We mustn’t lose sight of the fact that we need these controls to protect
ourselves from chaotic change. This requires that teams have clarity and
understanding of the change process, that everyone who needs visibility
of the changes has it, and that proper procedures are followed.
1.3.1 Reduce Batch Size
The first step is for individual teams to reduce the batch size of their
changes. This will have a direct effect on the length of the queues and
also the amount of risk each change carries. Whilst doing this, the team
needs to make the small changes visible, probably by using a product
backlog tool and expressing them as user stories. Now that the team has
smaller changes, they will need to meet more frequently to review their
progress, ideally using an agile framework. The work in progress is made
visible via physical or virtual boards.
1.3.2 Classify Changes and Make Them Visible
As the changes become smaller and are separated out from a batch,
it becomes possible to classify changes. Teams may use terms such
as “standard”, “small” or “emergency”. Working with the change
management people, the team agree to experiment with making
changes with their Product Owner only (not the CAB) approving them.
They make these changes visible to the change management people and
do not schedule them on the change calendar.
They agree with the change management people what checks they will
perform themselves before the change is released to customers and
record that these checks have been undertaken, ideally in a workflow in
the product backlog.
Once the team has proven themselves reliable, they gain autonomy to
increase the amount of change they perform outside of the traditional
change process.
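One way to picture the classification experiment is as a routing rule. The sketch below is a guess at what such a rule might look like, not a prescribed policy: the choice of which classes the Product Owner may approve, the `Change` class and the `approver_for` function are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed policy: classes the team has agreed its Product Owner may approve.
TEAM_APPROVED_CLASSES = {"standard", "small"}


@dataclass
class Change:
    summary: str
    change_class: str


def approver_for(change):
    """Agreed classes stay with the team; everything else still goes to the CAB."""
    if change.change_class in TEAM_APPROVED_CLASSES:
        return "product owner"
    return "change advisory board"


changes = [
    Change("Update button label", "small"),
    Change("Rotate TLS certificate", "standard"),
    Change("Replace billing database", "major"),
]
routing = {c.summary: approver_for(c) for c in changes}
for summary, approver in routing.items():
    print(f"{summary}: {approver}")
```

As the team proves itself reliable, widening its autonomy amounts to adding classes to the agreed set.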
1.3.3 Automate the Change ‘Checklist’
Working with the change management people, the team can identify
the change ‘checklist’ and automate it. It’s likely to demand certain
tests are done, for example unit, integration and user acceptance tests.
Higher levels of fluency also include non-functional tests for security
(see section 3.0) and performance.
Having smaller changes and using trunk-based development (where
there are a small number of short-lived feature branches) in a
continuous integration and delivery (CICD) pipeline demands these
gates are completed before deployment. Peer-review of code and the
change can also be baked into the product backlog workflow; this is
fantastic for auditors, as is the version control that is the foundation
of the CICD pipeline, as it not only shows how the process steps are
followed but provides actual proof that they are.
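An automated checklist of this kind could be sketched as a gate that returns the reasons a change may not proceed. The list of required checks and the `gate` function are hypothetical; a real pipeline would take its results from the CI tooling rather than from a hand-built dictionary.

```python
REQUIRED_CHECKS = ("unit", "integration", "user_acceptance")  # assumed checklist


def gate(results, author, reviewers):
    """Return the reasons a change may not be deployed (empty list means go)."""
    reasons = [
        f"{name} tests not passed"
        for name in REQUIRED_CHECKS
        if not results.get(name, False)
    ]
    # Peer review only counts if at least one reviewer is not the author.
    if not any(reviewer != author for reviewer in reviewers):
        reasons.append("no peer review by someone other than the author")
    return reasons


blocked = gate({"unit": True, "integration": False}, "asha", ["asha"])
allowed = gate(
    {"unit": True, "integration": True, "user_acceptance": True},
    "asha",
    ["ben"],
)
print(blocked)
print(allowed)
```

Because the gate records why a change was blocked, its output can double as the audit evidence that the checklist was followed.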
Adding monitoring means that the team receive customer feedback fast
and have access to fast fault diagnosis. Using automated deployment
(see Section 2.0) will give the team the opportunity to instantly redeploy
last known good state (caveat: not all failures are a change failure
relating directly to the last change and this can get more complex when
we are delivering changes more frequently, but the alternative is to try
to identify which change in a large batch caused the problem).
1.3.4 Limit the Blast Radius: Canary Testing/Deployment
If we recognise that the point of the change controls is to protect us
from catastrophic failure, let’s define that. If the definition includes
making a change that fails for everyone, we can tackle the ‘everyone’
element by making the change for only a few: canary testing or
deployment. If it works for a few, we can then push it out to more,
many and all. If it doesn’t, we revert, learn and try again.
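The "few, more, many, all" progression can be sketched as a loop that widens exposure only while the change looks healthy. The stage fractions, the error-rate threshold and the `canary_rollout` function are illustrative assumptions; in practice the health signal would come from monitoring.

```python
def canary_rollout(stages, error_rate_at, max_error_rate=0.01):
    """Widen exposure stage by stage; revert at the first unhealthy stage.

    stages: increasing fractions of users, for example (0.01, 0.10, 0.50, 1.0)
    error_rate_at: callable giving the observed error rate at a fraction
    """
    for fraction in stages:
        if error_rate_at(fraction) > max_error_rate:
            return ("reverted", fraction)
    return ("released", stages[-1])


STAGES = (0.01, 0.10, 0.50, 1.0)

# A change that stays healthy at every stage reaches everyone.
healthy = canary_rollout(STAGES, lambda fraction: 0.002)

# A change that only misbehaves under wider load is caught at 10% of users.
faulty = canary_rollout(STAGES, lambda f: 0.002 if f < 0.10 else 0.05)

print(healthy, faulty)
```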
1.3.5 Address System Dependencies
Since much of the requirement for centrally coordinated change comes
from tightly coupled systems and incidents caused by unpredictable
system dependencies, reducing these dependencies is key to protecting
the teams from system fragility. Teams take ownership of these
architectural discussions and drive cross-team conversations through
communities of practice/interest or agile at scale techniques such as
Scrum of Scrums. Additionally, organizations practice inner-source
where teams can see and change (with visibility and peer-review) each
other’s systems.
Conway’s Law tells us that we will design systems that look like our
organizational communication structures; if our teams are autonomous
and loosely coupled, so will be our systems architecture. Using a
microservices and API model leads us to a place where we can test
and deploy small pieces independently. It does give us more pieces to
manage but that’s the trade-off.
1.4 Example Experiments
“If the team completes a small change themselves next week, ensuring
the change is visible in Jira, versioned in GitLab and Jenkins automates
the build and runs the unit and integration tests and we peer-review
the code and change, there won’t be a change failure as a result.”
“We believe that when we classify changes to ‘small’, 60% of our
changes won’t need to go through CAB and our lead time will reduce on
average by one week in the next 4 months.”
“Our architect thinks that we can uncover and break 20 dependencies
by the end of the year if they are flagged in the Scrum of Scrums and
20% of all product teams’ sprint is allocated to this activity.”
“I hypothesize that if we create a workflow in Jira this week that won’t
allow a build to go green until the tests are passed and peer-review is
complete, our change fail rate will drop over the next three months
and over 10 sprints the central change team and CAB will accept this as
evidence that we have followed their procedures and allow us change
autonomy. Auditing will take no time at all when it comes around
next March.”
2.0 The DevOps Approach to Release
We start here with a couple of key principles:
1. Release weekends are bad.
2. The DevOps ‘little and often’ approach:
 Releases should be ‘like breathing’ not creating
 ‘hair on fire’ moments.
For many organizations, it’s important to ensure everyone understands
what is meant by ‘release’ and ‘deploy’, as the definitions frequently
vary. Often people prepare a release, deploy it to production and then
release it to customers. These distinctions become less important as
DevOps fluency improves, but as teams and organizations evolve, they
need to know they are speaking the same language.
2.1 Long Term Goal: Teams autonomously release on demand (CD)
Here our teams can release their new features and fixes whenever they
are ready. Their continuous delivery pipeline ensures that software
is always in a releasable state and they may also have continuous
deployment - on successful completion of all the tests, the change is
automatically deployed and released into production.
2.2 Current Condition: Release weekends, calendars and managers
Traditional ITSM processes have taught us to create release packages;
large bundles of features. This is clearly at odds with our
DevOps principle of little and often where we reduce the risk of
deploying a change by making it smaller. Because we have large, high
risk releases we then schedule them and have people to manage
these schedules. Teams have to wait until their slots in the calendar
become available in order to perform their deployment to production
and release to customers.
2.3 Next Target Conditions
We want to balance two of the four key DevOps metrics here: the
throughput metric for deployment frequency and the stability metric for
change fail rate. Doing so will also reduce our lead time, and we want
to avoid causing incidents whose recovery we would then measure as Mean
Time to Recovery (see Section 5.0).
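To make the two metrics concrete, here is a minimal sketch of how they might be computed from a deployment log. The log entries are invented, and the function names and the choice of a per-week frequency are assumptions for illustration rather than a standard definition.

```python
from datetime import date

# Invented log: (day deployed, whether it caused a failure in production)
deployments = [
    (date(2024, 3, 4), False),
    (date(2024, 3, 6), False),
    (date(2024, 3, 11), True),
    (date(2024, 3, 13), False),
    (date(2024, 3, 20), False),
]


def deployment_frequency(log, period_days):
    """Average number of deployments per week over the period."""
    return len(log) / (period_days / 7)


def change_fail_rate(log):
    """Share of deployments that led to a failure in production."""
    failures = sum(1 for _, failed in log if failed)
    return failures / len(log)


per_week = deployment_frequency(deployments, period_days=28)
fail_rate = change_fail_rate(deployments)
print(f"{per_week:.2f} deployments per week, {fail_rate:.0%} change fail rate")
```

Tracking both together shows whether releasing more often is being achieved without more failures.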
Once again, value stream mapping is likely to show in traditional
ways of working that the release and deployment process is
lengthy, particularly if teams are required to interact with a release
management team to book slots on a release calendar.
2.3.1 Define Release and Deploy and Reduce Batch Size
As covered in Section 1.0, reducing the batch size reduces the risk and
the queueing time. It’s important that people in an organization know
what is meant when the words ‘release’ and ‘deploy’ are used and they
do vary from organization to organization. People talk about ‘deploying
a release’ or ‘releasing to production’.
When changes are small they can be deployed easily with reduced
risk of disruption and the distinction between the terms becomes less
important.
2.3.2 Defer the Release Management Role to the Team
When teams are working autonomously with small changes they can
release them when they are ready. But it takes time to reach that
place, and part of the path is planning the move from one state to the
other. In a traditional way of working, there is likely to be a release
manager or a team of release managers who are coordinating the
release process. The release manager role can be transitioned to the
team, with the team giving the release manager access to systems that
provide visibility into the releases that are happening.
2.3.3 Increase the Availability of Release Slots
Initially, when teams start using agile frameworks such as Scrum, they
will aim to release at the end of a two week sprint, or perhaps at the
end of several sprints. If the organization is working with a release
calendar that may have quarterly or monthly release slots, they should
look to increase the number of slots available to allow for the smaller
and more frequent changes. In time, whether teams are using sprint
or Kanban ways of working, they will evolve to releasing on demand
or continuous delivery. At this point, no type of release calendar or
management is required as the teams operate autonomously.
2.3.4 Automate the Release ‘Checklist’ and Deployment Process
As with change, many organizations operate with a release checklist to
ensure agreed policies and procedures are met. As with change, many of
these steps, such as versioning and testing can be automated in the CICD
pipeline and teams will release themselves; they build it, they own it.
As with change, release and deployment autonomy is highly dependent
on system autonomy, but where systems remain tightly coupled, release
management tools are available to track and manage these system
dependencies. These tools can also profile the risk associated with a
release. Deployment automation tools as part of the CICD pipeline
further improve predictability in the process by reducing the manual
effort associated with these tasks and providing patterns or templates
that reduce configuration drift and allow for self-service in the teams.
2.3.5 Limit the Deployment Blast Radius: Blue/Green Deployments
Organizations use the canary testing/deployment scenario described
in Section 1.3.4 and also use feature toggles/flags and blue/green
deployments to mitigate deployment failure risk. Toggling features on or
off separates feature release from code deployment allowing code to be
deployed to production while restricting access (through configuration)
to a subset of users. It also allows unfinished code to undergo integration
testing whilst remaining inaccessible when live and allows for A/B testing
and canary testing/deployment.
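A feature toggle of the kind described can be sketched as a configuration lookup plus a stable per-user bucket. The flag name, the 10% figure and the helper functions are hypothetical; real teams usually use a feature flag service or library rather than hand-rolled code like this.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Deterministically place a user inside or outside a percentage rollout."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket 0..99 per user and feature
    return bucket < percent


FLAGS = {"new_checkout": 10}  # hypothetical flag: on for roughly 10% of users


def is_enabled(feature, user_id):
    # Unknown flags default to off, so deployed but unfinished code stays dark.
    return in_rollout(user_id, feature, FLAGS.get(feature, 0))


def checkout(user_id):
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "existing checkout flow"


print(checkout("user-42"))
```

Hashing the user and feature together means a given user sees the same variant on every request, and raising the percentage in configuration releases the feature to more users without another deployment.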
Blue-green deployment is a technique that reduces downtime and risk by
running two identical production environments called Blue and Green. At
any time, only one of the environments is live, with the live environment
serving all production traffic. For this example, Blue is currently live and
Green is idle.
As a new version of software is prepared, deployment and the final stage
of testing takes place in the environment that is not live: in this example,
Green. Once the software is deployed and fully tested in Green, the
router switches incoming requests to Green instead of Blue. Green is
now live, and Blue is idle. This can also help with reducing the Route to
Live (RtL), which reduces handoffs and opportunities for problems and
improves flow.
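The Blue and Green switch described above can be modelled with a toy router. This is a sketch under stated assumptions: the `Router` class stands in for whatever load balancer or DNS mechanism actually directs traffic, and the version numbers and the always-passing smoke test are invented.

```python
class Router:
    """Toy stand-in for the router in front of both environments."""

    def __init__(self, live, idle):
        self.live = live
        self.idle = idle

    def switch(self):
        self.live, self.idle = self.idle, self.live


def blue_green_release(router, versions, new_version, smoke_test):
    """Deploy to the idle environment, test it there, then switch traffic."""
    target = router.idle
    versions[target] = new_version      # deploy to the environment that is not live
    if not smoke_test(target, new_version):
        return False                    # live traffic never saw the new version
    router.switch()                     # target is now live, the old live is idle
    return True


versions = {"blue": "1.4.0", "green": "1.3.0"}
router = Router(live="blue", idle="green")
released = blue_green_release(router, versions, "1.5.0", lambda env, v: True)
print(released, router.live, versions[router.live])
```

After the switch the previous version is still deployed in the idle environment, so switching the router back is one way to return to the last known good state.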
2.3.6 Reducing the Route to Live, Leveraging Cloud
Many organizations have complex RtLs containing multiple test
environments and experience difficulties in production since these
environments are not production-like. Teams also frequently have to
share these environments and find it difficult to obtain good test data.
The factor that most commonly prevents teams from having access to
production-like test environments is cost. Using cloud technologies
can ease the pain here, allowing teams to easily spin up test
environments as and when they are needed. Research shows that using
these types of technologies (public, private, hybrid or multi)
correlates with higher performing organizations.
Working in small increments, using blue/green deployments,
automating testing, embedding testing in the team and Test Driven
Development (TDD) all contribute to a reduction in the number of
steps in the RtL, reducing the risk and accelerating the flow of value.
Once more, value stream mapping uncovers how much time is spent
stepping through the RtL.
TDD is a software development process that relies on the repetition of
a very short development cycle: first the developer writes an (initially
failing) automated test case that defines a desired improvement or new
function, then produces the minimum amount of code to pass that test,
and finally refactors the new code to acceptable standards.
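A small illustration of that cycle, using Python's standard `unittest` module: the tests were notionally written first and fail until the function exists, then the least code that passes them is added. The `next_patch_version` function is an invented example, not something taken from this guide.

```python
import unittest


# Step 2: the minimum code needed to make the tests below pass.
def next_patch_version(version):
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


# Step 1: written first, and failing until next_patch_version existed.
class NextPatchVersionTest(unittest.TestCase):
    def test_increments_patch_number(self):
        self.assertEqual(next_patch_version("1.4.0"), "1.4.1")

    def test_rolls_past_single_digits(self):
        self.assertEqual(next_patch_version("1.4.9"), "1.4.10")


# Step 3 would be refactoring, re-running the tests after each change.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NextPatchVersionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a pipeline these tests would run on every commit, which is what lets small changes move through the RtL without a separate manual test stage.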
3.0 The DevOps Approach to Security
As well as DevOps, we have DevSecOps. Whilst not all in the industry are
comfortable with the addition of another term (it has the potential to
confuse people and create additional silos and handoffs), it recognizes
that security has been late to the party, or that their invitation was
sent late.
In many organizations security represents a severe constraint,
unsurprisingly since there are many reports of cybersecurity skills
shortages, and security teams are often significantly separate from the
rest of the technology team. It’s not uncommon, when performing value
stream mapping exercises, to find delays of several weeks while teams
wait for penetration tests.
There are many who say that security is just another test and just
another non-functional requirement, and whilst elements of this are true,
it’s also true that the extreme separation of the security team and their
often being seen as a ‘black-box’ means that incorporating them into
the pipeline earlier (shifting left) is more difficult to do than with some
other areas of testing. For example, it’s relatively easy for developers to
start incorporating unit tests as part of their automated build process.
Automated integration tests and user acceptance tests follow fast.
3.1 Long Term Goal: Checks in the IDE
Here the security tests are pushed as far left as technically possible;
into the developers’ hands, providing developers with the knowledge
they need about vulnerabilities in the components that they are
accessing from their IDE in the artifact repository and handing them
control over the software supply chain.
3.2 Current Condition: Pen tests in Prod
Most organizations perform regular or sporadic penetration tests or
vulnerability assessments in production and many are required by
regulators to do so and audited to ensure they happen. They can be
done either manually or using tools, typically a combination of the two,
and produce a report that is then passed to developers who work the
actions into their backlog. Or not.
3.3 Next Target Conditions
Ultimately the security constraint is broken so there is no wait time for
security activities to complete and the teams are confident that their
product is as uncompromisable as possible. We break the security
constraint through culture and the sharing of knowledge, and by
automating checks and remediation.
3.3.1 Shifting Left and Automation
As described, in DevSecOps security testing happens much earlier than
penetration testing in production (although, in many cases this may still
need to happen, not just for auditing purposes but for configuration cases
also). Where the teams are using artifacts, the repositories can be used to
scan and flag for vulnerabilities at the point of software composition. The
developer can be informed as they access a component of its vulnerability
status and advised if another version fits the organization’s security
policies better. If the teams don’t want developers interrupted in this way,
non-compliant vulnerabilities can break the build in the CICD pipeline.
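A build-breaking policy check might look something like the sketch below. The severity scale, the component names and the advisory data are placeholders, and `policy_violations` is a hypothetical function; real pipelines would take this information from a software composition analysis tool or the artifact repository.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def policy_violations(components, advisories, max_allowed="medium"):
    """List components whose known vulnerabilities exceed the allowed severity."""
    limit = SEVERITY_RANK[max_allowed]
    violations = []
    for name, version in components:
        severity = advisories.get((name, version))
        if severity is not None and SEVERITY_RANK[severity] > limit:
            violations.append(f"{name} {version}: {severity}")
    return violations


# Invented data: names, versions and advisories are placeholders.
components = [("examplelib", "1.2.0"), ("otherlib", "3.1.4")]
advisories = {("examplelib", "1.2.0"): "critical"}

violations = policy_violations(components, advisories)
exit_code = 1 if violations else 0  # a CI step would exit non-zero to break the build
print(violations, exit_code)
```

The allowed severity is the organization's policy decision; the code only enforces whatever threshold has been agreed.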
Static and Dynamic Application Security Testing (SAST and DAST) are
also used to test the source code and the application when it’s running.
IAST (Interactive Application Security Testing) analyzes code for security
vulnerabilities while the application is running from inside the application
and reports in real-time. As cloud and CICD proliferate machines,
automated identity management tools are also recommended.
3.3.2 DevSecOps Culture and Behaviors
The relationship between development and security is fractious in
many organizations, with security believing that developers don’t care
about security and developers feeling that security are overly zealous,
detailed and don’t understand the myriad of pressures that they
are under.
An effective pattern is to have security people work in a product team
or feature squad on a temporary basis. Whilst there may not be a lot
of security people to go around (some refer to the 100:10:1 ratio of
developers:operations:security), the payoff is worth it as there are
two key benefits; the first is the building of empathy and relationships
and the second is knowledge transfer as the 80:20 rule applies here:
80% of the security issues relate to 20% of the knowledge. This 20%
of knowledge is relatively easy for the engineers to access, retain and
share in this scenario.
Developers do care about security, since they care deeply about their
code, particularly when they are transitioned to a ‘you build it, you
own it’ way of working. They also care about the customer experience and
about the organization that they work for - few people are ignorant of
the wide ranging impact on company performance and reputation that a
breach causes. However, they are focused first on new features that
deliver value, then on improving the way in which they deliver value.
Although we aim for multi-functional, ‘comb’-shaped people, nobody
can know everything and to expect a developer to know of, understand
and be able to remediate every possible vulnerability is unreasonable.
To ask them to be aware of and follow visible coding policies and use
tools that break the knowledge constraint is not unreasonable.
3.3.3 Customer Feedback and Bug Bounties
In DevOps ways of working the focus is on the customer and the flow of
value to them (The First Way). The Second Way teaches us to shorten and
amplify our feedback loops. Highly evolved and performant organizations
seek feedback from customers and the market on security too; they
understand that transparency leads to trust.
Having a public bug bounty programme
is an effective way of collaborating with
customers and the market to receive
feedback and improve security posture.
3.3.4 Software is Like Milk, Not Wine
New vulnerabilities are found and appear constantly, so software that
passed its security tests today may not pass them tomorrow. Tools are available that
continuously assess the bill of materials in applications and offer teams
fast remediation capabilities. We can look forward to a future where
products are automatically updated with security vulnerability fixes.
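To make the 'milk, not wine' point concrete, here is a minimal sketch of a repeated bill-of-materials check. It is an illustration only: the component names, versions and advisory sets are hypothetical, and a real tool would match version ranges from a live advisory feed rather than exact name/version pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in an application's bill of materials (hypothetical shape)."""
    name: str
    version: str


def find_vulnerable(bill_of_materials, advisories):
    """Return components whose exact (name, version) appears in the advisories."""
    return [c for c in bill_of_materials if (c.name, c.version) in advisories]


# Hypothetical data: the application does not change between the two checks.
bom = [Component("libfoo", "1.2.0"), Component("libbar", "3.4.1")]
advisories_today = set()                      # nothing known yet
advisories_tomorrow = {("libbar", "3.4.1")}   # a new advisory is published

print(find_vulnerable(bom, advisories_today))
print(find_vulnerable(bom, advisories_tomorrow))
```

The same unchanged bill of materials passes the first check and fails the second, which is why the check needs to run continuously rather than once at release time.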
3.3.5 Belt and Braces
Data breaches aren’t the only way for threat actors to cause problems
with the operation and safety of an organization’s products. They can
also mount other attacks, such as distributed denial of service (DDoS)
attacks. To protect yourself from these sorts of attacks you’ll need
support from a cloud vendor or a specialist security vendor in this space.
Whilst shifting security left and continuously scanning products in
production for vulnerable components goes an enormous distance in
protection against breaches, it’s doubtful that human penetration testing
or vulnerability assessments on products in production will be in the past
any time soon.
Not only do regulators continue to require evidence of these activities,
but humans are also infinitely creative and will find configurations and
routes into systems that may not directly relate to a specific vulnerable
artifact.
3.4 Example Experiments
“My hypothesis is that if we launch a bug bounty programme in January,
then by the end of the first quarter, fifteen vulnerabilities of which we
were unaware will have been brought to our attention and it will have
cost us $15,000 from the bug bounty payout budget.”
“As a developer, I believe I’ll fix 100% of security vulnerabilities on the
same day if I know about them in my IDE. At the moment, I have 35
outstanding user stories in Jira flagged as issues found in a vulnerability
assessment and they are between six and sixteen weeks old. I will be able
to close all of them within 3 months using a tool in my IDE.”
“If we introduce IAST into the CICD pipeline, we’ll be able to reduce our
spend on production penetration testing by 30% per annum.”
“If I automate the management of our machine identities, then our
penetration tests will find no vulnerabilities as a result of, and we
will suffer no data breaches traceable to, expired or misconfigured
certificates.”
4.0 The DevOps Approach to Support
Support people are typically the lowest paid and least
respected in the technology hierarchy. Strange, when
they are on the frontline, dealing with our customers,
our reason for being, on a daily basis.
The Second Way in DevOps is to amplify and shorten
feedback loops - and in Value Stream Management
we are particularly interested in customer feedback.
So whilst the function of a support role is to fix
customer problems, it’s also to sense customer
sentiment and identify value delivery opportunities.
4.1 Long Term Goal: You build it, you own it, and/or swarming
This way of working is centered around small, autonomous,
multifunctional teams. They are small because of what we’ve learned
about how humans build trust and social connections. They are
autonomous because we don’t want them to have to wait for decisions to
be made on their behalf, and because we hired them as capable of, and
best-placed for, making those decisions themselves. They are
multifunctional because we don’t want them having to wait for other
teams to do stuff for them. They change and run their product. This isn’t
about giving developers ‘pagers’; this is about having end-to-end
ownership of a value stream.
4.2 Current Condition: 3 tiers
As with all the traditional ITSM patterns described here, there are good
reasons why they have been widely implemented, and for some time
they worked. But the world keeps turning and right now, digital disruption
demands we all change the way that we work to optimize flow through a
value stream.
Having a support or service desk makes less sense when our users
experience few problems or are mostly able to resolve them themselves
using online documentation. If we want to shorten a feedback loop, it’s
best not to have multiple handoffs through teams - delays don’t help with
our flow or with delighting our customers.
4.3 Next Target Conditions
Tiers create queues of work in progress which we seek to minimize as
queuing creates delays. Whilst the tiered approach is intended to ‘protect’
the ‘best’ (read: most expensive) staff from trivial customer issues (is there
such a thing?), when we seek to put the customer at the center of all we
do and want them to have optimized service, why would we put our best
people at the back of the process?
So instead of streaming, we move to swarming.
There are several models organizations work with, but
they all follow these broad principles:
• There should be no tiered support teams or hierarchy
• There should be no escalations from one team to another
• Each issue should move directly to the person most likely to be able
to resolve it
• The person who takes the issue is the one who sees it through
to resolution
Swarming isn’t solely for Severity 1 issues or incidents (see Section 5.0 for
more); it establishes teams whose priority is to ensure that the issue gets
to the right person, and receives attention, as soon as possible.
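The principle of moving an issue straight to the person most likely to resolve it can be sketched as a simple skills match. This is an assumption-laden illustration rather than a prescribed swarming model: the people, skill tags and availability flags below are hypothetical, and real models usually add human judgment on top.

```python
def pick_resolver(issue_tags, people):
    """Return the name of the available person whose skills best match the issue.

    There are no tiers and no escalation path: everyone available is a candidate.
    Returns None when nobody is available. Ties go to the first person listed.
    """
    available = [p for p in people if p["available"]]
    if not available:
        return None
    best = max(available, key=lambda p: len(issue_tags & p["skills"]))
    return best["name"]


# Hypothetical team members and skill tags.
people = [
    {"name": "Asha", "skills": {"payments", "api"}, "available": True},
    {"name": "Ben", "skills": {"ui"}, "available": True},
    {"name": "Chen", "skills": {"payments", "api", "database"}, "available": False},
]

print(pick_resolver({"payments", "database"}, people))
```

Whoever is returned keeps the issue through to resolution, in line with the last principle in the list above.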
4.3.1 Arrange Around Products
Having small, autonomous and multi-functional teams arranged around
products is the foundation to the ‘you build it, you own it’ mantra. Many
agile transitions start by bringing developers and testers into the same
team along with the ideation capabilities (Product Owners and business
analysis roles).
DevOps and value stream thinking brings Ops capabilities into the team
too and many teams start with support roles. This isn’t simply about
putting the developers on 24/7 call duties but about automating the front
end of support as far as possible and getting the issue in front of the right
person as soon as possible.
DevOps balances throughput and stability so as organizations improve
their posture, teams experience a reduction in the volume of issues
and a shortening of resolution time. When teams are dedicated solely
to support issue resolution, they often find Kanban a suitable way of
managing the flow of work. Where teams are working in development
sprints, they may find it helpful to record unplanned work and practice
assigning a percentage of the sprint to it. Unplanned work is an effective
proxy metric for quality and when measured is extremely useful when
teams want to assign time to invest in paying down technical debt.
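As a small worked example of recording unplanned work, the sketch below computes the share of each sprint that went on unplanned items. The sprint figures are hypothetical, and story points are only one possible unit (hours or ticket counts would work the same way).

```python
def unplanned_work_percent(planned_points, unplanned_points):
    """Percentage of completed work in a sprint that was unplanned."""
    total = planned_points + unplanned_points
    if total == 0:
        return 0.0
    return 100.0 * unplanned_points / total


# Hypothetical sprints: (planned points completed, unplanned points completed).
sprints = [(40, 10), (42, 8), (45, 5)]

for planned, unplanned in sprints:
    print(f"{unplanned_work_percent(planned, unplanned):.1f}% unplanned")
```

A falling percentage over successive sprints is the kind of trend a team could use as evidence when deciding how much of the next sprint to reserve for unplanned work or technical debt.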
4.3.2 Automating Support: ChatOps and Bots
ChatOps is the use of a group messaging tool integrated with
the DevOps toolchain. Chat channels can be created as needed
(typically for an incident) or in permanent use (typically for a theme
for a particular product). Section 5.0 describes an incident
management use case for ChatOps. A swarming support use case might
allow the receiver of the customer issue to access a specific backlog
channel and request interaction from that product team, or the team
may have their own channels for support issues relating to items such
as a payment gateway.
The service desk can also encourage customers/consumers of their
service to interact via online chat once they have been guided through
available topics and support artifacts in a knowledge base. Bots
can try to resolve the issue initially and as needed the issue can be
automatically routed to the team and swarmed from there.
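The flow described here (knowledge base first, then automatic routing to a team channel) might look like the following sketch. Everything in it is hypothetical: the knowledge base entries, channel names and keyword matching stand in for whatever a real chat platform and bot framework would provide.

```python
# Hypothetical knowledge base: phrase -> self-service answer.
KNOWLEDGE_BASE = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "update card": "Go to Account > Billing > Payment methods.",
}

# Hypothetical mapping of keywords to product team channels.
TEAM_CHANNELS = {"payment": "#payments-support", "login": "#identity-support"}
DEFAULT_CHANNEL = "#support-swarm"


def handle_message(text):
    """Try to answer from the knowledge base; otherwise route to a team channel."""
    lowered = text.lower()
    for phrase, answer in KNOWLEDGE_BASE.items():
        if phrase in lowered:
            return ("answered", answer)
    for keyword, channel in TEAM_CHANNELS.items():
        if keyword in lowered:
            return ("routed", channel)
    return ("routed", DEFAULT_CHANNEL)


print(handle_message("How do I reset password?"))
print(handle_message("My payment failed twice"))
print(handle_message("The app crashes on start"))
```

Anything the bot cannot answer lands in a channel where the team can swarm on it, rather than in a tiered queue.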
4.3.3 Automating Support: Knowledge and Self-Service
Many people don’t enjoy committing extended periods to writing and
documentation; however, to optimize a value stream, ‘just enough’
documentation is key. Underpinning this is the ‘little and often’
principle; ensuring that small pieces are documented frequently at
source and held in a repository that is easily searchable and visible.
This takes burden off the support team as people can find and resolve
common issues themselves, leaving the support swarms to work with
the edge cases.
4.3.4 Telemetry Everywhere and Viewability/Observability
Much of the waste in the support value stream is in the fault diagnosis
(after we’ve removed delays through handoffs in a tiered model) so the
team needs data to help them identify unknown and unusual issues.
Support teams are frequently poorly supported by tooling, other than
ticketing systems, so providing the product teams with tools that
radiate telemetry means everyone in the team can benefit.
Application monitoring and logging tools accelerate the identification
of the root cause(s) of an issue (and these should be used in
pre-production too). It’s then over to the team to fix it fast, and their
CICD pipeline will help validate and deploy the fix at speed. And it’ll be an
emergency fix or a small change so they won’t be slowed down by CAB
or the release schedule.
This type of tool also provides customer journey insights and
real-time feedback on the business value of features and changes that the
whole team can use in the sprint reviews to check the outcome of their
hypotheses and in their sprint planning to set up their next round of
experiments.
4.3.5 CICD and Intelligent Risk Management
Once a team is collaborating on a shared and visible backlog and is
proficient in performing continuous delivery, they will have reduced
their incidents and improved their MTTR. AI tools that assess the risk
of a release help teams decide when to act and whom to pre-warn.
Having this data visible to central release teams
provides evidence, builds trust and earns the right
to autonomy.
4.4 Example Experiments
“We believe that if we set up a backlog swarm, we can resolve 50% of
backlog items over 6 months old in 4 working weeks.”
“My hypothesis is that if we have an incident swarm using ChatOps,
we’ll reduce our MTTR by 70%.”
“Implementing an application performance management tool by the
end of the month means that by the end of next month we’ll see our
fault diagnosis time drop by at least 20%.”
“Making our knowledge base publicly searchable will likely reduce the
volume of tickets by 25% within 6 months.”
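Experiments phrased in terms of MTTR need a consistent way to measure it. The sketch below treats MTTR as the mean time from incident start to service restoration and compares two periods; the incident timestamps are invented for illustration, and in practice they would come from the ticketing or monitoring tool.

```python
from datetime import datetime


def mttr_minutes(incidents):
    """Mean time to restore, in minutes, for (started, restored) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    minutes = [(restored - started).total_seconds() / 60
               for started, restored in incidents]
    return sum(minutes) / len(minutes)


def percent_reduction(before, after):
    """Percentage by which a measure fell between two periods."""
    return 100.0 * (before - after) / before


# Hypothetical incidents before and after introducing an incident swarm.
before = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),      # 120 minutes
    (datetime(2024, 1, 19, 14, 0), datetime(2024, 1, 19, 15, 20)),  # 80 minutes
]
after = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 25)),    # 25 minutes
    (datetime(2024, 3, 21, 16, 0), datetime(2024, 3, 21, 16, 35)),  # 35 minutes
]

reduction = percent_reduction(mttr_minutes(before), mttr_minutes(after))
print(f"MTTR reduced by {reduction:.0f}%")
```

With so few incidents the result is only indicative; a hypothesis like the ones above is better judged over a longer observation window.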
5.0 The DevOps Approach to Incidents
“Incidents are unplanned investments; their costs
have already been incurred. Your org’s challenge
is to get ROI on those events. Right now, in most
companies, this ROI is left sitting in the dark because
of the “template-driven” approaches and “action
item” myopia.”
- John Allspaw
We are taught, in all cultures, from an early age, that
failure is to be avoided at all costs, and that it’s shameful
and humiliating. It’s only as we grow up and experience
more in life that we realise failures are not only
inevitable, but useful for learning, and that they light
the path to success.
In many large enterprises there is deep-seated fear of failure
(understandably so since many organizations operate infrastructure
whose availability is critical to many). Incidents will happen; however,
DevOps practices allow us to increase the flow of work through the
value stream whilst increasing stability so more value delivered does
not equal more incidents to deal with.
5.1 Long Term Goal: ChatOps in/across teams
The goal of incident management is to restore service as soon as
possible and, arguably more importantly, learn from the incident. ChatOps
supports this goal in two key ways. Firstly, it allows teams to swarm
through a channel in real time so that everyone has everything visible
through a single pane of glass (contrast this with some people being in
a room, others on a conference call, and various team members logged
into and observing various systems), and it records the progress and
process. Secondly, the team has access to their DevOps toolchain and can
both receive information and issue commands from the chat window.
5.2 Current Condition: War rooms, incident managers
The cultural driver for DevOps is the creation of a working space in
which people can be their best and most productive selves; removing
risk of burnout and nurturing autonomy, mastery and purpose. ‘War
rooms’ immediately set a sense of crisis and conflict.
Whilst the sense of urgency when a Severity 1 issue or incident occurs
should not be diminished, a number of steps can be taken to move
from a place where incidents are catastrophic and to be avoided at all
costs to one where impact is minimal and they are valued as a learning
opportunity.
DevOps regularly seeks to decentralize activities, especially when
they have been centralized in order to manage dependencies. Since
autonomy reduces handoffs and queueing, assigning an incident
manager from a separate team because systems are so complex is
unlikely to be the fastest way to restore service.
5.3 Next Target Conditions
Ultimately the volume of incidents, or at least the time spent dealing
with them, should be as close to nil as possible since they are the main
disruptor of the delivery of planned work or value to the customer.
5.3.1 Blameless Retrospectives and Experimentation
Rather than having war-rooms, swarm an incident and once service is
restored, hold a blameless retrospective over ChatOps. Agree learnings
and write actions as experiments and save the chat log to the ticket
in the backlog. Close the ticket only once the initial experiments are
complete.
5.3.2 Reframing Failure, Safety Culture and The Andon Cord
Another tool from the kings of Lean, Toyota, is the Andon Cord, which is
used in a manufacturing pipeline to raise an issue. But what’s important
about it is the behavior and culture it created. Workers were encouraged
and empowered to highlight potential defects with the knowledge that
their leaders wanted to know about them and fix them at the earliest
opportunity before they continued downstream. Much can be taken
from the Andon Cord: that successful leaders embrace and are grateful
for learning opportunities and encourage their teams to self-discover,
that fixing the problem immediately and preventing it from proceeding
downstream is key to building a quality product and that people are
psychologically safe when they are not afraid to point out mistakes or
try new things.
Safety culture can be broadly defined as a place where all in an
organization share a view on how best to mitigate risk in their
environment and they prioritize learning over failure and create
mechanisms to protect themselves from catastrophic failure.
In an environment where these mechanisms are discovered, perhaps
through value stream mapping, to be slowing the flow, using the
approaches described here for change, release, security, support and
incident management accelerates the delivery of value.
5.3.3 Automation: Site Reliability and Chaos Engineering
Several of the automation techniques we have already discussed in this
paper help either to reduce the likelihood of major incidents happening
(CICD, limited blast radius) or make them more manageable (telemetry,
ChatOps, automated deployment). Organizations also look to Site
Reliability Engineering (SRE) to improve their stability posture.
“SRE is fundamentally doing work that has historically
been done by an operations team, but using
engineers with software expertise and banking on
the fact that these engineers are inherently both
predisposed to, and have the ability to, substitute
automation for human labor. In general, an SRE team
is responsible for availability, latency, performance,
efficiency, change management, monitoring,
emergency response, and capacity planning.”
- Ben Treynor, founder of SRE at Google
Some organizations have teams of SREs, others look to embed this
role in product or feature teams/squads. Whichever model is used, the
principle is to increase the focus on antifragility and SRE has this goal
in common with Chaos Engineering. The best known example of Chaos
Engineering is Netflix’s Chaos Monkey which is essentially a fire drill.
With an actual fire.
5.4 Example Experiments
“We believe if we use chaos engineering to practice incident recovery 4
times this year, we’ll find ways to improve that will reduce our MTTR by
50% next year.”
“I hypothesize that if I ask two members of my product team, one whose
background is in development and the other in system administration, to
extend their skillsets to include site reliability engineering,
they will cross-skill each other and buddy up. As a result, our change fail
rate will drop by 5% in 6 months.”
“My experiment says that if we only close our incident tickets when
all experiments have been completed, we will be able to document
25 key learnings in our knowledgebase in the first quarter of the new
practice.”
Conclusion: Flow & Value Stream Management
Taking a value stream approach to service delivery puts
the priority on optimization of the flow of work from
the idea to the realization of the value in the hands of
the customer.
Necessarily it demands a rethink of the traditional
approaches and organizational practices, just as
becoming agile and product focused demands we
rethink an inherently waterfall and project centric
approach.
Value Stream Mapping is an extremely valuable and effective method
for quantifying the cycle time, waste and cost associated with
delivering an iteration of a product or service. It also provides a great
deal of qualitative data through the visual collaboration and human
conversation it drives.
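The quantities a value stream map yields can be illustrated with a short calculation. The steps and hours below are invented, and the metric names (lead time, flow efficiency) follow common Lean usage rather than anything defined in this paper.

```python
# Hypothetical value stream: (step name, process hours, wait hours before the step).
steps = [
    ("Develop", 16, 8),
    ("Review", 2, 24),
    ("Test", 6, 40),
    ("Deploy", 1, 63),
]


def summarize(steps):
    """Total the map and point at the step with the longest queue in front of it."""
    if not steps:
        raise ValueError("need at least one step")
    process = sum(p for _, p, _ in steps)
    wait = sum(w for _, _, w in steps)
    lead = process + wait
    return {
        "process_hours": process,
        "wait_hours": wait,
        "lead_time_hours": lead,
        # Share of the elapsed time in which work was actually being done.
        "flow_efficiency_percent": 100.0 * process / lead,
        "longest_wait_step": max(steps, key=lambda s: s[2])[0],
    }


summary = summarize(steps)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this made-up example most of the elapsed time is waiting rather than working, which is the kind of waste the mapping exercise is meant to surface.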
Good value stream mapping exercises are held regularly and deliver
backlogs of improvements which are steadily and iteratively worked
through. The disadvantage of Value Stream Mapping is that it’s a
human-driven and opinion-driven process. Whilst those opinions are
mostly accurate (and a big part of the value stream mapping process
is understanding the system and building empathy for counterparts
in the end-to-end lifecycle of the product or service), they struggle to
provide data as evidence.
Since improvements in value stream flow are likely to necessitate
significant and far-reaching decisions about things like the roles in
the organization, the organizational design, how work is funded and
how investments are prioritized, it’s helpful for the people making
those decisions to be as well-informed as possible and able to monitor
feedback, learnings and evolutionary progress over time.
Following our telemetry everywhere mantra, it’s best to support the
human-driven value stream mapping efforts with data-driven value
stream management evidence.
Choices can be made when building a CICD pipeline or DevOps
toolchain about the traceability of value through the value delivery
lifecycle. Teams can build integrations between the tools themselves
or use available connectors and APIs (but this might make it difficult
to swap tools out as needs inevitably change), or integration brokers
can be used to pass the feature/code from one tool to another as it
progresses.
Since we want feedback for learning, we want all of this to be visible,
so some organizations use dashboards. But when a dashboard is
effectively just screenscraping data from a number of tools and
presenting it in a single pane of glass, it’s very difficult to understand
the end to end cycle time of delivering a piece of value.
Value Stream Management tooling allows simple integration within a
toolchain, which future-proofs for ongoing evolution, and collects data
that not only shows the cycle time but also where it’s slow and risky,
providing insights for improvements.
Further Reading
Learn how to use value streams to accelerate DevOps
transformation at your organization and become a
software juggernaut.
What is Value Stream Management?
Learn DevOps: Enterprise DevOps at Scale
CI/CD Tools Universe: 100+ Tools
Enable Value Stream Management with Plutora:
Why Plutora?
The Plutora Platform
Request a Demo
About the Author
Helen Beal helps people practice DevOps principles in real
world organizations for Ranger4. She describes herself as
a DevOpsologist as her main role in her working life is to
study the inputs and outputs of the thinking systems that
make up DevOps and what value outcomes they deliver
and we can measure.
Helen is also a product owner and DevOps Ambassador
for London at the DevOps Institute, a DevOps editor for
InfoQ and writes for a number of online platforms.
Outside of DevOps she is an ecologist and novelist. She
once saw a flamingo lay an egg and has a particular
fondness for llamas.

Introduction to DevOps slides.pdf
 
Best DevOps course Online & Classroom Training Naresh-IT
Best DevOps course Online & Classroom Training Naresh-ITBest DevOps course Online & Classroom Training Naresh-IT
Best DevOps course Online & Classroom Training Naresh-IT
 
4 Prerequisites for DevOps Success
4 Prerequisites for DevOps Success4 Prerequisites for DevOps Success
4 Prerequisites for DevOps Success
 
Devops
DevopsDevops
Devops
 
The Advantages of DevOps Consulting That Can Transform Your Business 
The Advantages of DevOps Consulting That Can Transform Your Business The Advantages of DevOps Consulting That Can Transform Your Business 
The Advantages of DevOps Consulting That Can Transform Your Business 
 
Introduction to DevSecOps. An intuitiv approach
Introduction to DevSecOps. An intuitiv approachIntroduction to DevSecOps. An intuitiv approach
Introduction to DevSecOps. An intuitiv approach
 
DevOps & continuous delivery - Sogeti
DevOps & continuous delivery - SogetiDevOps & continuous delivery - Sogeti
DevOps & continuous delivery - Sogeti
 
Top 10 DevOps Principles for successful development teams.pdf
Top 10 DevOps Principles for successful development teams.pdfTop 10 DevOps Principles for successful development teams.pdf
Top 10 DevOps Principles for successful development teams.pdf
 

Plus de Plutora

Deployment Planning and Management
Deployment Planning and ManagementDeployment Planning and Management
Deployment Planning and ManagementPlutora
 
Product Brief – Plutora Release
Product Brief – Plutora ReleaseProduct Brief – Plutora Release
Product Brief – Plutora ReleasePlutora
 
Product Brief – Plutora Environments
Product Brief – Plutora EnvironmentsProduct Brief – Plutora Environments
Product Brief – Plutora EnvironmentsPlutora
 
CI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate ListCI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate ListPlutora
 
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...Plutora
 
Achieve the Full Potential of SAFe with Effective Release Management
Achieve the Full Potential of SAFe with Effective Release ManagementAchieve the Full Potential of SAFe with Effective Release Management
Achieve the Full Potential of SAFe with Effective Release ManagementPlutora
 
Balance Change and Control of Continuous Delivery at Scale
Balance Change and Control of Continuous Delivery at ScaleBalance Change and Control of Continuous Delivery at Scale
Balance Change and Control of Continuous Delivery at ScalePlutora
 

Plus de Plutora (7)

Deployment Planning and Management
Deployment Planning and ManagementDeployment Planning and Management
Deployment Planning and Management
 
Product Brief – Plutora Release
Product Brief – Plutora ReleaseProduct Brief – Plutora Release
Product Brief – Plutora Release
 
Product Brief – Plutora Environments
Product Brief – Plutora EnvironmentsProduct Brief – Plutora Environments
Product Brief – Plutora Environments
 
CI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate ListCI/CD Tools Universe: The Ultimate List
CI/CD Tools Universe: The Ultimate List
 
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...
How Can You Solve Today’s “My-Hair’s-on-Fire!” Release Challenges?_top_challe...
 
Achieve the Full Potential of SAFe with Effective Release Management
Achieve the Full Potential of SAFe with Effective Release ManagementAchieve the Full Potential of SAFe with Effective Release Management
Achieve the Full Potential of SAFe with Effective Release Management
 
Balance Change and Control of Continuous Delivery at Scale
Balance Change and Control of Continuous Delivery at ScaleBalance Change and Control of Continuous Delivery at Scale
Balance Change and Control of Continuous Delivery at Scale
 

Dernier

buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?Watsoo Telematics
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 

Dernier (20)

buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

Service Management in a DevOps World - by Helen Beal

  • 1. Service Management in a DevOps World, by Helen Beal. A Guide to Evolving Light-Weight Service Management Processes for Value Streams Using DevOps Principles.
  • 2. Table of Contents: The Origins of Services (4); Where ITSM Has Been Painful (6); Value Stream Centric Service Management (8); Identifying Current Condition: Value Stream Mapping (11); 1.0 The DevOps Approach to Change (13); 1.1 Long Term Vision: Lightweight, peer-reviewed change (14); 1.2 Current Condition: Change Advisory/Approval Boards (14); 1.3 Next Target Conditions (15); 1.3.1 Reduce Batch Size (15); 1.3.2 Classify Changes and Make Them Visible (15); 1.3.3 Automate the Change ‘Checklist’ (17); 1.3.4 Limit the Blast Radius: Canary Testing/Deployment (18); 1.3.5 Address System Dependencies (18); 1.4 Example Experiments (19); 2.0 The DevOps Approach to Release (20); 2.1 Long Term Goal: Teams autonomously release on demand (CD) (21); 2.2 Current Condition: Release weekends, calendars and managers (21); 2.3 Next Target Conditions (22); 2.3.1 Define Release and Deploy and Reduce Batch Size (22); 2.3.2 Defer the Release Management Role to the Team (23); 2.3.3 Increase the Availability of Release Slots (23); 2.3.4 Automate the Release ‘Checklist’ and Deployment Process (24); 2.3.5 Limit the Deployment Blast Radius: Blue/Green Deployments (24); 2.3.6 Reducing the Route to Live, Leveraging Cloud (26); 3.0 The DevOps Approach to Security (27); 3.1 Long Term Goal: Checks in the IDE (28); 3.2 Current Condition: Pen tests in Prod (28)
  • 3. Table of Contents (continued): 3.3 Next Target Conditions (29); 3.3.1 Shifting Left and Automation (29); 3.3.2 DevSecOps Culture and Behaviors (30); 3.3.3 Customer Feedback and Bug Bounties (31); 3.3.4 Software is Like Milk, Not Wine (32); 3.3.5 Belt and Braces (32); 3.4 Example Experiments (33); 4.0 The DevOps Approach to Support (34); 4.1 Long Term Goal: You build it, you own it, and/or swarming (35); 4.2 Current Condition: 3 tiers (35); 4.3 Next Target Conditions (36); 4.3.1 Arrange Around Products (37); 4.3.2 Automating Support: ChatOps and Bots (38); 4.3.3 Automating Support: Knowledge and Self-Service (39); 4.3.4 Telemetry Everywhere and Viewability/Observability (39); 4.3.5 CICD and Intelligent Risk Management (40); 4.4 Example Experiments (41); 5.0 The DevOps Approach to Incidents (42); 5.1 Long Term Goal: ChatOps in/across teams (43); 5.2 Current Condition: War rooms, incident managers (44); 5.3 Next Target Conditions (44); 5.3.1 Blameless Retrospectives and Experimentation (45); 5.3.2 Reframing Failure, Safety Culture and The Andon Cord (45); 5.3.3 Automation: Site Reliability and Chaos Engineering (46); 5.4 Example Experiments (47); Conclusion: Flow & Value Stream Management (48); Further Reading (51)
  • 4. The Origins of Services. The origins of DevOps lie in agile system administration and the recognition that, whilst software development teams were taking advantage of agile methodologies to become more responsive to change and uncertainty, IT Operations people were not. Sometimes they were even oblivious to what was happening on the other side of the ‘wall of confusion’, and painful tensions and misunderstandings occurred between the two technology teams: IT Ops guys grumbled about developers wanting administrator access to production machines, developers moaned that IT Ops guys took too long to provision environments, and releasing an update was always a ‘hair on fire’ moment that frequently resulted in blame games and mostly happened at weekends.
  • 5. The battle between change and stability seemed as if it would rage on. But DevOps principles have taught us how to balance throughput and reliability without compromising either. DevOps is practiced not just by the ‘born on the web’ behemoths but by an ever-increasing number of traditional enterprises, those who in the past have embraced IT Service Management (ITSM) approaches to service delivery. Whilst some organizations consciously or unconsciously drive their DevOps evolution from their development teams, there always comes a time when they seek to understand how to optimize service management activities as part of the end-to-end technology delivery value stream. Whilst development is about agile and IT Ops is about ITSM, we can use lean tools to marry the two and create lightweight, “just-enough” processes that allow both teams to work at the same cadence. DevOps has evolved to focus on the end-to-end optimization of the value stream, accelerating flow from idea to value realization. How we handle and manage key technology delivery services changes when our primary goals are to optimize the flow of value and system integrity.
  • 6. Where ITSM Has Been Painful. Traditional ITSM processes, whilst designed for all the right reasons (to protect us, to improve our predictability and to enable common understanding), are frequently accused of being onerous and of blocking the flow of value from idea to realisation for the customer. In the past, when we experienced an issue or problem, the typical response was to add a control; this is why large enterprises that have operated for a significant amount of time are often bogged down in bureaucracy: layers of process that have built up over time. Additionally, these types of organizations have evolved system and organizational designs (not necessarily by intention) that contain large numbers of (sometimes unknown) dependencies, exacerbating the sense, and actuality, of fragility. DevOps seeks to improve sustainable working practices and reduce workplace burnout and stress. It remodels the ways of working to improve velocity, consistency and predictability, visualizing the flow of work and removing constraints. Our focus moves from managing dependencies to breaking them, creating loosely coupled organizational and technology systems that allow us to build, test and deploy in small increments.
  • 7. ITSM has, inadvertently, caused some key constraints which have led to working practices that frustrate people because they slow them down. But these same working practices were introduced to avoid catastrophic failures caused by chaos and unknown or unseen dependencies. One example is the Change Advisory Board, which in many organizations morphed into the Change Approval Board (the difference is subtle, but palpable); such boards add wait times to value streams and are often perceived to be adding no real value to the customer experience. The change and release calendars and checklists are often similarly reviled and not valued. Painful working practices related to these are release weekends and nights, the subsequent war room when a large batch release goes bad, and project-centric cultures that are typified by a culture of meetings, irregular demand flows (feast and famine) and spiralling technical debt.
  • 8. Value Stream Centric Service Management. Approaching service management activities such as change, release, security, support and incident with a DevOps hat on changes the way we work, allowing for improved adaptability whilst not forgetting what we’ve learned about ensuring customer experience. Underpinning this is the principle of little and often: more frequent inspection and smaller work packages allow us to receive feedback more often and to course correct more regularly.
  • 9. The following table summarizes how these activities change in a value stream centric world and describes a midway step to consider as an organization transitions from one capability to another. It uses the lean improvement kata approach, where we first look at the long term vision, seek to understand the current condition and identify the next target condition. Organizations should seek to experiment using the Deming PDCA (Plan-Do-Check-Act) cycle.

    Activity | Current Condition | Next Target Condition | Long Term Vision
    Change   | CABs | Typifying change, automating checklists | Teams peer review their own changes
    Release  | Release weekends, calendars & managers | More frequent, automated releases | Teams autonomously release on demand
    Security | Pen tests in prod | Automatic scanning in CI | Checks in IDE
    Support  | 3 tiers | ChatOps, automated customer feedback | You build it, you own it, and/or swarming
    Incident | War rooms, incident managers | Healthy retrospectives | ChatOps in/across teams
  • 10. Each of these five areas (change, release, security, support and incident) is explored in detail in this paper. A key to achieving the desired capabilities is the focus on breaking dependencies to allow for loosely coupled systems and structures; Conway’s Law tells us that any organization will design systems that look like its communication and organization structure. Compare an organization with many large silos that pass work off to each other, which creates monolithic ‘big balls of mud’, with an organization with small, autonomous teams, which creates a microservices architecture where components are loosely connected and can be changed, tested and deployed independently of one another. “Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.” - Melvin Conway
  • 11. Identifying Current Condition: Value Stream Mapping. In DevOps the realization of value is the core focus of teams; the definition of done moves from “I did my job” to “the customer has received value”. When an organization is in transition from a traditional waterfall way of working to a more adaptable, agile way of working, it can be hard to see what changes should be made and how. Using a lean tool, Value Stream Mapping, is a highly effective way of reaching an understanding of the current condition and consensus on the activities needed to make improvements. A value stream is anything that delivers a product or a service; it is made up of several activities or processes that start when an idea presents itself and are complete when the customer receives the value derived from that idea.
  • 12. Value Stream Mapping requires that a group of people, with representation from each activity or process in the value stream, share the same physical space while they visually collaborate to map how the activities connect together, how long each takes and how long work waits between steps, thus calculating the cycle time for value delivery. It’s often the first time a particular group of people have been in a room together, and it provides a qualitative and quantitative time diagnostic of the value stream. It’s here that people begin to fully appreciate how and why particular processes, for example change, cause delays in the delivery of value, and work together to imagine, measure and plan improvements. The improvements are based on principles around queuing and batch size: when we map a value stream we can see how large batch sizes create queues, which consequently increase our lead and cycle time. The map highlights how risk is reduced when we make our work packages smaller, as we receive faster feedback. This will directly impact architectural decisions and how we seek to reduce the route to live.
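As a rough illustration of the arithmetic behind a value stream map, the sketch below sums the working and waiting time of each step to give the cycle time, and points to the step with the longest queue. The step names and hours are invented for illustration and do not come from the paper; `flow_efficiency` is a common lean measure added here as an assumption rather than something the text defines.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_hours: float  # time spent actively working in this step
    wait_hours: float     # time the work sits in a queue before this step starts


def cycle_time(steps):
    """Total elapsed hours from idea to delivered value (work plus waiting)."""
    return sum(s.process_hours + s.wait_hours for s in steps)


def flow_efficiency(steps):
    """Fraction of the elapsed time in which work is actually being done."""
    total = cycle_time(steps)
    return sum(s.process_hours for s in steps) / total if total else 0.0


def longest_queue(steps):
    """The step whose work waits longest before it starts."""
    return max(steps, key=lambda s: s.wait_hours)


# Hypothetical figures a mapping workshop might put on the wall.
value_stream = [
    Step("develop", process_hours=16, wait_hours=8),
    Step("test", process_hours=8, wait_hours=24),
    Step("change approval", process_hours=1, wait_hours=120),
    Step("release", process_hours=4, wait_hours=48),
]

print("cycle time (hours):", cycle_time(value_stream))
print("flow efficiency:", round(flow_efficiency(value_stream), 3))
print("longest queue:", longest_queue(value_stream).name)
```

With numbers like these, most of the elapsed time is waiting rather than working, which is the kind of finding that prompts a team to look at batch size and approval queues first.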
  • 13. 1.0 The DevOps Approach to Change. “There’s a right way to handle the change approval process, and it leads to improvements in speed and stability and reductions in burnout. Heavyweight change approval processes, such as change approval boards, negatively impact speed and stability.” - 2019 Accelerate State of DevOps Report
  • 14. There are other problems with change approval boards too: they are often seen as a ‘checklist’ exercise, performed by people who have no real understanding of the nature or impact of the change itself. Having a change calendar prevents teams from releasing their changes when they are ready or want to, and inevitably slows them down. 1.1 Long Term Vision: Lightweight, peer-reviewed change. A lightweight change process is a peer-reviewed change process that is owned by the team. Changes are so small and so frequent that they only take a short time to be checked, approved and released. All of the testing has already happened (it is automated in our continuous delivery pipeline) and we have good test data and production-like test environments to deploy to. Our systems are sufficiently decoupled. 1.2 Current Condition: Change Advisory/Approval Boards. As part of a value stream mapping exercise, the team asks if each step or activity in the value stream is ‘value-adding’, i.e. does work happen here that directly creates value for the customer? For a change approval board the answer is never yes, so we should seek to remove this step, whilst ensuring that the purpose of the activity (the protection of our services from failure) is not lost.
  • 15. 1.3 Next Target Conditions. We mustn’t lose sight of the fact that we need these controls to protect ourselves from chaotic change. This requires that teams have clarity and understanding of the change process, that everyone who needs it has visibility of the changes, and that proper procedures are followed. 1.3.1 Reduce Batch Size. The first step is for individual teams to reduce the batch size of their changes. This will have a direct effect on the length of the queues and also on the amount of risk each change carries. Whilst doing this, the team needs to make the small changes visible, probably by using a product backlog tool and expressing them as user stories. Now that the team has smaller changes, they will need to meet more frequently to review their progress, ideally using an agile framework. The work in progress is made visible via physical or virtual boards. 1.3.2 Classify Changes and Make Them Visible. As the changes become smaller and are separated out from a batch, it becomes possible to classify them. Teams may use terms such as “standard”, “small” or “emergency”. Working with the change management people, the team agrees to experiment with making changes that only their Product Owner (not the CAB) approves. They make these changes visible to the change management people and do not schedule them on the change calendar.
  • 16. They agree with the change management people what checks they will perform themselves before the change is released to customers, and record that these checks have been undertaken, ideally in a workflow in the product backlog. Once the team has proven itself reliable, it gains autonomy to increase the amount of change it performs outside of the traditional change process.
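The classification and recorded checks described in 1.3.2 might be modelled along the lines below. This is only a sketch: the rule that “small” changes go to the Product Owner while everything else still goes to the CAB is an assumption made for the example, and the `Change` type, the check names and the function names are all hypothetical rather than part of any backlog tool.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: only "small" changes bypass the CAB.
TEAM_APPROVED_CLASSES = {"small"}


@dataclass
class Change:
    title: str
    classification: str  # e.g. "standard", "small" or "emergency"
    checks_done: set = field(default_factory=set)


def approver(change):
    """Who approves the change under the agreed experiment."""
    if change.classification in TEAM_APPROVED_CLASSES:
        return "product owner"
    return "CAB"


def missing_checks(change, agreed_checks):
    """Checks agreed with change management that are not yet recorded."""
    return sorted(set(agreed_checks) - change.checks_done)


# Hypothetical checks agreed between the team and change management.
agreed = ["peer review", "unit tests", "integration tests"]

change = Change("Fix typo on login page", "small", {"peer review", "unit tests"})
print(approver(change))
print(missing_checks(change, agreed))
```

Keeping the record of completed checks next to the change itself is what lets the team show change management, and later auditors, that the agreed procedure was followed.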
  • 17. 1.3.3 Automate the Change ‘Checklist’. Working with the change management people, the team can identify the change ‘checklist’ and automate it. It’s likely to demand that certain tests are done, for example unit, integration and user acceptance tests. Higher levels of fluency also include non-functional tests for security (see Section 3.0) and performance. Having smaller changes and using trunk-based development (where there are a small number of short-lived feature branches) in a continuous integration and delivery (CICD) pipeline demands that these gates are completed before deployment. Peer review of the code and the change can also be baked into the product backlog workflow; this is fantastic for auditors, as is the version control that is the foundation of the CICD pipeline, as it not only shows how the process steps are followed but provides actual proof that they are. Adding monitoring means that the team receives customer feedback fast and has access to fast fault diagnosis. Using automated deployment (see Section 2.0) will give the team the opportunity to instantly redeploy the last known good state (caveat: not all failures are a change failure relating directly to the last change, and this can get more complex when we are delivering changes more frequently, but the alternative is to try to identify which change in a large batch caused the problem).
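An automated checklist of this kind can be thought of as a gate that runs every agreed check and keeps a record of the outcome. The sketch below is a minimal, tool-agnostic illustration: `run_gate` and the check names are hypothetical, and in practice each check would call out to a real test runner or review system in the pipeline rather than a stub.

```python
from datetime import datetime, timezone


def run_gate(checks):
    """Run every named check and return (all_passed, audit_log).

    Every check runs even after a failure, so the audit log is complete.
    """
    audit_log = []
    for name, check in checks.items():
        audit_log.append({
            "check": name,
            "passed": bool(check()),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })
    return all(entry["passed"] for entry in audit_log), audit_log


# Hypothetical checks; each stub stands in for a real pipeline stage.
checklist = {
    "unit tests": lambda: True,
    "integration tests": lambda: True,
    "security scan": lambda: False,
    "peer review": lambda: True,
}

may_deploy, log = run_gate(checklist)
print("deploy allowed:", may_deploy)
for entry in log:
    print(entry["check"], "passed" if entry["passed"] else "FAILED")
```

The audit log is the part auditors care about: it is evidence that the steps were actually run, not just a description of the process.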
  • 18. 1.3.4 Limit the Blast Radius: Canary Testing/Deployment If we recognise that the point of the change controls is to protect us from catastrophic failure, let’s define that. If the definition includes making a change that fails for everyone, we can tackle the ‘everyone’ element by making the change for only a few: canary testing or deployment. If it works for a few, we can then push it out to more, many and all. If it doesn’t, we revert, learn and try again. 1.3.5 Address System Dependencies Since much of the requirement for centrally coordinated change comes from tightly coupled systems and incidents caused by unpredictable system dependencies, reducing these dependencies is key to protecting the teams from system fragility. Teams take ownership of these architectural discussions and drive cross-team conversations through communities of practice/interest or agile at scale techniques such as Scrum of Scrums. Additionally, organizations practice inner-source where teams can see and change (with visibility and peer-review) each others’ systems. Conway’s Law tells us that we will design systems that look like our organizational communication structures; if our teams are autonomous and loosely coupled, so will be our systems architecture. Using a microservices and API model leads us to a place where we can test and deploy small pieces independently. It does give us more pieces to manage but that’s the trade-off.
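The ‘few, more, many, all’ progression of canary deployment in Section 1.3.4 can be sketched as a loop that widens exposure stage by stage and reverts at the first sign of trouble. This is a minimal sketch under stated assumptions: `deploy`, `healthy` and `revert` are hypothetical callbacks standing in for whatever deployment and monitoring tooling a team actually uses, and the stage percentages are arbitrary.

```python
from typing import Callable, Sequence


def canary_rollout(
    stages: Sequence[int],
    deploy: Callable[[int], None],
    healthy: Callable[[int], bool],
    revert: Callable[[], None],
) -> bool:
    """Expose the change to a growing share of users, reverting on failure.

    `stages` lists the percentage of users exposed at each step,
    for example [1, 10, 50, 100]. Returns True if the change reached
    the final stage, False if it was reverted.
    """
    for percent in stages:
        deploy(percent)
        if not healthy(percent):
            # Only the users in the current stage were affected.
            revert()
            return False
    return True
```

A failed health check at the 10% stage, for instance, means at most 10% of users saw the faulty change before it was reverted.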
  • 19. 1.4 Example Experiments “If the team completes a small change themselves next week, ensuring the change is visible in Jira, versioned in GitLab and Jenkins automates the build and runs the unit and integration tests and we peer-review the code and change, there won’t be a change failure as a result.” “We believe that when we classify changes to ‘small’, 60% of our changes won’t need to go through CAB and our lead time will reduce on average by one week in the next 4 months.” “Our architect thinks that we can uncover and break 20 dependencies by the end of the year if they are flagged in the Scrum of Scrums and 20% of all product teams’ sprint is allocated to this activity.” “I hypothesize that if we create a workflow in Jira this week that won’t allow a build to go green until the tests are passed and peer-review is complete, our change fail rate will drop over the next three months and over 10 sprints the central change team and CAB will accept this as evidence that we have followed their procedures and allow us change autonomy. Auditing will take no time at all when it comes around next March.”
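Experiments like those above can only be judged if change fail rate and lead time are measured consistently. The following sketch shows one way to compute both from a list of change records. It assumes, purely for illustration, that lead time is measured from commit to deployment and that each record carries a `failed` flag; the `Change` type and its field names are hypothetical, and a team may define these metrics differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Change:
    """Hypothetical record of one deployed change."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool  # True if the change caused a failure in production


def change_fail_rate(changes: Sequence[Change]) -> float:
    """Fraction of deployed changes that failed (0.0 when there are none)."""
    if not changes:
        return 0.0
    return sum(1 for c in changes if c.failed) / len(changes)


def mean_lead_time(changes: Sequence[Change]) -> timedelta:
    """Average time from commit to deployment across the given changes."""
    if not changes:
        return timedelta(0)
    total = sum((c.deployed_at - c.committed_at for c in changes), timedelta(0))
    return total / len(changes)
```

Comparing these figures before and after an experiment gives the team evidence to bring to the central change team.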
  • 20. 2.0 The DevOps Approach to Release We start here with a couple of key principles: 1. Release weekends are bad. 2. The DevOps ‘little and often’ approach: releases should be ‘like breathing’, not ‘hair on fire’ moments.
  • 21. For many organizations, it’s important to ensure everyone understands what is meant by ‘release’ and ‘deploy’, as usage of the two terms varies widely. Often people prepare a release, deploy it to production and then release it to customers. These distinctions become less important as DevOps fluency improves, but as teams and organizations evolve, they need to know they are speaking the same language. 2.1 Long Term Goal: Teams autonomously release on demand (CD) Here our teams can release their new features and fixes whenever they are ready. Their continuous delivery pipeline ensures that software is always in a releasable state and they may also have continuous deployment - on successful completion of all the tests, the change is automatically deployed and released into production. 2.2 Current Condition: Release weekends, calendars and managers Traditional ITSM processes have taught us to create release packages: large bundles of features. This is clearly in tension with our DevOps principle of little and often, where we reduce the risk of deploying a change by making it smaller. Because we have large, high risk releases we then schedule them and have people to manage these schedules. Teams have to wait until their slots in the calendar become available in order to perform their deployment to production and release to customers.
  • 22. 2.3 Next Target Conditions We want to balance two of the four key DevOps metrics here: the throughput metric for deployment frequency and the stability metric for change fail rate. Doing so will also reduce our lead time, and we want to avoid causing incidents that would then show up in our Mean Time to Recovery (see Section 5.0). Once again, value stream mapping is likely to show that, in traditional ways of working, the release and deployment process is lengthy, particularly if teams are required to interact with a release management team to book slots on a release calendar. 2.3.1 Define Release and Deploy and Reduce Batch Size As covered in Section 1.0, reducing the batch size reduces the risk and the queueing time. It’s important that people in an organization know what is meant when the words ‘release’ and ‘deploy’ are used, since usage does vary from organization to organization. People talk about ‘deploying a release’ or ‘releasing to production’.
  • 23. When changes are small they can be deployed easily with reduced risk of disruption and the distinction between the terms becomes less important. 2.3.2 Defer the Release Management Role to the Team When teams are working autonomously with small changes they can release them when they are ready. But it takes time to reach that place, and the path there involves planning the move from one state to the other. In a traditional way of working, there is likely to be a release manager or a team of release managers who are coordinating the release process. The release manager role can be transitioned to the team, with the team giving the release manager access to systems that provide visibility of the releases that are happening. 2.3.3 Increase the Availability of Release Slots Initially, when teams start using agile frameworks such as Scrum, they will aim to release at the end of a two week sprint, or perhaps at the end of several sprints. If the organization is working with a release calendar that may have quarterly or monthly release slots, they should look to increase the number of slots available to allow for the smaller and more frequent changes. In time, whether teams are using sprint or Kanban ways of working, they will evolve to releasing on demand or continuous delivery. At this point, no type of release calendar or management is required as the teams operate autonomously.
  • 24. 2.3.4 Automate the Release ‘Checklist’ and Deployment Process As with change, many organizations operate with a release checklist to ensure agreed policies and procedures are met. As with change, many of these steps, such as versioning and testing can be automated in the CICD pipeline and teams will release themselves; they build it, they own it. As with change, release and deployment autonomy is highly dependent on system autonomy but where systems remain tightly coupled, release management tools are also available to track and manage these system dependencies. These systems can also profile the risk associated with release. Deployment automation tools as part of the CICD pipeline further predictability in the process by reducing the manual effort associated with these tasks and providing patterns or templates that reduce configuration drift and allow for self-service in the teams. 2.3.5 Limit the Deployment Blast Radius: Blue/Green Deployments Organizations use the canary testing/deployment scenario described in Section 1.3.4 and also use feature toggles/flags and blue/green deployments to mitigate deployment failure risk. Toggling features on or off separates feature release from code deployment allowing code to be deployed to production while restricting access (through configuration) to a subset of users. It also allows unfinished code to undergo integration testing whilst remaining inaccessible when live and allows for A/B testing and canary testing/deployment.
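A feature toggle that restricts a deployed feature to a subset of users can be as small as the function below. It is a sketch under stated assumptions, not a description of any specific feature flag product: the function name, the feature and user identifiers, and the choice of hashing users into 100 buckets are all illustrative. Hashing keeps each user's experience stable between requests, and raising the percentage only ever adds users.

```python
import hashlib


def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Decide whether a deployed feature is released to a given user.

    The code for the feature is already in production; this check,
    driven by configuration, controls who can actually reach it.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash feature and user together so each feature gets its own subset.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    return bucket < rollout_percent
```

Setting `rollout_percent` to 0 keeps unfinished code dark in production, while 100 releases it to everyone without a new deployment.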
  • 25. Blue-green deployment is a technique that reduces downtime and risk by running two identical production environments called Blue and Green. At any time, only one of the environments is live, with the live environment serving all production traffic. For this example, Blue is currently live and Green is idle. As a new version of software is prepared, deployment and the final stage of testing take place in the environment that is not live: in this example, Green. Once the software is deployed and fully tested in Green, the router switches incoming requests to Green instead of Blue. Green is now live, and Blue is idle. This can also help with reducing the Route to Live (RtL), which reduces handoffs and opportunities for problems and improves flow.
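The Blue/Green switch described above can be modelled in a few lines. This is a toy model for illustration only: the `BlueGreenRouter` class, its method names and the `smoke_test` callback are assumptions made for this sketch, and a real router or load balancer would be driven through its own interface.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Toy model of a router in front of two identical environments."""

    ENVIRONMENTS = ("blue", "green")

    def __init__(self, live: str = "blue", live_version: str = "v1") -> None:
        if live not in self.ENVIRONMENTS:
            raise ValueError(f"unknown environment: {live}")
        self.live = live
        self.versions: Dict[str, str] = {live: live_version}

    @property
    def idle(self) -> str:
        """The environment that is not serving production traffic."""
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Deploy a new version without touching live traffic."""
        self.versions[self.idle] = version

    def switch(self, smoke_test: Callable[[str], bool]) -> bool:
        """Send traffic to the idle environment only if its final tests pass."""
        candidate = self.idle
        if candidate not in self.versions or not smoke_test(candidate):
            return False
        self.live = candidate
        return True

    def serving_version(self) -> str:
        """The version currently receiving production traffic."""
        return self.versions[self.live]
```

After a successful switch the previous environment still holds the last known good version, so switching back is the same cheap operation.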
  • 26. 2.3.6 Reducing the Route to Live, Leveraging Cloud Many organizations have complex RtLs containing multiple test environments and experience difficulties in production since these environments are not production-like. Teams also frequently have to share these environments and find it difficult to obtain good test data. The factor that most commonly prevents teams from having access to production-like test environments is cost. Using cloud technologies can ease the pain here (and research shows that using these types of technologies (public, private, hybrid or multi) correlates with higher performing organizations), allowing teams to easily spin up test environments as and when they are needed. Working in small increments, using blue/green deployments, automating testing, embedding testing in the team and Test Driven Development (TDD) all contribute to a reduction in the number of steps in the RtL, reducing the risk and accelerating the flow of value. Once more, value stream mapping uncovers how much time is spent stepping through the RtL. TDD is a software development process that relies on the repetition of a very short development cycle: first the developer writes an (initially failing) automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test, and finally refactors the new code to acceptable standards.
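One pass through the TDD cycle described above might look like the following. The example is deliberately tiny and entirely hypothetical: the `classify_change` function and its 50-line threshold are invented here to show the red, green, refactor order, not to suggest how changes should really be classified.

```python
# Step 1 (red): write the tests first. Run at this point, they fail,
# because classify_change does not exist yet.
def test_small_change_is_classified_small():
    assert classify_change(lines_changed=10) == "small"


def test_large_change_is_classified_standard():
    assert classify_change(lines_changed=500) == "standard"


# Step 2 (green): write the minimum code needed to make the tests pass.
SMALL_CHANGE_LIMIT = 50  # hypothetical threshold, chosen for illustration


def classify_change(lines_changed: int) -> str:
    """Classify a change by size so it can follow the right approval route."""
    return "small" if lines_changed <= SMALL_CHANGE_LIMIT else "standard"


# Step 3 (refactor): with the tests green, tidy the code (here, the
# threshold was pulled out into a named constant) and re-run the tests.
```

A test runner such as pytest would discover and run the `test_` functions automatically as part of the pipeline.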
  • 27. 3.0 The DevOps Approach to Security As well as DevOps, we have DevSecOps. Whilst not all in the industry are comfortable with the addition of another term (it has the potential to confuse people and create additional silos and handoffs) it recognizes that security has been late to the party, or that their invitation was sent late. In many organizations security represents a severe constraint, unsurprisingly since there are many reports of cybersecurity skills shortages, and security teams are often significantly separated from the rest of the technology team. It’s not uncommon, when performing value stream mapping exercises, to find delays of several weeks while teams wait for penetration tests.
  • 28. There are many who say that security is just another test and just another non-functional requirement, and whilst elements of this are true, it’s also true that the extreme separation of the security team and their often being seen as a ‘black-box’ means that incorporating them into the pipeline earlier (shifting left) is more difficult to do than with some other areas of testing. For example, it’s relatively easy for developers to start incorporating unit tests as part of their automated build process. Automated integration tests and user acceptance tests follow fast. 3.1 Long Term Goal: Checks in the IDE Here the security tests are pushed as far left as technically possible: into the developers’ hands, providing developers with the knowledge they need about vulnerabilities in the components that they are accessing from their IDE in the artifact repository and handing them control over the software supply chain. 3.2 Current Condition: Pen tests in Prod Most organizations perform regular or sporadic penetration tests or vulnerability assessments in production and many are required by regulators to do so and audited to ensure they happen. They can be done either manually or using tools, typically a combination of the two, and produce a report that is then passed to developers who work the actions into their backlog. Or not.
  • 29. 3.3 Next Target Conditions Ultimately the security constraint is broken so there is no wait time for security activities to complete and the teams are confident that their product is as uncompromisable as possible. We break the security constraint through culture and the sharing of knowledge and by automating checks and remediation. 3.3.1 Shifting Left and Automation As described, in DevSecOps security testing happens much earlier than penetration testing in production (although in many cases this may still need to happen, not just for auditing purposes but for configuration cases also). Where the teams are using artifacts, the repositories can be used to scan for and flag vulnerabilities at the point of software composition. The developer can be informed as they access a component of its vulnerability status and advised if another version fits the organization’s security policies better. If the teams don’t want developers interrupted in this way, non-compliant vulnerabilities can break the build in the CICD pipeline. Static and Dynamic Application Security Testing (SAST and DAST) are also used to test the source code and the application while it is running. IAST (Interactive Application Security Testing) analyzes code for security vulnerabilities while the application is running from inside the application and reports in real-time. As cloud and CICD proliferate machines, automated identity management tools are also recommended.
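Breaking the build on non-compliant vulnerabilities, as described in Section 3.3.1, amounts to comparing scan findings with the organization's policy and returning a failing exit code. The sketch below assumes a hypothetical `Finding` record and a four-level severity scale invented for this example; real scanners report findings in their own formats, and the acceptable severity is a policy decision for each organization.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical severity scale; higher numbers are more serious.
SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported against a component (illustrative shape)."""
    component: str
    severity: str


def violations(findings: Sequence[Finding], max_allowed: str = "medium") -> List[Finding]:
    """Return the findings whose severity exceeds what the policy allows."""
    limit = SEVERITY_ORDER[max_allowed]
    return [f for f in findings if SEVERITY_ORDER[f.severity] > limit]


def gate_exit_code(findings: Sequence[Finding], max_allowed: str = "medium") -> int:
    """Exit code for a pipeline step: non-zero breaks the build."""
    return 1 if violations(findings, max_allowed) else 0
```

A pipeline step would typically pass the result to `sys.exit`, so that any finding above the policy limit stops the change before deployment.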
  • 30. 3.3.2 DevSecOps Culture and Behaviors The relationship between development and security is fractious in many organizations, with security believing that developers don’t care about security and developers feeling that security are overly zealous, detailed and don’t understand the myriad of pressures that they are under. An effective pattern is to have security people work in a product team or feature squad on a temporary basis. Whilst there may not be a lot of security people to go around (some refer to the 100:10:1 ratio of developers:operations:security), the payoff is worth it as there are two key benefits; the first is the building of empathy and relationships and the second is knowledge transfer as the 80:20 rule applies here: 80% of the security issues relate to 20% of the knowledge. This 20% of knowledge is relatively easy for the engineers to access, retain and share in this scenario. Developers do care about security, since they care deeply about their code, particularly when they are transitioned to a ‘you build it, you own it’ way of working.
  • 31. They also care about the customer experience and about the organization that they work for - few people are ignorant of the wide ranging impact on company performance and reputation that a breach causes. However, they are focused first on new features that deliver value, then on improving the way in which they deliver value. Although we aim for multi-functional, ‘comb’-shaped people, nobody can know everything and to expect a developer to know of, understand and be able to remediate every possible vulnerability is unreasonable. To ask them to be aware of and follow visible coding policies and use tools that break the knowledge constraint is not unreasonable. 3.3.3 Customer Feedback and Bug Bounties In DevOps ways of working the focus is on the customer and the flow of value to them (The First Way). The Second Way teaches us to shorten and amplify our feedback loops. Highly evolved and performant organizations seek feedback from customers and the market on security too; they understand that transparency leads to trust. Having a public bug bounty programme is an effective way of collaborating with customers and the market to receive feedback and improve security posture.
  • 32. 3.3.4 Software is Like Milk, Not Wine New vulnerabilities are found and appear constantly so software that passed its security tests today may not tomorrow. Tools are available that continuously assess the bill of materials in applications and offer teams fast remediation capabilities. We can look forward to a future where products are automatically updated with security vulnerability fixes. 3.3.5 Belt and Braces Data breaches aren’t the only way for threat actors to cause problems with the operation and safety of an organization’s products. They can also mount other attacks, such as distributed denial of service. In order to protect yourself from this sort of attack you’ll need support from a cloud vendor or a specialist security vendor in this space. Whilst shifting security left and continuously scanning products in production for vulnerable components goes an enormous distance in protection against breaches, it’s doubtful that human penetration testing or vulnerability assessments on products in production will be in the past any time soon. Not only do regulators continue to require evidence of these activities, but humans are infinitely creative and will find configurations and routes into systems that may not directly relate to a specific vulnerable artifact.
  • 33. 3.4 Example Experiments “My hypothesis is that if we launch a bug bounty programme in January, then by the end of the first quarter, fifteen vulnerabilities of which we were unaware will have been brought to our attention and it will have cost us $15,000 from the bug bounty payout budget.” “As a developer, I believe I’ll fix 100% of security vulnerabilities on the same day if I know about them in my IDE. At the moment, I have 35 outstanding user stories in Jira flagged as issues found in a vulnerability assessment and they are between six and sixteen weeks old. I will be able to close all of them within 3 months using a tool in my IDE.” “If we introduce IAST into the CICD pipeline, we’ll be able to reduce our spend on production penetration testing by 30% per annum.” “If I automate the management of our machine identities, then our penetration tests will find no vulnerabilities as a result of, and we will suffer no data breaches traceable to, expired or misconfigured certificates.”
  • 34. 4.0 The DevOps Approach to Support Support people are typically the lowest paid and least respected in the technology hierarchy. Strange, when they are on the frontline, dealing with our customers, our reason for being, on a daily basis. The Second Way in DevOps is to amplify and shorten feedback loops - and in Value Stream Management we are particularly interested in customer feedback. So whilst the function of a support role is to fix customer problems, it’s also to sense customer sentiment and identify value delivery opportunities.
  • 35. 4.1 Long Term Goal: You build it, you own it, and/or swarming This way of working is centered around small (because of what we’ve learned about how humans build trust and social connections), autonomous (because we don’t want them to have to wait for decisions to be made on their behalf and because we hired them because they are capable of doing this themselves, and best-placed), multifunctional (because we don’t want them having to wait for other teams to do stuff for them) teams. They change and run their product. This isn’t about giving developers ‘pagers’; this is about having end-to-end ownership of a value stream. 4.2 Current Condition: 3 tiers As with all the traditional ITSM patterns described here, there are good reasons for why they have been widely implemented, and for some time they worked. But the world keeps turning and right now, digital disruption demands we all change the way that we work to optimize flow through a value stream. Having a support or service desk makes less sense when our users experience few problems or are mostly able to resolve them themselves using online documentation. If we want to shorten a feedback loop, it’s best not to have multiple handoffs through teams - delays don’t help with our flow or with delighting our customers.
  • 36. 4.3 Next Target Conditions Tiers create queues of work in progress which we seek to minimize as queuing creates delays. Whilst the tiered approach is intended to ‘protect’ the ‘best’ (read: most expensive) staff from trivial customer issues (is there such a thing?), when we seek to put the customer at the center of all we do and want them to have optimized service, why would we put our best people at the back of the process? So instead of streaming, we move to swarming. There are several models organizations work with, but they all follow these broad principles:
    • There should be no tiered support teams or hierarchy
    • There should be no escalations from one team to another
    • An issue should move directly to the person most likely to be able to resolve it
    • The person who takes the issue is the one who sees it through to resolution
    Swarming isn’t solely for Severity 1 issues or incidents (see Section 5.0 for more); it establishes teams whose priority is to ensure that the issue gets to the right person as fast as possible and that it receives attention as soon as possible.
  • 37. 4.3.1 Arrange Around Products Having small, autonomous and multi-functional teams arranged around products is the foundation to the ‘you build it, you own it’ mantra. Many agile transitions start by bringing developers and testers into the same team along with the ideation capabilities (Product Owners and business analysis roles). DevOps and value stream thinking brings Ops capabilities into the team too and many teams start with support roles. This isn’t simply about putting the developers on 24/7 call duties but about automating the front end of support as far as possible and getting the issue in front of the right person as soon as possible. DevOps balances throughput and stability so as organizations improve their posture, teams experience a reduction in the volume of issues and a shortening of resolution time. When teams are dedicated solely to support issue resolution, they often find Kanban a suitable way of managing the flow of work. Where teams are working in development sprints, they may find it helpful to record unplanned work and practice assigning a percentage of the sprint to it. Unplanned work is an effective proxy metric for quality and when measured is extremely useful when teams want to assign time to invest in paying down technical debt.
  • 38. 4.3.2 Automating Support: ChatOps and Bots ChatOps is the use of a group messaging tool integrated with the DevOps toolchain. Chat channels can be created as needed (typically for an incident) or be in permanent use (typically for a theme for a particular product). Section 5.0 following describes an incident management use case for ChatOps. A swarming support use case might allow the receiver of the customer issue to access a specific backlog channel and request interaction from that product team, or the team may have their own channels for support issues relating to topics such as the payment gateway. The service desk can also encourage customers/consumers of their service to interact via online chat once they have been guided through available topics and support artifacts in a knowledge base. Bots can try to resolve the issue initially and as needed the issue can be automatically routed to the team and swarmed from there.
  • 39. 4.3.3 Automating Support: Knowledge and Self-Service Many people don’t enjoy committing extended periods to writing and documentation; however, to optimize a value stream, ‘just enough’ documentation is key. Underpinning this is the ‘little and often’ principle: ensuring that small pieces are documented frequently at source and held in a repository that is easily searchable and visible. This takes the burden off the support team as people can find and resolve common issues themselves, leaving the support swarms to work with the edge cases. 4.3.4 Telemetry Everywhere and Viewability/Observability Much of the waste in the support value stream is in the fault diagnosis (after we’ve removed delays through handoffs in a tiered model) so the team needs data to help them identify unknown and unusual issues. Support teams are frequently poorly supported by tooling, other than ticketing systems, so providing the product teams with tools that radiate telemetry means everyone in the team can benefit. Application monitoring and logging tools accelerate the identification of the root cause(s) of an issue (and these should be used in pre-production too) - it’s over to the team then to fix it fast - but their CICD pipeline will help validate and deploy it at speed. And it’ll be an emergency fix or a small change so they won’t be slowed down by CAB or the release schedule.
  • 40. These types of tools also provide customer journey insights and real-time feedback on the business value of features and changes that the whole team can use in the sprint reviews to check the outcome of their hypotheses and in their sprint planning to set up their next round of experiments. 4.3.5 CICD and Intelligent Risk Management Once a team is collaborating on a shared and visible backlog and is proficient in performing continuous delivery, they will have reduced their incidents and improved their MTTR. AI tools that assess the risk of a release help teams make decisions on when to act and who to have pre-warned. Having this data visible to central release teams provides evidence, builds trust and earns the right to autonomy.
  • 41. 4.4 Example Experiments “We believe that if we set up a backlog swarm, we can resolve 50% of backlog items over 6 months old in 4 working weeks.” “My hypothesis is that if we have an incident swarm using ChatOps, we’ll reduce our MTTR by 70%.” “Implementing an application performance management tool by the end of the month means that by the end of next month we’ll see our fault diagnosis time drop by at least 20%.” “Making our knowledge base publicly searchable will likely reduce the volume of tickets by 25% within 6 months.”
  • 42. 5.0 The DevOps Approach to Incidents “Incidents are unplanned investments; their costs have already been incurred. Your org’s challenge is to get ROI on those events. Right now, in most companies, this ROI is left sitting in the dark because of the “template-driven” approaches and “action item” myopia.” - John Allspaw
  • 43. We are taught, in all cultures, from an early age, that failure is to be avoided at all costs, and that it’s shameful and humiliating. It’s only as we grow up and experience more in life that we realise failures are not only inevitable, but useful for learning and light the path to success. In many large enterprises there is deep-seated fear of failure (understandably so since many organizations operate infrastructure whose availability is critical to many). Incidents will happen; however, DevOps practices allow us to increase the flow of work through the value stream whilst increasing stability so more value delivered does not equal more incidents to deal with. 5.1 Long Term Goal: ChatOps in/across teams The goal of incident management is to restore service as soon as possible and, arguably more importantly, learn from it. ChatOps supports this goal in two key ways. Firstly, it allows teams to swarm through a channel in real time so that everyone has everything visible through a single pane of glass (contrast this to some people being in a room, on a conference call, various team members logged into and observing various systems) and records the progress and process. Secondly, the team has access to their DevOps toolchain and can both receive information and issue commands from the chat window.
  • 44. 5.2 Current Condition: War rooms, incident managers The cultural driver for DevOps is the creation of a working space in which people can be their best and most productive selves; removing risk of burnout and nurturing autonomy, mastery and purpose. ‘War rooms’ immediately set a sense of crisis and conflict. Whilst the sense of urgency should a Severity 1 issue or incident occur should not be diminished, a number of steps can be taken to move from a place where incidents are catastrophic and to be avoided at all costs to one where impact is minimal and they are valued as a learning opportunity. DevOps regularly seeks to decentralize activities, especially when they have been centralized in order to manage dependencies. Since autonomy reduces handoffs and queueing, assigning an incident manager from a separate team because systems are so complex is unlikely to be the fastest way to restore service. 5.3 Next Target Conditions Ultimately the volume of incidents, or at least the time spent dealing with them, should be as close to nil as possible since they are the main disruptor of the delivery of planned work or value to the customer.
  • 45. 5.3.1 Blameless Retrospectives and Experimentation Rather than having war-rooms, swarm an incident and once service is restored, hold a blameless retrospective over ChatOps. Agree learnings, write actions as experiments and save the chat log to the ticket in the backlog. Close the ticket only once the initial experiments are complete. 5.3.2 Reframing Failure, Safety Culture and The Andon Cord Another tool from the kings of Lean, Toyota, the Andon Cord is used in a manufacturing pipeline to raise an issue. But what’s important about it is the behavior and culture it created. Workers were encouraged and empowered to highlight potential defects with the knowledge that their leaders wanted to know about them and fix them at the earliest opportunity before they continued downstream. Much can be taken from the Andon Cord: that successful leaders embrace and are grateful for learning opportunities and encourage their teams to self-discover, that fixing the problem immediately and preventing it from proceeding downstream is key to building a quality product and that people are psychologically safe when they are not afraid to point out mistakes or try new things. Safety culture can be broadly defined as a place where all in an organization share a view on how best to mitigate risk in their environment and they prioritize learning over failure and create mechanisms to protect themselves from catastrophic failure.
  • 46. In an environment where these mechanisms are discovered, perhaps through value stream mapping, to be slowing the flow, using the mechanisms described here for change, release, security, support and incident management accelerates the delivery of value. 5.3.3 Automation: Site Reliability and Chaos Engineering Several of the automation techniques we have already discussed in this paper help either to reduce the likelihood of major incidents happening (CICD, limited blast radius) or make them more manageable (telemetry, ChatOps, automated deployment). Organizations also look to Site Reliability Engineering (SRE) to improve their stability posture. “SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor. In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.” - Ben Treynor, founder of SRE at Google
  • 47. Some organizations have teams of SREs; others look to embed this role in product or feature teams/squads. Whichever model is used, the principle is to increase the focus on antifragility, and SRE has this goal in common with Chaos Engineering. The best-known example of Chaos Engineering is Netflix’s Chaos Monkey, which is essentially a fire drill. With an actual fire. 5.4 Example Experiments “We believe if we use chaos engineering to practice incident recovery 4 times this year, we’ll find ways to improve that will reduce our MTTR by 50% next year.” “I hypothesize that if I ask two members of my product team, one with a background in development and the other in system administration, to extend their skillsets to include site reliability engineering, they will cross-skill and buddy each other. As a result, our change fail rate will drop by 5% in 6 months.” “My experiment says that if we close our incident tickets only when all experiments have been completed, we will be able to document 25 key learnings in our knowledge base in the first quarter of the new practice.”
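Experiments like these are only testable if the baseline and the outcome are measured the same way. As a minimal sketch, assuming incidents are recorded as (detected, restored) timestamp pairs, MTTR and change fail rate could be calculated as follows. The function names and the sample incidents are hypothetical, invented purely for illustration.

```python
from datetime import datetime
from statistics import mean


def mttr_hours(incidents):
    """Mean time to restore, in hours, over (detected, restored) pairs."""
    return mean(
        (restored - detected).total_seconds() / 3600
        for detected, restored in incidents
    )


def change_fail_rate(failed_changes, total_changes):
    """Fraction of changes that caused a failure in production."""
    return failed_changes / total_changes


def target_met(baseline, current, reduction):
    """True if `current` is at least `reduction` (a fraction) below `baseline`."""
    return current <= baseline * (1 - reduction)


# Hypothetical incident records, not taken from the paper.
last_year = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 13, 0)),    # 4 hours
    (datetime(2023, 9, 12, 22, 0), datetime(2023, 9, 13, 4, 0)),  # 6 hours
]
this_year = [
    (datetime(2024, 2, 5, 10, 0), datetime(2024, 2, 5, 12, 0)),   # 2 hours
    (datetime(2024, 7, 20, 15, 0), datetime(2024, 7, 20, 18, 0)), # 3 hours
]

baseline = mttr_hours(last_year)
current = mttr_hours(this_year)
print(f"MTTR went from {baseline:.1f}h to {current:.1f}h")
print("50% reduction achieved:", target_met(baseline, current, 0.5))
```

Writing the target as a function of the baseline keeps the hypothesis falsifiable: the experiment either meets the stated reduction or it does not.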
  • 48. Conclusion: Flow & Value Stream Management Taking a value stream approach to service delivery prioritizes optimizing the flow of work from the idea to the realization of value in the hands of the customer. Necessarily, it demands a rethink of traditional approaches and organizational practices, just as becoming agile and product-focused demands that we rethink an inherently waterfall and project-centric approach.
  • 49. Value Stream Mapping is an extremely valuable and effective method for quantifying the cycle time, waste and cost associated with delivering an iteration of a product or service. It also provides a great deal of qualitative data through the visual collaboration and human conversation it drives. Good value stream mapping exercises are held regularly and deliver backlogs of improvements which are steadily and iteratively worked through. The disadvantage of Value Stream Mapping is that it’s a human-driven and opinion-driven process. Whilst those opinions are mostly accurate (and a big part of the value stream mapping process is understanding the system and building empathy for counterparts in the end-to-end lifecycle of the product or service), they do not come with data as evidence. Since improvements in value stream flow are likely to necessitate significant and far-reaching decisions about things like the roles in the organization, the organizational design, how work is funded and how investments are prioritized, it’s helpful for the people making those decisions to be as well-informed as possible and able to monitor feedback, learnings and evolutionary progress over time. Following our telemetry-everywhere mantra, it’s best to support human-driven value stream mapping efforts with data-driven value stream management evidence.
  • 50. When building a CICD pipeline or DevOps toolchain, choices can be made about how value is traced through the delivery lifecycle. Teams can build integrations between the tools themselves or use available connectors and APIs (though this might make it difficult to swap tools out as needs inevitably change), or integration brokers can be used to pass the feature/code from one tool to another as it progresses. Since we want feedback for learning, we want all of this to be visible, so some organizations use dashboards. But when a dashboard is effectively just screen-scraping data from a number of tools and presenting it in a single pane of glass, it’s very difficult to understand the end-to-end cycle time of delivering a piece of value. Value Stream Management tooling allows simple integration within a toolchain, which future-proofs it for ongoing evolution, and collects data that shows not only the cycle time but also where the flow is slow and risky, providing insights for improvements.
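To make the idea of end-to-end cycle time concrete, here is a minimal sketch, assuming each tool in the chain can report when a piece of work started and finished in its stage. The stage names, timestamps and function names are hypothetical and do not represent any particular product's data model. The sketch derives the cycle time, the waiting time at each handoff and the flow efficiency (active time divided by cycle time).

```python
from datetime import datetime

# Hypothetical stage events for one feature: (stage, started, finished).
events = [
    ("develop", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 17, 0)),
    ("review", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 10, 0)),
    ("deploy", datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 0)),
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def cycle_time_hours(events):
    """Elapsed time from the first stage starting to the last stage finishing."""
    return hours(events[-1][2] - events[0][1])


def wait_before_stage(events):
    """Idle time at each handoff, keyed by the stage that was waiting to start."""
    return {
        nxt[0]: hours(nxt[1] - prev[2])
        for prev, nxt in zip(events, events[1:])
    }


def flow_efficiency(events):
    """Share of the cycle time in which work was actively being done."""
    active = sum(hours(finished - started) for _, started, finished in events)
    return active / cycle_time_hours(events)


waits = wait_before_stage(events)
slowest_handoff = max(waits, key=waits.get)

print(f"Cycle time: {cycle_time_hours(events):.0f}h")
print(f"Longest wait is before '{slowest_handoff}': {waits[slowest_handoff]:.0f}h")
print(f"Flow efficiency: {flow_efficiency(events):.0%}")
```

In this invented example most of the elapsed time is waiting rather than working, which is the kind of evidence that can back up what people report in a mapping workshop.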
  • 52. About the Author Helen Beal helps people practice DevOps principles in real-world organizations for Ranger4. She describes herself as a DevOpsologist, as her main role in her working life is to study the inputs and outputs of the thinking systems that make up DevOps, the value outcomes they deliver and how we can measure them. Helen is also a product owner and DevOps Ambassador for London at the DevOps Institute and a DevOps editor for InfoQ, and she writes for a number of online platforms. Outside of DevOps she is an ecologist and novelist. She once saw a flamingo lay an egg and has a particular fondness for llamas.