Many clusters, many problems? Having many clusters has benefits: reduced blast radius, less vertical scaling of cluster components, and a natural trust boundary. In this session, Zalando shows its approach for running 140+ clusters on AWS, how it does continuous delivery for its cluster infrastructure, and how it created open-source tooling to manage cost efficiency and improve developer experience. The company openly shares its failures and the learnings collected during three years of Kubernetes in production.
AWS re:Invent session OPN211 on 2019-12-05
4. 4
~ 5.4billion EUR
revenue 2018
> 300
million
visits
per
month
~ 14,000
employees in
Europe
> 80%
of visits via
mobile devices
> 28
million
active customers
> 400,000
product choices
> 2,000
brands
17
countries
as of June 2019
ZALANDO AT A GLANCE
5. 5
2015: JOURNEY INTO THE CLOUD
AWS
STUPS
DOCKER
DEPLOY
SSH
ACCESS
AUDIT
REPORTS
FULL AWS
ACCESS
Teams have
admin access
& full
responsibility
6. 6
2015: ISOLATED AWS ACCOUNTS
Internet
*.abc.example.org *.xyz.example.org
Team ABC Team XYZ
EC2EC2
ELBELB
EC2
7. 7
INFRASTRUCTURE @ ZALANDO
STUPS
(toolset around AWS)
Kubernetes
AWS accounts per team.
All instances must run the same AMI.
PowerUser access to Production.
Clusters per product (multiple teams).
Instances are not managed by teams.
Hands off approach.
You build it, you run EVERYTHING. A lot of stuff out of the box.
11. 11
YOU BUILD IT, YOU RUN IT
The traditional model is that you take your software to the
wall that separates development and operations, and
throw it over and then forget about it. Not at Amazon.
You build it, you run it. This brings developers into
contact with the day-to-day operation of their software. It
also brings them into day-to-day contact with the
customer.
- A Conversation with Werner Vogels, ACM Queue, 2006
12. 12
ON-CALL: YOU OWN IT, YOU RUN IT
When things are broken,
we want people with the best
context trying to fix things.
- Blake Scrivener, Netflix SRE Manager
13. 13
GOALS
• No manual operations
• No pet clusters
• Reliability
• Autoscaling
• Latest Kubernetes
• Cost efficient
14. 14
ARCHITECTURE
Pairs of clusters, each cluster in isolated account
AWS Acc. foobar-test
Cluster
foobar-test
AWS Acc. foobar
Cluster
foobar
23. 23
E2E TESTS ON EVERY PR
github.com/zalando-incubator/kubernetes-on-aws
24. 24
E2E TESTS
Conformance Tests
Upstream Kubernetes e2e conformance tests
✓
159
Zalando Tests (custom)
Custom tests for ingress, external-dns, PSP
etc.
17
StatefulSet Tests
Rolling update of stateful sets including volume
mounting
✓
2
✓
25. 25
RUNNING E2E TESTS
Control plane
nodenode
Control plane
nodenode
branch: alpha (base) branch: dev (head)
Create Cluster Update Cluster Run e2e tests Delete Cluster
Testing dev to alpha upgrade
Control plane Control plane
27. 27
NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 3
Max: 9
Current: 5
Desired: 5
28. 28
NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 6
Max: 6
Current: 5
Desired: 6
Set ASG size to current + 1
29. 29
NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 6
Max: 6
Current: 6
Desired: 6
drain
Get a new instance
drain
30. 30
PROBLEMS WITH THE NAÏVE STRATEGY
What about stateful applications like Postgres?
Node
master Node
Node
replica
replica
drain
Postgres cluster unavailable :(
50. 50
CLUSTER CONFIGURATION
Clusters look mostly the same, except:
• secrets, e.g. credentials for external logging provider
• node pools and their instance sizes
Cluster-specific config items are stored in Cluster Registry
55. 55
MONITORING SYSTEM - ZMON
• Dynamic entity registration
(clusters, pods, ..)
• Generic checks on entity attributes,
e.g. for all production clusters
"Less than 60% of worker nodes are ready"
• OpsGenie alerts
62. 62
KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
65. 65
HOW MUCH DO WE DIVERGE?
• API access via Zalando OAuth
• CPU throttling disabled via Kubelet flag
• No memory overcommit (requests == limits)
• Ingress: External DNS, Skipper, AWS ALB
• Custom CRDs: Zalando OAuth, Postgres, StackSet
• Kubernetes Downscaler
• DNS setup (CoreDNS DaemonSet, ndots: 2)
67. 67
DNS: COREDNS AS DAEMONSET
github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md
68. 68
NON-PROD VS PROD
• Non-production similar to plain hosted Kubernetes
• Production:
• No write access (only via CI/CD)
• Compliance webhooks
• Require production-ready Docker images
69. 69
COMPLIANCE FOR PRODUCTION
• Pods require application label pointing to application registry
⇒ establishes link to owning team
• Docker images must be built from master via CDP
NOTE: teams can freely choose their namespace(s)