Kubernetes at Datadog
the very hard way
Laurent Bernaille and Rob Boll
Table of Contents
● Background
● Platform challenges
● Shooting ourselves in the foot
● Future plans
Background
New instance of Datadog
● Completely independent and isolated
● Launch on a different cloud provider
● Fresh start, leave legacy behind
Background
New platform requirements
● Small infra team, high leverage, low touch
● Support for multiple cloud providers
● Self service, API driven, automation friendly
● Meet our scale now, and years from now
Why Kubernetes?
Kubernetes hits all the requirements
● Extensible & API driven (but changing fast)
● Large active community (but immature)
● Scalable architecture (but unproven at scale)
● Multiple cloud providers supported (kind of)
Why Kubernetes?
+ Dogfooding!
Hope is not an option
Platform challenges
● Certificates
● Runtime
● Networking
● Cloud integrations
● Ecosystem
● Scale
Certificates
Setup
● Vault + TLS bootstrap
● Refresh certificates every 24h
The fun part
● etcd did not reload certs for client connections using IP addresses
● Kubernetes master components don’t reload certificates
● Flaky bootstraps (vault dependency)
● vault-agent and vault-sidekick
Runtime: containerd
The good
● Lightweight
● Great development team easily accessible
The bad
● Not as battle-tested as docker
● Several small issues
● Many tools assume docker (Datadog agent used to)
The ugly
● Remaining issue: shims sometimes hang and require kill -9
Runtime: containerd
Not containerd specific...
# We simply kill the process when there is a failure. Another systemd service will
# automatically restart the process.
function docker_monitoring {
  while [ 1 ]; do
    if ! timeout 60 docker ps > /dev/null; then
      echo "Docker daemon failed!"
      pkill docker
      # Wait for a while, as we don't want to kill it again before it is really up.
      sleep 120
    else
      sleep "${SLEEP_SECONDS}"
    fi
  done
}

function kubelet_monitoring {
  echo "Wait for 2 minutes for kubelet to be functional"
  # TODO(andyzheng0831): replace it with a more reliable method if possible.
  sleep 120
  local -r max_seconds=10
  local output=""
  while [ 1 ]; do
    if ! output=$(curl -m "${max_seconds}" -f -s -S http://127.0.0.1:10255/healthz 2>&1); then
      # Print the response and/or errors.
      echo $output
      echo "Kubelet is unhealthy!"
      pkill kubelet
      # Wait for a while, as we don't want to kill it again before it is really up.
      sleep 60
    else
      sleep "${SLEEP_SECONDS}"
    fi
  done
}
Running on all GKE instances: /home/kubernetes/bin/health-monitor.sh
● Restarts docker if docker ps hangs for 60s
● Restarts kubelet if healthz takes more than 10s
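Networking: Overlays
[Diagram: container C0 (10.0.0.10) on Node 0 (eth0 192.168.0.100) and container C1 (10.0.0.11) on Node 1 (eth0 192.168.0.Y); each container namespace connects through a veth pair to bridge br0, which also holds the vxlan interface. A packet from C1 to C0 is encapsulated as: outer IP (src 192.168.0.Y, dst 192.168.0.100) / UDP (src X, dst 4789) / VXLAN (VNI) / original L2 frame carrying IP (src 10.0.0.11, dst 10.0.0.10)]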
Networking: Native pod routing
[Diagram: Node 0 and Node 1 with interfaces eth0 and eth1; container namespaces C0 and C1 attached via ipvlan or veth; addresses shown: 10.0.0.10, 10.0.1.10, 10.0.1.11]
Networking: Native pod routing
Objectives
● Avoid overlays
● Avoid bridges (PTP or IPVLAN)
● Route from non kubernetes hosts/between clusters
Challenges
● Beta features
● Young CNI plugins
○ Bugs (Nodeports, inconsistent metadata)
○ Much more complicated to debug
➢ Good relationship with developers of the Lyft plugins
Networking: ingresses, default
Ingresses: native pod routing
Ingresses: native pod routing
More efficient
● No need to add all nodes to load-balancers
● Simpler data path
Not that simple
● Very recent
● A few bugs (unstable cloud provider features, single CIDR VPC)
➢ Getting fixes upstream was quite easy
Networking: kube-proxy
IPVS instead of iptables
Sounded too good to be true
● Faster (traffic and refresh)
● Cleaner (almost no iptables rules)
Was too good to be true
● Unable to access services from the host
● Regression in 1.11 breaking ExternalTrafficPolicy: Local
● No localhost:nodeport
● No graceful termination
➢ Very good relationship with the Huawei team
➢ Almost everything has been fixed, working great so far
Networking: IPv6 and DNS
“Sometimes my DNS queries take more than 5s/time out”
“Yeah right”
[narrator]: “Well actually...”
➢ Race condition in the conntrack code
➢ We disabled IPv6 in the kernel
➢ Also had to disable native Go resolution
Cloud integrations
Many small edge-cases
● Different Load-balancer behaviors
● Magically disappearing instances (zones / instance state)
● Some “standard” controller config like “cidr-allocator-type”
(almost) No doc
controller/nodeipam/ipam/cloud_cidr_allocator.go
providers/aws/aws.go
Ecosystem
Good news
● Very dynamic
● Usually easy to get PRs merged
Bad news
● Not heavily tested / Limited to basic setup
● (very) Limited doc: localVolumeProvisioner and mount paths
● Almost never tested on large clusters
○ cluster-autoscaler doesn’t work with more than 50 ASGs
○ we abandoned kubed when its memory usage reached 10+GB
○ metrics-server: doesn’t start with a single NotReady node
○ kube-state-metrics: pod/node collector generates ~100MB payloads
○ voyager: map sort bug led to continuous pod recreation
Scaling nodes: 100 => 1000
● API server: high load but OK
○ Careful with file descriptors, CPU, memory
○ TargetRAM helped a lot (avoid OOM)
● Controller/Scheduler with high load too
○ Competing for CPU => split
○ Thinking about splitting controllers but this is hard/impossible
● etcd imbalance => shuffle etcd endpoints on API servers
○ Works pretty well, alternatives were a bit scary (gRPC proxy)
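The endpoint shuffle mentioned above can be sketched as follows. This is a minimal illustration, not the setup used in the talk: `shuffle_endpoints` is a hypothetical helper and the etcd URLs are placeholders. It assumes every API server is given the same comma-separated list through `--etcd-servers` and that the client tends to prefer the first entry, so reordering the list per API server spreads connections across etcd members.

```shell
# Hypothetical helper: print a comma-separated endpoint list in random order.
# Each endpoint is prefixed with a random key, sorted on that key, then re-joined.
# Note: awk's srand() seeds from the clock, so two runs within the same second
# produce the same order.
shuffle_endpoints() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |
        sort -k1,1n | cut -f2- | paste -sd, -
}

# Placeholder endpoints; the result would be passed to kube-apiserver.
ETCD_SERVERS=$(shuffle_endpoints "https://etcd-0:2379,https://etcd-1:2379,https://etcd-2:2379")
echo "--etcd-servers=${ETCD_SERVERS}"
```

The list content is unchanged, only the order differs, so no API server loses access to any etcd member.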
Scaling nodes: 100 => 1000
Routes on API servers (registration + daemonsets)
CoreDNS issues
● Not enough nodes for coredns pods: "nodesPerReplica":16
● Memory limits leading to OOMkills (mem usage with “pods: verified”)
NAME                       READY   STATUS             RESTARTS   AGE
coredns-7b4d675999-22d4q   0/1     CrashLoopBackOff   40         6h
coredns-7b4d675999-2lt5w   0/1     ImagePullBackOff   135        17h
coredns-7b4d675999-45s94   0/1     CrashLoopBackOff   41         6h
coredns-7b4d675999-4dfbt   0/1     CrashLoopBackOff   140        17h
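For context on the "nodesPerReplica" setting: the key matches the linear mode of cluster-proportional-autoscaler, where the replica count grows with the number of nodes. The sketch below assumes that this autoscaler is what sizes CoreDNS here, and it only models the node-based term (the real formula also has a cores-based term and configurable bounds). `coredns_replicas` is a hypothetical helper, and the floor of 1 replica is an assumption.

```shell
# Hypothetical helper: replicas = ceil(nodes / nodesPerReplica), never below 1.
coredns_replicas() {
    nodes=$1
    nodes_per_replica=$2
    # Integer ceiling division.
    replicas=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
    if [ "$replicas" -lt 1 ]; then
        replicas=1
    fi
    echo "$replicas"
}

coredns_replicas 100 16    # prints 7
coredns_replicas 1000 16   # prints 63
```

Under these assumptions, going from 100 to 1000 nodes multiplies the number of CoreDNS pods by about nine, which is worth keeping in mind when sizing their memory limits.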
Scale: create 200 deployments
Very hard on all components (apiservers, controller, scheduler, etcd)
Maxed-out the scheduler QPS (too many tunables)
Main issues: apiserver sizing and traffic imbalance
Footguns
DaemonSets
StatefulSets
Cargo culting
Zombies
Containers, not VMs
Rolling updates
OOM
InitContainers
Native resources / external config
Manual changes
DaemonSets
High DDoS risk
● APIservers, vault
● Cloud provider API rate-limits
● Very slow or very dangerous
Pod scheduling
● Stuck rollout
● 1.12 scheduler alpha...
app broke due to permission change
container in restart loop
ImagePullPolicy: Always
Stateful sets
Persistent volumes
● Local: Cloud provider disk errors
● Local: New node with same name
● EBS scheduler
localvolumeprovision: discovery.go
Stateful sets
Scheduling tricks
kubectl get sts myapp
NAME    DESIRED   CURRENT   AGE
myapp   5         4         5d
kubectl get pods -lapp=myapp
NAME      READY   STATUS             RESTARTS   AGE
myapp-0   1/1     Running            0          5d
myapp-1   1/1     Running            10         6m
myapp-2   1/1     CrashLoopBackOff   10         6m
[?]
myapp-4   1/1     Running            0          5d
Cargo culting
How can I keep container running on Kubernetes?
https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes
Zombies
ps auxf | grep -c defunct
16018
readinessProbe:
  exec:
    command: [server_readiness_probe.sh]
  timeoutSeconds: 1
Takeaway
● Careful with exec-based probes
● Use tini as pid 1 (or shared pid namespace)
Containers, not VMs
Complex process trees and many open files are very hard on the runtime
Native resources / external config
[Diagram: a manually configured load-balancer and its healthchecker target the NodePorts (NP) on the nodes; kube-proxy forwards NodePort traffic to the service's pods]
Synchronisation
● NodePort
● External IP
Issues
● Keeping track: Load-balancer External IP re-assigned to node
● Port conflict on Internal Load-Balancer (port 443, broke apiservers…)
Rolling updates
spec.replicas and HPA
● spec.replicas overrides HPA settings
● Removing spec.replicas from the manifest is not enough
● Recommended solution: edit the “last-applied” deployment first
OOM Killer
Limits too low will trigger “cgroup oom”
Requests too low (or 0) will trigger “system oom”
Not a surprise, but do spend some effort on sizing
InitContainers
Requests/Limits
● Pod Resources = MAX(MAX(initContainers), sum(containers))
● LimitRanger also applies to InitContainers
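The formula above can be checked with a small worked example. This is only an illustration: `effective_request` is a hypothetical helper, the values are made-up CPU requests in millicores, and it covers nothing beyond the formula on the slide.

```shell
# Hypothetical helper: effective pod request for one resource.
#   $1: space-separated requests of the init containers
#   $2: space-separated requests of the regular containers
# Result: MAX(MAX(initContainers), sum(containers))
effective_request() {
    init_max=0
    for r in $1; do
        if [ "$r" -gt "$init_max" ]; then
            init_max=$r
        fi
    done
    sum=0
    for r in $2; do
        sum=$((sum + r))
    done
    if [ "$init_max" -gt "$sum" ]; then
        echo "$init_max"
    else
        echo "$sum"
    fi
}

effective_request "2000 100" "250 250"   # prints 2000: the big init container wins
effective_request "100" "250 250"        # prints 500: the sum of containers wins
```

In the first case a single init container requesting 2000m makes the whole pod schedule as if it needed 2000m, even though the running containers only request 500m in total.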
Inconsistent behavior on container restarts
● Usually not restarted
● Except when Kubernetes no longer knows their exit status
Manual changes
Untracked kubectl changes
kubectl apply / edit
Partial chart apply
kubectl apply deploy
=> checksum/config_templates: 840ae1e0b4b2f7b5033edd34fd9eb88b55dc914adca5c
^ Hash for the updated config map, but not deployed
Chart deletion
kubectl delete -f <dir> (the directory contained the namespace)
Future Plans
Control plane isolation
➢ “Meta” cluster for control planes
Better load balancing
➢ Avoid imbalanced API servers
➢ Avoid imbalanced etcd
Container-Optimized images
➢ In-place upgrade for data stores
Custom controllers
➢ We already have a few but will build new ones
Conclusion
The Kubernetes Control plane is very complex
➢ Crazy number of options/tunables
➢ Low-level components are great but some bugs remain
➢ The ecosystem is very young
➢ Reading the code is not optional (and fixing part of it)
Account for culture change and training
➢ Kubernetes resources don’t look that complicated
➢ Many, many edge-cases and pitfalls
Instrument the audit logs
➢ Very high value to debug performance issues
➢ Also helps understand history of interactions with a specific resource
➢ Expensive (very verbose: we are at 1000+ logs/s) but game changing for us
Thank you
Also, we’re hiring

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Dernier (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Kubernetes at Datadog the very hard way

  • 1. Kubernetes at Datadog the very hard way Laurent Bernaille and Rob Boll
  • 2. Table of Contents
    Background
    Platform challenges
    Shooting ourselves in the foot
    Future plans
  • 3. Background: New instance of Datadog
    ● Completely independent and isolated
    ● Launch on a different cloud provider
    ● Fresh start, leave legacy behind
  • 4. Background: New platform requirements
    ● Small infra team, high leverage, low touch
    ● Support for multiple cloud providers
    ● Self service, API driven, automation friendly
    ● Meet our scale now, and years from now
  • 5. Why Kubernetes?
    Kubernetes hits all the requirements
    ● Extensible & API driven (but changing fast)
    ● Large active community (but immature)
    ● Scalable architecture (but unproven at scale)
    ● Multiple cloud providers supported (kind of)
  • 8. Hope is not an option
  • 10. Certificates
    Setup
    ● Vault + TLS bootstrap
    ● Refresh certificates every 24h
    The fun part
    ● etcd did not reload certs for client connections using IP addresses
    ● Kubernetes master components don’t reload certificates
    ● Flaky bootstraps (vault dependency)
    ● vault-agent and vault-sidekick
  • 12. Runtime: containerd
    The good
    ● Lightweight
    ● Great development team, easily accessible
    The bad
    ● Not as battle-tested as docker
    ● Several small issues
    ● Many tools assume docker (Datadog agent used to)
    The ugly
    ● Remaining issue: shims sometimes hang and require kill -9
  • 13. Not containerd specific...
    Running on all GKE instances: /home/kubernetes/bin/health-monitor.sh
    ● Restart docker if docker ps hangs 60s
    ● Restart kubelet if healthz takes 10+s

        # We simply kill the process when there is a failure. Another systemd service will
        # automatically restart the process.
        function docker_monitoring {
          while [ 1 ]; do
            if ! timeout 60 docker ps > /dev/null; then
              echo "Docker daemon failed!"
              pkill docker
              # Wait for a while, as we don't want to kill it again before it is really up.
              sleep 120
            else
              sleep "${SLEEP_SECONDS}"
            fi
          done
        }

        function kubelet_monitoring {
          echo "Wait for 2 minutes for kubelet to be functional"
          # TODO(andyzheng0831): replace it with a more reliable method if possible.
          sleep 120
          local -r max_seconds=10
          local output=""
          while [ 1 ]; do
            if ! output=$(curl -m "${max_seconds}" -f -s -S http://127.0.0.1:10255/healthz 2>&1); then
              # Print the response and/or errors.
              echo $output
              echo "Kubelet is unhealthy!"
              pkill kubelet
              # Wait for a while, as we don't want to kill it again before it is really up.
              sleep 60
            else
              sleep "${SLEEP_SECONDS}"
            fi
          done
        }
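  • 14. Networking: Overlays
    [Diagram: container C0 (10.0.0.10) on Node 0 (eth0 192.168.0.100) and container C1 (10.0.0.11) on Node 1 (eth0 192.168.0.Y). Each container namespace has an eth0 connected through a veth to bridge br0, which also holds a vxlan interface. Packet from C1 to C0: outer addresses src 192.168.0.Y, dst 192.168.0.100; UDP src X, dst 4789; VXLAN VNI; original L2 frame with IP src 10.0.0.11, dst 10.0.0.10.]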
  • 15. Networking: Native pod routing
    [Diagram labels: Node 0 and Node 1, each with eth0 and eth1; container namespaces C0 and C1 with an eth0 attached through ipvlan or veth; addresses 10.0.0.10, 10.0.1.10 and 10.0.1.11.]
  • 16. Networking: Native pod routing
    Objectives
    ● Avoid overlays
    ● Avoid bridges (PTP or IPVLAN)
    ● Route from non kubernetes hosts/between clusters
    Challenges
    ● Beta features
    ● Young CNI plugins
      ○ Bugs (Nodeports, inconsistent metadata)
      ○ Much more complicated to debug
    ➢ Good relationship with developers of the Lyft plugins
  • 19. Ingresses: native pod routing
    More efficient
    ● No need to add all nodes to load-balancers
    ● Simpler data path
    Not that simple
    ● Very recent
    ● A few bugs (unstable cloud provider features, single CIDR VPC)
    ➢ Getting fixes upstream was quite easy
  • 21. IPVS instead of iptables
    Sounded too good to be true
    ● Faster (traffic and refresh)
    ● Cleaner (almost no iptables rules)
    Was too good to be true
    ● Unable to access services from the host
    ● Regression in 1.11 breaking ExternalTrafficPolicy: Local
    ● No localhost:nodeport
    ● No graceful termination
    ➢ Very good relationship with the Huawei team
    ➢ Almost everything has been fixed, working great so far
  • 22. Networking: IPV6 and DNS
    “Sometimes my DNS queries take more than 5s/time out”
    “Yeah right”
    [narrator]: “Well actually...”
    ➢ Race condition in the conntrack code
    ➢ We disabled IPV6 in the kernel
    ➢ Also had to disable native Go resolution
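The slide does not say how native Go resolution was disabled. One documented option, shown here as an assumption rather than as the authors' actual change, is the Go runtime setting `GODEBUG=netdns=cgo`, which makes a Go program resolve names through the system (libc) resolver instead of the built-in pure Go resolver. The service name in the comment is hypothetical.

```shell
# Assumption: the workload is a Go binary built with cgo support.
# netdns=cgo asks the Go runtime to resolve names through libc
# rather than through the built-in pure Go resolver.
export GODEBUG=netdns=cgo

# A hypothetical Go service started from this environment inherits it:
#   ./my-go-service
echo "GODEBUG=${GODEBUG}"
```

In a pod, the same variable would typically be set through the container's environment rather than an interactive shell.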
  • 23. Cloud integrations
    Many small edge-cases
    ● Different Load-balancer behaviors
    ● Magically disappearing instances (zones / instance state)
    ● Some “standard” controller config like “cidr-allocator-type”
    (almost) No doc
    ● controller/nodeipam/ipam/cloud_cidr_allocator.go
    ● providers/aws/aws.go
  • 24. Ecosystem
    Good news
    ● Very dynamic
    ● Usually easy to get PRs merged
    Bad news
    ● Not heavily tested / Limited to basic setup
    ● (very) Limited doc: localVolumeProvisioner and mount paths
    ● Almost never tested on large clusters
      ○ cluster-autoscaler doesn’t work with more than 50 ASGs
      ○ we abandoned kubed when its memory usage reached 10+GB
      ○ metrics-server: doesn’t start with a single NotReady node
      ○ kube-state-metrics: pod/node collector generates ~100MB payloads
      ○ voyager: map sort bug led to continuous pod recreation
  • 25. Scaling nodes: 100 => 1000
    API server: high load but ok
    ● Careful with file descriptors, CPU, memory
    ● TargetRAM helped a lot (avoid OOM)
    Controller/Scheduler with high load too
    ● Competing for CPU => split
    ● Thinking about splitting controllers but this is hard/impossible
    etcd imbalance
    ● Shuffling etcd endpoints on API servers works pretty well
    ● Alternatives were a bit scary (gRPC proxy)
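A minimal sketch of the endpoint-shuffling idea, assuming each API server is started with its own randomly ordered copy of the same endpoint list (passed, for example, through the kube-apiserver `--etcd-servers` flag). The function name and the addresses are hypothetical, and the slides do not show how the shuffle was actually implemented.

```shell
# Hypothetical helper: return the comma-separated endpoint list in random order.
# Requires shuf (GNU coreutils).
shuffle_endpoints() {
  # Split on commas, shuffle the resulting lines, then join them back with commas.
  printf '%s\n' "$1" | tr ',' '\n' | shuf | paste -sd, -
}

# Made-up endpoint addresses, for illustration only.
shuffle_endpoints "https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379"
```

Every API server still knows all etcd members; only the order differs from one server to the next.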
  • 26. Scaling nodes: 100 => 1000
    Routes on API servers (registration + daemonsets)
    CoreDNS issues
    ● Not enough nodes for coredns pods: "nodesPerReplica":16
    ● Memory limits leading to OOMkills (mem usage with “pods: verified”)

        NAME                      READY  STATUS            RESTARTS  AGE
        coredns-7b4d675999-22d4q  0/1    CrashLoopBackOff  40        6h
        coredns-7b4d675999-2lt5w  0/1    ImagePullBackOff  135       17h
        coredns-7b4d675999-45s94  0/1    CrashLoopBackOff  41        6h
        coredns-7b4d675999-4dfbt  0/1    CrashLoopBackOff  140       17h
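The `nodesPerReplica` key matches the linear mode of the cluster-proportional-autoscaler, where the node-based replica count is the node count divided by `nodesPerReplica`, rounded up. The sketch below assumes that autoscaler is in use and deliberately ignores its other parameters (`coresPerReplica`, `min`, `max`). The function name is hypothetical.

```shell
# Hypothetical helper: node-based replica count, ceil(nodes / nodesPerReplica).
coredns_replicas() {
  local nodes=$1 nodes_per_replica=$2
  # Integer ceiling division.
  echo $(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
}

coredns_replicas 100 16    # prints 7
coredns_replicas 1000 16   # prints 63
```

Going from 100 to 1000 nodes with the slide's setting therefore multiplies the number of CoreDNS pods roughly by nine.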
  • 27. Scale: create 200 deployments
    ● Very hard on all components (apiservers, controller, scheduler, etcd)
    ● Maxed-out the scheduler QPS (too many tunables)
    ● Main issues: apiserver sizing and traffic imbalance
  • 28. Footguns
    ● DaemonSets
    ● StatefulSets
    ● Cargo culting
    ● Zombies
    ● Containers, not VMs
    ● Rolling updates
    ● OOM
    ● InitContainers
    ● Native resources / external config
    ● Manual changes
  • 29. Daemonsets
    High DDOS risk
    ● APIservers, vault
    ● Cloud provider API rate-limits
    ● Very slow or very dangerous
    Pod scheduling
    ● Stuck rollout
    ● 1.12 scheduler alpha...
    Slide callouts: app broke due to permission change; container in restart loop; ImagePullPolicy: Always
  • 30. Stateful sets
    Persistent volumes
    ● Local: Cloud provider disk errors
    ● Local: New node with same name
    ● EBS scheduler
    Slide callout: localvolumeprovision: discovery.go
  • 31. Stateful sets
    Scheduling tricks

        kubectl get sts myapp
        NAME   DESIRED  CURRENT  AGE
        myapp  5        4        5d

        kubectl get pods -lapp=myapp
        NAME     READY  STATUS            RESTARTS  AGE
        myapp-0  1/1    Running           0         5d
        myapp-1  1/1    Running           10        6m
        myapp-2  1/1    CrashLoopBackOff  10        6m
        [?]
        myapp-4  1/1    Running           0         5d
  • 32. Cargo culting How can I keep container running on Kubernetes? https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes
  • 33. Zombies

        ps auxf | grep -c defunct
        16018

        readinessProbe:
          exec:
            command: [server_readiness_probe.sh]
          timeoutSeconds: 1

    Takeaway
    ● Careful with exec-based probes
    ● Use tini as pid 1 (or shared pid namespace)
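Counting lines that contain the word "defunct" can also match the `grep` process itself or any command line that happens to include that word. As a rough alternative sketch, not taken from the talk, the process state column can be inspected instead: a state starting with `Z` marks a zombie. Both function names are hypothetical.

```shell
# Count zombie entries in a list of process states read from stdin
# (one state per line, in the format printed by `ps -eo stat=`).
count_zombie_states() {
  awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Count zombie processes on the current host.
count_zombies() {
  ps -eo stat= | count_zombie_states
}

count_zombies
```

Run periodically on a node, a steadily growing count is a hint that some pid 1 is not reaping its children.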
  • 34. Containers, not VMs
    Complex process trees and many open files are very hard on the runtime
  • 35. Native resources / external config
    [Diagram labels: Load-Balancer, Healthchecker, NP (NodePort), kube-proxy, pod; variants: Manual, Loadbalancer service]
    Synchronisation
    ● NodePort
    ● External IP
    Issues
    ● Keeping track: Load-balancer External IP re-assigned to node
    ● Port conflict on Internal Load-Balancer (port 443, broke apiservers…)
  • 36. Rolling updates
    spec.replicas and hpa
    ● Replicas override hpa settings
    ● Removing replicas is not enough
    ● Recommended solution: edit the “last-applied” deployment first
  • 37. OOM Killer
    ● Limits too low will trigger “cgroup oom”
    ● Requests too low (or 0) will trigger “system oom”
    ● Not a surprise, but do spend some effort on sizing
  • 38. InitContainers
    Requests/Limits
    ● Pod Resources = MAX(MAX(initContainers), sum(containers))
    ● LimitRanger also applies to InitContainers
    Inconsistent behavior on container restarts
    ● Usually not restarted
    ● Except when Kubernetes no longer knows their exit status
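The resource formula above can be checked with a small worked example. The sketch below applies it to one resource at a time, with values given as plain integers (for instance CPU requests in millicores). The function name and the sample numbers are hypothetical.

```shell
# Hypothetical helper: effective pod request for a single resource.
#   $1: space-separated init container requests
#   $2: space-separated regular container requests
effective_request() {
  local max_init=0 sum_app=0 v
  # Init containers run one at a time, so only the largest one matters.
  for v in $1; do
    if [ "$v" -gt "$max_init" ]; then max_init=$v; fi
  done
  # Regular containers run together, so their requests add up.
  for v in $2; do
    sum_app=$((sum_app + v))
  done
  if [ "$max_init" -gt "$sum_app" ]; then echo "$max_init"; else echo "$sum_app"; fi
}

effective_request "500 200" "100 150"   # prints 500: the largest init container dominates
effective_request "100" "300 300"       # prints 600: the sum of regular containers dominates
```

The first case is the surprising one: a single heavy init container raises what the scheduler reserves for the whole pod.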
  • 39. Manual changes
    Untracked kubectl changes
    ● kubectl apply / edit
    Partial chart apply
    ● kubectl apply deploy
        => checksum/config_templates: 840ae1e0b4b2f7b5033edd34fd9eb88b55dc914adca5c
           ^ Hash for the updated config map, but not deployed
    Chart deletion
    ● kubectl delete -f <dir>
    ● Contained namespace
  • 40. Future Plans
    Control plane isolation
    ➢ “Meta” cluster for control planes
    Better load balancing
    ➢ Avoid imbalanced API servers
    ➢ Avoid imbalanced etcd
    Container-Optimized images
    ➢ In-place upgrade for data stores
    Custom controllers
    ➢ We already have a few but will build new ones
  • 41. Conclusion
    The Kubernetes Control plane is very complex
    ➢ Crazy number of options/tunables
    ➢ Low-level components are great but some bugs remain
    ➢ The ecosystem is very young
    ➢ Reading the code is not optional (and fixing part of it)
    Account for culture change and training
    ➢ Kubernetes resources don’t look that complicated
    ➢ Many, many edge-cases and pitfalls
    Instrument the audit logs
    ➢ Very high value to debug performance issues
    ➢ Also helps understand history of interactions with a specific resource
    ➢ Expensive (very verbose: we are at 1000+ logs/s) but game changing for us