SlideShare a Scribd company logo
1 of 106
Download to read offline
10 Ways to Shoot Yourself in the Foot
with Kubernetes, #9 Will Surprise You!
Laurent Bernaille, @lbernail
Staff Engineer, Datadog
#ContainerDayFR
#ContainerDayFR
Who are we?
Datadog is a monitoring service; metrics, traces, logs,
dashboards, alerts, etc.
Cloud Native service provider
www.datadoghq.com
We run the cloud infrastructure that powers the
product.
Cloud Native end user
We Run Big Kubernetes
@lbernail #ContainerDayFR
Sometimes it Breaks
@lbernail #ContainerDayFR
Why am I here?
Lessons learned from incidents with Kubernetes in production!
And especially the self inflicted ones
#ContainerDayFR
#1 It’s never always DNS
@lbernail #ContainerDayFR
DNS is broken
Coredns rolling update and IPVS
Virtual Server
- Pod A
- Pod B
- Pod C
IPVS conntrack
- :X => A:53
- :Y => B:53
- :Z => C:53
Pod A
Pod B
Pod C
App
@lbernail #ContainerDayFR
Under high load ports are reused faster than they expire
Virtual Server
- Pod B
- Pod C
- Pod D
IPVS conntrack
- :X => A:53 TTL:300s
- :Y => B:53
- :Z => C:53
Pod B
Pod C
App
Pod D
black-holed
DNS is broken
@lbernail #ContainerDayFR
Under high load ports are reused faster than they expire
Virtual Server
- Pod B
- Pod C
- Pod D
IPVS conntrack
- :X => A:53 TTL:300s
- :Y => B:53
- :Z => C:53
Pod B
Pod C
App
Pod D
black-holed
net/ipv4/vs/expire_nodest_conn
DNS is broken
@lbernail #ContainerDayFR
Graceful termination?
Virtual Server
- Pod A Weight=0
- Pod B Weight=1
- Pod C Weight=1
- Pod D Weight=1
IPVS conntrack
- :X => A:53 TTL:300s
- :Y => B:53
- :Z => C:53
Pod B
Pod C
App
Pod D
black-holed
DNS is broken
@lbernail #ContainerDayFR
DNS is broken
● Before graceful termination
○ Backend removed
○ Traffic reusing port is blackholed
● After graceful termination
○ Under high-load port are reused fast
○ The backend pod is never removed
○ When the pod terminates, black-holed
○ Fix: https://github.com/kubernetes/kubernetes/pull/77802
#ContainerDayFR
#2 Jobs are not starting
@lbernail #ContainerDayFR
52 Completed
1 ErrImagePull
155 Evicted
7 ImagePullBackOff
7 Init:ErrImagePull
4 Init:Error
49 Init:ImagePullBackOff
16 Pending
1267 Running
2 Terminating
Impossible to pull images
@lbernail #ContainerDayFR
52 Completed
1 ErrImagePull
155 Evicted
7 ImagePullBackOff
7 Init:ErrImagePull
4 Init:Error
49 Init:ImagePullBackOff
16 Pending
1267 Running
2 Terminating
Impossible to pull images
@lbernail #ContainerDayFR
Number of image pulls sharply increases, sustained for several hours.
Image Stampede
@lbernail #ContainerDayFR
Number of image pulls sharply increases, sustained for several hours.
Then a sudden change.
Image Stampede
@lbernail #ContainerDayFR
Number of image pulls sharply increases, sustained for several hours.
Then a sudden change.
Image Stampede
@lbernail #ContainerDayFR
● Permission change on bucket
● DaemonSet in CrashLoopBackoff
Image Stampede
@lbernail #ContainerDayFR
● Permission change on bucket
● DaemonSet in CrashLoopBackoff
● imagePullPolicy: Always
Image Stampede
@lbernail #ContainerDayFR
● Permission change on bucket
● DaemonSet in CrashLoopBackoff
● imagePullPolicy: Always
● ~1000 pods pulling through 3 NAT instances
● Daily quota reached for NAT IPs
● All NAT instances are impacted
Image Stampede
@lbernail #ContainerDayFR
Image Stampede, follow-up #1
● Replaced the impacted NAT instances
@lbernail #ContainerDayFR
● Replaced the impacted NAT instances
● Apps couldn’t connect to CloudSQL anymore...
Image Stampede, follow-up #1
@lbernail #ContainerDayFR
● Admission webhook denying “latest” tag
Image Stampede, follow-up #2
@lbernail #ContainerDayFR
● Admission webhook denying “latest” tag
● Applied to
○ Deployments
○ StatefulSets
○ DaemonSets
○ Jobs
○ Pods
Image Stampede, follow-up #2
@lbernail #ContainerDayFR
● Admission webhook denying “latest” tag
● Applied to
○ Deployments
○ StatefulSets
○ DaemonSets
○ Jobs
○ Pods
● Controllers can’t create pods for existing workloads
Image Stampede, follow-up #2
#ContainerDayFR
#3 “I can’t kubectl”
@lbernail #ContainerDayFR
Load on apiservers
Symptoms: apiservers unresponsive
> load => 50 (on 8 core nodes)
Apiserver DDOS
@lbernail #ContainerDayFR
Load on apiservers
Usable memory
Symptoms: apiservers unresponsive
> load => 50 (on 8 core nodes)
> Free memory => 0
> apiservers OOM killed
Apiserver DDOS
@lbernail #ContainerDayFR
Load on apiservers
Usable memory
Outgoing traffic
Symptoms: apiservers unresponsive
> load => 50 (on 8 core nodes)
> Free memory => 0
> apiservers OOM killed
> network traffic much higher
Apiserver DDOS
@lbernail #ContainerDayFR
kube2iam pod restarts
Seemed related to kube2iam update
> Lots of restarts
Apiserver DDOS
@lbernail #ContainerDayFR
Seemed related to kube2iam update
> Lots of restarts
> cgroup OOM-killer
> Memory usage much higher
> Increase limit
kube2iam pod restarts
kube2iam memory usage / limit
Apiserver DDOS
@lbernail #ContainerDayFR
Seemed related to kube2iam update
> Lots of restarts
> cgroup OOM-killer
> Memory usage much higher
> Increase limit
Cause?
> Patch in kube2iam
> Small typo
> kube2iam pods syncing all pods
> Broke apiservers + kube2iam
kube2iam pod restarts
kube2iam memory usage / limit
Apiserver DDOS
#ContainerDayFR
#4 DNS updates take 30mn
@lbernail #ContainerDayFR
DNS updates take 30+ minutes to propagate
● We use external-dns for cross-cluster communications
● Usually propagation is ~5mn
● But on a cluster...
Existing records (cached)
"Retrieved 1671 records from registry in 525ns"
"Generated 1670 records from sources in 36mn12"
"Calculated plan in 37.8ms"
From external-dns logs
Expected records (from k8s)
@lbernail #ContainerDayFR
Why does it take so long to generate records?
● To generate records external-dns does the following
@lbernail #ContainerDayFR
Why does it take so long to generate records?
● To generate records external-dns does the following
a. Get all services with specific annotation
@lbernail #ContainerDayFR
Why does it take so long to generate records?
● To generate records external-dns does the following
a. Get all services with specific annotation
b. For each headless service
■ List all pods matching the service selector
■ Add the pod IP for running pods to the record
@lbernail #ContainerDayFR
Why does it take so long to generate records?
● To generate records external-dns does the following
a. Get all services with specific annotation
b. For each headless service
■ List all pods matching the service selector
■ Add the pod IP for running pods to the record
c. Add the record as DNS Round-robin
@lbernail #ContainerDayFR
Why does it take so long to generate records?
● To generate records external-dns does the following
a. Get all services with specific annotation
b. For each headless service
■ List all pods matching the service selector
■ Add the pod IP for running pods to the record
c. Add the record as DNS Round-robin
● In this cluster, we use a lot of headless services (1500+)
@lbernail #ContainerDayFR
What had changed?
“List pods” calls duration (from apiserver audit logs)
Listing pods 75ms => 1.25s
> external-dns does this 1600+ times in each sync loop
> 1600*1.2s => ~33mn
@lbernail #ContainerDayFR
Why was list so slow?
Number of pods in the main namespace
Pods: 900 => 4400
> List calls much slower
@lbernail #ContainerDayFR
Why was list so slow?
Number of pods in the main namespace
Pods: 900 => 4400
> List calls much slower
Number of “Evicted” pods Evicted pods: 3500
> Remain in apiserver
@lbernail #ContainerDayFR
Why did the apiserver slow down so much?
Apiserver memory usage
Apiservers configured to use 24GB
> No room for additional caching
@lbernail #ContainerDayFR
Why did the apiserver slow down so much?
Apiserver memory usage
Apiservers configured to use 24GB
> No room for additional caching
ETCD traffic sent
A lot more reads in ETCD
> 32MB/s => 70 MB/s
> Much slower than from cache
@lbernail #ContainerDayFR
Where are the evictions coming from?
A lot of evictions
Evictions over time (from Kubernetes events)
@lbernail #ContainerDayFR
Where are the evictions coming from?
Evictions over time (from Kubernetes events)
A lot of evictions
Disk usage Triggered because usage >85%
@lbernail #ContainerDayFR
Kubelet logs
Disk is full
Delete unused images (good)
Evict pods (bad)
Eviction behavior
@lbernail #ContainerDayFR
Pod eviction under disk pressure
● Rank pods according to disk usage
> local volumes + logs & writable layer
● Evict pods with highest ranking
● What happened?
a. Application uses a cache in an empty-dir
b. Change in traffic pattern > cache usage became more important
c. Application was evicted frequently during a few hours
@lbernail #ContainerDayFR
Fixes
1) Manually remove evicted pods and monitor (immediate win)
2) Tune the controller-manager garbage collection
“terminated-pod-gc-threshold” (default: 12500)
3) Update external-dns to use informers (cache) instead of list calls
> Fixed upstream
“Use k8s informer cache instead of making active API GET requests”
https://github.com/kubernetes-incubator/external-dns/pull/917
#ContainerDayFR
#5 “Logs intake just 20x”
@lbernail #ContainerDayFR
Logs intake volume 20x
For this account, log volume did x20 in 30mn
Logs per source
@lbernail #ContainerDayFR
Kernel Audit DDOS
For this account, log volume did x20 in 30mn
> Enabled audit with a daemonset
> Nodes running Kubernetes generate a huge amount of audit logs (exec, clones, iptables…)
Logs per source
#ContainerDayFR
#6 “Where did my pods go”
@lbernail #ContainerDayFR
HorizontalPodAutoscaler
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 60
@lbernail #ContainerDayFR
HorizontalPodAutoscaler
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 60
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 50
@lbernail #ContainerDayFR
HorizontalPodAutoscaler
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 60
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 50
Controller manages replica count of
deployment based on the value of a
metric.
@lbernail #ContainerDayFR
HorizontalPodAutoscaler
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 60
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 50
Controller manages replica count of
deployment based on the value of a
metric.
Must remove explicit replica count!
@lbernail #ContainerDayFR
Scale to One
#ContainerDayFR
#7 Ghosts in the C* cluster
@lbernail #ContainerDayFR
● 100+ nodes Cassandra cluster
● Deployed fine
● Broken the following morning
Ghosts in the Cassandra cluster
@lbernail #ContainerDayFR
● 100+ nodes Cassandra cluster
● Deployed fine
● Broken the following morning
> 25% pods pending: “Volume affinity issue”
Ghosts in the Cassandra cluster
@lbernail #ContainerDayFR
● 100+ nodes Cassandra cluster
● Deployed fine
● Broken the following morning
> 25% pods pending: “Volume affinity issue”
> 25% nodes had been deleted + local volumes bound to deleted nodes
Ghosts in the Cassandra cluster
@lbernail #ContainerDayFR
● 100+ nodes Cassandra cluster
● Deployed fine
● Broken the following morning
> 25% pods pending: “Volume affinity issue”
> 25% nodes had been deleted + local volumes bound to deleted nodes
Nodes per AZ > ASG rebalance
Ghosts in the Cassandra cluster
#ContainerDayFR
#8 Very slow deployments
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Very slow deployments
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Cause
4000 pending pods
Very slow deployments
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Cause
4000 pending pods
cronjob (*/10) with wrong toleration
Scheduling loop having a hard time
Very slow deployments
#ContainerDayFR
#8.2 Very slow deployments
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Very slow deployments #2
Deploy heartbeat
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Cause
4000 pending pods from cronjob
But tolerations were correct
Very slow deployments #2
Pending pods
Deploy heartbeat
@lbernail #ContainerDayFR
Symptoms
Deployments getting slower
Cause
4000 pending pods from cronjobs
A lot of preemption attempts
=> Resource issue
Very slow deployments #2
Pending pods
Deploy heartbeat
Preemptions attempts
@lbernail #ContainerDayFR
Cronjob moved to new namespace
Tolerations were fine
But vault permission were tied to the old namespace
What happened?
@lbernail #ContainerDayFR
Cronjob moved to new namespace
Tolerations were fine
But vault permission were tied to the old namespace
InitContainers failed
InitContainers enter Crashloop Backoff
What happened?
@lbernail #ContainerDayFR
Cronjob moved to new namespace
Tolerations were fine
But vault permission were tied to the old namespace
InitContainers failed
InitContainers enter Crashloop Backoff
Default “concurrencyPolicy” for cronjob (Allow)
These pods consume resources
After a while new pods can’t be scheduled => Pending
What happened?
@lbernail #ContainerDayFR
Cronjob moved to new namespace
Tolerations were fine
But vault permission were tied to the old namespace
InitContainers failed
InitContainers enter Crashloop Backoff
Default “concurrencyPolicy” for cronjob (Allow)
These pods consume resources
After a while new pods can’t be scheduled => Pending
Actually an upstream bug
InitContainer restarts not counted against “backoffLimit” (job spec)
Fixed upstream: https://github.com/kubernetes/kubernetes/pull/77647
What happened?
#ContainerDayFR
#9 Contained ?
#ContainerDayFR
#9.1 Broken runtime
@lbernail #ContainerDayFR
Broken runtime #1
Pods stuck “Terminating”
Runtime unresponsive
@lbernail #ContainerDayFR
Broken runtime #1
Pods stuck “Terminating”
Runtime unresponsive
@lbernail #ContainerDayFR
ps auxf | grep -c defunct
16018
Broken runtime #1
Pods stuck “Terminating”
Runtime unresponsive
@lbernail #ContainerDayFR
ps auxf | grep -c defunct
16018
readinessProbe:
exec:
command: [server_readiness_probe.sh]
timeoutSeconds: 1
Broken runtime #1
Pods stuck “Terminating”
Runtime unresponsive
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
Runtime logs: Failure to StopContainer
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
Runtime logs: Failure to StopContainer
ps auxf
[...]
600 containerd
10000 _ containerd-shim
12345 _ java …
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
Runtime logs: Failure to StopContainer
ps auxf
[...]
600 containerd
10000 _ containerd-shim
12345 _ java …
kill 12345 => Nothing
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
Runtime logs: Failure to StopContainer
ps auxf
[...]
600 containerd
10000 _ containerd-shim
12345 _ java …
kill 12345 => Nothing
kill -9 12345 => Nothing
@lbernail #ContainerDayFR
Broken runtime #2
Pods stuck “Terminating”
Runtime logs: Failure to StopContainer
ps auxf
[...]
600 containerd
10000 _ containerd-shim
12345 _ java …
kill 12345 => Nothing
kill -9 12345 => Nothing
Status: “D” => Uninterruptible sleep
@lbernail #ContainerDayFR
Broken runtime #2
What is the process waiting on?
@lbernail #ContainerDayFR
• Blocked io (nvme issue)
• Deletion starts with cgroup freeze
• Freezing was hanging
Broken runtime #2
#ContainerDayFR
#9.2 Performance issues
@lbernail #ContainerDayFR
Perf issues: slow deployments
Symptoms
● Deployment slow
● Pod took 1+mn to start
@lbernail #ContainerDayFR
Load on nodepool Symptoms
● Deployment slow
● Pod took 1+mn to start
Cause
● Load on nodes running deployment high
Perf issues: slow deployments
@lbernail #ContainerDayFR
Load on nodepool
EBS burst balance
Write iops
Symptoms
● Deployment slow
● Pod took 1+mn to start
Cause
● Load on nodes running deployment high
● Because IOs are being rate limited
Perf issues: slow deployments
@lbernail #ContainerDayFR
Why ??
DNS queries
@lbernail #ContainerDayFR
DNS queries
● Nodes were running coredns pods
● An app started doing 5000+ dns queries per second
● coredns logs filled the disk
Why ??
@lbernail #ContainerDayFR
Fix #1
Change app to use local DNS cache
@lbernail #ContainerDayFR
IOs still too high
What is this???
@lbernail #ContainerDayFR
Remember audit?
Daemonset had been removed, but audit was still enabled
We dropped traffic at log intake
But we were still writing to disk...
IOs still too high
@lbernail #ContainerDayFR
Fix #2
Disable audit for real
@lbernail #ContainerDayFR
Weird outlier
iops by node in the group One node still had more iops
@lbernail #ContainerDayFR
iops by node in the group
iops by node in the group (zoomed)
One node still had more iops
> io spikes every minute
Weird outlier
@lbernail #ContainerDayFR
iops by node in the group
iops by node in the group (zoomed)
containers created by node
One node still had more iops
> io spikes every minute
> Single host get all pod for a cronjob
> Suspending it => 0
> Changing the job to sleep => 0
Weird outlier
@lbernail #ContainerDayFR
Why is the job using so much IOs???
Job
ips = $(aws ec2 describe-instances --filter=consul)
kubectl update endpoints consul $ips
@lbernail #ContainerDayFR
Job
ips = $(aws ec2 describe-instances --filter=consul)
kubectl update endpoints consul $ips
$ kubectl get componentstatuses
$ find $HOME/.kube/ |wc -l
163
$ kubectl -v=8 get componentstatuses | grep "GET https" -c
45
Why is the job using so much IOs???
#ContainerDayFR
1. Be very careful with Daemonsets
2. Cronjobs can easily create problems
3. DNS is hard
4. Cloud infra not transparent
5. Containers not really contained
6. Train users
Key take-aways
#ContainerDayFR
Thank you!
Questions / comments
twitter: @lbernail
K8s slack: lbernail
We’re hiring!
Come join us (and break k8s!)
https://www.datadoghq.com/careers/

More Related Content

Similar to 10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you! (Container Day Paris)

How the OOM Killer Deleted My Namespace
How the OOM Killer Deleted My NamespaceHow the OOM Killer Deleted My Namespace
How the OOM Killer Deleted My NamespaceLaurent Bernaille
 
Kubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaKubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaHenning Jacobs
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes NetworkingCJ Cullen
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Henning Jacobs
 
Networking in Kubernetes
Networking in KubernetesNetworking in Kubernetes
Networking in KubernetesMinhan Xia
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…Sergey Dzyuban
 
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmet
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmetHow Honestbee Does CI/CD on Kubernetes - Vincent DeSmet
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmetDevOpsDaysJKT
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Let's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupLet's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupHenning Jacobs
 
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik Dorn
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik DornJDD2014: Docker.io - versioned linux containers for JVM devops - Dominik Dorn
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik DornPROIDEA
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick ThreeNETWAYS
 
Running Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesRunning Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesanynines GmbH
 
2016 - Easing Your Way Into Docker: Lessons From a Journey to Production
2016 - Easing Your Way Into Docker: Lessons From a Journey to Production2016 - Easing Your Way Into Docker: Lessons From a Journey to Production
2016 - Easing Your Way Into Docker: Lessons From a Journey to Productiondevopsdaysaustin
 
Docker Advanced registry usage
Docker Advanced registry usageDocker Advanced registry usage
Docker Advanced registry usageDocker, Inc.
 
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Odinot Stanislas
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPSACA IT-Solutions
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITStijn Wijndaele
 
Kubernetes the Very Hard Way. Lisa Portland 2019
Kubernetes the Very Hard Way. Lisa Portland 2019Kubernetes the Very Hard Way. Lisa Portland 2019
Kubernetes the Very Hard Way. Lisa Portland 2019Laurent Bernaille
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
 

Similar to 10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you! (Container Day Paris) (20)

How the OOM Killer Deleted My Namespace
How the OOM Killer Deleted My NamespaceHow the OOM Killer Deleted My Namespace
How the OOM Killer Deleted My Namespace
 
Kubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe BarcelonaKubernetes Failure Stories - KubeCon Europe Barcelona
Kubernetes Failure Stories - KubeCon Europe Barcelona
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes Networking
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
 
Networking in Kubernetes
Networking in KubernetesNetworking in Kubernetes
Networking in Kubernetes
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
 
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmet
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmetHow Honestbee Does CI/CD on Kubernetes - Vincent DeSmet
How Honestbee Does CI/CD on Kubernetes - Vincent DeSmet
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Let's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg MeetupLet's talk about Failures with Kubernetes - Hamburg Meetup
Let's talk about Failures with Kubernetes - Hamburg Meetup
 
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik Dorn
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik DornJDD2014: Docker.io - versioned linux containers for JVM devops - Dominik Dorn
JDD2014: Docker.io - versioned linux containers for JVM devops - Dominik Dorn
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
 
Running Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesRunning Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anynines
 
2016 - Easing Your Way Into Docker: Lessons From a Journey to Production
2016 - Easing Your Way Into Docker: Lessons From a Journey to Production2016 - Easing Your Way Into Docker: Lessons From a Journey to Production
2016 - Easing Your Way Into Docker: Lessons From a Journey to Production
 
Docker Advanced registry usage
Docker Advanced registry usageDocker Advanced registry usage
Docker Advanced registry usage
 
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
 
Kubernetes the Very Hard Way. Lisa Portland 2019
Kubernetes the Very Hard Way. Lisa Portland 2019Kubernetes the Very Hard Way. Lisa Portland 2019
Kubernetes the Very Hard Way. Lisa Portland 2019
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 

More from Laurent Bernaille

Kubernetes DNS Horror Stories
Kubernetes DNS Horror StoriesKubernetes DNS Horror Stories
Kubernetes DNS Horror StoriesLaurent Bernaille
 
Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Laurent Bernaille
 
Making the most out of kubernetes audit logs
Making the most out of kubernetes audit logsMaking the most out of kubernetes audit logs
Making the most out of kubernetes audit logsLaurent Bernaille
 
Kubernetes the Very Hard Way. Velocity Berlin 2019
Kubernetes the Very Hard Way. Velocity Berlin 2019Kubernetes the Very Hard Way. Velocity Berlin 2019
Kubernetes the Very Hard Way. Velocity Berlin 2019Laurent Bernaille
 
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!Laurent Bernaille
 
Optimizing kubernetes networking
Optimizing kubernetes networkingOptimizing kubernetes networking
Optimizing kubernetes networkingLaurent Bernaille
 
Kubernetes at Datadog the very hard way
Kubernetes at Datadog the very hard wayKubernetes at Datadog the very hard way
Kubernetes at Datadog the very hard wayLaurent Bernaille
 
Deep Dive in Docker Overlay Networks
Deep Dive in Docker Overlay NetworksDeep Dive in Docker Overlay Networks
Deep Dive in Docker Overlay NetworksLaurent Bernaille
 
Deeper dive in Docker Overlay Networks
Deeper dive in Docker Overlay NetworksDeeper dive in Docker Overlay Networks
Deeper dive in Docker Overlay NetworksLaurent Bernaille
 
Operational challenges behind Serverless architectures
Operational challenges behind Serverless architecturesOperational challenges behind Serverless architectures
Operational challenges behind Serverless architecturesLaurent Bernaille
 
Deep dive in Docker Overlay Networks
Deep dive in Docker Overlay NetworksDeep dive in Docker Overlay Networks
Deep dive in Docker Overlay NetworksLaurent Bernaille
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Early recognition of encryted applications
Early recognition of encryted applicationsEarly recognition of encryted applications
Early recognition of encryted applicationsLaurent Bernaille
 
Early application identification. CONEXT 2006
Early application identification. CONEXT 2006Early application identification. CONEXT 2006
Early application identification. CONEXT 2006Laurent Bernaille
 

More from Laurent Bernaille (15)

Kubernetes DNS Horror Stories
Kubernetes DNS Horror StoriesKubernetes DNS Horror Stories
Kubernetes DNS Horror Stories
 
Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)Evolution of kube-proxy (Brussels, Fosdem 2020)
Evolution of kube-proxy (Brussels, Fosdem 2020)
 
Making the most out of kubernetes audit logs
Making the most out of kubernetes audit logsMaking the most out of kubernetes audit logs
Making the most out of kubernetes audit logs
 
Kubernetes the Very Hard Way. Velocity Berlin 2019
Kubernetes the Very Hard Way. Velocity Berlin 2019Kubernetes the Very Hard Way. Velocity Berlin 2019
Kubernetes the Very Hard Way. Velocity Berlin 2019
 
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you!
 
Optimizing kubernetes networking
Optimizing kubernetes networkingOptimizing kubernetes networking
Optimizing kubernetes networking
 
Kubernetes at Datadog the very hard way
Kubernetes at Datadog the very hard wayKubernetes at Datadog the very hard way
Kubernetes at Datadog the very hard way
 
Deep Dive in Docker Overlay Networks
Deep Dive in Docker Overlay NetworksDeep Dive in Docker Overlay Networks
Deep Dive in Docker Overlay Networks
 
Deeper dive in Docker Overlay Networks
Deeper dive in Docker Overlay NetworksDeeper dive in Docker Overlay Networks
Deeper dive in Docker Overlay Networks
 
Discovering OpenBSD on AWS
Discovering OpenBSD on AWSDiscovering OpenBSD on AWS
Discovering OpenBSD on AWS
 
Operational challenges behind Serverless architectures
Operational challenges behind Serverless architecturesOperational challenges behind Serverless architectures
Operational challenges behind Serverless architectures
 
Deep dive in Docker Overlay Networks
Deep dive in Docker Overlay NetworksDeep dive in Docker Overlay Networks
Deep dive in Docker Overlay Networks
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Early recognition of encryted applications
Early recognition of encryted applicationsEarly recognition of encryted applications
Early recognition of encryted applications
 
Early application identification. CONEXT 2006
Early application identification. CONEXT 2006Early application identification. CONEXT 2006
Early application identification. CONEXT 2006
 

Recently uploaded

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you! (Container Day Paris)