Kubernetes Hands-On: Testing Liveness and Readiness Probes

jrfilocao
3 min read · Mar 3, 2024


This is the second part of https://jrfilocao.medium.com/kubernetes-probes-do-not-make-this-mistake-0f5302f2ff8b.

In the previous article, we covered best practices for readiness and liveness probes. Additionally, we reviewed a case study in which the following misconfigured liveness probe led to cascading failures in production:

Kubernetes configuration with a misconfigured liveness probe
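
Since the configuration above is only shown as an image, here is a minimal sketch of what such a misconfiguration looks like. The concrete numbers are assumptions chosen to match the timings used later in this tutorial (a liveness probe that gives up after about 30 seconds, a readiness probe that only gives up after about 5 minutes); the actual values are in readiness-liveness-failing.yaml in the repository.

# Sketch of a misconfigured probe setup (illustrative values, not the exact file contents):
# the liveness probe is more aggressive than the readiness probe, so an overloaded pod
# gets restarted before it is ever taken out of the Service load balancer.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-server
  template:
    metadata:
      labels:
        app: go-server
    spec:
      containers:
        - name: go-server
          image: go-server
          imagePullPolicy: Never        # use the image built inside minikube's Docker daemon
          ports:
            - containerPort: 8080
          livenessProbe:                # fails after roughly 30 seconds of errors and restarts the container
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:               # only fails after roughly 300 seconds, so traffic keeps arriving
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 30
            failureThreshold: 10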

Let's now reproduce the error on your local machine. Afterwards, we will apply and test a solution.

Reproducing the error

We need to simulate a scenario where the liveness and readiness probes encounter failures. To achieve this, we create a server with a health check endpoint. When called by the probes, this endpoint will intentionally respond with an error, replicating the failure conditions.

Step by Step

1) Clone the repository with files for simulating a server and configuring Kubernetes:

git clone https://github.com/jrfilocao/medium-kubernetes.git

2) Install Docker.

3) Install kubectl:
- Linux: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
- macOS: https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/

4) Install minikube and start it with Docker as the driver:

minikube start --driver=docker

5) Check that everything is all right with minikube:

minikube status

6) Point your shell to minikube’s Docker daemon so that the simulation server image you build in the next step is available to the cluster:

eval $(minikube -p minikube docker-env)

7) From the root directory of the cloned repository, build the image for the simulation server, which is based on the code from https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/:

docker build -t go-server .  

The /healthz endpoint alternates its response every 60 seconds between the HTTP 200 and HTTP 500 status codes. Probing this endpoint therefore simulates a server under heavy load: it is unavailable for 60 seconds after every 60 seconds of availability.

8) Apply the Kubernetes configuration with the erroneous liveness probe:

kubectl apply -f readiness-liveness-failing.yaml

9) Check the pod's status:

kubectl get pods -A

10) You should see many restarts:

Restarts due to failing liveness probe

11) To track what is happening on /healthz, forward the pod's port:

kubectl port-forward [name of your pod] 8080

/healthz shows the current HTTP status code, and /started shows the current second counter. Once the counter reaches 60, it resets back to zero.

watch -n 2 curl -v 127.0.0.1:8080/healthz 
watch -n 2 curl -v 127.0.0.1:8080/started

Solving the issue

1) Delete the current Kubernetes configuration:

kubectl delete -f readiness-liveness-failing.yaml

2) Apply the configuration that fixes the issue:

kubectl apply -f readiness-liveness-solution.yaml

This configuration simply swaps the periodSeconds and failureThreshold values between the readiness and liveness probes. After the endpoint has been failing for 30 seconds, the pod is marked as not ready and no longer receives traffic. Only after 30*10 seconds (5 minutes) would the liveness probe fail and restart the container, so the 60-second outage never triggers a restart.

This configuration also sets the number of replicas to two, so that the remaining pod can serve requests while one pod is unavailable due to a failing readiness probe.
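
For comparison, here is a minimal sketch of the fixed configuration. Again, the exact values live in readiness-liveness-solution.yaml; the numbers below are assumptions chosen to match the timings described above (readiness failing after 30 seconds, liveness only after 300 seconds).

# Sketch of the fixed setup (illustrative values, see readiness-liveness-solution.yaml):
# the readiness probe reacts quickly and removes the pod from the Service load balancer,
# while the liveness probe is patient enough to outlast the 60-second outage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-server
spec:
  replicas: 2                         # a second pod keeps serving requests during the outage
  selector:
    matchLabels:
      app: go-server
  template:
    metadata:
      labels:
        app: go-server
    spec:
      containers:
        - name: go-server
          image: go-server
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
          readinessProbe:             # fails after roughly 30 seconds and removes the pod from load balancing
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:              # would only fail after roughly 300 seconds, so no restart for a 60-second outage
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 30
            failureThreshold: 10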

3) You should not see any restarts. One of the pods becomes temporarily not ready:

No restarts since the pod is being removed first from the Service load balancer due to a failing readiness probe

Then, once /healthz responds with HTTP 200 again, the pod becomes ready on the next successful readiness probe.

Pod back to a ready state
