16 Aug 2020

Deepdive into Kubernetes Probe's period seconds

Kubernetes has a health check mechanism for containers called Probe. It is a familiar feature for almost all Kubernetes users.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

As of v1.18, there are three types of probes ➡️ Liveness Probe, Readiness Probe, and Startup Probe.

These three types differ only in what happens when a health check fails; the underlying health check mechanism is essentially the same.

Probes also support three kinds of health checks ➡️ command execution, TCP connection, or HTTP request.

This kind of health check mechanism is common not only in Kubernetes but also in other products. However, the detailed behavior differs from product to product.

Probing loop period seconds

Let’s take the following example 👀 To check whether your application is ready to accept requests, you sometimes need to check not only the application itself but also the databases or external services it depends on. In such cases, the health check may take longer than expected because of network latency or congestion in the external service.

Suppose that a Readiness Probe is enabled with PeriodSeconds set to 3 seconds, and a single HTTP probe request takes 2 seconds (assuming TimeoutSeconds, which defaults to 1 second, is set high enough that the request is not cut off).

In this case, will the next Probe request start 3 seconds after the response to the previous request? Or will it start 3 seconds after the previous request started, regardless of the response time (that is, 1 second after the previous response)?
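The two hypotheses correspond to what are often called fixed-delay and fixed-rate scheduling. This tiny sketch computes the start times each one would give for PeriodSeconds = 3 and a 2-second probe; fixedDelay and fixedRate are hypothetical helpers, not Kubernetes code.

```go
package main

import "fmt"

// fixedDelay: the next probe starts `period` seconds after the previous one ENDS.
func fixedDelay(period, duration float64, n int) []float64 {
	starts := make([]float64, n)
	for i := 1; i < n; i++ {
		starts[i] = starts[i-1] + duration + period
	}
	return starts
}

// fixedRate: the next probe starts `period` seconds after the previous one STARTS.
// This only works as stated while duration <= period.
func fixedRate(period float64, n int) []float64 {
	starts := make([]float64, n)
	for i := 1; i < n; i++ {
		starts[i] = starts[i-1] + period
	}
	return starts
}

func main() {
	// PeriodSeconds = 3, each probe takes 2 seconds.
	fmt.Println("fixed delay:", fixedDelay(3, 2, 4)) // fixed delay: [0 5 10 15]
	fmt.Println("fixed rate:", fixedRate(3, 4))      // fixed rate: [0 3 6 9]
}
```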

Furthermore, if a Probe request takes more than 3 seconds, will the next request start in parallel with the previous one?

The official Kubernetes documentation does not describe this behavior in detail (I worked on translating the official website into Japanese).

Deepdive into kubelet code

Probes are performed by the kubelet on the node where the container is running. The code lives in the kubelet's prober package.

https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/prober

There are three major components here: prober.go, prober_manager.go, and worker.go. Their comments alone give a good overview of how they work.

prober

Prober helps to check the liveness/readiness/startup of a container.

prober_manager

Manager manages pod probing. It creates a probe “worker” for every container that specifies a probe (AddPod). The worker periodically probes its assigned container and caches the results.

worker

worker handles the periodic probing of its assigned container. Each worker has a go-routine associated with it which runs the probe loop until the container permanently terminates, or the stop channel is closed.

From the comments, we can see that each worker runs its probe periodically in its own goroutine. Let’s look at the worker’s run() function.

// run periodically probes the container.
func (w *worker) run() {
	probeTickerPeriod := time.Duration(w.spec.PeriodSeconds) * time.Second
 
	// If kubelet restarted the probes could be started in rapid succession.
	// Let the worker wait for a random portion of tickerPeriod before probing.
	time.Sleep(time.Duration(rand.Float64() * float64(probeTickerPeriod)))
 
	probeTicker := time.NewTicker(probeTickerPeriod)
 
	defer func() {
		// Clean up.
		probeTicker.Stop()
		if !w.containerID.IsEmpty() {
			w.resultsManager.Remove(w.containerID)
		}
 
		w.probeManager.removeWorker(w.pod.UID, w.container.Name, w.probeType)
		ProberResults.Delete(w.proberResultsSuccessfulMetricLabels)
		ProberResults.Delete(w.proberResultsFailedMetricLabels)
		ProberResults.Delete(w.proberResultsUnknownMetricLabels)
	}()
 
probeLoop:
	for w.doProbe() {
		// Wait for next probe tick.
		select {
		case <-w.stopCh:
			break probeLoop
		case <-probeTicker.C:
			// continue
		}
	}
}

At probeLoop, the sleep logic is implemented by <-probeTicker.C.

probeTicker.C is the channel of a time.Ticker; receiving from it blocks the probe loop until the next tick. Therefore, doProbe() is executed at regular intervals. Note that the loop calls doProbe() first and only then waits for the next tick.

https://golang.org/pkg/time/#Ticker

NewTicker returns a new Ticker containing a channel that will send the time with a period specified by the duration argument. It adjusts the intervals or drops ticks to make up for slow receivers.

So how does the time.Ticker loop behave depending on the execution time of doProbe()?

Let’s write a small program with the same logic and see how it behaves. In this program, each call to doProbe() takes 1 second longer than the previous one (0 seconds, 1 second, 2 seconds, and so on).

The ticker period, which plays the role of PeriodSeconds, is set to 3 seconds.

package main
 
import (
	"context"
	"log"
	"time"
)
 
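func main() {
	// Stop the experiment after 20 seconds (plays the role of w.stopCh).
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*20)
	defer cancel()

	// Tick every 3 seconds, like PeriodSeconds: 3.
	t := time.NewTicker(time.Millisecond * 3000)
	defer t.Stop()

	// Same shape as the kubelet's probeLoop: probe first, then wait for a tick.
probeLoop:
	for doProbe() {
		select {
		case <-ctx.Done():
			break probeLoop
		case <-t.C:
			log.Println("ticker next")
		}
	}
	log.Println("finished")
}

// i counts probe calls; call number i sleeps for i seconds.
var i uint = 0

// doProbe simulates a probe whose execution time grows by 1 second per call.
func doProbe() (keepGoing bool) {
	log.Println("doProbe start", i)
	time.Sleep(time.Second * time.Duration(i))
	log.Println("doProbe end")
	i++
	return true
}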

The output is as follows.

$ go run .
2020/08/18 23:29:30 doProbe start 0
2020/08/18 23:29:30 doProbe end
2020/08/18 23:29:33 ticker next     # next after 3 seconds
2020/08/18 23:29:33 doProbe start 1
2020/08/18 23:29:34 doProbe end
2020/08/18 23:29:36 ticker next     # next after 2 seconds
2020/08/18 23:29:36 doProbe start 2
2020/08/18 23:29:38 doProbe end
2020/08/18 23:29:39 ticker next     # next after 1 second
2020/08/18 23:29:39 doProbe start 3
2020/08/18 23:29:42 doProbe end
2020/08/18 23:29:42 ticker next     # next after 0 seconds
2020/08/18 23:29:42 doProbe start 4
2020/08/18 23:29:46 doProbe end
2020/08/18 23:29:46 ticker next     # next after 0 seconds
2020/08/18 23:29:46 doProbe start 5
2020/08/18 23:29:51 doProbe end
2020/08/18 23:29:51 finished

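When a Probe takes less than 3 seconds, it runs every 3 seconds regardless of the Probe’s execution time. For example, when a Probe takes 2 seconds, the next Probe starts 1 second after the previous one ends. When a Probe takes longer than 3 seconds, the next Probe starts as soon as the previous one finishes, because a tick has already fired and is waiting in the ticker's channel. Probes never run in parallel: doProbe() is called again only after the previous call has returned.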

☑️Here is a summary of this page

When a Probe takes less than PeriodSeconds, it runs every PeriodSeconds regardless of the Probe’s execution time.
When a Probe takes longer than PeriodSeconds, the next Probe starts as soon as the previous one finishes (probes never run in parallel).

As you can see, this Kubernetes behavior comes straight from Go's time.Ticker. Even if you are not a Kubernetes developer or contributor, reading the code (or even just the comments) can give you useful insights.

Have a better k8s life 🙌