Background
In the Configure Liveness, Readiness and Startup Probes documentation (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes) you can find the following information:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
As the GKE master is managed by Google, you won't find kubelet logs using the CLI (you can try to use Stackdriver instead). I have tested this on a Kubeadm cluster with the kubelet verbosity level set to 8.
When you use $ kubectl get events, you only receive events from the last hour (this retention can be changed in the Kubernetes settings on Kubeadm, but I don't think it can be changed on GKE, as the master is managed by Google).
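On a Kubeadm cluster, this one-hour event retention comes from the kube-apiserver --event-ttl flag (default 1h0m0s) and can be raised, for example, through a kubeadm ClusterConfiguration; the 3h value below is just an illustrative assumption:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # keep events for 3 hours instead of the 1h0m0s default
    event-ttl: "3h"
```

On GKE this flag is not accessible, since the control plane is managed by Google.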
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
37m Normal Starting node/kubeadm Starting kubelet.
...
33m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to kubeadm
33m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
33m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness" in 893.953679ms
33m Normal Created pod/liveness-http Created container liveness
33m Normal Started pod/liveness-http Started container liveness
3m12s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
30m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
8m17s Warning BackOff pod/liveness-http Back-off restarting failed container
After executing the same command again about an hour later:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
33s Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
5m40s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
15m Warning BackOff pod/liveness-http Back-off restarting failed container
Tests
The readiness probe check was executed every 10 seconds for longer than one hour.
Mar 09 14:48:34 kubeadm kubelet[3855]: I0309 14:48:34.222085 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:44 kubeadm kubelet[3855]: I0309 14:48:44.221782 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 14:48:54 kubeadm kubelet[3855]: I0309 14:48:54.221828 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:01:34 kubeadm kubelet[3855]: I0309 15:01:34.222491 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:44 kubeadm kubelet[3855]: I0309 15:01:44.221877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:01:54 kubeadm kubelet[3855]: I0309 15:01:54.221976 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 15:10:14 kubeadm kubelet[3855]: I0309 15:10:14.222163 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:24 kubeadm kubelet[3855]: I0309 15:10:24.221744 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 15:10:34 kubeadm kubelet[3855]: I0309 15:10:34.223877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
...
Mar 09 16:04:14 kubeadm kubelet[3855]: I0309 16:04:14.222853 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:04:24 kubeadm kubelet[3855]: I0309 16:04:24.222531 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
There are also Liveness probe entries:
Mar 09 16:12:58 kubeadm kubelet[3855]: I0309 16:12:58.462878 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:13:58 kubeadm kubelet[3855]: I0309 16:13:58.462906 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465587 3855 kuberuntime_manager.go:712] Killing unwanted container "liveness"(id={"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) for pod "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a)"
Total time of the test:
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
liveness-http 0/1 Running 21 99m
liveness-http 0/1 CrashLoopBackOff 21 101m
liveness-http 0/1 Running 22 106m
liveness-http 1/1 Running 22 106m
liveness-http 0/1 Running 22 106m
liveness-http 0/1 Running 23 109m
liveness-http 1/1 Running 23 109m
liveness-http 0/1 Running 23 109m
liveness-http 0/1 CrashLoopBackOff 23 112m
liveness-http 0/1 Running 24 117m
liveness-http 1/1 Running 24 117m
liveness-http 0/1 Running 24 117m
Conclusion
The liveness probe check was not called again

The Liveness check is created when Kubernetes creates a pod, and is re-created each time the Pod is restarted. In your configuration you have set initialDelaySeconds: 20, so after a pod is created, Kubernetes will wait 20 seconds and then call the liveness probe 3 times (as you have set failureThreshold: 3). After 3 failures, Kubernetes will restart the pod according to its RestartPolicy. You can also find this in the logs:
Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
Why the restart? The answer can be found in Container probes (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes):

livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

The default restart policy (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) in GKE is Always, so your pod will be restarted over and over again.
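Putting the settings discussed above together, the relevant part of the pod manifest looks roughly like the sketch below; only initialDelaySeconds: 20, failureThreshold: 3 and restartPolicy: Always come from this scenario, while the probe path, port and periods are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  restartPolicy: Always        # GKE default; the pod is restarted after every liveness failure
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        path: /healthz         # assumed endpoint
        port: 8080
      initialDelaySeconds: 20  # kubelet waits 20s after container start
      failureThreshold: 3      # after 3 failures the container is restarted
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10        # matches the 10s interval seen in the kubelet logs
```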
The readiness probe check was called, but the intervals grew longer (the maximum interval looks to be around 10 minutes)
I think you reached that conclusion because you based it on $ kubectl get events and $ kubectl describe po. In both cases, events are removed after 1 hour by default. In my Tests section you can see that the Readiness probe entries run from 14:48:34 till 16:04:24, so Kubernetes called the Readiness Probe every 10 seconds the whole time.
Why didn't the Liveness probe run, and why did the Readiness probe's interval change?
As I showed in the Tests section, the interval of the Readiness probe did not change. What was misleading in this case was relying on $ kubectl get events. As for the Liveness Probe, it is still being called, but only 3 times after the pod is created/restarted. I have also included the output of $ kubectl get po -w. When the pod was re-created, you can find those liveness probes in the kubelet logs.
In my opinion, I would create an alerting policy with a condition like this: if the liveness probe fails 3 times (which, with your current setup, restarts the pod), you can use each restart to create an alert based on:
Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
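A Cloud Monitoring alerting policy on that metric could be sketched as below (e.g. for gcloud alpha monitoring policies create --policy-from-file; the display names, threshold and alignment values are illustrative assumptions):

```yaml
displayName: "liveness-http restart alert"
combiner: OR
conditions:
- displayName: "Container restart count increased"
  conditionThreshold:
    filter: 'metric.type="kubernetes.io/container/restart_count" resource.type="k8s_container"'
    comparison: COMPARISON_GT
    thresholdValue: 0
    duration: 0s
    aggregations:
    - alignmentPeriod: 60s
      perSeriesAligner: ALIGN_DELTA   # alert on increases in the cumulative restart count
```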
You can find some useful information about monitoring alerts in Stack Overflow questions like:
- Monitoring and alerting on pod status or restart with Google Container Engine (GKE) and Stackdriver: https://stackoverflow.com/questions/43789276/monitoring-and-alerting-on-pod-status-or-restart-with-google-container-engine-g
- Can I use Google Cloud Monitoring to monitor a failing Container/Pod?: https://stackoverflow.com/questions/61248752/can-i-use-google-cloud-monitoring-to-monitor-a-failing-container-pod