Kubernetes from Beginner to Practice (8): Pod Health Check Mechanisms
2021-02-07 00:18
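Original article: https://blog.51cto.com/happylab/2503865

1.1 Health check overview

Applications inevitably hit errors while running: program exceptions, software faults, hardware failures, network failures, and so on. Kubernetes provides a health check mechanism for this. When it detects an unhealthy application, it automatically restarts the container or removes the application from the Service, which keeps the application highly available. Kubernetes defines three kinds of probe: livenessProbe, readinessProbe, and startupProbe. This article covers the first two.

Each probe supports three check methods: exec (run a command), httpGet, and tcpSocket. exec is the most general and fits most scenarios, tcpSocket suits TCP services, and httpGet suits web services.

Every method accepts the same set of parameters that control check timing. The examples in this article use initialDelaySeconds, periodSeconds, and timeoutSeconds.

Walkthrough for section 1.2 (exec liveness probe)

Many applications cannot detect their own internal faults, such as deadlocks, while running, yet recover once they are restarted. Kubernetes provides the liveness check for this case. Taking exec as the example: the container creates the file /tmp/liveness-probe.log at startup and deletes it 10s later. The liveness probe runs ls -l /tmp/liveness-probe.log inside the container and judges health by the command's exit code. A non-zero exit code counts as a failure, and after enough consecutive failures (failureThreshold, 3 by default) the kubelet restarts the container.

1. Define a container that creates a file at startup. While the file exists, ls -l /tmp/liveness-probe.log exits with 0 and the check passes. After the file is deleted at 10s, the exit code is non-zero and the check fails.
2. Apply the manifest to create the container.
3. View the container's event log. The container is healthy for the first 10s after startup. From the 11th second the liveness probe fails, which triggers a container restart.
4. Check the restart count. The container keeps cycling, so the RESTARTS value keeps increasing.

Walkthrough for section 1.3 (httpGet liveness probe)

1. The httpGet probe is mainly used for web workloads. It sends an HTTP request to the container and judges health by the status code: a code of at least 200 and below 400 means healthy. The manifest defines an nginx application and probes /index.html on port 80 over HTTP.
2. Create the pod and check its health.
3. Simulate a failure by deleting, inside the pod, the file that the probe path points to. The HTTP check then fails and triggers an automatic container restart.
4. List the pod again. RESTARTS has increased by 1, which shows the container was restarted once. AGE still shows the time since the pod was created, because only the container is restarted.
5. Describe the pod to see the restart: the liveness check received a 404 error, which triggered the restart.

Walkthrough for section 1.4 (tcpSocket liveness probe)

1. The tcpSocket check suits TCP services. It tries to open a TCP connection to the specified container port. If the connection is established the check passes, otherwise it fails. This example again uses nginx and probes connectivity on port 80.
2. Apply the manifest to create the container.
3. Simulate a failure: find the node the pod runs on, log in to the pod, and install the process viewer htop.
4. Run htop to view the processes. The container's main process usually has PID 1. Kill it and watch the container status: the RESTARTS count increases.
5. Describe the pod and find the restart records.

Walkthrough for section 1.5 (readiness probe)

The readiness check applies when an application sits behind a Service. It determines whether the application is ready, that is, whether it can accept forwarded external traffic. When the check passes, the pod is added to the Service's endpoints. When it fails, the pod is removed from the endpoints so that it does not affect access to the service.

1. Create a pod that uses the httpGet method and define a readiness probe that checks the path /test.html.
2. Define a Service that includes the pod above. Note that it uses the label defined on the pod, app=nginx.
3. Apply both manifests.
4. The pod is Running at this point, but the readiness check is failing.
5. Look at the Service's endpoints: they are empty. Because the readiness check fails, the kubelet considers the pod not ready, so it has not been added to the endpoints.
6. Enter the pod and create the site file by hand so that the readiness check passes.
7. The readiness check now passes. Once the kubelet detects that the pod is ready, the pod is added to the endpoints.
8. Likewise, if the container's health check fails again, the pod is automatically removed from the endpoints.

Summary

This chapter introduced two of the Kubernetes health check probes: livenessProbe and readinessProbe. livenessProbe is a liveness check on the container's internal running state. readinessProbe is a readiness check on whether the pod can accept traffic, and it usually works together with a Service's endpoints: the pod is added to the endpoints when it becomes ready and removed when readiness fails, which gives Services their health check and service detection mechanism. The probe mechanism offers three check methods for different scenarios:

1. exec: run a command or shell script to perform the check.
2. tcpSocket: probe a port by opening a TCP connection.
3. httpGet: probe by sending an HTTP request.

Hands-on practice is the best way to master them.

References

Health checks: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
TKE health check configuration: https://cloud.tencent.com/document/product/457/32815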
1.2 exec command health check
[root@node-1 demo]# cat centos-exec-liveness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-liveness-probe
  annotations:
    kubernetes.io/description: "exec-liveness-probe"
spec:
  containers:
  - name: exec-liveness-probe
    image: centos:latest
    imagePullPolicy: IfNotPresent
    args:                       # container start command; the container lives for 30s
    - /bin/sh
    - -c
    - touch /tmp/liveness-probe.log && sleep 10 && rm -f /tmp/liveness-probe.log && sleep 20
    livenessProbe:
      exec:                     # liveness check: the exit code of ls -l /tmp/liveness-probe.log decides health
        command:
        - ls
        - -l
        - /tmp/liveness-probe.log
      initialDelaySeconds: 1
      periodSeconds: 5
      timeoutSeconds: 1
[root@node-1 demo]# kubectl apply -f centos-exec-liveness-probe.yaml
pod/exec-liveness-probe created
[root@node-1 demo]# kubectl describe pods exec-liveness-probe | tail
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28s default-scheduler Successfully assigned default/exec-liveness-probe to node-3
Normal Pulled 27s kubelet, node-3 Container image "centos:latest" already present on machine
Normal Created 27s kubelet, node-3 Created container exec-liveness-probe
Normal Started 27s kubelet, node-3 Started container exec-liveness-probe
#container started
Warning Unhealthy 20s (x2 over 25s) kubelet, node-3 Liveness probe failed: /tmp/liveness-probe.log
ls: cannot access l: No such file or directory #liveness probe runs and fails; the "cannot access l" message shows that this captured run passed a bare l to ls rather than -l
Warning Unhealthy 15s kubelet, node-3 Liveness probe failed: ls: cannot access l: No such file or directory
ls: cannot access /tmp/liveness-probe.log: No such file or directory
Normal Killing 15s kubelet, node-3 Container exec-liveness-probe failed liveness probe, will be restarted
#container is restarted
[root@node-1 demo]# kubectl get pods exec-liveness-probe
NAME READY STATUS RESTARTS AGE
exec-liveness-probe 1/1 Running 6 5m19s
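
To make the timing in this example concrete, here is a minimal Python sketch of how the probe settings translate into a restart time. It is a simplified model, not the kubelet's implementation: it assumes the first probe runs at initialDelaySeconds, later probes run every periodSeconds, and the container is restarted once failureThreshold consecutive probes fail (3 is the Kubernetes default). The function names first_restart_time and file_exists are hypothetical and exist only for this illustration.

```python
from typing import Callable, Optional


def first_restart_time(
    probe_ok: Callable[[int], bool],
    initial_delay: int,
    period: int,
    failure_threshold: int = 3,  # assumed default, matching Kubernetes
    horizon: int = 300,
) -> Optional[int]:
    """Return the second at which consecutive failures reach the threshold.

    Returns None if the probe never fails often enough within `horizon`.
    """
    consecutive_failures = 0
    t = initial_delay
    while t <= horizon:
        if probe_ok(t):
            consecutive_failures = 0  # any success resets the counter
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return t
        t += period
    return None


def file_exists(t: int) -> bool:
    """Model of the demo container: the file is removed 10 s after start."""
    return t < 10


if __name__ == "__main__":
    # Settings from the manifest: initialDelaySeconds=1, periodSeconds=5.
    print(first_restart_time(file_exists, initial_delay=1, period=5))  # prints 21
```

Under these assumptions the probes at 1s and 6s pass, the probes at 11s, 16s, and 21s fail, and the restart is triggered at about 21s. Real clusters add scheduling jitter and probe execution time, so observed timings differ slightly.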
1.3 httpGet health check
[root@node-1 demo]# cat nginx-httpGet-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-httpget-livess-readiness-probe
  annotations:
    kubernetes.io/description: "nginx-httpGet-livess-readiness-probe"
spec:
  containers:
  - name: nginx-httpget-livess-readiness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:              # liveness check implemented with httpGet
      httpGet:
        port: 80
        scheme: HTTP
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
[root@node-1 demo]# kubectl apply -f nginx-httpGet-liveness-readiness.yaml
pod/nginx-httpget-livess-readiness-probe created
[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe
NAME READY STATUS RESTARTS AGE
nginx-httpget-livess-readiness-probe 1/1 Running 0 6s
Find the node the pod runs on
[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-httpget-livess-readiness-probe 1/1 Running 1 3m9s 10.244.2.19 node-3
[root@node-1 demo]# kubectl get pods nginx-httpget-livess-readiness-probe
NAME READY STATUS RESTARTS AGE
nginx-httpget-livess-readiness-probe 1/1 Running 1 4m22s
[root@node-1 demo]# kubectl describe pods nginx-httpget-livess-readiness-probe | tail
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m45s default-scheduler Successfully assigned default/nginx-httpget-livess-readiness-probe to node-3
Normal Pulling 3m29s (x2 over 5m45s) kubelet, node-3 Pulling image "nginx:latest"
Warning Unhealthy 3m29s (x3 over 3m49s) kubelet, node-3 Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 3m29s kubelet, node-3 Container nginx-httpget-livess-readiness-probe failed liveness probe, will be restarted
Normal Pulled 3m25s (x2 over 5m41s) kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 3m25s (x2 over 5m40s) kubelet, node-3 Created container nginx-httpget-livess-readiness-probe
Normal Started 3m25s (x2 over 5m40s) kubelet, node-3 Started container nginx-httpget-livess-readiness-probe
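
The status-code rule behind the 404 failure above can be sketched in a few lines of Python. This is an illustration only, not the kubelet's code: the function names status_is_healthy and http_probe are hypothetical, and the sketch relies on urllib, which follows redirects on its own, so it only approximates how a real probe treats 3xx responses.

```python
import urllib.error
import urllib.request


def status_is_healthy(status_code: int) -> bool:
    """An HTTP probe succeeds for codes >= 200 and < 400."""
    return 200 <= status_code < 400


def http_probe(url: str, timeout_seconds: float = 3.0) -> bool:
    """Send one GET request and classify the result like an httpGet probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return status_is_healthy(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return status_is_healthy(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout: the probe fails.
        return False


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(code, status_is_healthy(code))
```

With the pod IP from the listing, a call such as http_probe("http://10.244.2.19/index.html") would be expected to return False once index.html has been deleted, which mirrors the 404 shown in the events.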
1.4 tcpSocket health check
[root@node-1 demo]# cat nginx-tcp-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
spec:
  containers:
  - name: nginx-tcp-liveness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:              # tcpSocket liveness check that probes TCP port 80
      tcpSocket:
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
[root@node-1 demo]# kubectl apply -f nginx-tcp-liveness.yaml
pod/nginx-tcp-liveness-probe created
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe
NAME READY STATUS RESTARTS AGE
nginx-tcp-liveness-probe 1/1 Running 0 6s
Get the node the pod runs on
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-tcp-liveness-probe 1/1 Running 0 99s 10.244.2.20 node-3
root@nginx-httpget-livess-readiness-probe:/# kill 1
root@nginx-httpget-livess-readiness-probe:/# command terminated with exit code 137
Check the pod status
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe
NAME READY STATUS RESTARTS AGE
nginx-tcp-liveness-probe 1/1 Running 1 13m
[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/nginx-tcp-liveness-probe to node-3
Normal Pulling 44s (x2 over 14m) kubelet, node-3 Pulling image "nginx:latest"
Normal Pulled 40s (x2 over 14m) kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 40s (x2 over 14m) kubelet, node-3 Created container nginx-tcp-liveness-probe
Normal Started 40s (x2 over 14m) kubelet, node-3 Started container nginx-tcp-liveness-probe
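
What a tcpSocket check boils down to is "can a TCP connection be opened within the timeout". The following Python sketch shows that idea with the standard socket module. It is a hedged illustration rather than the kubelet's implementation, and the function name tcp_probe is hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake; the connection is
        # closed again right away because only reachability matters.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused, unreachable, or timed out: the probe fails.
        return False


if __name__ == "__main__":
    # 10.244.2.20 is the pod IP from the listing above; port 80 is nginx.
    print(tcp_probe("10.244.2.20", 80))
```

Note that this check only proves the port accepts connections. It says nothing about whether nginx can actually serve a page, which is why httpGet is the better fit for web workloads.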
1.5 readiness check
[root@node-1 demo]# cat httpget-liveness-readiness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tcp-liveness-probe
  annotations:
    kubernetes.io/description: "nginx-tcp-liveness-probe"
  labels:                       # labels are required: the Service defined later selects on them
    app: nginx
spec:
  containers:
  - name: nginx-tcp-liveness-probe
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
    livenessProbe:              # liveness probe
      httpGet:
        port: 80
        path: /index.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
    readinessProbe:             # readiness probe
      httpGet:
        port: 80
        path: /test.html
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 10
      timeoutSeconds: 3
[root@node-1 demo]# cat nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: ClusterIP
[root@node-1 demo]# kubectl apply -f httpget-liveness-readiness-probe.yaml
pod/nginx-tcp-liveness-probe created
[root@node-1 demo]# kubectl apply -f nginx-service.yaml
service/nginx-service created
[root@node-1 ~]# kubectl get pods nginx-httpget-livess-readiness-probe
NAME READY STATUS RESTARTS AGE
nginx-httpget-livess-readiness-probe 1/1 Running 2 153m
#the readiness check fails with a 404 error (last line)
[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m6s default-scheduler Successfully assigned default/nginx-tcp-liveness-probe to node-3
Normal Pulling 2m5s kubelet, node-3 Pulling image "nginx:latest"
Normal Pulled 2m1s kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 2m1s kubelet, node-3 Created container nginx-tcp-liveness-probe
Normal Started 2m1s kubelet, node-3 Started container nginx-tcp-liveness-probe
Warning Unhealthy 2s (x12 over 112s) kubelet, node-3 Readiness probe failed: HTTP probe failed with statuscode: 404
[root@node-1 ~]# kubectl describe services nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"nginx"},"name":"nginx-service","namespace":"default"},"s...
Selector: app=nginx
Type: ClusterIP
IP: 10.110.54.40
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints:
[root@node-1 ~]# kubectl exec -it nginx-httpget-livess-readiness-probe /bin/bash
root@nginx-httpget-livess-readiness-probe:/# echo "readiness probe demo" >/usr/share/nginx/html/test.html
The health check now passes
[root@node-1 demo]# curl http://10.244.2.22/test.html
Check the endpoints
[root@node-1 demo]# kubectl describe endpoints nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2019-09-30T14:33:01Z
Subsets:
Addresses: 10.244.2.22 #ready address: moved out of NotReadyAddresses into the regular Addresses list
NotReadyAddresses:
Delete the site file so that the health check fails
[root@node-1 demo]# kubectl exec -it nginx-tcp-liveness-probe /bin/bash
root@nginx-tcp-liveness-probe:/# rm -f /usr/share/nginx/html/test.html
View the pod's health check events
[root@node-1 demo]# kubectl get pods nginx-tcp-liveness-probe
NAME READY STATUS RESTARTS AGE
nginx-tcp-liveness-probe 0/1 Running 0 11m
[root@node-1 demo]# kubectl describe pods nginx-tcp-liveness-probe | tail
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/nginx-tcp-liveness-probe to node-3
Normal Pulling 12m kubelet, node-3 Pulling image "nginx:latest"
Normal Pulled 11m kubelet, node-3 Successfully pulled image "nginx:latest"
Normal Created 11m kubelet, node-3 Created container nginx-tcp-liveness-probe
Normal Started 11m kubelet, node-3 Started container nginx-tcp-liveness-probe
Warning Unhealthy 119s (x32 over 11m) kubelet, node-3 Readiness probe failed: HTTP probe failed with statuscode: 404
Check the endpoints
[root@node-1 demo]# kubectl describe endpoints nginx-service
Name: nginx-service
Namespace: default
Labels: app=nginx
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2019-09-30T14:38:01Z
Subsets:
Addresses:
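
The relationship between labels, readiness, and endpoints shown in this section can be summarized with a toy Python model. It is only a sketch of the bookkeeping, not how Kubernetes implements it: the Pod class and the split_endpoints function are hypothetical names, and the model assumes that a pod is listed under Addresses only when it matches the Service selector and is ready.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pod:
    ip: str
    labels: Dict[str, str]
    ready: bool  # outcome of the readiness probe


def split_endpoints(
    selector: Dict[str, str], pods: List[Pod]
) -> Tuple[List[str], List[str]]:
    """Return (addresses, not_ready_addresses) for pods matching the selector."""
    addresses: List[str] = []
    not_ready: List[str] = []
    for pod in pods:
        # A pod belongs to the Service only if every selector label matches.
        if all(pod.labels.get(key) == value for key, value in selector.items()):
            (addresses if pod.ready else not_ready).append(pod.ip)
    return addresses, not_ready


if __name__ == "__main__":
    pod = Pod(ip="10.244.2.22", labels={"app": "nginx"}, ready=False)
    print(split_endpoints({"app": "nginx"}, [pod]))  # test.html missing
    pod.ready = True
    print(split_endpoints({"app": "nginx"}, [pod]))  # test.html created
```

In this model, only the first list receives traffic from the Service, which is why creating or deleting test.html is enough to move the pod in and out of rotation without restarting it.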
Closing remarks
Appendix