1. Background
● This article takes a detailed look at how a Deployment behaves during a rolling update
● The parameters involved:
  livenessProbe: liveness probe, used to determine whether a pod is still alive (a failing probe restarts the container)
  readinessProbe: readiness probe, used to determine whether a pod is ready to serve traffic
  maxSurge: the maximum number of extra pods that may exist during a rolling update
  maxUnavailable: the maximum number of pods that may be unavailable during a rolling update
2. Environment
| Component | Version |
| --- | --- |
| OS | Ubuntu 18.04.1 LTS |
| docker | 18.06.0-ce |
3. Preparing the Images and the YAML File
First, prepare two images with different versions for testing (two nginx images with different versions have already been pushed to Aliyun):
```
docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1
```
Both images provide the same service; the only difference is that nginx:delay_v1 sleeps for 20 seconds before starting nginx (a quick local check of this follows the v1 test below).
```
root@k8s-master:~# docker run -d --rm -p 10080:80 nginx:v1
e88097841c5feef92e4285a2448b943934ade5d86412946bc8d86e262f80a050
root@k8s-master:~# curl http://127.0.0.1:10080
----------
version: v1
hostname: f5189a5d3ad3
```
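The same quick check can be run against the delayed image (an optional local test of my own; the published port 10081 is chosen arbitrarily here):

```sh
# run delay_v1 locally on a different host port
docker run -d --rm -p 10081:80 registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1

# during the first ~20 seconds nothing is listening inside the container yet,
# so this request should fail (or hang)
curl http://127.0.0.1:10081

# after the delay nginx answers normally, reporting version: delay_v1
curl http://127.0.0.1:10081
```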
The yaml file:
```
root@k8s-master:~# more roll_update.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: roll-update
  ports:
  - protocol: TCP
    port: 10080
    targetPort: 80
```
4. livenessProbe and readinessProbe
● livenessProbe: the liveness probe, used mainly to determine whether a pod needs to be restarted
● readinessProbe: the readiness probe, used to determine whether a pod is already able to serve traffic (a minimal sketch of both probes follows this list)
● During a rolling update, pods are dynamically deleted and then created again. Liveness probing ensures that there are always enough live pods to serve traffic; as soon as the pod count falls short, k8s immediately brings up new pods
● However, while a pod is starting up, the service inside it is still coming up and is not yet usable; if traffic is routed to it at that moment, requests will fail
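For reference, this is roughly what the two probes look like inside a container spec (an illustrative sketch: the livenessProbe block is an addition of mine, while the readinessProbe values match the configuration used later in this article):

```yaml
containers:
- name: nginx
  image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
  # liveness: if nginx stops accepting TCP connections on port 80,
  # kubelet restarts the container
  livenessProbe:
    tcpSocket:
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  # readiness: the pod only receives Service traffic once nginx
  # actually accepts connections on port 80
  readinessProbe:
    tcpSocket:
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
```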
Let's simulate this scenario. First, apply roll_update.yaml from above:
```
root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS    RESTARTS   AGE       IP              NODE
update-deployment-7db77f7cc6-c4s2v   1/1       Running   0          28s       10.10.235.232   k8s-master
update-deployment-7db77f7cc6-nfgtd   1/1       Running   0          28s       10.10.36.82     k8s-node1
update-deployment-7db77f7cc6-tflfl   1/1       Running   0          28s       10.10.169.158   k8s-node2
root@k8s-master:~# kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
nginx-service   ClusterIP   10.254.254.199   <none>        10080/TCP   1m
```
Open another terminal and test the availability of the service (a loop that fetches the nginx page once per second):
```
root@k8s-master:~# while :; do curl http://10.254.254.199:10080; sleep 1; done
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
----------
version: v1
hostname: update-deployment-7db77f7cc6-c4s2v
----------
version: v1
hostname: update-deployment-7db77f7cc6-tflfl
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
...
```
Now update the image to nginx:delay_v1. This image delays the start of nginx: it sleeps for 20s first and only then starts the nginx service. This simulates a pod that already exists while its service is still starting up and not yet actually able to serve:
```
root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched
```
```
...
----------
version: v1
hostname: update-deployment-7db77f7cc6-h6hvt
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-n22vz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-njmpz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87
```
As you can see, because of the delayed startup, nginx is not actually ready to serve yet, but traffic is already being routed to the backend, which leaves the service unavailable for a while.
Adding a readinessProbe is therefore essential:
```
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: roll-update
  ports:
  - protocol: TCP
    port: 10080
    targetPort: 80
```
Repeat the steps above: create the deployment with nginx:v1 first, then patch it to nginx:delay_v1:
```
root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched
```
```
root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS        RESTARTS   AGE       IP              NODE
busybox                              1/1       Running       0          45d       10.10.235.255   k8s-master
lifecycle-demo                       1/1       Running       0          32d       10.10.169.186   k8s-node2
private-reg                          1/1       Running       0          92d       10.10.235.209   k8s-master
update-deployment-54d497b7dc-4mlqc   0/1       Running       0          13s       10.10.169.178   k8s-node2
update-deployment-54d497b7dc-pk4tb   0/1       Running       0          13s       10.10.36.98     k8s-node1
update-deployment-6d5d7c9947-l7dkb   1/1       Terminating   0          1m        10.10.169.177   k8s-node2
update-deployment-6d5d7c9947-pbzmf   1/1       Running       0          1m        10.10.36.97     k8s-node1
update-deployment-6d5d7c9947-zwt4z   1/1       Running       0          1m        10.10.235.246   k8s-master
```
● Because the readinessProbe is configured, the new pods, although already started, are not put into service right away, which is why they show READY: 0/1
● Meanwhile, only one of the old pods is in the Terminating state: the rolling update constraints require that enough pods stay available to serve
● Looking at the curl loop again, the image version is updated smoothly to nginx:delay_v1 without a single error:
```
root@k8s-master:~# while :; do curl http://10.254.66.136:10080; sleep 1; done
...
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
...
```
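Besides the curl loop, the progress of the rolling update itself can be followed with kubectl (standard rollout commands, not part of the original capture):

```sh
# block until the rolling update has finished rolling out
kubectl rollout status deployment update-deployment

# list the revision history; the kubernetes.io/change-cause annotation
# set by the patch above appears in the CHANGE-CAUSE column
kubectl rollout history deployment update-deployment
```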
5. maxSurge and maxUnavailable
● A rolling update can proceed in several ways: delete an old pod first and then add a new one, or add a new pod first and then delete an old one. Throughout the process the service must remain available (that is, the livenessProbe and readinessProbe checks must pass)
● In practice, maxSurge and maxUnavailable control whether old pods are removed first or new pods are added first, and at what granularity (see the sketch after this list)
● With a desired replica count of 3:
  ● maxSurge=1, maxUnavailable=0: at most 4 (3+1) pods may exist, and at least 3 (3-0) pods must be serving at the same time. A new pod is created first; once it is available an old pod is deleted, and this repeats until the update is complete
  ● maxSurge=0, maxUnavailable=1: at most 3 (3+0) pods may exist, and at least 2 (3-1) pods must be serving at the same time. An old pod is deleted first, then a new pod is created, and this repeats until the update is complete
● In the end the maxSurge and maxUnavailable conditions must both be satisfiable. If both are set to 0 there is no way to update at all: nothing may be deleted and nothing may be added, so the constraints can never be met
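For reference, these two parameters live under the Deployment's update strategy. Below is a minimal sketch of the "add new pods first" variant, based on the readinessProbe version of roll_update.yaml above (only the strategy block is new; everything else matches the earlier file):

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra pod (4 in total) during the update
      maxUnavailable: 0  # all 3 desired pods must stay available at all times
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
```

Swapping the values (maxSurge: 0, maxUnavailable: 1) gives the "delete an old pod first" behaviour described above.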
6. Summary
● This article covered how the maxSurge, maxUnavailable, livenessProbe and readinessProbe parameters are used during a Deployment rolling update
● One open issue remains with rolling updates. In a large system where a service runs many pods (say 100), a rolling update inevitably means that pod versions are mixed for a while (some pods run the old version, some the new one), so consecutive user requests may well see inconsistent results until the update finishes. That problem will be discussed at leisure in a later article. This concludes this post.
My knowledge is limited, so if I have spilled or missed anything along the way, please don't hesitate to point it out...