Implementing In-place Upgrades for Deployments Non-intrusively

2024-05-10

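How can a Deployment be upgraded in place without intruding on Kubernetes?

This article shows how to upgrade a Deployment in place in a non-intrusive, native way.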


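A shell script is provided at the end of the article for reference.

In this article, "in-place upgrade" refers only to image updates.

The Kubernetes version used here is v1.27.3.

For the concept of in-place upgrades and how OpenKruise implements them, see the article "Analyzing Kruise's In-place Upgrade Mechanism from the Source Code".

Kubernetes repository: https://github.com/kubernetes/kubernetes

Entry point (main) of the controller manager command: cmd/kube-controller-manager/controller-manager.go

Controller source directory: pkg/controller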

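Problems to solve

As we know, a Deployment drives an update by managing multiple ReplicaSets (RSs). After the image is changed, two RSs exist at the same time, one with the "old image" and one with the "new image". Once the "new image" RS is healthy, the other RS is scaled down to 0 replicas. Along the way, new Pods are created to replace the old ones.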

~|⇒ kubectl get deployment,rs,pod
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           98s
NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-54b596f5bf   1         1         1       98s
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-54b596f5bf-cw9n8   1/1     Running   0          98s
~|⇒ kubectl edit deployments.apps nginx # change the image
deployment.apps/nginx edited
~|⇒ kubectl get deployment,rs,pod # two RSs and two Pods now exist
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           3m12s
NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-54b596f5bf   1         1         1       3m12s
replicaset.apps/nginx-564768b864   1         1         0       2s
NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-54b596f5bf-cw9n8   1/1     Running             0          3m12s
pod/nginx-564768b864-vzqfp   0/1     ContainerCreating   0          2s
~|⇒ kubectl describe deployments.apps nginx
Name:                   nginx
Namespace:              default
CreationTimestamp:      Mon, 04 Mar 2024 11:44:49 +0800
Labels:                 app=nginx
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               name=nginx
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  name=nginx
  Containers:
   nginx:
    Image:        nginx:1.25.4
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  
    Mounts:       
  Volumes:        
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable 
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  nginx-54b596f5bf (0/0 replicas created) # the old and new RSs are listed here
NewReplicaSet:   nginx-564768b864 (1/1 replicas created)
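To watch the old/new RS pair from a script rather than reading `kubectl describe`, the RSs can be listed together with their revision annotation and template image. This is a minimal sketch: `list_rs_revisions` is a hypothetical helper, it assumes kubectl's current context points at the target cluster, and it narrows the list with the Deployment's selector label (`name=nginx` in this demo). Dots inside the annotation key are escaped with `\.` in the JSONPath expression.

```shell
# Hypothetical helper (not from the article): list ReplicaSets matching a
# label selector with their revision annotation, first container image and
# desired replicas. Assumes kubectl is on PATH with the right context.
list_rs_revisions() {
  local ns="$1" selector="$2"
  # Output columns: NAME  REVISION  IMAGE  DESIRED
  # "\." escapes the dots in the annotation key deployment.kubernetes.io/revision.
  kubectl get rs -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.deployment\.kubernetes\.io/revision}{"\t"}{.spec.template.spec.containers[0].image}{"\t"}{.spec.replicas}{"\n"}{end}'
}
```

For the demo above, `list_rs_revisions default name=nginx` prints one line per RS; once the rollout finishes, the old RS shows 0 desired replicas, matching the `0/0 replicas created` line in the describe output.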

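To achieve an in-place upgrade, the following problems need to be solved:

  • Prevent resources from being recreated after the Deployment's image field is changed
  • Update the containers in a Pod without recreating the Pod
  • Keep the Deployment, ReplicaSet, and Pod information consistent and their statuses healthy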

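Solution

Updating the containers

Let's start with updating the containers.

As mentioned in the article "Analyzing Kruise's In-place Upgrade Mechanism from the Source Code", changing the image of a container in a Pod does not recreate the Pod: Pods support in-place upgrades natively. The kubelet restarts only the affected container with the new image.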
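A quick way to see this on a single Pod is to set the image directly on the Pod object and then read back the image and restart count. A minimal sketch with hypothetical helper names, assuming a single-container Pod so that `containers[0]` and `containerStatuses[0]` refer to the same container:

```shell
# Hypothetical helpers: change one container's image directly on a Pod,
# then print the Pod's container image and restart count.
# Assumes a single-container Pod.
update_pod_image() {
  local ns="$1" pod="$2" container="$3" image="$4"
  # spec.containers[*].image is one of the few Pod fields that may be
  # updated; the kubelet restarts the container instead of recreating the Pod.
  kubectl set image -n "$ns" "pod/$pod" "$container=$image"
}

show_pod_container() {
  local ns="$1" pod="$2"
  kubectl get pod -n "$ns" "$pod" \
    -o jsonpath='{.spec.containers[0].image}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}'
}
```

With the demo's names: `update_pod_image default nginx-564768b864-vzqfp nginx nginx:1.25`, then `show_pod_container default nginx-564768b864-vzqfp` to confirm the new image and the incremented restart count.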

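Keeping the information consistent

This is also easy: apply the same change to the Deployment, the ReplicaSet, and the Pod.

Preventing resources from being recreated

Preventing recreation is the crux of the problem.

We could change the controller's logic, or use a hack such as a webhook, but those approaches are inelegant or intrude on Kubernetes' native logic.

One command meets our needs: rollout pause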

    csi-driver-nfs|master ⇒ kubectl rollout pause --help
    Mark the provided resource as paused.
     Paused resources will not be reconciled by a controller. Use "kubectl rollout resume" to resume a paused resource.
    Currently only deployments support being paused.
    

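This command pauses a Deployment. While it is paused, the controller still syncs it (scaling, for example, is still handled), but it does not roll out changes to the pod template, so no new ReplicaSet is created. That is exactly what we need.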
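Pausing can be scripted and verified through the Deployment's `spec.paused` field. A minimal sketch with hypothetical function names; `spec.paused` is omitted from the object unless it is true, so an unpaused Deployment yields empty output:

```shell
# Hypothetical helpers: pause a Deployment and check that spec.paused is set.
pause_deployment() {
  local ns="$1" deploy="$2"
  kubectl rollout pause -n "$ns" "deployment/$deploy"
}

is_paused() {
  local ns="$1" deploy="$2"
  # spec.paused is omitted (empty output) unless the Deployment is paused.
  [ "$(kubectl get deployment -n "$ns" "$deploy" -o jsonpath='{.spec.paused}')" = "true" ]
}
```

For the demo: `pause_deployment default nginx && is_paused default nginx && echo paused`.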

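Analyzing the pause feature (source code)

For a walkthrough of the Deployment controller source, see the article "Deployment Controller Source Code Analysis".

Source location: pkg/controller/deployment

Every Deployment is ultimately processed by DeploymentController.syncDeployment, which checks whether the Deployment is paused and handles that case separately: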

func (dc *DeploymentController) syncDeployment(ctx context.Context, key string) error {
	// ...
	if d.Spec.Paused {
		return dc.sync(ctx, d, rsList)
	}
	// ...
}

func (dc *DeploymentController) sync(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) error {
	// Syncs the ReplicaSets; this call is the only part we look at.
	// Note the last argument: createIfNotExisted = false.
	newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, false)
	// ...
}

// getAllReplicaSetsAndSyncRevision ends up calling this function.
func (dc *DeploymentController) getNewReplicaSet(ctx context.Context, d *apps.Deployment, rsList, oldRSs []*apps.ReplicaSet, createIfNotExisted bool) (*apps.ReplicaSet, error) {
	logger := klog.FromContext(ctx)
	// Look for an RS whose pod template equals the Deployment's pod template
	// (EqualIgnoreHash ignores the pod-template-hash label). If one exists,
	// it is the "new" RS, i.e. the RS that does not need to be replaced.
	existingNewRS := deploymentutil.FindNewReplicaSet(d, rsList)
	// Highest revision among the old RSs
	maxOldRevision := deploymentutil.MaxRevision(oldRSs)
	// Revision a newly created RS would get
	newRevision := strconv.FormatInt(maxOldRevision+1, 10)
	// Note: if the new RS already exists, the controller only syncs the
	// information shared between the RS and the Deployment:
	//   deploy.annotations -> rs.annotations
	//   rs.revision        -> deploy.revision
	if existingNewRS != nil {
		rsCopy := existingNewRS.DeepCopy()
		// Sync annotations (and MinReadySeconds) to the RS
		annotationsUpdated := deploymentutil.SetNewReplicaSetAnnotations(ctx, d, rsCopy, newRevision, true, maxRevHistoryLengthInChars)
		minReadySecondsNeedsUpdate := rsCopy.Spec.MinReadySeconds != d.Spec.MinReadySeconds
		if annotationsUpdated || minReadySecondsNeedsUpdate {
			rsCopy.Spec.MinReadySeconds = d.Spec.MinReadySeconds
			return dc.client.AppsV1().ReplicaSets(rsCopy.ObjectMeta.Namespace).Update(ctx, rsCopy, metav1.UpdateOptions{})
		}
		// Sync the RS revision back to the Deployment
		needsUpdate := deploymentutil.SetDeploymentRevision(d, rsCopy.Annotations[deploymentutil.RevisionAnnotation])
		// Update the Progressing condition
		cond := deploymentutil.GetDeploymentCondition(d.Status, apps.DeploymentProgressing)
		if deploymentutil.HasProgressDeadline(d) && cond == nil {
			msg := fmt.Sprintf("Found new replica set %q", rsCopy.Name)
			condition := deploymentutil.NewDeploymentCondition(apps.DeploymentProgressing, v1.ConditionTrue, deploymentutil.FoundNewRSReason, msg)
			deploymentutil.SetDeploymentCondition(&d.Status, *condition)
			needsUpdate = true
		}
		if needsUpdate {
			var err error
			if _, err = dc.client.AppsV1().Deployments(d.Namespace).UpdateStatus(ctx, d, metav1.UpdateOptions{}); err != nil {
				return nil, err
			}
		}
		return rsCopy, nil
	}
	// sync passes createIfNotExisted = false, so a paused Deployment
	// returns here and never creates a new RS. The rest is omitted.
	if !createIfNotExisted {
		return nil, nil
	}
	// ...
}
    

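Since FindNewReplicaSet decides whether the resumed Deployment reuses an existing RS, a script can run a rough check before resuming. In this sketch (hypothetical function names) only the named container's image in the RS and Deployment templates is compared; the controller compares the whole pod template (ignoring the pod-template-hash label), so a match here is necessary but not sufficient.

```shell
# Hypothetical helpers: compare one container's image in the Deployment
# template and the RS template. Only an approximation of FindNewReplicaSet,
# which compares the entire pod template.
template_image() {
  local ns="$1" kind="$2" name="$3" container="$4"
  kubectl get "$kind" -n "$ns" "$name" \
    -o jsonpath="{.spec.template.spec.containers[?(@.name==\"$container\")].image}"
}

templates_match() {
  local ns="$1" deploy="$2" rs="$3" container="$4"
  local d_img rs_img
  d_img=$(template_image "$ns" deployment "$deploy" "$container")
  rs_img=$(template_image "$ns" rs "$rs" "$container")
  # Succeed only when both images are non-empty and identical.
  [ -n "$d_img" ] && [ "$d_img" = "$rs_img" ]
}
```

For the demo: `templates_match default nginx nginx-564768b864 nginx || echo "templates differ, do not resume yet"`.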
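From the code above, the order of operations follows:

1. kubectl rollout pause deployment xxx: pause the Deployment
2. Change the image field in the Pod
3. Change the image field in the RS
4. Change the image field in the Deployment
5. kubectl rollout resume deployment xxx: resume the Deployment

Start with the Pod and work up the ownerReference chain, so that whenever an owner is updated and its controller syncs, the pod template it holds already matches what the resources below it are running. The ReplicaSet controller only reconciles the number of Pods, not their specs, so updating the Pod and then the RS template recreates nothing. Between steps 3 and 4 the RS and Deployment templates differ; because the Deployment is paused, sync calls getNewReplicaSet with createIfNotExisted = false, so no new RS is created. After step 4 the two templates are equal again (ignoring pod-template-hash), so after resuming, FindNewReplicaSet returns the existing RS and no rollout is triggered.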
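The five steps can also be run without `kubectl edit`. This is a minimal sketch, not the script linked at the end of the article: `inplace_update` is a hypothetical name, it uses `kubectl set image` (which accepts pods, replicasets, and deployments), and it assumes the caller passes the RS the Deployment currently uses. RSs created by the Deployment controller are named `<deployment>-<pod-template-hash>`, which is how the sketch finds that RS's Pods.

```shell
# Hypothetical sketch of the five steps, in order: pause, Pods, RS,
# Deployment, resume. Stops at the first failing kubectl call.
inplace_update() {
  local ns="$1" deploy="$2" rs="$3" container="$4" image="$5"
  local hash="${rs##*-}"   # RS name is <deployment>-<pod-template-hash>
  local pod
  # 1. Pause so the controller cannot create a new RS in the meantime.
  kubectl rollout pause -n "$ns" "deployment/$deploy" || return 1
  # 2. Pods first: only their containers restart.
  for pod in $(kubectl get pods -n "$ns" -l "pod-template-hash=$hash" -o jsonpath='{.items[*].metadata.name}'); do
    kubectl set image -n "$ns" "pod/$pod" "$container=$image" || return 1
  done
  # 3. Then the RS template, so Pods it creates later use the new image.
  kubectl set image -n "$ns" "rs/$rs" "$container=$image" || return 1
  # 4. Then the Deployment template, which now equals the RS template.
  kubectl set image -n "$ns" "deployment/$deploy" "$container=$image" || return 1
  # 5. Resume; the existing RS is found as the "new" RS.
  kubectl rollout resume -n "$ns" "deployment/$deploy"
}
```

With the demo's names: `inplace_update default nginx nginx-564768b864 nginx nginx:1.25`.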

Hands-on

1. Get the current resources
    csi-driver-nfs|master ⇒ kubectl get deployment,rs,pod
    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/nginx   1/1     1            1           5h18m
    NAME                               DESIRED   CURRENT   READY   AGE
    replicaset.apps/nginx-54b596f5bf   0         0         0       5h18m
    replicaset.apps/nginx-564768b864   1         1         1       5h15m
    NAME                         READY   STATUS    RESTARTS   AGE
    pod/nginx-564768b864-vzqfp   1/1     Running   0          5h15m
    
2. Pause the Deployment
    csi-driver-nfs|master ⇒ kubectl rollout pause deployment nginx
    csi-driver-nfs|master ⇒ kubectl describe deployments.apps nginx
    Name:                   nginx
    Namespace:              default
    CreationTimestamp:      Mon, 04 Mar 2024 11:44:49 +0800
    Labels:                 app=nginx
    Annotations:            deployment.kubernetes.io/revision: 2
    Selector:               name=nginx
    Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
    StrategyType:           RollingUpdate
    MinReadySeconds:        0
    RollingUpdateStrategy:  25% max unavailable, 25% max surge
    Pod Template:
      Labels:  name=nginx
      Containers:
       nginx:
        Image:        nginx:1.25.4
        Port:         80/TCP
        Host Port:    0/TCP
        Environment:  
        Mounts:       
      Volumes:        
    Conditions:
      Type           Status   Reason
      ----           ------   ------
      Available      True     MinimumReplicasAvailable
      Progressing    Unknown  DeploymentPaused # the Deployment is marked as paused
    OldReplicaSets:  nginx-54b596f5bf (0/0 replicas created)
    NewReplicaSet:   nginx-564768b864 (1/1 replicas created)
    Events:          
    
3. Change the image field in the Pod: nginx:1.25.4 --> nginx:1.25

    After the edit, the Pod is not recreated: RESTARTS +1, revision +1.

    csi-driver-nfs|master ⇒ kubectl get deployment nginx -o jsonpath="{.spec.template.spec.containers[0]}"
    {"image":"nginx:1.25.4","imagePullPolicy":"Always","name":"nginx","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}
    csi-driver-nfs|master ⇒ kubectl edit pod nginx-564768b864-vzqfp
    pod/nginx-564768b864-vzqfp edited
    csi-driver-nfs|master ⇒ kubectl get pod
    NAME                     READY   STATUS    RESTARTS     AGE
    nginx-564768b864-vzqfp   1/1     Running   1 (6s ago)   5h20m 
    csi-driver-nfs|master ⇒ kubectl get pod nginx-564768b864-vzqfp -o jsonpath='{.spec.containers[0]}'
    {"image":"nginx:1.25","imagePullPolicy":"Always","name":"nginx","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount","name":"kube-api-access-qwfr5","readOnly":true}]}
    
4. Change the image field in the RS; the Pod does not change
    csi-driver-nfs|master ⇒ kubectl edit rs nginx-564768b864
    replicaset.apps/nginx-564768b864 edited
    csi-driver-nfs|master ⇒ kubectl get rs
    NAME               DESIRED   CURRENT   READY   AGE
    nginx-54b596f5bf   0         0         0       5h28m # this old RS is left over from an earlier, unrelated change; ignore it
    nginx-564768b864   1         1         1       5h25m # watch whether the following steps cause this RS to be scaled down
    csi-driver-nfs|master ⇒ kubectl get pod
    NAME                     READY   STATUS    RESTARTS        AGE
    nginx-564768b864-vzqfp   1/1     Running   1 (4m56s ago)   5h25m
    
5. Change the image field in the Deployment; the Pod does not change
    csi-driver-nfs|master ⇒ kubectl edit deployments.apps nginx
    deployment.apps/nginx edited
    csi-driver-nfs|master ⇒ kubectl get rs
    NAME               DESIRED   CURRENT   READY   AGE
    nginx-54b596f5bf   0         0         0       5h30m
    nginx-564768b864   1         1         1       5h27m
    csi-driver-nfs|master ⇒ kubectl get pod
    NAME                     READY   STATUS    RESTARTS       AGE
    nginx-564768b864-vzqfp   1/1     Running   1 (7m4s ago)   5h27m
    
6. Record the current resource state
    • Deployment condition: DeploymentPaused
    • OldReplicaSets: nginx-54b596f5bf (0/0 replicas created)
    • NewReplicaSet: nginx-564768b864 (1/1 replicas created)
    • Deployment revision: deployment.kubernetes.io/revision: "2"
    • Deployment resource version: resourceVersion: "159028"
    • RS revision: deployment.kubernetes.io/revision: "2"
    • RS resource version: resourceVersion: "158921"
7. Resume the Deployment
    csi-driver-nfs|master ⇒ kubectl rollout resume deployment nginx

8. Check the resources again
    • The Deployment condition is NewReplicaSetAvailable; the RS status matches the state recorded above
    • The Deployment revision is unchanged
    • The Deployment resourceVersion has changed (because its status changed)
    • None of the RS information has changed
9. Confirm that the in-place upgrade is complete
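The record-and-compare steps above (revision annotation and resourceVersion, before and after resuming) can be captured with a small helper so the two snapshots can be diffed. A minimal sketch; `snapshot` is a hypothetical name and the usage comment reuses the demo's resource names:

```shell
# Hypothetical helper: print "<revision>\t<resourceVersion>" for the
# Deployment, then the same for the ReplicaSet.
snapshot() {
  local ns="$1" deploy="$2" rs="$3"
  local fields='{.metadata.annotations.deployment\.kubernetes\.io/revision}{"\t"}{.metadata.resourceVersion}{"\n"}'
  kubectl get deployment -n "$ns" "$deploy" -o jsonpath="$fields"
  kubectl get rs -n "$ns" "$rs" -o jsonpath="$fields"
}

# Usage (bash):
#   before=$(snapshot default nginx nginx-564768b864)
#   kubectl rollout resume deployment nginx
#   after=$(snapshot default nginx nginx-564768b864)
#   diff <(echo "$before") <(echo "$after")
```

Per the observations above, only the Deployment's resourceVersion line should differ.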

In-place upgrade script

The script is available at https://github.com/Forget-C/demo/tree/main/inplaceupdate/scripts

Usage

The script takes 4 arguments:

  • Deployment name
  • Deployment namespace
  • Name of the container in the Deployment
  • New image for that container

All 4 arguments are required, in exactly this order.

    scripts|main⚡ ⇒ bash inplaceupdate.sh help
    Usage: inplaceupdate.sh
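A script with four required positional arguments typically validates them up front. The sketch below is hypothetical argument handling, not taken from the published script; the variable names and usage message are assumptions.

```shell
# Hypothetical argument handling for the four positional parameters:
# deployment, namespace, container, image. Not from the published script.
parse_args() {
  if [ "$#" -ne 4 ]; then
    echo "usage: inplaceupdate.sh DEPLOYMENT NAMESPACE CONTAINER IMAGE" >&2
    return 1
  fi
  DEPLOY="$1"
  NAMESPACE="$2"
  CONTAINER="$3"
  IMAGE="$4"
}
```

At the top of a script this would be called as `parse_args "$@" || exit 1`, so a missing argument aborts before any resource is paused.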

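When run, the script changes the image in the Pod, the RS, and the Deployment, but it does not delete the Pod, and no other Pod attributes change.

To check whether the in-place upgrade succeeded, look at:

  • whether the Pod's image has changed
  • whether the Pod's restart count has gone up by 1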
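These checks can be made explicit. In this sketch (hypothetical helper names, single-container Pod assumed) the Pod's UID is also compared: an unchanged UID shows that the Pod object was not recreated, while a higher restart count shows that the container was restarted with the new image.

```shell
# Hypothetical helpers, assuming a single-container Pod.
# pod_state prints "<uid> <spec image> <restartCount>" for one Pod.
pod_state() {
  local ns="$1" pod="$2"
  kubectl get pod -n "$ns" "$pod" \
    -o jsonpath='{.metadata.uid} {.spec.containers[0].image} {.status.containerStatuses[0].restartCount}'
}

# verify_inplace BEFORE AFTER IMAGE
# Succeeds when the UID is unchanged (Pod not recreated), the spec image
# equals IMAGE, and the restart count increased.
verify_inplace() {
  local uid0 rc0 uid1 img1 rc1
  read -r uid0 _ rc0 <<< "$1"
  read -r uid1 img1 rc1 <<< "$2"
  [ "$uid0" = "$uid1" ] && [ "$img1" = "$3" ] && [ "$rc1" -gt "$rc0" ]
}
```

Typical use: `before=$(pod_state default <pod>)`, run the upgrade, `after=$(pod_state default <pod>)`, then `verify_inplace "$before" "$after" nginx:1.25`.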

Run it

    scripts|main⚡ ⇒ bash inplaceupdate.sh nginx default nginx nginx:1.25
    deployment.apps/nginx paused
    Pod nginx-54b596f5bf-qwgkl updated
    Replicaset nginx-54b596f5bf updated
    Deployment nginx updated
    deployment.apps/nginx resumed
    Deployment nginx change to nginx:1.25 completed successfully
    Waiting for pods to be ready...
    Pod nginx-54b596f5bf-qwgkl is ready
    All pods are ready
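The output ends by waiting for the Pods to become ready. How the published script does this is not shown here; with stock kubectl, `kubectl wait` can do the same. A minimal sketch with a hypothetical wrapper; the demo's `name=nginx` selector serves as example input.

```shell
# Hypothetical helper: block until every Pod matching the selector reports
# the Ready condition, or fail after the timeout (default 120s).
wait_ready() {
  local ns="$1" selector="$2" timeout="${3:-120s}"
  kubectl wait -n "$ns" --for=condition=Ready pod -l "$selector" --timeout="$timeout"
}
```

For the demo: `wait_ready default name=nginx`.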
              