Pod Scheduling (Directed Scheduling: NodeName/NodeSelector; Affinity Scheduling: NodeAffinity/PodAffinity/PodAntiAffinity; Taints and Toleration)

2023-11-07

1. Overview of Pod Scheduling

By default, the node a Pod runs on is computed by the Scheduler component using its algorithms, a process not subject to manual control. In practice this is often not enough: in many cases we want certain Pods to land on certain nodes. How do we do that? It requires understanding the rules Kubernetes uses to schedule Pods.

Kubernetes provides four broad categories of scheduling:

  • Automatic scheduling: which node a Pod runs on is decided entirely by the Scheduler through a series of algorithms
  • Directed scheduling: NodeName, NodeSelector
  • Affinity scheduling: NodeAffinity, PodAffinity, PodAntiAffinity
  • Taint (toleration) scheduling: Taints, Toleration

2. Directed Scheduling

Directed scheduling means declaring nodeName or nodeSelector on a Pod in order to schedule it onto the desired node.

Note that this kind of scheduling is mandatory: even if the target node does not exist, the Pod is still bound to it; it simply fails to run.

(1) nodeName

nodeName forcibly binds a Pod to the node with the given name.

This approach skips the Scheduler's logic entirely and binds the Pod directly to the named node.

[root@k8s-master ~]# vim pod-nodename.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: nginx-port
      containerPort: 80
  nodeName: k8s-node01  # schedule this pod onto k8s-node01. The value must be a node name as listed by `kubectl get node`; nodeName is the fixed field name

[root@k8s-master ~]# kubectl get node
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   5d20h   v1.17.4
k8s-node01   Ready    <none>   5d20h   v1.17.4
k8s-node02   Ready    <none>   5d20h   v1.17.4

[root@k8s-master ~]# kubectl create -f pod-nodename.yaml 
pod/pod-nodename created

[root@k8s-master ~]# kubectl get pod pod-nodename -n test -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-nodename   1/1     Running   0          17s   10.244.1.36   k8s-node01   <none>           <none>

[root@k8s-master ~]# kubectl delete -f pod-nodename.yaml 
pod "pod-nodename" deleted

# Point the pod at a node that is not part of the cluster
[root@k8s-master ~]# vim pod-nodename.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: nginx-port
      containerPort: 80
  nodeName: node100

[root@k8s-master ~]# kubectl create -f pod-nodename.yaml 
pod/pod-nodename created

[root@k8s-master ~]# kubectl get pod pod-nodename -n test -o wide -w
NAME           READY   STATUS    RESTARTS   AGE   IP       NODE      NOMINATED NODE   READINESS GATES
pod-nodename   0/1     Pending   0          29s   <none>   node100   <none>           <none>
pod-nodename   0/1     Terminating   0          56s   <none>   node100   <none>           <none>
pod-nodename   0/1     Terminating   0          56s   <none>   node100   <none>           <none>
# The node does not exist, so the pod cannot run; after a while the pod is automatically deleted

(2) nodeSelector

nodeSelector schedules a Pod onto nodes that carry the specified labels.

It is implemented through Kubernetes' label-selector mechanism: before the Pod is created, the scheduler uses the MatchNodeSelector predicate to match labels and find the target node, then binds the Pod to it. The match is a hard constraint; if no node matches, scheduling fails.

First, label the nodes:

[root@k8s-master ~]# kubectl label node k8s-node01 nodeenv=pro
node/k8s-node01 labeled
[root@k8s-master ~]# kubectl label node k8s-node02 nodeenv=test
node/k8s-node02 labeled

# Verify
[root@k8s-master ~]# kubectl get node --show-labels
NAME         STATUS   ROLES    AGE     VERSION   LABELS
k8s-master   Ready    master   5d20h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node01   Ready    <none>   5d20h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,nodeenv=pro
k8s-node02   Ready    <none>   5d20h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux,nodeenv=test

Create a manifest for a pod scheduled via nodeSelector:

[root@k8s-master ~]# vim pod-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
  nodeSelector:
    nodeenv: pro   # schedule onto a node labeled nodeenv=pro. The key-value pair must exist on some node; otherwise scheduling fails

# Create
[root@k8s-master ~]# kubectl create -f pod-nodeselector.yaml
pod/pod-nodeselector created

# Scheduled onto the expected node
[root@k8s-master ~]# kubectl get pod pod-nodeselector -n test -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-nodeselector   1/1     Running   0          16s   10.244.1.37   k8s-node01   <none>           <none>

# Delete
[root@k8s-master ~]# kubectl delete -f pod-nodeselector.yaml 
pod "pod-nodeselector" deleted

# Point the pod at a node label that does not exist
[root@k8s-master ~]# vim pod-nodeselector.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
  nodeSelector:
    nodeenv: pro100   # a node label that does not exist

[root@k8s-master ~]# kubectl create -f pod-nodeselector.yaml
pod/pod-nodeselector created

# No node carries this label, so scheduling fails
[root@k8s-master ~]# kubectl get pod pod-nodeselector -n test -o wide -w
NAME               READY   STATUS    RESTARTS   AGE    IP       NODE     NOMINATED NODE   READINESS GATES
pod-nodeselector   0/1     Pending   0          2m3s   <none>   <none>   <none>           <none>

[root@k8s-master ~]# kubectl describe pod pod-nodeselector -n test
...(output omitted)...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.  # none of the 3 nodes (1 master, 2 workers) carries the required label

# Although scheduling failed, the pod is not deleted automatically; delete it by hand
[root@k8s-master ~]# kubectl delete -f pod-nodeselector.yaml 
pod "pod-nodeselector" deleted

# Remove the node labels
[root@k8s-master ~]# kubectl label node k8s-node01 nodeenv-
node/k8s-node01 labeled
[root@k8s-master ~]# kubectl label node k8s-node02 nodeenv-
node/k8s-node02 labeled

3. Affinity Scheduling

The two directed-scheduling methods above, nodeName and nodeSelector, are convenient, but they share a problem: if no node satisfies the condition, the Pod is never scheduled or run, even when usable nodes remain in the cluster. This limits where they can be used.

To address this, Kubernetes offers affinity scheduling (Affinity). It extends nodeSelector: through configuration, nodes that satisfy the condition are preferred, but if none exist the Pod can still be scheduled onto a node that does not satisfy it, making scheduling more flexible.

Affinity falls into three categories:

  • nodeAffinity (node affinity): targets nodes; decides which nodes a pod may be scheduled onto
  • podAffinity (pod affinity): targets pods; decides which existing pods a new pod may share a topology domain with
  • podAntiAffinity (pod anti-affinity): targets pods; decides which existing pods a new pod must not share a topology domain with

Notes on when to use affinity (anti-affinity):

Affinity: if two applications interact frequently, it is worth using affinity to place them as close together as possible, reducing the performance cost of network traffic.

Anti-affinity: when an application runs with multiple replicas, anti-affinity spreads the instances across nodes, improving the service's availability.
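
As an illustration of the spread pattern, here is a minimal sketch (the Deployment name is hypothetical) of a 3-replica Deployment that applies podAntiAffinity against its own label, so that no two replicas land on the same node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-spread     # hypothetical name
  namespace: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-spread
  template:
    metadata:
      labels:
        app: web-spread
    spec:
      containers:
      - name: nginx
        image: nginx:1.17.1
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["web-spread"]
            topologyKey: kubernetes.io/hostname   # at most one replica per node

With a hard rule like this, a replica stays Pending whenever there are more replicas than eligible nodes; use the preferred form instead for a best-effort spread.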

(1) NodeAffinity

NodeAffinity targets nodes: it decides which nodes a pod may be scheduled onto.

Configurable fields of NodeAffinity:

[root@k8s-master ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1
RESOURCE: nodeAffinity <Object>

FIELDS:
   requiredDuringSchedulingIgnoredDuringExecution:   # the node must satisfy all of the specified rules; a hard limit
     nodeSelectorTerms:    # list of node selector terms
#      matchFields         # list of selector requirements by node field
     - matchExpressions:   # list of selector requirements by node label (recommended)
       - key:         # label key
         values:      # label values
         operator:    # operator; supports Exists, DoesNotExist, In, NotIn, Gt (greater than), Lt (less than)

   preferredDuringSchedulingIgnoredDuringExecution:     # prefer nodes that satisfy the rules; a soft limit (preference)
   - preference:       # a node selector term, associated with the corresponding weight
#      matchFields          # list of selector requirements by node field
       matchExpressions:    # list of selector requirements by node label (recommended)
       - key:         # label key
         values:      # label values
         operator:    # operator; supports Exists, DoesNotExist, In, NotIn, Gt, Lt
     weight:      # preference weight, range 1-100

Notes on the operators:

- matchExpressions:
  - key: nodeenv
    operator: Exists	# matches nodes that have a label with key nodeenv
  - key: nodeenv
    operator: In
    values: ["xxx","yyy"]	# matches nodes whose nodeenv label value is "xxx" or "yyy"
  - key: nodeenv
    operator: Gt
    values: ["xxx"]	# matches nodes whose nodeenv label value is greater than "xxx" (values must hold a single integer string)

Example: requiredDuringSchedulingIgnoredDuringExecution (hard limit)

# Check node labels
[root@k8s-master ~]# kubectl get node --show-labels
NAME         STATUS   ROLES    AGE     VERSION   LABELS
k8s-master   Ready    master   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node01   Ready    <none>   6d18h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux
k8s-node02   Ready    <none>   6d18h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux

# Label node01 and node02 with nodeenv
[root@k8s-master ~]# kubectl label node k8s-node01 nodeenv=pro
node/k8s-node01 labeled
[root@k8s-master ~]# kubectl label node k8s-node02 nodeenv=test
node/k8s-node02 labeled

# Labels applied
[root@k8s-master ~]# kubectl get node --show-labels
NAME         STATUS   ROLES    AGE     VERSION   LABELS
k8s-master   Ready    master   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node01   Ready    <none>   6d18h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,nodeenv=pro
k8s-node02   Ready    <none>   6d18h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux,nodeenv=test

# Write the manifest
[root@k8s-master ~]# vim pod-nodeaffinity-required.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-required
  namespace: test
spec:
  containers: 
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
  affinity:    # affinity settings
    nodeAffinity:     # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:     # hard limit
        nodeSelectorTerms:
        - matchExpressions:     # match nodes whose nodeenv label value is "xxx" or "yyy"
          - key: nodeenv
            operator: In
            values: ["xxx","yyy"]

# Create
[root@k8s-master ~]# kubectl create -f pod-nodeaffinity-required.yaml 
pod/pod-nodeaffinity-required created

# Check: the pod is Pending, scheduling failed
[root@k8s-master ~]# kubectl get pod pod-nodeaffinity-required -n test
NAME                        READY   STATUS    RESTARTS   AGE
pod-nodeaffinity-required   0/1     Pending   0          77s

# Describe for details
[root@k8s-master ~]# kubectl describe pod pod-nodeaffinity-required -n test
...(output omitted)...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
# None of the three nodes matches the expression, so scheduling fails

# Delete
[root@k8s-master ~]# kubectl delete -f pod-nodeaffinity-required.yaml 
pod "pod-nodeaffinity-required" deleted

# Modify the manifest
[root@k8s-master ~]# vim pod-nodeaffinity-required.yaml 
apiVersion: v1
kind: Pod
metadata: 
  name: pod-nodeaffinity-required
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution: 
        nodeSelectorTerms:
        - matchExpressions:
          - key: nodeenv
            operator: In
            values: ["pro","yyy"]     # 匹配节点主机标签的键为nodeenv,值为"pro"或"yyy"的主机

# Create
[root@k8s-master ~]# kubectl create -f pod-nodeaffinity-required.yaml
pod/pod-nodeaffinity-required created

# Check: running, scheduled onto k8s-node01
[root@k8s-master ~]# kubectl get pod pod-nodeaffinity-required -n test -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-nodeaffinity-required   1/1     Running   0          35s   10.244.1.38   k8s-node01   <none>           <none>

# Because k8s-node01 carries the label nodeenv=pro
[root@k8s-master ~]# kubectl get node -l nodeenv=pro --show-labels
NAME         STATUS   ROLES    AGE     VERSION   LABELS
k8s-node01   Ready    <none>   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,nodeenv=pro

# Delete
[root@k8s-master ~]# kubectl delete -f pod-nodeaffinity-required.yaml 
pod "pod-nodeaffinity-required" deleted

Example: preferredDuringSchedulingIgnoredDuringExecution (soft limit)

# Check node labels
[root@k8s-master ~]# kubectl get node --show-labels
NAME         STATUS   ROLES    AGE     VERSION   LABELS
k8s-master   Ready    master   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node01   Ready    <none>   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,nodeenv=pro
k8s-node02   Ready    <none>   6d19h   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux,nodeenv=test

# Write the manifest
[root@k8s-master ~]# vim pod-nodeaffinity-preferred.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-preferred
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
    imagePullPolicy: IfNotPresent
  affinity:     # affinity settings
    nodeAffinity:	 # node affinity
      preferredDuringSchedulingIgnoredDuringExecution:	 # soft limit
      - weight: 1
        preference:
          matchExpressions:	 # match nodes whose nodeenv label value is "xxx" or "yyy" (none exist in this environment)
          - key: nodeenv
            operator: In
            values: ["xxx","yyy"]

# Create
[root@k8s-master ~]# kubectl create -f pod-nodeaffinity-preferred.yaml 
pod/pod-nodeaffinity-preferred created

# Check: running, scheduled onto k8s-node02. With a soft limit the pod is scheduled even when no node matches; when matching nodes exist, they are preferred
[root@k8s-master ~]# kubectl get pod pod-nodeaffinity-preferred -n test -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-nodeaffinity-preferred   1/1     Running   0          3s    10.244.2.33   k8s-node02   <none>           <none>

# Delete
[root@k8s-master ~]# kubectl delete -f pod-nodeaffinity-preferred.yaml 
pod "pod-nodeaffinity-preferred" deleted

Notes on NodeAffinity rules (rules 2 and 3 are illustrated in the sketch after this list):

  1. If both nodeSelector and nodeAffinity are defined, both conditions must be satisfied for the pod to run on a node
  2. If nodeAffinity specifies multiple nodeSelectorTerms, matching any one of them is enough
  3. If one nodeSelectorTerms entry contains multiple matchExpressions, a node must satisfy all of them to match
  4. If the labels of the node a pod runs on change during the pod's lifetime so that they no longer satisfy the pod's node affinity, the change is ignored and the pod keeps running
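
A minimal sketch of rules 2 and 3 (the disk=ssd label is hypothetical): the two nodeSelectorTerms below are OR-ed, while the two matchExpressions inside the first term are AND-ed:

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeaffinity-terms   # hypothetical name
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:        # term 1: nodeenv=pro AND disk=ssd
          - key: nodeenv
            operator: In
            values: ["pro"]
          - key: disk
            operator: In
            values: ["ssd"]
        - matchExpressions:        # term 2: nodeenv=test (matching either term is enough)
          - key: nodeenv
            operator: In
            values: ["test"]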

(2) PodAffinity

PodAffinity uses an already-running pod as a reference and places newly created pods in the same topology domain as that reference pod.

Configurable fields of PodAffinity:

[root@k8s-master ~]# kubectl explain pod.spec.affinity.podAffinity
KIND:     Pod
VERSION:  v1
RESOURCE: podAffinity <Object>

FIELDS:
   requiredDuringSchedulingIgnoredDuringExecution:    # hard limit
     - labelSelector:		# label selector
         matchExpressions:    # list of selector requirements by pod label (recommended)
         - key:		# label key
           values:    # label values
           operator:  # operator; supports In, NotIn, Exists, DoesNotExist
#        matchLabels:  # shorthand equivalent to a set of matchExpressions
       namespaces:		# namespace(s) of the reference pod
       topologyKey:	 	# scheduling scope: kubernetes.io/hostname groups by node, beta.kubernetes.io/os groups by the node's operating system

   preferredDuringSchedulingIgnoredDuringExecution:   # soft limit
   - podAffinityTerm:  # the affinity term
       namespaces:
       topologyKey:
       labelSelector:
         matchExpressions:    # list of selector requirements by pod label (recommended)
         - key:			# label key
           values:      # label values
           operator:  	# operator; supports In, NotIn, Exists, DoesNotExist
#        matchLabels:  	# shorthand equivalent to a set of matchExpressions
     weight:  # preference weight, 1-100
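
Only the hard limit is demonstrated below; for reference, a minimal sketch of the soft (preferred) form, assuming a reference pod labeled podenv=pro exists, would look like this:

apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-soft   # hypothetical name
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: podenv
              operator: In
              values: ["pro"]
          topologyKey: kubernetes.io/hostname
# If no pod with podenv=pro exists, this pod is still scheduled somewhere; the rule only biases the node choice.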

Example: requiredDuringSchedulingIgnoredDuringExecution (hard limit)

# Create the target (reference) pod
[root@k8s-master ~]# vim pod-podaffinity-target.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-target
  namespace: test
  labels:
    podenv: pro # set a label; the podAffinity pod created later uses this pod as its reference
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  nodeName: k8s-node01

# Start the target pod
[root@k8s-master ~]# kubectl create -f pod-podaffinity-target.yaml
pod/pod-podaffinity-target created

# Check
[root@k8s-master ~]# kubectl get pod pod-podaffinity-target -n test -o wide --show-labels
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-target   1/1     Running   0          2m40s   10.244.1.39   k8s-node01   <none>           <none>            podenv=pro


# Create a pod that uses a hard podAffinity rule
[root@k8s-master ~]# vim pod-podaffinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-required
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:    # 亲和性设置
    podAffinity:    # 设置pod亲和性
      requiredDuringSchedulingIgnoredDuringExecution:    # 硬限制
      - labelSelector:
          matchExpressions:    # match reference pods whose podenv label value is in ["xxx","yyy"]
          - key: podenv
            operator: In
            values: ["xxx","yyy"]
        topologyKey: kubernetes.io/hostname    # scope: if the rule matches, run this pod on the same node as the reference pod
# The configuration above means: the new pod must run on the same node as a pod labeled podenv=xxx or podenv=yyy

[root@k8s-master ~]# kubectl create -f pod-podaffinity-required.yaml
pod/pod-podaffinity-required created

# Scheduling fails
[root@k8s-master ~]# kubectl get pod pod-podaffinity-required -n test -o wide --show-labels
NAME                       READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-required   0/1     Pending   0          44s   <none>   <none>   <none>           <none>            <none>

# Check why
[root@k8s-master ~]# kubectl describe pod pod-podaffinity-required -n test
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity rules.
# The master node has a taint, and neither of the other 2 nodes hosts a pod matching the affinity rule
# The target pod created above is labeled podenv=pro: the key matches but the value does not; the rule requires key podenv with value xxx or yyy

# Delete the pod, then fix the affinity rule in the manifest
[root@k8s-master ~]# kubectl delete -f pod-podaffinity-required.yaml 
pod "pod-podaffinity-required" deleted

[root@k8s-master ~]# vim pod-podaffinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-required
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: podenv
            operator: In
            values: ["pro","yyy"]      # 修改为目标pod对应的值 
        topologyKey: kubernetes.io/hostname

# Recreate the pod
[root@k8s-master ~]# kubectl create -f pod-podaffinity-required.yaml
pod/pod-podaffinity-required created

# Check: scheduled and running, on the same node as the target pod
[root@k8s-master ~]# kubectl get pod pod-podaffinity-required -n test -o wide --show-labels
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-required   1/1     Running   0          21s   10.244.1.41   k8s-node01   <none>           <none>            <none>
[root@k8s-master ~]# kubectl get pod -n test -o wide --show-labels
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-required   1/1     Running   0          26s     10.244.1.41   k8s-node01   <none>           <none>            <none>
pod-podaffinity-target     1/1     Running   0          5m29s   10.244.1.40   k8s-node01   <none>           <none>            podenv=pro

# Clean up
[root@k8s-master ~]# kubectl delete -f pod-podaffinity-required.yaml 
pod "pod-podaffinity-required" deleted
[root@k8s-master ~]# kubectl delete -f pod-podaffinity-target.yaml 
pod "pod-podaffinity-target" deleted

(3) PodAntiAffinity

PodAntiAffinity (pod anti-affinity) uses a running pod as a reference and keeps newly created pods out of that reference pod's topology domain.

Its configurable fields are the same as PodAffinity's.

# Create the reference pod
[root@k8s-master ~]# vim pod-podaffinity-target.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-podaffinity-target
  namespace: test
  labels:
    podenv: pro # set the label
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  nodeName: k8s-node01

# Run it
[root@k8s-master ~]# kubectl create -f pod-podaffinity-target.yaml
pod/pod-podaffinity-target created

# Check
[root@k8s-master ~]# kubectl get pod pod-podaffinity-target -n test -o wide --show-labels
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-target   1/1     Running   0          34s   10.244.1.46   k8s-node01   <none>           <none>            podenv=pro

# Create a pod that uses a pod anti-affinity rule
[root@k8s-master ~]# vim pod-podantiaffinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-podantiaffinity-required
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  affinity:		# affinity settings
    podAntiAffinity:		# pod anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:		# hard limit
      - labelSelector:
          matchExpressions:		# match pods whose podenv label value is "pro"
          - key: podenv
            operator: In
            values: ["pro"]
        topologyKey: kubernetes.io/hostname
# This rule means: the new pod must NOT run on the same node as any pod labeled podenv=pro

# Create
[root@k8s-master ~]# kubectl create -f pod-podantiaffinity-required.yaml
pod/pod-podantiaffinity-required created

# Check
[root@k8s-master ~]# kubectl get pod  -n test -o wide --show-labels
NAME                           READY   STATUS    RESTARTS   AGE    IP            NODE         NOMINATED NODE   READINESS GATES   LABELS
pod-podaffinity-target         1/1     Running   0          9m1s   10.244.1.46   k8s-node01   <none>           <none>            podenv=pro
pod-podantiaffinity-required   1/1     Running   0          91s    10.244.2.36   k8s-node02   <none>           <none>            <none>
# The new pod runs on k8s-node02, a different node from the pod labeled podenv=pro

4. Taints and Tolerations

(1) Taints

The scheduling methods so far all take the Pod's point of view: properties added to the Pod decide whether it lands on a given node. We can also take the node's point of view and add taints to a node to decide whether Pods are allowed to be scheduled there.

Once a node is tainted, a repelling relationship exists between it and Pods: the node refuses new Pods, and can even evict Pods already running on it.

A taint has the format:

 key=value:effect 
 # key and value are the taint's label. effect describes what the taint does; it supports three values:

PreferNoSchedule
# Kubernetes will try to avoid scheduling pods onto a node with this taint, unless no other node is available

NoSchedule
# Kubernetes will not schedule new pods onto a node with this taint, but pods already on the node are unaffected

NoExecute
# Kubernetes will not schedule new pods onto a node with this taint, and pods already on the node are evicted

Commands to set and remove taints with kubectl:

# Set a taint
kubectl taint node <node-name> key=value:effect
# Remove a taint
kubectl taint node <node-name> key:effect-
# Remove all taints with the given key
kubectl taint node <node-name> key-

Demo

  1. Prepare node k8s-node01 (to make the effect clearer, k8s-node02 is temporarily stopped)
  2. Set a taint on k8s-node01: tag=test:PreferNoSchedule, then create pod1 (pod1 schedules successfully)
  3. Change the taint on k8s-node01 to tag=test:NoSchedule, then create pod2 (pod1 keeps running, pod2 fails)
  4. Change the taint on k8s-node01 to tag=test:NoExecute, then create pod3 (all 3 pods fail)

[root@k8s-master ~]# kubectl get node
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   9d    v1.17.4
k8s-node01   Ready      <none>   9d    v1.17.4
k8s-node02   NotReady   <none>   9d    v1.17.4

# Set a taint on k8s-node01 (PreferNoSchedule)
[root@k8s-master ~]# kubectl taint node k8s-node01 tag=test:PreferNoSchedule
node/k8s-node01 tainted

# Check
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             tag=test:PreferNoSchedule

# Create a pod named taint1
[root@k8s-master ~]# kubectl run taint1 --image=nginx:1.17.1 -n test
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/taint1 created

# It lands on k8s-node01: k8s-node02 is down and k8s-master carries the default NoSchedule taint. k8s-node01's taint effect is PreferNoSchedule (avoid unless there is no alternative), and since neither of the other two nodes can take taint1, it is scheduled onto k8s-node01
[root@k8s-master ~]# kubectl describe node k8s-master | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
taint1-766c47bf55-k9zgg   1/1     Running   0          12s   10.244.1.50   k8s-node01   <none>           <none>



# Change the taint on k8s-node01 (remove PreferNoSchedule, set NoSchedule)
[root@k8s-master ~]# kubectl taint node k8s-node01 tag:PreferNoSchedule-
node/k8s-node01 untainted
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             <none>
[root@k8s-master ~]# kubectl taint node k8s-node01 tag=test:NoSchedule
node/k8s-node01 tainted
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             tag=test:NoSchedule

# Check taint1 again
[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
taint1-766c47bf55-k9zgg   1/1     Running   0          8m32s   10.244.1.50   k8s-node01   <none>           <none>
# Although node01's taint effect changed to NoSchedule, taint1 keeps running:
# NoSchedule only blocks newly created pods from being scheduled onto node01; pods already running there are unaffected

# Create a pod named taint2
[root@k8s-master ~]# kubectl run taint2 --image=nginx:1.17.1 -n test
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/taint2 created

# Check taint2
[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
taint1-766c47bf55-k9zgg   1/1     Running   0          17m     10.244.1.50   k8s-node01   <none>           <none>
taint2-84946958cf-ttp87   0/1     Pending   0          8m50s   <none>        <none>       <none>           <none>
[root@k8s-master ~]# kubectl describe pod taint2-84946958cf-ttp87 -n test
...(output omitted)...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
# The master has a NoSchedule taint, node02 is down, and node01's taint is also NoSchedule,
# so taint2 cannot be scheduled onto any node

# Change the taint on k8s-node01 again (remove NoSchedule, set NoExecute)
[root@k8s-master ~]# kubectl taint node k8s-node01 tag:NoSchedule-
node/k8s-node01 untainted
[root@k8s-master ~]# kubectl taint node k8s-node01 tag=test:NoExecute
node/k8s-node01 tainted
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             tag=test:NoExecute

# Now check both taint1 and taint2
[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
taint1-766c47bf55-z7dcs   0/1     Pending   0          85s   <none>   <none>   <none>           <none>
taint2-84946958cf-xbrc7   0/1     Pending   0          85s   <none>   <none>   <none>           <none>
# node02 is down, the master is tainted, and node01's taint is now NoExecute, so even taint1, which was already running on node01, is evicted

# Create a pod named taint3
[root@k8s-master ~]# kubectl run taint3 --image=nginx:1.17.1 -n test
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/taint3 created

[root@k8s-master ~]# kubectl get pod -n test -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
taint1-766c47bf55-z7dcs   0/1     Pending   0          85s   <none>   <none>   <none>           <none>
taint2-84946958cf-xbrc7   0/1     Pending   0          85s   <none>   <none>   <none>           <none>
taint3-57d45f9d4c-nx72m   0/1     Pending   0          26s   <none>   <none>   <none>           <none>
[root@k8s-master ~]# kubectl describe pod taint3-57d45f9d4c-nx72m -n test
...(output omitted)...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
# Likewise, a NoExecute taint not only evicts existing pods but also keeps new pods from being scheduled onto the node, even when no other node is available

# Remove the taint
[root@k8s-master ~]# kubectl taint node k8s-node01 tag-
node/k8s-node01 untainted
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             <none>

# A cluster set up with kubeadm puts the NoSchedule taint on the master node by default, which is why pods are not scheduled there
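
If you do want pods on the master (for example on a single-node test cluster), that default taint can be removed like any other; the expected output shown below assumes the taint is present:

[root@k8s-master ~]# kubectl taint node k8s-master node-role.kubernetes.io/master:NoSchedule-
node/k8s-master untainted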

(2) Toleration

Taints let a node refuse pods. But if you do want a particular pod scheduled onto a tainted node anyway, give the pod a toleration.


A taint is a rejection and a toleration is an exemption: a node uses taints to keep pods away, and a pod uses tolerations to ignore the node's rejection.

Configurable fields:

[root@k8s-master ~]# kubectl explain pod.spec.tolerations
KIND:     Pod
VERSION:  v1
RESOURCE: tolerations <[]Object>

tolerations:			# toleration rules
- key: "string"			# key of the taint to tolerate; empty matches all keys
  value: "string"		# value of the taint to tolerate
  operator:	"string"	# operator on key/value; supports Equal (the default) and Exists. With Exists, only the key is matched and the value is ignored
  effect: "string"		# effect of the taint; empty matches all effects
  tolerationSeconds: integer 	# toleration period, effective only when effect is NoExecute: how long the pod may stay on the node. The pod can be scheduled onto a NoExecute-tainted node, but is evicted after this many seconds
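
For example, a minimal sketch (the manifest below is hypothetical) tolerating the taint tag=test:NoExecute for only 60 seconds — the pod may be scheduled onto the tainted node, but is evicted once the time is up:

apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration-seconds   # hypothetical name
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  tolerations:
  - key: "tag"
    operator: "Equal"
    value: "test"
    effect: "NoExecute"
    tolerationSeconds: 60    # evicted 60 seconds after the taint starts applying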

Example:

[root@k8s-master ~]# kubectl get node
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   9d    v1.17.4
k8s-node01   Ready      <none>   9d    v1.17.4
k8s-node02   NotReady   <none>   9d    v1.17.4
[root@k8s-master ~]# kubectl describe node k8s-master | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             tag=test:NoExecute
# k8s-master's taint effect is NoSchedule, k8s-node02 is down, and k8s-node01's taint effect is NoExecute

# Create a pod with no toleration
[root@k8s-master ~]# vim pod-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
[root@k8s-master ~]# kubectl create -f pod-toleration.yaml
pod/pod-toleration created
# It cannot be scheduled
[root@k8s-master ~]# kubectl get pod pod-toleration -n test -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
pod-toleration   0/1     Pending   0          47s   <none>   <none>   <none>           <none>

# Check the taint on k8s-node01
[root@k8s-master ~]# kubectl describe node k8s-node01 | grep Taints
Taints:             tag=test:NoExecute

# Edit the manifest and add a toleration to the pod
[root@k8s-master ~]# kubectl delete -f pod-toleration.yaml 
pod "pod-toleration" deleted
[root@k8s-master ~]# vim pod-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration
  namespace: test
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  tolerations:		# add a toleration
  - key: "tag"		# key of the taint to tolerate
    operator: "Equal"		# operator
    value: "test"		# value of the taint to tolerate
    effect: "NoExecute"		# effect to tolerate; must match the taint's effect
# This tolerates the taint tag=test:NoExecute on node01, so the pod can be scheduled onto node01

# Create
[root@k8s-master ~]# kubectl create -f pod-toleration.yaml
pod/pod-toleration created

# Now it is scheduled and runs normally
[root@k8s-master ~]# kubectl get pod pod-toleration -n test -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
pod-toleration   1/1     Running   0          15s   10.244.1.56   k8s-node01   <none>           <none>
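
As noted in the field list above, when operator is Exists the value is omitted and only the key (and optionally the effect) is matched. Taken to the extreme, a toleration with an empty key and operator Exists matches every taint; a minimal sketch of that spec fragment:

  tolerations:
  - operator: "Exists"    # empty key + Exists tolerates all taints; use with care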