kube-scheduler pod cidr bugfix
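While working on a requirement that involved planning the container network, I found that Kubernetes does not take IP addresses into account when scheduling. In other words, once the IPs in the subnet carved out for a node are used up, pods are still placed on that node as long as CPU, memory, and the other scheduling criteria are satisfied, and kubelet then reports an error like this: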
    NetworkPlugin kubenet failed to set up pod "frontend-jh0kf_default" network: Error adding container to network: no IP addresses available in network: kubenet
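There are many ways to fix this, and they all revolve around what the scheduler takes into account. The more elegant approach is to make IP addresses a schedulable resource in kube-scheduler, but that is more work than the alternatives. A pragmatic shortcut is to reuse the allowed pod count that kube-scheduler already tracks per node (AllowedPodNumber in the patch below), which takes little work and is simple to implement.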
    @@ -29,6 +29,8 @@ import (
     	v1helper "k8s.io/kubernetes/pkg/apis/core/v1/helper"
     	priorityutil "k8s.io/kubernetes/pkg/scheduler/algorithm/priorities/util"
     	"k8s.io/kubernetes/pkg/scheduler/util"
    +	"net"
    +	"math"
     )

     var (
    @@ -315,7 +317,16 @@ func (n *NodeInfo) AllowedPodNumber() int {
     	if n == nil || n.allocatableResource == nil {
     		return 0
     	}
    -	return n.allocatableResource.AllowedPodNumber
    +	ip, cidr, err := net.ParseCIDR(n.node.Spec.PodCIDR)
    +	if err != nil || ip.To4() == nil {
    +		return n.allocatableResource.AllowedPodNumber
    +	}
    +	size, _ := cidr.Mask.Size()
    +	if size >= 31 {
    +		return 0
    +	}
    +	// -3 (network address, broadcast address, gateway address)
    +	return int(math.Min(math.Pow(2, float64(32-size)) - 3, float64(n.allocatableResource.AllowedPodNumber)))
     }
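To make the arithmetic in the patch easier to check outside the scheduler, here is a minimal standalone sketch of the same calculation. The function name podCapacity and the limit of 110 are my own choices for illustration and are not part of the scheduler; the sketch also uses integer arithmetic instead of math.Pow, which should give the same result for IPv4 prefixes.

```go
package main

import (
	"fmt"
	"net"
)

// podCapacity is a hypothetical helper (not scheduler code). It returns the
// smaller of the configured pod limit and the number of IPv4 addresses left
// in podCIDR after reserving the network, broadcast, and gateway addresses.
func podCapacity(podCIDR string, allowed int) int {
	ip, ipnet, err := net.ParseCIDR(podCIDR)
	if err != nil || ip.To4() == nil {
		// Unparseable or non-IPv4 CIDR: fall back to the plain pod limit.
		return allowed
	}
	ones, bits := ipnet.Mask.Size()
	if ones >= 31 {
		// A /31 or /32 leaves no address for pods.
		return 0
	}
	// 2^(32-ones) addresses in the subnet, minus the three reserved ones.
	usable := (int64(1) << uint(bits-ones)) - 3
	if usable < int64(allowed) {
		return int(usable)
	}
	return allowed
}

func main() {
	for _, cidr := range []string{"10.255.0.0/30", "10.255.0.0/29", "10.255.0.0/24"} {
		fmt.Println(cidr, podCapacity(cidr, 110))
	}
}
```

With a limit of 110 this gives 1 for a /30 and 5 for a /29, while a /24 is capped by the pod limit rather than by the address count.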
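One more thing to consider: when a pod uses hostNetwork: true, the patch above does not behave as expected. Such a pod shares the node's network namespace and takes no address from the pod CIDR, yet it still counts toward the pod limit.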
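The hostNetwork caveat can be sketched as follows. This is not scheduler code: the pod type is a hypothetical stand-in for the two fields of a real pod that matter here, and fitsByIP is an illustrative name. It only shows what an IP-aware check would have to distinguish, namely that host-network pods should not be counted against the address pool.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the relevant fields of a real Pod object.
type pod struct {
	Name        string
	HostNetwork bool
}

// ipConsumers counts the pods that actually take an address from the pod CIDR.
func ipConsumers(pods []pod) int {
	n := 0
	for _, p := range pods {
		if !p.HostNetwork {
			n++
		}
	}
	return n
}

// fitsByIP reports whether incoming can be placed next to the existing pods
// when the node has ipCapacity pod addresses. It checks IP capacity only;
// the regular pod limit would still apply separately.
func fitsByIP(existing []pod, incoming pod, ipCapacity int) bool {
	if incoming.HostNetwork {
		return true // needs no pod IP
	}
	return ipConsumers(existing)+1 <= ipCapacity
}

func main() {
	existing := []pod{
		{Name: "app-1", HostNetwork: false},
		{Name: "node-agent", HostNetwork: true},
	}
	// One address available and already taken by app-1.
	fmt.Println(fitsByIP(existing, pod{Name: "app-2"}, 1))
	fmt.Println(fitsByIP(existing, pod{Name: "agent-2", HostNetwork: true}, 1))
}
```

The patch above cannot make this distinction because it only changes the total returned by AllowedPodNumber, so every pod is counted the same way.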
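Testing
Case --node-cidr-mask-size=30
The expectation is that only one pod is assigned an IP address and runs. The kube-controller-manager (cm) status shows the flag in effect: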
    [root@VM_128_11_centos ~]# systemctl status kube-controller-manager.service -l
    ● kube-controller-manager.service - kube-controller-manager
       Loaded: loaded (/usr/lib/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2018-08-07 13:37:56 CST; 2min 23s ago
     Main PID: 20759 (kube-controller)
       CGroup: /system.slice/kube-controller-manager.service
               └─20759 /usr/bin/kube-controller-manager --node-cidr-mask-size=30 --cluster-cidr=10.255.0.0/19 --allocate-node-cidrs=true --master=http://127.0.0.1:60001 --cloud-config=/etc/kubernetes/qcloud.conf --service-account-private-key-file=/etc/kubernetes/server.key --service-cluster-ip-range=10.255.31.0/24 --allow-untagged-cloud=true --cloud-provider=qcloud --cluster-name=cls-n1jte9ty --root-ca-file=/etc/kubernetes/cluster-ca.crt --use-service-account-credentials=true --horizontal-pod-autoscaler-use-rest-clients=true
The kubelet log is shown below; note the parameters handed to the CNI plugin:
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]: I0807 13:38:24.454373 23809 kubenet_linux.go:308] CNI network config set to {
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "cniVersion": "0.1.0",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "name": "kubenet",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "type": "bridge",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "bridge": "cbr0",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "mtu": 1500,
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "addIf": "eth0",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "isGateway": true,
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "ipMasq": false,
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "hairpinMode": false,
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   "ipam": {
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:     "type": "host-local",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:     "subnet": "10.255.0.0/30",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:     "gateway": "10.255.0.1",
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:     "routes": [
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:       { "dst": "0.0.0.0/0" }
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:     ]
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]:   }
    Aug 07 13:38:24 VM-0-43-ubuntu kubelet[23809]: }
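To see which addresses a config like the one above leaves for pods, the following sketch parses the ipam section and lists the candidates. It is only an approximation under my own assumptions (network and broadcast addresses excluded, the gateway reserved), not the host-local plugin's actual code, and the names cniConfig and usableIPs are made up for this example.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"net"
)

// cniConfig holds only the fields of the logged config that this sketch needs.
type cniConfig struct {
	IPAM struct {
		Subnet  string `json:"subnet"`
		Gateway string `json:"gateway"`
	} `json:"ipam"`
}

// usableIPs lists the IPv4 addresses in the ipam subnet that remain after
// dropping the network address, the broadcast address, and the gateway.
func usableIPs(raw []byte) ([]string, error) {
	var cfg cniConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	_, ipnet, err := net.ParseCIDR(cfg.IPAM.Subnet)
	if err != nil {
		return nil, err
	}
	base := ipnet.IP.To4()
	if base == nil {
		return nil, fmt.Errorf("not an IPv4 subnet: %s", cfg.IPAM.Subnet)
	}
	ones, bits := ipnet.Mask.Size()
	if bits-ones > 16 {
		return nil, fmt.Errorf("subnet %s is too large to enumerate in this sketch", cfg.IPAM.Subnet)
	}
	if ones >= 31 {
		return nil, nil // nothing left for pods
	}
	first := binary.BigEndian.Uint32(base)             // network address
	last := first + (uint32(1) << uint(bits-ones)) - 1 // broadcast address
	gw := net.ParseIP(cfg.IPAM.Gateway).To4()

	var out []string
	for a := first + 1; a < last; a++ {
		ip := make(net.IP, 4)
		binary.BigEndian.PutUint32(ip, a)
		if gw != nil && ip.Equal(gw) {
			continue // reserved for the bridge
		}
		out = append(out, ip.String())
	}
	return out, nil
}

func main() {
	raw := []byte(`{"name":"kubenet","ipam":{"type":"host-local","subnet":"10.255.0.0/30","gateway":"10.255.0.1"}}`)
	ips, err := usableIPs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ips), ips)
}
```

For the /30 subnet in the log this leaves a single candidate, 10.255.0.2, which is consistent with the one-pod expectation of this test case.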
Check the number of running pods and the pods placed on the node:
    [root@VM_128_11_centos ~]# kubectl get pod --all-namespaces | grep Running | wc -l
    1
    [root@VM_128_11_centos ~]# kubectl describe node 172.30.0.43
    ...
    Non-terminated Pods:         (1 in total)
      Namespace    Name                       CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ---------    ----                       ------------  ----------  ---------------  -------------
      default      guohao-555fb5456d-kdx8n    0 (0%)        0 (0%)      0 (0%)           0 (0%)
    Allocated resources:
Case --node-cidr-mask-size=29
The expectation is that 2^(32-29) - 3 = 5 pods are assigned an IP address and run. The kubelet log shows the following CNI config:
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]: I0807 13:44:48.669847 25163 docker_service.go:307] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.255.0.0/29,},}
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]: I0807 13:44:48.669902 25163 kubenet_linux.go:308] CNI network config set to {
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "cniVersion": "0.1.0",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "name": "kubenet",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "type": "bridge",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "bridge": "cbr0",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "mtu": 1500,
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "addIf": "eth0",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "isGateway": true,
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "ipMasq": false,
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "hairpinMode": false,
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   "ipam": {
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:     "type": "host-local",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:     "subnet": "10.255.0.0/29",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:     "gateway": "10.255.0.1",
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:     "routes": [
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:       { "dst": "0.0.0.0/0" }
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:     ]
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]:   }
    Aug 07 13:44:48 VM-0-43-ubuntu kubelet[25163]: }
    [root@VM_128_11_centos ~]# kubectl get pod --all-namespaces | grep Running | wc -l
    5
    [root@VM_128_11_centos ~]# kubectl describe node 172.30.0.43
    ...
    Non-terminated Pods:         (5 in total)
      Namespace      Name                                CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ---------      ----                                ------------  ----------  ---------------  -------------
      default        guohao-555fb5456d-kjzrk             0 (0%)        0 (0%)      0 (0%)           0 (0%)
      default        guohao-555fb5456d-lxrmn             0 (0%)        0 (0%)      0 (0%)           0 (0%)
      default        guohao-555fb5456d-t4fq4             0 (0%)        0 (0%)      0 (0%)           0 (0%)
      default        guohao-555fb5456d-t9k2b             0 (0%)        0 (0%)      0 (0%)           0 (0%)
      kube-system    l7-lb-controller-95dcf7bd7-v9wx7    0 (0%)        0 (0%)      0 (0%)           0 (0%)
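Conclusion
With a mask size of 30 and of 29 the results match expectations. The catch is that this patch only works when kubenet is in use; with any other CNI implementation it is of little value, because PodCIDR is what kubenet passes to the host-local plugin, and other CNI plugins do not necessarily use it.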