linuxea: bootstrap-kubelet.conf: no such file or directory (kubelet certificate rotation failure)

marksugar
2022-02-22

Some time after dealing with the kubelet.go node "master" not found problem, I ran into the same class of failure on other nodes. This time it showed up as /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory.

I previously wrote an article about that earlier problem: linuxea:处理k8s kubelet.go node "master" not found问题.

If you follow that earlier approach and delete /etc/kubernetes/bootstrap-kubelet.conf, the kubelet.go node "master" not found problem may appear; back then I worked around it by substituting admin.conf as the startup kubeconfig.

But I later discovered the real cause of the error above: the kubelet's certificate had expired and the cluster certificates were then renewed. This misled me into deleting the --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf flag from 10-kubeadm.conf, restarting, and replacing kubelet.conf with the master's admin.conf to make the error go away. That operation only masked the real problem.

The root cause is that the kubelet's certificate was never renewed. This happens after manually extending the certificate expiry dates: kubeadm renews the cluster certificates but does not renew the kubelet's certificate (in other words, client certificate rotation failed).

So once the kubelet was restarted, the certificate mismatch surfaced. Replacing the master's kubelet.conf with admin.conf only appeared to fix the problem because the kubelet had not been restarted since.
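The mismatch can be seen directly by comparing expiry dates. A minimal sketch, assuming a kubeadm-provisioned node with the default certificate paths:

```shell
# Print the notAfter date of a PEM certificate.
cert_enddate() {
    openssl x509 -noout -enddate -in "$1"
}

# After `kubeadm alpha certs renew all`, the control-plane certificates are
# renewed but the kubelet client certificate is not:
cert_enddate /etc/kubernetes/pki/apiserver.crt                # renewed
cert_enddate /var/lib/kubelet/pki/kubelet-client-current.pem  # likely still the old date
```

If the two dates differ by years, you are in exactly the situation described here.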


Let's look at the same error, which occurred on Kubernetes 1.16:

2月 09 16:41:11 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
2月 09 16:41:11 master systemd[1]: Unit kubelet.service entered failed state.
2月 09 16:41:11 master systemd[1]: kubelet.service failed.
2月 09 16:41:22 master systemd[1]: kubelet.service holdoff time over, scheduling restart.
2月 09 16:41:22 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
2月 09 16:41:22 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
2月 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
2月 09 16:41:22 master kubelet[74138]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.222741   74138 server.go:410] Version: v1.16.3
2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223911   74138 plugins.go:100] No cloud provider specified.
2月 09 16:41:22 master kubelet[74138]: I0209 16:41:22.223954   74138 server.go:773] Client rotation is on, will bootstrap in background
2月 09 16:41:22 master systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
2月 09 16:41:22 master kubelet[74138]: E0209 16:41:22.227202   74138 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2021-03-18 08:46:29 +0000 UTC
2月 09 16:41:22 master kubelet[74138]: F0209 16:41:22.227239   74138 server.go:271] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
2月 09 16:41:22 master systemd[1]: Unit kubelet.service entered failed state.
2月 09 16:41:22 master systemd[1]: kubelet.service failed.

The earlier approach was simply to delete the /etc/kubernetes/bootstrap-kubelet.conf reference (this is a kubeadm install). That flag lives in the configuration used to start the kubelet, which you can inspect with the command below.
The dates in the output are not important; it is only for illustration.

[root@master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2021-12-30 03:08:09 CST; 1 months 23 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 32478 (kubelet)
    Tasks: 29
   Memory: 106.9M
   CGroup: /system.slice/kubelet.service
           └─32478 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/confi...
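To pull just the kubeconfig flags out of the drop-in without reading the whole unit, a rough sketch (the path is taken from the Drop-In line in the status output above; adjust if yours differs):

```shell
# Extract the kubeconfig-related flags from kubelet's systemd drop-in.
show_kubeconfig_flags() {
    grep -o -e '--[a-z-]*kubeconfig=[^" ]*' "$1"
}

show_kubeconfig_flags /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# expected flags: --bootstrap-kubeconfig=... and --kubeconfig=...
```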

Checking the certificates

First, check the cluster certificates:

[root@master pki]#  kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Feb 07, 2032 08:31 UTC   9y              no      
apiserver                  Feb 07, 2032 08:31 UTC   9y              no      
apiserver-etcd-client      Feb 07, 2032 08:31 UTC   9y              no      
apiserver-kubelet-client   Feb 07, 2032 08:31 UTC   9y              no      
controller-manager.conf    Feb 07, 2032 08:31 UTC   9y              no      
etcd-healthcheck-client    Feb 07, 2032 08:31 UTC   9y              no      
etcd-peer                  Feb 07, 2032 08:31 UTC   9y              no      
etcd-server                Feb 07, 2032 08:31 UTC   9y              no      
front-proxy-client         Feb 07, 2032 08:31 UTC   9y              no      
scheduler.conf             Feb 07, 2032 08:31 UTC   9y              no  

The dates shown here are normal.
Next, check the kubelet's certificate. kubelet.conf points at a symlink under /var/lib/kubelet/pki, so check that certificate's expiry date:

[root@master ]# cd /var/lib/kubelet/pki
[root@master pki]# ls
kubelet-client-2020-03-18-16-46-37.pem  kubelet-client-2021-01-28-09-11-35.pem  kubelet-client-current.pem  kubelet.key
kubelet-client-2020-03-18-16-47-03.pem  kubelet-client-2022-02-09-16-22-05.pem  kubelet.crt
[root@master pki]#  openssl x509 -noout -enddate -in ./kubelet.crt 
notAfter=Mar 18 07:46:26 2021 GMT

We can see Mar 18 07:46:26 2021 GMT, which means the certificate already expired on March 18, 2021 at 07:46:26.

  • The kubelet-client-2022-02-09-16-22-05.pem file was created by kubeadm alpha certs renew all, which is why it carries a different date. The kubeadm-renewed certificates are valid for ten years, so they are not affected. But this pem's date does not line up with ours either.

The kubelet client certificate had not been renewed either.
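You can also check the expiry of the certificate embedded in a kubeconfig directly. A rough sketch; the awk parsing assumes kubeadm's default layout (single-line base64 under client-certificate-data), and a YAML-aware tool would be more robust:

```shell
# Decode client-certificate-data from a kubeconfig and print its expiry.
kubeconfig_cert_enddate() {
    awk '/client-certificate-data:/ {print $2}' "$1" \
        | base64 -d \
        | openssl x509 -noout -enddate
}

kubeconfig_cert_enddate /etc/kubernetes/kubelet.conf
```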

Kubelet client certificate rotation fails

This comes from the kubeadm troubleshooting documentation, under "Kubelet client certificate rotation fails". The original text reads:

By default, kubeadm configures a kubelet with automatic rotation of client certificates by using the /var/lib/kubelet/pki/kubelet-client-current.pem symlink specified in /etc/kubernetes/kubelet.conf. If this rotation process fails you might see errors such as x509: certificate has expired or is not yet valid in kube-apiserver logs. To fix the issue you must follow these steps:

  1. Backup and delete /etc/kubernetes/kubelet.conf and /var/lib/kubelet/pki/kubelet-client* from the failed node.
  2. From a working control plane node in the cluster that has /etc/kubernetes/pki/ca.key execute kubeadm kubeconfig user --org system:nodes --client-name system:node:$NODE > kubelet.conf. $NODE must be set to the name of the existing failed node in the cluster. Modify the resulted kubelet.conf manually to adjust the cluster name and server endpoint, or pass kubeconfig user --config (it accepts InitConfiguration). If your cluster does not have the ca.key you must sign the embedded certificates in the kubelet.conf externally.
  3. Copy this resulted kubelet.conf to /etc/kubernetes/kubelet.conf on the failed node.
  4. Restart the kubelet (systemctl restart kubelet) on the failed node and wait for /var/lib/kubelet/pki/kubelet-client-current.pem to be recreated.
  5. Manually edit the kubelet.conf to point to the rotated kubelet client certificates, by replacing client-certificate-data and client-key-data with:

    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
  6. Restart the kubelet.
  7. Make sure the node becomes Ready.
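Put together, the steps above look roughly like the sketch below. This is not a drop-in script: $NODE, the backup directory, and which host each step runs on are all assumptions you must adapt.

```shell
# Step 1, on the failed node: back up and remove the stale files.
NODE=worker-1   # hypothetical node name; set this to your failed node
mkdir -p /root/kubelet-backup
mv /etc/kubernetes/kubelet.conf /var/lib/kubelet/pki/kubelet-client* /root/kubelet-backup/ 2>/dev/null

# Step 2, on a control plane node that still has /etc/kubernetes/pki/ca.key:
kubeadm kubeconfig user --org system:nodes --client-name "system:node:${NODE}" > kubelet.conf

# Step 3: copy the resulting kubelet.conf to /etc/kubernetes/kubelet.conf on
# the failed node. Step 4: restart and wait for kubelet-client-current.pem:
systemctl restart kubelet

# Steps 5-7: point kubelet.conf at the rotated certificate
# (client-certificate/client-key: /var/lib/kubelet/pki/kubelet-client-current.pem),
# restart kubelet again, and confirm the node becomes Ready.
```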


There are several approaches on GitHub; this official one has been criticized by some experienced users as rather crude.

The workaround is to copy the values of the client-certificate-data and client-key-data keys from /etc/kubernetes/admin.conf and paste those strings under the same keys in /etc/kubernetes/kubelet.conf. Then it is just a service kubelet restart.

[root@master kubernetes]# cat admin.conf 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://master:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo= 

After the modification it looks like this:

[root@master kubernetes]# cat kubelet.conf
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01ETXhPREE0TkRZeU4xb1hEVE13TURNeE5qQTRORFl5TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTGFaClRWNODRKWVBPM09yKzdVbS9KN29sRVFEa3RGT3RWWHg0NWhQU0MrVkhWVEZib1JvOWEKNnVHT05iTWNHWVJjcERBbUZSU2pycnFlaFhmbTNjVWJaRUxrdmpTNXFsaFVONGlYak9idFFVYnQ4cHREYU9QSgo1cDUybjRnczdKMU92bzhKRjYzYU83Vy91cHdJS05MOEovWlpUVTh0YlU1TklkUzZCMXE1cFRSQTFBVT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://master:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:master
  name: system:node:master@kubernetes
current-context: system:node:master@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:master
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lJUm91STNYU1ZTak13RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TURBek1UZ3dPRFEyTWpkYUZ3MHpNakF5TURjd09ETXhNVGxhTURReApGekFWQmdOVkJBb1REb9FUmJWenpRQndxZ1djMkMrbmVmRlNYK0FQMHdrL2VmdXJpdGRqUTAKeFhVNjgwNnF0b1hzM3VHaWtNQkc1WmQzT2srLzc5NlZGM29TZllObU5CaVAxY3FjVUJIcVFpOTdQNVZSL2RmawpaR0phMVJoNE5aRk9IaXVqRXFFOGQxUFVLOTg0SHNxOTcxN0dIelRaZGNDMW1EcFF3d3FUdktVRlZOa3hQdFljCjdDWkl1QUltZWFwcXlQVkFhdEp5Vk5kVy9NRlVya0ZjTHZFMnlRQ1pXd1NxL3RnSDFtMD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBcXVmRUo2NG9wR2txM1Vzd21SNGFiOTRuS0RjTTFMSWRsYnBXVkIraDAzZGp5K0ZICnJsRVVSVUdESEtBZjIvN2EwbTNrS0xoSWVudC9GRVRxSm5Kd3RUUzdmUDlDVzVwUGR2OHdEQ3o3U1dzK1ZrczcKTTVjcXhMNFovem5ySU9LZ2FmQzIyaTVFdjgrRjBqdW85b1lES3VwMFQ0bmxON3dNeXdjN1dFS0dNcGtEZGNnTgpwem1kTGZDSzQvNXdWeFhVcDFvTDJ1OHowV0RLKzcyN3plaFVMcFpZN0lXRG1PRnd2YzFxcmp6RFBCYWNxd3MwCnJyMkx6RXllRWt6cUZpd3BkcXBmbE4rYkxTZkN3ekNlWFdTcEVQ5UEVnV0dEWFlaYUhGTzBRZVF0a2Vnd2xoeWdXeXNZOTBBZnArbQpOeVByZW8zRngzaTlBUG9QeWRuNHFtbVd2dmhiT2FhUGZyK1pBUmFOa0JCaXc1OUw3eW5IMVhLcExMMDBGZHlCClFRYS8rUUtCZ1FDYzFLaXV3Ui9ZWGY5aGtKeWVZRTZHUXhKeEc2OWl2MDNuZm1ldi9zeExKZDY3WmxBemRrbDgKc3Vtb29uK0dhc0V4SGFqQUhkVVlNZmplU2ZxUkNOR1FISWM4cGFNYjQxbFErRGowRlBydzRHeThjcTBNWEtleQpIelduazQrVmpXeW9URVJoTnpkSEVUdXFKUG51TFdqbFhSaFhLWCtIVmVZVUdwN3pRNHFXQWc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
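The manual edit above can also be scripted. A minimal sketch, assuming kubeadm's default single-line base64 layout in both kubeconfigs; it backs up kubelet.conf before touching it:

```shell
# Copy client-certificate-data / client-key-data from one kubeconfig to another.
copy_client_creds() {
    src=$1; dst=$2
    cert=$(awk '/client-certificate-data:/ {print $2}' "$src")
    key=$(awk '/client-key-data:/ {print $2}' "$src")
    # base64 values never contain '|', so it is a safe sed delimiter
    sed -i "s|client-certificate-data:.*|client-certificate-data: ${cert}|" "$dst"
    sed -i "s|client-key-data:.*|client-key-data: ${key}|" "$dst"
}

cp /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak \
    && copy_client_creds /etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf \
    && systemctl restart kubelet
```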

The final takeaway: renewing the cluster certificates with kubeadm alpha certs renew all does not renew the certificate in kubelet.conf, and this has been discussed and confirmed further on GitHub.
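After restarting the kubelet, a quick sanity check (kubectl is assumed to be configured; the node name is an example):

```shell
# Second whitespace-separated column of `kubectl get node --no-headers` is STATUS.
status_column() { awk '{print $2}'; }

node_status() {
    kubectl get node "$1" --no-headers 2>/dev/null | status_column
}

node_status master   # should print: Ready
ls -l /var/lib/kubelet/pki/kubelet-client-current.pem   # should be a fresh symlink
```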

References

Kubelet can't running after renew certificates
linuxea:处理k8s kubelet.go node "master" not found问题
