Search results: 71 matching articles.
2021-12-29
linuxea: Calico intermittent Readiness/Liveness probe failures
We have nodes running two kernel versions, from two different batches: 3.10.95 and 5.4.x. The Readiness probe failed and Liveness probe failed events started right after a pod eviction.

Check the events:

# kubectl get events --sort-by='.lastTimestamp' -n kube-system
LAST SEEN   TYPE      REASON      OBJECT                  MESSAGE
50m         Warning   Unhealthy   pod/calico-node-dbjr4   Readiness probe failed:
48m         Warning   Unhealthy   pod/calico-node-8t7rc   Liveness probe failed:
43m         Warning   Unhealthy   pod/calico-node-msj2z   Liveness probe failed:
34m         Warning   Unhealthy   pod/calico-node-9gxqf   Liveness probe failed:
34m         Warning   Unhealthy   pod/calico-node-cj8qk   Readiness probe failed:
32m         Warning   Unhealthy   pod/calico-node-tph5b   Liveness probe failed:

The Kuboard dashboard shows the same events. One possible culprit is tcp_tw_recycle, which can still be enabled on older kernels. We checked and found it was indeed turned on.

Disable tcp_tw_recycle

tcp_tw_recycle was removed from the kernel in 4.12 and later. Our older nodes had:

net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

So we turn it off:

sysctl -w net.ipv4.tcp_tw_recycle=0

With tcp_tw_recycle enabled, Calico can easily end up flapping or never becoming ready: this mechanism of quickly recycling and reusing TIME_WAIT connections causes problems in Kubernetes, especially when sessions are involved, because a server with tcp_tw_recycle=1 cannot reliably accept packets from NATed clients.

By default, when tcp_tw_reuse and tcp_tw_recycle are both disabled, the kernel makes sure that sockets in the TIME_WAIT state stay there long enough to guarantee that packets belonging to future connections will not be mistaken for delayed packets of an old connection.

When you enable tcp_tw_reuse, sockets in TIME_WAIT can be reused before they expire, and the kernel tries to make sure there is no conflict regarding TCP sequence numbers. If you enable tcp_timestamps (a.k.a. PAWS, protection against wrapped sequence numbers), it will make sure those conflicts cannot happen; however, you need TCP timestamps enabled on both ends (at least, that is my understanding). See the definition of tcp_twsk_unique for the details.

When you enable tcp_tw_recycle, the kernel becomes much more aggressive and makes assumptions about the timestamps used by remote hosts. It tracks the last timestamp used by each remote host that has a connection in the TIME_WAIT state, and allows sockets to be reused if the timestamp increases correctly. But if the timestamps used by a host change (i.e., go back in time), the SYN packet is silently dropped and the connection never establishes (you will see errors similar to "connection timed out"). If you want to dig into the kernel code, the definition of tcp_timewait_state_process is a good starting point.

Now, timestamps should never go back in time, unless: the host is rebooted (but by the time it comes back up, the TIME_WAIT sockets will probably have expired, so it will be a non-issue); the IP address is quickly reused by something else (the TIME_WAIT connections would linger a bit, but other connections may be hit by TCP RST, which frees up some space); or NAT (or a smarty-pants firewall) sits in the middle of the connection.

In the latter case, you can have multiple hosts behind the same IP address and, therefore, different timestamp sequences (or those timestamps are randomized per connection by the firewall). In that case, some hosts will randomly be unable to connect, because they get mapped to a port whose TIME_WAIT bucket on the server has a newer timestamp. That is why the documentation tells you that "NAT devices or load balancers may start dropping frames because of the setting".

Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_fin_timeout instead.
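As a minimal sketch of checking the setting and making the change above survive reboots on the affected 3.10 nodes (the sysctl.d drop-in file name below is hypothetical; the post itself only shows the one-off sysctl -w):

# Inspect the current values (the tcp_tw_recycle key no longer exists on 4.12+ kernels)
sysctl net.ipv4.tcp_tw_recycle net.ipv4.tcp_tw_reuse

# Turn off the aggressive TIME_WAIT recycling immediately
sysctl -w net.ipv4.tcp_tw_recycle=0

# Persist the setting across reboots (hypothetical drop-in file name)
cat > /etc/sysctl.d/99-disable-tw-recycle.conf << 'EOF'
net.ipv4.tcp_tw_recycle = 0
EOF
sysctl --system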
I agree with that recommendation :-) Reading that answer helps a lot in understanding the problem. Our current sysctl settings are:

net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 1
net.ipv4.tcp_tw_reuse = 0
net.core.somaxconn = 32768
net.netfilter.nf_conntrack_max=1000000
vm.swappiness = 0
vm.max_map_count=655360
fs.file-max=6553600
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10

The problem was still not solved, and we found no useful information in the logs. The kubelet log below keeps reporting probe failures:

12月 29 10:52:10 master1 kubelet[1458]: E1229 10:52:10.468294 1458 remote_runtime.go:392] ExecSync 312728508abf1a1bfea28a70a174e8a97e57685fc8bae741fa1702fb341e4b06 '/bin/calico-node -felix-ready -bird-ready' from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
12月 29 10:52:10 master1 kubelet[1458]: I1229 10:52:10.468344 1458 prober.go:117] Readiness probe for "calico-node-dbjr4_kube-system(8d8d3b92-b7d3-424d-a6af-48b8559cbe15):calico-node" failed (failure):
12月 29 10:52:10 master1 kubelet[1458]: E1229 10:52:10.543839 1458 remote_runtime.go:392] ExecSync 4805cd505d5b6494d26130cc0bf83a458c1b099c1eefebac8dff942ffe04b980 '/usr/bin/check-status -r' from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
12月 29 10:52:10 master1 kubelet[1458]: I1229 10:52:10.543899 1458 prober.go:117] Readiness probe for "calico-kube-controllers-5b75d4f8db-pn5ll_kube-system(b4c2985a-6476-448c-9e44-c2a1e7a7db07):calico-kube-controllers" failed (failure):

The system shows no other obvious anomalies and the business traffic is normal, so we tried adjusting the probe timeout.

Adjust Liveness/Readiness

The Calico probe commands are:

/bin/calico-node '-felix-live' '-bird-live'
/bin/calico-node '-felix-ready' '-bird-ready'

The default configured in the yaml file is 1 second, and this is still the case in 1.20:

livenessProbe:
  exec:
    command:
    - /bin/calico-node
    - '-felix-live'
    - '-bird-live'
  failureThreshold: 6
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
name: calico-node
readinessProbe:
  exec:
    command:
    - /bin/calico-node
    - '-felix-ready'
    - '-bird-ready'
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

In a heavily loaded environment, however, timeoutSeconds may need to be increased. For more on probes, read linuxea:kubernetes 存活状态liveness probe(7):

failureThreshold: how many failed probes in a row count as a failure; the default is 3
periodSeconds: interval between probes, 10 seconds
timeoutSeconds: probe timeout, 1 second
initialDelaySeconds: delay before the first probe after the container starts

We changed timeoutSeconds to 5 directly and then watched for an hour. Problem solved:

# kubectl get events --sort-by='.lastTimestamp' -n kube-system
No resources found in kube-system namespace.

References:
Dropping of connections with tcp_tw_recycle
Why does the TCP protocol have a TIME_WAIT state?
calico-node unexplained readiness/liveness probe fails
Calico pod Intermittently triggers probe failed events
linuxea:kubernetes 存活状态liveness probe(7)
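For reference, a minimal sketch of applying the same timeout change with a strategic merge patch instead of editing the manifest by hand (this assumes the stock calico-node DaemonSet and container names in kube-system; verify them in your own cluster first):

# Raise the probe timeouts on the calico-node container to 5 seconds
kubectl -n kube-system patch daemonset calico-node -p '
{"spec":{"template":{"spec":{"containers":[
  {"name":"calico-node",
   "livenessProbe":{"timeoutSeconds":5},
   "readinessProbe":{"timeoutSeconds":5}}
]}}}}'

# Watch the rollout and check the events afterwards
kubectl -n kube-system rollout status daemonset calico-node
kubectl get events --sort-by='.lastTimestamp' -n kube-system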
2021-12-21
linuxea: Fixing the k8s kubelet.go node "master" not found problem
After the master node of one project went down, its kubelet would not start again. It is an ancient project with no monitoring and no documentation from the early days, which makes it very awkward to handle.

On inspection, the kubelet on the master was not running. The kubelet log says:

failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

I later reorganized the key points of this problem; see bootstrap-kubelet.conf: no such file or directory kubelet证书轮换失败.

failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

We then removed --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf from /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf and restarted. The master's kubelet came up, but there was still a problem: its status was NotReady. The node also shows SchedulingDisabled, because scheduling on the master had been paused earlier:

[root@master ~]# kubectl get node
NAME      STATUS                        ROLES    AGE    VERSION
master    NotReady,SchedulingDisabled   master   647d   v1.16.3
worker1   Ready                         <none>   647d   v1.16.3
worker2   Ready                         <none>   647d   v1.16.3

We noticed the workers were Ready, so we checked them: scheduling worked, images could be pulled, and by then the business traffic had recovered. kubectl describe node master shows the master node still has problems:

[root@master ~]# kubectl describe node master
Name:               master
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.133.6/25
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.100.219.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 14 Mar 2020 11:37:43 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/unreachable:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Thu, 04 Mar 2021 10:26:49 +0800   Thu, 04 Mar 2021 10:26:49 +0800   CalicoIsUp          Calico is running on this node
  MemoryPressure       Unknown   Tue, 21 Dec 2021 09:17:47 +0800   Tue, 21 Dec 2021 09:18:41 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Tue, 21 Dec 2021 09:17:47 +0800   Tue, 21 Dec 2021 09:18:41 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Tue, 21 Dec 2021 09:17:47 +0800   Tue, 21 Dec 2021 09:18:41 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Tue, 21 Dec 2021 09:17:47 +0800   Tue, 21 Dec 2021 09:18:41 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses: InternalIP: 192.168.133.6 Hostname: master Capacity: cpu: 4 ephemeral-storage: 36805060Ki hugepages-2Mi: 0 memory: 16259676Ki pods: 110 Allocatable: cpu: 4 ephemeral-storage: 33919543240 hugepages-2Mi: 0 memory: 16157276Ki pods: 110 System Info: Machine ID: abfbcf67c72dca40affb0cd0c8debd6d System UUID: 63B3A0C6-AEB7-3841-BA81-2724178E0890 Boot ID: 02424d77-4b14-45ea-a632-038dbc70a2c4 Kernel Version: 3.10.0-957.el7.x86_64 OS Image: CentOS Linux 7 (Core) Operating System: linux Architecture: amd64 Container Runtime Version: docker://18.9.7 Kubelet Version: v1.16.3 Kube-Proxy Version: v1.16.3 PodCIDR: 10.100.0.0/24 PodCIDRs: 10.100.0.0/24 Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE --------- ---- ------------ ---------- --------------- ------------- --- kube-system calico-node-v2rmv 250m (6%) 0 (0%) 0 (0%) 0 (0%) 4h15m kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-apiserver-master 250m (6%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-controller-manager-master 200m (5%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-proxy-k7shj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4h15m kube-system kube-scheduler-master 100m (2%) 0 (0%) 0 (0%) 0 (0%) 362d Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 800m (20%) 0 (0%) memory 0 (0%) 0 (0%) ephemeral-storage 0 (0%) 0 (0%) Events: <none>接着查看日志,提示node "master" not found[root@master ~]# journalctl -u kubelet -f -- Logs begin at Sun 2021-10-17 15:04:51 CST. -- Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.159602 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.259813 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.329948 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.360614 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.460822 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.531167 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: nodes "master" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.560990 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.661174 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.731398 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.761342 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.861818 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.930789 34678 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group 
"storage.k8s.io" at the cluster scope Dec 21 14:56:18 master kubelet[34678]: E1221 14:56:18.962003 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.062175 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.131629 34678 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.162315 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.262472 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.331431 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.362644 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.462865 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.532574 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: nodes "master" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.563028 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.663176 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.732705 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.763328 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.863481 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.932104 34678 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope Dec 21 14:56:19 master kubelet[34678]: E1221 14:56:19.963637 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.063823 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.133218 34678 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.164019 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.264632 34678 kubelet.go:2267] node "master" not found Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.332795 34678 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" 
in API group "" at the cluster scope Dec 21 14:56:20 master kubelet[34678]: E1221 14:56:20.364844 34678 kubelet.go:2267] node "master" not found然而此时master的kubelet 是runing的,但是没有状态还是没有ready,接着我重启了docker程序,仍然没有进展,反倒让proxy和coredns,calico Pending了。而etcd,controlle,api-server却是正常running的。由于项目太旧,没有任何文档,半年前做过证书更新,于是查看证书[root@master ]# kubeadm alpha certs check-expiration CERTIFICATE EXPIRES RESIDUAL TIME EXTERNALLY MANAGED admin.conf Dec 22, 2030 06:19 UTC 9y no apiserver Dec 22, 2030 06:19 UTC 9y no apiserver-etcd-client Dec 22, 2030 06:19 UTC 9y no apiserver-kubelet-client Dec 22, 2030 06:19 UTC 9y no controller-manager.conf Dec 22, 2030 06:19 UTC 9y no etcd-healthcheck-client Dec 22, 2030 06:19 UTC 9y no etcd-peer Dec 22, 2030 06:19 UTC 9y no etcd-server Dec 22, 2030 06:19 UTC 9y no front-proxy-client Dec 22, 2030 06:19 UTC 9y no scheduler.conf Dec 22, 2030 06:19 UTC 9y no发现证书也没有问题而后我们确认master是否配置正确[root@master kubernetes]# cat kubelet.conf apiVersion: v1 clusters: - cluster: certificate-authority-data: server: https://master:6443 name: kubernetes contexts: - context: cluster: kubernetes user: kubernetes-admin name: kubernetes-admin@kubernetes current-context: kubernetes-admin@kubernetes kind: Config preferences: {} users: - name: kubernetes-admin user: client-certificate-data: client-key-data: L在查看hosts[root@master kubernetes]# cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.133.6 master一切都没有问题,现在怀疑是kubelet.conf的证书权限有问题,于是将node的kubelet拿到master尝试。发现还是没有解决最后在尝试将master的admin.conf替换成kubelet.conf[root@master kubernetes]# systemctl restart kubelet查看日志[root@master kubernetes]# journalctl -u kubelet -f -- Logs begin at Sun 2021-10-31 03:30:27 CST. 
-- Dec 21 15:15:48 master kubelet[39135]: I1221 15:15:48.782529 39135 reconciler.go:301] Volume detached for volume "calico-node-token-sfzfw" (UniqueName: "kubernetes.io/secret/b7039ad3-f568-47b7-a323-737898628333-calico-node-token-sfzfw") on node "master" DevicePath "" Dec 21 15:15:48 master kubelet[39135]: I1221 15:15:48.782541 39135 reconciler.go:301] Volume detached for volume "coredns-token-6rmpw" (UniqueName: "kubernetes.io/secret/bf57f81b-911a-48b7-aa1e-624193cd7c51-coredns-token-6rmpw") on node "master" DevicePath "" Dec 21 15:15:48 master kubelet[39135]: I1221 15:15:48.782552 39135 reconciler.go:301] Volume detached for volume "config-volume" (UniqueName: "kubernetes.io/configmap/f4d237c7-2212-4c91-8535-4cad2bb88644-config-volume") on node "master" DevicePath "" Dec 21 15:15:48 master kubelet[39135]: I1221 15:15:48.782564 39135 reconciler.go:301] Volume detached for volume "coredns-token-6rmpw" (UniqueName: "kubernetes.io/secret/f4d237c7-2212-4c91-8535-4cad2bb88644-coredns-token-6rmpw") on node "master" DevicePath "" Dec 21 15:15:48 master kubelet[39135]: W1221 15:15:48.993096 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/3114871e-b583-4e02-a8f3-b63a25b2dffa/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.215461 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/02f1318a-3b06-48cf-a96e-83322f66c17b/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.215524 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/b7039ad3-f568-47b7-a323-737898628333/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.215548 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/f4d237c7-2212-4c91-8535-4cad2bb88644/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.215570 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/83515e8e-8ba5-459d-9bc6-3d897a4f18e9/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.215593 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/bf57f81b-911a-48b7-aa1e-624193cd7c51/volumes" does not exist Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384045 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-lib-modules") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384101 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-xtables-lock") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384152 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "flexvol-driver-host" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-flexvol-driver-host") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384194 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-net-dir" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-cni-net-dir") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384233 39135 reconciler.go:207] 
operationExecutor.VerifyControllerAttachedVolume started for volume "host-local-net-dir" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-host-local-net-dir") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384271 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "policysync" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-policysync") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384305 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "calico-node-token-sfzfw" (UniqueName: "kubernetes.io/secret/809554bc-4e5c-4645-b8dd-9963182a8196-calico-node-token-sfzfw") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384332 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-bin-dir" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-cni-bin-dir") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384369 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "var-run-calico" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-var-run-calico") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: I1221 15:15:49.384398 39135 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "var-lib-calico" (UniqueName: "kubernetes.io/host-path/809554bc-4e5c-4645-b8dd-9963182a8196-var-lib-calico") pod "calico-node-k4vx7" (UID: "809554bc-4e5c-4645-b8dd-9963182a8196") Dec 21 15:15:49 master kubelet[39135]: W1221 15:15:49.986228 39135 kubelet_getters.go:292] Path "/var/lib/kubelet/pods/754ff5ae-8b4f-4f67-9027-68b5f59a8b90/volumes" does not exist Dec 21 15:15:50 master kubelet[39135]: I1221 15:15:50.387442 39135 reconciler.go:207]问题暂时得到解决[root@master kubernetes]# kubectl get pod -A NAMESPACE NAME READY STATUS RESTARTS AGE kube-system calico-kube-controllers-dc4d48847-mmwdk 1/1 Running 0 4h30m kube-system calico-node-9z59b 1/1 Running 0 21m kube-system calico-node-k4vx7 1/1 Running 0 22m kube-system calico-node-xpktb 1/1 Running 0 21m kube-system coredns-dffb59cff-4tnrn 1/1 Running 0 4h30m kube-system coredns-dffb59cff-hjt7v 1/1 Running 0 4h30m kube-system eip-nfs-minio-6c44f59b76-nq8fm 1/1 Running 4 237d kube-system etcd-master 1/1 Running 5 362d kube-system kube-apiserver-master 1/1 Running 3 362d kube-system kube-controller-manager-master 1/1 Running 244 362d kube-system kube-proxy-7d5qb 1/1 Running 0 22m kube-system kube-proxy-k755k 1/1 Running 0 21m kube-system kube-proxy-pxhb6 1/1 Running 0 22m kube-system kube-scheduler-master 1/1 Running 241 362d kube-system kuboard-7d745b566-wwp8w 1/1 Running 0 42d kube-system metrics-server-7dcd7b44c-dvpp4 1/1 Running 0 87d[root@master kubernetes]# kubectl get node NAME STATUS ROLES AGE VERSION master Ready,SchedulingDisabled master 647d v1.16.3 worker1 Ready <none> 647d v1.16.3 worker2 Ready <none> 647d v1.16.3[root@master kubernetes]# kubectl describe node master Name: master Roles: master ... 
Unschedulable: true Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- NetworkUnavailable False Tue, 21 Dec 2021 15:15:52 +0800 Tue, 21 Dec 2021 15:15:52 +0800 CalicoIsUp Calico is running on this node MemoryPressure False Tue, 21 Dec 2021 15:46:58 +0800 Tue, 21 Dec 2021 15:15:47 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Tue, 21 Dec 2021 15:46:58 +0800 Tue, 21 Dec 2021 15:15:47 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure PIDPressure False Tue, 21 Dec 2021 15:46:58 +0800 Tue, 21 Dec 2021 15:15:47 +0800 KubeletHasSufficientPID kubelet has sufficient PID available Ready True Tue, 21 Dec 2021 15:46:58 +0800 Tue, 21 Dec 2021 15:15:47 +0800 KubeletReady kubelet is posting ready status Addresses: InternalIP: 192.168.133.6 Hostname: master Capacity: cpu: 4 ephemeral-storage: 36805060Ki hugepages-2Mi: 0 memory: 16259676Ki pods: 110 Allocatable: cpu: 4 ephemeral-storage: 33919543240 hugepages-2Mi: 0 memory: 15211100Ki pods: 110 System Info: Machine ID: abfbcf67c72dca40affb0cd0c8debd6d System UUID: 63B3A0C6-AEB7-3841-BA81-2724178E0890 Boot ID: 02424d77-4b14-45ea-a632-038dbc70a2c4 Kernel Version: 3.10.0-957.el7.x86_64 OS Image: CentOS Linux 7 (Core) Operating System: linux Architecture: amd64 Container Runtime Version: docker://18.9.7 Kubelet Version: v1.16.3 Kube-Proxy Version: v1.16.3 PodCIDR: 10.100.0.0/24 PodCIDRs: 10.100.0.0/24 Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE --------- ---- ------------ ---------- --------------- ------------- --- kube-system calico-node-k4vx7 250m (6%) 0 (0%) 0 (0%) 0 (0%) 31m kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-apiserver-master 250m (6%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-controller-manager-master 200m (5%) 0 (0%) 0 (0%) 0 (0%) 362d kube-system kube-proxy-pxhb6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 31m kube-system kube-scheduler-master 100m (2%) 0 (0%) 0 (0%) 0 (0%) 362d Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 800m (20%) 0 (0%) memory 0 (0%) 0 (0%) ephemeral-storage 0 (0%) 0 (0%) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Starting 31m kubelet, master Starting kubelet. Normal NodeHasSufficientMemory 31m (x2 over 31m) kubelet, master Node master status is now: NodeHasSufficientMemory Normal NodeHasNoDiskPressure 31m (x2 over 31m) kubelet, master Node master status is now: NodeHasNoDiskPressure Normal NodeHasSufficientPID 31m (x2 over 31m) kubelet, master Node master status is now: NodeHasSufficientPID Normal NodeNotReady 31m kubelet, master Node master status is now: NodeNotReady Normal NodeNotSchedulable 31m kubelet, master Node master status is now: NodeNotSchedulable Normal NodeAllocatableEnforced 31m kubelet, master Updated Node Allocatable limit across pods Normal NodeReady 31m kubelet, master Node master status is now: NodeReady Normal Starting 31m kube-proxy, master Starting kube-proxy.延伸阅读bootstrap-kubelet.conf: no such file or directory kubelet证书轮换失败
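For reference, the kubelet.conf/admin.conf swap described above only appears in the log as a kubelet restart; a minimal sketch of what it amounts to is below. Note that this is a stopgap (it hands the kubelet cluster-admin credentials); the proper fix is to restore a valid kubelet client certificate, as discussed in the linked post.

# Back up the broken kubelet kubeconfig first
cp /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak
# Reuse the cluster-admin kubeconfig as the kubelet's kubeconfig (temporary workaround)
cp /etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf
systemctl restart kubelet
# Confirm the node reports Ready again
kubectl get node master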
2019-03-29
linuxea: Docker and the gVisor sandbox
Containers have revolutionized how we develop, package, and deploy applications. However, the system surface exposed to containers is large enough that many security practitioners do not recommend relying on them alone. As people want to run more applications in more roles, different security concerns appear, and this has sparked new interest in sandboxed containers: containers that help provide a secure isolation boundary between the host operating system and the application running inside the container. To that end, we would like to introduce a new kind of sandbox, gVisor, which helps provide secure isolation for containers while being more lightweight than a virtual machine (VM). gVisor integrates with Docker and Kubernetes, making it simple and easy to run sandboxed containers in production.

Traditional containers are not sandboxes

Applications running in traditional Linux containers access system resources the same way regular (non-containerized) applications do: by making system calls directly to the host kernel. The kernel runs in privileged mode, allowing it to interact with the necessary hardware and return results to the application.

With traditional containers, the kernel imposes some limits on the resources the application can access. These limits are implemented through Linux cgroups and namespaces, but not all resources can be controlled through these mechanisms. Moreover, even with these limits, the kernel still exposes a large surface area that malicious applications can attack directly.

Kernel features like seccomp filters can provide better isolation between the application and the host kernel, but they require the user to create a predefined whitelist of system calls. In practice, it is often hard to know in advance which system calls an application will need, and filters are of little help when a vulnerability is found in a system call the application does need.

Existing VM-based container technology

One way to improve container isolation is to run each container in its own virtual machine (VM). This gives every container its own "machine", including a kernel and virtualized devices, completely separate from the host. Even if there is a vulnerability in the guest, the hypervisor still isolates the host and the other applications/containers running on it.

Running containers in separate VMs provides excellent isolation, compatibility, and performance, but it may also require a larger resource footprint. Kata Containers is an open-source project that uses stripped-down VMs to keep the footprint minimal while maximizing the performance of container isolation. Like gVisor, Kata includes an Open Container Initiative (OCI) runtime API that is compatible with Docker and Kubernetes.

Sandboxed containers with gVisor

gVisor is more lightweight than a VM while maintaining a similar level of isolation. At its core is a kernel that runs as a normal, unprivileged process and supports most Linux system calls. This kernel is written in Go, chosen for its memory and type safety. Just like in a VM, an application running inside a gVisor sandbox gets its own kernel and a set of virtualized devices, distinct from the host and from other sandboxes. gVisor provides a strong isolation boundary by intercepting the application's system calls and acting as the guest kernel, all while running in user space. Unlike a VM, which requires a fixed set of resources at creation time, gVisor can adapt to changing resources over time, like most normal Linux processes. gVisor can be thought of as an extremely paravirtualized operating system with a flexible resource footprint and a lower fixed cost than a full VM. However, this flexibility comes at the price of higher per-system-call overhead and reduced application compatibility; more on that below.

Integration with Docker and Kubernetes

gVisor integrates seamlessly with Docker and Kubernetes through runsc (short for "run sandboxed container"), a runtime that conforms to the OCI runtime API. When Docker runs a container, the runsc runtime can be swapped in for runc. Installation is simple; once installed, only one extra flag is needed to run a sandboxed container in Docker.

The currently supported applications are listed here: https://gvisor.dev/docs/user_guide/compatibility/

Installation

[root@linuxea.com ~]# wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
[root@linuxea.com ~]# wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc.sha512
[root@linuxea.com ~]# sha512sum -c runsc.sha512
[root@linuxea.com ~]# chmod a+x runsc

If the download above does not work, try the following (github):

wget https://raw.githubusercontent.com/marksugar/dockerMops/master/runsc/runsc -O /usr/local/bin/runsc
wget https://raw.githubusercontent.com/marksugar/dockerMops/master/runsc/runsc.sha512 -O /usr/local/bin/runsc.sha512
cd /usr/local/bin/
sha512sum -c runsc.sha512
chmod a+x runsc

Add the runtime to the daemon.json file:

[root@linuxea.com ~]# cat /etc/docker/daemon.json
{
  "bip": "172.31.0.1/16",
  "insecure-registries": ["registry.linuxea.com"],
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}

Restart:

[root@linuxea.com ~]# systemctl restart docker

Use --runtime=runsc:

[root@linuxea.com ~]# docker run --name redis -d --runtime=runsc -p 6379:6379 marksugar/redis:5.0.0
c59fcbc99d295a9b0a3fda760dbc88c078c7dfa24d25842f2db9f3c1b1f0ee1c

[root@linuxea.com ~]# docker logs redis
[i] Start configuration /etc/redis
[ok] /etc/redis/redis.conf config ready
[i] If you want to use variables, please turn on REDIS_CONF=on
[i] No variable substitution /etc/redis/redis.conf
[i] Start up /usr/local/bin/redis-server /etc/redis/redis.conf

[root@linuxea.com ~]# docker exec -i redis ps aux
PID   USER     TIME  COMMAND
  1   root     0:00  {Initialization.} /bin/sh /Initialization.sh
  2   root     0:00  /usr/local/bin/redis-server /etc/redis/redis.conf
  6   root     0:00  ps aux

For a quick verification, use ubuntu:

[root@LinuxEA /etc/docker]# docker run -it --rm --runtime=runsc ubuntu dmesg
[ 0.000000] Starting gVisor...
[ 0.174239] Forking spaghetti code...
[ 0.611425] Preparing for the zombie uprising...
[ 0.631965] Waiting for children...
[ 0.968107] Granting licence to kill(2)...
[ 1.057559] Letting the watchdogs out...
[ 1.267850] Creating cloned children...
[ 1.536684] Creating process schedule...
[ 2.013456] Reading process obituaries...
[ 2.202789] Generating random numbers by fair dice roll...
[ 2.205842] Digging up root...
[ 2.644056] Ready!

Passing the flag every time is not always convenient, so we add a default-runtime parameter to go with it: "default-runtime": "runsc"

[root@LinuxEA ~]# cat /etc/docker/daemon.json
{
  "bip": "10.10.10.1/24",
  "default-runtime": "runsc",
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}

After restarting, the --runtime=runsc flag is no longer needed:

[root@LinuxEA ~]# systemctl restart docker
[root@LinuxEA ~]# docker run -it --rm ubuntu dmesg
[ 0.000000] Starting gVisor...
[ 0.218754] Creating cloned children...
[ 0.635654] Creating bureaucratic processes...
[ 0.765353] Generating random numbers by fair dice roll...
[ 0.778448] Gathering forks...
[ 1.029718] Searching for needles in stacks...
[ 1.140639] Digging up root...
[ 1.425273] Granting licence to kill(2)...
[ 1.708911] Forking spaghetti code...
[ 2.032427] Segmenting fault lines...
[ 2.209941] Rewriting operating system in Javascript...
[ 2.665411] Ready!

If you use --network=host, you may also need to modify the configuration as follows:

[root@Linuxea.com /data/linuxea/build]# cat /etc/docker/daemon.json
{
  "bip": "10.10.10.1/24",
  "default-runtime": "runsc",
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--network=host"
      ]
    }
  }
}

Then restart:

[root@linuxea.com /data/linuxea/build]# docker exec -it mariadb dmesg
[ 0.000000] Starting gVisor...
[ 0.176162] Gathering forks...
[ 0.337078] Consulting tar man page...
[ 0.366665] Forking spaghetti code...
[ 0.619321] Feeding the init monster...
[ 0.796898] Letting the watchdogs out...
[ 0.846904] Segmenting fault lines...
[ 0.874947] Generating random numbers by fair dice roll...
[ 1.122600] Mounting deweydecimalfs...
[ 1.598290] Reading process obituaries...
[ 1.650848] Searching for socket adapter...
[ 1.761147] Ready!

In Kubernetes, most resource isolation happens at the pod level, which makes the pod a natural fit for a gVisor sandbox boundary. The Kubernetes community is currently formalizing the sandboxed pod API, but experimental support is available today. The runsc runtime can run sandboxed pods in a Kubernetes cluster through either the CRI-O or CRI-containerd projects, which translate messages from the kubelet into OCI runtime commands.

gVisor implements a large part of the Linux system API (200 system calls and counting), but not all of it. Some system calls and arguments are not currently supported, nor are some parts of the /proc and /sys filesystems. As a result, not all applications will run inside gVisor, but many work just fine, including Node.js, Java 8, MySQL, Jenkins, Apache, Redis, MongoDB, and more.

For more usage information see https://github.com/google/gvisor and https://gvisor.dev/docs/user_guide/docker/

Further reading — some suggestions with more information on how to pass secrets into containers:
linuxea:kubernetes secret简单用法(25)
linuxea:kubernetes 介绍ConfigMap与Secret(23)
linuxea:docker的安全实践

Learn more: learn how to use Docker CLI commands and Dockerfile instructions; they can help you use Docker more effectively. See the Docker documentation and my other posts for more.

Categories: docker目录, 白话容器, docker-compose
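As a quick sanity check for the setup above (a sketch; the exact output format varies by Docker version), you can confirm that the runsc runtime is registered and actually being used:

# List the runtimes Docker knows about; runsc should appear here
docker info --format '{{json .Runtimes}}'

# Run a throwaway container under gVisor; the first dmesg lines should show the gVisor sentry booting
docker run --rm --runtime=runsc alpine dmesg | head -n 3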
2019-02-13
linuxea: kubeadm 1.13 high availability
使用kubeadm安装配置kubernetes HA,etcd外放,使用VIP做故障转移,其中不同的是,这个VIP还做了域名解析。此前尝试使用keepalived+haproxy发现有一些问题。恰巧内部有内部的DNS服务器,这样一来,两台master通过域名和VIP做转移,实现了kubernetes的高可用,如下图环境如下:[root@linuxea.com ~]# kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1",[root@linuxea.com ~]# docker -v Docker version 18.06.1-ce, build e68fc7a先决条件hostscat >> /etc/hosts << EOF 172.25.50.13 master-0.k8s.org 172.25.50.14 master-1.k8s.org 127.0.0.1 www.linuxea.com EOFhostname[root@linuxea.com ~]# hostnamectl set-hostname master-0.k8s.org [root@host-172-25-50-13 ~]# echo "DHCP_HOSTNAME=master-0.k8s.org" >> /etc/sysconfig/network-scripts/ifcfg-eth0 [root@linuxea.com ~]# systemctl restart network修改后重启下,在重启前,关闭防火墙[root@linuxea.com ~]# systemctl disable iptables firewalld.service [root@linuxea.com ~]# systemctl stop iptables firewalld.service [root@linuxea.com ~]# reboot当然了,我这里此前安装的就是iptablesswap[root@master-0 ~]# swapoff -a可以打开ipvscat << EOF > /etc/sysconfig/modules/ipvs.modules #!/bin/bash ipvs_modules_dir="/usr/lib/modules/\`uname -r\`/kernel/net/netfilter/ipvs" for i in \`ls \$ipvs_modules_dir | sed -r 's#(.*).ko.*#\1#'\`; do /sbin/modinfo -F filename \$i &> /dev/null if [ \$? -eq 0 ]; then /sbin/modprobe \$i fi done EOFchmod +x /etc/sysconfig/modules/ipvs.modules bash /etc/sysconfig/modules/ipvs.modulesecho "1" > /proc/sys/net/bridge/bridge-nf-call-iptables确保模块安装,nf_nat_ipv4也是关键之一[root@master-0 ~]# lsmod|grep ip_vs ip_vs_wrr 16384 0 ip_vs_wlc 16384 0 ip_vs_sh 16384 0 ip_vs_sed 16384 0 ip_vs_rr 16384 0 ip_vs_pe_sip 16384 0 nf_conntrack_sip 28672 1 ip_vs_pe_sip ip_vs_ovf 16384 0 ip_vs_nq 16384 0 ip_vs_mh 16384 0 ip_vs_lc 16384 0 ip_vs_lblcr 16384 0 ip_vs_lblc 16384 0 ip_vs_ftp 16384 0 ip_vs_fo 16384 0 ip_vs_dh 16384 0 ip_vs 151552 30 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_ovf,ip_vs_fo,ip_vs_nq,ip_vs_lblc,ip_vs_pe_sip,ip_vs_wrr,ip_vs_lc,ip_vs_mh,ip_vs_sedip_vs_ftp nf_nat 32768 2 nf_nat_ipv4,ip_vs_ftp nf_conntrack 135168 8 xt_conntrack,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_sip,nf_conntrack_netlink,ip_vs libcrc32c 16384 4 nf_conntrack,nf_nat,xfs,ip_vs如果觉得上面的步骤太繁琐,可以参考这里的脚本:curl -Lk https://raw.githubusercontent.com/marksugar/kubeadMHA/master/systeminit/chenage_hostname|bash curl -Lk https://raw.githubusercontent.com/marksugar/kubeadMHA/master/systeminit/ip_vs_a_init|bashkeepalivedinstall keepalived bash <(curl -s https://raw.githubusercontent.com/marksugar/lvs/master/keepliaved/install.sh|more)如下:输入Master或者BACKUP和VIP[root@master-0 ~]# bash <(curl -s https://raw.githubusercontent.com/marksugar/lvs/master/keepliaved/install.sh|more) You install role MASTER/BACKUP ? 
please enter(block letter):MASTER Please enter the use VIP: 172.25.50.15安装kubeadmcat <<EOF > /etc/yum.repos.d/kubernetes.repo [kubernetes] name=Kubernetes baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64 enabled=1 gpgcheck=1 repo_gpgcheck=1 gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg EOF setenforce 0 yum install -y kubelet kubeadm systemctl enable kubelet && systemctl start kubeletmaster-0部署kubeadm initmaster-0.k8s.orgapiVersion: kubeadm.k8s.io/v1beta1 kind: InitConfiguration nodeRegistration: name: master-0.k8s.org # taints: # - key: "kubeadmNode" # value: "master" # effect: "NoSchedule" localapiEndpoint: advertiseAddress: https://172.25.50.13 bindPort: 6443 --- apiVersion: kubeadm.k8s.io/v1beta1 kind: ClusterConfiguration kubernetesVersion: "v1.13.1" #controlPlaneEndpoint: 172.25.50.15:6444 controlPlaneEndpoint: master-vip.k8s.org:6443 apiServer: CertSANs: - master-vip.k8s.org timeoutForControlPlane: 5m0s etcd: external: endpoints: - "https://172.25.50.16:2379" - "https://172.25.50.17:2379" - "https://172.25.50.18:2379" caFile: /etc/kubernetes/pki/etcd/ca.pem certFile: /etc/kubernetes/pki/etcd/client.pem keyFile: /etc/kubernetes/pki/etcd/client-key.pem networking: serviceSubnet: 172.25.50.0/23 podSubnet: 172.25.56.0/22 dnsDomain: cluster.local imageRepository: k8s.gcr.io clusterName: "Acluster" #dns: # type: CoreDNS --- apiVersion: kubeproxy.config.k8s.io/v1alpha1 kind: KubeProxyConfiguration mode: "ipvs"开始初始化[root@master-0 ~]# kubeadm init --config ./kubeadm-init.yaml ... Your Kubernetes master has initialized successfully! To start using your cluster, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config You should now deploy a pod network to the cluster. 
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/ You can now join any number of machines by running the following on each node as root: kubeadm join master-vip.k8s.org:6443 --token gjflgc.jg9i5vyrmiv295h3 --discovery-token-ca-cert-hash sha256:9b7943a35e4b6199b5f9fe50473bd336e28c184975d90e3a0f3076c25b694a18[root@master-0 ~]# mkdir -p $HOME/.kube [root@master-0 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config [root@master-0 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config[root@master-0 ~]# kubectl get cs,nodes NAME STATUS MESSAGE ERROR componentstatus/scheduler Healthy ok componentstatus/controller-manager Healthy ok componentstatus/etcd-0 Healthy {"health":"true"} componentstatus/etcd-1 Healthy {"health":"true"} componentstatus/etcd-2 Healthy {"health":"true"} NAME STATUS ROLES AGE VERSION node/master-0.k8s.org NotReady master 75s v1.13.1[root@master-0 ~]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE coredns-86c58d9df4-br2x2 0/1 Pending 0 71s coredns-86c58d9df4-kcm42 0/1 Pending 0 71s kube-apiserver-master-0.k8s.org 1/1 Running 0 28s kube-controller-manager-master-0.k8s.org 1/1 Running 0 29s kube-proxy-rp8dg 1/1 Running 0 71s kube-scheduler-master-0.k8s.org 1/1 Running 0 31s安装calico[root@master-0 ~]# wget https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/calico.yaml[root@master-0 ~]# cd calico/ [root@master-0 calico]# ls calicoctl calico.yamlapply[root@master-0 calico]# kubectl apply -f ./ configmap/calico-config created secret/calico-etcd-secrets created daemonset.extensions/calico-node created serviceaccount/calico-node created deployment.extensions/calico-kube-controllers created serviceaccount/calico-kube-controllers created clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created clusterrole.rbac.authorization.k8s.io/calico-node created clusterrolebinding.rbac.authorization.k8s.io/calico-node created[root@master-0 calico]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE calico-kube-controllers-5d94b577bb-7k7jn 1/1 Running 0 27s calico-node-bdkgn 1/1 Running 0 27s coredns-86c58d9df4-br2x2 1/1 Running 0 3m8s coredns-86c58d9df4-kcm42 1/1 Running 0 3m8s kube-apiserver-master-0.k8s.org 1/1 Running 0 2m25s kube-controller-manager-master-0.k8s.org 1/1 Running 0 2m26s kube-proxy-rp8dg 1/1 Running 0 3m8s kube-scheduler-master-0.k8s.org 1/1 Running 0 2m28s你可能需要修改两个地方,一个网卡接口- name: IP value: "autodetect" - name: IP_AUTODETECTION_METHOD value: "interface=eth.*" 或者,直接写成ip- name: CALICO_IPV4POOL_CIDR value: "172.25.56.0/22"另外,如果有必要,你还要修改容忍度来容忍master的污点tolerations: # Mark the pod as a critical add-on for rescheduling. 
- key: CriticalAddonsOnly operator: Exists - key: node-role.kubernetes.io/master effect: NoSchedule可以参考我的github上的文件:https://github.com/marksugar/kubeadMHA/tree/master/calico延伸阅读:https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/calico安装metrics-server我们创建 一个目录来存放metrics-server[root@master-0 ~]# mkdir deploy/metrics-server/ [root@master-0 ~]# cd ~/deploy/metrics-server/而后我们只下载metrics-server的部署文件即可[root@master-0 metrics-server]# for i in \ aggregated-metrics-reader.yaml \ auth-delegator.yaml \ auth-reader.yaml \ metrics-apiservice.yaml \ metrics-server-deployment.yaml \ metrics-server-service.yaml \ resource-reader.yaml \ ;do curl -Lks https://raw.githubusercontent.com/kubernetes-incubator/metrics-server/master/deploy/1.8%2B/$i -o "${i}";done[root@master-0 metrics-server]# ll total 28 -rw-r--r-- 1 root root 384 Jan 1 21:37 aggregated-metrics-reader.yaml -rw-r--r-- 1 root root 308 Jan 1 21:37 auth-delegator.yaml -rw-r--r-- 1 root root 329 Jan 1 21:37 auth-reader.yaml -rw-r--r-- 1 root root 298 Jan 1 21:37 metrics-apiservice.yaml -rw-r--r-- 1 root root 815 Jan 1 21:37 metrics-server-deployment.yaml -rw-r--r-- 1 root root 249 Jan 1 21:37 metrics-server-service.yaml -rw-r--r-- 1 root root 502 Jan 1 21:37 resource-reader.yaml而后部署即可[root@master-0 metrics-server]# kubectl apply -f ./ clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created serviceaccount/metrics-server created deployment.extensions/metrics-server created service/metrics-server created clusterrole.rbac.authorization.k8s.io/system:metrics-server created clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created[root@master-0 metrics-server]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE calico-kube-controllers-5d94b577bb-7k7jn 1/1 Running 0 64s calico-node-bdkgn 1/1 Running 0 64s coredns-86c58d9df4-br2x2 1/1 Running 0 3m45s coredns-86c58d9df4-kcm42 1/1 Running 0 3m45s kube-apiserver-master-0.k8s.org 1/1 Running 0 3m2s kube-controller-manager-master-0.k8s.org 1/1 Running 0 3m3s kube-proxy-rp8dg 1/1 Running 0 3m45s kube-scheduler-master-0.k8s.org 1/1 Running 0 3m5s metrics-server-54f6f996dc-kr5wz 1/1 Running 0 7s稍等片刻就可以看到节点资源状态等[root@master-0 metrics-server]# kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% master-0.k8s.org 292m 14% 1272Mi 35% [root@master-0 metrics-server]# kubectl top pods -n kube-system NAME CPU(cores) MEMORY(bytes) calico-kube-controllers-5d94b577bb-7k7jn 2m 10Mi calico-node-bdkgn 20m 19Mi coredns-86c58d9df4-br2x2 2m 11Mi coredns-86c58d9df4-kcm42 3m 11Mi kube-apiserver-master-0.k8s.org 79m 391Mi kube-controller-manager-master-0.k8s.org 33m 67Mi kube-proxy-rp8dg 2m 15Mi kube-scheduler-master-0.k8s.org 10m 13Mi metrics-server-54f6f996dc-kr5wz 1m 11Mi master-1部署复制密钥到master-1[root@master-0 metrics-server]# cd /etc/kubernetes/ && scp -r ./pki 172.25.50.14:/etc/kubernetes/ && scp ./admin.conf 172.25.50.14:/etc/kubernetes/ ca.pem 100% 1371 101.9KB/s 00:00 client.pem 100% 997 430.3KB/s 00:00 client-key.pem 100% 227 132.2KB/s 00:00 ca.key 100% 1675 366.5KB/s 00:00 ca.crt 100% 1025 1.5MB/s 00:00 apiserver-kubelet-client.key 100% 1679 817.6KB/s 00:00 apiserver-kubelet-client.crt 100% 1099 602.7KB/s 00:00 apiserver.key 100% 1679 170.8KB/s 00:00 apiserver.crt 100% 1261 266.2KB/s 00:00 front-proxy-ca.key 100% 1675 
796.2KB/s 00:00 front-proxy-ca.crt 100% 1038 595.1KB/s 00:00 front-proxy-client.key 100% 1679 816.4KB/s 00:00 front-proxy-client.crt 100% 1058 512.4KB/s 00:00 sa.key 100% 1679 890.5KB/s 00:00 sa.pub 100% 451 235.0KB/s 00:00 admin.conf 100% 5450 442.9KB/s 00:00使用kubeadm token create --print-join-command获取当前token[root@master-0 kubernetes]# kubeadm token create --print-join-command kubeadm join master-vip.k8s.org:6443 --token qffwr6.4dqd3hshvfbxn3f8 --discovery-token-ca-cert-hash sha256:9b7943a35e4b6199b5f9fe50473bd336e28c184975d90e3a0f3076c25b694a18[root@master-1 ~]# echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables && bash /etc/sysconfig/modules/ipvs.modules[root@master-1 ~]# kubeadm join master-vip.k8s.org:6443 --token qffwr6.4dqd3hshvfbxn3f8 --discovery-token-ca-cert-hash sha256:9b7943a35e4b6199b5f9fe50473bd336e28c184975d90e3a0f3076c25b694a18 --experimental-control-plane ... This node has joined the cluster and a new control plane instance was created: * Certificate signing request was sent to apiserver and approval was received. * The Kubelet was informed of the new secure connection details. * Master label and taint were applied to the new node. * The Kubernetes control plane instances scaled up. To start administering your cluster from this node, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config Run 'kubectl get nodes' to see this node join the cluster. ...我们在备用的master节点使用experimental-control-plane延伸阅读:https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd[root@master-1 ~]# mkdir -p $HOME/.kube [root@master-1 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config [root@master-1 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config[root@master-1 ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION master-0.k8s.org Ready master 11m v1.13.1 master-1.k8s.org Ready master 70s v1.13.1 [root@master-1 ~]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE calico-kube-controllers-5d94b577bb-7k7jn 1/1 Running 0 8m13s calico-node-bdkgn 1/1 Running 0 8m13s calico-node-nk6xw 1/1 Running 0 79s coredns-86c58d9df4-br2x2 1/1 Running 0 10m coredns-86c58d9df4-kcm42 1/1 Running 0 10m kube-apiserver-master-0.k8s.org 1/1 Running 0 10m kube-apiserver-master-1.k8s.org 1/1 Running 0 79s kube-controller-manager-master-0.k8s.org 1/1 Running 0 10m kube-controller-manager-master-1.k8s.org 1/1 Running 0 79s kube-proxy-cz8h8 1/1 Running 0 79s kube-proxy-rp8dg 1/1 Running 0 10m kube-scheduler-master-0.k8s.org 1/1 Running 0 10m kube-scheduler-master-1.k8s.org 1/1 Running 0 79s metrics-server-54f6f996dc-kr5wz 1/1 Running 0 7m16s[root@master-1 ~]# kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% master-0.k8s.org 226m 11% 1314Mi 36% master-1.k8s.org 120m 6% 788Mi 21% 添加node添加node主机,做好修改主机名,开启ipvs等。这里写在一个文件里curl -Lk https://raw.githubusercontent.com/marksugar/kubeadMHA/master/systeminit/chenage_hostname|bash主机名你也可以这样修改echo $(ip addr show eth0 | grep -Po 'inet \K[\d.]+'|awk -F. 
'{print $0}') > /etc/hostname CHOSTNAME=node-$(echo `sed 's@\.@-@g' /etc/hostname`).k8s.org CHOSTNAME_pretty='k8s node' sysctl -w kernel.hostname=$CHOSTNAME hostnamectl set-hostname $CHOSTNAME --static hostnamectl set-hostname "$CHOSTNAME_pretty" --pretty sysctl kernel.hostname=$CHOSTNAME echo -e "\033[31m\033[01m[ `hostnamectl` ]\033[0m"主机名修改后,还需要在做一些操作curl -Lk https://raw.githubusercontent.com/marksugar/kubeadMHA/master/systeminit/ip_vs_a_init|bash这个脚本可以在github打开查看即可而后添加即可[root@node-172-25-50-19.k8s.org ~]# kubeadm join master-vip.k8s.org:6443 --token qffwr6.4dqd3hshvfbxn3f8 --discovery-token-ca-cert-hash sha256:9b7943a35e4b6199b5f9fe50473bd336e28c184975d90e3a0f3076c25b694a18 [preflight] Running pre-flight checks [discovery] Trying to connect to API Server "master-vip.k8s.org:6443" [discovery] Created cluster-info discovery client, requesting info from "https://master-vip.k8s.org:6443" [discovery] Requesting info from "https://master-vip.k8s.org:6443" again to validate TLS against the pinned public key [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "master-vip.k8s.org:6443" [discovery] Successfully established connection with API Server "master-vip.k8s.org:6443" [join] Reading configuration from the cluster... [join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml' [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Activating the kubelet service [tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap... [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node-172-25-50-19.k8s.org" as an annotation This node has joined the cluster: * Certificate signing request was sent to apiserver and a response was received. * The Kubelet was informed of the new secure connection details. 
Run 'kubectl get nodes' on the master to see this node join the cluster.[root@master-0 ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION master-0.k8s.org Ready master 133m v1.13.1 master-1.k8s.org Ready master 123m v1.13.1 node-172-25-50-19.k8s.org Ready <none> 15s v1.13.1故障测试我们关掉keepalived,模拟master-0宕机[root@master-0 kubernetes]# systemctl stop keepalived.service 此时eth0上的172.25.10.15就会飘逸到master-1上,master-0如下[root@master-0 kubernetes]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether fa:16:3e:ed:d3:21 brd ff:ff:ff:ff:ff:ff inet 172.25.50.13/16 brd 172.25.255.255 scope global dynamic eth0 valid_lft 82951sec preferred_lft 82951secmaster-1如下:[root@master-1 ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether fa:16:3e:ae:76:b9 brd ff:ff:ff:ff:ff:ff inet 172.25.50.14/16 brd 172.25.255.255 scope global dynamic eth0 valid_lft 85332sec preferred_lft 85332sec inet 172.25.50.15/16 brd 172.25.255.255 scope global secondary eth0:vip valid_lft forever preferred_lft forever[root@master-1 ~]# kubectl get node NAME STATUS ROLES AGE VERSION master-0.k8s.org Ready master 15m v1.13.1 master-1.k8s.org Ready master 5m7s v1.13.1 [root@master-1 ~]# kubectl get pod -n kube-system NAME READY STATUS RESTARTS AGE calico-kube-controllers-5d94b577bb-7k7jn 1/1 Running 0 12m calico-node-bdkgn 1/1 Running 0 12m calico-node-nk6xw 1/1 Running 0 5m17s coredns-86c58d9df4-br2x2 1/1 Running 0 14m coredns-86c58d9df4-kcm42 1/1 Running 0 14m kube-apiserver-master-0.k8s.org 1/1 Running 0 14m kube-apiserver-master-1.k8s.org 1/1 Running 0 5m17s kube-controller-manager-master-0.k8s.org 1/1 Running 0 14m kube-controller-manager-master-1.k8s.org 1/1 Running 0 5m17s kube-proxy-cz8h8 1/1 Running 0 5m17s kube-proxy-rp8dg 1/1 Running 0 14m kube-scheduler-master-0.k8s.org 1/1 Running 0 14m kube-scheduler-master-1.k8s.org 1/1 Running 0 5m17s metrics-server-54f6f996dc-kr5wz 1/1 Running 0 11m添加Master我们使用的是keepalived跑的vip,测试发现使用VIP更妥当一些,DNS轮询不见得好用。于是乎将域名解析到这个VIP上,依靠VIP飘逸做HA关于 keepalived,可以参考上面的安装方式,非常简单,运行以下脚本即可bash <(curl -s https://raw.githubusercontent.com/marksugar/lvs/master/keepliaved/install.sh|more)如下:输入Master或者BACKUP和VIP[root@master-0 ~]# bash <(curl -s https://raw.githubusercontent.com/marksugar/lvs/master/keepliaved/install.sh|more) You install role MASTER/BACKUP ? 
please enter(block letter):MASTER Please enter the use VIP: 172.25.50.15这里需要注意的是,如果是三台keepalived,需要手动修改权重了。如果我们要添加一台Master只需要kubeadm token create --print-join-command获取到token,而后使用--experimental-control-plane,大致如下:kubeadm join master-vip.k8s.org:6443 --token qffwr6.4dqd3hshvfbxn3f8 --discovery-token-ca-cert-hash sha256:9b7943a35e4b6199b5f9fe50473bd336e28c184975d90e3a0f3076c25b694a18 --experimental-control-plane延伸阅读:https://kubernetes.io/docs/setup/independent/high-availability/#external-etcdhttps://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1alpha3https://godoc.org/k8s.io/kube-proxy/config/v1alpha1#KubeProxyIPVSConfigurationhttps://k8smeetup.github.io/docs/setup/independent/high-availability/如果你是需要频繁的安装测试,那下面的这些命令或许有 用:kubeadm reset \rm -rf /var/lib/cni/ \rm -rf /var/lib/kubelet/* \rm -rf /etc/cni/ ip link delete cni0 ip link delete flannel.1 ip link delete docker0 ip link delete dummy0 ip link delete kube-ipvs0 ip link delete tunl0@NONE ip link delete tunl0 ip addr add IP/32 brd IP dev eth0 ip addr del IP/32 brd IP dev tunl0@NONE echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables && bash /etc/sysconfig/modules/ipvs.modules
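To complement the failover test above, a minimal sketch of checking the control plane through the VIP and DNS name used in this post (whether /healthz answers anonymously depends on your apiserver flags, so treat the curl as illustrative):

# See which master currently holds the VIP
ip addr show eth0 | grep 172.25.50.15

# Ask the apiserver for its health through the DNS name that fronts the VIP
curl -k https://master-vip.k8s.org:6443/healthz

# Confirm kube-proxy programmed IPVS rules as configured (requires ipvsadm)
ipvsadm -Ln | head -n 20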
2019-01-24
linuxea: An external etcd 3.3.10 cluster for Kubernetes
etcd是一个分布式键值存储,它提供了一种在一组机器上存储数据的可靠方法。它是开源的,可在GitHub上获得。etcd在网络分区期间优雅地处理leader选举,并且可以容忍机器故障,包括leader。应用程序可以将数据读写到etcd中。一个简单的用例是将etcd中的数据库连接详细信息或功能标记存储为键值对。可以监视这些值,允许您的应用在更改时重新配置。高级用法利用一致性保证来实现数据库leader选举或跨工作集群进行分布式锁定etcd是用Go编写的,它具有出色的跨平台支持,较小二进制文件和活跃的社区。etcd机器之间的通信通过Raft一致性算法处理。此处的ETCD主要用来部署kubernetes高可用集群,此后的使用都是基于kubernetes。参考kubernetes官网:https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/etcd配置:https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#hardware-recommendations集群文档:https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/clustering.md示例参考:https://github.com/etcd-io/etcd/tree/master/hack/tls-setup参考:https://github.com/etcd-io/etcd/tree/master/hack/tls-setup/config参考:https://k8smeetup.github.io/docs/setup/independent/high-availability/我们首先配置etcd证书,etcd我们将会用在kubernetes上。这一步是必须的相比较github上etcd的示例,我们简单修改下在这之前 ,我们有必要修改一下主机名,并且配通所有的sshhostnamectl set-hostname etcd1 hostnamectl set-hostname etcd2 hostnamectl set-hostname etcd3cat >> /etc/hosts << EOF 172.25.50.16 etcd1 172.25.50.17 etcd2 172.25.50.18 etcd3 EOF[root@linuxea.com-16 /etc/etcda]# ssh-keygen -t rsa [root@linuxea.com-16 /etc/etcda]# for i in 172.25.50.{17,18};do ssh-copy-id $i; done安装cfssl和cfssljson[root@linuxea.com-16 ~]# curl -so /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 [root@linuxea.com-16 ~]# curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 [root@linuxea.com-16 ~]# chmod +x /usr/local/bin/cfssl*生成证书[root@linuxea.com-16 ~]# mkdir -p /etc/kubernetes/pki/etcd [root@linuxea.com-16 ~]# cd /etc/kubernetes/pki/etcdca-config.jsoncat > cat /etc/kubernetes/pki/etcd/ca-config.json << EOF { "signing": { "default": { "expiry": "876000h" }, "profiles": { "server": { "expiry": "876000h", "usages": [ "signing", "key encipherment", "server auth", "client auth" ] }, "client": { "expiry": "876000h", "usages": [ "signing", "key encipherment", "client auth" ] }, "peer": { "expiry": "876000h", "usages": [ "signing", "key encipherment", "server auth", "client auth" ] } } } } EOFca-csr.jsoncat > ca-csr.json << EOL { "CN": "etcd", "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "ST": "Shanghai", "L": "Shanghai", "O": "etcd", "OU": "Etcd Security" } ] } EOL生成CA证书[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca - 2018/12/25 16:29:14 [INFO] generating a new CA key and certificate from CSR 2018/12/25 16:29:14 [INFO] generate received request 2018/12/25 16:29:14 [INFO] received CSR 2018/12/25 16:29:14 [INFO] generating key: rsa-2048 2018/12/25 16:29:14 [INFO] encoded CSR 2018/12/25 16:29:14 [INFO] signed certificate with serial number 472142876620060394898834048533122419461412171471[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# ll total 20 -rw-r--r-- 1 root root 905 Dec 25 16:28 ca-config.json -rw-r--r-- 1 root root 1005 Dec 25 16:29 ca.csr -rw-r--r-- 1 root root 212 Dec 25 16:29 ca-csr.json -rw------- 1 root root 1679 Dec 25 16:29 ca-key.pem -rw-r--r-- 1 root root 1371 Dec 25 16:29 ca.pem生成 etcd 客户端证书[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cat client.json { "CN": "client", "key": { "algo": "ecdsa", "size": 256 } }[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client 2018/12/25 16:29:56 [INFO] generate received request 2018/12/25 16:29:56 [INFO] received CSR 2018/12/25 16:29:56 [INFO] generating key: ecdsa-256 2018/12/25 16:29:56 [INFO] encoded 
Generate the etcd client certificate

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cat client.json
{
    "CN": "client",
    "key": { "algo": "ecdsa", "size": 256 }
}

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
2018/12/25 16:29:56 [INFO] generate received request
2018/12/25 16:29:56 [INFO] received CSR
2018/12/25 16:29:56 [INFO] generating key: ecdsa-256
2018/12/25 16:29:56 [INFO] encoded CSR
2018/12/25 16:29:56 [INFO] signed certificate with serial number 644510971695673396838569226835778482472560755733
2018/12/25 16:29:56 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").

The result:

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# ll
total 36
-rw-r--r-- 1 root root  905 Dec 25 16:28 ca-config.json
-rw-r--r-- 1 root root 1005 Dec 25 16:29 ca.csr
-rw-r--r-- 1 root root  212 Dec 25 16:29 ca-csr.json
-rw------- 1 root root 1679 Dec 25 16:29 ca-key.pem
-rw-r--r-- 1 root root 1371 Dec 25 16:29 ca.pem
-rw-r--r-- 1 root root  351 Dec 25 16:29 client.csr
-rw-r--r-- 1 root root   95 Dec 25 16:29 client.json
-rw------- 1 root root  227 Dec 25 16:29 client-key.pem
-rw-r--r-- 1 root root  997 Dec 25 16:29 client.pem

config.json

There are two ways to produce config.json. The first uses the upstream defaults:

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl print-defaults csr > config.json
[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# sed -i '0,/CN/{s/example\.net/'"$PEER_NAME"'/}' config.json
[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# sed -i 's/www\.example\.net/'"$PRIVATE_IP"'/' config.json
[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# sed -i 's/example\.net/'"$PUBLIC_IP"'/' config.json
[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cat config.json
{
    "CN": "etcd1",
    "hosts": [ "", "172.25.50.16" ],
    "key": { "algo": "ecdsa", "size": 256 },
    "names": [
        { "C": "US", "L": "CA", "ST": "San Francisco" }
    ]
}

The second way is to write it directly, listing the IPs of every cluster member:

cat > /etc/kubernetes/pki/etcd/config.json << EOF
{
    "CN": "etcd1",
    "hosts": [ "127.0.0.1", "172.25.50.16", "172.25.50.17", "172.25.50.18" ],
    "key": { "algo": "rsa", "size": 2048 },
    "names": [
        { "C": "CN", "ST": "Shanghai", "L": "Shanghai", "O": "etcd", "OU": "Etcd Security" }
    ]
}
EOF

Run cfssl to generate peer.pem, peer-key.pem, server.pem and server-key.pem:

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
2018/12/25 16:37:53 [INFO] generate received request
2018/12/25 16:37:53 [INFO] received CSR
2018/12/25 16:37:53 [INFO] generating key: rsa-2048
2018/12/25 16:37:54 [INFO] encoded CSR
2018/12/25 16:37:54 [INFO] signed certificate with serial number 397776469717117599117003178668354588092528739871
2018/12/25 16:37:54 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer
2018/12/25 16:37:59 [INFO] generate received request
2018/12/25 16:37:59 [INFO] received CSR
2018/12/25 16:37:59 [INFO] generating key: rsa-2048
2018/12/25 16:37:59 [INFO] encoded CSR
2018/12/25 16:37:59 [INFO] signed certificate with serial number 453856739993256449551996181659627954567417235192
2018/12/25 16:37:59 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
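Before distributing these certificates it is worth confirming that the server certificate really contains every cluster IP in its Subject Alternative Names. This quick check is my own addition rather than part of the original walkthrough, but both commands are standard cfssl/openssl usage:

cfssl certinfo -cert server.pem | grep -A6 '"sans"'
openssl x509 -in server.pem -noout -text | grep -A1 'Subject Alternative Name'

If an IP is missing here, etcd clients connecting to that address will fail TLS verification.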
The full set of generated files now looks like this:

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# ll
total 64
-rw-r--r-- 1 root root  905 Dec 25 16:28 ca-config.json
-rw-r--r-- 1 root root 1005 Dec 25 16:29 ca.csr
-rw-r--r-- 1 root root  212 Dec 25 16:29 ca-csr.json
-rw------- 1 root root 1679 Dec 25 16:29 ca-key.pem
-rw-r--r-- 1 root root 1371 Dec 25 16:29 ca.pem
-rw-r--r-- 1 root root  351 Dec 25 16:29 client.csr
-rw-r--r-- 1 root root   95 Dec 25 16:29 client.json
-rw------- 1 root root  227 Dec 25 16:29 client-key.pem
-rw-r--r-- 1 root root  997 Dec 25 16:29 client.pem
-rw-r--r-- 1 root root  375 Dec 25 16:37 config.json
-rw-r--r-- 1 root root 1078 Dec 25 16:37 peer.csr
-rw------- 1 root root 1679 Dec 25 16:37 peer-key.pem
-rw-r--r-- 1 root root 1456 Dec 25 16:37 peer.pem
-rw-r--r-- 1 root root 1078 Dec 25 16:37 server.csr
-rw------- 1 root root 1679 Dec 25 16:37 server-key.pem
-rw-r--r-- 1 root root 1456 Dec 25 16:37 server.pem

Distribute the certificates

Copy the generated certificates to etcd2 and etcd3:

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# for i in 172.25.50.{17,18} ;do scp -r /etc/kubernetes $i:/etc ;done
ca-config.json      100%  905   635.1KB/s   00:00
ca-csr.json         100%  212   174.5KB/s   00:00
ca.pem              100% 1371     1.0MB/s   00:00
ca-key.pem          100% 1679     1.3MB/s   00:00
ca.csr              100% 1005   773.7KB/s   00:00
client.json         100%   95    76.1KB/s   00:00
client.pem          100%  997   751.7KB/s   00:00
client-key.pem      100%  227   171.9KB/s   00:00
client.csr          100%  351   256.1KB/s   00:00
config.json         100%  375    96.5KB/s   00:00
server.pem          100% 1456   365.4KB/s   00:00
server-key.pem      100% 1679   425.8KB/s   00:00
server.csr          100% 1078   276.8KB/s   00:00
peer.pem            100% 1456   366.5KB/s   00:00
peer-key.pem        100% 1679   439.1KB/s   00:00
peer.csr            100% 1078   289.2KB/s   00:00
ca-config.json      100%  905   569.4KB/s   00:00
ca-csr.json         100%  212   134.9KB/s   00:00
ca.pem              100% 1371   944.5KB/s   00:00
ca-key.pem          100% 1679     1.1MB/s   00:00
ca.csr              100% 1005   605.7KB/s   00:00
client.json         100%   95    63.3KB/s   00:00
client.pem          100%  997   748.6KB/s   00:00
client-key.pem      100%  227   151.1KB/s   00:00
client.csr          100%  351   244.9KB/s   00:00
config.json         100%  375    90.4KB/s   00:00
server.pem          100% 1456   322.2KB/s   00:00
server-key.pem      100% 1679   372.1KB/s   00:00
server.csr          100% 1078   253.3KB/s   00:00
peer.pem            100% 1456   325.0KB/s   00:00
peer-key.pem        100% 1679   394.1KB/s   00:00
peer.csr            100% 1078   259.3KB/s   00:00

Install etcd

Run the following on all three cluster machines. The version is 3.3.10.

export ETCD_VERSION=v3.3.10
curl -sSL https://github.com/coreos/etcd/releases/download/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz | tar -xzv --strip-components=1 -C /usr/local/bin/
rm -rf etcd-$ETCD_VERSION-linux-amd64*

Prerequisite variables

The hostname and IP address; eth0 here must match the machine's actual interface name.

export PEER_NAME=$(hostname)
export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')

Write the environment variables to /etc/etcd.env:

touch /etc/etcd.env
echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

Variables used by the startup script; these are the three IP addresses taking part in the etcd cluster:

export etcd0_ip_address=172.25.50.16
export etcd1_ip_address=172.25.50.17
export etcd2_ip_address=172.25.50.18

Startup script

The variables in the unit below are the ones set above.

cat > /etc/systemd/system/etcd.service << EOL
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd
Conflicts=etcd.service
Conflicts=etcd2.service

[Service]
EnvironmentFile=/etc/etcd.env
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=0
ExecStart=/usr/local/bin/etcd --name ${PEER_NAME} \
    --data-dir /var/lib/etcd \
    --listen-client-urls https://${PRIVATE_IP}:2379 \
    --advertise-client-urls https://${PRIVATE_IP}:2379 \
    --listen-peer-urls https://${PRIVATE_IP}:2380 \
    --initial-advertise-peer-urls https://${PRIVATE_IP}:2380 \
    --cert-file=/etc/kubernetes/pki/etcd/server.pem \
    --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
    --client-cert-auth \
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
    --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --initial-cluster etcd1=https://${etcd0_ip_address}:2380,etcd2=https://${etcd1_ip_address}:2380,etcd3=https://${etcd2_ip_address}:2380 \
    --initial-cluster-state new

[Install]
WantedBy=multi-user.target
EOL

systemctl daemon-reload
systemctl enable etcd.service
systemctl start etcd
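As a quick check at this point (my addition, not in the original write-up), the unit status and journal show whether each member came up and joined the cluster:

systemctl status etcd
journalctl -u etcd --no-pager | tail -n 20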
Cluster status

Checking the cluster status needs the certificates, so set a few helper variables:

CMD='--cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/server.pem --key=/etc/kubernetes/pki/etcd/server-key.pem'
CMD1='https://172.25.50.16:2379,https://172.25.50.17:2379,https://172.25.50.18:2379'
CMD2='--ca-file=/etc/kubernetes/pki/etcd/ca.pem --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem'

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# for i in 172.25.50.{16,17,18}; do ETCDCTL_API=3 etcdctl --endpoints=https://${i}:2379 $CMD endpoint health; done
https://172.25.50.16:2379 is healthy: successfully committed proposal: took = 1.984026ms
https://172.25.50.17:2379 is healthy: successfully committed proposal: took = 3.357136ms
https://172.25.50.18:2379 is healthy: successfully committed proposal: took = 3.55185ms

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# ETCDCTL_API=3 etcdctl --endpoints=https://172.25.50.16:2379 $CMD member list
2e70a124f01a4a5, started, etcd3, https://172.25.50.18:2380, https://172.25.50.18:2379
5fba4c5d1e214899, started, etcd2, https://172.25.50.17:2380, https://172.25.50.17:2379
b55bca6849256d2d, started, etcd1, https://172.25.50.16:2380, https://172.25.50.16:2379

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# etcdctl -C $CMD1 $CMD2 cluster-health
member 2e70a124f01a4a5 is healthy: got healthy result from https://172.25.50.18:2379
member 5fba4c5d1e214899 is healthy: got healthy result from https://172.25.50.17:2379
member b55bca6849256d2d is healthy: got healthy result from https://172.25.50.16:2379
cluster is healthy

[root@linuxea.com-16 /etc/kubernetes/pki/etcd]# curl -Lk --cert ./server.pem --key ./server-key.pem https://172.25.50.16:2379/metrics|grep -v debugging

Further reading: https://coreos.com/etcd/docs/latest/metrics.html
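Beyond the health endpoints, a simple write/read round trip confirms that data written through one member can be read back through another. This smoke test is my own addition and reuses the $CMD variable defined above:

ETCDCTL_API=3 etcdctl --endpoints=https://172.25.50.16:2379 $CMD put /linuxea/smoke-test "ok"
ETCDCTL_API=3 etcdctl --endpoints=https://172.25.50.17:2379 $CMD get /linuxea/smoke-test
ETCDCTL_API=3 etcdctl --endpoints=https://172.25.50.16:2379 $CMD del /linuxea/smoke-test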
Running with docker

If you follow the configuration above, the equivalent docker commands look like this.

172.25.50.16

docker run --net=host -d -v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/ -p 4001:4001 -p 2380:2380 -p 2379:2379 \
    --name etcd quay.io/coreos/etcd:v3.3.10 \
    etcd -name etcd1 \
    --data-dir /var/lib/etcd \
    -advertise-client-urls https://172.25.50.16:2379,https://172.25.50.16:4001 \
    -listen-client-urls https://0.0.0.0:2379,https://0.0.0.0:4001 \
    -initial-advertise-peer-urls https://172.25.50.16:2380 \
    -listen-peer-urls https://0.0.0.0:2380 \
    --cert-file=/etc/kubernetes/pki/etcd/server.pem \
    --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
    --client-cert-auth \
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
    --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    -initial-cluster-token etcd-cluster \
    -initial-cluster etcd1=https://172.25.50.16:2380,etcd2=https://172.25.50.17:2380,etcd3=https://172.25.50.18:2380 \
    -initial-cluster-state new

172.25.50.17

docker run --net=host -d -v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/ -p 4001:4001 -p 2380:2380 -p 2379:2379 \
    --name etcd quay.io/coreos/etcd:v3.3.10 \
    etcd -name etcd2 \
    --data-dir /var/lib/etcd \
    -advertise-client-urls https://172.25.50.17:2379,https://172.25.50.17:4001 \
    -listen-client-urls https://0.0.0.0:2379,https://0.0.0.0:4001 \
    -initial-advertise-peer-urls https://172.25.50.17:2380 \
    -listen-peer-urls https://0.0.0.0:2380 \
    --cert-file=/etc/kubernetes/pki/etcd/server.pem \
    --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
    --client-cert-auth \
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
    --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    -initial-cluster-token etcd-cluster \
    -initial-cluster etcd1=https://172.25.50.16:2380,etcd2=https://172.25.50.17:2380,etcd3=https://172.25.50.18:2380 \
    -initial-cluster-state new

172.25.50.18

docker run --net=host -d -v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/ -p 4001:4001 -p 2380:2380 -p 2379:2379 \
    --name etcd quay.io/coreos/etcd:v3.3.10 \
    etcd -name etcd3 \
    --data-dir /var/lib/etcd \
    -advertise-client-urls https://172.25.50.18:2379,https://172.25.50.18:4001 \
    -listen-client-urls https://0.0.0.0:2379,https://0.0.0.0:4001 \
    -initial-advertise-peer-urls https://172.25.50.18:2380 \
    -listen-peer-urls https://0.0.0.0:2380 \
    --cert-file=/etc/kubernetes/pki/etcd/server.pem \
    --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
    --client-cert-auth \
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
    --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    -initial-cluster-token etcd-cluster \
    -initial-cluster etcd1=https://172.25.50.16:2380,etcd2=https://172.25.50.17:2380,etcd3=https://172.25.50.18:2380 \
    -initial-cluster-state new

This is rather cumbersome, so let's simplify it.

Prerequisite: set each host's hostname.

[root@DT_Node-172_25_50_16 /etc/kubernetes/pki/etcd]# hostnamectl set-hostname etcd1
[root@DT_Node-172_25_50_17 /etc/kubernetes/pki/etcd]# hostnamectl set-hostname etcd2
[root@DT_Node-172_25_50_18 /etc/kubernetes/pki/etcd]# hostnamectl set-hostname etcd3

Environment variables:

export PEER_NAME=$(hostname)
export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
export etcd0_ip_address=172.25.50.16
export etcd1_ip_address=172.25.50.17
export etcd2_ip_address=172.25.50.18

Now the docker startup command becomes:

docker run --net=host -d -v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/ -p 4001:4001 -p 2380:2380 -p 2379:2379 \
    -v /data/etcd:/data/etcd \
    --name etcd quay.io/coreos/etcd:v3.3.10 \
    etcd -name ${PEER_NAME} \
    --data-dir /data/etcd \
    -advertise-client-urls https://${PRIVATE_IP}:2379,https://${PRIVATE_IP}:4001 \
    -listen-client-urls https://0.0.0.0:2379,https://0.0.0.0:4001 \
    -initial-advertise-peer-urls https://${PRIVATE_IP}:2380 \
    -listen-peer-urls https://0.0.0.0:2380 \
    --cert-file=/etc/kubernetes/pki/etcd/server.pem \
    --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
    --client-cert-auth \
    --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
    --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
    -initial-cluster-token etcd-cluster \
    -initial-cluster etcd1=https://${etcd0_ip_address}:2380,etcd2=https://${etcd1_ip_address}:2380,etcd3=https://${etcd2_ip_address}:2380 \
    -initial-cluster-state new
That is still not very convenient, so we turn it into a docker-compose file.

docker-compose

version: '2.2'
services:
  etcd:
    image: marksugar/coreos-etcd:v3.3.10
    container_name: etcd3
    restart: always
    privileged: true
    network_mode: "host"
    volumes:
      - /data/etcd:/data/etcd
      - /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/
    command: "etcd -name ${PEER_NAME} --data-dir /data/etcd -advertise-client-urls https://${PRIVATE_IP}:2379,https://${PRIVATE_IP}:4001 -listen-client-urls https://0.0.0.0:2379,https://0.0.0.0:4001 -initial-advertise-peer-urls https://${PRIVATE_IP}:2380 -listen-peer-urls https://0.0.0.0:2380 --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem --client-cert-auth --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem --peer-client-cert-auth --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem -initial-cluster-token etcd-cluster -initial-cluster etcd1=https://${etcd0_ip_address}:2380,etcd2=https://${etcd1_ip_address}:2380,etcd3=https://${etcd2_ip_address}:2380 -initial-cluster-state new "
    cpu_shares: 90
    mem_limit: 2048m
    logging:
      driver: "json-file"
      options:
        max-size: "200M"
    labels:
      SERVICE_TAGS: etcd

The -v /data/etcd:/data/etcd mount is mandatory: if the data dir is lost, the member is lost for good.

Here are three quick reset scripts. Script 1 deletes the data directory on every cluster node and then restarts etcd on the current node:

#!/bin/bash
#########################################################################
# File Name: start.sh
# Author: www.linuxea.com
# Email: usertzc@163.com
# Version:
# Created Time: Wed 02 Jan 2019 11:14:26 AM CST
#########################################################################
for i in 172.25.50.{16,17,18};do ssh $i "docker rm -f etcd && \rm -rf /data/etcd && ls /data";done
hostnamectl set-hostname etcd1
export PEER_NAME=$(hostname)
export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
export etcd0_ip_address=172.25.50.16
export etcd1_ip_address=172.25.50.17
export etcd2_ip_address=172.25.50.18
hostname
docker-compose -f /opt/docker-compose.yaml up -d
#CMD='--cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/server.pem --key=/etc/kubernetes/pki/etcd/server-key.pem'
#CMD1='https://172.25.50.16:2379,https://172.25.50.17:2379,https://172.25.50.18:2379'
#CMD2='--ca-file=/etc/kubernetes/pki/etcd/ca.pem --cert-file=/etc/kubernetes/pki/etcd/server.pem --key-file=/etc/kubernetes/pki/etcd/server-key.pem'
#for i in 172.25.50.{16,17,18}; do ETCDCTL_API=3 etcdctl --endpoints=https://${i}:2379 $CMD endpoint health; done
#cd /etc/kubernetes/pki/etcd/ && scp -P22992 ca.pem client.pem client-key.pem 172.25.50.13:/etc/kubernetes/pki/etcd/

Script 2 only sets the environment variables and brings up docker-compose.yaml:

#!/bin/bash
#########################################################################
# File Name: start.sh
# Author: www.linuxea.com
# Email: usertzc@163.com
# Version:
# Created Time: Wed 02 Jan 2019 11:15:02 AM CST
#########################################################################
#docker rm -f etcd && \rm -rf /data/etcd
hostnamectl set-hostname etcd2
export PEER_NAME=$(hostname)
export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
export etcd0_ip_address=172.25.50.16
export etcd1_ip_address=172.25.50.17
export etcd2_ip_address=172.25.50.18
hostname
docker-compose -f /opt/docker-compose.yaml up -d

Script 3 is almost identical to script 2, apart from the hostname:

#!/bin/bash
#########################################################################
# File Name: start.sh
# Author: www.linuxea.com
# Email: usertzc@163.com
# Version:
# Created Time: Wed 02 Jan 2019 11:15:26 AM CST
#########################################################################
#docker rm -f etcd && \rm -rf /data/etcd
hostnamectl set-hostname etcd3
export PEER_NAME=$(hostname)
export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
export etcd0_ip_address=172.25.50.16
export etcd1_ip_address=172.25.50.17
export etcd2_ip_address=172.25.50.18
hostname
docker-compose -f /opt/docker-compose.yaml up -d
Further reading:
https://coreos.com/etcd/docs/latest/v2/docker_guide.html
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/container.md

Monitoring:
https://coreos.com/etcd/docs/latest/metrics.html
https://etcd.readthedocs.io/en/latest/operate.html#v3-3

To monitor this external etcd from inside Kubernetes, see: https://github.com/marksugar/k8s-pgmon
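Since this cluster exists to back a highly available Kubernetes control plane, the sketch below shows roughly how kubeadm can be pointed at it. This is my own illustration, not part of the original article: the file name is arbitrary, the apiVersion depends on your kubeadm release, and it assumes ca.pem, client.pem and client-key.pem have already been copied to the control-plane node (as the commented scp line in script 1 hints).

cat > kubeadm-external-etcd.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
    - https://172.25.50.16:2379
    - https://172.25.50.17:2379
    - https://172.25.50.18:2379
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    certFile: /etc/kubernetes/pki/etcd/client.pem
    keyFile: /etc/kubernetes/pki/etcd/client-key.pem
EOF
# kubeadm init --config kubeadm-external-etcd.yaml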
January 24, 2019
3,026 reads
0 comments
0 likes
2018-12-18
linuxea: installing EFK on kubernetes with helm (52)
efk

ElasticSearch is a search engine written in Java. Logstash generates, collects and transforms logs and then feeds them into ElasticSearch. Logstash acts as the agent that collects logs, so it runs as a DaemonSet; once collected, the logs are sent to the Logstash server side, which normalizes them and injects them into ElasticSearch. ElasticSearch is usually a cluster, and Logstash may also be more than one node; if the two sides cannot keep pace with each other, a message queue such as redis can sit in between. In a typical deployment (shown as a figure in the original post), Logstash also formats the logs from all nodes (converting them to a unified format) before injecting them into the ElasticSearch cluster. As an agent, however, Logstash is heavyweight, so Filebeat is often used instead. We will not use Filebeat either: the tools that run on the nodes and collect logs are not limited to Logstash and Filebeat; there is also Fluentd, among others.

On a k8s cluster every node runs many pods and every pod may contain several containers, so their log output needs a unified management platform. We can read logs with kubectl logs, but in k8s pods exist as groups, so that approach does not scale. Moreover, if a container in a pod crashes, the logs from before the crash may be impossible to collect afterwards; the only way around this is to ship logs in real time, ahead of time. In a cloud platform in particular, a unified log collection platform is needed. A complete k8s platform has four important add-ons: coredns, ingress, metrics-server+prometheus and dashboard. EFK is also a basic add-on, and in most cases an EFK stack has to be provided on a k8s cluster.

ElasticSearch

The way EFK is deployed with helm, especially ElasticSearch, differs from a conventional ELK setup. To let ElasticSearch run on k8s, the ElasticSearch project provides purpose-built images that package the relevant components so they can run on k8s directly. Generally it is split into roles:

ElasticSearch is divided into master and data parts: master nodes handle lightweight query requests, data nodes handle heavyweight requests such as building indexes. ElasticSearch can both query and build indexes, implemented as two layers. The master is the single entry point into ElasticSearch; both master and data can be distributed, with the data layer scaling out with load and the master running redundantly. The data layer needs persistent storage, i.e. volumes. That effectively means two clusters, a master cluster and a data cluster (see the figure in the original post). Besides these there is a client cluster, the ingest nodes: they receive logs from any log collector, such as Fluentd, turn them into a specific format and hand them to the master nodes, so you can think of them as playing the role Logstash plays in ELK, where Logstash cleans logs before handing them to ElasticSearch. This is also the x-pack approach. master and data are stateful, which means they need persistent volumes.

k8s log collection approaches

Logs can be collected in two ways: inside the pod or outside it.
Inside: each pod ships its own logs to the log store, which means running extra collector containers inside every pod; as pods multiply, so do collectors. A collector can also be deployed in docker itself, but that is not recommended.
Outside: a single agent is deployed per node and collects from all containers on that node as well as from the node itself, shipping everything to the log platform. That collection system can live inside or outside the cluster. Leaving aside where ElasticSearch runs, if Fluentd is used you have to decide whether Fluentd runs inside or outside the cluster. Inside the cluster it only needs to run as a DaemonSet with the host log directory mounted into the pod. Running on the host instead means that if Fluentd breaks it is no longer under the cluster's control. Fluentd reads logs from the local /var/log, and each container's logs on a node end up under /var/log/containers/.

Deploy elasticsearch

[root@linuxea helm]# helm fetch stable/elasticsearch --version 1.13.3
[root@linuxea helm]# tar xf elasticsearch-1.13.3.tgz
tar: elasticsearch/Chart.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: elasticsearch/values.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: elasticsearch/templates/NOTES.txt: implausibly old time stamp 1970-01-01 01:00:00
tar: elasticsearch/templates/_helpers.tpl: implausibly old time stamp 1970-01-01 01:00:00
tar: elasticsearch/templates/client-deployment.yaml: implausibly old time stamp 1970-01-01 01:00:

Create a namespace:

[root@linuxea elasticsearch]# kubectl create namespace efk
namespace/efk created
[root@linuxea elasticsearch]# kubectl get ns -n efk
NAME      STATUS   AGE
default   Active   8d
efk       Active   13s

Then install stable/elasticsearch, specifying the namespace and pointing -f at the values.yaml file. In the values file, persistent storage is turned off:

  podDisruptionBudget:
    enabled: false
  persistence:
    enabled: false

Install:

[root@linuxea elasticsearch]# helm install --name els-1 --namespace=efk -f ./values.yaml stable/elasticsearch
NAME:   els-1
LAST DEPLOYED: Mon Nov 19 06:45:40 2018
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME                 AGE
els-1-elasticsearch  1s
==> v1/ServiceAccount
els-1-elasticsearch-client  1s
els-1-elasticsearch-data    1s
els-1-elasticsearch-master  1s
==> v1/Service
els-1-elasticsearch-client     1s
els-1-elasticsearch-discovery  1s
==> v1beta1/Deployment
els-1-elasticsearch-client  1s
==> v1beta1/StatefulSet
els-1-elasticsearch-data    1s
els-1-elasticsearch-master  1s
==> v1/Pod(related)

Initial state:

NAME                                          READY   STATUS     RESTARTS   AGE
els-1-elasticsearch-client-779495bbdc-5d22f   0/1     Init:0/1   0          1s
els-1-elasticsearch-client-779495bbdc-tzbps   0/1     Init:0/1   0          1s
els-1-elasticsearch-data-0                    0/1     Init:0/2   0          1s
els-1-elasticsearch-master-0                  0/1     Init:0/2   0          1s
The NOTES information below can be shown again later with helm status:

NOTES:
The elasticsearch cluster has been installed.
Elasticsearch can be accessed:
  * Within your cluster, at the following DNS name at port 9200:
    els-1-elasticsearch-client.efk.svc
  * From outside the cluster, run these commands in the same shell:
    export POD_NAME=$(kubectl get pods --namespace efk -l "app=elasticsearch,component=client,release=els-1" -o jsonpath="{.items[0].metadata.name}")
    echo "Visit http://127.0.0.1:9200 to use Elasticsearch"
    kubectl port-forward --namespace efk $POD_NAME 9200:9200

Once the images are pulled and everything is ready, the pods start:

[root@linuxea ~]# kubectl get pods -n efk
NAME                                          READY   STATUS    RESTARTS   AGE
els-1-elasticsearch-client-779495bbdc-5d22f   0/1     Running   0          32s
els-1-elasticsearch-client-779495bbdc-tzbps   0/1     Running   0          32s
els-1-elasticsearch-data-0                    0/1     Running   0          32s
els-1-elasticsearch-master-0                  0/1     Running   0          32s

A little later everything is Running and READY; client, data and master are each running two replicas:

[root@linuxea ~]# kubectl get pods -n efk
NAME                                          READY   STATUS    RESTARTS   AGE
els-1-elasticsearch-client-779495bbdc-5d22f   1/1     Running   0          1m
els-1-elasticsearch-client-779495bbdc-tzbps   1/1     Running   0          1m
els-1-elasticsearch-data-0                    1/1     Running   0          1m
els-1-elasticsearch-data-1                    1/1     Running   0          47s
els-1-elasticsearch-master-0                  1/1     Running   0          1m
els-1-elasticsearch-master-1                  1/1     Running   0          53s

Verification

Run a cirros image to verify that elasticsearch works:

[root@linuxea elasticsearch]# kubectl run cirror-$RANDOM --rm -it --image=cirros -- /bin/sh
If you don't see a command prompt, try pressing enter.
/#

Check that the service name resolves:

/ # nslookup els-1-elasticsearch-client.efk.svc
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      els-1-elasticsearch-client.efk.svc
Address 1: 10.104.57.197 els-1-elasticsearch-client.efk.svc.cluster.local

Check that port 9200 answers:

/ # curl els-1-elasticsearch-client.efk.svc.cluster.local:9200
{
  "name" : "els-1-elasticsearch-client-779495bbdc-tzbps",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ROD_0h1vRiW_5POVBwz3Nw",
  "version" : {
    "number" : "6.4.3",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "fe40335",
    "build_date" : "2018-10-30T23:17:19.084789Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

_cat:

/ # curl els-1-elasticsearch-client.efk.svc.cluster.local:9200/_cat/
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

The nodes:

/ # curl els-1-elasticsearch-client.efk.svc.cluster.local:9200/_cat/nodes
172.16.4.92  21 85 2 0.32 0.27 0.16 i  - els-1-elasticsearch-client-779495bbdc-tzbps
172.16.5.55  19 76 2 0.08 0.11 0.12 mi * els-1-elasticsearch-master-1
172.16.3.109 21 40 5 0.15 0.13 0.10 di - els-1-elasticsearch-data-0
172.16.4.91  21 85 2 0.32 0.27 0.16 mi - els-1-elasticsearch-master-0
172.16.5.54  21 76 2 0.08 0.11 0.12 i  - els-1-elasticsearch-client-779495bbdc-5d22f
172.16.4.93  21 85 1 0.32 0.27 0.16 di - els-1-elasticsearch-data-1

Or the indices:

/ # curl els-1-elasticsearch-client.efk.svc.cluster.local:9200/_cat/indices
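The _cluster/health API is another handy check at this point. It is a standard Elasticsearch endpoint (my own addition, using the same service name as above) and returns a single green/yellow/red status for the whole cluster:

curl els-1-elasticsearch-client.efk.svc.cluster.local:9200/_cluster/health?pretty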
Deploy fluentd

[root@linuxea helm]# helm fetch stable/fluentd-elasticsearch
[root@linuxea helm]# tar xf fluentd-elasticsearch-1.1.0.tgz
tar: fluentd-elasticsearch/Chart.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: fluentd-elasticsearch/values.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: fluentd-elasticsearch/templates/NOTES.txt: implausibly old time stamp 1970-01-01 01:00:00
tar: fluentd-elasticsearch/templates/_helpers.tpl: implausibly old time stamp 1970-01-01 01:00:00
tar: fluentd-elasticsearch/templates/clusterrole.yaml: implausibly old time stamp 1970-01-01 01:00:00

In values.yaml, change

elasticsearch:
  host: 'elasticsearch-client'
  port: 9200
  buffer_chunk_limit: 2M
  buffer_queue_limit: 8

to

elasticsearch:
  host: 'els-1-elasticsearch-client.efk.svc.cluster.local'
  port: 9200
  buffer_chunk_limit: 2M
  buffer_queue_limit: 8

The address els-1-elasticsearch-client.efk.svc.cluster.local is the in-cluster service address, not a pod address.

Enable prometheus:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "24231"
-------------
service:
  type: ClusterIP
  ports:
    - name: "monitor-agent"
      port: 24231

Tolerate the master taint:

tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule

Then install:

[root@linuxea helm]# helm install --name fluentd1 --namespace=efk -f fluentd-elasticsearch/values.yaml stable/fluentd-elasticsearch

Watch the pods come up:

[root@linuxea ~]# kubectl get pods -n efk -w
NAME                                          READY   STATUS              RESTARTS   AGE
els-1-elasticsearch-client-779495bbdc-5d22f   1/1     Running             0          1h
els-1-elasticsearch-client-779495bbdc-tzbps   1/1     Running             0          1h
els-1-elasticsearch-data-0                    1/1     Running             0          1h
els-1-elasticsearch-data-1                    1/1     Running             0          1h
els-1-elasticsearch-master-0                  1/1     Running             0          1h
els-1-elasticsearch-master-1                  1/1     Running             0          1h
fluentd1-fluentd-elasticsearch-28wnb          0/1     ContainerCreating   0          16s
fluentd1-fluentd-elasticsearch-77qmr          0/1     ContainerCreating   0          16s
fluentd1-fluentd-elasticsearch-885fc          0/1     ContainerCreating   0          16s
fluentd1-fluentd-elasticsearch-9kzfm          0/1     ContainerCreating   0          16s
fluentd1-fluentd-elasticsearch-lnbvg          0/1     ContainerCreating   0          16s
fluentd1-fluentd-elasticsearch-9kzfm          1/1     Running             0          2m
fluentd1-fluentd-elasticsearch-77qmr          1/1     Running             0          2m
fluentd1-fluentd-elasticsearch-28wnb          1/1     Running             0          2m
fluentd1-fluentd-elasticsearch-885fc          1/1     Running             0          2m
fluentd1-fluentd-elasticsearch-lnbvg          1/1     Running             0          3m

Verification

Check the indices:

/ # curl els-1-elasticsearch-client.efk.svc.cluster.local:9200/_cat/indices
green open logstash-2018.11.15 trf07L62QHaAIG5oym73kw 5 1  12796 0   5.9mb    3mb
green open logstash-2018.11.16 thookgxbS86mNmbLPCeOQA 5 1   9964 0   4.8mb  2.4mb
green open logstash-2018.11.18 Y0pbmu7RSQizbvoH7V6Cig 5 1   9597 0   5.3mb  2.7mb
green open logstash-2018.11.17 pIhgyn-7TaeYOzLbH-dSJA 5 1  13302 0   9.8mb  4.8mb
green open logstash-2018.11.14 JirkQqsUSmqUnn0bRPVJLA 5 1  12402 0   5.8mb  2.8mb
green open logstash-2018.11.10 4zOpjsVFSMmF0hpY13jryg 5 1 179246 0 128.8mb 65.8mb
green open logstash-2018.11.11 ZF6V9DETQlCBsJPSMdc6ww 5 1  34778 0  15.7mb  7.8mb
green open logstash-2018.11.13 0qZgntkHTRiuK2_S3Rtvnw 5 1  12679 0   6.3mb  3.1mb
green open logstash-2018.11.09 8lSv0UvxQHWTPBZVx9EMVg 5 1   7229 0   7.3mb    4mb
green open logstash-2018.11.12 rJiKEdFdTzqE3ovKecn4kw 5 1  11983 0   5.3mb  2.6mb
green open logstash-2018.11.19 MhasgEdtS3KWR-KpS30E7A 5 1  38671 0  73.3mb   40mb
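Because the values above enable the prometheus annotations and a monitor-agent port, each fluentd pod should also expose metrics on 24231. The check below is my own addition and assumes the chart's default prometheus plugin configuration serves /metrics on that port:

kubectl get svc -n efk fluentd1-fluentd-elasticsearch
# from a pod inside the cluster, e.g. the cirros shell used earlier:
# curl fluentd1-fluentd-elasticsearch.efk.svc:24231/metrics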
Deploy kibana

The kibana version has to match the es version.

[root@linuxea helm]# helm fetch stable/kibana --version 0.18.0
[root@linuxea helm]# tar xf kibana-0.18.0.tgz
tar: kibana/Chart.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: kibana/values.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: kibana/templates/NOTES.txt: implausibly old time stamp 1970-01-01 01:00:00
tar: kibana/templates/_helpers.tpl: implausibly old time stamp 1970-01-01 01:00:00

Change the elasticsearch.url in values.yaml; the right address can be found with helm list and helm status HELMNAME:

files:
  kibana.yml:
    ## Default Kibana configuration from kibana-docker.
    server.name: kibana
    server.host: "0"
    # elasticsearch.url: http://elasticsearch:9200
    elasticsearch.url: http://els-1-elasticsearch-client.efk.svc:9200

Use NodePort so kibana can be reached from outside the cluster:

service:
  type: NodePort
  externalPort: 443
  internalPort: 5601

Also, if necessary, the image version here must match ElasticSearch:

image:
  repository: "docker.elastic.co/kibana/kibana-oss"
  tag: "6.4.3"
  pullPolicy: "IfNotPresent"

Install:

[root@linuxea helm]# helm install --name kibana1 --namespace=efk -f kibana/values.yaml stable/kibana --version 0.18.0
NAME:   kibana1
LAST DEPLOYED: Mon Nov 19 08:13:34 2018
NAMESPACE: efk
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME     AGE
kibana1  1s
==> v1/Service
kibana1  1s
==> v1beta1/Deployment
kibana1  1s
==> v1/Pod(related)
NAME                      READY  STATUS             RESTARTS  AGE
kibana1-578f8d68c7-dvq2z  0/1    ContainerCreating  0         1s
NOTES:
To verify that kibana1 has started, run:
  kubectl --namespace=efk get pods -l "app=kibana"
Kibana can be accessed:
  * From outside the cluster, run these commands in the same shell:
    export NODE_PORT=$(kubectl get --namespace efk -o jsonpath="{.spec.ports[0].nodePort}" services kibana1)
    export NODE_IP=$(kubectl get nodes --namespace efk -o jsonpath="{.items[0].status.addresses[0].address}")
    echo http://$NODE_IP:$NODE_PORT

After a long wait kibana1 is running; note that, because of a resource shortage, some fluentd pods were Evicted along the way:

[root@linuxea helm]# kubectl get pods -n efk -o wide -w
NAME                                          READY   STATUS              RESTARTS   AGE   IP             NODE
els-1-elasticsearch-client-779495bbdc-9rg4x   1/1     Running             0          7m    172.16.3.124   linuxea.node-2.com
els-1-elasticsearch-client-779495bbdc-bhq2f   1/1     Running             0          7m    172.16.2.28    linuxea.node-1.com
els-1-elasticsearch-data-0                    1/1     Running             0          7m    172.16.5.65    linuxea.node-4.com
els-1-elasticsearch-data-1                    1/1     Running             0          6m    172.16.2.29    linuxea.node-1.com
els-1-elasticsearch-master-0                  1/1     Running             0          7m    172.16.3.125   linuxea.node-2.com
els-1-elasticsearch-master-1                  1/1     Running             0          6m    172.16.5.66    linuxea.node-4.com
els-1-elasticsearch-master-2                  1/1     Running             0          5m    172.16.4.97    linuxea.node-3.com
fluentd1-fluentd-elasticsearch-2bllt          1/1     Running             0          3m    172.16.4.98    linuxea.node-3.com
fluentd1-fluentd-elasticsearch-7pkvl          1/1     Running             0          3m    172.16.2.30    linuxea.node-1.com
fluentd1-fluentd-elasticsearch-cnhk6          1/1     Running             0          3m    172.16.0.26    linuxea.master-1.com
fluentd1-fluentd-elasticsearch-mk9m2          1/1     Running             0          3m    172.16.5.67    linuxea.node-4.com
fluentd1-fluentd-elasticsearch-wm2kw          1/1     Running             0          3m    172.16.3.126   linuxea.node-2.com
kibana1-bfbbf89f6-4tkzb                       0/1     ContainerCreating   0          45s   <none>         linuxea.node-2.com
kibana1-bfbbf89f6-4tkzb                       1/1     Running             0          1m    172.16.3.127   linuxea.node-2.com

Access it through the NodePort:

[root@linuxea helm]# kubectl get svc -n efk
NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
fluentd1-fluentd-elasticsearch   ClusterIP   10.106.221.12   <none>        24231/TCP       4m
kibana1                          NodePort    10.98.70.188    <none>        443:32725/TCP   2m
December 18, 2018
4,388 reads
0 comments
0 likes
2018-12-17
linuxea: understanding charts in kubernetes (51)
The helm ecosystem consists of helm, the tiller server and charts. tiller is the server side and normally runs on top of the k8s cluster; through the tiller server, helm deploys applications on k8s, and those applications come from charts in repositories that helm can reach. Earlier we wrote some simple values files to instantiate a chart into an installable, runnable release. A release can also be generated straight from a chart, with the configuration (config) coming from the chart's values.yaml file.

Internally a chart is not very complicated: it simply rewrites the manifests we used to write by hand into a reusable format. Our previous manifests were written to accomplish one specific task, and once finished, any configuration change meant editing the manifest file or re-applying an updated configuration. A helm chart introduces a mechanism that turns specific pieces of those manifests, especially attribute values, into template content, similar to playbook templates in ansible. The variable values it references come from different places: some are built into helm, some come from the release being deployed, and some are provided by the user through a values file (config).

Chart structure

Fetch a chart first:

[root@linuxea helm]# helm fetch stable/redis
[root@linuxea helm]# ls
redis-4.2.10.tgz  values.yaml

The files in the package carry Unix epoch timestamps from when it was built:

[root@linuxea helm]# tar xf redis-4.2.10.tgz
tar: redis/Chart.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: redis/values.yaml: implausibly old time stamp 1970-01-01 01:00:00
tar: redis/templates/NOTES.txt: implausibly old time stamp 1970-01-01 01:00:00
tar: redis/templates/_helpers.tpl: implausibly old time stamp 1970-01-01 01:00:00
tar: redis/templates/configmap.yaml: implausibly old time stamp 1970-01-01 01:00:00

The parts of a chart are documented at https://docs.helm.sh/developing_charts/#the-chart-file-structure, roughly:

wordpress/
  Chart.yaml          # YAML file with information about the chart: name, version, and so on (required)
  LICENSE             # OPTIONAL: a plain-text file containing the chart's license
  README.md           # OPTIONAL: a human-readable README
  requirements.yaml   # OPTIONAL: a YAML file listing the chart's dependencies (for example what an lnmp stack depends on); dependencies listed here are handled automatically
  values.yaml         # Default configuration values for the chart, i.e. defaults for the custom variables referenced by the templates
  charts/             # A directory holding any charts this chart depends on, similar in purpose to requirements; it stores the packaged .tgz of every dependent chart. Anything placed here is treated as a dependency whether or not requirements.yaml exists: manual dependency management.
  templates/          # The template directory; combined with the values it produces valid Kubernetes manifests. Reusable attributes or fields of a manifest (service, pvc, ...) are rewritten in the template language, and the rendered result is substituted where the code sits. This requires understanding Go template syntax.
  templates/NOTES.txt # OPTIONAL: a plain-text file with short usage notes

Chart.yaml

apiVersion: chart API version, always "v1" (required)
name: chart name (required), must match the directory name
version: a SemVer 2 version (required)
kubeVersion: a SemVer range of compatible Kubernetes versions (optional)
description: a single-sentence description of the project (optional)
keywords:
  - a list of keywords about the project (optional)
home: the URL of the project's home page (optional)
sources:
  - a list of URLs of the project's source code (optional)
maintainers: # (optional)
  - name: maintainer's name (required for each maintainer)
    email: maintainer's email (optional for each maintainer)
    url: maintainer's URL (optional for each maintainer)
engine: gotpl # the name of the template engine (optional, defaults to gotpl)
icon: the URL of an SVG or PNG image to use as the icon (optional)
appVersion: the version of the packaged application (optional); this does not have to be SemVer
deprecated: whether this chart is deprecated (optional, boolean)
tillerVersion: the Tiller version this chart requires, expressed as a SemVer range such as ">2.0.0" (optional)

requirements

dependencies:
  - name: apache
    version: 1.2.3
    repository: http://example.com/charts
  - name: mysql
    version: 3.2.1
    repository: http://another.example.com/charts

dependencies is a list; each entry is an object describing one chart being depended on. The name field is the name of the chart you want, version is the version you want, and repository is the full URL of the chart repository. Note that you must also add that repo locally with helm repo add. When the current chart is packaged, the requirements file is parsed and every dependent chart is downloaded from its repository and included in the package. Running helm dependency update CHART_NAME downloads all the dependencies listed in requirements into the charts/ directory, so that when the application is installed its dependencies are installed automatically. Of course, if you already know the dependencies, you can simply download the dependent packages into the charts/ directory yourself.

alias

Some dependencies need to be referenced under a different name; an alias covers that:

dependencies:
  - name: subchart
    repository: http://localhost:10191
    version: 0.1.0
    alias: new-subchart-1

templates

Template files follow the standard conventions for writing Go templates; see https://golang.org/pkg/text/template/ for details.

Variables

In a template, {{ .Values.imageRegistry }} is a variable: .Values means the value comes from the values.yaml file, and imageRegistry is the key inside it.

image: {{ .Values.imageRegistry }}/postgres:{{ .Values.dockerTag }}

The corresponding entries in values are:

imageRegistry: "quay.io/deis"
dockerTag: "latest"

Defaults can also be set: if .Values.storage exists, the value from the values file is used, otherwise the default "minio" is used, as in:

value: {{ default "minio" .Values.storage }}

Any manifest we used to define by hand can adopt this template syntax: changeable attribute values become variable references, and more complex cases can use conditionals, for example a true/false switch that enables or disables a block of configuration. At install time you then pass --values=VALUES.yaml.
Predefined values

Values supplied through the values.yaml file (or through the --set flag) are accessible from the .Values object in templates. Other predefined data can also be accessed. The following values are predefined, available to every template, and cannot be overridden; as with all values, the names are case sensitive.

Release.Name: the name of the release (not the chart)
Release.Time: the time the release was last updated; matches the Last Released time on the Release object
Release.Namespace: the namespace the release is deployed into
Release.Service: the service performing the release, normally Tiller
Release.IsUpgrade: set to true if the current operation is an upgrade or rollback
Release.IsInstall: set to true if the current operation is an install
Release.Revision: the revision number; it starts at 1 and is incremented by each helm upgrade
Chart: the contents of Chart.yaml, so the chart version is available as Chart.Version and the maintainers as Chart.Maintainers
Files: a map-like object containing all non-special files in the chart. It does not give access to templates, but does give access to other files (unless excluded via .helmignore). Files can be accessed with {{ index .Files "file.name" }} or with the {{ .Files.Get name }} or {{ .Files.GetString name }} functions; the contents can also be read as []byte with {{ .Files.GetBytes }}
Capabilities: a map-like object with information about the Kubernetes version ({{ .Capabilities.KubeVersion }}), the Tiller version ({{ .Capabilities.TillerVersion }}) and the supported Kubernetes API versions ({{ .Capabilities.APIVersions.Has "batch/v1" }})

Note: any unknown Chart.yaml fields are dropped and are not accessible inside the Chart object, so Chart.yaml cannot be used to pass arbitrary structured data into templates; use the values file for that. To override a single value, use --set.

Creating a chart

helm create generates a chart named linuxea:

[root@linuxea helm]# helm create linuxea
Creating linuxea

Chart.yaml

Then adjust the Chart file:

[root@linuxea linuxea]# cat Chart.yaml
apiVersion: v1
appVersion: "1.0"
description: A Helm chart for linuxea
name: linuxea
version: 0.1.0
maintainer:
- name: mark
  email: linuxea@gmail.com

The requirements dependency file is left undefined for now.

NOTES.txt

This file is shown to the user after the chart is installed as a release, and also appears in the output of helm status.

_helpers.tpl

Template helper definitions written in tpl syntax.

yml

The templates are deployment.yaml, ingress.yaml and service.yaml. In deployment.yaml the syntax uses first- and second-level fields:

[root@linuxea templates]# grep Values *.yaml
deployment.yaml:      replicas: {{ .Values.replicaCount }}
deployment.yaml:          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

There are also some more complex constructs:

{{ toYaml .Values.resources | indent 12 }}
{{- with .Values.nodeSelector }}
      nodeSelector:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.affinity }}
      affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.tolerations }}
      tolerations:
{{ toYaml . | indent 8 }}
{{- end }}

In service.yaml most of the changeable fields are templated:

[root@linuxea templates]# cat service.yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "linuxea.com.fullname" . }}
  labels:
    app.kubernetes.io/name: {{ include "linuxea.name" . }}
    helm.sh/chart: {{ include "linuxea.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: {{ include "linuxea.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}

And ingress.yaml:

[root@linuxea templates]# cat ingress.yaml
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "linuxea.fullname" . -}}
{{- $ingressPath := .Values.ingress.path -}}
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
    app.kubernetes.io/name: {{ include "linuxea.name" . }}
    helm.sh/chart: {{ include "linuxea.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- with .Values.ingress.annotations }}
  annotations:
{{ toYaml . | indent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls }}
  tls:
  {{- range .Values.ingress.tls }}
    - hosts:
      {{- range .hosts }}
        - {{ . | quote }}
      {{- end }}
      secretName: {{ .secretName }}
  {{- end }}
{{- end }}
  rules:
  {{- range .Values.ingress.hosts }}
    - host: {{ . | quote }}
      http:
        paths:
          - path: {{ $ingressPath }}
            backend:
              serviceName: {{ $fullName }}
              servicePort: http
  {{- end }}
{{- end }}
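Before going further it can help to render the templates locally and inspect the generated manifests. This tip is my own addition, using standard Helm 2 commands rather than anything shown in the original article:

helm template --name linuxe linuxea/ | less
# or, once the chart is available in a reachable repo:
# helm install --name linuxe --dry-run --debug local/linuxea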
Adjusting the values

Make a few simple changes:

replicaCount: 2

Enable resources; the {} has to be removed if you enable this block:

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

Check the syntax

Then check for syntax errors, from the parent directory:

[root@linuxea helm]# helm lint linuxea
==> Linting linuxea
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, no failures

Package

If there are no errors the chart can be packaged, and uploaded to a repository if necessary:

[root@linuxea helm]# helm package linuxea/
Successfully packaged chart and saved it to: /root/linuxea/manifests/helm/linuxea-0.1.0.tgz
[root@linuxea helm]# ll
total 40
drwxr-xr-x. 4 root root   93 Nov 18 12:08 linuxea
-rw-r--r--. 1 root root 2614 Nov 18 12:09 linuxea-0.1.0.tgz

Local repository

Bring up the local repository; it has to be running to be usable:

[root@linuxea linuxea]# helm repo list
NAME    URL
stable  https://kubernetes-charts.storage.googleapis.com
local   http://127.0.0.1:8879/charts

Start a simple temporary repository server; nginx can also serve as the repository:

[root@linuxea prometheus]# helm serve
Regenerating index. This may take a moment.
Now serving you on 127.0.0.1:8879

[root@linuxea linuxea]# ss -tlnp|grep 8879
LISTEN     0      128    127.0.0.1:8879                     *:*                   users:(("helm",pid=3816,fd=3))

helm does not allow uploading to the repository from the command line, but once the chart is created it is recorded and can be found with helm search:

[root@linuxea helm]# helm search linuxea
NAME            CHART VERSION   APP VERSION     DESCRIPTION
local/linuxea   0.1.0           1.0             A Helm chart for linuxea.com

Note in particular that NOTES.txt should genuinely reflect the relevant installation parameters and be written with concrete content.

Install the chart

[root@linuxea ~]# helm install --name linuxe local/linuxea
NAME:   linuxe
LAST DEPLOYED: Sun Nov 18 12:10:27 2018
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Service
NAME            AGE
linuxe-linuxea  1s
==> v1beta2/Deployment
linuxe-linuxea  1s
==> v1/Pod(related)
NAME                             READY  STATUS   RESTARTS  AGE
linuxe-linuxea-666bf7dc5b-476gm  0/1    Pending  0         1s
linuxe-linuxea-666bf7dc5b-xgs8h  0/1    Pending  0         1s
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=linuxea,app.kubernetes.io/instance=linuxe" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl port-forward $POD_NAME 8080:80

[root@linuxea ~]# kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
linuxe-linuxea-666bf7dc5b-476gm        1/1     Running   0          1m
linuxe-linuxea-666bf7dc5b-xgs8h        1/1     Running   0          1m
linuxea-redis-master-0                 1/1     Running   0          6h
linuxea-redis-slave-77f4768cd8-rf4l2   1/1     Running   1          6h

The install-time information comes from NOTES.txt and can be retrieved again with helm status:

[root@linuxea helm]# helm status linuxe
LAST DEPLOYED: Sun Nov 18 12:10:27 2018
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Pod(related)
NAME                             READY  STATUS   RESTARTS  AGE
linuxe-linuxea-666bf7dc5b-476gm  1/1    Running  0         3m
linuxe-linuxea-666bf7dc5b-xgs8h  1/1    Running  0         3m
==> v1/Service
NAME            AGE
linuxe-linuxea  3m
==> v1beta2/Deployment
linuxe-linuxea  3m
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=linuxea,app.kubernetes.io/instance=linuxe" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl port-forward $POD_NAME 8080:80
Uninstall the chart

Remove it with --purge:

[root@linuxea helm]# helm del --purge linuxe2
release "linuxe2" deleted
[root@linuxea helm]# helm del --purge linuxe21
Error: release: "linuxe21" not found
[root@linuxea helm]# helm del --purge linuxe1
release "linuxe1" deleted
[root@linuxea helm]# helm del --purge linuxe
release "linuxe" deleted

Add a repository

Add incubator, which holds the not-yet-stable charts:

[root@linuxea helm]# helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
"incubator" has been added to your repositories
[root@linuxea helm]# helm repo list
NAME        URL
stable      https://kubernetes-charts.storage.googleapis.com
local       http://127.0.0.1:8879/charts
incubator   https://kubernetes-charts-incubator.storage.googleapis.com/
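As noted above, charts cannot be uploaded to a repository from the helm CLI. If you serve a repository with nginx instead of helm serve, the usual pattern (my own sketch; /data/charts and the URL are placeholder assumptions) is to copy the packaged .tgz into the web root and regenerate the index:

cp linuxea-0.1.0.tgz /data/charts/
helm repo index /data/charts/ --url http://charts.example.local/
helm repo add myrepo http://charts.example.local/
helm repo update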
December 17, 2018
3,545 reads
0 comments
0 likes