2023-03-07
linuxea: building a PostgreSQL 12 primary/standby with docker-compose
For PostgreSQL synchronous streaming replication we will simply call the two roles the primary (master) and the standby (slave). This post walks through the primary/standby setup; to cut down on installation steps I use Docker images orchestrated with docker-compose.

Docker-Compose version: v2.16.0
image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/postgres:12.14-alpine3.17

docker and docker-compose therefore need to be installed in advance. The image is the official one, simply mirrored to Aliyun.

ID  IP              Role    Spec
1   172.168.204.41  master  4c8g
2   172.168.204.42  slave   4c8g

Part of the startup configuration is as follows (the comments are adapted from another page):

wal_level = replica                  # make this host the WAL primary
max_wal_senders = 5                  # maximum number of streaming replication connections; roughly one per standby
wal_keep_segments = 128              # maximum number of xlog segments kept for streaming replication
wal_sender_timeout = 60s             # timeout for the primary when sending replication data
max_connections = 200                # a read-mostly standby usually needs a larger max connection count
hot_standby = on                     # this machine is not only for archiving, it also serves queries
max_standby_streaming_delay = 30s    # maximum delay allowed for the streaming standby
wal_receiver_status_interval = 10s   # upper bound on how often the standby reports its status to the primary
hot_standby_feedback = on            # whether replication errors are reported back to the primary
wal_log_hints = on                   # also do full page writes of non-critical updates

Master

With docker-compose these settings are passed to postgres with -c in the command field, as follows:

version: '3.3'
services:
  postgresql12-m:
    container_name: postgresql12-m
    image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/postgres:12.14-alpine3.17
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=mysecretpassword
      - PGDATA=/var/lib/postgresql/data/pgdata
    command: "postgres -c wal_level=replica -c max_wal_senders=15 -c wal_keep_segments=128 -c wal_sender_timeout=60s -c max_connections=200 -c hot_standby=on -c max_standby_streaming_delay=30s -c wal_receiver_status_interval=10s -c hot_standby_feedback=on -c wal_log_hints=on"
    volumes:
      - /etc/localtime:/etc/localtime:ro   # timezone
      - /data/postgresql12:/var/lib/postgresql/data
      - /data/postgresql12/pg_hba.conf:/var/lib/postgresql/data/pgdata/pg_hba.conf
    logging:
      driver: "json-file"
      options:
        max-size: "50M"
    ports:
      - 8080:8080
      - 5432:5432

Then start it:

docker-compose -f docker-compose.yaml up -d

Enter the container and create the replication user:

create role replica with replication login password '123456';
alter user replica with password '123456';
-- or
CREATE USER replica WITH REPLICATION LOGIN CONNECTION LIMIT 10 ENCRYPTED PASSWORD '123456';

For example:

docker exec --user=postgres -it postgresql12-m bash
2860b7926327:/$ psql
psql (12.14)
Type "help" for help.

postgres=# create role replica with replication login password '123456';
CREATE ROLE
postgres=# alter user replica with password '123456';
ALTER ROLE
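Once the primary is up, it is worth confirming that the -c overrides actually took effect and that the replication role exists. A minimal sketch, assuming the container name postgresql12-m from the compose file above (the check helper is mine, not from the original steps):

#!/usr/bin/env bash
# Sanity check on the primary after the first start.
set -euo pipefail

# run a single SQL statement inside the primary container
check() {
  docker exec --user=postgres postgresql12-m psql -U postgres -Atc "$1"
}

echo "wal_level:       $(check 'show wal_level;')"         # expect: replica
echo "max_wal_senders: $(check 'show max_wal_senders;')"   # expect: 15, as set in command:
echo "hot_standby:     $(check 'show hot_standby;')"       # expect: on

# the replication role created above should show rolreplication = t
check "select rolname, rolreplication from pg_roles where rolname = 'replica';"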
After this first startup, the auto-generated configuration has to be removed and replaced before the next start for the changes to take effect: postgres generates its configuration files itself, so they can only be swapped out once the container has created them. Replace pg_hba.conf as follows:

docker rm -f postgresql12-m
rm -f /data/postgresql12/pgdata/pg_hba.conf
cat > /data/postgresql12/pgdata/pg_hba.conf << EOFFF
local   all           all                      trust
host    all           all        0.0.0.0/0     md5
host    all           all        ::1/128       trust
host    replication   replica    0.0.0.0/0     md5
EOFFF

Besides this, we also need the settings required for streaming replication, appended directly to /data/postgresql12/pgdata/postgresql.conf:

PG_FILE=/data/postgresql12/pgdata/postgresql.conf
echo "synchronous_standby_names = 'standbydb1'  # only needed for synchronous streaming replication" >> $PG_FILE
echo "synchronous_commit = 'remote_write'" >> $PG_FILE

Finally, start it again:

docker-compose -f docker-compose.yaml up -d

If necessary, stop the firewall or open port 5432.

Slave

The standby uses the same command as the master; only the names change, with an -s suffix to tell them apart:

version: '3.3'
services:
  postgresql12-s:
    container_name: postgresql12-s
    image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/postgres:12.14-alpine3.17
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=mysecretpassword
      - PGDATA=/var/lib/postgresql/data/pgdata
    command: "postgres -c wal_level=replica -c max_wal_senders=15 -c wal_keep_segments=128 -c wal_sender_timeout=60s -c max_connections=200 -c hot_standby=on -c max_standby_streaming_delay=30s -c wal_receiver_status_interval=10s -c hot_standby_feedback=on -c wal_log_hints=on"
    volumes:
      - /etc/localtime:/etc/localtime:ro   # timezone
      - /data/postgresql12:/var/lib/postgresql/data
    logging:
      driver: "json-file"
      options:
        max-size: "50M"
    ports:
      - 8080:8080
      - 5432:5432

Start the database:

docker-compose -f docker-compose.yaml up -d

Backing up the data

The data can be backed up with either pg_start_backup or pg_basebackup; here we use pg_basebackup.

1. pg_basebackup

The image already ships with pg_basebackup, so on the standby node, enter the container and run:

pg_basebackup -h 172.168.204.41 -p 5432 -U replica -Fp -Xs -Pv -R -D /var/lib/postgresql/data/pgdata-latest

This takes a remote base backup into /var/lib/postgresql/data/pgdata-latest, which is inside the container's mounted path; we stage it in pgdata-latest first and switch the data directory to it afterwards.

[root@node2 pgdata-m]# docker exec -it postgresql12-s bash
142531a6f29e:/# pg_basebackup -h 172.168.204.41 -p 5432 -U replica -Fp -Xs -Pv -R -D /var/lib/postgresql/data/pgdata-latest
Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/6000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_76"
24656/24656 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/6000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

The backed-up data directory now lives at /var/lib/postgresql/data/pgdata-latest.
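The run above stops at an interactive password prompt. If you want to script this step, a minimal non-interactive sketch, assuming the container name postgresql12-s and the password used earlier (pg_basebackup honours the standard libpq PGPASSWORD variable):

#!/usr/bin/env bash
# Non-interactive base backup from the primary, run on the standby node.
set -euo pipefail

PRIMARY=172.168.204.41
TARGET=/var/lib/postgresql/data/pgdata-latest

# -e PGPASSWORD=... avoids the interactive "Password:" prompt shown above
docker exec -e PGPASSWORD=123456 postgresql12-s \
  pg_basebackup -h "$PRIMARY" -p 5432 -U replica -Fp -Xs -Pv -R -D "$TARGET"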
Skipping pg_start_backup

If you used pg_basebackup, you can skip this alternative. If pg_basebackup is not an option, run pg_start_backup on the primary and then copy the data directory:

docker exec -it --user=postgres postgresql12-m bash
8e481427e025:/$ psql -U postgres
psql (12.14)
Type "help" for help.

postgres=# select pg_start_backup('$DATE',true);
 pg_start_backup
-----------------
 0/2000028
(1 row)

After pg_start_backup, writes still go to the WAL but are no longer flushed into the data files until pg_stop_backup() is executed. Copy the data directory and scp it to the standby:

cp -r /data/postgresql/pgdata ./pgdata-m
tar -zcf pgdata-m.tar.gz pgdata-m
scp -r ./pgdata-m.tar.gz 172.168.204.42:~/

When the copy is done, stop the backup:

postgres=# select pg_stop_backup();
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
 pg_stop_backup
----------------
 0/2000138
(1 row)

Back on the standby node, remove the container and unpack the data brought over from the primary:

[root@node2 postgres]# docker rm -f postgresql12-s
postgresql12-s
[root@node2 ~]# tar xf pgdata-m.tar.gz -C /data/postgresql12/
[root@node2 ~]# ll /data/postgresql12/
total 8
drwx------  2 70   root   35 Mar  6 17:31 pgdata
drwx------ 19 root root 4096 Mar  6 17:33 pgdata-m
-rw-r--r--  1 70   root  113 Mar  6 17:25 pg_hba.conf

2. Change the data directory

The backup now lives at /var/lib/postgresql/data/pgdata-latest, so change the PGDATA environment variable mapped in docker-compose to point at it:

- PGDATA=/var/lib/postgresql/data/pgdata-latest

In PostgreSQL 12 a server declares itself a standby by having a standby.signal file, and the presence of that file is what counts; creating it is enough, though you can write other content into it. Here, as a nod to the old versions, we write standby_mode = 'on' into it:

echo "standby_mode = 'on'" > /data/postgresql12/pgdata-latest/standby.signal

We also need to check the standby's configuration: whether the settings in postgresql.auto.conf match what we expect. We look at postgresql.auto.conf simply because it is read with priority over postgresql.conf. pg_basebackup -R rewrites the connection credentials in postgresql.auto.conf, so whenever pg_basebackup has been used this file needs to be adjusted again. At this point I add the following, which contains the replication credentials, to both postgresql.auto.conf and postgresql.conf:

hot_standby = 'on'
primary_conninfo = 'application_name=standbydb1 user=replica password=123456 host=172.168.204.41 port=5432 sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

Editing the files requires a restart to take effect; the same thing can also be configured from the SQL prompt:

show primary_conninfo;  -- view the current value
alter system set primary_conninfo = 'application_name=standbydb1 user=replica password=123456 host=172.168.204.41 port=5432 sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any';

Then start the standby:

docker-compose -f docker-compose.yaml up -d

Verifying replication

Back on the primary:

linuxea=# select * from pg_stat_replication;
 pid | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |        backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
-----+----------+---------+------------------+----------------+-----------------+-------------+------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 144 |    16384 | replica | standbydb1       | 172.168.204.42 |                 |       47492 | 2023-03-07 10:18:45.56999+08 |          502 | streaming | 0/F3449D8 | 0/F3449D8 | 0/F3449D8 | 0/F3449D8  |           |           |            |             1 | sync       | 2023-03-07 10:25:38.354315+08
(1 row)

linuxea=# select pid,state,client_addr,sync_priority,sync_state from pg_stat_replication;
 pid |   state   |  client_addr   | sync_priority | sync_state
-----+-----------+----------------+---------------+------------
 207 | streaming | 172.168.204.42 |             1 | sync
(1 row)

linuxea=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)
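The same checks can be scripted from the primary host. A minimal sketch, assuming the container names used above and SSH access from the primary node to 172.168.204.42 (the ssh hop is my assumption, not part of the original setup):

#!/usr/bin/env bash
# Quick replication health check run from the primary node.
set -euo pipefail

primary_sql() { docker exec --user=postgres postgresql12-m psql -U postgres -Atc "$1"; }
standby_sql() { ssh 172.168.204.42 "docker exec --user=postgres postgresql12-s psql -U postgres -Atc \"$1\""; }

echo "primary in recovery: $(primary_sql 'select pg_is_in_recovery();')"   # expect f
echo "standby in recovery: $(standby_sql 'select pg_is_in_recovery();')"   # expect t

# replay lag in bytes as seen from the primary (zero or close to it when idle)
primary_sql "select application_name,
                    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
             from pg_stat_replication;"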
Create a database on the primary and insert some data:

CREATE DATABASE linuxea OWNER postgres;
\c linuxea
CREATE TABLE test( id integer, test integer)WITH (OIDS=FALSE);
ALTER TABLE test OWNER TO postgres;

As follows:

postgres=# \c linuxea
You are now connected to database "linuxea" as user "postgres".
linuxea=# CREATE TABLE test( id integer, test integer)WITH (OIDS=FALSE);
CREATE TABLE
linuxea=# ALTER TABLE test OWNER TO postgres;
ALTER TABLE
linuxea=# insert into test SELECT generate_series(1,1000000) as key, (random()*(10^3))::integer;
INSERT 0 1000000

Then over on the standby:

[root@node2 pgdata-m]# docker exec -it --user=postgres postgresql12-s bash
da91ac9e2a19:/$ psql -U postgres
psql (12.14)
Type "help" for help.
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
postgres=# \c linuxea
You are now connected to database "linuxea" as user "postgres".
linuxea=# SELECT * FROM test;
   id    | test
---------+------
       1 |  935
       2 |  652
       3 |  204
       4 |  367
       5 |  100
       6 |  743
--More--

Promoting the standby to primary

Suppose the primary goes down and will not recover any time soon; we simply switch the standby over to primary so it can keep serving. To simulate this, shut down the primary to make it unavailable, then enter the standby container and run pg_ctl promote -D:

c683e39637ea:/$ pg_ctl promote -D /var/lib/postgresql/data/pgdata-latest
waiting for server to promote.... done
server promoted

Write some data on .42 to check that writes now work:

CREATE DATABASE linuxea2 OWNER postgres;
\c linuxea2
CREATE TABLE test( id integer, test integer)WITH (OIDS=FALSE);
ALTER TABLE test OWNER TO postgres;
insert into test SELECT generate_series(1,1000000) as key, (random()*(10^3))::integer;
SELECT * FROM test;

The former standby now accepts writes.

Configuring the old primary as a standby

Now the old primary (.41) is back up. If the proxy has already moved traffic to .42, we simply sync .42's data onto .41 and configure .41 as the standby.

1. Back up

[root@node1 pgdata]# docker-compose -f ~/postgresql/docker-compose.yaml up -d
[+] Running 1/1
 ⠿ Container postgresql12-m  Started    0.4s
[root@node1 pgdata]# docker exec -it postgresql12-m bash
d7108d41f908:/# pg_basebackup -h 172.168.204.42 -p 5432 -U replica -Fp -Xs -Pv -R -D /var/lib/postgresql/data/pgdata-latest
Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/14000028 on timeline 2
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_100"
147294/147294 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/14000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

2. Change the data directory

- PGDATA=/var/lib/postgresql/data/pgdata-latest

3. Create the signal file

echo "standby_mode = 'on'" > /data/postgresql12/pgdata-latest/standby.signal

4. Add the following to both postgresql.auto.conf and postgresql.conf:

hot_standby = 'on'
primary_conninfo = 'application_name=standbydb1 user=replica password=123456 host=172.168.204.42 port=5432 sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

5. Remove the container, then start it again

docker-compose -f ~/postgresql/docker-compose.yaml down
docker-compose -f ~/postgresql/docker-compose.yaml up -d

6. Verify

On the new primary (.42):

CREATE DATABASE linuxea23 OWNER postgres;
\c linuxea23
CREATE TABLE test( id integer, test integer)WITH (OIDS=FALSE);
ALTER TABLE test OWNER TO postgres;
insert into test SELECT generate_series(1,1000000) as key, (random()*(10^3))::integer;

Then check on the new standby (.41):

\c linuxea23
SELECT * FROM test;

As follows:

[root@node1 postgresql12]# docker exec -it --user=postgres postgresql12-m bash
06a1e5ecc51b:/$ psql -U postgres
psql (12.14)
Type "help" for help.
postgres=# \c linuxea23
You are now connected to database "linuxea23" as user "postgres".
linuxea23=# SELECT * FROM test;
   id    | test
---------+------
       1 |  785
       2 |  654
       3 |  881
       4 |   19
       5 |   37
       6 |  482
       7 |  938
       8 |   25
       9 |  209
      10 |  820
      11 |  445
      12 |  238
      13 |  772
      14 |  233
      15 |  158
      16 |  964
      17 |  815
      18 |  890
      19 |  977
      20 |  437
      21 |   56
      22 |  241
      23 |  266
      24 |  123
      25 |  139
      26 |  207
      27 |   90
      28 |    4
      29 |   95
      30 |  896
      31 |  698
      32 |  752
      33 |  972
--More--

Reference: PostgreSQL 12 recovery configuration summary
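As a closing check for the failover drill above, the promotion can also be verified without opening psql. A minimal sketch, assuming the container name postgresql12-s and the pgdata-latest directory used throughout this post:

#!/usr/bin/env bash
# Confirm the promoted node has left recovery and started a new timeline.
set -euo pipefail

# should print 'f' now that the standby has been promoted
docker exec --user=postgres postgresql12-s psql -U postgres -Atc 'select pg_is_in_recovery();'

# the promotion bumps the timeline (visible above as "on timeline 2" in the
# later pg_basebackup output); pg_controldata shows it directly
docker exec --user=postgres postgresql12-s \
  pg_controldata /var/lib/postgresql/data/pgdata-latest | grep -i timeline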
2023-02-06
linuxea: using Robusta to collect events and pod crash/OOM logs
Robusta can do far more than what this article covers: it can monitor Kubernetes and provide observability, it can hook into Prometheus to post-process alerts and remediate automatically, and it also provides an event timeline. Previously I used Alibaba's kube-eventer, which only forwards events, so all it solves is event-triggered notification. If Robusta stopped there as well, there would be little reason to use it. It offers another very useful capability: event alerting. When Robusta detects one of the preconfigured pod states, it sends that state together with the most recent slice of logs to Slack. That is the main reason this article exists.

Prerequisites

The Python version must be 3.7 or newer, so we upgrade Python:

wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tar.xz
yum install gcc zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel -y
yum install libffi-devel -y
yum install zlib* -y
tar xf Python-3.9.16.tar.xz
cd Python-3.9.16
./configure --with-ssl --prefix=/usr/local/python3
make
make install
rm -rf /usr/bin/python3 /usr/bin/pip3
ln -s /usr/local/python3/bin/python3 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3

Set up a pip mirror inside China:

mkdir -p ~/.pip/
cat > ~/.pip/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple
EOF
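Before installing the CLI it is worth confirming the interpreter really is new enough. A minimal sketch; the 3.7 floor comes from the requirement above, and the path assumes the /usr/local/python3 build just linked into /usr/bin:

#!/usr/bin/env bash
# Guard: robusta-cli requires Python >= 3.7.
set -euo pipefail

ver=$(/usr/bin/python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
if /usr/bin/python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)'; then
  echo "python3 ${ver} is new enough"
else
  echo "python3 ${ver} is too old, robusta-cli needs >= 3.7" >&2
  exit 1
fi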
robusta.dev

Install as described in the official documentation:

pip3 install -U robusta-cli --no-cache
robusta gen-config

Because of network problems, I will personally do the configuration through docker instead, using the downloadable launcher:

curl -fsSL -o robusta https://docs.robusta.dev/master/_static/robusta
chmod +x robusta
./robusta gen-config

Before starting, be sure to pull the image I mirrored and tag it with the name the launcher expects:

docker pull registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-cli:latest
docker tag registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-cli:latest us-central1-docker.pkg.dev/genuine-flight-317411/devel/robusta-cli:latest

[root@master1 opt]# ./robusta gen-config
Robusta reports its findings to external destinations (we call them "sinks").
We'll define some of them now.
Configure Slack integration? This is HIGHLY recommended. [Y/n]: y     # configuring Slack is strongly recommended
If your browser does not automatically launch, open the below url:
https://api.robusta.dev/integrations/slack?id=64a3ee7c-5691-466f-80da-85e8ece80359     # open this in a browser
======================================================================
Error getting slack token
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
======================================================================
======================================================================
Error getting slack token
HTTPSConnectionPool(host='api.robusta.dev', port=443): Max retries exceeded with url: /integrations/slack/get-token?id=64a3ee7c-5691-466f-80da-85e8ece80359 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f50b1f18cd0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
======================================================================
You've just connected Robusta to the Slack of: crow as a cock
Which slack channel should I send notifications to?     # follow the prompt

Open the URL printed at the prompt, tick Allow, and the robusta app now appears in Slack. Continue to the next step; after picking a channel at the "Which slack channel should I send notifications to?" prompt (devops here), that channel receives a message. When the run completes it looks like this:

[root@master1 opt]# ./robusta gen-config
Robusta reports its findings to external destinations (we call them "sinks").
We'll define some of them now.
Configure Slack integration? This is HIGHLY recommended. [Y/n]: y
If your browser does not automatically launch, open the below url:
https://api.robusta.dev/integrations/slack?id=d1fcbb13-5174-4027-a176-a3dcab10c27a
======================================================================
Error getting slack token
HTTPSConnectionPool(host='api.robusta.dev', port=443): Max retries exceeded with url: /integrations/slack/get-token?id=d1fcbb13-5174-4027-a176-a3dcab10c27a (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0ec508eee0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
======================================================================
You've just connected Robusta to the Slack of: crow as a cock
Which slack channel should I send notifications to?  # devops
Configure MsTeams integration? [y/N]: n
Configure Robusta UI sink? This is HIGHLY recommended. [Y/n]: y
Enter your Gmail/Google address. This will be used to login: user@gmail.com
Choose your account name (e.g your organization name): marksugar
Successfully registered.
Robusta can use Prometheus as an alert source.
If you haven't installed it yet, Robusta can install a pre-configured Prometheus.
Would you like to do so? [y/N]: y
Please read and approve our End User License Agreement: https://api.robusta.dev/eula.html
Do you accept our End User License Agreement? [y/N]: y
Last question! Would you like to help us improve Robusta by sending exception reports? [y/N]: n
Saved configuration to ./generated_values.yaml - save this file for future use!
Finish installing with Helm (see the Robusta docs). Then login to Robusta UI at https://platform.robusta.dev
By the way, we'll send you some messages later to get feedback. (We don't store your API key, so we scheduled future messages using Slack's API)
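The timeouts in the transcripts above all involve api.robusta.dev, so before retrying gen-config it can save time to check reachability first. A minimal sketch; the host list is just the two endpoints that appear in the output above:

#!/usr/bin/env bash
# Check that the Robusta endpoints are reachable from this host.
set -euo pipefail

for host in api.robusta.dev platform.robusta.dev; do
  # connection-level check only; any HTTP response (even 404) counts as reachable
  if curl -sS -o /dev/null --connect-timeout 5 "https://${host}"; then
    echo "${host}: reachable"
  else
    echo "${host}: NOT reachable, configure a proxy or retry later" >&2
  fi
done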
When gen-config finishes it has written generated_values.yaml:

globalConfig:
  signing_key: 92a8195-a3fa879b3f88
  account_id: 79efaf9c433294
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: devops
    api_key: xoxb-4715825756487-4749501ZZylPy1f
- robusta_sink:
    name: robusta_ui_sink
    token: eyJhY2NvjIn0=
enablePrometheusStack: true
enablePlatformPlaybooks: true
runner:
  sendAdditionalTelemetry: false
rsa:
  private: LS0tLS1CRUdJTiBRCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  public: LS0tLS1CRUdJTiBQTElDIEtFWS0tLS0tCg==

helm

Next we install with the file created above, adjusted a little. There are many kinds of triggers; see example-triggers, java-troubleshooting, event-enrichment, miscellaneous and kubernetes-triggers. We can filter on a particular set of pods or a namespace to watch for specific information. Here is a selection for testing, added to generated_values.yaml:

globalConfig:
  signing_key: 92a8195-a3fa879b3f88
  account_id: 79efaf9c433294
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: devops
    api_key: xoxb-4715825756487-4749501ZZylPy1f
- robusta_sink:
    name: robusta_ui_sink
    token: eyJhY2NvjIn0=
enablePrometheusStack: false
enablePlatformPlaybooks: true
rsa:
  private: LS0tLS1CRUdJTiBRCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  public: LS0tLS1CRUdJTiBQTElDIEtFWS0tLS0tCg==
customPlaybooks:
- triggers:
  - on_deployment_update: {}
  actions:
  - resource_babysitter:
      omitted_fields: []
      fields_to_monitor: ["spec.replicas"]
- triggers:
  - on_pod_crash_loop:
      restart_reason: "CrashLoopBackOff"
      restart_count: 1
      rate_limit: 3600
  actions:
  - report_crash_loop: {}
- triggers:
  - on_pod_oom_killed:
      rate_limit: 900
      exclude:
      - name: "oomkilled-pod"
        namespace: "default"
  actions:
  - pod_graph_enricher:
      resource_type: Memory
      display_limits: true
- triggers:
  - on_container_oom_killed:
      rate_limit: 900
      exclude:
      - name: "oomkilled-container"
        namespace: "default"
  actions:
  - oomkilled_container_graph_enricher:
      resource_type: Memory
- triggers:
  - on_job_failure:
      namespace_prefix: robusta
  actions:
  - create_finding:
      title: "Job $name on namespace $namespace failed"
      aggregation_key: "Job Failure"
  - job_events_enricher: { }
runner:
  sendAdditionalTelemetry: false
  image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-runner:0.10.10
  imagePullPolicy: IfNotPresent
kubewatch:
  image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/kubewatch:v2.0
  imagePullPolicy: IfNotPresent

Now install it with Helm:

helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm upgrade --install robusta --namespace robusta --create-namespace robusta/robusta -f ./generated_values.yaml \
  --set clusterName=test

You can also dry-run it for debugging:

helm upgrade --install robusta --namespace robusta robusta/robusta -f ./generated_values.yaml --set clusterName=test --dry-run

As follows:

[root@master1 opt]# helm upgrade --install robusta --namespace robusta --create-namespace robusta/robusta -f ./generated_values.yaml \
> --set clusterName=test
Release "robusta" does not exist. Installing it now.
NAME: robusta
LAST DEPLOYED: Thu Feb  2 15:58:32 2023
NAMESPACE: robusta
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing Robusta 0.10.10

As an open source project, we collect general usage statistics.
This data is extremely limited and contains only general metadata to help us understand usage patterns.
If you are willing to share additional data, please do so! It really help us improve Robusta.

You can set sendAdditionalTelemetry: true as a Helm value to send exception reports and additional data.
This is disabled by default.

To opt-out of telemetry entirely, set a ENABLE_TELEMETRY=false environment variable on the robusta-runner deployment.

Visit the web UI at: https://platform.robusta.dev/
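If you are scripting the install, a non-interactive way to wait for the chart to come up is to watch the rollouts rather than the pods. A minimal sketch; the deployment names are inferred from the pod names shown below and may differ:

#!/usr/bin/env bash
# Block until the Robusta workloads are rolled out, or time out after 10 minutes.
set -euo pipefail

for deploy in robusta-forwarder robusta-runner; do
  kubectl -n robusta rollout status "deployment/${deploy}" --timeout=10m
done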
Wait for the pods to become ready:

[root@master1 opt]# kubectl -n robusta get pod -w
NAME                                 READY   STATUS              RESTARTS   AGE
robusta-forwarder-78964b4455-vnt77   1/1     Running             0          2m55s
robusta-runner-758cf9c986-87l4x      0/1     ContainerCreating   0          2m55s
robusta-runner-758cf9c986-87l4x      1/1     Running             0          7m6s

From now on, if a pod on the cluster crashes in an abnormal state, its logs are sent to Slack before the pod is deleted. Slack already receives the log message; click "expand inline" to see the details.
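To see the alert end to end, it helps to trigger one on purpose. A minimal sketch that exercises the on_pod_crash_loop playbook configured above; the pod name and namespace here are arbitrary choices of mine, not from the original setup:

# create a throwaway pod that exits immediately and enters CrashLoopBackOff
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: crashpod
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: crash
    image: busybox
    command: ["sh", "-c", "echo simulated failure; exit 1"]
EOF

# after the first restart the report_crash_loop action should post the pod
# status and recent logs to the devops Slack channel; clean up once it arrives:
# kubectl delete pod crashpod -n default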