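Installing Prometheus and Grafana has been covered here several times already. If you have not installed them yet, follow one of the articles below:
Docker version
K8s version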
blackbox exporter
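blackbox exporter is the black-box monitoring solution provided by the Prometheus community. It allows users to probe the network over HTTP, HTTPS, DNS, TCP, and ICMP, actively checking the status of hosts and services.
- HTTP test
  - Define request header information
  - Check the HTTP status, HTTP response headers, and HTTP body content
- TCP test
  - Monitor the port status of business components
  - Define and monitor application-layer protocols
- ICMP test
  - Host liveness check
- POST test
  - Interface connectivity
- SSL certificate expiry time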
Installing Blackbox exporter
- Binary installation
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.1/blackbox_exporter-0.21.1.linux-amd64.tar.gz
tar zxvf blackbox_exporter-0.21.1.linux-amd64.tar.gz
mkdir -p /usr/local/exporter
mv blackbox_exporter-0.21.1.linux-amd64 /usr/local/exporter/blackbox_exporter
# Write the configuration file
cat >/usr/local/exporter/blackbox_exporter/blackbox.yml<<EOF
modules:
  http_2xx:  # HTTP GET probe module; every Blackbox exporter probe is configured as a module
    prober: http
    timeout: 30s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]  # HTTP/2 responses report their version as "HTTP/2.0"
      valid_status_codes: [200]  # best to list the expected status code explicitly; it shows up clearly in Grafana charts (note by Chen Gang)
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:  # HTTP POST probe module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect:  # TCP probe module
    prober: tcp
    timeout: 10s
EOF
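# Start in the foreground
/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml
# If it starts without errors, stop it again (Ctrl+C)
The test start worked, so next we write the systemd unit file.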
cat >/usr/lib/systemd/system/blackbox_exporter.service<<EOF
[Unit]
Description=blackbox_exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
WorkingDirectory=/usr/local/exporter/blackbox_exporter
ExecStart=/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml
[Install]
WantedBy=multi-user.target
EOF
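Start and test
# Reload systemd so that it picks up the new unit file
[root@abcdocker system]# systemctl daemon-reload
# Start
[root@abcdocker system]# systemctl restart blackbox_exporter
# Check the status
[root@abcdocker system]# systemctl status blackbox_exporter
# Enable at boot
[root@abcdocker system]# systemctl enable blackbox_exporter
The default port is 9115.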
- Docker installation
- Map port 9115
- Mount the local directory /usr/local/exporter/blackbox_exporter
- blackbox.yml lives in the mounted directory and can be edited as needed
docker run --rm -d -p 9115:9115 \
  --name blackbox_exporter \
  -v /usr/local/exporter/blackbox_exporter:/config \
  prom/blackbox-exporter:master \
  --config.file=/config/blackbox.yml
Check that the container is running and the port is mapped
[root@prometheus blackbox_exporter]# docker ps|grep black
8c5302d44971 prom/blackbox-exporter:master "/bin/blackbox_expor…" 52 seconds ago Up 51 seconds 0.0.0.0:9115->9115/tcp blackbox_exporter
Test the port
[root@prometheus blackbox_exporter]# curl 127.0.0.1:9115/metrics
# HELP blackbox_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which blackbox_exporter was built.
# TYPE blackbox_exporter_build_info gauge
blackbox_exporter_build_info{branch="master",goversion="go1.16.10",revision="70bff7941301753b125a40bcf6b3ed28935a9a94",version="0.19.0"} 1
# HELP blackbox_exporter_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6562274758327048e+09
# HELP blackbox_exporter_config_last_reload_successful Blackbox exporter config loaded successfully.
...
...
...
Prometheus monitoring configuration
Configuring the jobs in Prometheus
Edit the Prometheus configuration file
[root@prometheus ~]# cd /etc/prometheus/
[root@prometheus prometheus]# ls
alertmanager prometheus.yml prometheus.yml_bak_2022-06-20 rules
[root@prometheus prometheus]# vim prometheus.yml
Add the following jobs under scrape_configs
  - job_name: 'blackbox_http_2xx'
    metrics_path: /probe
    params:
      module: [http_2xx]  # probe with GET requests
    static_configs:
      - targets:
          - http://prometheus.io  # Target to probe with http.
          - https://i4t.com  # Target to probe with https.
          - https://ukx.cn
          - https://k.i4t.com
          - https://nas.frps.cn
          - https://esxi.frps.cn
          - https://rancher.frps.cn
          - https://jumpserver.frps.cn
          - https://frps.cn
          - https://imgkb.com
          - https://grafana.frps.cn
          - https://down.frps.cn
          - https://my.ukx.cn
          - https://linux.ukx.cn
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115  # blackbox exporter address and port
  - job_name: 'blackbox_tcp_connect'  # check whether specific ports are reachable
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - dsm.frps.cn:9091
          - dsm.frps.cn:1998
          - dsm.frps.cn:1999
          - apiserver.frps.cn:8443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115  # host and port where blackbox exporter runs
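Restart Prometheus.
Using a 127.0.0.1 address for the blackbox exporter is not recommended. When Prometheus runs in a container, for example, 127.0.0.1 refers to the container itself rather than the host running the exporter.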
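To make the three relabel steps concrete, here is a minimal Python sketch that mimics them for a single target and prints the URL Prometheus ends up scraping. It is an illustration only, not Prometheus code: the function names `relabel` and `scrape_url` are made up for this example, and it relies on the fact that each entry under `params` is exposed to relabeling as a `__param_<name>` label.

```python
from urllib.parse import urlencode


def relabel(target: str, module: str, exporter: str) -> dict:
    """Mimic the three relabel_configs steps for one static target."""
    # Before relabeling: the target is the address, params become __param_* labels.
    labels = {"__address__": target, "__param_module": module}
    # Step 1: copy the original target into the ?target= query parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed target visible as the instance label.
    labels["instance"] = labels["__param_target"]
    # Step 3: send the actual scrape to the blackbox exporter instead.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: dict, metrics_path: str = "/probe") -> str:
    """Build the URL that would be requested after relabeling."""
    query = urlencode(
        {"module": labels["__param_module"], "target": labels["__param_target"]}
    )
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = relabel("https://i4t.com", "http_2xx", "10.0.24.13:9115")
print(labels["instance"])
print(scrape_url(labels))
```

Requesting the printed URL by hand (for example with curl) is a quick way to check a module before adding targets to Prometheus.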
Prometheus Blackbox parameters explained
The parameters below are demo examples only.
1. ICMP test (host liveness)
Server liveness can be checked with ping (ICMP). Configure the icmp module in blackbox.yml:
modules:
  icmp:
    prober: icmp
The Prometheus job looks like this
  - job_name: 'blackbox-ping'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 172.16.106.208  # IP of the monitored host
          - 172.16.106.80
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where blackbox exporter runs
2. TCP test (monitoring whether host ports are alive)
Configure the tcp module in blackbox.yml:
modules:
  tcp_connect:
    prober: tcp
Prometheus job
  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 172.16.106.208:6443
          - 172.16.106.80:6443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115
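Conceptually, a plain TCP probe answers one question: can a connection to host:port be opened before the timeout expires? The Python sketch below shows that idea with the standard library. It is not the exporter's implementation, and `tcp_connect_ok` is a name invented for this example.

```python
import socket


def tcp_connect_ok(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

Calling `tcp_connect_ok("172.16.106.208", 6443, timeout=3)` from the machine that runs the exporter gives a rough idea of whether the probe should succeed for that target.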
3. HTTP check (monitoring website status)
The http probe is one of the most commonly used probes in black-box monitoring. It lets you monitor a website or HTTP service effectively, covering both its availability and user-experience metrics such as response time. Besides alerting promptly when a service misbehaves, it helps operations engineers analyze and improve the site experience.
Configure the http modules in blackbox.yml:
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
Prometheus job
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://i4t.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where blackbox exporter runs
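The prober option selects the probe type, and the http option customizes how the probe runs. Here the http section sets nothing beyond the method, so the probe otherwise uses its defaults: it sends an HTTP GET to the target and checks that the returned status code is 2xx. If it is, the probe succeeds; otherwise it fails.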
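The status check described above can be sketched in a few lines of Python. This is a simplified illustration of the documented behavior (any 2xx code is accepted unless valid_status_codes lists specific codes), not the exporter's source, and `status_is_valid` is a hypothetical helper name.

```python
from typing import List, Optional


def status_is_valid(status_code: int,
                    valid_status_codes: Optional[List[int]] = None) -> bool:
    """Decide whether a response status counts as a successful probe."""
    if valid_status_codes:
        # An explicit list such as [200] replaces the default range.
        return status_code in valid_status_codes
    # Default: any 2xx status is accepted.
    return 200 <= status_code <= 299
```

With the first blackbox.yml in this article (`valid_status_codes: [200]`), a 204 response would therefore count as a failed probe, while the demo module without that option would accept it.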
The collected metrics look like this
# DNS lookup time, in seconds
probe_dns_lookup_time_seconds 0.000199105
# Time from the start to the end of the probe, in seconds (the response time for this page)
probe_duration_seconds 0.010889113
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# Length of the HTTP response content
probe_http_content_length -1
# Time spent in each phase of the request
probe_http_duration_seconds{phase="connect"} 0.001083728 # connection time
probe_http_duration_seconds{phase="processing"} 0.008365885 # time the server took to process the request
probe_http_duration_seconds{phase="resolve"} 0.000199105 # DNS resolution time
probe_http_duration_seconds{phase="tls"} 0 # TLS handshake time
probe_http_duration_seconds{phase="transfer"} 0.000446424 # transfer time
# Number of redirects
probe_http_redirects 0
# Whether SSL was used for the final redirect
probe_http_ssl 0
# Returned status code
probe_http_status_code 200
# Length of the uncompressed response body
probe_http_uncompressed_body_length 1766
# HTTP protocol version
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
probe_ip_addr_hash 3.24030434e+09
# IP protocol version used
probe_ip_protocol 4
# Whether the probe succeeded
probe_success 1
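As a rough illustration of how to read this output programmatically, the Python sketch below parses a few of the samples shown above and works out which phase took longest. It is a deliberately small parser that assumes one `name value` pair per line with no timestamps and no trailing annotations (the explanatory notes after the values above are not part of the real output); `parse_samples` is a name invented for this example.

```python
def parse_samples(text: str) -> dict:
    """Parse 'name{labels} value' lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


SAMPLE = """\
probe_duration_seconds 0.010889113
probe_http_duration_seconds{phase="connect"} 0.001083728
probe_http_duration_seconds{phase="processing"} 0.008365885
probe_http_duration_seconds{phase="resolve"} 0.000199105
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0.000446424
probe_http_status_code 200
probe_success 1
"""

samples = parse_samples(SAMPLE)
phases = {k: v for k, v in samples.items()
          if k.startswith("probe_http_duration_seconds{")}
phase_total = sum(phases.values())
slowest = max(phases, key=phases.get)
print("success:", samples["probe_success"] == 1)
print("slowest phase:", slowest)
```

In this sample the processing phase dominates, which points at the server rather than DNS or the network.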
Grafana configuration
Recommended Grafana dashboards
- 13230 SSL certificate monitoring
- 13659 HTTP status monitoring
- 9965 Combined SSL, TCP, and HTTP dashboard
AlertManager
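The alerts are configured as follows
- Alert when an SSL certificate has fewer than 30 days left
- Alert when the HTTP status code is outside the 200-399 range
For installing Alertmanager, see the article below; here I just provide the rules.
Setting up the alert rules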
[root@prometheus rules]# cat /etc/prometheus/rules/blackbox_exporter.yaml
groups:
  - name: Blackbox monitoring alerts
    rules:
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "telnet (instance {{ $labels.instance }}) took longer than 1 second"
          description: "VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "HTTP status code (instance {{ $labels.instance }})"
          description: "HTTP status code is not 200-399\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Domain certificate expiring soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires within 30 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate expiring soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires within 7 days\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateExpired
        expr: probe_ssl_earliest_cert_expiry - time() <= 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate has expired (instance {{ $labels.instance }})"
          description: "Domain certificate has expired\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: BlackboxProbeSlowHttp
        expr: avg_over_time(probe_http_duration_seconds[1m]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "HTTP request slow (instance {{ $labels.instance }})"
          description: "HTTP request took longer than 10 seconds\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Restart Prometheus
docker restart prometheus_new
At this point the rules are loaded in Prometheus, and alerts are already being delivered over WeChat.
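The certificate rules all compare `probe_ssl_earliest_cert_expiry - time()` against a threshold in seconds. The Python sketch below replays that arithmetic for a single expiry timestamp. It is a simplification: it returns only the most severe matching level, whereas in Prometheus every rule whose expression matches fires on its own, and the `for: 30m` hold time is not modelled. `cert_alert_severity` is a hypothetical helper name.

```python
import time
from typing import Optional

DAY = 86400  # seconds per day, as in the rule expressions


def cert_alert_severity(earliest_expiry: float,
                        now: Optional[float] = None) -> Optional[str]:
    """Return the most severe matching level, or None if no rule matches."""
    now = time.time() if now is None else now
    remaining = earliest_expiry - now  # probe_ssl_earliest_cert_expiry - time()
    if remaining <= 0:
        return "critical: expired"
    if remaining < 7 * DAY:
        return "critical: under 7 days"
    if remaining < 30 * DAY:
        return "warning: under 30 days"
    return None
```

Passing a fixed `now` makes the function easy to check by hand, for example `cert_alert_severity(now + 10 * DAY, now)` falls in the 30-day warning band.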