Prometheus Blackbox: Monitoring Domain SSL Certificates and Setting Up AlertManager Alerts


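Installing Prometheus and Grafana has been covered several times already. If you have not installed them yet, refer to the articles below.

Docker version

Prometheus: Monitoring a MySQL Database

K8s version

Prometheus and Grafana: Persisting Data with Ceph and Monitoring a k8s Cluster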

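Blackbox exporter

Blackbox exporter is the black-box monitoring solution provided by the Prometheus community. It allows users to probe the network over HTTP, HTTPS, DNS, TCP, and ICMP, actively checking the state of hosts and services.

  • HTTP test
    Define request header fields
    Check the HTTP status, HTTP response headers, and HTTP body content
  • TCP test
    Monitor the port state of business components
    Define and check application-layer protocols
  • ICMP test
    Host liveness checks
  • POST test
    API connectivity
  • SSL certificate expiry time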
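
Each of these probes is triggered through the exporter's /probe HTTP endpoint, which takes a module and a target as query parameters. As a minimal sketch, the helper below only assembles such a URL. The function name blackbox_probe_url and the address 127.0.0.1:9115 are illustrative assumptions, and the target is passed through without URL-encoding.

```shell
# Build the URL for a one-off probe against Blackbox exporter.
# Usage: blackbox_probe_url <exporter host:port> <module> <target>
# Hypothetical helper: it only prints the URL and sends no request.
blackbox_probe_url() {
    exporter=$1   # where the exporter listens, e.g. 127.0.0.1:9115
    module=$2     # a module name defined in blackbox.yml, e.g. http_2xx
    target=$3     # the address to probe
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}
```

Once the exporter is running with an http_2xx module, something like curl -s "$(blackbox_probe_url 127.0.0.1:9115 http_2xx https://prometheus.io)" should return the probe_* metrics for that target.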

Installing Blackbox exporter

  • Binary installation
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.1/blackbox_exporter-0.21.1.linux-amd64.tar.gz
tar zxvf blackbox_exporter-0.21.1.linux-amd64.tar.gz
mkdir -p /usr/local/exporter
mv  blackbox_exporter-0.21.1.linux-amd64 /usr/local/exporter/blackbox_exporter

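# Write the configuration file
cat >/usr/local/exporter/blackbox_exporter/blackbox.yml<<EOF
modules:
  http_2xx:  # HTTP probe module; every probe in Blackbox exporter is configured as a module
    prober: http
    timeout: 30s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]  # HTTP/2 responses report their version as "HTTP/2.0"
      valid_status_codes: [200]  # set the expected status code explicitly; it reads more clearly in Grafana panels (note by Chen Gang)
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:     # HTTP POST probe module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect:   # TCP probe module
    prober: tcp
    timeout: 10s
EOF

# Start
/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml

# If it starts without errors, stop it again (Ctrl+C)

The test start worked, so we now create a systemd unit file. The unit runs as the prometheus user and group, which must already exist on the host.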

cat >/usr/lib/systemd/system/blackbox_exporter.service<<EOF
[Unit]
Description=blackbox_exporter
After=network.target 
[Service]
User=prometheus
Group=prometheus
WorkingDirectory=/usr/local/exporter/blackbox_exporter
ExecStart=/usr/local/exporter/blackbox_exporter/blackbox_exporter --config.file=/usr/local/exporter/blackbox_exporter/blackbox.yml
[Install]
WantedBy=multi-user.target
EOF

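Start and verify

# Reload systemd so that it picks up the new unit file
[root@abcdocker system]# systemctl daemon-reload
# Start
[root@abcdocker system]# systemctl restart blackbox_exporter
# Check the status
[root@abcdocker system]# systemctl status blackbox_exporter
# Enable start on boot
[root@abcdocker system]# systemctl enable blackbox_exporter

The default port is 9115.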

  • Docker installation

  • Map port 9115

  • Mount the local directory /usr/local/exporter/blackbox_exporter

  • blackbox.yml lives in the mounted directory and can be edited as needed

docker run --rm -d -p 9115:9115 \
    --name blackbox_exporter \
    -v /usr/local/exporter/blackbox_exporter:/config \
    prom/blackbox-exporter:master \
    --config.file=/config/blackbox.yml

Check that the container is running

[root@prometheus blackbox_exporter]# docker ps|grep black
8c5302d44971        prom/blackbox-exporter:master   "/bin/blackbox_expor…"   52 seconds ago      Up 51 seconds       0.0.0.0:9115->9115/tcp                                     blackbox_exporter

Test the port

[root@prometheus blackbox_exporter]# curl 127.0.0.1:9115/metrics
# HELP blackbox_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which blackbox_exporter was built.
# TYPE blackbox_exporter_build_info gauge
blackbox_exporter_build_info{branch="master",goversion="go1.16.10",revision="70bff7941301753b125a40bcf6b3ed28935a9a94",version="0.19.0"} 1
# HELP blackbox_exporter_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE blackbox_exporter_config_last_reload_success_timestamp_seconds gauge
blackbox_exporter_config_last_reload_success_timestamp_seconds 1.6562274758327048e+09
# HELP blackbox_exporter_config_last_reload_successful Blackbox exporter config loaded successfully.
...
...
...

Prometheus monitoring configuration

Configuring the jobs in Prometheus

Edit the Prometheus configuration file

[root@prometheus ~]# cd /etc/prometheus/
[root@prometheus prometheus]# ls
alertmanager  prometheus.yml  prometheus.yml_bak_2022-06-20  rules
[root@prometheus prometheus]# vim prometheus.yml

Add the following jobs under scrape_configs

  - job_name: 'blackbox_http_2xx'
    metrics_path: /probe
    params:
      module: [http_2xx]  # probe with GET requests
    static_configs:
      - targets:
        - http://prometheus.io    # Target to probe with http.
        - https://i4t.com   # Target to probe with https.
        - https://ukx.cn
        - https://k.i4t.com
        - https://nas.frps.cn
        - https://esxi.frps.cn
        - https://rancher.frps.cn
        - https://jumpserver.frps.cn
        - https://frps.cn
        - https://imgkb.com
        - https://grafana.frps.cn
        - https://down.frps.cn
        - https://my.ukx.cn
        - https://linux.ukx.cn

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115  # address and port of Blackbox exporter

  - job_name: 'blackbox_tcp_connect' # check whether specific ports are reachable
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - dsm.frps.cn:9091
        - dsm.frps.cn:1998
        - dsm.frps.cn:1999
        - apiserver.frps.cn:8443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.24.13:9115 # host and port where Blackbox exporter runs

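Restart Prometheus

Using a 127.0.0.1 address for the exporter is not recommended: if Prometheus runs in a container, 127.0.0.1 refers to that container rather than to the host running Blackbox exporter.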
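
The three relabel_configs steps in each job are what redirect the scrape to the exporter while keeping the probed address as a label. As a rough illustration, the sketch below prints the labels that result for one target. Prometheus performs these steps internally; relabel_preview is a hypothetical helper, not a Prometheus tool.

```shell
# Mimic the three relabel steps for a single target (illustration only).
# Usage: relabel_preview <target from static_configs> <exporter host:port>
relabel_preview() {
    target=$1     # value listed under static_configs.targets
    exporter=$2   # value of "replacement", e.g. 10.0.24.13:9115
    # Step 1: __address__ is copied to __param_target (sent as ?target=... on the scrape URL)
    printf '__param_target=%s\n' "$target"
    # Step 2: __param_target is copied to instance (the probed address stays visible on the series)
    printf 'instance=%s\n' "$target"
    # Step 3: __address__ is overwritten, so the scrape itself goes to the exporter
    printf '__address__=%s\n' "$exporter"
}
```

Running relabel_preview https://i4t.com 10.0.24.13:9115 shows that the scrape goes to the exporter while the instance label remains the probed URL.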


Prometheus Blackbox parameter reference

The parameters below are demo examples only.

1. ICMP test (host liveness)

You can check whether a server is alive with ping (ICMP). Configure the icmp module in blackbox.yml:

modules:
  icmp:
    prober: icmp

The Prometheus job is as follows:

  - job_name: 'blackbox-ping'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
    - targets:
      - 172.16.106.208  # IP of the monitored host
      - 172.16.106.80
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where Blackbox exporter runs

2. TCP test (monitor whether host ports are alive)
Configure the tcp module in blackbox.yml:

modules:
  tcp_connect:
    prober: tcp

Prometheus job

  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
    - targets:
      - 172.16.106.208:6443
      - 172.16.106.80:6443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115

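3. HTTP test (monitor website status)
The http prober is one of the most commonly used probes in black-box monitoring. It lets you monitor a website or HTTP service effectively, covering both availability and user-experience indicators such as response time. Besides alerting promptly when a service misbehaves, it also helps operations engineers analyze and improve site performance.

Configure the http module in blackbox.yml: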

modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST

Prometheus job

  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
    - targets:
      - https://i4t.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: IP:9115  # host and port where Blackbox exporter runs

The prober option selects the probe type. The http section customizes how the probe behaves; here it sets nothing beyond the method, so the http prober's defaults apply: it probes the target with an HTTP GET and checks whether the returned status code is 2xx. If it is, the probe succeeds; otherwise it fails.
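
The pass/fail decision described here can be sketched in a few lines of shell. This is only an illustration of the rule (any 2xx by default, or an explicit list such as valid_status_codes: [200]), not the exporter's own code; http_status_ok is a hypothetical helper name.

```shell
# Return success (exit status 0) if a status code would pass the check.
# Usage: http_status_ok <status code> [allowed codes...]
# With no allowed codes given, any 2xx status passes.
http_status_ok() {
    code=$1
    shift
    if [ "$#" -eq 0 ]; then
        # Default behaviour: accept the whole 2xx range
        [ "$code" -ge 200 ] && [ "$code" -le 299 ]
        return $?
    fi
    # Explicit list: accept only the codes that were passed in
    for allowed in "$@"; do
        [ "$code" -eq "$allowed" ] && return 0
    done
    return 1
}
```

For example, http_status_ok 204 succeeds under the default, while http_status_ok 204 200 fails because only 200 is allowed.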

The collected metrics look like this:

# DNS lookup time, in seconds
probe_dns_lookup_time_seconds 0.000199105
# Duration of the whole probe from start to finish, in seconds (the response time of the page)
probe_duration_seconds 0.010889113
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# Length of the HTTP content response (-1 means unknown)
probe_http_content_length -1
# Time spent in each phase of the request
probe_http_duration_seconds{phase="connect"} 0.001083728    # connection time
probe_http_duration_seconds{phase="processing"} 0.008365885 # time the server took to process the request
probe_http_duration_seconds{phase="resolve"} 0.000199105    # DNS resolution time
probe_http_duration_seconds{phase="tls"} 0                  # TLS handshake time
probe_http_duration_seconds{phase="transfer"} 0.000446424   # transfer time
# Number of redirects
probe_http_redirects 0
# Whether SSL was used for the final redirect
probe_http_ssl 0
# Returned status code
probe_http_status_code 200
# Length of the uncompressed response body
probe_http_uncompressed_body_length 1766
# HTTP protocol version
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
probe_ip_addr_hash 3.24030434e+09
# IP protocol version used
probe_ip_protocol 4
# Whether the probe succeeded
probe_success 1
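
When checking a probe by hand, it can help to pull a single value out of output like this. A minimal sketch, assuming the output was saved to a file or is piped in on stdin; metric_value is a hypothetical helper and it only handles metrics without labels.

```shell
# Print the value of an unlabelled metric from Prometheus text-format input on stdin.
# Usage: metric_value <metric name> < probe_output.txt
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                   # skip HELP, TYPE and other comment lines
        $1 == name { print $2; exit }   # first "name value" line that matches exactly
    '
}
```

Labelled series such as probe_http_duration_seconds{phase="connect"} are deliberately not matched, because the first field then includes the label set.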

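Grafana configuration

Grafana was covered in an earlier article; refer to it if needed.

Docker installation

Prometheus: Monitoring a MySQL Database

Recommended Grafana dashboards

  • 13230: SSL certificate monitoring

  • 13659: HTTP status monitoring

  • 9965: combined SSL, TCP, and HTTP monitoring dashboard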

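AlertManager

The alerting setup covers the following:

  • Alert when an SSL certificate has fewer than 30 days left
  • Alert when the HTTP status code is outside the 200-399 range

For installing AlertManager, see the article below; here I only provide the rules.

AlertManager: WeChat Alert Configuration

Alerting rules (loaded by Prometheus)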

[root@prometheus rules]# cat /etc/prometheus/rules/blackbox_exporter.yaml

groups:
    - name: Blackbox monitoring alerts
      rules:
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Slow probe (instance {{ $labels.instance }}), duration above 1 second"
          description: "VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Unexpected HTTP status code (instance {{ $labels.instance }})"
          description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Domain certificate expiring soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires in less than 30 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate expiring soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires in less than 7 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

      - alert: BlackboxSslCertificateExpired
        expr: probe_ssl_earliest_cert_expiry - time() <= 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate expired (instance {{ $labels.instance }})"
          description: "Domain certificate has expired\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

      - alert: BlackboxProbeSlowHttp
        expr: avg_over_time(probe_http_duration_seconds[1m]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Slow HTTP request (instance {{ $labels.instance }})"
          description: "HTTP request took more than 10 seconds\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
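
The certificate rules compare probe_ssl_earliest_cert_expiry (a Unix timestamp) with the current time. To make the arithmetic concrete, here is a small sketch of the same comparison in shell. The function names and the "expired/critical/warning/ok" wording are illustrative assumptions; the thresholds mirror the 0, 7 and 30 day expressions in the rules.

```shell
# Whole days until expiry, given the expiry and the current time as Unix timestamps.
# awk is used so that values in scientific notation (e.g. 1.7e+09) are accepted too.
cert_days_left() {
    awk -v expiry="$1" -v now="$2" 'BEGIN { printf "%d\n", (expiry - now) / 86400 }'
}

# Classify the remaining time with the same thresholds as the rules:
# <= 0 seconds, under 7 days, under 30 days, otherwise fine.
cert_severity() {
    awk -v expiry="$1" -v now="$2" 'BEGIN {
        left = expiry - now
        if (left <= 0)               print "expired"
        else if (left < 86400 * 7)   print "critical"
        else if (left < 86400 * 30)  print "warning"
        else                         print "ok"
    }'
}
```

For a quick manual check, pass the metric value and the current time, for example cert_days_left 1700000000 "$(date +%s)".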

Restart Prometheus

docker restart prometheus_new

Prometheus has now loaded the rules, and the alert has already been delivered to WeChat.
