A Highly Available Prometheus Architecture on Kubernetes
Background
A service without a monitoring and alerting platform is essentially running blind. This article walks through a currently popular, highly available monitoring and alerting stack.
Architecture
- Multiple Prometheus instances scrape metrics from the cluster nodes and push them to InfluxDB.
- Prometheus evaluates alerting rules and pushes alerts to Alertmanager.
- Alertmanager forwards alerts to Alertmanager-webhook, a standalone HTTP service that receives the alerts pushed by Alertmanager and, depending on the use case, delivers them to email, DingTalk, WeCom (Enterprise WeChat), and so on.
- Users view monitoring dashboards through Grafana.
Deployment
InfluxDB
InfluxDB v2 vs v1
- InfluxDB v2 differs significantly from v1: it improves on v1's performance, and several basic concepts have changed.
- v2 does not support the Prometheus remote storage API; Telegraf is needed as an intermediary (see: Prometheus Remote Write Support with InfluxDB 2.0). Running Telegraf adds maintenance cost, so v1 is used here as the Prometheus backend storage.
- For InfluxDB v1's Prometheus remote storage API support, see: Prometheus endpoints support in InfluxDB.
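As a quick sanity check once the InfluxDB v1 instance from the manifest below is running, you can confirm that the server is up and that the v1 Prometheus remote storage endpoints are reachable. This is a minimal sketch, assuming curl is available locally and the influxdb Service defined below is accessed via a port-forward:

```bash
# Forward the InfluxDB Service locally (assumes the Service named "influxdb" below).
kubectl -n ${NAMESPACE} port-forward svc/influxdb 8086:8086 &

# Liveness check: InfluxDB answers /ping with HTTP 204 when healthy.
curl -i http://127.0.0.1:8086/ping

# The v1 Prometheus remote storage endpoints used later in prometheus.yaml:
#   /api/v1/prom/write?db=prometheus
#   /api/v1/prom/read?db=prometheus
# An empty POST is not a valid snappy/protobuf payload, so expect a 4xx error
# here rather than a 404; a 404 would mean the endpoint is not available.
curl -i -X POST "http://127.0.0.1:8086/api/v1/prom/write?db=prometheus"
```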
InfluxDB Kubernetes YAML manifest
- Remember to adjust the following parameters in the YAML below:
- ${NAMESPACE}: Kubernetes namespace
- ${STORAGE_INFLUXDB_SIZE}: PVC storage size
- ${STORAGE_CLASS_NAME}: name of the StorageClass used for the PVC
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: influxdb
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: influxdb
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: influxdb
    spec:
      containers:
        - image: influxdb:1.8.10
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: "32Mi"
              cpu: "1m"
            limits:
              memory: "16Gi"
              cpu: 4
          env:
            - name: INFLUXDB_CONFIG_PATH
              value: "/etc/influxdb/influxdb.conf"
            - name: INFLUXDB_ADMIN_ENABLED
              value: "true"
            - name: INFLUXDB_DB
              value: "prometheus"
          name: influxdb
          ports:
            - containerPort: 8086
              protocol: TCP
          volumeMounts:
            - name: influxdb-data-volume
              mountPath: /var/lib/influxdb
            - name: influxdb-config-volume
              mountPath: /etc/influxdb
      volumes:
        - name: influxdb-data-volume
          persistentVolumeClaim:
            claimName: influxdb-data
        - name: influxdb-config-volume
          configMap:
            name: influxdb-config
      imagePullSecrets:
        - name: pull-secret
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: influxdb-config
  namespace: ${NAMESPACE}
data:
  influxdb.conf: |
    [meta]
      dir = "/var/lib/influxdb/meta"
    [data]
      dir = "/var/lib/influxdb/data"
      wal-dir = "/var/lib/influxdb/wal"
      wal-fsync-delay = "100ms"
      index-version = "tsi1"
      engine = "tsm1"
      cache-max-memory-size = "8g"
      cache-snapshot-memory-size = "200m"
      max-concurrent-compactions = 12
      compact-throughput = "64m"
      compact-throughput-burst = "64m"
      max-series-per-database = 0
      max-values-per-tag = 0
      series-id-set-cache-size = 100
    [monitor]
      store-enabled = false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-data
  namespace: ${NAMESPACE}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${STORAGE_INFLUXDB_SIZE}
  storageClassName: ${STORAGE_CLASS_NAME}
---
apiVersion: v1
kind: Service
metadata:
  name: influxdb
  namespace: ${NAMESPACE}
spec:
  ports:
    - port: 8086
      protocol: TCP
      targetPort: 8086
  selector:
    app: influxdb
  type: ClusterIP
```
Create the database
- After InfluxDB is up, exec into the pod:
```bash
kubectl -n ${NAMESPACE} exec -it influxdb-5697fd8897-hh5sx -- bash
```
- Log in to the InfluxDB CLI and run the following:
```bash
# Enter the CLI
influx
# Create the database
CREATE DATABASE "prometheus"
# Change the default retention policy to keep data for 30 days; adjust as needed
ALTER RETENTION POLICY autogen ON prometheus DURATION 30d;
```
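To confirm the database and the 30-day retention policy took effect, a couple of InfluxQL statements can be run from the same pod (the database name prometheus matches the one created above):

```bash
# List databases; "prometheus" should be present.
influx -execute 'SHOW DATABASES'

# The autogen policy on "prometheus" should now report a 720h0m0s (30d) duration.
influx -execute 'SHOW RETENTION POLICIES ON "prometheus"'
```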
Prometheus
Implementing Prometheus high availability
- The Prometheus community does not offer an official high availability architecture; a Prometheus server is deployed as a single instance.
- Prometheus does provide a federation architecture, which enables near-unlimited scaling and centralized management and improves overall capacity, but the federating server is still a single point of failure, so it does not meet our requirements.
- Prometheus v2.32.0 introduced Agent mode, which can write data directly to remote storage such as Thanos or InfluxDB, but Agent mode also disables querying and alerting, so it does not meet our requirements either.
The common approach in practice is to run several Prometheus instances and put load balancing on top of them to achieve high availability; this article takes the same approach.
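The StatefulSet below exposes Prometheus through a headless Service (needed for stable pod DNS names), so requests against it are not load balanced. A minimal sketch of the "load balancer on top" idea is a second, plain ClusterIP Service selecting the same pods; the name prometheus-lb is only an example and is not part of the manifests in this article. Grafana's ${PROMETHEUS_URL} could then point at it.

```bash
# Hypothetical load-balancing Service in front of the Prometheus replicas.
# It selects the same app=prometheus pods as the headless Service defined later.
kubectl -n ${NAMESPACE} apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: prometheus-lb
spec:
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: 9090
  type: ClusterIP
EOF
```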
How do we tell the data from different Prometheus replicas apart?
Once several Prometheus replicas are running on Kubernetes, each replica scrapes all targets in parallel and pushes the data to the same InfluxDB. This raises a question: how can we tell which Prometheus replica a given piece of data in InfluxDB came from?
Prometheus offers external_labels: global external labels that are attached whenever Prometheus talks to the outside world (federation, remote storage, Alertmanager). For example, when multiple Prometheus servers aggregate data into the same federation Prometheus or remote storage, an id/cluster/replica label can be added to distinguish them:
```yaml
global:
  external_labels:
    replica: A  # B for the second replica.
```
However, every alert sent to Alertmanager will now also carry its own replica label. Because Alertmanager deduplicates alerts based on identical label sets, that deduplication stops working, and you will receive as many notifications as there are Prometheus replicas. To solve this, Prometheus provides the alerting alert_relabel_configs setting for dropping labels, for example:
```yaml
alerting:
  alert_relabel_configs:
    # Drop the "replica" label.
    - action: labeldrop
      regex: replica
```
Prometheus Kubernetes YAML manifest
To implement the above in Kubernetes, enable external label expansion with the flag --enable-feature=expand-external-labels, then rely on the stable, ordered pod names of a StatefulSet: each pod's name is used as the Prometheus external_labels replica value, which keeps the replicas' data apart.
- Remember to adjust the following parameters in the YAML below:
- ${NAMESPACE}: Kubernetes namespace
- ${STORAGE_CLASS_NAME}: name of the StorageClass used for the PVC
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: ${NAMESPACE}
spec:
  replicas: 2
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - args:
            - --config.file=/etc/prometheus/config/prometheus.yaml
            - --storage.tsdb.path=/data/prometheus
            - --storage.tsdb.retention.time=3d
            - --web.enable-lifecycle
            - --enable-feature=expand-external-labels
          image: prom/prometheus:v2.54.1
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: "32Mi"
              cpu: "1m"
            limits:
              memory: "16Gi"
              cpu: 4
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /-/healthy
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          name: prometheus
          ports:
            - containerPort: 9090
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /-/ready
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/config
            - name: prometheus-sd-configs-volume
              mountPath: /etc/prometheus/sd_configs
            - name: prometheus-rules-volume
              mountPath: /etc/prometheus/rules
            - name: prometheus-data-volume
              mountPath: /data/prometheus
      volumes:
        - name: prometheus-config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-sd-configs-volume
          configMap:
            name: prometheus-sd-configs
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
      imagePullSecrets:
        - name: pull-secret
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data-volume
      spec:
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        storageClassName: ${STORAGE_CLASS_NAME}
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: ${NAMESPACE}
data:
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
      evaluation_interval: 30s
      external_labels:
        replica: ${POD_NAME}
    scrape_configs:
      - job_name: 'etcd'
        metrics_path: /metrics
        file_sd_configs:
          - files: ['/etc/prometheus/sd_configs/*.yml']
    alerting:
      alert_relabel_configs:
        - source_labels: [dc]
          regex: (.+)\d+
          target_label: dc
        - action: labeldrop
          regex: replica
      alertmanagers:
        - static_configs:
            - targets:
                - 'alertmanager-0.alertmanager.${NAMESPACE}.svc.cluster.local:9093'
                - 'alertmanager-1.alertmanager.${NAMESPACE}.svc.cluster.local:9093'
                - 'alertmanager-2.alertmanager.${NAMESPACE}.svc.cluster.local:9093'
    rule_files:
      - /etc/prometheus/rules/*.yml
    remote_write:
      - url: "http://influxdb:8086/api/v1/prom/write?db=prometheus"
    remote_read:
      - url: "http://influxdb:8086/api/v1/prom/read?db=prometheus"
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: ${NAMESPACE}
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: prometheus
  type: ClusterIP
  clusterIP: None
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ${NAMESPACE}
data:
  etcd.rules.yml: |
    groups:
      - name: etcd
        rules:
          - alert: EtcdMembersDown
            expr: up{instance =~ ".*:2379"} == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Etcd member is down"
              description: "etcd cluster member [{{ $labels.instance }}] is down.\n Labels: {{ $labels }}"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-sd-configs
  namespace: ${NAMESPACE}
data:
  etcd1.yml: |
    - targets:
        - '10.0.0.1:2379'
        - '10.0.0.2:2379'
        - '10.0.0.3:2379'
      labels:
        cluster_id: "etcd1"
```
Verify that external_labels takes effect
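One way to check, sketched below: external labels are only attached when data leaves Prometheus, so look at what actually arrived in InfluxDB rather than at a local query. Each metric becomes a measurement in the prometheus database, and the replica tag should list every pod name (prometheus-0, prometheus-1).

```bash
# Inside the InfluxDB pod: list the replica tag values stored for the "up" metric.
# Expect one value per Prometheus pod, e.g. prometheus-0 and prometheus-1.
influx -execute 'SHOW TAG VALUES ON "prometheus" FROM "up" WITH KEY = "replica"'

# The rendered config can also be read back from each replica's API;
# expand-external-labels substitutes ${POD_NAME} at config load time,
# so the output should contain the pod name.
kubectl -n ${NAMESPACE} exec prometheus-0 -- \
  wget -qO- http://localhost:9090/api/v1/status/config | grep -o 'replica: [a-z0-9-]*'
```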
Alertmanager
Alertmanager is deployed as a three-node cluster for high availability. Alertmanager groups and aggregates alerts and forwards them to the webhook, which is connected to a WeCom (Enterprise WeChat) robot so that alert messages can be pushed into a WeCom group.
- Remember to adjust the following parameters in the YAML below:
- ${NAMESPACE}: Kubernetes namespace
- ${STORAGE_CLASS_NAME}: name of the StorageClass used for the PVC
- ${ALERTMANAGER_WEBHOOK_CONFIGS_URL}: address of the Alertmanager-webhook service
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: alertmanager
  name: alertmanager
  namespace: ${NAMESPACE}
spec:
  replicas: 3
  serviceName: alertmanager
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          args:
            - "--log.level=info"
            - "--config.file=/etc/alertmanager/alertmanager.yml"
            - "--storage.path=/alertmanager"
            - "--cluster.settle-timeout=1m"
            - "--cluster.peer=alertmanager-0.alertmanager.${NAMESPACE}.svc.cluster.local:9094"
            - "--cluster.peer=alertmanager-1.alertmanager.${NAMESPACE}.svc.cluster.local:9094"
            - "--cluster.peer=alertmanager-2.alertmanager.${NAMESPACE}.svc.cluster.local:9094"
            - "--cluster.advertise-address=[$(POD_IP)]:9094"
            - "--cluster.listen-address=0.0.0.0:9094"
          image: prom/alertmanager:v0.27.0
          resources:
            requests:
              memory: "32Mi"
              cpu: "1m"
            limits:
              memory: "16Gi"
              cpu: 4
          ports:
            - containerPort: 9093
              name: alertmanager
          volumeMounts:
            - name: alertmanager-config-volume
              mountPath: /etc/alertmanager
            - name: alertmanager-data-volume
              mountPath: /alertmanager
      volumes:
        - name: alertmanager-config-volume
          configMap:
            name: alertmanager-config
      imagePullSecrets:
        - name: pull-secret
  volumeClaimTemplates:
    - metadata:
        name: alertmanager-data-volume
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: ${STORAGE_CLASS_NAME}
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: ${NAMESPACE}
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: default
      group_by: ["alertname", "cluster_id"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      routes:
        - receiver: wechatrobot
          match:
            severity: critical
    receivers:
      - name: "default"
        webhook_configs:
          - send_resolved: true
            url: "${ALERTMANAGER_WEBHOOK_CONFIGS_URL}"
      - name: "wechatrobot"
        webhook_configs:
          - send_resolved: true
            url: "${ALERTMANAGER_WEBHOOK_CONFIGS_URL}"
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: ${NAMESPACE}
spec:
  ports:
    - name: http
      port: 9093
      targetPort: 9093
    - name: cluster
      port: 9094
      targetPort: 9094
  selector:
    app: alertmanager
  type: ClusterIP
  # headless service
  clusterIP: None
```
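After the three pods are Running, a quick way to confirm they actually formed one gossip cluster, sketched under the assumption that you check from your workstation via a port-forward and have python3 (or jq) available for pretty-printing:

```bash
# Forward one replica locally and inspect its cluster view.
kubectl -n ${NAMESPACE} port-forward alertmanager-0 9093:9093 &

# The v2 status API reports the cluster state; "status" should be "ready"
# and "peers" should list three members.
curl -s http://127.0.0.1:9093/api/v2/status | python3 -m json.tool
```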
Alertmanager-webhook
This guide integrates with WeCom (Enterprise WeChat); see the open-source implementation: https://github.com/k8stech/alertmanager-wechatrobot-webhook. Alerts can also be sent to DingTalk instead; see: https://github.com/timonwong/prometheus-webhook-dingtalk.
- Remember to adjust the following parameters in the YAML below:
- ${NAMESPACE}: Kubernetes namespace
- ${DOCKER_IMAGE}: Docker image; build it from the repository above
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: alertmanager-webhook
  name: alertmanager-webhook
  namespace: ${NAMESPACE}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: alertmanager-webhook
  template:
    metadata:
      labels:
        app: alertmanager-webhook
    spec:
      containers:
        - name: alertmanager-webhook
          image: ${DOCKER_IMAGE}
          resources:
            requests:
              memory: "32Mi"
              cpu: "1m"
            limits:
              memory: "4Gi"
              cpu: 2
          ports:
            - containerPort: 8999
              name: http
      imagePullSecrets:
        - name: pull-secret
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-webhook
  namespace: ${NAMESPACE}
spec:
  ports:
    - name: http
      port: 8999
      targetPort: 8999
  selector:
    app: alertmanager-webhook
  type: ClusterIP
```
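Before wiring Alertmanager to it, you can post a hand-crafted payload in the standard Alertmanager webhook format to the service and check whether a message appears in the WeCom group. This is only a sketch: the request path and any query parameters (such as a robot key) depend on the webhook implementation you built from the repository above, so replace the placeholder path with whatever you configured in ${ALERTMANAGER_WEBHOOK_CONFIGS_URL}.

```bash
# Forward the webhook Service and post a minimal Alertmanager-style payload.
kubectl -n ${NAMESPACE} port-forward svc/alertmanager-webhook 8999:8999 &

# <your-webhook-path> is a placeholder; use the path your webhook actually serves.
curl -s -X POST 'http://127.0.0.1:8999/<your-webhook-path>' \
  -H 'Content-Type: application/json' \
  -d '{
    "version": "4",
    "status": "firing",
    "receiver": "wechatrobot",
    "groupLabels": {"alertname": "EtcdMembersDown"},
    "commonLabels": {"severity": "critical", "cluster_id": "etcd1"},
    "alerts": [
      {
        "status": "firing",
        "labels": {"alertname": "EtcdMembersDown", "instance": "10.0.0.1:2379"},
        "annotations": {"summary": "Etcd member is down"},
        "startsAt": "2024-01-01T00:00:00Z"
      }
    ]
  }'
```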
Grafana
Grafana supports both single-instance and clustered deployments. A single instance can use the built-in SQLite 3 database to store its metadata, whereas a cluster requires MySQL or PostgreSQL; see the official documentation: https://grafana.com/docs/grafana/latest/setup-grafana/installation/#supported-databases.
A single-instance deployment is simpler, so that is what this article uses.
- Remember to adjust the following parameters in the YAML below:
- ${STORAGE_CLASS_NAME}: name of the StorageClass used for the PVC
- ${PROMETHEUS_URL}: URL of the Prometheus data source
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  replicas: 1
  serviceName: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:11.2.0
          resources:
            requests:
              memory: "128Mi"
              cpu: "10m"
            limits:
              memory: "8Gi"
              cpu: "4"
          args:
            - -config=/etc/grafana/grafana.ini
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-data-volume
              mountPath: /var/lib/grafana
            - name: grafana-config-volume
              mountPath: /etc/grafana
            - name: grafana-datasources-volume
              mountPath: /etc/grafana/provisioning/datasources
            - name: grafana-dashboards-volume
              mountPath: /etc/grafana/provisioning/dashboards
      volumes:
        - name: grafana-config-volume
          configMap:
            name: grafana-config
        - name: grafana-datasources-volume
          configMap:
            name: grafana-datasources
        - name: grafana-dashboards-volume
          configMap:
            name: grafana-dashboards
      imagePullSecrets:
        - name: pull-secret
  volumeClaimTemplates:
    - metadata:
        name: grafana-data-volume
      spec:
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        storageClassName: ${STORAGE_CLASS_NAME}
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: grafana-config
  name: grafana-config
data:
  grafana.ini: |
    [database]
    type = sqlite3
    [paths]
    provisioning = /etc/grafana/provisioning
    [auth.anonymous]
    enabled = true
    org_role = Viewer
    [analytics]
    reporting_enabled = false
    check_for_updates = false
    [dashboards]
    versions_to_keep = 20
    [alerting]
    enabled = false
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: grafana-datasources
  name: grafana-datasources
data:
  datasources.yml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: ${PROMETHEUS_URL}
        isDefault: true
        version: 1
        editable: false
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  labels:
    app: grafana
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
      protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: grafana-dashboards
  name: grafana-dashboards
data:
  dashboard.yaml: |
    apiVersion: 1
    providers:
      - name: 'default'
        type: file
        disableDeletion: false
        updateIntervalSeconds: 3  # how often Grafana will scan for changed dashboards
        options:
          path: /etc/grafana/provisioning/dashboards
  etcd_dashboard.json: |
    {
    ...
    }
```
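After Grafana starts, a quick check from your workstation, sketched below (adjust the namespace to wherever you applied the manifest):

```bash
# Forward the Grafana Service and hit the health endpoint;
# "database": "ok" confirms the embedded SQLite metadata store is working.
kubectl -n ${NAMESPACE} port-forward svc/grafana 3000:3000 &
curl -s http://127.0.0.1:3000/api/health
```

You can then open http://127.0.0.1:3000 in a browser; anonymous Viewer access is enabled by the grafana.ini above, and the provisioned Prometheus data source should already be listed.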