Compare commits


236 Commits

Author SHA1 Message Date
Paweł Krupa
864ca1e773 Merge pull request #1448 from andrein/cherry-pick-1445
Cherry-pick grafana LDAP into release-0.9
2021-10-20 13:50:04 +02:00
Andrei Nistor
822f885d67 add grafana ldap example
(cherry picked from commit 882484daf1)
2021-10-19 17:08:11 +03:00
machinly
184a6a452b add grafana ldap support
(cherry picked from commit ce7007c568)
2021-10-19 17:08:11 +03:00
Paweł Krupa
b6ab321ac8 Merge pull request #1443 from prometheus-operator/automated-updates-release-0.9 2021-10-18 10:42:13 +02:00
dgrisonnet
6e67e7fdbb [bot] [release-0.9] Automated version update 2021-10-18 07:39:34 +00:00
Damien Grisonnet
ad19693121 Merge pull request #1432 from prometheus-operator/automated-updates-release-0.9
[bot] [release-0.9] Automated version update
2021-10-12 09:20:41 +02:00
dgrisonnet
8ccd82e40a [bot] [release-0.9] Automated version update 2021-10-11 07:39:30 +00:00
Damien Grisonnet
c1fc78c979 Merge pull request #1405 from PhilipGough/bp-9
Adjust dropped metrics from cAdvisor
2021-09-28 12:00:16 +02:00
Philip Gough
4e96f7bed6 Adjust dropped metrics from cAdvisor
This change drops pod-centric metrics whose 'container' label is empty.

Previously we dropped pod-centric metrics without a (pod, namespace) label set;
however, these can be critical for debugging.

Keep 'container_fs_.*' metrics from cAdvisor
2021-09-28 10:17:59 +01:00
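
For illustration, the kind of metric-relabeling rule this backport describes could look like the jsonnet sketch below; the field names and the exact metric list are assumptions for illustration, not the rule shipped in the release.

```jsonnet
// Illustrative sketch only: drop pod-centric cAdvisor series whose
// 'container' label is empty, while container_fs_.* metrics stay because
// they are not listed. Field names are assumed, not quoted from the PR.
{
  dropEmptyContainerRelabelings:: [
    {
      sourceLabels: ['__name__', 'container'],
      // metric name followed by an empty 'container' label (default ';' separator)
      regex: '(container_blkio_device_usage_total|container_file_descriptors|container_sockets|container_threads.*|container_start_time_seconds|container_last_seen|container_spec_.*);',
      action: 'drop',
    },
  ],
}
```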
Damien Grisonnet
49eb7c66f6 Merge pull request #1400 from prometheus-operator/automated-updates-release-0.9
[bot] [release-0.9] Automated version update
2021-09-27 11:11:29 +02:00
dgrisonnet
b4b365cead [bot] [release-0.9] Automated version update 2021-09-27 07:39:22 +00:00
Damien Grisonnet
fdcff9a224 Merge pull request #1366 from dgrisonnet/pin-kubernetes-grafana
Pin kubernetes-grafana on release-0.9
2021-09-07 09:17:15 +02:00
Damien Grisonnet
2640b11d77 jsonnet: pin kubernetes-grafana on release-0.9
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-09-06 20:07:07 +02:00
Simon Pasquier
9ead6ebc53 Merge pull request #1349 from prometheus-operator/automated-updates-release-0.9
[bot] [release-0.9] Automated version update
2021-08-25 12:03:25 +02:00
simonpasquier
62a5b28b55 [bot] [release-0.9] Automated version update 2021-08-25 09:37:18 +00:00
Damien Grisonnet
0ca8df7a35 Merge pull request #1338 from dgrisonnet/cut-release-0.9
Cut release 0.9
2021-08-20 13:44:40 +02:00
Damien Grisonnet
4cfbfae071 Add release-0.9 CHANGELOG
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-19 16:43:34 +02:00
Damien Grisonnet
8587958cf0 Update compatibility matrix with release-0.9
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-19 16:43:34 +02:00
Damien Grisonnet
eca67844af jsonnet: pin and update jsonnet dependencies
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-19 16:41:53 +02:00
Damien Grisonnet
0df510d1fa Merge pull request #1337 from dgrisonnet/kubernetes-1.22
Test against Kubernetes 1.22
2021-08-18 19:03:21 +02:00
Damien Grisonnet
da35954628 .github: drop support for 1.20 on main
According to our policy, main branch of kube-prometheus should support
the 2 latest versions of Kubernetes. These changes update the tests and
the compatibility matrix to reflect that.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-18 17:53:40 +02:00
Damien Grisonnet
b5ec93208b jsonnet: drop deprecated etcd metric
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-18 17:27:50 +02:00
Damien Grisonnet
518c37d72d .github: test against Kubernetes 1.22
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-18 14:04:33 +02:00
Paweł Krupa
35397089d1 Merge pull request #1334 from dgrisonnet/prometheus-adapter-v0.9.0
Update prometheus-adapter to v0.9.0
2021-08-17 18:31:40 +02:00
Damien Grisonnet
45adc03cfb jsonnet: update prometheus-adapter to v0.9.0
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-17 18:05:45 +02:00
Damien Grisonnet
c1fa4971e6 Merge pull request #1325 from paulfantom/fix-1324
jsonnet: set thanos config to null by default
2021-08-17 11:20:47 +02:00
Damien Grisonnet
c69f3b4e62 Merge pull request #1330 from prometheus-operator/automated-updates-main
[bot] [main] Automated version update
2021-08-17 10:18:47 +02:00
dgrisonnet
6ade9e5c7d [bot] [main] Automated version update 2021-08-17 08:05:49 +00:00
Paweł Krupa
50c9dd2c6f Merge pull request #1326 from dgrisonnet/fix-versions-ci
Fix automated update in CI
2021-08-17 09:08:08 +02:00
Damien Grisonnet
24b0e699e4 .github: fix automated update in CI
Automated dependencies update in CI was failing whenever no new changes
were detected since git diff was returning 1.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-16 18:51:30 +02:00
paulfantom
c4113807fb jsonnet: set thanos config to null by default
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 15:16:52 +02:00
Paweł Krupa
89b57081f7 Merge pull request #1313 from dgrisonnet/enable-auto-updates
.github: enable auto updates on release branches
2021-08-16 10:16:56 +02:00
Paweł Krupa
2e8e88b882 Merge pull request #1320 from prometheus-operator/automated-updates-main
[bot] [main] Automated version update
2021-08-16 10:12:34 +02:00
paulfantom
ad3fc8920e [bot] [main] Automated version update 2021-08-16 08:04:51 +00:00
Paweł Krupa
8d36d0d707 Merge pull request #1317 from DimitrijeManic/wip/update-doc 2021-08-12 14:14:49 +02:00
Dimitrije Manic
ac75ee6221 Updates prometheus-rules documentation 2021-08-12 08:03:16 -04:00
Paweł Krupa
5452de1b43 Merge pull request #1315 from DimitrijeManic/wip/update-rule-selector 2021-08-11 16:27:38 +02:00
Dimitrije Manic
12cd7fd9ce Prometheus ruleSelector defaults to all rules 2021-08-11 10:16:24 -04:00
Damien Grisonnet
0ffe13c5d2 .github: enable auto updates on release branches
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-09 18:59:30 +02:00
Damien Grisonnet
6a150f4cc8 Merge pull request #1310 from paulfantom/full-path
jsonnet: use full dependency path
2021-08-09 17:53:22 +02:00
paulfantom
f6d6b30aed jsonnet: use full dependency path 2021-08-06 14:15:23 +02:00
Damien Grisonnet
33cc694f18 Merge pull request #1308 from PaytmLabs/feature/separate-thanos-rules
Create Thanos Sidecar rules separately from Prometheus ones
2021-08-05 16:19:01 +02:00
Maxime Brunet
961f138dd0 Add back _config.runbookURLPattern for Thanos Sidecar rules 2021-08-04 14:22:06 -07:00
Paweł Krupa
54d8f88162 Merge pull request #1307 from PaytmLabs/feature/addons/aws-vpc-cni
Turn AWS VPC CNI into a control plane add-on
2021-08-04 09:56:50 +02:00
Paweł Krupa
e931a417fc Merge pull request #1230 from Luis-TT/fix-kube-proxy-dashboard 2021-08-04 09:55:09 +02:00
Luis Vidal Ernst
0b49c3102d Added PodMonitor for kube-proxy 2021-08-03 08:31:49 +02:00
Maxime Brunet
0e7dc97bc5 Create Thanos Sidecar rules separately from Prometheus ones 2021-08-02 12:46:06 -07:00
Maxime Brunet
d3ccfb8220 Turn AWS VPC CNI into a control plane add-on 2021-08-02 11:26:33 -07:00
Damien Grisonnet
a330e8634a Merge pull request #1306 from paulfantom/fix-auto
.github: allow dispatching version updates manually and run on predefined schedule
2021-08-02 18:13:44 +02:00
paulfantom
1040e2bd70 .github: allow dispatching version updates manually and run on predefined schedule 2021-08-02 17:53:45 +02:00
Paweł Krupa
c3be50f61f Merge pull request #1303 from dgrisonnet/release-branch-update
Add automated dependency update to the remaining supported release branch
2021-08-02 17:50:28 +02:00
Paweł Krupa
075875e8aa Merge pull request #1298 from prometheus-operator/automated-updates-main
[bot] [main] Automated version update
2021-08-02 17:48:41 +02:00
Damien Grisonnet
9e8d1b0a72 .github: add remaining supported release branch
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-02 15:57:57 +02:00
dgrisonnet
e97eb0fbe9 [bot] [main] Automated version update 2021-08-02 13:37:08 +00:00
Paweł Krupa
1eeb463203 Merge pull request #1301 from dgrisonnet/fix-job-skip 2021-08-02 15:20:12 +02:00
Damien Grisonnet
844bdd9c47 .github: fix update version skip on release branch
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-02 15:11:41 +02:00
Paweł Krupa
0184f583d8 Merge pull request #1293 from dgrisonnet/release-branch-update 2021-08-02 13:51:59 +02:00
Damien Grisonnet
20f3cfaaeb .github: temporarily switch to manual updates
Temporarily switch to a manual dependency update workflow to test whether it
is updated correctly across the different release branches.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-02 13:38:33 +02:00
Damien Grisonnet
7542a1b055 .github: automate release branch updates
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-02 13:32:32 +02:00
Paweł Krupa
d15f839802 Merge pull request #1292 from PaytmLabs/hotfix/eks/warm-ip-alert
eks: Revert back to `awscni_total_ip_addresses`-based alert
2021-08-02 13:22:13 +02:00
Maxime Brunet
b7fe018d29 eks: Revert back to awscni_total_ip_addresses-based alert 2021-07-31 11:37:12 -07:00
Paweł Krupa
b9c73c7b29 Merge pull request #1283 from prashbnair/node-veth
changing node exporter ignore list
2021-07-28 09:17:03 +02:00
Prashant Balachandran
09fdac739d changing node exporter ignore list 2021-07-27 17:17:19 +05:30
Paweł Krupa
785789b776 Merge pull request #1257 from Luis-TT/kube-state-metrics-kubac-proxy-resources 2021-07-27 12:36:26 +02:00
Paweł Krupa
bbdb21f08d Merge pull request #1282 from lanmarti/main
Add resource requests and limits to prometheus-adapter container
2021-07-27 12:36:01 +02:00
lanmarti
ed48391831 Add resource requests and limits to prometheus-adapter container 2021-07-27 12:19:51 +02:00
Damien Grisonnet
a1a9707f37 Merge pull request #1281 from prometheus-operator/paulfantom-patch-1
Use @prom-op-bot for automatic updates
2021-07-27 11:04:14 +02:00
Paweł Krupa
7b7c346aa0 Use @prom-op-bot for automatic updates 2021-07-27 08:33:08 +02:00
Damien Grisonnet
5f13edd1ea Merge pull request #1279 from prometheus-operator/automated-updates
[bot] Automated version update
2021-07-26 15:59:18 +02:00
paulfantom
05c72f83ef [bot] Automated version update 2021-07-26 13:44:14 +00:00
Paweł Krupa
93d6101bae Merge pull request #1277 from PaytmLabs/hotfix/eks/cni-relabel
eks: Fix CNI metrics relabelings
2021-07-24 11:33:29 +02:00
Maxime Brunet
3a98a3478c eks: Fix CNI metrics relabelings
Signed-off-by: Maxime Brunet <maxime.brunet@paytm.com>
2021-07-23 13:39:29 -07:00
Paweł Krupa
4965e45c15 Merge pull request #1276 from mrueg/fix-typo
node.libsonnet: Fix small typo
2021-07-23 07:44:20 +02:00
Manuel Rüger
acd1eeba4c node.libsonnet: Fix small typo
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2021-07-22 19:14:24 +02:00
Damien Grisonnet
45a466e3a7 Merge pull request #1267 from paulfantom/runbook_urlk
jsonnet/kube-prometheus: point to runbooks.prometheus-operator.dev
2021-07-22 17:40:04 +02:00
Damien Grisonnet
6d9e0fb6b2 Merge pull request #1273 from paulfantom/pr-template
.github: add PR template
2021-07-22 17:35:52 +02:00
paulfantom
755d2fe5c1 manifests: regenerate 2021-07-22 17:31:30 +02:00
paulfantom
cfe830f8f0 jsonnet/kube-prometheus: point to runbooks.prometheus-operator.dev
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-22 17:30:57 +02:00
paulfantom
94731577a8 .github: add PR template
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-22 17:24:57 +02:00
Luis Vidal Ernst
9c638162ae Allow customizing of kubeRbacProxy in kube-state-metrics 2021-07-21 13:57:05 +02:00
Paweł Krupa
acea5efd85 Merge pull request #1268 from paulfantom/alerts-best-practices
Alerts best practices
2021-07-21 09:32:32 +02:00
Paweł Krupa
cd4438ed02 Merge pull request #1250 from PhilipGough/MON-1741
jsonnet: Drop cAdvisor metrics without (pod, namespace) label pairs.
2021-07-20 14:26:43 +02:00
Philip Gough
463ad065d3 jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage
The following provides a description and cardinality estimation based on the tests in a local cluster:

container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.*                    - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors         - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max              - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads                  - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets                  - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds       - container start. Possibly not needed for system services (nodes*services)
container_last_seen                - Not needed as system services are always running (nodes*services)
container_spec_.*                  - Everything related to cgroup specification and thus static data (nodes*services*5)
2021-07-20 12:50:02 +01:00
paulfantom
46eb1713a5 jsonnet: remove unused alert unit tests as those are moved to alertmanager repository 2021-07-20 11:14:38 +02:00
paulfantom
02454b3f53 manifests: regenerate 2021-07-20 11:14:28 +02:00
paulfantom
8c357c6bde jsonnet: align alert annotations with best practices
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-20 10:59:49 +02:00
Paweł Krupa
414f8053d3 Merge pull request #1264 from prometheus-operator/automated-updates
[bot] Automated version update
2021-07-19 19:01:52 +02:00
paulfantom
1a3c610c61 [bot] Automated version update 2021-07-19 13:44:23 +00:00
Paweł Krupa
274eba0108 Merge pull request #1253 from ndegory/update-doc-for-0.8
update doc on Prometheus rule updates since release 0.8
2021-07-19 10:09:56 +02:00
Paweł Krupa
99ee030de3 Merge pull request #1259 from PaytmLabs/feature/eks/cni-relabel-instance
eks: Relabel instance with node name for CNI DaemonSet
2021-07-19 10:09:09 +02:00
Paweł Krupa
80bb15bedd Merge pull request #1255 from yeya24/fix-dashboards-definition-length-check 2021-07-19 09:56:09 +02:00
Maxime Brunet
7394929c76 eks: Relabel instance with node name for CNI DaemonSet 2021-07-17 11:28:38 -07:00
Nicolas Degory
9bc6bf3db8 update doc on Prometheus rule updates since release 0.8
Signed-off-by: Nicolas Degory <ndegory@axway.com>
2021-07-14 19:18:07 -07:00
Arthur Silva Sens
ae12388b33 Merge pull request #1256 from surik/update-kubernetes-mixin
Update kubernetes-mixin
2021-07-14 19:56:35 -03:00
Yury Gargay
9b08b941f8 Update kubernetes-mixin
From b710a868a9
2021-07-14 18:51:36 +02:00
ben.ye
43adca8df7 fmt again
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:56:38 -07:00
ben.ye
90b2751f06 fmt code
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:48:01 -07:00
ben.ye
dee7762ae3 create dashboardDefinitions if rawDashboards or folderDashboards are specified
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:39:01 -07:00
Paweł Krupa
3a44309177 Merge pull request #1208 from paulfantom/cleanup 2021-07-08 12:18:36 +02:00
paulfantom
64cfda3012 legal cleanup
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-08 11:59:41 +02:00
Damien Grisonnet
97e77e9996 Merge pull request #1231 from dgrisonnet/fix-adapter-queries
Consolidate intervals used in prometheus-adapter CPU queries
2021-07-07 13:48:02 +02:00
Damien Grisonnet
0b3db5b6b6 Merge pull request #1245 from paulfantom/make-update
*: add "update" target to makefile and use it in automatic updater
2021-07-07 13:45:56 +02:00
Paweł Krupa
60b4b3023d Merge pull request #1244 from flurreN/prom-rules-hpa 2021-07-07 10:30:18 +02:00
paulfantom
ed2ffe9d05 *: add "update" target to makefile and use it in automatic updater
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-07 10:19:07 +02:00
Philip
3e6865d776 Generate kubernetes-mixin 2021-07-06 17:49:32 +02:00
Paweł Krupa
acd7cdcde0 Merge pull request #1243 from Kolossi/main
apply make fmt fixes to migration readme extracts
2021-07-06 14:01:31 +02:00
Paul Sweeney
552c9ecaea apply make fmt fixes to migration readme extracts 2021-07-06 12:18:07 +01:00
Paweł Krupa
a91ca001a9 Merge pull request #1235 from Kolossi/main
add example release-0.3 to release-0.8 migration to docs
2021-07-06 12:58:22 +02:00
Paul Sweeney
f95eaf8598 make fmt corrections to migration examples 2021-07-06 11:19:33 +01:00
Damien Grisonnet
b9563b9c2d jsonnet: improve adapter queries readability
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-07-05 15:29:45 +02:00
Damien Grisonnet
8812e45501 jsonnet: readjust prometheus-adapter intervals
Previously, the prometheus-adapter configuration did not take into account
the scrape interval of kubelet, node-exporter and windows-exporter,
leading to stale results, and even negative results from the CPU queries
when the irate() function was extrapolating data.
To fix that, we set the interval used in the irate() function in the CPU
queries to 4x the scrape interval so that data is extrapolated between
the last two scrapes. This improves the freshness of the exposed CPU
usage and prevents incorrect extrapolations.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-07-05 15:28:25 +02:00
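
As a rough sketch of the reasoning above (not the actual prometheus-adapter component code), the irate() window can be derived from the scrape interval like this:

```jsonnet
// Sketch only: derive the irate() window from the kubelet scrape interval.
// Variable names and the query shape are assumptions for illustration.
local scrapeIntervalSeconds = 30;
local window = '%ds' % (4 * scrapeIntervalSeconds);  // 4x scrape interval -> '120s'

{
  containerCpuQuery: 'sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[%s])) by (<<.GroupBy>>)' % window,
}
```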
Paweł Krupa
3ab3947270 Merge pull request #1224 from sthaha/ignore-nw-eth0
jsonnet: kube-prometheus adapt to changes to veth interfaces names
2021-07-05 14:39:13 +02:00
Paul Sweeney
e77664f325 Update docs/migration-example/my.release-0.8.jsonnet - typo
Co-authored-by: Paweł Krupa <pawel@krupa.net.pl>
2021-07-05 11:43:51 +01:00
Paweł Krupa
496bab92a6 Merge pull request #1233 from sthaha/fix-make-manifests
Fix make manifests not building every time
2021-07-05 12:13:47 +02:00
Paweł Krupa
baf0774e09 Merge pull request #1237 from PhilipGough/ci-test
ci: Use wait command to ensure cluster readiness
2021-07-05 11:02:52 +02:00
Philip Gough
e38bc756a4 ci: Harden action to wait for kind cluster readiness 2021-07-05 09:56:28 +01:00
Paul Sweeney
fadb829b28 add example release-0.3 to release-0.8 migration to docs 2021-07-01 19:40:40 +01:00
Sunil Thaha
86d8ed0004 Fix make manifests not building every time
The `manifests` Make target has a dependency on build.sh which, if untouched,
prevents the manifests from being regenerated after the first run. This patch
fixes it by removing the `build.sh` dependency.

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
2021-07-01 12:10:48 +10:00
Sunil Thaha
0280f4ddf9 jsonnet: kube-prometheus adapt to changes to veth interfaces names
With OVN, the container veth network interface names that used to start
with `veth` have now changed to `<rand-hex>{15}@if<number>` (see Related
Links below).

This patch adapts to the change introduced in OVN and ignores network
interfaces that match `[a-z0-9]{15}@if\d+` in addition to those starting
with `veth`.

Related Links:
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/vendor/github.com/containernetworking/plugins/pkg/ip/link_linux.go#L107
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/helper_linux.go#L148

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
2021-07-01 12:01:19 +10:00
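
A minimal sketch of the resulting ignore pattern, assuming it is interpolated into a PromQL expression; the variable name and the query below are illustrative only:

```jsonnet
// Illustrative only: ignore both classic 'veth*' names and OVN-style
// '<rand-hex>{15}@if<n>' interface names in a network query.
local ignoredInterfaces = 'veth.+|[a-z0-9]{15}@if\\d+';

{
  exampleNetworkExpr: 'sum(rate(node_network_receive_bytes_total{device!~"%s"}[5m]))' % ignoredInterfaces,
}
```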
Paweł Krupa
f9fd5bd499 Merge pull request #1229 from paulfantom/new-version-only
scripts: use newer version when generating
2021-06-30 11:05:18 +02:00
paulfantom
654aa9bfac scripts: use newer version when generating 2021-06-29 10:08:20 +02:00
Paweł Krupa
ad63d6bb95 Merge pull request #1220 from fpetkovski/auto-update-deps
.github/workflows: automatically update jsonnet dependencies
2021-06-25 13:23:54 +02:00
Paweł Krupa
4a3191fc09 Merge pull request #1227 from fpetkovski/change-versions-update-schedule
.github/workflows: Update versions schedule to run each Monday
2021-06-25 13:23:09 +02:00
fpetkovski
321fa1391c .github/workflows: Update versions schedule to run each Monday 2021-06-25 11:36:50 +02:00
fpetkovski
d9fc85c0bb .github/workflows: automatically update jsonnet dependencies
This commit extends the versions github workflow to automatically update
jsonnet dependencies when the jsonnet code in upstream repositories changes.
2021-06-25 11:30:22 +02:00
Damien Grisonnet
2c5c20cfff Merge pull request #1216 from fpetkovski/prometheus-adapter-cipher-suites
jsonnet: disable insecure cipher suites for prometheus-adapter
2021-06-23 21:19:24 +02:00
Paweł Krupa
7932456718 Merge pull request #1218 from prometheus-operator/automated-updates
[bot] Automated version update
2021-06-23 16:06:21 +02:00
paulfantom
d0e21f34e5 [bot] Automated version update 2021-06-23 13:41:46 +00:00
Paweł Krupa
6ffca76858 Merge pull request #1221 from fpetkovski/update-alertmanager-branch
jsonnet: update alertmanager branch to main
2021-06-23 15:25:57 +02:00
fpetkovski
86b1207e1b jsonnet: update alertmanager branch to main
Alertmanager changed its default branch to main.
This commit updates the alertmanager branch to track the new default.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-23 14:25:49 +02:00
Paweł Krupa
875d7cf4e8 Merge pull request #1219 from fpetkovski/update-deps 2021-06-23 13:57:53 +02:00
fpetkovski
0959155a1c jsonnet: update downstream dependencies
This commit updates all downstream dependencies

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-22 16:27:29 +02:00
fpetkovski
0ff173efea jsonnet: disable insecure cipher suites for prometheus-adapter
Running sslscan against the prometheus adapter secure port reports two
insecure SSL ciphers, ECDHE-RSA-DES-CBC3-SHA and DES-CBC3-SHA.

This commit removes those ciphers from the list.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-22 14:17:09 +02:00
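
For reference, such an allow-list is typically passed to the adapter via a `--tls-cipher-suites` flag; a hedged jsonnet sketch follows (the field name and the concrete suite list are assumptions, not the PR's exact change):

```jsonnet
// Sketch only: pass an explicit allow-list of TLS cipher suites to the
// prometheus-adapter container, leaving out 3DES-based suites such as
// TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA. The field name is an assumption.
{
  prometheusAdapterTLSArgs:: [
    '--tls-cipher-suites=' + std.join(',', [
      'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256',
      'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384',
      'TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256',
      'TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384',
    ]),
  ],
}
```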
Paweł Krupa
94c5301c03 Merge pull request #1217 from PhilipGough/bz-1913618
Sync with kubernetes-mixin
2021-06-22 12:31:31 +02:00
Philip Gough
3a4e292aab Sync with kubernetes-mixin 2021-06-22 11:11:40 +01:00
Paweł Krupa
466eb7953f Merge pull request #1215 from prometheus-operator/automated-updates
[bot] Automated version update
2021-06-18 16:03:32 +02:00
paulfantom
ffea8f498e [bot] Automated version update 2021-06-18 13:50:44 +00:00
Arthur Silva Sens
8396c697fd Merge pull request #1212 from sanglt/main
Fix ingress path rules for networking.k8s.io/v1
2021-06-16 20:58:18 -03:00
Sang Le
4e43a1e16e Fix ingress rules for api networking.k8s.io/v1 - format code 2021-06-17 08:19:23 +10:00
Arthur Silva Sens
071b39477a Merge pull request #1213 from metalmatze/blackbox-exporter-psp
Fix name for blackbox-exporter PodSecurityPolicy
2021-06-16 08:15:16 -03:00
Matthias Loibl
4ea366eef7 Fix name for blackbox-exporter PodSecurityPolicy 2021-06-16 12:55:51 +02:00
Paweł Krupa
8d57b10d50 Merge pull request #1211 from ArthurSens/as/gitpod-k3s
[Gitpod] Deploy kube-prometheus on k3s
2021-06-16 09:50:14 +02:00
Sang Le
db6a513190 Fix ingress rules for api networking.k8s.io/v1 2021-06-16 13:06:32 +10:00
ArthurSens
b7ac30704e Run k3s inside gitpod and deploy kube-prometheus.
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-06-15 13:29:06 +00:00
Damien Grisonnet
836fa4f086 Merge pull request #1209 from paulfantom/test-sh
scripts: move test.sh script into scripts dir
2021-06-15 11:10:16 +02:00
Damien Grisonnet
59918caf8d Merge pull request #1207 from paulfantom/rm-hack
hack: remove unused directory
2021-06-15 11:07:38 +02:00
paulfantom
6dc90593f9 scripts: move test.sh script into scripts dir 2021-06-14 22:47:22 +02:00
paulfantom
253a8ff2d6 hack: remove unused directory 2021-06-14 21:55:40 +02:00
Damien Grisonnet
df4275e3c8 Merge pull request #1206 from prometheus-operator/automated-updates
[bot] Automated version update
2021-06-14 18:19:50 +02:00
paulfantom
d6201759b8 [bot] Automated version update 2021-06-14 13:50:57 +00:00
Paweł Krupa
7d48d055c6 Merge pull request #1205 from adinhodovic/import-managed-cluster-eks
jsonnet/platforms: Import managed-cluster addon for the EKS platform
2021-06-14 12:45:48 +02:00
Adin Hodovic
88034c4c41 jsonnet/platforms: Import managed-cluster addon for the EKS platform 2021-06-14 01:07:18 +02:00
Paweł Krupa
11778868b1 Merge pull request #1202 from prashbnair/kube-mixin 2021-06-12 13:36:39 +02:00
Prashant Balachandran
78a4677370 pulling in changes from kubernetes-mixin
adding changes from kube-mixin
2021-06-12 15:26:37 +05:30
Paweł Krupa
52fa4166d2 Merge pull request #1203 from prometheus-operator/automated-updates 2021-06-12 11:48:56 +02:00
paulfantom
54f79428ce [bot] Automated version update 2021-06-11 13:51:10 +00:00
Paweł Krupa
df197f6759 Merge pull request #1192 from prometheus-operator/automated-updates 2021-06-11 15:47:41 +02:00
Damien Grisonnet
8fada1a219 Merge pull request #1201 from paulfantom/no-grafana
examples: add example of running without grafana deployment
2021-06-11 14:19:19 +02:00
Damien Grisonnet
46922c11c6 Merge pull request #1200 from paulfantom/coredns-selector
jsonnet: fix label selector for coredns ServiceMonitor
2021-06-11 12:44:40 +02:00
paulfantom
859b87b454 examples: add example of running without grafana deployment
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-06-11 11:07:05 +02:00
paulfantom
edc869991d manifests: regenerate 2021-06-11 11:02:21 +02:00
paulfantom
5ea10d80a1 jsonnet: fix label selector for coredns ServiceMonitor 2021-06-11 10:56:54 +02:00
paulfantom
a2cf1acd95 [bot] Automated version update 2021-06-10 13:59:30 +00:00
Paweł Krupa
2afbb72a88 Merge pull request #1193 from ArthurSens/as/alertmanager-dashboard 2021-06-09 21:08:51 +02:00
ArthurSens
f643955034 Update alertmanager mixin
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-06-08 18:19:23 +00:00
Damien Grisonnet
a27f65e910 Merge pull request #1191 from paulfantom/fix-version-updater
.github: write temporary file to /tmp
2021-06-08 12:18:04 +02:00
paulfantom
d45114c73e .github: write temporary file to /tmp 2021-06-08 11:22:25 +02:00
Damien Grisonnet
4d8104817d Merge pull request #1131 from paulfantom/improve-all-namespace
jsonnet: improve all-namespaces addon
2021-06-01 11:00:55 +02:00
paulfantom
feee269fdb jsonnet: improve all-namespaces addon
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-06-01 09:05:07 +02:00
Paweł Krupa
6d603cf7a9 Merge pull request #1142 from faruryo/fix/kubespray-alert
Fix scheduler and controller selectors for Kubespray
2021-05-31 23:14:02 +02:00
Paweł Krupa
dccf2ee085 Merge pull request #1135 from paulfantom/use-common 2021-05-31 23:12:53 +02:00
Paweł Krupa
93cc34f0f6 Merge pull request #1171 from anarcher/pr/grafana-env-1167
feat(grafana): add env parameter for grafana component
2021-05-31 23:11:34 +02:00
Ajit
d57542eae1 Fix for bug #1163 (#1164) 2021-05-31 23:08:59 +02:00
Paweł Krupa
133c274aa9 Merge pull request #1173 from paulfantom/version-update 2021-05-31 22:57:23 +02:00
paulfantom
67f710846a .github: make version update operation atomic 2021-05-31 17:13:35 +02:00
Damien Grisonnet
68b926f643 Merge pull request #1170 from paulfantom/include-versions
scripts: include kube-rbac-proxy and config-reloader in version upgrades
2021-05-31 11:58:28 +02:00
anarcher
8bcfb98a1d feat(grafana): add env parameter for grafana component 2021-05-31 18:52:55 +09:00
paulfantom
e5720038fe scripts: include kube-rbac-proxy and config-reloader in version upgrades 2021-05-31 11:02:19 +02:00
Paweł Krupa
1a39aaa2ab Merge pull request #1166 from paulfantom/version-upgrader-v2 2021-05-31 10:56:57 +02:00
Paweł Krupa
b279e38809 Merge pull request #1129 from onprem/feature-flags 2021-05-31 10:56:39 +02:00
Paweł Krupa
ae48746f3a Merge pull request #1169 from paulportela/patch-1
Fix adding private repository
2021-05-31 10:56:05 +02:00
paulportela
f7baf1599d Fix adding private repository
The `imageRepos` field was removed and the project no longer tries to compose image strings. The libraries now use `$.values.common.images` to override default images.
2021-05-28 17:22:27 -07:00
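
A minimal sketch of that override, assuming the `$.values.common.images` keys follow the component names (check the component defaults for the exact keys); the registry and tags below are placeholders:

```jsonnet
// Sketch: override default images via $.values.common.images instead of the
// removed imageRepos field. Key names, registry, and tags are placeholders.
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: {
      images+: {
        grafana: 'my-registry.example.com/grafana/grafana:x.y.z',
        kubeRbacProxy: 'my-registry.example.com/brancz/kube-rbac-proxy:vX.Y.Z',
      },
    },
  },
};

{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```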
Prem Saraswat
93282accb7 Generate manifests 2021-05-27 23:21:30 +05:30
Prem Saraswat
228f8ffdad Add support for feature-flags in Prometheus 2021-05-27 23:21:30 +05:30
paulfantom
9b65a6ddce .github: re-enable automatic version upgrader
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-05-27 18:04:12 +02:00
Paweł Krupa
e481cbd7c5 Merge pull request #1162 from paulfantom/deprecated 2021-05-27 12:38:23 +02:00
paulfantom
b10e0c9690 manifests: regenerate 2021-05-27 10:51:14 +02:00
paulfantom
039d4a1e48 jsonnet: sort list of dropped metrics 2021-05-27 10:49:36 +02:00
paulfantom
2873857dc7 jsonnet: convert string of deprecated metrics into array 2021-05-27 10:46:58 +02:00
Paweł Krupa
6c82dd5fc1 Merge pull request #1161 from paulfantom/ci-1.21
Enable tests for kubernetes 1.21
2021-05-27 10:45:57 +02:00
paulfantom
edd0eb639e manifests: regenerate 2021-05-26 12:50:11 +02:00
paulfantom
2fee85eb43 jsonnet: drop storage_operation_errors_total and storage_operation_status_count as those are deprecated in k8s 1.21 2021-05-26 12:49:44 +02:00
paulfantom
e1e367e820 .github: enable e2e tests on k8s 1.21
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-05-26 12:30:53 +02:00
Paweł Krupa
a89da4adb6 Merge pull request #1113 from paulfantom/unpin
Unpin jsonnet dependencies
2021-05-26 11:18:46 +02:00
Paweł Krupa
8f7d2b9c6a Merge pull request #1107 from paulfantom/mixin-add
Improvements in addMixin function.
2021-05-26 11:18:29 +02:00
paulfantom
888443e447 manifests: regenerate 2021-05-25 16:03:49 +02:00
paulfantom
ce7e86b93a jsonnet/kube-prometheus: fix usage of latest thanos mixin 2021-05-25 16:03:39 +02:00
paulfantom
ddfadbadf9 jsonnet: unpin dependencies
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-05-25 16:03:11 +02:00
Paweł Krupa
6134f1a967 Merge pull request #1157 from fpetkovski/update-kubeconform 2021-05-25 15:44:17 +02:00
fpetkovski
5fbdddf92e Update kubeconform to 0.4.7
This change updates the version of kubeconform to 0.4.7. It simplifies the
`validate` Makefile target and extracts the kubernetes version into a variable.
2021-05-25 15:33:47 +02:00
paulfantom
9e00fa5136 docs: regenerate 2021-05-21 11:44:16 +02:00
paulfantom
3197720de6 *: add test of mixin addition in examples/; change config to _config in addMixin to be consistent with main components 2021-05-21 11:43:59 +02:00
Paweł Krupa
b9ecb0a6c6 Merge pull request #1148 from xadereq/fix_missing_resource
jsonnet/components: fix missing resource config in blackbox exporter
2021-05-20 14:37:24 +02:00
Simon Pasquier
eb06a1ab45 Merge pull request #1146 from simonpasquier/fix-ksm-lite-addon
jsonnet/kube-prometheus/addons: fix KSM regex patterns
2021-05-20 09:22:27 +02:00
Piotr Piskiewicz
a8c344c848 jsonnet/components: fix missing resource config in blackbox exporter 2021-05-17 21:32:01 +02:00
Simon Pasquier
e58cadfe96 jsonnet/kube-prometheus/addons: fix KSM regex patterns
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2021-05-17 12:42:43 +02:00
faruryo
babc6b820c Fix scheduler and controller selectors for Kubespray
- refs:https://github.com/prometheus-operator/kube-prometheus/pull/916
- kubespray uses kubeadm, so it is good to inherit it
2021-05-09 23:26:47 +09:00
Paweł Krupa
3b1f268d51 Merge pull request #1140 from paulfantom/config-reloader
jsonnet: use common to populate options for additional objects
2021-05-07 10:00:29 +02:00
paulfantom
f340a76e21 jsonnet/addons: fix config-reloader limits 2021-05-07 09:37:03 +02:00
Paweł Krupa
a1210f1eff Merge pull request #1132 from paulfantom/ruleNamespaceSelector 2021-05-06 23:05:34 +02:00
paulfantom
c2ea96bf4f jsonnet: use common to populate options for additional objects
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-05-05 16:31:36 +02:00
Paweł Krupa
d50b5fd2ea Merge pull request #1136 from dgrisonnet/prometheus-adapter-pdb
Add PodDisruptionBudget to prometheus-adapter
2021-05-05 16:20:49 +02:00
Damien Grisonnet
a4a4d4b744 jsonnet: add PDB to prometheus-adapter
Adding a PodDisruptionBudget to prometheus-adapter ensures that at least
one replica of the adapter is always available. This makes sure that even
during disruptions the aggregated API stays available and thus does not
impact the availability of the apiserver.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-05 16:15:25 +02:00
paulfantom
15a8351ce0 manifests: regenerate 2021-05-05 08:57:27 +02:00
paulfantom
ee7fb97598 jsonnet: by default select rules from all available namespaces 2021-05-04 13:20:28 +02:00
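
This roughly corresponds to relaxing the selectors on the Prometheus custom resource; a sketch of an equivalent override follows (the CRD fields are standard, the jsonnet path is an assumption):

```jsonnet
// Sketch: an empty ruleSelector/ruleNamespaceSelector on the Prometheus CR
// matches PrometheusRule objects in all namespaces rather than only
// specifically labelled ones. The jsonnet path to the CR is an assumption.
{
  prometheus+: {
    prometheus+: {
      spec+: {
        ruleSelector: {},
        ruleNamespaceSelector: {},
      },
    },
  },
}
```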
Paweł Krupa
e0fb2b7821 Merge pull request #1130 from prometheus-operator/paulfantom-patch-1
addons: hide antiaffinity function
2021-05-04 10:31:37 +02:00
Paweł Krupa
982360b65e addons: hide inline antiaffinity function 2021-05-03 16:01:26 +02:00
Paweł Krupa
e2f1581c37 Merge pull request #1124 from kaflake/feature/configRbacImage 2021-05-03 10:15:44 +02:00
paulfantom
b9a49678b2 jsonnet: fmt 2021-05-03 10:02:45 +02:00
paulfantom
2531c043dc jsonnet: fix conflict resolution 2021-05-03 10:01:37 +02:00
Paweł Krupa
624c6c0108 Merge branch 'main' into feature/configRbacImage 2021-05-03 09:57:23 +02:00
Paweł Krupa
db7f3c9107 Merge pull request #1125 from kaflake/feature/configGrafanaImage
can change grafanaImage over $.values.common.images
2021-05-03 09:55:19 +02:00
Paweł Krupa
4eb52db22c Merge pull request #1123 from kaflake/feature/configmapReloadImage 2021-05-03 09:55:04 +02:00
Paweł Krupa
c45f7377ac Merge pull request #1126 from junaid-ali/patch-1 2021-05-03 09:54:44 +02:00
Nagel, Felix
8c221441d1 fix formatting issues 2021-05-03 07:02:28 +02:00
Nagel, Felix
f107e8fb16 fix formatting issues 2021-05-03 06:59:10 +02:00
Nagel, Felix
14e6143037 replace double quotes with single quotes 2021-05-03 06:35:59 +02:00
Junaid Ali
78b88e1b17 Update README.md 2021-05-01 16:30:03 +01:00
Junaid Ali
80408c6057 Adding release branch URLs to compatibility matrix 2021-05-01 16:28:42 +01:00
Paweł Krupa
5b2740d517 Merge pull request #1114 from dgrisonnet/export-anti-affinity
Export anti-affinity addon
2021-04-30 17:20:01 +02:00
Nagel, Felix
7e5d4196b9 can change grafanaImage over $.values.common.images 2021-04-30 14:05:23 +02:00
Nagel, Felix
5761267842 can change kubeRbacProxy over $.values.common.images 2021-04-30 13:48:34 +02:00
Nagel, Felix
be2964887f can change configmapReload over $.values.common.images 2021-04-30 12:46:48 +02:00
Paweł Krupa
dbf61818fa Merge pull request #1115 from paulfantom/fix-1112
jsonnet: pin alertmanager to specific commit
2021-04-28 10:08:35 +02:00
paulfantom
53efc25b3f jsonnet: pin alertmanager to specific commit as release-0.21 doesn't have mixin directory
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-04-27 22:11:49 +02:00
Damien Grisonnet
fa05e2cde8 jsonnet: export anti-affinity addon
Export the antiaffinity function of the anti-affinity addon to make it
possible to extend the addon to components that are not present in the
kube-prometheus stack.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-04-27 15:30:06 +02:00
138 changed files with 7374 additions and 5564 deletions

.github/PULL_REQUEST_TEMPLATE.md (new file, 37 lines)

@@ -0,0 +1,37 @@
<!--
WARNING: Not using this template will result in a longer review process and your change won't be visible in CHANGELOG.
-->
## Description
_Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue._
## Type of change
_What type of changes does your code introduce to kube-prometheus? Put an `x` in the boxes that apply._
- [ ] `CHANGE` (fix or feature that would cause existing functionality to not work as expected)
- [ ] `FEATURE` (non-breaking change which adds functionality)
- [ ] `BUGFIX` (non-breaking change which fixes an issue)
- [ ] `ENHANCEMENT` (non-breaking change which improves existing functionality)
- [ ] `NONE` (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)
## Changelog entry
_Please put a one-line changelog entry below. Later this will be copied to the changelog file._
<!--
Your release note should be written in clear and straightforward sentences. Most often, users aren't familiar with
the technical details of your PR, so consider what they need to know when you write your release note.
Some brief examples of release notes:
- Add metadataConfig field to the Prometheus CRD for configuring how remote-write sends metadata information.
- Generate correct scraping configuration for Probes with empty or unset module parameter.
-->
```release-note
```


@@ -4,7 +4,7 @@ on:
- pull_request
env:
golang-version: '1.15'
kind-version: 'v0.11.0'
kind-version: 'v0.11.1'
jobs:
generate:
runs-on: ${{ matrix.os }}
@@ -52,8 +52,8 @@ jobs:
strategy:
matrix:
kind-image:
- 'kindest/node:v1.20.0'
- 'kindest/node:v1.21.1'
- 'kindest/node:v1.22.0'
steps:
- uses: actions/checkout@v2
with:

.github/workflows/versions.yaml (new file, 68 lines)

@@ -0,0 +1,68 @@
name: Upgrade to latest versions
on:
workflow_dispatch:
schedule:
- cron: '37 7 * * 1'
jobs:
versions:
runs-on: ubuntu-latest
strategy:
matrix:
branch:
- 'release-0.5'
- 'release-0.6'
- 'release-0.7'
- 'release-0.8'
- 'main'
steps:
- uses: actions/checkout@v2
with:
ref: ${{ matrix.branch }}
- uses: actions/setup-go@v2
with:
go-version: 1.16
- name: Upgrade versions
run: |
export GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
# Write to temporary file to make update atomic
scripts/generate-versions.sh > /tmp/versions.json
mv /tmp/versions.json jsonnet/kube-prometheus/versions.json
if: matrix.branch == 'main'
- name: Update jsonnet dependencies
run: |
make update
make generate
# Reset jsonnetfile.lock.json if no dependencies were updated
changedFiles=$(git diff --name-only | grep -v 'jsonnetfile.lock.json' | wc -l)
if [[ "$changedFiles" -eq 0 ]]; then
git checkout -- jsonnetfile.lock.json;
fi
- name: Create Pull Request
uses: peter-evans/create-pull-request@v3
with:
commit-message: "[bot] [${{ matrix.branch }}] Automated version update"
title: "[bot] [${{ matrix.branch }}] Automated version update"
body: |
## Description
This is an automated version and jsonnet dependencies update performed from CI.
Configuration of the workflow is located in `.github/workflows/versions.yaml`
## Type of change
- [x] `NONE` (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)
## Changelog entry
```release-note
```
team-reviewers: kube-prometheus-reviewers
branch: automated-updates-${{ matrix.branch }}
delete-branch: true
# GITHUB_TOKEN cannot be used as it won't trigger CI in a created PR
# More in https://github.com/peter-evans/create-pull-request/issues/155
token: ${{ secrets.PROM_OP_BOT_PAT }}

.gitignore (2 lines added)

@@ -4,3 +4,5 @@ vendor/
./auth
.swp
crdschemas/
.gitpod/_output/


@@ -1,4 +1,5 @@
image: gitpod/workspace-full
checkoutLocation: gitpod-k3s
tasks:
- init: |
make --always-make
@@ -21,6 +22,26 @@ tasks:
fi
EOF
chmod +x ${PWD}/.git/hooks/pre-commit
- name: run kube-prometheus
command: |
.gitpod/prepare-k3s.sh
.gitpod/deploy-kube-prometheus.sh
- name: kernel dev environment
init: |
sudo apt update -y
sudo apt install qemu qemu-system-x86 linux-image-$(uname -r) libguestfs-tools sshpass netcat -y
sudo curl -o /usr/bin/kubectl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo chmod +x /usr/bin/kubectl
.gitpod/prepare-rootfs.sh
command: |
.gitpod/qemu.sh
ports:
- port: 3000
onOpen: open-browser
- port: 9090
onOpen: open-browser
- port: 9093
onOpen: open-browser
vscode:
extensions:
- heptio.jsonnet@0.1.0:woEDU5N62LRdgdz0g/I6sQ==


@@ -0,0 +1,16 @@
kubectl apply -f manifests/setup
# Safety wait for CRDs to be working
sleep 30
kubectl apply -f manifests/
kubectl rollout status -n monitoring daemonset node-exporter
kubectl rollout status -n monitoring statefulset alertmanager-main
kubectl rollout status -n monitoring statefulset prometheus-k8s
kubectl rollout status -n monitoring deployment grafana
kubectl rollout status -n monitoring deployment kube-state-metrics
kubectl port-forward -n monitoring svc/grafana 3000 > /dev/null 2>&1 &
kubectl port-forward -n monitoring svc/alertmanager-main 9093 > /dev/null 2>&1 &
kubectl port-forward -n monitoring svc/prometheus-k8s 9090 > /dev/null 2>&1 &

.gitpod/prepare-k3s.sh (new executable file, 49 lines)

@@ -0,0 +1,49 @@
#!/bin/bash
script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
rootfslock="${script_dirname}/_output/rootfs/rootfs-ready.lock"
k3sreadylock="${script_dirname}/_output/rootfs/k3s-ready.lock"
if test -f "${k3sreadylock}"; then
exit 0
fi
cd $script_dirname
function waitssh() {
while ! nc -z 127.0.0.1 2222; do
sleep 0.1
done
./ssh.sh "whoami" &>/dev/null
if [ $? -ne 0 ]; then
sleep 1
waitssh
fi
}
function waitrootfs() {
while ! test -f "${rootfslock}"; do
sleep 0.1
done
}
echo "🔥 Installing everything, this will be done only one time per workspace."
echo "Waiting for the rootfs to become available, it can take a while, open the terminal #2 for progress"
waitrootfs
echo "✅ rootfs available"
echo "Waiting for the ssh server to become available, it can take a while, after this k3s is getting installed"
waitssh
echo "✅ ssh server available"
./ssh.sh "curl -sfL https://get.k3s.io | sh -"
mkdir -p ~/.kube
./scp.sh root@127.0.0.1:/etc/rancher/k3s/k3s.yaml ~/.kube/config
echo "✅ k3s server is ready"
touch "${k3sreadylock}"
# safety wait for cluster availability
sleep 30s

.gitpod/prepare-rootfs.sh (new executable file, 48 lines)

@@ -0,0 +1,48 @@
#!/bin/bash
set -euo pipefail
img_url="https://cloud-images.ubuntu.com/hirsute/current/hirsute-server-cloudimg-amd64.tar.gz"
script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
outdir="${script_dirname}/_output/rootfs"
rm -Rf $outdir
mkdir -p $outdir
curl -L -o "${outdir}/rootfs.tar.gz" $img_url
cd $outdir
tar -xvf rootfs.tar.gz
qemu-img resize hirsute-server-cloudimg-amd64.img +20G
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command 'resize2fs /dev/sda'
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --root-password password:root
netconf="
network:
version: 2
renderer: networkd
ethernets:
enp0s3:
dhcp4: yes
"
# networking setup
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "echo '${netconf}' > /etc/netplan/01-net.yaml"
# copy kernel modules
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --copy-in /lib/modules/$(uname -r):/lib/modules
# ssh
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command 'apt remove openssh-server -y && apt install openssh-server -y'
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config"
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config"
# mark as ready
touch rootfs-ready.lock
echo "k3s development environment is ready"

.gitpod/qemu.sh (new executable file, 14 lines)

@@ -0,0 +1,14 @@
#!/bin/bash
set -xeuo pipefail
script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
outdir="${script_dirname}/_output"
sudo qemu-system-x86_64 -kernel "/boot/vmlinuz" \
-boot c -m 3073M -hda "${outdir}/rootfs/hirsute-server-cloudimg-amd64.img" \
-net user \
-smp 8 \
-append "root=/dev/sda rw console=ttyS0,115200 acpi=off nokaslr" \
-nic user,hostfwd=tcp::2222-:22,hostfwd=tcp::6443-:6443 \
-serial mon:stdio -display none

.gitpod/scp.sh (new executable file, 3 lines)

@@ -0,0 +1,3 @@
#!/bin/bash
sshpass -p 'root' scp -o StrictHostKeychecking=no -P 2222 $@

.gitpod/ssh.sh (new executable file, 3 lines)

@@ -0,0 +1,3 @@
#!/bin/bash
sshpass -p 'root' ssh -o StrictHostKeychecking=no -p 2222 root@127.0.0.1 "$@"

CHANGELOG.md (new file, 44 lines)

@@ -0,0 +1,44 @@
## release-0.9 / 2021-08-19
* [CHANGE] Test against Kubernetes 1.21 and 1.22. #1161 #1337
* [CHANGE] Drop cAdvisor metrics without (pod, namespace) label pairs. #1250
* [CHANGE] Excluded deprecated `etcd_object_counts` metric. #1337
* [FEATURE] Add PodDisruptionBudget to prometheus-adapter. #1136
* [FEATURE] Add support for feature flags in Prometheus. #1129
* [FEATURE] Add env parameter for grafana component. #1171
* [FEATURE] Add gitpod deployment of kube-prometheus on k3s. #1211
* [FEATURE] Add resource requests and limits to prometheus-adapter container. #1282
* [FEATURE] Add PodMonitor for kube-proxy. #1230
* [FEATURE] Turn AWS VPC CNI into a control plane add-on. #1307
* [ENHANCEMENT] Export anti-affinity addon. #1114
* [ENHANCEMENT] Allow changing configmap-reloader, grafana, and kube-rbac-proxy images in $.values.common.images. #1123 #1124 #1125
* [ENHANCEMENT] Add automated version upgrader. #1166
* [ENHANCEMENT] Improve all-namespace addon. #1131
* [ENHANCEMENT] Add example of running without grafana deployment. #1201
* [ENHANCEMENT] Import managed-cluster addon for the EKS platform. #1205
* [ENHANCEMENT] Automatically update jsonnet dependencies. #1220
* [ENHANCEMENT] Adapt kube-prometheus to changes to ovn veth interfaces names. #1224
* [ENHANCEMENT] Add example release-0.3 to release-0.8 migration to docs. #1235
* [ENHANCEMENT] Consolidate intervals used in prometheus-adapter CPU queries. #1231
* [ENHANCEMENT] Create dashboardDefinitions if rawDashboards or folderDashboards are specified. #1255
* [ENHANCEMENT] Relabel instance with node name for CNI DaemonSet on EKS. #1259
* [ENHANCEMENT] Update doc on Prometheus rule updates since release 0.8. #1253
* [ENHANCEMENT] Point runbooks to https://runbooks.prometheus-operator.dev. #1267
* [ENHANCEMENT] Allow setting of kubeRbacProxyMainResources in kube-state-metrics. #1257
* [ENHANCEMENT] Automate release branch updates. #1293 #1303
* [ENHANCEMENT] Create Thanos Sidecar rules separately from Prometheus ones. #1308
* [ENHANCEMENT] Allow using newer jsonnet-bundler dependency resolution when using windows addon. #1310
* [ENHANCEMENT] Prometheus ruleSelector defaults to all rules.
* [BUGFIX] Fix kube-state-metrics metric denylist regex pattern. #1146
* [BUGFIX] Fix missing resource config in blackbox exporter. #1148
* [BUGFIX] Fix adding private repository. #1169
* [BUGFIX] Fix kops selectors for scheduler, controllerManager and kube-dns. #1164
* [BUGFIX] Fix scheduler and controller selectors for Kubespray. #1142
* [BUGFIX] Fix label selector for coredns ServiceMonitor. #1200
* [BUGFIX] Fix name for blackbox-exporter PodSecurityPolicy. #1213
* [BUGFIX] Fix ingress path rules for networking.k8s.io/v1. #1212
* [BUGFIX] Disable insecure cipher suites for prometheus-adapter. #1216
* [BUGFIX] Fix CNI metrics relabelings on EKS. #1277
* [BUGFIX] Fix node-exporter ignore list for OVN. #1283
* [BUGFIX] Revert back to awscni_total_ip_addresses-based alert on EKS. #1292
* [BUGFIX] Allow passing `thanos: {}` to prometheus configuration. #1325


@@ -13,6 +13,8 @@ TOOLING=$(EMBEDMD_BIN) $(JB_BIN) $(GOJSONTOYAML_BIN) $(JSONNET_BIN) $(JSONNETLIN
JSONNETFMT_ARGS=-n 2 --max-blank-lines 2 --string-style s --comment-style s
KUBE_VERSION?="1.20.0"
all: generate fmt test
.PHONY: clean
@@ -26,7 +28,7 @@ generate: manifests **.md
**.md: $(EMBEDMD_BIN) $(shell find examples) build.sh example.jsonnet
$(EMBEDMD_BIN) -w `find . -name "*.md" | grep -v vendor`
manifests: examples/kustomize.jsonnet $(GOJSONTOYAML_BIN) vendor build.sh
manifests: examples/kustomize.jsonnet $(GOJSONTOYAML_BIN) vendor
./build.sh $<
vendor: $(JB_BIN) jsonnetfile.json jsonnetfile.lock.json
@@ -34,7 +36,7 @@ vendor: $(JB_BIN) jsonnetfile.json jsonnetfile.lock.json
$(JB_BIN) install
crdschemas: vendor
./scripts/generate-schemas.sh
./scripts/generate-schemas.sh
.PHONY: update
update: $(JB_BIN)
@@ -42,8 +44,7 @@ update: $(JB_BIN)
.PHONY: validate
validate: crdschemas manifests $(KUBECONFORM_BIN)
# Follow-up on https://github.com/instrumenta/kubernetes-json-schema/issues/26 if validations start failing
$(KUBECONFORM_BIN) -schema-location 'https://kubernetesjsonschema.dev' -schema-location 'crdschemas/{{ .ResourceKind }}.json' -skip CustomResourceDefinition manifests/
$(KUBECONFORM_BIN) -kubernetes-version $(KUBE_VERSION) -schema-location 'default' -schema-location 'crdschemas/{{ .ResourceKind }}.json' -skip CustomResourceDefinition manifests/
.PHONY: fmt
fmt: $(JSONNETFMT_BIN)
@@ -58,7 +59,7 @@ lint: $(JSONNETLINT_BIN) vendor
.PHONY: test
test: $(JB_BIN)
$(JB_BIN) install
./test.sh
./scripts/test.sh
.PHONY: test-e2e
test-e2e:

NOTICE (file removed, 5 lines)

@@ -1,5 +0,0 @@
CoreOS Project
Copyright 2018 CoreOS, Inc
This product includes software developed at CoreOS, Inc.
(http://www.coreos.com/).


@@ -70,6 +70,7 @@ If you are migrating from `release-0.7` branch or earlier please read [what chan
- [Authentication problem](#authentication-problem)
- [Authorization problem](#authorization-problem)
- [kube-state-metrics resource usage](#kube-state-metrics-resource-usage)
- [Error retrieving kube-proxy metrics](#error-retrieving-kube-proxy-metrics)
- [Contributing](#contributing)
- [License](#license)
@@ -105,17 +106,17 @@ $ minikube addons disable metrics-server
The following versions are supported and work as we test against these versions in their respective branches. But note that other versions might work!
| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 |
|-----------------------|-----------------|-----------------|-----------------|-----------------|
| `release-0.5` | ✔ | ✗ | ✗ | ✗ |
| `release-0.6` | ✗ | ✔ | ✗ | ✗ |
| `release-0.7` | ✗ | ✔ | ✔ | ✗ |
| `release-0.8` | ✗ | ✗ | ✔ | ✔ |
| `HEAD` | ✗ | ✗ | ✔ | ✔ |
| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 |
|------------------------------------------------------------------------------------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| [`release-0.6`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.6) | ✗ | ✔ | ✗ | ✗ | ✗ |
| [`release-0.7`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.7) | ✗ | ✔ | ✔ | ✗ | ✗ |
| [`release-0.8`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.8) | ✗ | ✗ | ✔ | ✔ | ✗ |
| [`release-0.9`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.9) | ✗ | ✗ | ✗ | ✔ | ✔ |
| [`HEAD`](https://github.com/prometheus-operator/kube-prometheus/tree/main) | ✗ | ✗ | ✗ | ✔ | ✔ |
## Quickstart
>Note: For versions before Kubernetes v1.20.z refer to the [Kubernetes compatibility matrix](#kubernetes-compatibility-matrix) in order to choose a compatible branch.
>Note: For versions before Kubernetes v1.21.z refer to the [Kubernetes compatibility matrix](#kubernetes-compatibility-matrix) in order to choose a compatible branch.
This project is intended to be used as a library (i.e. the intent is not for you to create your own modified copy of this repository).
@@ -376,7 +377,7 @@ These mixins are selectable via the `platform` field of kubePrometheus:
(import 'kube-prometheus/main.libsonnet') +
{
values+:: {
kubePrometheus+: {
common+: {
platform: 'example-platform',
},
},
@@ -770,6 +771,13 @@ config. They default to:
}
```
### Error retrieving kube-proxy metrics
By default, kubeadm will configure kube-proxy to listen on 127.0.0.1 for metrics. Because of this, Prometheus would not be able to scrape these metrics. This would have to be changed to 0.0.0.0 in one of the following two places:
1. Before cluster initialization, the config file passed to kubeadm init should have KubeProxyConfiguration manifest with the field metricsBindAddress set to 0.0.0.0:10249
2. If the k8s cluster is already up and running, we'll have to modify the configmap kube-proxy in the namespace kube-system and set the metricsBindAddress field. After this kube-proxy daemonset would have to be restarted with
`kubectl -n kube-system rollout restart daemonset kube-proxy`
## Contributing
All `.yaml` files in the `/manifests` folder are generated via


@@ -1,4 +1,4 @@
## CoreOS Community Code of Conduct
## Community Code of Conduct
### Contributor Code of Conduct
@@ -33,29 +33,9 @@ This code of conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting a project maintainer, Brandon Philips
<brandon.philips@coreos.com>, and/or Rithu John <rithu.john@coreos.com>.
reported by contacting a project maintainer listed in
https://github.com/prometheus-operator/prometheus-operator/blob/master/MAINTAINERS.md.
This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.2.0, available at
http://contributor-covenant.org/version/1/2/0/
### CoreOS Events Code of Conduct
CoreOS events are working conferences intended for professional networking and
collaboration in the CoreOS community. Attendees are expected to behave
according to professional standards and in accordance with their employers
policies on appropriate workplace behavior.
While at CoreOS events or related social networking opportunities, attendees
should not engage in discriminatory or offensive speech or actions including
but not limited to gender, sexuality, race, age, disability, or religion.
Speakers should be especially aware of these concerns.
CoreOS does not condone any statements by speakers contrary to these standards.
CoreOS reserves the right to deny entrance and/or eject from an event (without
refund) any individual found to be engaging in discriminatory or offensive
speech or actions.
Please bring any concerns to the immediate attention of designated on-site
staff, Brandon Philips <brandon.philips@coreos.com>, and/or Rithu John <rithu.john@coreos.com>.


@@ -219,72 +219,113 @@ local kp = (import 'kube-prometheus/main.libsonnet') + {
```
### Changing default rules
Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/kube-prometheus.libsonnet`. The recording rules can be found in [kube-prometheus/rules](../jsonnet/kube-prometheus/rules) and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules) while the alerting rules can be found in [kube-prometheus/alerts](../jsonnet/kube-prometheus/alerts) and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).
Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/main.libsonnet`. The recording rules can be found in [kube-prometheus/components/mixin/rules](../jsonnet/kube-prometheus/components/mixin/rules) and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules) while the alerting rules can be found in [kube-prometheus/components/mixin/alerts](../jsonnet/kube-prometheus/components/mixin/alerts) and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).
Knowing which rules to change, the user can now use functions from the [Jsonnet standard library](https://jsonnet.org/ref/stdlib.html) to make these changes. Below are examples of both a filter and an adjustment being made to the default rules. These changes can be assigned to a local variable and then added to the `local kp` object as seen in the examples above.
#### Filter
Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet). You first need to find out in which component the rule is defined (here it is kubernetesControlPlane).
```jsonnet
local filter = {
prometheusAlerts+:: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.filter(function(rule)
rule.alert != "KubeStatefulSetReplicasMismatch",
group.rules
)
}
else
group,
super.groups
),
kubernetesControlPlane+: {
prometheusRule+: {
spec+: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.filter(
function(rule)
rule.alert != 'KubeStatefulSetReplicasMismatch',
group.rules
),
}
else
group,
super.groups
),
},
},
},
};
```
#### Adjustment
Here the expression for the alert used above is updated from its previous value. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
Here the expression for another alert in the same component is updated from its previous value. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
```jsonnet
local update = {
prometheusAlerts+:: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.map(
function(rule)
if rule.alert == "KubeStatefulSetReplicasMismatch" then
rule {
expr: "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\",statefulset!=\"vault\"} != kube_statefulset_status_replicas{job=\"kube-state-metrics\",statefulset!=\"vault\"}"
}
else
rule,
group.rules
)
}
else
group,
super.groups
),
kubernetesControlPlane+: {
prometheusRule+: {
spec+: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.map(
function(rule)
if rule.alert == 'KubePodCrashLooping' then
rule {
expr: 'rate(kube_pod_container_status_restarts_total{namespace="kube-system",job="kube-state-metrics"}[10m]) * 60 * 5 > 0',
}
else
rule,
group.rules
),
}
else
group,
super.groups
),
},
},
},
};
```
Building on the example above about adding pre-rendered rules, the new local variables can be added as follows:
```jsonnet
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + filter + update + {
prometheusAlerts+:: (import 'existingrule.json'),
local add = {
exampleApplication:: {
prometheusRule+: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'example-application-rules',
namespace: $.values.common.namespace,
},
spec: (import 'existingrule.json'),
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
local kp = (import 'kube-prometheus/main.libsonnet') + filter + update + add;
local kp = (import 'kube-prometheus/main.libsonnet') +
filter +
update +
add + {
values+:: {
common+: {
namespace: 'monitoring',
},
},
};
{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ ['exampleApplication-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
```
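As a quick sanity check before generating every manifest, a standalone sketch like the following renders only the control-plane `PrometheusRule`; adding the `filter` and `update` locals from above to the import chain would show the patched groups:
```jsonnet
// Render only the kubernetesControlPlane PrometheusRule so rule changes can be
// inspected in isolation (add `+ filter + update` to see the patched output).
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: {
      namespace: 'monitoring',
    },
  },
};
{ 'kubernetes-prometheusRule': kp.kubernetesControlPlane.prometheusRule }
```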
## Dashboards
@@ -479,3 +520,39 @@ values+:: {
},
} + myMixin.grafanaDashboards
```
Full example of including the etcd mixin using the method described above:
[embedmd]:# (../examples/mixin-inclusion.jsonnet)
```jsonnet
local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
local etcdMixin = addMixin({
name: 'etcd',
mixin: (import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
_config+: {}, // mixin configuration object
},
});
local kp = (import 'kube-prometheus/main.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
grafana+: {
// Add the new dashboards to Grafana. This will modify the Grafana dashboards ConfigMap.
dashboards+: etcdMixin.grafanaDashboards,
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
// Rendering prometheusRules object. This is an object compatible with prometheus-operator CRD definition for prometheusRule
{ 'external-mixins/etcd-mixin-prometheus-rules': etcdMixin.prometheusRules }
```

View File

@@ -0,0 +1,296 @@
// Has the following customisations
// Custom alert manager config
// Ingresses for the alert manager, prometheus and grafana
// Grafana admin user password
// Custom prometheus rules
// Custom grafana dashboards
// Custom prometheus config - Data retention, memory, etc.
// Node exporter role and role binding so we can use a PSP for the node exporter
// External variables
// See https://jsonnet.org/learning/tutorial.html
local cluster_identifier = std.extVar('cluster_identifier');
local etcd_ip = std.extVar('etcd_ip');
local etcd_tls_ca = std.extVar('etcd_tls_ca');
local etcd_tls_cert = std.extVar('etcd_tls_cert');
local etcd_tls_key = std.extVar('etcd_tls_key');
local grafana_admin_password = std.extVar('grafana_admin_password');
local prometheus_data_retention_period = std.extVar('prometheus_data_retention_period');
local prometheus_request_memory = std.extVar('prometheus_request_memory');
// Derived variables
local alert_manager_host = 'alertmanager.' + cluster_identifier + '.myorg.local';
local grafana_host = 'grafana.' + cluster_identifier + '.myorg.local';
local prometheus_host = 'prometheus.' + cluster_identifier + '.myorg.local';
// Imports
local k = import 'ksonnet/ksonnet.beta.3/k.libsonnet';
local ingress = k.extensions.v1beta1.ingress;
local ingressRule = ingress.mixin.spec.rulesType;
local ingressRuleHttpPath = ingressRule.mixin.http.pathsType;
local ingressTls = ingress.mixin.spec.tlsType;
local role = k.rbac.v1.role;
local roleBinding = k.rbac.v1.roleBinding;
local roleRulesType = k.rbac.v1.role.rulesType;
local kp =
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
(import 'kube-prometheus/kube-prometheus-static-etcd.libsonnet') +
{
_config+:: {
// Override namespace
namespace: 'monitoring',
// Override alert manager config
// See https://github.com/coreos/kube-prometheus/tree/master/examples/alertmanager-config-external.jsonnet
alertmanager+: {
config: importstr 'alertmanager.yaml',
},
// Override etcd config
// See https://github.com/coreos/kube-prometheus/blob/master/jsonnet/kube-prometheus/kube-prometheus-static-etcd.libsonnet
// See https://github.com/coreos/kube-prometheus/blob/master/examples/etcd-skip-verify.jsonnet
etcd+:: {
clientCA: etcd_tls_ca,
clientCert: etcd_tls_cert,
clientKey: etcd_tls_key,
ips: [etcd_ip],
},
// Override grafana config
// anonymous access
// See http://docs.grafana.org/installation/configuration/
// See http://docs.grafana.org/auth/overview/#anonymous-authentication
// admin_password
// See http://docs.grafana.org/installation/configuration/#admin-password
grafana+:: {
config: {
sections: {
'auth.anonymous': {
enabled: true,
},
security: {
admin_password: grafana_admin_password,
},
},
},
},
},
// Additional grafana dashboards
grafanaDashboards+:: {
'my-specific.json': (import 'my-grafana-dashboard-definitions.json'),
},
// Alert manager needs an externalUrl
alertmanager+:: {
alertmanager+: {
spec+: {
// See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
// See https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md
externalUrl: 'https://' + alert_manager_host,
},
},
},
// Add additional ingresses
// See https://github.com/coreos/kube-prometheus/tree/master/examples/ingress.jsonnet
ingress+:: {
alertmanager:
ingress.new() +
ingress.mixin.metadata.withName('alertmanager') +
ingress.mixin.metadata.withNamespace($._config.namespace) +
ingress.mixin.metadata.withAnnotations({
'kubernetes.io/ingress.class': 'nginx-api',
}) +
ingress.mixin.spec.withRules(
ingressRule.new() +
ingressRule.withHost(alert_manager_host) +
ingressRule.mixin.http.withPaths(
ingressRuleHttpPath.new() +
ingressRuleHttpPath.mixin.backend.withServiceName('alertmanager-operated') +
ingressRuleHttpPath.mixin.backend.withServicePort(9093)
),
) +
// Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
// secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
ingress.mixin.spec.withTls(
ingressTls.new() +
ingressTls.withHosts(alert_manager_host)
),
grafana:
ingress.new() +
ingress.mixin.metadata.withName('grafana') +
ingress.mixin.metadata.withNamespace($._config.namespace) +
ingress.mixin.metadata.withAnnotations({
'kubernetes.io/ingress.class': 'nginx-api',
}) +
ingress.mixin.spec.withRules(
ingressRule.new() +
ingressRule.withHost(grafana_host) +
ingressRule.mixin.http.withPaths(
ingressRuleHttpPath.new() +
ingressRuleHttpPath.mixin.backend.withServiceName('grafana') +
ingressRuleHttpPath.mixin.backend.withServicePort(3000)
),
) +
// Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
// secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
ingress.mixin.spec.withTls(
ingressTls.new() +
ingressTls.withHosts(grafana_host)
),
prometheus:
ingress.new() +
ingress.mixin.metadata.withName('prometheus') +
ingress.mixin.metadata.withNamespace($._config.namespace) +
ingress.mixin.metadata.withAnnotations({
'kubernetes.io/ingress.class': 'nginx-api',
}) +
ingress.mixin.spec.withRules(
ingressRule.new() +
ingressRule.withHost(prometheus_host) +
ingressRule.mixin.http.withPaths(
ingressRuleHttpPath.new() +
ingressRuleHttpPath.mixin.backend.withServiceName('prometheus-operated') +
ingressRuleHttpPath.mixin.backend.withServicePort(9090)
),
) +
// Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
// secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
ingress.mixin.spec.withTls(
ingressTls.new() +
ingressTls.withHosts(prometheus_host)
),
},
// Node exporter PSP role and role binding
// Add a new top-level field for this; the "node-exporter" PSP already exists, so it is only referenced here, not defined
// See https://github.com/coreos/prometheus-operator/issues/787
nodeExporterPSP: {
role:
role.new() +
role.mixin.metadata.withName('node-exporter-psp') +
role.mixin.metadata.withNamespace($._config.namespace) +
role.withRules([
roleRulesType.new() +
roleRulesType.withApiGroups(['policy']) +
roleRulesType.withResources(['podsecuritypolicies']) +
roleRulesType.withVerbs(['use']) +
roleRulesType.withResourceNames(['node-exporter']),
]),
roleBinding:
roleBinding.new() +
roleBinding.mixin.roleRef.withApiGroup('rbac.authorization.k8s.io') +
roleBinding.mixin.metadata.withName('node-exporter-psp') +
roleBinding.mixin.metadata.withNamespace($._config.namespace) +
roleBinding.mixin.roleRef.withName('node-exporter-psp') +
roleBinding.mixin.roleRef.mixinInstance({ kind: 'Role' }) +
roleBinding.withSubjects([{ kind: 'ServiceAccount', name: 'node-exporter' }]),
},
// Prometheus needs some extra custom config
prometheus+:: {
prometheus+: {
spec+: {
// See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
externalLabels: {
cluster: cluster_identifier,
},
// See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
// See https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md
externalUrl: 'https://' + prometheus_host,
// Override request memory
resources: {
requests: {
memory: prometheus_request_memory,
},
},
// Override data retention period
retention: prometheus_data_retention_period,
},
},
},
// Additional prometheus rules
// See https://github.com/coreos/kube-prometheus/docs/developing-prometheus-rules-and-grafana-dashboards.md
// cat my-prometheus-rules.yaml | gojsontoyaml -yamltojson | jq . > my-prometheus-rules.json
prometheusRules+:: {
groups+: import 'my-prometheus-rules.json',
},
};
// Render
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ [name + '-ingress']: kp.ingress[name] for name in std.objectFields(kp.ingress) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['node-exporter-psp-' + name]: kp.nodeExporterPSP[name] for name in std.objectFields(kp.nodeExporterPSP) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }

View File

@@ -0,0 +1,316 @@
// Has the following customisations
// Custom alert manager config
// Ingresses for the alert manager, prometheus and grafana
// Grafana admin user password
// Custom prometheus rules
// Custom grafana dashboards
// Custom prometheus config - Data retention, memory, etc.
// Node exporter role and role binding so we can use a PSP for the node exporter
// for help with expected content, see https://github.com/thaum-xyz/ankhmorpork
// External variables
// See https://jsonnet.org/learning/tutorial.html
local cluster_identifier = std.extVar('cluster_identifier');
local etcd_ip = std.extVar('etcd_ip');
local etcd_tls_ca = std.extVar('etcd_tls_ca');
local etcd_tls_cert = std.extVar('etcd_tls_cert');
local etcd_tls_key = std.extVar('etcd_tls_key');
local grafana_admin_password = std.extVar('grafana_admin_password');
local prometheus_data_retention_period = std.extVar('prometheus_data_retention_period');
local prometheus_request_memory = std.extVar('prometheus_request_memory');
// Derived variables
local alert_manager_host = 'alertmanager.' + cluster_identifier + '.myorg.local';
local grafana_host = 'grafana.' + cluster_identifier + '.myorg.local';
local prometheus_host = 'prometheus.' + cluster_identifier + '.myorg.local';
// ksonnet no longer required
local kp =
(import 'kube-prometheus/main.libsonnet') +
// kubeadm now achieved by setting platform value - see 9 lines below
(import 'kube-prometheus/addons/static-etcd.libsonnet') +
(import 'kube-prometheus/addons/podsecuritypolicies.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
// Add kubeadm platform-specific items,
// including kube-controller-manager and kube-scheduler discovery
kubePrometheus+: {
platform: 'kubeadm',
},
// Override alert manager config
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/alertmanager-config-external.jsonnet
alertmanager+: {
config: importstr 'alertmanager.yaml',
},
// Override etcd config
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/static-etcd.libsonnet
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/etcd-skip-verify.jsonnet
etcd+:: {
clientCA: etcd_tls_ca,
clientCert: etcd_tls_cert,
clientKey: etcd_tls_key,
ips: [etcd_ip],
},
// Override grafana config
// anonymous access
// See http://docs.grafana.org/installation/configuration/
// See http://docs.grafana.org/auth/overview/#anonymous-authentication
// admin_password
// See http://docs.grafana.org/installation/configuration/#admin-password
grafana+:: {
config: {
sections: {
'auth.anonymous': {
enabled: true,
},
security: {
admin_password: grafana_admin_password,
},
},
},
// Additional grafana dashboards
dashboards+:: {
'my-specific.json': (import 'my-grafana-dashboard-definitions.json'),
},
},
},
// Alert manager needs an externalUrl
alertmanager+:: {
alertmanager+: {
spec+: {
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/exposing-prometheus-alertmanager-grafana-ingress.md
externalUrl: 'https://' + alert_manager_host,
},
},
},
// Add additional ingresses
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/ingress.jsonnet
ingress+:: {
alertmanager: {
apiVersion: 'networking.k8s.io/v1',
kind: 'Ingress',
metadata: {
name: 'alertmanager',
namespace: $.values.common.namespace,
annotations: {
'kubernetes.io/ingress.class': 'nginx-api',
},
},
spec: {
rules: [{
host: alert_manager_host,
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'alertmanager-operated',
port: {
number: 9093,
},
},
},
}],
},
}],
tls: [{
hosts: [alert_manager_host],
}],
},
},
grafana: {
apiVersion: 'networking.k8s.io/v1',
kind: 'Ingress',
metadata: {
name: 'grafana',
namespace: $.values.common.namespace,
annotations: {
'kubernetes.io/ingress.class': 'nginx-api',
},
},
spec: {
rules: [{
host: grafana_host,
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'grafana',
port: {
number: 3000,
},
},
},
}],
},
}],
tls: [{
hosts: [grafana_host],
}],
},
},
prometheus: {
apiVersion: 'networking.k8s.io/v1',
kind: 'Ingress',
metadata: {
name: 'prometheus',
namespace: $.values.common.namespace,
annotations: {
'kubernetes.io/ingress.class': 'nginx-api',
},
},
spec: {
rules: [{
host: prometheus_host,
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'prometheus-operated',
port: {
number: 9090,
},
},
},
}],
},
}],
tls: [{
hosts: [prometheus_host],
}],
},
},
},
// Node exporter PSP role and role binding
nodeExporter+: {
'psp-role'+: {
apiVersion: 'rbac.authorization.k8s.io/v1',
kind: 'Role',
metadata: {
name: 'node-exporter-psp',
namespace: $.values.common.namespace,
},
rules: [{
apiGroups: ['policy'],
resources: ['podsecuritypolicies'],
verbs: ['use'],
resourceNames: ['node-exporter'],
}],
},
'psp-rolebinding'+: {
apiVersion: 'rbac.authorization.k8s.io/v1',
kind: 'RoleBinding',
metadata: {
name: 'node-exporter-psp',
namespace: $.values.common.namespace,
},
roleRef: {
apiGroup: 'rbac.authorization.k8s.io',
name: 'node-exporter-psp',
kind: 'Role',
},
subjects: [{
kind: 'ServiceAccount',
name: 'node-exporter',
}],
},
},
// Prometheus needs some extra custom config
prometheus+:: {
prometheus+: {
spec+: {
externalLabels: {
cluster: cluster_identifier,
},
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/exposing-prometheus-alertmanager-grafana-ingress.md
externalUrl: 'https://' + prometheus_host,
// Override request memory
resources: {
requests: {
memory: prometheus_request_memory,
},
},
// Override data retention period
retention: prometheus_data_retention_period,
},
},
},
// Additional prometheus rules
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/developing-prometheus-rules-and-grafana-dashboards.md#pre-rendered-rules
// cat my-prometheus-rules.yaml | gojsontoyaml -yamltojson | jq . > my-prometheus-rules.json
prometheusMe: {
rules: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'my-prometheus-rule',
namespace: $.values.common.namespace,
labels: {
'app.kubernetes.io/name': 'kube-prometheus',
'app.kubernetes.io/part-of': 'kube-prometheus',
prometheus: 'k8s',
role: 'alert-rules',
},
},
spec: {
groups: import 'my-prometheus-rules.json',
},
},
},
};
// Render
{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ [name + '-ingress']: kp.ingress[name] for name in std.objectFields(kp.ingress) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }
+ { ['prometheus-my-' + name]: kp.prometheusMe[name] for name in std.objectFields(kp.prometheusMe) }

View File

@@ -0,0 +1,250 @@
## Example of conversion of a legacy my.jsonnet file
An example conversion of a legacy custom jsonnet file to the release-0.8
format can be seen by viewing and comparing this
[release-0.3 jsonnet file](./my.release-0.3.jsonnet) (from when the GitHub
repo was under `https://github.com/coreos/kube-prometheus...`)
and the corresponding [release-0.8 jsonnet file](./my.release-0.8.jsonnet).
These two files have had the necessary blank lines added so that they
can be compared side by side and line by line on screen.
The conversion covers both the move away from ksonnet after
release-0.3 and the major migration after release-0.7 described in
[migration-guide.md](../migration-guide.md).
The sample files are intended as an example of format conversion and
not necessarily best practice for the files in release-0.3 or release-0.8.
Below are three sample extracts of the conversion as an indication of the
changes required.
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>
```jsonnet
local kp =
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
(import 'kube-prometheus/kube-prometheus-static-etcd.libsonnet') +
{
_config+:: {
// Override namespace
namespace: 'monitoring',
```
</td>
<td>
```jsonnet
local kp =
(import 'kube-prometheus/main.libsonnet') +
// kubeadm now achieved by setting platform value - see 9 lines below
(import 'kube-prometheus/addons/static-etcd.libsonnet') +
(import 'kube-prometheus/addons/podsecuritypolicies.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
// Add kubeadm platform-specific items,
// including kube-controller-manager and kube-scheduler discovery
kubePrometheus+: {
platform: 'kubeadm',
},
```
</td>
</tr>
</table>
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>
```jsonnet
// Add additional ingresses
// See https://github.com/coreos/kube-prometheus/...
// tree/master/examples/ingress.jsonnet
ingress+:: {
alertmanager:
ingress.new() +
ingress.mixin.metadata.withName('alertmanager') +
ingress.mixin.metadata.withNamespace($._config.namespace) +
ingress.mixin.metadata.withAnnotations({
'kubernetes.io/ingress.class': 'nginx-api',
}) +
ingress.mixin.spec.withRules(
ingressRule.new() +
ingressRule.withHost(alert_manager_host) +
ingressRule.mixin.http.withPaths(
ingressRuleHttpPath.new() +
ingressRuleHttpPath.mixin.backend
.withServiceName('alertmanager-operated') +
ingressRuleHttpPath.mixin.backend.withServicePort(9093)
),
) +
// Note we do not need a TLS secretName here as we are going to use the
// nginx-ingress default secret which is a wildcard
// secretName would need to be in the same namespace at this time,
// see https://github.com/kubernetes/ingress-nginx/issues/2371
ingress.mixin.spec.withTls(
ingressTls.new() +
ingressTls.withHosts(alert_manager_host)
),
```
</td>
<td>
```jsonnet
// Add additional ingresses
// See https://github.com/prometheus-operator/kube-prometheus/...
// blob/main/examples/ingress.jsonnet
ingress+:: {
alertmanager: {
apiVersion: 'networking.k8s.io/v1',
kind: 'Ingress',
metadata: {
name: 'alertmanager',
namespace: $.values.common.namespace,
annotations: {
'kubernetes.io/ingress.class': 'nginx-api',
},
},
spec: {
rules: [{
host: alert_manager_host,
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'alertmanager-operated',
port: {
number: 9093,
},
},
},
}],
},
}],
tls: [{
hosts: [alert_manager_host],
}],
},
},
```
</td>
</tr>
</table>
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>
```jsonnet
// Additional prometheus rules
// See https://github.com/coreos/kube-prometheus/docs/...
// developing-prometheus-rules-and-grafana-dashboards.md
//
// cat my-prometheus-rules.yaml | \
// gojsontoyaml -yamltojson | \
// jq . > my-prometheus-rules.json
prometheusRules+:: {
groups+: import 'my-prometheus-rules.json',
},
};
```
</td>
<td>
```jsonnet
// Additional prometheus rules
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/...
// docs/developing-prometheus-rules-and-grafana-dashboards.md...
// #pre-rendered-rules
// cat my-prometheus-rules.yaml | \
// gojsontoyaml -yamltojson | \
// jq . > my-prometheus-rules.json
prometheusMe: {
rules: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'my-prometheus-rule',
namespace: $.values.common.namespace,
labels: {
'app.kubernetes.io/name': 'kube-prometheus',
'app.kubernetes.io/part-of': 'kube-prometheus',
prometheus: 'k8s',
role: 'alert-rules',
},
},
spec: {
groups: import 'my-prometheus-rules.json',
},
},
},
};
...
+ { ['prometheus-my-' + name]: kp.prometheusMe[name] for name in std.objectFields(kp.prometheusMe) }
```
</td>
</tr>
</table>

View File

@@ -61,6 +61,10 @@ This results in creating multiple `PrometheusRule` objects instead of having one
All examples from the `examples/` directory were adapted to the new codebase. [Please take a look at them for guidance](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples)
## Legacy migration
An example of conversion of a legacy release-0.3 my.jsonnet file to release-0.8 can be found in [migration-example](./migration-example)
## Advanced usage examples
For more advanced usage examples you can take a look at these two publicly available implementations:

View File

@@ -0,0 +1,92 @@
local filter = {
kubernetesControlPlane+: {
prometheusRule+:: {
spec+: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.filter(
function(rule)
rule.alert != 'KubeStatefulSetReplicasMismatch',
group.rules
),
}
else
group,
super.groups
),
},
},
},
};
local update = {
kubernetesControlPlane+: {
prometheusRule+:: {
spec+: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.map(
function(rule)
if rule.alert == 'KubePodCrashLooping' then
rule {
expr: 'rate(kube_pod_container_status_restarts_total{namespace="kube-system",job="kube-state-metrics"}[10m]) * 60 * 5 > 0',
}
else
rule,
group.rules
),
}
else
group,
super.groups
),
},
},
},
};
local add = {
exampleApplication:: {
prometheusRule+: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'example-application-rules',
namespace: $.values.common.namespace,
},
spec: (import 'existingrule.json'),
},
},
};
local kp = (import 'kube-prometheus/main.libsonnet') +
filter +
update +
add + {
values+:: {
common+: {
namespace: 'monitoring',
},
},
};
{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ ['exampleApplication-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }

View File

@@ -0,0 +1,36 @@
local kp =
(import 'kube-prometheus/main.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
grafana+: {
config+: {
sections: {
'auth.ldap': {
enabled: true,
config_file: '/etc/grafana/ldap.toml',
allow_sign_up: true,
},
},
},
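// LDAP config TOML; intended to end up as the file referenced by config_file above (/etc/grafana/ldap.toml)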
ldap: |||
[[servers]]
host = "127.0.0.1"
port = 389
use_ssl = false
start_tls = false
ssl_skip_verify = false
bind_dn = "cn=admins,dc=example,dc=com"
bind_password = 'grafana'
search_filter = "(cn=%s)"
search_base_dns = ["dc=example,dc=com"]
|||,
},
},
};
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }

View File

@@ -0,0 +1,25 @@
local kp =
(import 'kube-prometheus/main.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
},
// Disable all grafana-related objects apart from dashboards and datasource
grafana: {
dashboardSources:: {},
deployment:: {},
serviceAccount:: {},
serviceMonitor:: {},
service:: {},
},
};
// Manifestation
{
[component + '-' + resource + '.json']: kp[component][resource]
for component in std.objectFields(kp)
for resource in std.objectFields(kp[component])
}

View File

@@ -54,10 +54,14 @@ local kp =
host: 'alertmanager.example.com',
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'alertmanager-main',
port: 'web',
port: {
name: 'web',
},
},
},
}],
@@ -71,10 +75,14 @@ local kp =
host: 'grafana.example.com',
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'grafana',
port: 'http',
port: {
name: 'http',
},
},
},
}],
@@ -88,10 +96,14 @@ local kp =
host: 'prometheus.example.com',
http: {
paths: [{
path: '/',
pathType: 'Prefix',
backend: {
service: {
name: 'prometheus-k8s',
port: 'web',
port: {
name: 'web',
},
},
},
}],

View File

@@ -1,7 +1,7 @@
(import 'kube-prometheus/main.libsonnet') +
{
values+:: {
kubePrometheus+: {
common+: {
platform: 'example-platform',
},
},

View File

@@ -0,0 +1,20 @@
local kp = (import 'kube-prometheus/main.libsonnet') + {
values+:: {
common+: {
namespace: 'monitoring',
},
kubernetesControlPlane+: {
kubeProxy: true,
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }

View File

@@ -0,0 +1,30 @@
local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
local etcdMixin = addMixin({
name: 'etcd',
mixin: (import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
_config+: {}, // mixin configuration object
},
});
local kp = (import 'kube-prometheus/main.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
grafana+: {
// Add the new dashboards to Grafana. This will modify the Grafana dashboards ConfigMap.
dashboards+: etcdMixin.grafanaDashboards,
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
// Rendering prometheusRules object. This is an object compatible with prometheus-operator CRD definition for prometheusRule
{ 'external-mixins/etcd-mixin-prometheus-rules': etcdMixin.prometheusRules }

View File

@@ -1,9 +0,0 @@
#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u
kubectl apply -f examples/example-app

View File

@@ -1,9 +0,0 @@
#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u
kubectl delete -f examples/example-app

View File

@@ -1,11 +1,22 @@
{
prometheus+:: {
clusterRole+: {
rules+: [{
apiGroups: [''],
resources: ['services', 'endpoints', 'pods'],
verbs: ['get', 'list', 'watch'],
}],
rules+: [
{
apiGroups: [''],
resources: ['services', 'endpoints', 'pods'],
verbs: ['get', 'list', 'watch'],
},
{
apiGroups: ['networking.k8s.io'],
resources: ['ingresses'],
verbs: ['get', 'list', 'watch'],
},
],
},
// There is no need for namespace-specific RBAC as this addon grants
// all required permissions for every namespace
roleBindingSpecificNamespaces:: null,
roleSpecificNamespaces:: null,
},
}

View File

@@ -18,7 +18,7 @@
},
},
local antiaffinity(labelSelector, namespace, type, topologyKey) = {
antiaffinity(labelSelector, namespace, type, topologyKey):: {
local podAffinityTerm = {
namespaces: [namespace],
topologyKey: topologyKey,
@@ -44,7 +44,7 @@
alertmanager+: {
alertmanager+: {
spec+:
antiaffinity(
$.antiaffinity(
$.alertmanager._config.selectorLabels,
$.values.common.namespace,
$.values.alertmanager.podAntiAffinity,
@@ -56,7 +56,7 @@
prometheus+: {
prometheus+: {
spec+:
antiaffinity(
$.antiaffinity(
$.prometheus._config.selectorLabels,
$.values.common.namespace,
$.values.prometheus.podAntiAffinity,
@@ -70,7 +70,7 @@
spec+: {
template+: {
spec+:
antiaffinity(
$.antiaffinity(
$.blackboxExporter._config.selectorLabels,
$.values.common.namespace,
$.values.blackboxExporter.podAntiAffinity,
@@ -86,7 +86,7 @@
spec+: {
template+: {
spec+:
antiaffinity(
$.antiaffinity(
$.prometheusAdapter._config.selectorLabels,
$.values.common.namespace,
$.values.prometheusAdapter.podAntiAffinity,

View File

@@ -0,0 +1,110 @@
{
values+:: {
awsVpcCni: {
// `minimumWarmIPs` should be less than or equal to `WARM_IP_TARGET`.
//
// References:
// https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/docs/eni-and-ip-target.md
// https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/pkg/ipamd/ipamd.go#L61-L71
minimumWarmIPs: 10,
minimumWarmIPsTime: '10m',
},
},
kubernetesControlPlane+: {
serviceAwsVpcCni: {
apiVersion: 'v1',
kind: 'Service',
metadata: {
name: 'aws-node',
namespace: 'kube-system',
labels: { 'app.kubernetes.io/name': 'aws-node' },
},
spec: {
ports: [
{
name: 'cni-metrics-port',
port: 61678,
targetPort: 61678,
},
],
selector: { 'app.kubernetes.io/name': 'aws-node' },
clusterIP: 'None',
},
},
serviceMonitorAwsVpcCni: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'ServiceMonitor',
metadata: {
name: 'aws-node',
namespace: $.values.common.namespace,
labels: {
'app.kubernetes.io/name': 'aws-node',
},
},
spec: {
jobLabel: 'app.kubernetes.io/name',
selector: {
matchLabels: {
'app.kubernetes.io/name': 'aws-node',
},
},
namespaceSelector: {
matchNames: [
'kube-system',
],
},
endpoints: [
{
port: 'cni-metrics-port',
interval: '30s',
path: '/metrics',
relabelings: [
{
action: 'replace',
regex: '(.*)',
replacement: '$1',
sourceLabels: ['__meta_kubernetes_pod_node_name'],
targetLabel: 'instance',
},
],
},
],
},
},
prometheusRuleAwsVpcCni: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
labels: $.prometheus._config.commonLabels + $.prometheus._config.mixin.ruleLabels,
name: 'aws-vpc-cni-rules',
namespace: $.prometheus._config.namespace,
},
spec: {
groups: [
{
name: 'aws-vpc-cni.rules',
rules: [
{
expr: 'sum by(instance) (awscni_total_ip_addresses) - sum by(instance) (awscni_assigned_ip_addresses) < %s' % $.values.awsVpcCni.minimumWarmIPs,
labels: {
severity: 'critical',
},
annotations: {
summary: 'AWS VPC CNI has a low warm IP pool',
description: |||
Instance {{ $labels.instance }} has only {{ $value }} warm IPs which is lower than set threshold of %s.
It could mean the current subnet is out of available IP addresses or the CNI is unable to request them from the EC2 API.
||| % $.values.awsVpcCni.minimumWarmIPs,
},
'for': $.values.awsVpcCni.minimumWarmIPsTime,
alert: 'AwsVpcCniWarmIPsLow',
},
],
},
],
},
},
},
}

View File

@@ -18,13 +18,15 @@ local imageName(image) =
// quay.io/coreos/addon-resizer -> $repository/addon-resizer
// grafana/grafana -> $repository/grafana
local withImageRepository(repository) = {
local oldRepos = super._config.imageRepos,
local oldRepos = super.values.common.images,
local substituteRepository(image, repository) =
if repository == null then image else repository + '/' + imageName(image),
values+:: {
imageRepos:: {
[field]: substituteRepository(oldRepos[field], repository)
for field in std.objectFields(oldRepos)
common+:: {
images:: {
[field]: substituteRepository(oldRepos[field], repository)
for field in std.objectFields(oldRepos)
},
},
},
};

View File

@@ -32,7 +32,7 @@
// Drop all etcd metrics which are deprecated in kubernetes.
{
sourceLabels: ['__name__'],
regex: 'etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)',
regex: 'etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)',
action: 'drop',
},
// Drop all transformation metrics which are deprecated in kubernetes.

View File

@@ -117,7 +117,11 @@ local restrictedPodSecurityPolicy = {
},
}
else
{};
{
metadata+: {
name: 'blackbox-exporter-psp',
},
};
restrictedPodSecurityPolicy + blackboxExporterPspPrivileged,
},

View File

@@ -37,7 +37,7 @@
spec+: {
local addArgs(c) =
if c.name == 'prometheus-operator'
then c { args+: ['--config-reloader-cpu=0'] }
then c { args+: ['--config-reloader-cpu-limit=0', '--config-reloader-memory-limit=0'] }
else c,
containers: std.map(addArgs, super.containers),
},

View File

@@ -1,5 +1,5 @@
local windowsdashboards = import 'kubernetes-mixin/dashboards/windows.libsonnet';
local windowsrules = import 'kubernetes-mixin/rules/windows.libsonnet';
local windowsdashboards = import 'github.com/kubernetes-monitoring/kubernetes-mixin/dashboards/windows.libsonnet';
local windowsrules = import 'github.com/kubernetes-monitoring/kubernetes-mixin/rules/windows.libsonnet';
{
values+:: {

View File

@@ -64,7 +64,7 @@ local defaults = {
alertmanagerName: '{{ $labels.namespace }}/{{ $labels.pod}}',
alertmanagerClusterLabels: 'namespace,service',
alertmanagerSelector: 'job="alertmanager-' + defaults.name + '",namespace="' + defaults.namespace + '"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/alertmanager/%s',
},
},
};
@@ -78,7 +78,7 @@ function(params) {
assert std.isObject(am._config.mixin._config),
mixin:: (import 'github.com/prometheus/alertmanager/doc/alertmanager-mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
_config+:: am._config.mixin._config,
},

View File

@@ -201,6 +201,7 @@ function(params) {
local kubeRbacProxy = krp({
name: 'kube-rbac-proxy',
upstream: 'http://127.0.0.1:' + bb._config.internalPort + '/',
resources: bb._config.resources,
secureListenAddress: ':' + bb._config.port,
ports: [
{ name: 'https', containerPort: bb._config.port },

View File

@@ -27,7 +27,9 @@ local defaults = {
containers: [],
datasources: [],
config: {},
ldap: null,
plugins: [],
env: [],
};
function(params) {
@@ -56,7 +58,9 @@ function(params) {
folderDashboards: g._config.folderDashboards,
containers: g._config.containers,
config+: g._config.config,
ldap: g._config.ldap,
plugins+: g._config.plugins,
env: g._config.env,
} + (
// Conditionally overwrite default setting.
if std.length(g._config.datasources) > 0 then
@@ -74,7 +78,9 @@ function(params) {
dashboardDatasources: glib.grafana.dashboardDatasources,
dashboardSources: glib.grafana.dashboardSources,
dashboardDefinitions: if std.length(g._config.dashboards) > 0 then {
dashboardDefinitions: if std.length(g._config.dashboards) > 0 ||
std.length(g._config.rawDashboards) > 0 ||
std.length(g._config.folderDashboards) > 0 then {
apiVersion: 'v1',
kind: 'ConfigMapList',
items: glib.grafana.dashboardDefinitions,

View File

@@ -17,11 +17,12 @@ local defaults = {
kubeControllerManagerSelector: 'job="kube-controller-manager"',
kubeApiserverSelector: 'job="apiserver"',
podLabel: 'pod',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/kubernetes/%s',
diskDeviceSelector: 'device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"',
hostNetworkInterfaceSelector: 'device!~"veth.+"',
},
},
kubeProxy: false,
};
function(params) {
@@ -126,9 +127,7 @@ function(params) {
action: 'drop',
regex: '(' + std.join('|',
[
'container_fs_.*', // add filesystem read/write data (nodes*disks*services*4)
'container_spec_.*', // everything related to cgroup specification and thus static data (nodes*services*5)
'container_blkio_device_usage_total', // useful for containers, but not for system services (nodes*disks*services*operations*2)
'container_file_descriptors', // file descriptors limits and global numbers are exposed via (nodes*services)
'container_sockets', // used sockets in cgroup. Usually not important for system services (nodes*services)
'container_threads_max', // max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
@@ -137,6 +136,14 @@ function(params) {
'container_last_seen', // not needed as system services are always running (nodes*services)
]) + ');;',
},
{
sourceLabels: ['__name__', 'container'],
action: 'drop',
regex: '(' + std.join('|',
[
'container_blkio_device_usage_total',
]) + ');.+',
},
],
},
{
@@ -251,6 +258,45 @@ function(params) {
},
},
[if (defaults + params).kubeProxy then 'podMonitorKubeProxy']: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PodMonitor',
metadata: {
labels: {
'k8s-app': 'kube-proxy',
},
name: 'kube-proxy',
namespace: k8s._config.namespace,
},
spec: {
jobLabel: 'k8s-app',
namespaceSelector: {
matchNames: [
'kube-system',
],
},
selector: {
matchLabels: {
'k8s-app': 'kube-proxy',
},
},
podMetricsEndpoints: [{
honorLabels: true,
targetPort: 10249,
relabelings: [
{
action: 'replace',
regex: '(.*)',
replacement: '$1',
sourceLabels: ['__meta_kubernetes_pod_node_name'],
targetLabel: 'instance',
},
],
}],
},
},
serviceMonitorCoreDNS: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'ServiceMonitor',
@@ -262,7 +308,7 @@ function(params) {
spec: {
jobLabel: 'app.kubernetes.io/name',
selector: {
matchLabels: { 'app.kubernetes.io/name': 'kube-dns' },
matchLabels: { 'k8s-app': 'kube-dns' },
},
namespaceSelector: {
matchNames: ['kube-system'],

View File

@@ -12,6 +12,12 @@ local defaults = {
limits: { cpu: '100m', memory: '250Mi' },
},
kubeRbacProxyMain: {
resources+: {
limits+: { cpu: '40m' },
requests+: { cpu: '20m' },
},
},
scrapeInterval: '30s',
scrapeTimeout: '30s',
commonLabels:: {
@@ -29,7 +35,7 @@ local defaults = {
ruleLabels: {},
_config: {
kubeStateMetricsSelector: 'job="' + defaults.name + '"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/%s',
},
},
};
@@ -49,7 +55,7 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
podLabels:: ksm._config.selectorLabels,
mixin:: (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-state-metrics-mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
_config+:: ksm._config.mixin._config,
},
@@ -85,17 +91,13 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
},
},
local kubeRbacProxyMain = krp({
local kubeRbacProxyMain = krp(ksm._config.kubeRbacProxyMain {
name: 'kube-rbac-proxy-main',
upstream: 'http://127.0.0.1:8081/',
secureListenAddress: ':8443',
ports: [
{ name: 'https-main', containerPort: 8443 },
],
resources+: {
limits+: { cpu: '40m' },
requests+: { cpu: '20m' },
},
image: ksm._config.kubeRbacProxyImage,
}),

View File

@@ -7,7 +7,8 @@
{
alert: 'NodeNetworkInterfaceFlapping',
annotations: {
message: 'Network interface "{{ $labels.device }}" changing it\'s up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}',
summary: 'Network interface is often changing its status',
description: 'Network interface "{{ $labels.device }}" changing its up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}',
},
expr: |||
changes(node_network_up{%(nodeExporterSelector)s,%(hostNetworkInterfaceSelector)s}[2m]) > 2

View File

@@ -1,157 +0,0 @@
# TODO(metalmatze): This file is temporarily saved here for later reference
# until we find out how to integrate the tests into our jsonnet stack.
rule_files:
- rules.yaml
evaluation_interval: 1m
tests:
- interval: 1m
input_series:
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.0",namespace="monitoring",pod="alertmanager-main-0",service="alertmanager-main"}'
values: '3 3 3 3 3 2 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0'
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.1",namespace="monitoring",pod="alertmanager-main-1",service="alertmanager-main"}'
values: '3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3'
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.2",namespace="monitoring",pod="alertmanager-main-2",service="alertmanager-main"}'
values: '3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3'
alert_rule_test:
- eval_time: 5m
alertname: AlertmanagerMembersInconsistent
- eval_time: 11m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- eval_time: 17m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- eval_time: 23m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- interval: 1m
input_series:
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.0",namespace="monitoring",pod="alertmanager-main-0",service="alertmanager-main"}'
values: '3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.1",namespace="monitoring",pod="alertmanager-main-1",service="alertmanager-main"}'
values: '3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2'
- series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.2",namespace="monitoring",pod="alertmanager-main-2",service="alertmanager-main"}'
values: '3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2'
alert_rule_test:
- eval_time: 5m
alertname: AlertmanagerMembersInconsistent
- eval_time: 11m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.1
namespace: monitoring
pod: alertmanager-main-1
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.2
namespace: monitoring
pod: alertmanager-main-2
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- eval_time: 17m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.1
namespace: monitoring
pod: alertmanager-main-1
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.2
namespace: monitoring
pod: alertmanager-main-2
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- eval_time: 23m
alertname: AlertmanagerMembersInconsistent
exp_alerts:
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.0
namespace: monitoring
pod: alertmanager-main-0
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.1
namespace: monitoring
pod: alertmanager-main-1
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'
- exp_labels:
service: 'alertmanager-main'
severity: critical
job: 'alertmanager-main'
instance: 10.10.10.2
namespace: monitoring
pod: alertmanager-main-2
exp_annotations:
message: 'Alertmanager has not found all other members of the cluster.'

View File

@@ -11,7 +11,7 @@ local defaults = {
_config: {
nodeExporterSelector: 'job="node-exporter"',
hostNetworkInterfaceSelector: 'device!~"veth.+"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/general/%s',
},
},
};
@@ -23,7 +23,7 @@ function(params) {
local alertsandrules = (import './alerts/alerts.libsonnet') + (import './rules/rules.libsonnet'),
mixin:: alertsandrules +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
_config+:: m._config.mixin._config,
},

View File

@@ -30,7 +30,7 @@ local defaults = {
nodeExporterSelector: 'job="' + defaults.name + '"',
fsSpaceFillingUpCriticalThreshold: 15,
diskDeviceSelector: 'device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/node/%s',
},
},
};
@@ -44,7 +44,7 @@ function(params) {
assert std.isObject(ne._config.mixin._config),
mixin:: (import 'github.com/prometheus/node_exporter/docs/node-mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
_config+:: ne._config.mixin._config,
},

View File

@@ -22,13 +22,40 @@ local defaults = {
for labelName in std.objectFields(defaults.commonLabels)
if !std.setMember(labelName, ['app.kubernetes.io/version'])
},
// Default range intervals are equal to 4 times the default scrape interval.
// This is done in order to follow Prometheus rule of thumb with irate().
rangeIntervals: {
kubelet: '4m',
nodeExporter: '4m',
windowsExporter: '4m',
},
prometheusURL: error 'must provide prometheusURL',
config: {
resourceRules: {
cpu: {
containerQuery: 'sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[5m])) by (<<.GroupBy>>)',
nodeQuery: 'sum(1 - irate(node_cpu_seconds_total{mode="idle"}[5m]) * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum (1- irate(windows_cpu_time_total{mode="idle", job="windows-exporter",<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)',
containerQuery: |||
sum by (<<.GroupBy>>) (
irate (
container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[%(kubelet)s]
)
)
||| % $.rangeIntervals,
nodeQuery: |||
sum by (<<.GroupBy>>) (
1 - irate(
node_cpu_seconds_total{mode="idle"}[%(nodeExporter)s]
)
* on(namespace, pod) group_left(node) (
node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}
)
)
or sum by (<<.GroupBy>>) (
1 - irate(
windows_cpu_time_total{mode="idle", job="windows-exporter",<<.LabelMatchers>>}[%(windowsExporter)s]
)
)
||| % $.rangeIntervals,
resources: {
overrides: {
node: { resource: 'node' },
@@ -39,8 +66,23 @@ local defaults = {
containerLabel: 'container',
},
memory: {
containerQuery: 'sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}) by (<<.GroupBy>>)',
nodeQuery: 'sum(node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>} - node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum(windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>} - windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}) by (<<.GroupBy>>)',
containerQuery: |||
sum by (<<.GroupBy>>) (
container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}
)
|||,
nodeQuery: |||
sum by (<<.GroupBy>>) (
node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
-
node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
)
or sum by (<<.GroupBy>>) (
windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>}
-
windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}
)
|||,
resources: {
overrides: {
instance: { resource: 'node' },
@@ -53,6 +95,23 @@ local defaults = {
window: '5m',
},
},
tlsCipherSuites: [
'TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305',
'TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305',
'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256',
'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384',
'TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256',
'TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384',
'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA',
'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256',
'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA',
'TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA',
'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA',
'TLS_RSA_WITH_AES_128_GCM_SHA256',
'TLS_RSA_WITH_AES_256_GCM_SHA384',
'TLS_RSA_WITH_AES_128_CBC_SHA',
'TLS_RSA_WITH_AES_256_CBC_SHA',
],
};
function(params) {
@@ -145,7 +204,9 @@ function(params) {
'--metrics-relist-interval=1m',
'--prometheus-url=' + pa._config.prometheusURL,
'--secure-port=6443',
'--tls-cipher-suites=' + std.join(',', pa._config.tlsCipherSuites),
],
resources: pa._config.resources,
ports: [{ containerPort: 6443 }],
volumeMounts: [
{ name: 'tmpfs', mountPath: '/tmp', readOnly: false },

View File

@@ -31,7 +31,7 @@ local defaults = {
},
_config: {
prometheusOperatorSelector: 'job="prometheus-operator",namespace="' + defaults.namespace + '"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/%s',
},
},
};
@@ -46,7 +46,7 @@ function(params)
// declare variable as a field to allow overriding options and to have unified API across all components
_config:: config,
mixin:: (import 'github.com/prometheus-operator/prometheus-operator/jsonnet/mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
_config+:: po._config.mixin._config,
},

View File

@@ -12,6 +12,7 @@ local defaults = {
namespaces: ['default', 'kube-system', defaults.namespace],
replicas: 2,
externalLabels: {},
enableFeatures: [],
commonLabels:: {
'app.kubernetes.io/name': 'prometheus',
'app.kubernetes.io/version': defaults.version,
@@ -23,22 +24,17 @@ local defaults = {
for labelName in std.objectFields(defaults.commonLabels)
if !std.setMember(labelName, ['app.kubernetes.io/version'])
} + { prometheus: defaults.name },
ruleSelector: {
matchLabels: defaults.mixin.ruleLabels,
},
ruleSelector: {},
mixin: {
ruleLabels: {
role: 'alert-rules',
prometheus: defaults.name,
},
ruleLabels: {},
_config: {
prometheusSelector: 'job="prometheus-' + defaults.name + '",namespace="' + defaults.namespace + '"',
prometheusName: '{{$labels.namespace}}/{{$labels.pod}}',
thanosSelector: 'job="thanos-sidecar"',
runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/prometheus/%s',
},
},
thanos: {},
thanos: null,
};
@@ -49,18 +45,22 @@ function(params) {
assert std.isObject(p._config.resources),
assert std.isObject(p._config.mixin._config),
mixin:: (import 'github.com/prometheus/prometheus/documentation/prometheus-mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') + (
if p._config.thanos != {} then
(import 'github.com/thanos-io/thanos/mixin/alerts/sidecar.libsonnet') + {
sidecar: {
selector: p._config.mixin._config.thanosSelector,
},
}
else {}
) {
_config+:: p._config.mixin._config,
},
mixin::
(import 'github.com/prometheus/prometheus/documentation/prometheus-mixin/mixin.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') + {
_config+:: p._config.mixin._config,
},
mixinThanos::
(import 'github.com/thanos-io/thanos/mixin/alerts/sidecar.libsonnet') +
(import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') + {
_config+:: p._config.mixin._config,
targetGroups: {},
sidecar: {
selector: p._config.mixin._config.thanosSelector,
dimensions: std.join(', ', ['job', 'instance']),
},
},
prometheusRule: {
apiVersion: 'monitoring.coreos.com/v1',
@@ -100,7 +100,7 @@ function(params) {
{ name: 'web', targetPort: 'web', port: 9090 },
] +
(
if p._config.thanos != {} then
if p._config.thanos != null then
[{ name: 'grpc', port: 10901, targetPort: 10901 }]
else []
),
@@ -276,15 +276,17 @@ function(params) {
labels: p._config.commonLabels,
},
externalLabels: p._config.externalLabels,
enableFeatures: p._config.enableFeatures,
serviceAccountName: 'prometheus-' + p._config.name,
serviceMonitorSelector: {},
podMonitorSelector: {},
probeSelector: {},
serviceMonitorNamespaceSelector: {},
podMonitorNamespaceSelector: {},
probeSelector: {},
probeNamespaceSelector: {},
nodeSelector: { 'kubernetes.io/os': 'linux' },
ruleNamespaceSelector: {},
ruleSelector: p._config.ruleSelector,
serviceMonitorSelector: {},
serviceMonitorNamespaceSelector: {},
nodeSelector: { 'kubernetes.io/os': 'linux' },
resources: p._config.resources,
alerting: {
alertmanagers: [{
@@ -322,8 +324,24 @@ function(params) {
},
},
// Include thanos sidecar PrometheusRule only if thanos config was passed by user
[if std.objectHas(params, 'thanos') && params.thanos != null then 'prometheusRuleThanosSidecar']: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
labels: p._config.commonLabels + p._config.mixin.ruleLabels,
name: 'prometheus-' + p._config.name + '-thanos-sidecar-rules',
namespace: p._config.namespace,
},
spec: {
local r = if std.objectHasAll(p.mixinThanos, 'prometheusRules') then p.mixinThanos.prometheusRules.groups else [],
local a = if std.objectHasAll(p.mixinThanos, 'prometheusAlerts') then p.mixinThanos.prometheusAlerts.groups else [],
groups: a + r,
},
},
// Include thanos sidecar Service only if thanos config was passed by user
[if std.objectHas(params, 'thanos') && std.length(params.thanos) > 0 then 'serviceThanosSidecar']: {
[if std.objectHas(params, 'thanos') && params.thanos != null then 'serviceThanosSidecar']: {
apiVersion: 'v1',
kind: 'Service',
metadata+: {
@@ -348,7 +366,7 @@ function(params) {
},
// Include thanos sidecar ServiceMonitor only if thanos config was passed by user
[if std.objectHas(params, 'thanos') && std.length(params.thanos) > 0 then 'serviceMonitorThanosSidecar']: {
[if std.objectHas(params, 'thanos') && params.thanos != null then 'serviceMonitorThanosSidecar']: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'ServiceMonitor',
metadata+: {
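Editor's aside on the gating introduced in this file: the Thanos sidecar PrometheusRule, Service and ServiceMonitor above are only rendered when a non-null `thanos` object is passed in the Prometheus values (the default changed from `{}` to `null`). A minimal sketch of enabling it — the import path and the `thanos` field names are assumptions for illustration, not part of this diff:

// Sketch only: passing a non-null thanos object so that
// prometheusRuleThanosSidecar, serviceThanosSidecar and
// serviceMonitorThanosSidecar are emitted. Import path and
// thanos fields (image, version) are assumed, not taken from this diff.
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    prometheus+: {
      thanos: {
        image: 'quay.io/thanos/thanos:v0.22.0',  // hypothetical version
        version: '0.22.0',
      },
    },
  },
};
kp.prometheus.serviceThanosSidecar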

View File

@@ -8,7 +8,7 @@
"subdir": "grafana"
}
},
"version": "8ea4e7bc04b1bf5e9bd99918ca28c6271b42be0e"
"version": "90f38916f1f8a310a715d18e36f787f84df4ddf5"
},
{
"source": {
@@ -17,7 +17,7 @@
"subdir": "contrib/mixin"
}
},
"version": "562d645ac923388ff5b8d270b0536764d34b0e0f"
"version": "release-3.5"
},
{
"source": {
@@ -26,7 +26,7 @@
"subdir": "jsonnet/prometheus-operator"
}
},
"version": "release-0.47"
"version": "release-0.50"
},
{
"source": {
@@ -35,7 +35,7 @@
"subdir": "jsonnet/mixin"
}
},
"version": "release-0.47",
"version": "release-0.50",
"name": "prometheus-operator-mixin"
},
{
@@ -45,7 +45,7 @@
"subdir": ""
}
},
"version": "release-0.8"
"version": "release-0.9"
},
{
"source": {
@@ -54,7 +54,7 @@
"subdir": "jsonnet/kube-state-metrics"
}
},
"version": "release-2.0"
"version": "release-2.1"
},
{
"source": {
@@ -63,7 +63,7 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
"version": "release-2.0"
"version": "release-2.1"
},
{
"source": {
@@ -72,7 +72,7 @@
"subdir": "docs/node-mixin"
}
},
"version": "release-1.1"
"version": "832909dd257eb368cf83363ffcae3ab84cb4bcb1"
},
{
"source": {
@@ -81,7 +81,7 @@
"subdir": "documentation/prometheus-mixin"
}
},
"version": "release-2.26",
"version": "751ca03faddc9c64089c41d0da370a3a0b477742",
"name": "prometheus"
},
{
@@ -91,7 +91,7 @@
"subdir": "doc/alertmanager-mixin"
}
},
"version": "99f64e944b1043c790784cf5373c8fb349816fc4",
"version": "b408b522bc653d014e53035e59fa394cc1edd762",
"name": "alertmanager"
},
{
@@ -101,7 +101,7 @@
"subdir": "mixin"
}
},
"version": "release-0.19",
"version": "release-0.22",
"name": "thanos-mixin"
}
],

View File

@@ -8,29 +8,29 @@ local defaults = {
};
function(params) {
config:: defaults + params,
_config:: defaults + params,
local m = self,
local prometheusRules = if std.objectHasAll(m.config.mixin, 'prometheusRules') || std.objectHasAll(m.config.mixin, 'prometheusAlerts') then {
local prometheusRules = if std.objectHasAll(m._config.mixin, 'prometheusRules') || std.objectHasAll(m._config.mixin, 'prometheusAlerts') then {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
labels: m.config.labels,
name: m.config.name,
namespace: m.config.namespace,
labels: m._config.labels,
name: m._config.name,
namespace: m._config.namespace,
},
spec: {
local r = if std.objectHasAll(m.config.mixin, 'prometheusRules') then m.config.mixin.prometheusRules.groups else [],
local a = if std.objectHasAll(m.config.mixin, 'prometheusAlerts') then m.config.mixin.prometheusAlerts.groups else [],
local r = if std.objectHasAll(m._config.mixin, 'prometheusRules') then m._config.mixin.prometheusRules.groups else [],
local a = if std.objectHasAll(m._config.mixin, 'prometheusAlerts') then m._config.mixin.prometheusAlerts.groups else [],
groups: a + r,
},
},
local grafanaDashboards = if std.objectHasAll(m.config.mixin, 'grafanaDashboards') then (
if std.objectHas(m.config, 'dashboardFolder') then {
[m.config.dashboardFolder]+: m.config.mixin.grafanaDashboards,
} else (m.config.mixin.grafanaDashboards)
local grafanaDashboards = if std.objectHasAll(m._config.mixin, 'grafanaDashboards') then (
if std.objectHas(m._config, 'dashboardFolder') then {
[m._config.dashboardFolder]+: m._config.mixin.grafanaDashboards,
} else (m._config.mixin.grafanaDashboards)
),
prometheusRules: prometheusRules,

View File

@@ -0,0 +1,7 @@
{
// rangeInterval takes a scrape interval and converts it to a range interval
// following Prometheus rule of thumb for rate() and irate().
rangeInterval(i='1m'):
local interval = std.parseInt(std.substr(i, 0, std.length(i) - 1));
interval * 4 + i[std.length(i) - 1],
}
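A quick sketch of how this helper behaves, using the same import name that main.libsonnet uses; the interval values below are illustrative:

// Illustrative only: the helper multiplies the numeric part by 4 and keeps
// the single-character unit suffix, following the 4x-scrape-interval rule
// of thumb for irate() mentioned in the prometheus-adapter defaults.
local utils = import './lib/utils.libsonnet';
{
  kubelet: utils.rangeInterval('30s'),      // -> '120s'
  nodeExporter: utils.rangeInterval('1m'),  // -> '4m' (the default)
}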

View File

@@ -11,11 +11,14 @@ local prometheus = import './components/prometheus.libsonnet';
local platformPatch = import './platforms/platforms.libsonnet';
local utils = import './lib/utils.libsonnet';
{
// using `values` as this is similar to helm
values:: {
common: {
namespace: 'default',
platform: null,
ruleLabels: {
role: 'alert-rules',
prometheus: $.values.prometheus.name,
@@ -40,7 +43,7 @@ local platformPatch = import './platforms/platforms.libsonnet';
kubeStateMetrics: 'k8s.gcr.io/kube-state-metrics/kube-state-metrics:v' + $.values.common.versions.kubeStateMetrics,
nodeExporter: 'quay.io/prometheus/node-exporter:v' + $.values.common.versions.nodeExporter,
prometheus: 'quay.io/prometheus/prometheus:v' + $.values.common.versions.prometheus,
prometheusAdapter: 'directxman12/k8s-prometheus-adapter:v' + $.values.common.versions.prometheusAdapter,
prometheusAdapter: 'k8s.gcr.io/prometheus-adapter/prometheus-adapter:v' + $.values.common.versions.prometheusAdapter,
prometheusOperator: 'quay.io/prometheus-operator/prometheus-operator:v' + $.values.common.versions.prometheusOperator,
prometheusOperatorReloader: 'quay.io/prometheus-operator/prometheus-config-reloader:v' + $.values.common.versions.prometheusOperator,
kubeRbacProxy: 'quay.io/brancz/kube-rbac-proxy:v' + $.values.common.versions.kubeRbacProxy,
@@ -67,7 +70,7 @@ local platformPatch = import './platforms/platforms.libsonnet';
image: $.values.common.images.grafana,
prometheusName: $.values.prometheus.name,
// TODO(paulfantom) This should be done by iterating over all objects and looking for object.mixin.grafanaDashboards
dashboards: $.nodeExporter.mixin.grafanaDashboards + $.prometheus.mixin.grafanaDashboards + $.kubernetesControlPlane.mixin.grafanaDashboards,
dashboards: $.nodeExporter.mixin.grafanaDashboards + $.prometheus.mixin.grafanaDashboards + $.kubernetesControlPlane.mixin.grafanaDashboards + $.alertmanager.mixin.grafanaDashboards,
},
kubeStateMetrics: {
namespace: $.values.common.namespace,
@@ -96,15 +99,16 @@ local platformPatch = import './platforms/platforms.libsonnet';
version: $.values.common.versions.prometheusAdapter,
image: $.values.common.images.prometheusAdapter,
prometheusURL: 'http://prometheus-' + $.values.prometheus.name + '.' + $.values.common.namespace + '.svc.cluster.local:9090/',
rangeIntervals+: {
kubelet: utils.rangeInterval($.kubernetesControlPlane.serviceMonitorKubelet.spec.endpoints[0].interval),
nodeExporter: utils.rangeInterval($.nodeExporter.serviceMonitor.spec.endpoints[0].interval),
},
},
prometheusOperator: {
namespace: $.values.common.namespace,
version: $.values.common.versions.prometheusOperator,
image: $.values.common.images.prometheusOperator,
configReloaderImage: $.values.common.images.prometheusOperatorReloader,
commonLabels+: {
'app.kubernetes.io/part-of': 'kube-prometheus',
},
mixin+: { ruleLabels: $.values.common.ruleLabels },
kubeRbacProxyImage: $.values.common.images.kubeRbacProxy,
},
@@ -112,11 +116,6 @@ local platformPatch = import './platforms/platforms.libsonnet';
namespace: $.values.common.namespace,
mixin+: { ruleLabels: $.values.common.ruleLabels },
},
kubePrometheus: {
namespace: $.values.common.namespace,
mixin+: { ruleLabels: $.values.common.ruleLabels },
platform: null,
},
},
alertmanager: alertmanager($.values.alertmanager),
@@ -128,12 +127,17 @@ local platformPatch = import './platforms/platforms.libsonnet';
prometheusAdapter: prometheusAdapter($.values.prometheusAdapter),
prometheusOperator: prometheusOperator($.values.prometheusOperator),
kubernetesControlPlane: kubernetesControlPlane($.values.kubernetesControlPlane),
kubePrometheus: customMixin($.values.kubePrometheus) + {
kubePrometheus: customMixin(
{
namespace: $.values.common.namespace,
mixin+: { ruleLabels: $.values.common.ruleLabels },
}
) + {
namespace: {
apiVersion: 'v1',
kind: 'Namespace',
metadata: {
name: $.values.kubePrometheus.namespace,
name: $.values.common.namespace,
},
},
},

View File

@@ -1,10 +1,5 @@
{
values+:: {
eks: {
minimumAvailableIPs: 10,
minimumAvailableIPsTime: '10m',
},
},
(import '../addons/aws-vpc-cni.libsonnet') +
(import '../addons/managed-cluster.libsonnet') + {
kubernetesControlPlane+: {
serviceMonitorCoreDNS+: {
spec+: {
@@ -17,82 +12,5 @@
],
},
},
AwsEksCniMetricService: {
apiVersion: 'v1',
kind: 'Service',
metadata: {
name: 'aws-node',
namespace: 'kube-system',
labels: { 'app.kubernetes.io/name': 'aws-node' },
},
spec: {
ports: [
{ name: 'cni-metrics-port', port: 61678, targetPort: 61678 },
],
selector: { 'app.kubernetes.io/name': 'aws-node' },
clusterIP: 'None',
},
},
serviceMonitorAwsEksCNI: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'ServiceMonitor',
metadata: {
name: 'awsekscni',
namespace: $.values.common.namespace,
labels: {
'app.kubernetes.io/name': 'eks-cni',
},
},
spec: {
jobLabel: 'app.kubernetes.io/name',
selector: {
matchLabels: {
'app.kubernetes.io/name': 'aws-node',
},
},
namespaceSelector: {
matchNames: [
'kube-system',
],
},
endpoints: [
{
port: 'cni-metrics-port',
interval: '30s',
path: '/metrics',
},
],
},
},
prometheusRuleEksCNI: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
labels: $.prometheus._config.commonLabels + $.prometheus._config.mixin.ruleLabels,
name: 'eks-rules',
namespace: $.prometheus._config.namespace,
},
spec: {
groups: [
{
name: 'kube-prometheus-eks.rules',
rules: [
{
expr: 'sum by(instance) (awscni_ip_max) - sum by(instance) (awscni_assigned_ip_addresses) < %s' % $.values.eks.minimumAvailableIPs,
labels: {
severity: 'critical',
},
annotations: {
message: 'Instance {{ $labels.instance }} has less than 10 IPs available.',
},
'for': $.values.eks.minimumAvailableIPsTime,
alert: 'EksAvailableIPs',
},
],
},
],
},
},
},
}

View File

@@ -1,56 +1 @@
local service(name, namespace, labels, selector, ports) = {
apiVersion: 'v1',
kind: 'Service',
metadata: {
name: name,
namespace: namespace,
labels: labels,
},
spec: {
ports+: ports,
selector: selector,
clusterIP: 'None',
},
};
{
kubernetesControlPlane+: {
kubeControllerManagerPrometheusDiscoveryService: service(
'kube-controller-manager-prometheus-discovery',
'kube-system',
{ 'app.kubernetes.io/name': 'kube-controller-manager' },
{ 'app.kubernetes.io/name': 'kube-controller-manager' },
[{ name: 'https-metrics', port: 10257, targetPort: 10257 }]
),
kubeSchedulerPrometheusDiscoveryService: service(
'kube-scheduler-prometheus-discovery',
'kube-system',
{ 'app.kubernetes.io/name': 'kube-scheduler' },
{ 'app.kubernetes.io/name': 'kube-scheduler' },
[{ name: 'https-metrics', port: 10259, targetPort: 10259 }],
),
serviceMonitorKubeScheduler+: {
spec+: {
selector+: {
matchLabels: {
'app.kubernetes.io/name': 'kube-scheduler',
},
},
},
},
serviceMonitorKubeControllerManager+: {
spec+: {
selector+: {
matchLabels: {
'app.kubernetes.io/name': 'kube-controller-manager',
},
},
},
},
},
}
(import './kubeadm.libsonnet')

View File

@@ -26,7 +26,7 @@ local platformPatch(p) = if p != null && std.objectHas(platforms, p) then platfo
prometheusOperator: {},
kubernetesControlPlane: {},
kubePrometheus: {},
} + platformPatch($.values.kubePrometheus.platform),
} + platformPatch($.values.common.platform),
alertmanager+: p.alertmanager,
blackboxExporter+: p.blackboxExporter,

View File

@@ -1,12 +1,12 @@
{
"alertmanager": "0.21.0",
"blackboxExporter": "0.18.0",
"grafana": "7.5.4",
"kubeStateMetrics": "2.0.0",
"nodeExporter": "1.1.2",
"prometheus": "2.26.0",
"prometheusAdapter": "0.8.4",
"prometheusOperator": "0.47.0",
"kubeRbacProxy": "0.8.0",
"alertmanager": "0.22.2",
"blackboxExporter": "0.19.0",
"grafana": "8.1.1",
"kubeStateMetrics": "2.1.1",
"nodeExporter": "1.2.2",
"prometheus": "2.29.1",
"prometheusAdapter": "0.9.0",
"prometheusOperator": "0.49.0",
"kubeRbacProxy": "0.11.0",
"configmapReload": "0.5.0"
}
}

View File

@@ -8,8 +8,8 @@
"subdir": "grafana"
}
},
"version": "8ea4e7bc04b1bf5e9bd99918ca28c6271b42be0e",
"sum": "muenICtKXABk6MZZHCZD2wCbmtiE96GwWRMGa1Rg+wA="
"version": "90f38916f1f8a310a715d18e36f787f84df4ddf5",
"sum": "0kZ1pnuIirDtbg6F9at5+NQOwKNONIGEPq0eECzvRkI="
},
{
"source": {
@@ -18,7 +18,7 @@
"subdir": "contrib/mixin"
}
},
"version": "562d645ac923388ff5b8d270b0536764d34b0e0f",
"version": "e8732fb5f35d4f5229c983fea478ed13b11d729e",
"sum": "W/Azptf1PoqjyMwJON96UY69MFugDA4IAYiKURscryc="
},
{
@@ -28,7 +28,7 @@
"subdir": "grafonnet"
}
},
"version": "6db00c292d3a1c71661fc875f90e0ec7caa538c2",
"version": "3626fc4dc2326931c530861ac5bebe39444f6cbf",
"sum": "gF8foHByYcB25jcUOBqP6jxk0OPifQMjPvKY0HaCk6w="
},
{
@@ -38,19 +38,8 @@
"subdir": "grafana-builder"
}
},
"version": "98c3060877aa178f6bdfc6ac618fbe0043fc3de7",
"sum": "0KkygBQd/AFzUvVzezE4qF/uDYgrwUXVpZfINBti0oc="
},
{
"source": {
"git": {
"remote": "https://github.com/ksonnet/ksonnet-lib.git",
"subdir": ""
}
},
"version": "0d2f82676817bbf9e4acf6495b2090205f323b9f",
"sum": "h28BXZ7+vczxYJ2sCt8JuR9+yznRtU/iA6DCpQUrtEg=",
"name": "ksonnet"
"version": "2ed138b205717af721af57b572bc7cd63bda62fd",
"sum": "U34Nd1ViO2LZ3D8IzygPPRfUcy6zOgCnTMVHZ+9O/QE="
},
{
"source": {
@@ -59,8 +48,8 @@
"subdir": ""
}
},
"version": "7d3bb79a4983052d421264a7e0f3c9b0d4a22268",
"sum": "DFo3YX4xc6GJTSZDaG5XRE/ixY/5GZJwdyqBkvons4M="
"version": "1163ea85e45e1f7edf6d4f83758d44c6fef1f2fa",
"sum": "4H2pzHd6A47rQIZcQ3B0o+nFMeNgLE9dGYJv7ZP7m2s="
},
{
"source": {
@@ -69,7 +58,7 @@
"subdir": "lib/promgrafonnet"
}
},
"version": "0f0f3dc472ff2a8cdc6a6c6f938a2c450cb493ec",
"version": "06d00e40b43e4e618afbebe8e453b5650c659015",
"sum": "zv7hXGui6BfHzE9wPatHI/AGZa4A2WKo6pq7ZdqBsps="
},
{
@@ -79,7 +68,7 @@
"subdir": "jsonnet/kube-state-metrics"
}
},
"version": "b1889aa1561ee269f628e2b9659155e7714dbbf0",
"version": "d60e6f7ba1719045edc0f60857faadeb87280421",
"sum": "S5qI+PJUdNeYOv76jH5nxwYS9N6U7CRxvyuB1wI4cTE="
},
{
@@ -89,8 +78,8 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
"version": "b1889aa1561ee269f628e2b9659155e7714dbbf0",
"sum": "Yf8mNAHrV1YWzrdV8Ry5dJ8YblepTGw3C0Zp10XIYLo="
"version": "d60e6f7ba1719045edc0f60857faadeb87280421",
"sum": "u8gaydJoxEjzizQ8jY8xSjYgWooPmxw+wIWdDxifMAk="
},
{
"source": {
@@ -99,7 +88,7 @@
"subdir": "jsonnet/mixin"
}
},
"version": "b7ca32169844f0b5143f3e5e318fc05fa025df18",
"version": "83fe36566f4e0894eb5ffcd2638a0f039a17bdeb",
"sum": "6reUygVmQrLEWQzTKcH8ceDbvM+2ztK3z2VBR2K2l+U=",
"name": "prometheus-operator-mixin"
},
@@ -110,8 +99,8 @@
"subdir": "jsonnet/prometheus-operator"
}
},
"version": "b7ca32169844f0b5143f3e5e318fc05fa025df18",
"sum": "MRwyChXdKG3anL2OWpbUu3qWc97w9J6YsjUWjLFQyB0="
"version": "83fe36566f4e0894eb5ffcd2638a0f039a17bdeb",
"sum": "J1G++A8hrtr3+OZQMmcNeb1w/C30bXqqwpwHL/Xhsd4="
},
{
"source": {
@@ -120,8 +109,8 @@
"subdir": "doc/alertmanager-mixin"
}
},
"version": "99f64e944b1043c790784cf5373c8fb349816fc4",
"sum": "V8jcZQ1Qrlm7AQ6wjbuQQsacPb0NvrcZovKyplmzW5w=",
"version": "b408b522bc653d014e53035e59fa394cc1edd762",
"sum": "pep+dHzfIjh2SU5pEkwilMCAT/NoL6YYflV4x8cr7vU=",
"name": "alertmanager"
},
{
@@ -131,8 +120,8 @@
"subdir": "docs/node-mixin"
}
},
"version": "b597c1244d7bef49e6f3359c87a56dd7707f6719",
"sum": "cZTNXQMUCLB5FGYpMn845dcqGdkcYt58qCqOFIV/BoQ="
"version": "832909dd257eb368cf83363ffcae3ab84cb4bcb1",
"sum": "MmxGhE2PJ1a52mk2x7vDpMT2at4Jglbud/rK74CB5i0="
},
{
"source": {
@@ -141,8 +130,8 @@
"subdir": "documentation/prometheus-mixin"
}
},
"version": "6eeded0fdf760e81af75d9c44ce539ab77da4505",
"sum": "VK0c3sQ3ksiM6JQsAVfWmL5NbzGv9llMfXFNXfFdJ+A=",
"version": "751ca03faddc9c64089c41d0da370a3a0b477742",
"sum": "AS8WYFi/z10BZSF6DFkKBscjB32XDMM7iIso7CO/FyI=",
"name": "prometheus"
},
{
@@ -152,8 +141,8 @@
"subdir": "mixin"
}
},
"version": "09b36547e5ed61a32a309648a8913bd02c08d3cc",
"sum": "XP3uq7xcfKHsnWsz1v992csZhhZR3jQma6hFOfSViTs=",
"version": "ff363498fc95cfe17de894d7237bcf38bdd0bc36",
"sum": "cajthvLKDjYgYHCKQU2g/pTMRkxcbuJEvTnCyJOihl8=",
"name": "thanos-mixin"
},
{

View File

@@ -6,11 +6,11 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: main
namespace: monitoring
spec:
image: quay.io/prometheus/alertmanager:v0.21.0
image: quay.io/prometheus/alertmanager:v0.22.2
nodeSelector:
kubernetes.io/os: linux
podMetadata:
@@ -18,7 +18,7 @@ spec:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
replicas: 3
resources:
limits:
@@ -32,4 +32,4 @@ spec:
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: alertmanager-main
version: 0.21.0
version: 0.22.2

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
spec:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
prometheus: k8s
role: alert-rules
name: alertmanager-main-rules
@@ -17,7 +17,7 @@ spec:
- alert: AlertmanagerFailedReload
annotations:
description: Configuration has failed to load for {{ $labels.namespace }}/{{ $labels.pod}}.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedreload
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedreload
summary: Reloading an Alertmanager configuration has failed.
expr: |
# Without max_over_time, failed scrapes could create false negatives, see
@@ -29,7 +29,7 @@ spec:
- alert: AlertmanagerMembersInconsistent
annotations:
description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} has only found {{ $value }} members of the {{$labels.job}} cluster.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagermembersinconsistent
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagermembersinconsistent
summary: A member of an Alertmanager cluster has not found all other cluster members.
expr: |
# Without max_over_time, failed scrapes could create false negatives, see
@@ -37,13 +37,13 @@ spec:
max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m])
< on (namespace,service) group_left
count by (namespace,service) (max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m]))
for: 10m
for: 15m
labels:
severity: critical
- alert: AlertmanagerFailedToSendAlerts
annotations:
description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed to send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration }}.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedtosendalerts
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedtosendalerts
summary: An Alertmanager instance failed to send notifications.
expr: |
(
@@ -58,7 +58,7 @@ spec:
- alert: AlertmanagerClusterFailedToSendAlerts
annotations:
description: The minimum notification failure rate to {{ $labels.integration }} sent from any instance in the {{$labels.job}} cluster is {{ $value | humanizePercentage }}.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterfailedtosendalerts
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a critical integration.
expr: |
min by (namespace,service, integration) (
@@ -73,7 +73,7 @@ spec:
- alert: AlertmanagerClusterFailedToSendAlerts
annotations:
description: The minimum notification failure rate to {{ $labels.integration }} sent from any instance in the {{$labels.job}} cluster is {{ $value | humanizePercentage }}.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterfailedtosendalerts
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a non-critical integration.
expr: |
min by (namespace,service, integration) (
@@ -88,7 +88,7 @@ spec:
- alert: AlertmanagerConfigInconsistent
annotations:
description: Alertmanager instances within the {{$labels.job}} cluster have different configurations.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerconfiginconsistent
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerconfiginconsistent
summary: Alertmanager instances within the same cluster have different configurations.
expr: |
count by (namespace,service) (
@@ -101,7 +101,7 @@ spec:
- alert: AlertmanagerClusterDown
annotations:
description: '{{ $value | humanizePercentage }} of Alertmanager instances within the {{$labels.job}} cluster have been up for less than half of the last 5m.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterdown
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterdown
summary: Half or more of the Alertmanager instances within the same cluster are down.
expr: |
(
@@ -120,7 +120,7 @@ spec:
- alert: AlertmanagerClusterCrashlooping
annotations:
description: '{{ $value | humanizePercentage }} of Alertmanager instances within the {{$labels.job}} cluster have restarted at least 5 times in the last 10m.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclustercrashlooping
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclustercrashlooping
summary: Half or more of the Alertmanager instances within the same cluster are crashlooping.
expr: |
(

View File

@@ -6,7 +6,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
stringData:

View File

@@ -6,7 +6,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
spec:

View File

@@ -6,6 +6,6 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.21.0
app.kubernetes.io/version: 0.22.2
name: alertmanager
namespace: monitoring
spec:

View File

@@ -46,6 +46,6 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.18.0
app.kubernetes.io/version: 0.19.0
name: blackbox-exporter-configuration
namespace: monitoring

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.18.0
app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:
@@ -23,13 +23,13 @@ spec:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.18.0
app.kubernetes.io/version: 0.19.0
spec:
containers:
- args:
- --config.file=/etc/blackbox_exporter/config.yml
- --web.listen-address=:19115
image: quay.io/prometheus/blackbox-exporter:v0.18.0
image: quay.io/prometheus/blackbox-exporter:v0.19.0
name: blackbox-exporter
ports:
- containerPort: 19115
@@ -74,7 +74,7 @@ spec:
- --secure-listen-address=:9115
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:19115/
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy
ports:
- containerPort: 9115

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.18.0
app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.18.0
app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:

View File

@@ -7,7 +7,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
name: grafana-datasources
namespace: monitoring
type: Opaque

File diff suppressed because it is too large.

View File

@@ -21,6 +21,6 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
name: grafana-dashboards
namespace: monitoring

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:
@@ -18,16 +18,16 @@ spec:
template:
metadata:
annotations:
checksum/grafana-datasources: bff02b6fd55e414ce7cf08a5ea2a85e3
checksum/grafana-datasources: fbf9c3b28f5667257167c2cec0ac311a
labels:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
spec:
containers:
- env: []
image: grafana/grafana:7.5.4
image: grafana/grafana:8.1.1
name: grafana
ports:
- containerPort: 3000
@@ -53,6 +53,9 @@ spec:
- mountPath: /etc/grafana/provisioning/dashboards
name: grafana-dashboards
readOnly: false
- mountPath: /grafana-dashboard-definitions/0/alertmanager-overview
name: grafana-dashboard-alertmanager-overview
readOnly: false
- mountPath: /grafana-dashboard-definitions/0/apiserver
name: grafana-dashboard-apiserver
readOnly: false
@@ -116,14 +119,11 @@ spec:
- mountPath: /grafana-dashboard-definitions/0/scheduler
name: grafana-dashboard-scheduler
readOnly: false
- mountPath: /grafana-dashboard-definitions/0/statefulset
name: grafana-dashboard-statefulset
readOnly: false
- mountPath: /grafana-dashboard-definitions/0/workload-total
name: grafana-dashboard-workload-total
readOnly: false
nodeSelector:
beta.kubernetes.io/os: linux
kubernetes.io/os: linux
securityContext:
fsGroup: 65534
runAsNonRoot: true
@@ -138,6 +138,9 @@ spec:
- configMap:
name: grafana-dashboards
name: grafana-dashboards
- configMap:
name: grafana-dashboard-alertmanager-overview
name: grafana-dashboard-alertmanager-overview
- configMap:
name: grafana-dashboard-apiserver
name: grafana-dashboard-apiserver
@@ -201,9 +204,6 @@ spec:
- configMap:
name: grafana-dashboard-scheduler
name: grafana-dashboard-scheduler
- configMap:
name: grafana-dashboard-statefulset
name: grafana-dashboard-statefulset
- configMap:
name: grafana-dashboard-workload-total
name: grafana-dashboard-workload-total

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 7.5.4
app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:

View File

@@ -16,7 +16,7 @@ spec:
- alert: TargetDown
annotations:
description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service }} targets in {{ $labels.namespace }} namespace are down.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/targetdown
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary: One or more targets are unreachable.
expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
for: 10m
@@ -30,7 +30,7 @@ spec:
and always fire against a receiver. There are integrations with various notification
mechanisms that send a notification when this alert is not firing. For example the
"DeadMansSnitch" integration in PagerDuty.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/watchdog
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/watchdog
summary: An alert that should always be firing to certify that Alertmanager is working properly.
expr: vector(1)
labels:
@@ -39,8 +39,9 @@ spec:
rules:
- alert: NodeNetworkInterfaceFlapping
annotations:
message: Network interface "{{ $labels.device }}" changing it's up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworkinterfaceflapping
description: Network interface "{{ $labels.device }}" changing its up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/nodenetworkinterfaceflapping
summary: Network interface is often changing its status
expr: |
changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2
for: 2m

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
rules:
- apiGroups:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:
@@ -23,7 +23,7 @@ spec:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
spec:
containers:
- args:
@@ -31,7 +31,7 @@ spec:
- --port=8081
- --telemetry-host=127.0.0.1
- --telemetry-port=8082
image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.1.1
name: kube-state-metrics
resources:
limits:
@@ -47,7 +47,7 @@ spec:
- --secure-listen-address=:8443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8081/
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy-main
ports:
- containerPort: 8443
@@ -68,7 +68,7 @@ spec:
- --secure-listen-address=:9443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8082/
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy-self
ports:
- containerPort: 9443

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
prometheus: k8s
role: alert-rules
name: kube-state-metrics-rules
@@ -17,7 +17,7 @@ spec:
- alert: KubeStateMetricsListErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in list operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricslisterrors
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricslisterrors
summary: kube-state-metrics is experiencing errors in list operations.
expr: |
(sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m]))
@@ -30,7 +30,7 @@ spec:
- alert: KubeStateMetricsWatchErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in watch operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricswatcherrors
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricswatcherrors
summary: kube-state-metrics is experiencing errors in watch operations.
expr: |
(sum(rate(kube_state_metrics_watch_total{job="kube-state-metrics",result="error"}[5m]))
@@ -40,3 +40,26 @@ spec:
for: 15m
labels:
severity: critical
- alert: KubeStateMetricsShardingMismatch
annotations:
description: kube-state-metrics pods are running with different --total-shards configuration, some Kubernetes objects may be exposed multiple times or not exposed at all.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardingmismatch
summary: kube-state-metrics sharding is misconfigured.
expr: |
stdvar (kube_state_metrics_total_shards{job="kube-state-metrics"}) != 0
for: 15m
labels:
severity: critical
- alert: KubeStateMetricsShardsMissing
annotations:
description: kube-state-metrics shards are missing, some Kubernetes objects are not being exposed.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardsmissing
summary: kube-state-metrics shards are missing.
expr: |
2^max(kube_state_metrics_total_shards{job="kube-state-metrics"}) - 1
-
sum( 2 ^ max by (shard_ordinal) (kube_state_metrics_shard_ordinal{job="kube-state-metrics"}) )
!= 0
for: 15m
labels:
severity: critical
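For reference, the new shards-missing expression works like a bitmask check: with 4 total shards and ordinals 0–3 all reporting, 2^4 - 1 = 15 and the sum of 2^ordinal is 1 + 2 + 4 + 8 = 15, so the expression evaluates to 0; if shard 2 stopped reporting, the sum would drop to 11, the difference would be non-zero, and the alert would fire after 15m.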

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:

View File

@@ -5,6 +5,6 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.0.0
app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:

File diff suppressed because it is too large.

View File

@@ -31,7 +31,7 @@ spec:
sourceLabels:
- __name__
- action: drop
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop

View File

@@ -16,4 +16,4 @@ spec:
- kube-system
selector:
matchLabels:
app.kubernetes.io/name: kube-dns
k8s-app: kube-dns

View File

@@ -31,7 +31,7 @@ spec:
sourceLabels:
- __name__
- action: drop
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop

View File

@@ -32,7 +32,7 @@ spec:
sourceLabels:
- __name__
- action: drop
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop
@@ -61,11 +61,16 @@ spec:
sourceLabels:
- __name__
- action: drop
regex: (container_fs_.*|container_spec_.*|container_blkio_device_usage_total|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
regex: (container_spec_.*|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
sourceLabels:
- __name__
- pod
- namespace
- action: drop
regex: (container_blkio_device_usage_total);.+
sourceLabels:
- __name__
- container
path: /metrics/cadvisor
port: https-metrics
relabelings:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
rules:
- apiGroups:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
roleRef:
apiGroup: rbac.authorization.k8s.io

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
namespace: monitoring
spec:
@@ -20,7 +20,7 @@ spec:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
spec:
containers:
- args:
@@ -32,7 +32,7 @@ spec:
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
- --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$
- --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$
image: quay.io/prometheus/node-exporter:v1.1.2
image: quay.io/prometheus/node-exporter:v1.2.2
name: node-exporter
resources:
limits:
@@ -60,7 +60,7 @@ spec:
valueFrom:
fieldRef:
fieldPath: status.podIP
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy
ports:
- containerPort: 9100

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
prometheus: k8s
role: alert-rules
name: node-exporter-rules
@@ -17,7 +17,7 @@ spec:
- alert: NodeFilesystemSpaceFillingUp
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemspacefillingup
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
summary: Filesystem is predicted to run out of space within the next 24 hours.
expr: |
(
@@ -33,7 +33,7 @@ spec:
- alert: NodeFilesystemSpaceFillingUp
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up fast.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemspacefillingup
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
summary: Filesystem is predicted to run out of space within the next 4 hours.
expr: |
(
@@ -49,7 +49,7 @@ spec:
- alert: NodeFilesystemAlmostOutOfSpace
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutofspace
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace
summary: Filesystem has less than 5% space left.
expr: |
(
@@ -57,13 +57,13 @@ spec:
and
node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
)
for: 1h
for: 30m
labels:
severity: warning
- alert: NodeFilesystemAlmostOutOfSpace
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutofspace
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace
summary: Filesystem has less than 3% space left.
expr: |
(
@@ -71,13 +71,13 @@ spec:
and
node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
)
for: 1h
for: 30m
labels:
severity: critical
- alert: NodeFilesystemFilesFillingUp
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemfilesfillingup
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup
summary: Filesystem is predicted to run out of inodes within the next 24 hours.
expr: |
(
@@ -93,7 +93,7 @@ spec:
- alert: NodeFilesystemFilesFillingUp
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up fast.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemfilesfillingup
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup
summary: Filesystem is predicted to run out of inodes within the next 4 hours.
expr: |
(
@@ -109,7 +109,7 @@ spec:
- alert: NodeFilesystemAlmostOutOfFiles
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutoffiles
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles
summary: Filesystem has less than 5% inodes left.
expr: |
(
@@ -123,7 +123,7 @@ spec:
- alert: NodeFilesystemAlmostOutOfFiles
annotations:
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutoffiles
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles
summary: Filesystem has less than 3% inodes left.
expr: |
(
@@ -137,7 +137,7 @@ spec:
- alert: NodeNetworkReceiveErrs
annotations:
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} receive errors in the last two minutes.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworkreceiveerrs
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworkreceiveerrs
summary: Network interface is reporting many receive errors.
expr: |
rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01
@@ -147,7 +147,7 @@ spec:
- alert: NodeNetworkTransmitErrs
annotations:
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} transmit errors in the last two minutes.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworktransmiterrs
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworktransmiterrs
summary: Network interface is reporting many transmit errors.
expr: |
rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01
@@ -157,7 +157,7 @@ spec:
- alert: NodeHighNumberConntrackEntriesUsed
annotations:
description: '{{ $value | humanizePercentage }} of conntrack entries are used.'
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodehighnumberconntrackentriesused
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused
summary: Number of conntrack are getting close to the limit.
expr: |
(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75
@@ -166,7 +166,7 @@ spec:
- alert: NodeTextFileCollectorScrapeError
annotations:
description: Node Exporter text file collector failed to scrape.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodetextfilecollectorscrapeerror
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodetextfilecollectorscrapeerror
summary: Node Exporter text file collector failed to scrape.
expr: |
node_textfile_scrape_error{job="node-exporter"} == 1
@@ -175,7 +175,7 @@ spec:
- alert: NodeClockSkewDetected
annotations:
description: Clock on {{ $labels.instance }} is out of sync by more than 300s. Ensure NTP is configured correctly on this host.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodeclockskewdetected
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclockskewdetected
summary: Clock skew detected.
expr: |
(
@@ -195,7 +195,7 @@ spec:
- alert: NodeClockNotSynchronising
annotations:
description: Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodeclocknotsynchronising
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclocknotsynchronising
summary: Clock not synchronising.
expr: |
min_over_time(node_timex_sync_status[5m]) == 0
@@ -207,7 +207,7 @@ spec:
- alert: NodeRAIDDegraded
annotations:
description: RAID array '{{ $labels.device }}' on {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/noderaiddegraded
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddegraded
summary: RAID Array is degraded
expr: |
node_md_disks_required - ignoring (state) (node_md_disks{state="active"}) > 0
@@ -217,12 +217,36 @@ spec:
- alert: NodeRAIDDiskFailure
annotations:
description: At least one device in RAID array on {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/noderaiddiskfailure
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddiskfailure
summary: Failed device in RAID array
expr: |
node_md_disks{state="failed"} > 0
labels:
severity: warning
- alert: NodeFileDescriptorLimit
annotations:
description: File descriptors limit at {{ $labels.instance }} is currently at {{ printf "%.2f" $value }}%.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit
summary: Kernel is predicted to exhaust file descriptors limit soon.
expr: |
(
node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 70
)
for: 15m
labels:
severity: warning
- alert: NodeFileDescriptorLimit
annotations:
description: File descriptors limit at {{ $labels.instance }} is currently at {{ printf "%.2f" $value }}%.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit
summary: Kernel is predicted to exhaust file descriptors limit soon.
expr: |
(
node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 90
)
for: 15m
labels:
severity: critical
- name: node-exporter.rules
rules:
- expr: |
@@ -234,9 +258,9 @@ spec:
record: instance:node_num_cpu:sum
- expr: |
1 - avg without (cpu, mode) (
rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m])
)
record: instance:node_cpu_utilisation:rate1m
record: instance:node_cpu_utilisation:rate5m
- expr: |
(
node_load1{job="node-exporter"}
@@ -252,31 +276,31 @@ spec:
)
record: instance:node_memory_utilisation:ratio
- expr: |
rate(node_vmstat_pgmajfault{job="node-exporter"}[1m])
record: instance:node_vmstat_pgmajfault:rate1m
rate(node_vmstat_pgmajfault{job="node-exporter"}[5m])
record: instance:node_vmstat_pgmajfault:rate5m
- expr: |
rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[1m])
record: instance_device:node_disk_io_time_seconds:rate1m
rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m])
record: instance_device:node_disk_io_time_seconds:rate5m
- expr: |
rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[1m])
record: instance_device:node_disk_io_time_weighted_seconds:rate1m
rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m])
record: instance_device:node_disk_io_time_weighted_seconds:rate5m
- expr: |
sum without (device) (
rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[1m])
rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[5m])
)
record: instance:node_network_receive_bytes_excluding_lo:rate1m
record: instance:node_network_receive_bytes_excluding_lo:rate5m
- expr: |
sum without (device) (
rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[1m])
rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[5m])
)
record: instance:node_network_transmit_bytes_excluding_lo:rate1m
record: instance:node_network_transmit_bytes_excluding_lo:rate5m
- expr: |
sum without (device) (
rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[1m])
rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[5m])
)
record: instance:node_network_receive_drop_excluding_lo:rate1m
record: instance:node_network_receive_drop_excluding_lo:rate5m
- expr: |
sum without (device) (
rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[1m])
rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[5m])
)
record: instance:node_network_transmit_drop_excluding_lo:rate1m
record: instance:node_network_transmit_drop_excluding_lo:rate5m

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
namespace: monitoring
spec:

View File

@@ -5,6 +5,6 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
namespace: monitoring

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 1.1.2
app.kubernetes.io/version: 1.2.2
name: node-exporter
namespace: monitoring
spec:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: metrics-adapter
app.kubernetes.io/name: prometheus-adapter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.8.4
app.kubernetes.io/version: 0.9.0
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: metrics-adapter
app.kubernetes.io/name: prometheus-adapter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.8.4
app.kubernetes.io/version: 0.9.0
name: prometheus-adapter
rules:
- apiGroups:

View File

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: metrics-adapter
app.kubernetes.io/name: prometheus-adapter
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.8.4
app.kubernetes.io/version: 0.9.0
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"

Some files were not shown because too many files have changed in this diff.