Compare commits: release-0....v0.9.0 (236 commits)
| SHA1 |
|---|
| 864ca1e773 |
| 822f885d67 |
| 184a6a452b |
| b6ab321ac8 |
| 6e67e7fdbb |
| ad19693121 |
| 8ccd82e40a |
| c1fc78c979 |
| 4e96f7bed6 |
| 49eb7c66f6 |
| b4b365cead |
| fdcff9a224 |
| 2640b11d77 |
| 9ead6ebc53 |
| 62a5b28b55 |
| 0ca8df7a35 |
| 4cfbfae071 |
| 8587958cf0 |
| eca67844af |
| 0df510d1fa |
| da35954628 |
| b5ec93208b |
| 518c37d72d |
| 35397089d1 |
| 45adc03cfb |
| c1fa4971e6 |
| c69f3b4e62 |
| 6ade9e5c7d |
| 50c9dd2c6f |
| 24b0e699e4 |
| c4113807fb |
| 89b57081f7 |
| 2e8e88b882 |
| ad3fc8920e |
| 8d36d0d707 |
| ac75ee6221 |
| 5452de1b43 |
| 12cd7fd9ce |
| 0ffe13c5d2 |
| 6a150f4cc8 |
| f6d6b30aed |
| 33cc694f18 |
| 961f138dd0 |
| 54d8f88162 |
| e931a417fc |
| 0b49c3102d |
| 0e7dc97bc5 |
| d3ccfb8220 |
| a330e8634a |
| 1040e2bd70 |
| c3be50f61f |
| 075875e8aa |
| 9e8d1b0a72 |
| e97eb0fbe9 |
| 1eeb463203 |
| 844bdd9c47 |
| 0184f583d8 |
| 20f3cfaaeb |
| 7542a1b055 |
| d15f839802 |
| b7fe018d29 |
| b9c73c7b29 |
| 09fdac739d |
| 785789b776 |
| bbdb21f08d |
| ed48391831 |
| a1a9707f37 |
| 7b7c346aa0 |
| 5f13edd1ea |
| 05c72f83ef |
| 93d6101bae |
| 3a98a3478c |
| 4965e45c15 |
| acd1eeba4c |
| 45a466e3a7 |
| 6d9e0fb6b2 |
| 755d2fe5c1 |
| cfe830f8f0 |
| 94731577a8 |
| 9c638162ae |
| acea5efd85 |
| cd4438ed02 |
| 463ad065d3 |
| 46eb1713a5 |
| 02454b3f53 |
| 8c357c6bde |
| 414f8053d3 |
| 1a3c610c61 |
| 274eba0108 |
| 99ee030de3 |
| 80bb15bedd |
| 7394929c76 |
| 9bc6bf3db8 |
| ae12388b33 |
| 9b08b941f8 |
| 43adca8df7 |
| 90b2751f06 |
| dee7762ae3 |
| 3a44309177 |
| 64cfda3012 |
| 97e77e9996 |
| 0b3db5b6b6 |
| 60b4b3023d |
| ed2ffe9d05 |
| 3e6865d776 |
| acd7cdcde0 |
| 552c9ecaea |
| a91ca001a9 |
| f95eaf8598 |
| b9563b9c2d |
| 8812e45501 |
| 3ab3947270 |
| e77664f325 |
| 496bab92a6 |
| baf0774e09 |
| e38bc756a4 |
| fadb829b28 |
| 86d8ed0004 |
| 0280f4ddf9 |
| f9fd5bd499 |
| 654aa9bfac |
| ad63d6bb95 |
| 4a3191fc09 |
| 321fa1391c |
| d9fc85c0bb |
| 2c5c20cfff |
| 7932456718 |
| d0e21f34e5 |
| 6ffca76858 |
| 86b1207e1b |
| 875d7cf4e8 |
| 0959155a1c |
| 0ff173efea |
| 94c5301c03 |
| 3a4e292aab |
| 466eb7953f |
| ffea8f498e |
| 8396c697fd |
| 4e43a1e16e |
| 071b39477a |
| 4ea366eef7 |
| 8d57b10d50 |
| db6a513190 |
| b7ac30704e |
| 836fa4f086 |
| 59918caf8d |
| 6dc90593f9 |
| 253a8ff2d6 |
| df4275e3c8 |
| d6201759b8 |
| 7d48d055c6 |
| 88034c4c41 |
| 11778868b1 |
| 78a4677370 |
| 52fa4166d2 |
| 54f79428ce |
| df197f6759 |
| 8fada1a219 |
| 46922c11c6 |
| 859b87b454 |
| edc869991d |
| 5ea10d80a1 |
| a2cf1acd95 |
| 2afbb72a88 |
| f643955034 |
| a27f65e910 |
| d45114c73e |
| 4d8104817d |
| feee269fdb |
| 6d603cf7a9 |
| dccf2ee085 |
| 93cc34f0f6 |
| d57542eae1 |
| 133c274aa9 |
| 67f710846a |
| 68b926f643 |
| 8bcfb98a1d |
| e5720038fe |
| 1a39aaa2ab |
| b279e38809 |
| ae48746f3a |
| f7baf1599d |
| 93282accb7 |
| 228f8ffdad |
| 9b65a6ddce |
| e481cbd7c5 |
| b10e0c9690 |
| 039d4a1e48 |
| 2873857dc7 |
| 6c82dd5fc1 |
| edd0eb639e |
| 2fee85eb43 |
| e1e367e820 |
| a89da4adb6 |
| 8f7d2b9c6a |
| 888443e447 |
| ce7e86b93a |
| ddfadbadf9 |
| 6134f1a967 |
| 5fbdddf92e |
| 9e00fa5136 |
| 3197720de6 |
| b9ecb0a6c6 |
| eb06a1ab45 |
| a8c344c848 |
| e58cadfe96 |
| babc6b820c |
| 3b1f268d51 |
| f340a76e21 |
| a1210f1eff |
| c2ea96bf4f |
| d50b5fd2ea |
| a4a4d4b744 |
| 15a8351ce0 |
| ee7fb97598 |
| e0fb2b7821 |
| 982360b65e |
| e2f1581c37 |
| b9a49678b2 |
| 2531c043dc |
| 624c6c0108 |
| db7f3c9107 |
| 4eb52db22c |
| c45f7377ac |
| 8c221441d1 |
| f107e8fb16 |
| 14e6143037 |
| 78b88e1b17 |
| 80408c6057 |
| 5b2740d517 |
| 7e5d4196b9 |
| 5761267842 |
| be2964887f |
| dbf61818fa |
| 53efc25b3f |
| fa05e2cde8 |
.github/PULL_REQUEST_TEMPLATE.md (vendored, new file, 37 lines)

@@ -0,0 +1,37 @@
<!--
WARNING: Not using this template will result in a longer review process and your change won't be visible in CHANGELOG.
-->

## Description

_Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
If it fixes a bug or resolves a feature request, be sure to link to that issue._

## Type of change

_What type of changes does your code introduce to the kube-prometheus? Put an `x` in the box that applies._

- [ ] `CHANGE` (fix or feature that would cause existing functionality to not work as expected)
- [ ] `FEATURE` (non-breaking change which adds functionality)
- [ ] `BUGFIX` (non-breaking change which fixes an issue)
- [ ] `ENHANCEMENT` (non-breaking change which improves existing functionality)
- [ ] `NONE` (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

## Changelog entry

_Please put a one-line changelog entry below. Later this will be copied to the changelog file._

<!--
Your release note should be written in clear and straightforward sentences. Most often, users aren't familiar with
the technical details of your PR, so consider what they need to know when you write your release note.

Some brief examples of release notes:
- Add metadataConfig field to the Prometheus CRD for configuring how remote-write sends metadata information.
- Generate correct scraping configuration for Probes with empty or unset module parameter.
-->

```release-note

```
.github/workflows/ci.yaml (vendored, 4 changed lines)

@@ -4,7 +4,7 @@ on:
   - pull_request
 env:
   golang-version: '1.15'
-  kind-version: 'v0.11.0'
+  kind-version: 'v0.11.1'
 jobs:
   generate:
     runs-on: ${{ matrix.os }}

@@ -52,8 +52,8 @@ jobs:
     strategy:
       matrix:
         kind-image:
-          - 'kindest/node:v1.20.0'
           - 'kindest/node:v1.21.1'
+          - 'kindest/node:v1.22.0'
     steps:
       - uses: actions/checkout@v2
         with:
.github/workflows/versions.yaml (vendored, new file, 68 lines)

@@ -0,0 +1,68 @@
name: Upgrade to latest versions

on:
  workflow_dispatch:
  schedule:
    - cron: '37 7 * * 1'
jobs:
  versions:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        branch:
          - 'release-0.5'
          - 'release-0.6'
          - 'release-0.7'
          - 'release-0.8'
          - 'main'
    steps:
      - uses: actions/checkout@v2
        with:
          ref: ${{ matrix.branch }}
      - uses: actions/setup-go@v2
        with:
          go-version: 1.16
      - name: Upgrade versions
        run: |
          export GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
          # Write to temporary file to make update atomic
          scripts/generate-versions.sh > /tmp/versions.json
          mv /tmp/versions.json jsonnet/kube-prometheus/versions.json
        if: matrix.branch == 'main'
      - name: Update jsonnet dependencies
        run: |
          make update
          make generate

          # Reset jsonnetfile.lock.json if no dependencies were updated
          changedFiles=$(git diff --name-only | grep -v 'jsonnetfile.lock.json' | wc -l)
          if [[ "$changedFiles" -eq 0 ]]; then
            git checkout -- jsonnetfile.lock.json;
          fi
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v3
        with:
          commit-message: "[bot] [${{ matrix.branch }}] Automated version update"
          title: "[bot] [${{ matrix.branch }}] Automated version update"
          body: |
            ## Description

            This is an automated version and jsonnet dependencies update performed from CI.

            Configuration of the workflow is located in `.github/workflows/versions.yaml`

            ## Type of change

            - [x] `NONE` (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

            ## Changelog entry

            ```release-note

            ```
          team-reviewers: kube-prometheus-reviewers
          branch: automated-updates-${{ matrix.branch }}
          delete-branch: true
          # GITHUB_TOKEN cannot be used as it won't trigger CI in a created PR
          # More in https://github.com/peter-evans/create-pull-request/issues/155
          token: ${{ secrets.PROM_OP_BOT_PAT }}
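For context on what this workflow rewrites: `jsonnet/kube-prometheus/versions.json` is plain JSON mapping component names to pinned version strings, and the jsonnet library imports it directly (JSON is valid jsonnet). A minimal sketch of reading it follows; the `prometheus` key and the image repository are assumptions for illustration, not taken from this diff:

```jsonnet
// Minimal sketch: read the centrally managed version pins from versions.json.
// Assumes a vendored kube-prometheus on the jsonnet search path.
local versions = import 'kube-prometheus/versions.json';

{
  // e.g. derive an image tag from the pinned version (key name assumed)
  prometheusImage: 'quay.io/prometheus/prometheus:v' + versions.prometheus,
}
```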
.gitignore (vendored, 2 changed lines)

@@ -4,3 +4,5 @@ vendor/
 ./auth
 .swp
 crdschemas/
+
+.gitpod/_output/
.gitpod.yml (23 changed lines)

@@ -1,4 +1,5 @@
 image: gitpod/workspace-full
+checkoutLocation: gitpod-k3s
 tasks:
   - init: |
       make --always-make

@@ -21,6 +22,26 @@ tasks:
         fi
         EOF
         chmod +x ${PWD}/.git/hooks/pre-commit
+  - name: run kube-prometheus
+    command: |
+      .gitpod/prepare-k3s.sh
+      .gitpod/deploy-kube-prometheus.sh
+  - name: kernel dev environment
+    init: |
+      sudo apt update -y
+      sudo apt install qemu qemu-system-x86 linux-image-$(uname -r) libguestfs-tools sshpass netcat -y
+      sudo curl -o /usr/bin/kubectl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
+      sudo chmod +x /usr/bin/kubectl
+      .gitpod/prepare-rootfs.sh
+    command: |
+      .gitpod/qemu.sh
+ports:
+  - port: 3000
+    onOpen: open-browser
+  - port: 9090
+    onOpen: open-browser
+  - port: 9093
+    onOpen: open-browser
 vscode:
   extensions:
     - heptio.jsonnet@0.1.0:woEDU5N62LRdgdz0g/I6sQ==
.gitpod/deploy-kube-prometheus.sh (new executable file, 16 lines)

@@ -0,0 +1,16 @@
kubectl apply -f manifests/setup

# Safety wait for CRDs to be working
sleep 30

kubectl apply -f manifests/

kubectl rollout status -n monitoring daemonset node-exporter
kubectl rollout status -n monitoring statefulset alertmanager-main
kubectl rollout status -n monitoring statefulset prometheus-k8s
kubectl rollout status -n monitoring deployment grafana
kubectl rollout status -n monitoring deployment kube-state-metrics

kubectl port-forward -n monitoring svc/grafana 3000 > /dev/null 2>&1 &
kubectl port-forward -n monitoring svc/alertmanager-main 9093 > /dev/null 2>&1 &
kubectl port-forward -n monitoring svc/prometheus-k8s 9090 > /dev/null 2>&1 &
.gitpod/prepare-k3s.sh (new executable file, 49 lines)

@@ -0,0 +1,49 @@
#!/bin/bash

script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
rootfslock="${script_dirname}/_output/rootfs/rootfs-ready.lock"
k3sreadylock="${script_dirname}/_output/rootfs/k3s-ready.lock"

if test -f "${k3sreadylock}"; then
    exit 0
fi

cd $script_dirname

function waitssh() {
    while ! nc -z 127.0.0.1 2222; do
        sleep 0.1
    done
    ./ssh.sh "whoami" &>/dev/null
    if [ $? -ne 0 ]; then
        sleep 1
        waitssh
    fi
}

function waitrootfs() {
    while ! test -f "${rootfslock}"; do
        sleep 0.1
    done
}

echo "🔥 Installing everything, this will be done only one time per workspace."

echo "Waiting for the rootfs to become available, it can take a while, open the terminal #2 for progress"
waitrootfs
echo "✅ rootfs available"

echo "Waiting for the ssh server to become available, it can take a while, after this k3s is getting installed"
waitssh
echo "✅ ssh server available"

./ssh.sh "curl -sfL https://get.k3s.io | sh -"

mkdir -p ~/.kube
./scp.sh root@127.0.0.1:/etc/rancher/k3s/k3s.yaml ~/.kube/config

echo "✅ k3s server is ready"
touch "${k3sreadylock}"

# safety wait for cluster availability
sleep 30s
.gitpod/prepare-rootfs.sh (new executable file, 48 lines)

@@ -0,0 +1,48 @@
#!/bin/bash

set -euo pipefail

img_url="https://cloud-images.ubuntu.com/hirsute/current/hirsute-server-cloudimg-amd64.tar.gz"

script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
outdir="${script_dirname}/_output/rootfs"

rm -Rf $outdir
mkdir -p $outdir

curl -L -o "${outdir}/rootfs.tar.gz" $img_url

cd $outdir

tar -xvf rootfs.tar.gz

qemu-img resize hirsute-server-cloudimg-amd64.img +20G

sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command 'resize2fs /dev/sda'

sudo virt-customize -a hirsute-server-cloudimg-amd64.img --root-password password:root

netconf="
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
"

# networking setup
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "echo '${netconf}' > /etc/netplan/01-net.yaml"

# copy kernel modules
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --copy-in /lib/modules/$(uname -r):/lib/modules

# ssh
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command 'apt remove openssh-server -y && apt install openssh-server -y'
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config"
sudo virt-customize -a hirsute-server-cloudimg-amd64.img --run-command "sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config"

# mark as ready
touch rootfs-ready.lock

echo "k3s development environment is ready"
.gitpod/qemu.sh (new executable file, 14 lines)

@@ -0,0 +1,14 @@
#!/bin/bash

set -xeuo pipefail

script_dirname="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
outdir="${script_dirname}/_output"

sudo qemu-system-x86_64 -kernel "/boot/vmlinuz" \
    -boot c -m 3073M -hda "${outdir}/rootfs/hirsute-server-cloudimg-amd64.img" \
    -net user \
    -smp 8 \
    -append "root=/dev/sda rw console=ttyS0,115200 acpi=off nokaslr" \
    -nic user,hostfwd=tcp::2222-:22,hostfwd=tcp::6443-:6443 \
    -serial mon:stdio -display none
.gitpod/scp.sh (new executable file, 3 lines)

@@ -0,0 +1,3 @@
#!/bin/bash

sshpass -p 'root' scp -o StrictHostKeychecking=no -P 2222 $@
.gitpod/ssh.sh (new executable file, 3 lines)

@@ -0,0 +1,3 @@
#!/bin/bash

sshpass -p 'root' ssh -o StrictHostKeychecking=no -p 2222 root@127.0.0.1 "$@"
CHANGELOG.md (new file, 44 lines)

@@ -0,0 +1,44 @@
## release-0.9 / 2021-08-19

* [CHANGE] Test against Kubernetes 1.21 and 1.22. #1161 #1337
* [CHANGE] Drop cAdvisor metrics without (pod, namespace) label pairs. #1250
* [CHANGE] Excluded deprecated `etcd_object_counts` metric. #1337
* [FEATURE] Add PodDisruptionBudget to prometheus-adapter. #1136
* [FEATURE] Add support for feature flags in Prometheus. #1129
* [FEATURE] Add env parameter for grafana component. #1171
* [FEATURE] Add gitpod deployment of kube-prometheus on k3s. #1211
* [FEATURE] Add resource requests and limits to prometheus-adapter container. #1282
* [FEATURE] Add PodMonitor for kube-proxy. #1230
* [FEATURE] Turn AWS VPC CNI into a control plane add-on. #1307
* [ENHANCEMENT] Export anti-affinity addon. #1114
* [ENHANCEMENT] Allow changing configmap-reloader, grafana, and kube-rbac-proxy images in $.values.common.images. #1123 #1124 #1125
* [ENHANCEMENT] Add automated version upgrader. #1166
* [ENHANCEMENT] Improve all-namespace addon. #1131
* [ENHANCEMENT] Add example of running without grafana deployment. #1201
* [ENHANCEMENT] Import managed-cluster addon for the EKS platform. #1205
* [ENHANCEMENT] Automatically update jsonnet dependencies. #1220
* [ENHANCEMENT] Adapt kube-prometheus to changes to ovn veth interface names. #1224
* [ENHANCEMENT] Add example release-0.3 to release-0.8 migration to docs. #1235
* [ENHANCEMENT] Consolidate intervals used in prometheus-adapter CPU queries. #1231
* [ENHANCEMENT] Create dashboardDefinitions if rawDashboards or folderDashboards are specified. #1255
* [ENHANCEMENT] Relabel instance with node name for CNI DaemonSet on EKS. #1259
* [ENHANCEMENT] Update doc on Prometheus rule updates since release 0.8. #1253
* [ENHANCEMENT] Point runbooks to https://runbooks.prometheus-operator.dev. #1267
* [ENHANCEMENT] Allow setting of kubeRbacProxyMainResources in kube-state-metrics. #1257
* [ENHANCEMENT] Automate release branch updates. #1293 #1303
* [ENHANCEMENT] Create Thanos Sidecar rules separately from Prometheus ones. #1308
* [ENHANCEMENT] Allow using newer jsonnet-bundler dependency resolution when using windows addon. #1310
* [ENHANCEMENT] Prometheus ruleSelector defaults to all rules.
* [BUGFIX] Fix kube-state-metrics metric denylist regex pattern. #1146
* [BUGFIX] Fix missing resource config in blackbox exporter. #1148
* [BUGFIX] Fix adding private repository. #1169
* [BUGFIX] Fix kops selectors for scheduler, controllerManager and kube-dns. #1164
* [BUGFIX] Fix scheduler and controller selectors for Kubespray. #1142
* [BUGFIX] Fix label selector for coredns ServiceMonitor. #1200
* [BUGFIX] Fix name for blackbox-exporter PodSecurityPolicy. #1213
* [BUGFIX] Fix ingress path rules for networking.k8s.io/v1. #1212
* [BUGFIX] Disable insecure cipher suites for prometheus-adapter. #1216
* [BUGFIX] Fix CNI metrics relabelings on EKS. #1277
* [BUGFIX] Fix node-exporter ignore list for OVN. #1283
* [BUGFIX] Revert back to awscni_total_ip_addresses-based alert on EKS. #1292
* [BUGFIX] Allow passing `thanos: {}` to prometheus configuration. #1325
Makefile (11 changed lines)

@@ -13,6 +13,8 @@ TOOLING=$(EMBEDMD_BIN) $(JB_BIN) $(GOJSONTOYAML_BIN) $(JSONNET_BIN) $(JSONNETLIN
 JSONNETFMT_ARGS=-n 2 --max-blank-lines 2 --string-style s --comment-style s

+KUBE_VERSION?="1.20.0"
+
 all: generate fmt test

 .PHONY: clean

@@ -26,7 +28,7 @@ generate: manifests **.md
 **.md: $(EMBEDMD_BIN) $(shell find examples) build.sh example.jsonnet
	$(EMBEDMD_BIN) -w `find . -name "*.md" | grep -v vendor`

-manifests: examples/kustomize.jsonnet $(GOJSONTOYAML_BIN) vendor build.sh
+manifests: examples/kustomize.jsonnet $(GOJSONTOYAML_BIN) vendor
	./build.sh $<

 vendor: $(JB_BIN) jsonnetfile.json jsonnetfile.lock.json

@@ -34,7 +36,7 @@ vendor: $(JB_BIN) jsonnetfile.json jsonnetfile.lock.json
	$(JB_BIN) install

 crdschemas: vendor
	./scripts/generate-schemas.sh

 .PHONY: update
 update: $(JB_BIN)

@@ -42,8 +44,7 @@ update: $(JB_BIN)
 .PHONY: validate
 validate: crdschemas manifests $(KUBECONFORM_BIN)
-	# Follow-up on https://github.com/instrumenta/kubernetes-json-schema/issues/26 if validations start failing
-	$(KUBECONFORM_BIN) -schema-location 'https://kubernetesjsonschema.dev' -schema-location 'crdschemas/{{ .ResourceKind }}.json' -skip CustomResourceDefinition manifests/
+	$(KUBECONFORM_BIN) -kubernetes-version $(KUBE_VERSION) -schema-location 'default' -schema-location 'crdschemas/{{ .ResourceKind }}.json' -skip CustomResourceDefinition manifests/

 .PHONY: fmt
 fmt: $(JSONNETFMT_BIN)

@@ -58,7 +59,7 @@ lint: $(JSONNETLINT_BIN) vendor
 .PHONY: test
 test: $(JB_BIN)
	$(JB_BIN) install
-	./test.sh
+	./scripts/test.sh

 .PHONY: test-e2e
 test-e2e:
NOTICE (deleted, 5 lines)

@@ -1,5 +0,0 @@
-CoreOS Project
-Copyright 2018 CoreOS, Inc
-
-This product includes software developed at CoreOS, Inc.
-(http://www.coreos.com/).
README.md (26 changed lines)

@@ -70,6 +70,7 @@ If you are migrating from `release-0.7` branch or earlier please read [what chan
 - [Authentication problem](#authentication-problem)
 - [Authorization problem](#authorization-problem)
 - [kube-state-metrics resource usage](#kube-state-metrics-resource-usage)
+- [Error retrieving kube-proxy metrics](#error-retrieving-kube-proxy-metrics)
 - [Contributing](#contributing)
 - [License](#license)

@@ -105,17 +106,17 @@ $ minikube addons disable metrics-server

 The following versions are supported and work as we test against these versions in their respective branches. But note that other versions might work!

-| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 |
-|-----------------------|-----------------|-----------------|-----------------|-----------------|
-| `release-0.5` | ✔ | ✗ | ✗ | ✗ |
-| `release-0.6` | ✗ | ✔ | ✗ | ✗ |
-| `release-0.7` | ✗ | ✔ | ✔ | ✗ |
-| `release-0.8` | ✗ | ✗ | ✔ | ✔ |
-| `HEAD` | ✗ | ✗ | ✔ | ✔ |
+| kube-prometheus stack | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 |
+|------------------------------------------------------------------------------------------|-----------------|-----------------|-----------------|-----------------|-----------------|
+| [`release-0.6`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.6) | ✗ | ✔ | ✗ | ✗ | ✗ |
+| [`release-0.7`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.7) | ✗ | ✔ | ✔ | ✗ | ✗ |
+| [`release-0.8`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.8) | ✗ | ✗ | ✔ | ✔ | ✗ |
+| [`release-0.9`](https://github.com/prometheus-operator/kube-prometheus/tree/release-0.9) | ✗ | ✗ | ✗ | ✔ | ✔ |
+| [`HEAD`](https://github.com/prometheus-operator/kube-prometheus/tree/main) | ✗ | ✗ | ✗ | ✔ | ✔ |

 ## Quickstart

->Note: For versions before Kubernetes v1.20.z refer to the [Kubernetes compatibility matrix](#kubernetes-compatibility-matrix) in order to choose a compatible branch.
+>Note: For versions before Kubernetes v1.21.z refer to the [Kubernetes compatibility matrix](#kubernetes-compatibility-matrix) in order to choose a compatible branch.

 This project is intended to be used as a library (i.e. the intent is not for you to create your own modified copy of this repository).
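As a minimal sketch of that library-style usage, following the pattern the examples in this diff use (the `monitoring` namespace and the choice to render only the Prometheus component are illustrative assumptions, not fixed by this diff):

```jsonnet
// Import the library, override values, and render manifests as JSON objects.
// Assumes dependencies were vendored beforehand with `jb install`.
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: {
      namespace: 'monitoring',
    },
  },
};

// One output file per object, here only for the Prometheus component.
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) }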
@@ -376,7 +377,7 @@ These mixins are selectable via the `platform` field of kubePrometheus:
 (import 'kube-prometheus/main.libsonnet') +
 {
   values+:: {
-    kubePrometheus+: {
+    common+: {
       platform: 'example-platform',
     },
   },

@@ -770,6 +771,13 @@ config. They default to:
 }
 ```

+### Error retrieving kube-proxy metrics
+By default, kubeadm configures kube-proxy to listen on 127.0.0.1 for metrics, so Prometheus is not able to scrape them. The bind address has to be changed to 0.0.0.0 in one of the following two places:
+
+1. Before cluster initialization: the config file passed to `kubeadm init` should contain a KubeProxyConfiguration manifest with the field `metricsBindAddress` set to `0.0.0.0:10249` (a minimal sketch of that manifest follows after this hunk).
+2. If the cluster is already up and running: modify the `kube-proxy` ConfigMap in the `kube-system` namespace, set the `metricsBindAddress` field, and then restart the kube-proxy DaemonSet with
+`kubectl -n kube-system rollout restart daemonset kube-proxy`

 ## Contributing

 All `.yaml` files in the `/manifests` folder are generated via
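For option 1 above, a sketch of the extra document in the kubeadm configuration. It is written as jsonnet for consistency with the rest of this repository, so it would need rendering to YAML (e.g. with gojsontoyaml) before `kubeadm init` can consume it; treat it as a hypothetical illustration rather than a complete kubeadm config:

```jsonnet
// Hypothetical KubeProxyConfiguration document for the kubeadm config file.
{
  apiVersion: 'kubeproxy.config.k8s.io/v1alpha1',
  kind: 'KubeProxyConfiguration',
  // Listen on all interfaces instead of the default 127.0.0.1,
  // so Prometheus can scrape port 10249 from outside the node loopback.
  metricsBindAddress: '0.0.0.0:10249',
}
```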
code-of-conduct.md

@@ -1,4 +1,4 @@
-## CoreOS Community Code of Conduct
+## Community Code of Conduct

 ### Contributor Code of Conduct

@@ -33,29 +33,9 @@ This code of conduct applies both within project spaces and in public spaces
 when an individual is representing the project or its community.

 Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting a project maintainer, Brandon Philips
-<brandon.philips@coreos.com>, and/or Rithu John <rithu.john@coreos.com>.
+reported by contacting a project maintainer listed in
+https://github.com/prometheus-operator/prometheus-operator/blob/master/MAINTAINERS.md.

 This Code of Conduct is adapted from the Contributor Covenant
 (http://contributor-covenant.org), version 1.2.0, available at
 http://contributor-covenant.org/version/1/2/0/
-
-### CoreOS Events Code of Conduct
-
-CoreOS events are working conferences intended for professional networking and
-collaboration in the CoreOS community. Attendees are expected to behave
-according to professional standards and in accordance with their employer’s
-policies on appropriate workplace behavior.
-
-While at CoreOS events or related social networking opportunities, attendees
-should not engage in discriminatory or offensive speech or actions including
-but not limited to gender, sexuality, race, age, disability, or religion.
-Speakers should be especially aware of these concerns.
-
-CoreOS does not condone any statements by speakers contrary to these standards.
-CoreOS reserves the right to deny entrance and/or eject from an event (without
-refund) any individual found to be engaging in discriminatory or offensive
-speech or actions.
-
-Please bring any concerns to the immediate attention of designated on-site
-staff, Brandon Philips <brandon.philips@coreos.com>, and/or Rithu John <rithu.john@coreos.com>.
docs/developing-prometheus-rules-and-grafana-dashboards.md

@@ -219,72 +219,113 @@ local kp = (import 'kube-prometheus/main.libsonnet') + {
 ```
 ### Changing default rules

-Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/kube-prometheus.libsonnet`. The recording rules can be found in [kube-prometheus/rules](../jsonnet/kube-prometheus/rules) and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules) while the alerting rules can be found in [kube-prometheus/alerts](../jsonnet/kube-prometheus/alerts) and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).
+Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/main.libsonnet`. The recording rules can be found in [kube-prometheus/components/mixin/rules](../jsonnet/kube-prometheus/components/mixin/rules) and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules) while the alerting rules can be found in [kube-prometheus/components/mixin/alerts](../jsonnet/kube-prometheus/components/mixin/alerts) and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).

 Knowing which rules to change, the user can now use functions from the [Jsonnet standard library](https://jsonnet.org/ref/stdlib.html) to make these changes. Below are examples of both a filter and an adjustment being made to the default rules. These changes can be assigned to a local variable and then added to the `local kp` object as seen in the examples above.

 #### Filter
-Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
+Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet). You first need to find out in which component the rule is defined (here it is kubernetesControlPlane; a snippet for locating the component follows after this example).
 ```jsonnet
 local filter = {
-  prometheusAlerts+:: {
-    groups: std.map(
-      function(group)
-        if group.name == 'kubernetes-apps' then
-          group {
-            rules: std.filter(function(rule)
-              rule.alert != "KubeStatefulSetReplicasMismatch",
-              group.rules
-            )
-          }
-        else
-          group,
-      super.groups
-    ),
+  kubernetesControlPlane+: {
+    prometheusRule+: {
+      spec+: {
+        groups: std.map(
+          function(group)
+            if group.name == 'kubernetes-apps' then
+              group {
+                rules: std.filter(
+                  function(rule)
+                    rule.alert != 'KubeStatefulSetReplicasMismatch',
+                  group.rules
+                ),
+              }
+            else
+              group,
+          super.groups
+        ),
+      },
+    },
+  },
 };
 ```
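Finding that component does not have to be guesswork; a throwaway snippet like the following sketch (assuming a vendored kube-prometheus on the jsonnet search path; this helper is not part of the repository) maps each component to its rule-group names so the right one can be located:

```jsonnet
// Hypothetical helper: map every component that ships a PrometheusRule
// to the names of its rule groups, e.g. to find 'kubernetes-apps'.
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: { common+: { namespace: 'monitoring' } },
};

{
  [c]: [g.name for g in kp[c].prometheusRule.spec.groups]
  for c in std.objectFields(kp)
  if std.isObject(kp[c]) && std.objectHas(kp[c], 'prometheusRule')
}
```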
 #### Adjustment
-Here the expression for the alert used above is updated from its previous value. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
+Here the expression for another alert in the same component is updated from its previous value. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
 ```jsonnet
 local update = {
-  prometheusAlerts+:: {
-    groups: std.map(
-      function(group)
-        if group.name == 'kubernetes-apps' then
-          group {
-            rules: std.map(
-              function(rule)
-                if rule.alert == "KubeStatefulSetReplicasMismatch" then
-                  rule {
-                    expr: "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\",statefulset!=\"vault\"} != kube_statefulset_status_replicas{job=\"kube-state-metrics\",statefulset!=\"vault\"}"
-                  }
-                else
-                  rule,
-              group.rules
-            )
-          }
-        else
-          group,
-      super.groups
-    ),
+  kubernetesControlPlane+: {
+    prometheusRule+: {
+      spec+: {
+        groups: std.map(
+          function(group)
+            if group.name == 'kubernetes-apps' then
+              group {
+                rules: std.map(
+                  function(rule)
+                    if rule.alert == 'KubePodCrashLooping' then
+                      rule {
+                        expr: 'rate(kube_pod_container_status_restarts_total{namespace="kube-system",job="kube-state-metrics"}[10m]) * 60 * 5 > 0',
+                      }
+                    else
+                      rule,
+                  group.rules
+                ),
+              }
+            else
+              group,
+          super.groups
+        ),
+      },
+    },
+  },
 };
 ```

 Using the example from above about adding in pre-rendered rules, the new local variables can be added in as follows:
 ```jsonnet
-local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + filter + update + {
-  prometheusAlerts+:: (import 'existingrule.json'),
-};
+local add = {
+  exampleApplication:: {
+    prometheusRule+: {
+      apiVersion: 'monitoring.coreos.com/v1',
+      kind: 'PrometheusRule',
+      metadata: {
+        name: 'example-application-rules',
+        namespace: $.values.common.namespace,
+      },
+      spec: (import 'existingrule.json'),
+    },
+  },
+};

-{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
-{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
-local kp = (import 'kube-prometheus/main.libsonnet') + filter + update + add;
+local kp = (import 'kube-prometheus/main.libsonnet') +
+  filter +
+  update +
+  add + {
+    values+:: {
+      common+: {
+        namespace: 'monitoring',
+      },
+    },
+  };
+{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
+{
+  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
+  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
+} +
+// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
+{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
+{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
+{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
 { ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
 { ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
 { ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
 { ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
 { ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
 { ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
-{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
+{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
+{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
+{ ['exampleApplication-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
 ```

 ## Dashboards

@@ -479,3 +520,39 @@ values+:: {
   },
 } + myMixin.grafanaDashboards
 ```

+Full example of including etcd mixin using method described above:
+
+[embedmd]:# (../examples/mixin-inclusion.jsonnet)
+```jsonnet
+local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
+local etcdMixin = addMixin({
+  name: 'etcd',
+  mixin: (import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
+    _config+: {},  // mixin configuration object
+  },
+});
+
+local kp = (import 'kube-prometheus/main.libsonnet') +
+  {
+    values+:: {
+      common+: {
+        namespace: 'monitoring',
+      },
+      grafana+: {
+        // Adding new dashboard to grafana. This will modify grafana configMap with dashboards
+        dashboards+: etcdMixin.grafanaDashboards,
+      },
+    },
+  };
+
+{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
+{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
+{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
+{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
+{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
+{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
+{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
+// Rendering prometheusRules object. This is an object compatible with prometheus-operator CRD definition for prometheusRule
+{ 'external-mixins/etcd-mixin-prometheus-rules': etcdMixin.prometheusRules }
+```
docs/migration-example/my.release-0.3.jsonnet (new file, 296 lines)

@@ -0,0 +1,296 @@
// Has the following customisations
// Custom alert manager config
// Ingresses for the alert manager, prometheus and grafana
// Grafana admin user password
// Custom prometheus rules
// Custom grafana dashboards
// Custom prometheus config - Data retention, memory, etc.
// Node exporter role and role binding so we can use a PSP for the node exporter

// External variables
// See https://jsonnet.org/learning/tutorial.html
local cluster_identifier = std.extVar('cluster_identifier');
local etcd_ip = std.extVar('etcd_ip');
local etcd_tls_ca = std.extVar('etcd_tls_ca');
local etcd_tls_cert = std.extVar('etcd_tls_cert');
local etcd_tls_key = std.extVar('etcd_tls_key');
local grafana_admin_password = std.extVar('grafana_admin_password');
local prometheus_data_retention_period = std.extVar('prometheus_data_retention_period');
local prometheus_request_memory = std.extVar('prometheus_request_memory');

// Derived variables
local alert_manager_host = 'alertmanager.' + cluster_identifier + '.myorg.local';
local grafana_host = 'grafana.' + cluster_identifier + '.myorg.local';
local prometheus_host = 'prometheus.' + cluster_identifier + '.myorg.local';

// Imports
local k = import 'ksonnet/ksonnet.beta.3/k.libsonnet';
local ingress = k.extensions.v1beta1.ingress;
local ingressRule = ingress.mixin.spec.rulesType;
local ingressRuleHttpPath = ingressRule.mixin.http.pathsType;
local ingressTls = ingress.mixin.spec.tlsType;
local role = k.rbac.v1.role;
local roleBinding = k.rbac.v1.roleBinding;
local roleRulesType = k.rbac.v1.role.rulesType;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-static-etcd.libsonnet') +
  {
    _config+:: {
      // Override namespace
      namespace: 'monitoring',

      // Override alert manager config
      // See https://github.com/coreos/kube-prometheus/tree/master/examples/alertmanager-config-external.jsonnet
      alertmanager+: {
        config: importstr 'alertmanager.yaml',
      },

      // Override etcd config
      // See https://github.com/coreos/kube-prometheus/blob/master/jsonnet/kube-prometheus/kube-prometheus-static-etcd.libsonnet
      // See https://github.com/coreos/kube-prometheus/blob/master/examples/etcd-skip-verify.jsonnet
      etcd+:: {
        clientCA: etcd_tls_ca,
        clientCert: etcd_tls_cert,
        clientKey: etcd_tls_key,
        ips: [etcd_ip],
      },

      // Override grafana config
      // anonymous access
      //   See http://docs.grafana.org/installation/configuration/
      //   See http://docs.grafana.org/auth/overview/#anonymous-authentication
      // admin_password
      //   See http://docs.grafana.org/installation/configuration/#admin-password
      grafana+:: {
        config: {
          sections: {
            'auth.anonymous': {
              enabled: true,
            },
            security: {
              admin_password: grafana_admin_password,
            },
          },
        },
      },
    },

    // Additional grafana dashboards
    grafanaDashboards+:: {
      'my-specific.json': (import 'my-grafana-dashboard-definitions.json'),
    },

    // Alert manager needs an externalUrl
    alertmanager+:: {
      alertmanager+: {
        spec+: {
          // See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
          // See https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md
          externalUrl: 'https://' + alert_manager_host,
        },
      },
    },

    // Add additional ingresses
    // See https://github.com/coreos/kube-prometheus/tree/master/examples/ingress.jsonnet
    ingress+:: {
      alertmanager:
        ingress.new() +
        ingress.mixin.metadata.withName('alertmanager') +
        ingress.mixin.metadata.withNamespace($._config.namespace) +
        ingress.mixin.metadata.withAnnotations({
          'kubernetes.io/ingress.class': 'nginx-api',
        }) +
        ingress.mixin.spec.withRules(
          ingressRule.new() +
          ingressRule.withHost(alert_manager_host) +
          ingressRule.mixin.http.withPaths(
            ingressRuleHttpPath.new() +
            ingressRuleHttpPath.mixin.backend.withServiceName('alertmanager-operated') +
            ingressRuleHttpPath.mixin.backend.withServicePort(9093)
          ),
        ) +
        // Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
        // secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
        ingress.mixin.spec.withTls(
          ingressTls.new() +
          ingressTls.withHosts(alert_manager_host)
        ),

      grafana:
        ingress.new() +
        ingress.mixin.metadata.withName('grafana') +
        ingress.mixin.metadata.withNamespace($._config.namespace) +
        ingress.mixin.metadata.withAnnotations({
          'kubernetes.io/ingress.class': 'nginx-api',
        }) +
        ingress.mixin.spec.withRules(
          ingressRule.new() +
          ingressRule.withHost(grafana_host) +
          ingressRule.mixin.http.withPaths(
            ingressRuleHttpPath.new() +
            ingressRuleHttpPath.mixin.backend.withServiceName('grafana') +
            ingressRuleHttpPath.mixin.backend.withServicePort(3000)
          ),
        ) +
        // Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
        // secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
        ingress.mixin.spec.withTls(
          ingressTls.new() +
          ingressTls.withHosts(grafana_host)
        ),

      prometheus:
        ingress.new() +
        ingress.mixin.metadata.withName('prometheus') +
        ingress.mixin.metadata.withNamespace($._config.namespace) +
        ingress.mixin.metadata.withAnnotations({
          'kubernetes.io/ingress.class': 'nginx-api',
        }) +
        ingress.mixin.spec.withRules(
          ingressRule.new() +
          ingressRule.withHost(prometheus_host) +
          ingressRule.mixin.http.withPaths(
            ingressRuleHttpPath.new() +
            ingressRuleHttpPath.mixin.backend.withServiceName('prometheus-operated') +
            ingressRuleHttpPath.mixin.backend.withServicePort(9090)
          ),
        ) +
        // Note we do not need a TLS secretName here as we are going to use the nginx-ingress default secret which is a wildcard
        // secretName would need to be in the same namespace at this time, see https://github.com/kubernetes/ingress-nginx/issues/2371
        ingress.mixin.spec.withTls(
          ingressTls.new() +
          ingressTls.withHosts(prometheus_host)
        ),
    },

    // Node exporter PSP role and role binding
    // Add a new top level field for this, the "node-exporter" PSP already exists, so not defining here just referencing
    // See https://github.com/coreos/prometheus-operator/issues/787
    nodeExporterPSP: {
      role:
        role.new() +
        role.mixin.metadata.withName('node-exporter-psp') +
        role.mixin.metadata.withNamespace($._config.namespace) +
        role.withRules([
          roleRulesType.new() +
          roleRulesType.withApiGroups(['policy']) +
          roleRulesType.withResources(['podsecuritypolicies']) +
          roleRulesType.withVerbs(['use']) +
          roleRulesType.withResourceNames(['node-exporter']),
        ]),

      roleBinding:
        roleBinding.new() +
        roleBinding.mixin.roleRef.withApiGroup('rbac.authorization.k8s.io') +
        roleBinding.mixin.metadata.withName('node-exporter-psp') +
        roleBinding.mixin.metadata.withNamespace($._config.namespace) +
        roleBinding.mixin.roleRef.withName('node-exporter-psp') +
        roleBinding.mixin.roleRef.mixinInstance({ kind: 'Role' }) +
        roleBinding.withSubjects([{ kind: 'ServiceAccount', name: 'node-exporter' }]),
    },

    // Prometheus needs some extra custom config
    prometheus+:: {
      prometheus+: {
        spec+: {
          // See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
          externalLabels: {
            cluster: cluster_identifier,
          },
          // See https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
          // See https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/exposing-prometheus-and-alertmanager.md
          externalUrl: 'https://' + prometheus_host,
          // Override request memory
          resources: {
            requests: {
              memory: prometheus_request_memory,
            },
          },
          // Override data retention period
          retention: prometheus_data_retention_period,
        },
      },
    },

    // Additional prometheus rules
    // See https://github.com/coreos/kube-prometheus/docs/developing-prometheus-rules-and-grafana-dashboards.md
    // cat my-prometheus-rules.yaml | gojsontoyaml -yamltojson | jq . > my-prometheus-rules.json
    prometheusRules+:: {
      groups+: import 'my-prometheus-rules.json',
    },
  };

// Render
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ [name + '-ingress']: kp.ingress[name] for name in std.objectFields(kp.ingress) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['node-exporter-psp-' + name]: kp.nodeExporterPSP[name] for name in std.objectFields(kp.nodeExporterPSP) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }
316
docs/migration-example/my.release-0.8.jsonnet
Normal file
316
docs/migration-example/my.release-0.8.jsonnet
Normal file
@@ -0,0 +1,316 @@
|
||||
// Has the following customisations
|
||||
// Custom alert manager config
|
||||
// Ingresses for the alert manager, prometheus and grafana
|
||||
// Grafana admin user password
|
||||
// Custom prometheus rules
|
||||
// Custom grafana dashboards
|
||||
// Custom prometheus config - Data retention, memory, etc.
|
||||
// Node exporter role and role binding so we can use a PSP for the node exporter
|
||||
|
||||
// for help with expected content, see https://github.com/thaum-xyz/ankhmorpork
|
||||
|
||||
// External variables
|
||||
// See https://jsonnet.org/learning/tutorial.html
|
||||
local cluster_identifier = std.extVar('cluster_identifier');
|
||||
local etcd_ip = std.extVar('etcd_ip');
|
||||
local etcd_tls_ca = std.extVar('etcd_tls_ca');
|
||||
local etcd_tls_cert = std.extVar('etcd_tls_cert');
|
||||
local etcd_tls_key = std.extVar('etcd_tls_key');
|
||||
local grafana_admin_password = std.extVar('grafana_admin_password');
|
||||
local prometheus_data_retention_period = std.extVar('prometheus_data_retention_period');
|
||||
local prometheus_request_memory = std.extVar('prometheus_request_memory');
|
||||
|
||||
|
||||
// Derived variables
|
||||
local alert_manager_host = 'alertmanager.' + cluster_identifier + '.myorg.local';
|
||||
local grafana_host = 'grafana.' + cluster_identifier + '.myorg.local';
|
||||
local prometheus_host = 'prometheus.' + cluster_identifier + '.myorg.local';
|
||||
|
||||
|
||||
// ksonnet no longer required
|
||||
|
||||
|
||||
local kp =
|
||||
(import 'kube-prometheus/main.libsonnet') +
|
||||
// kubeadm now achieved by setting platform value - see 9 lines below
|
||||
(import 'kube-prometheus/addons/static-etcd.libsonnet') +
|
||||
(import 'kube-prometheus/addons/podsecuritypolicies.libsonnet') +
|
||||
{
|
||||
values+:: {
|
||||
common+: {
|
||||
namespace: 'monitoring',
|
||||
},
|
||||
|
||||
// Add kubeadm platform-specific items,
|
||||
// including kube-contoller-manager and kube-scheduler discovery
|
||||
kubePrometheus+: {
|
||||
platform: 'kubeadm',
|
||||
},
|
||||
|
||||
// Override alert manager config
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/alertmanager-config-external.jsonnet
|
||||
alertmanager+: {
|
||||
config: importstr 'alertmanager.yaml',
|
||||
},
|
||||
|
||||
// Override etcd config
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/static-etcd.libsonnet
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/etcd-skip-verify.jsonnet
|
||||
etcd+:: {
|
||||
clientCA: etcd_tls_ca,
|
||||
clientCert: etcd_tls_cert,
|
||||
clientKey: etcd_tls_key,
|
||||
ips: [etcd_ip],
|
||||
},
|
||||
|
||||
// Override grafana config
|
||||
// anonymous access
|
||||
// See http://docs.grafana.org/installation/configuration/
|
||||
// See http://docs.grafana.org/auth/overview/#anonymous-authentication
|
||||
// admin_password
|
||||
// See http://docs.grafana.org/installation/configuration/#admin-password
|
||||
grafana+:: {
|
||||
config: {
|
||||
sections: {
|
||||
'auth.anonymous': {
|
||||
enabled: true,
|
||||
},
|
||||
security: {
|
||||
admin_password: grafana_admin_password,
|
||||
},
|
||||
},
|
||||
},
|
||||
// Additional grafana dashboards
|
||||
dashboards+:: {
|
||||
'my-specific.json': (import 'my-grafana-dashboard-definitions.json'),
|
||||
},
|
||||
},
|
||||
},
|
||||
|
||||
|
||||
// Alert manager needs an externalUrl
|
||||
alertmanager+:: {
|
||||
alertmanager+: {
|
||||
spec+: {
|
||||
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/exposing-prometheus-alertmanager-grafana-ingress.md
|
||||
externalUrl: 'https://' + alert_manager_host,
|
||||
},
|
||||
},
|
||||
},
|
||||
|
||||
|
||||
// Add additional ingresses
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/examples/ingress.jsonnet
|
||||
ingress+:: {
|
||||
alertmanager: {
|
||||
apiVersion: 'networking.k8s.io/v1',
|
||||
kind: 'Ingress',
|
||||
metadata: {
|
||||
name: 'alertmanager',
|
||||
namespace: $.values.common.namespace,
|
||||
annotations: {
|
||||
'kubernetes.io/ingress.class': 'nginx-api',
|
||||
},
|
||||
},
|
||||
spec: {
|
||||
rules: [{
|
||||
host: alert_manager_host,
|
||||
http: {
|
||||
paths: [{
|
||||
path: '/',
|
||||
pathType: 'Prefix',
|
||||
backend: {
|
||||
service: {
|
||||
name: 'alertmanager-operated',
|
||||
port: {
|
||||
number: 9093,
|
||||
},
|
||||
},
|
||||
},
|
||||
}],
|
||||
},
|
||||
}],
|
||||
tls: [{
|
||||
|
||||
hosts: [alert_manager_host],
|
||||
}],
|
||||
},
|
||||
},
|
||||
grafana: {
|
||||
apiVersion: 'networking.k8s.io/v1',
|
||||
kind: 'Ingress',
|
||||
metadata: {
|
||||
name: 'grafana',
|
||||
namespace: $.values.common.namespace,
|
||||
annotations: {
|
||||
'kubernetes.io/ingress.class': 'nginx-api',
|
||||
},
|
||||
},
|
||||
spec: {
|
||||
rules: [{
|
||||
host: grafana_host,
|
||||
http: {
|
||||
paths: [{
|
||||
path: '/',
|
||||
pathType: 'Prefix',
|
||||
backend: {
|
||||
service: {
|
||||
name: 'grafana',
|
||||
port: {
|
||||
number: 3000,
|
||||
},
|
||||
},
|
||||
},
|
||||
}],
|
||||
},
|
||||
}],
|
||||
tls: [{
|
||||
|
||||
hosts: [grafana_host],
|
||||
}],
|
||||
},
|
||||
},
|
||||
prometheus: {
|
||||
apiVersion: 'networking.k8s.io/v1',
|
||||
kind: 'Ingress',
|
||||
metadata: {
|
||||
name: 'prometheus',
|
||||
namespace: $.values.common.namespace,
|
||||
annotations: {
|
||||
'kubernetes.io/ingress.class': 'nginx-api',
|
||||
},
|
||||
},
|
||||
spec: {
|
||||
rules: [{
|
||||
host: prometheus_host,
|
||||
http: {
|
||||
paths: [{
|
||||
path: '/',
|
||||
pathType: 'Prefix',
|
||||
backend: {
|
||||
service: {
|
||||
name: 'prometheus-operated',
|
||||
port: {
|
||||
number: 9090,
|
||||
},
|
||||
},
|
||||
},
|
||||
}],
|
||||
},
|
||||
}],
|
||||
tls: [{
|
||||
|
||||
hosts: [prometheus_host],
|
||||
}],
|
||||
},
|
||||
},
|
||||
},
|
||||
|
||||
|
||||
// Node exporter PSP role and role binding
|
||||
nodeExporter+: {
|
||||
'psp-role'+: {
|
||||
apiVersion: 'rbac.authorization.k8s.io/v1',
|
||||
kind: 'Role',
|
||||
metadata: {
|
||||
name: 'node-exporter-psp',
|
||||
namespace: $.values.common.namespace,
|
||||
},
|
||||
rules: [{
|
||||
apiGroups: ['policy'],
|
||||
resources: ['podsecuritypolicies'],
|
||||
verbs: ['use'],
|
||||
resourceNames: ['node-exporter'],
|
||||
}],
|
||||
},
|
||||
'psp-rolebinding'+: {
|
||||
|
||||
apiVersion: 'rbac.authorization.k8s.io/v1',
|
||||
kind: 'RoleBinding',
|
||||
metadata: {
|
||||
name: 'node-exporter-psp',
|
||||
namespace: $.values.common.namespace,
|
||||
},
|
||||
roleRef: {
|
||||
apiGroup: 'rbac.authorization.k8s.io',
|
||||
name: 'node-exporter-psp',
|
||||
kind: 'Role',
|
||||
},
|
||||
subjects: [{
|
||||
kind: 'ServiceAccount',
|
||||
name: 'node-exporter',
|
||||
}],
|
||||
},
|
||||
},
|
||||
|
||||
// Prometheus needs some extra custom config
|
||||
prometheus+:: {
|
||||
prometheus+: {
|
||||
spec+: {
|
||||
|
||||
externalLabels: {
|
||||
cluster: cluster_identifier,
|
||||
},
|
||||
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/exposing-prometheus-alertmanager-grafana-ingress.md
|
||||
externalUrl: 'https://' + prometheus_host,
|
||||
// Override reuest memory
|
||||
resources: {
|
||||
requests: {
|
||||
memory: prometheus_request_memory,
|
||||
},
|
||||
},
|
||||
// Override data retention period
|
||||
retention: prometheus_data_retention_period,
|
||||
},
|
||||
},
|
||||
},
|
||||
|
||||
|
||||
// Additional prometheus rules
|
||||
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/developing-prometheus-rules-and-grafana-dashboards.md#pre-rendered-rules
|
||||
// cat my-prometheus-rules.yaml | gojsontoyaml -yamltojson | jq . > my-prometheus-rules.json
|
||||
  prometheusMe: {
    rules: {
      apiVersion: 'monitoring.coreos.com/v1',
      kind: 'PrometheusRule',
      metadata: {
        name: 'my-prometheus-rule',
        namespace: $.values.common.namespace,
        labels: {
          'app.kubernetes.io/name': 'kube-prometheus',
          'app.kubernetes.io/part-of': 'kube-prometheus',
          prometheus: 'k8s',
          role: 'alert-rules',
        },
      },
      spec: {
        groups: import 'my-prometheus-rules.json',
      },
    },
  },
};


// Render
{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ [name + '-ingress']: kp.ingress[name] for name in std.objectFields(kp.ingress) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) }
+ { ['prometheus-my-' + name]: kp.prometheusMe[name] for name in std.objectFields(kp.prometheusMe) }
250  docs/migration-example/readme.md  Normal file
@@ -0,0 +1,250 @@
## Example of conversion of a legacy my.jsonnet file

An example conversion of a legacy custom jsonnet file to the release-0.8
format can be seen by viewing and comparing the
[release-0.3 jsonnet file](./my.release-0.3.jsonnet) (from when the GitHub
repo lived under `https://github.com/coreos/kube-prometheus...`)
with the corresponding [release-0.8 jsonnet file](./my.release-0.8.jsonnet).

Blank lines have been added to both files where necessary so that they
can be compared side-by-side and line-by-line on screen.

The conversion covers both the move away from ksonnet after
release-0.3 and the major migration after release-0.7, as described in
[migration-guide.md](../migration-guide.md).

The sample files are intended as an example of format conversion,
not necessarily as best practice for the files in release-0.3 or release-0.8.

Below are three sample extracts of the conversion, indicating the
changes required.
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>

```jsonnet
local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-static-etcd.libsonnet') +

{
  _config+:: {
    // Override namespace
    namespace: 'monitoring',







```

</td>
<td>

```jsonnet
local kp =
  (import 'kube-prometheus/main.libsonnet') +
  // kubeadm now achieved by setting platform value - see 9 lines below
  (import 'kube-prometheus/addons/static-etcd.libsonnet') +
  (import 'kube-prometheus/addons/podsecuritypolicies.libsonnet') +
{
  values+:: {
    common+: {
      namespace: 'monitoring',
    },

    // Add kubeadm platform-specific items,
    // including kube-controller-manager and kube-scheduler discovery
    kubePrometheus+: {
      platform: 'kubeadm',
    },
```

</td>
</tr>
</table>
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>

```jsonnet
// Add additional ingresses
// See https://github.com/coreos/kube-prometheus/...
// tree/master/examples/ingress.jsonnet
ingress+:: {
  alertmanager:
    ingress.new() +


    ingress.mixin.metadata.withName('alertmanager') +
    ingress.mixin.metadata.withNamespace($._config.namespace) +
    ingress.mixin.metadata.withAnnotations({
      'kubernetes.io/ingress.class': 'nginx-api',
    }) +

    ingress.mixin.spec.withRules(
      ingressRule.new() +
      ingressRule.withHost(alert_manager_host) +
      ingressRule.mixin.http.withPaths(
        ingressRuleHttpPath.new() +




        ingressRuleHttpPath.mixin.backend
        .withServiceName('alertmanager-operated') +
        ingressRuleHttpPath.mixin.backend.withServicePort(9093)
      ),
    ) +
    // Note we do not need a TLS secretName here as we are going to use the
    // nginx-ingress default secret which is a wildcard
    // secretName would need to be in the same namespace at this time,
    // see https://github.com/kubernetes/ingress-nginx/issues/2371
    ingress.mixin.spec.withTls(
      ingressTls.new() +
      ingressTls.withHosts(alert_manager_host)
    ),


```

</td>
<td>

```jsonnet
// Add additional ingresses
// See https://github.com/prometheus-operator/kube-prometheus/...
// blob/main/examples/ingress.jsonnet
ingress+:: {
  alertmanager: {
    apiVersion: 'networking.k8s.io/v1',
    kind: 'Ingress',
    metadata: {
      name: 'alertmanager',
      namespace: $.values.common.namespace,
      annotations: {
        'kubernetes.io/ingress.class': 'nginx-api',
      },
    },
    spec: {
      rules: [{
        host: alert_manager_host,
        http: {
          paths: [{
            path: '/',
            pathType: 'Prefix',
            backend: {
              service: {
                name: 'alertmanager-operated',
                port: {
                  number: 9093,
                },
              },
            },
          }],
        },
      }],
      tls: [{

        hosts: [alert_manager_host],
      }],
    },
  },
```

</td>
</tr>
</table>
<table>
<tr>
<th> release-0.3 </th>
<th> release-0.8 </th>
</tr>
<tr>
<td>

```jsonnet
// Additional prometheus rules
// See https://github.com/coreos/kube-prometheus/docs/...
// developing-prometheus-rules-and-grafana-dashboards.md
//
// cat my-prometheus-rules.yaml | \
// gojsontoyaml -yamltojson | \
// jq . > my-prometheus-rules.json
prometheusRules+:: {














  groups+: import 'my-prometheus-rules.json',


},
};




```

</td>
<td>

```jsonnet
// Additional prometheus rules
// See https://github.com/prometheus-operator/kube-prometheus/blob/main/...
// docs/developing-prometheus-rules-and-grafana-dashboards.md...
// #pre-rendered-rules
// cat my-prometheus-rules.yaml | \
// gojsontoyaml -yamltojson | \
// jq . > my-prometheus-rules.json
prometheusMe: {
  rules: {
    apiVersion: 'monitoring.coreos.com/v1',
    kind: 'PrometheusRule',
    metadata: {
      name: 'my-prometheus-rule',
      namespace: $.values.common.namespace,
      labels: {
        'app.kubernetes.io/name': 'kube-prometheus',
        'app.kubernetes.io/part-of': 'kube-prometheus',
        prometheus: 'k8s',
        role: 'alert-rules',
      },
    },
    spec: {
      groups: import 'my-prometheus-rules.json',
    },
  },
},
};

...

+ { ['prometheus-my-' + name]: kp.prometheusMe[name] for name in std.objectFields(kp.prometheusMe) }
```

</td>
</tr>
</table>
@@ -61,6 +61,10 @@ This results in creating multiple `PrometheusRule` objects instead of having one

All examples from the `examples/` directory were adapted to the new codebase. [Please take a look at them for guidance](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples).

## Legacy migration

An example conversion of a legacy release-0.3 my.jsonnet file to release-0.8 can be found in [migration-example](./migration-example).

## Advanced usage examples

For more advanced usage examples you can take a look at these two publicly available implementations:
92  examples/changing-default-rules.libsonnet  Normal file
@@ -0,0 +1,92 @@
local filter = {
  kubernetesControlPlane+: {
    prometheusRule+:: {
      spec+: {
        groups: std.map(
          function(group)
            if group.name == 'kubernetes-apps' then
              group {
                rules: std.filter(
                  function(rule)
                    rule.alert != 'KubeStatefulSetReplicasMismatch',
                  group.rules
                ),
              }
            else
              group,
          super.groups
        ),
      },
    },
  },
};
local update = {
  kubernetesControlPlane+: {
    prometheusRule+:: {
      spec+: {
        groups: std.map(
          function(group)
            if group.name == 'kubernetes-apps' then
              group {
                rules: std.map(
                  function(rule)
                    if rule.alert == 'KubePodCrashLooping' then
                      rule {
                        expr: 'rate(kube_pod_container_status_restarts_total{namespace="kube-system",job="kube-state-metrics"}[10m]) * 60 * 5 > 0',
                      }
                    else
                      rule,
                  group.rules
                ),
              }
            else
              group,
          super.groups
        ),
      },
    },
  },
};

local add = {
  exampleApplication:: {
    prometheusRule+: {
      apiVersion: 'monitoring.coreos.com/v1',
      kind: 'PrometheusRule',
      metadata: {
        name: 'example-application-rules',
        namespace: $.values.common.namespace,
      },
      spec: (import 'existingrule.json'),
    },
  },
};
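// Note: filter, update, and add are plain mixins; merging them after
// main.libsonnet below means later objects win where fields overlap.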
local kp = (import 'kube-prometheus/main.libsonnet') +
           filter +
           update +
           add + {
  values+:: {
    common+: {
      namespace: 'monitoring',
    },
  },
};

{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ ['exampleApplication-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
36  examples/grafana-ldap.jsonnet  Normal file
@@ -0,0 +1,36 @@
local kp =
  (import 'kube-prometheus/main.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring',
      },
      grafana+: {
        config+: {
          sections: {
            'auth.ldap': {
              enabled: true,
              config_file: '/etc/grafana/ldap.toml',
              allow_sign_up: true,
            },
          },
        },
        ldap: |||
          [[servers]]
          host = "127.0.0.1"
          port = 389
          use_ssl = false
          start_tls = false
          ssl_skip_verify = false

          bind_dn = "cn=admins,dc=example,dc=com"
          bind_password = 'grafana'

          search_filter = "(cn=%s)"
          search_base_dns = ["dc=example,dc=com"]
        |||,
      },
    },
  };

{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
25  examples/grafana-only-dashboards.jsonnet  Normal file
@@ -0,0 +1,25 @@
local kp =
  (import 'kube-prometheus/main.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring',
      },
    },

    // Disable all grafana-related objects apart from dashboards and datasource
    grafana: {
      dashboardSources:: {},
      deployment:: {},
      serviceAccount:: {},
      serviceMonitor:: {},
      service:: {},
    },
  };

// Manifestation
{
  [component + '-' + resource + '.json']: kp[component][resource]
  for component in std.objectFields(kp)
  for resource in std.objectFields(kp[component])
}
@@ -54,10 +54,14 @@ local kp =
          host: 'alertmanager.example.com',
          http: {
            paths: [{
              path: '/',
              pathType: 'Prefix',
              backend: {
                service: {
                  name: 'alertmanager-main',
                  port: 'web',
                  port: {
                    name: 'web',
                  },
                },
              },
            }],
@@ -71,10 +75,14 @@ local kp =
          host: 'grafana.example.com',
          http: {
            paths: [{
              path: '/',
              pathType: 'Prefix',
              backend: {
                service: {
                  name: 'grafana',
                  port: 'http',
                  port: {
                    name: 'http',
                  },
                },
              },
            }],
@@ -88,10 +96,14 @@ local kp =
          host: 'prometheus.example.com',
          http: {
            paths: [{
              path: '/',
              pathType: 'Prefix',
              backend: {
                service: {
                  name: 'prometheus-k8s',
                  port: 'web',
                  port: {
                    name: 'web',
                  },
                },
              },
            }],

@@ -1,7 +1,7 @@
(import 'kube-prometheus/main.libsonnet') +
{
  values+:: {
    kubePrometheus+: {
    common+: {
      platform: 'example-platform',
    },
  },
20  examples/kubeProxy.jsonnet  Normal file
@@ -0,0 +1,20 @@
local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: {
      namespace: 'monitoring',
    },

    kubernetesControlPlane+: {
      kubeProxy: true,
    },
  },
};

{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }
30  examples/mixin-inclusion.jsonnet  Normal file
@@ -0,0 +1,30 @@
local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
local etcdMixin = addMixin({
  name: 'etcd',
  mixin: (import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
    _config+: {},  // mixin configuration object
  },
});
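// addMixin returns an object carrying the rendered mixin pieces; this file
// adds etcdMixin.grafanaDashboards to grafana below and renders
// etcdMixin.prometheusRules at the bottom.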

local kp = (import 'kube-prometheus/main.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring',
      },
      grafana+: {
        // Adding new dashboard to grafana. This will modify grafana configMap with dashboards
        dashboards+: etcdMixin.grafanaDashboards,
      },
    },
  };

{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
// Rendering prometheusRules object. This is an object compatible with prometheus-operator CRD definition for prometheusRule
{ 'external-mixins/etcd-mixin-prometheus-rules': etcdMixin.prometheusRules }
@@ -1,9 +0,0 @@
#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u

kubectl apply -f examples/example-app
@@ -1,9 +0,0 @@
#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u

kubectl delete -f examples/example-app
@@ -1,11 +1,22 @@
{
  prometheus+:: {
    clusterRole+: {
      rules+: [{
        apiGroups: [''],
        resources: ['services', 'endpoints', 'pods'],
        verbs: ['get', 'list', 'watch'],
      }],
      rules+: [
        {
          apiGroups: [''],
          resources: ['services', 'endpoints', 'pods'],
          verbs: ['get', 'list', 'watch'],
        },
        {
          apiGroups: ['networking.k8s.io'],
          resources: ['ingresses'],
          verbs: ['get', 'list', 'watch'],
        },
      ],
    },
    // There is no need for specific namespaces RBAC as this addon grants
    // all required permissions for every namespace
    roleBindingSpecificNamespaces:: null,
    roleSpecificNamespaces:: null,
  },
}

@@ -18,7 +18,7 @@
    },
  },

  local antiaffinity(labelSelector, namespace, type, topologyKey) = {
  antiaffinity(labelSelector, namespace, type, topologyKey):: {
    local podAffinityTerm = {
      namespaces: [namespace],
      topologyKey: topologyKey,
@@ -44,7 +44,7 @@
  alertmanager+: {
    alertmanager+: {
      spec+:
        antiaffinity(
        $.antiaffinity(
          $.alertmanager._config.selectorLabels,
          $.values.common.namespace,
          $.values.alertmanager.podAntiAffinity,
@@ -56,7 +56,7 @@
  prometheus+: {
    prometheus+: {
      spec+:
        antiaffinity(
        $.antiaffinity(
          $.prometheus._config.selectorLabels,
          $.values.common.namespace,
          $.values.prometheus.podAntiAffinity,
@@ -70,7 +70,7 @@
      spec+: {
        template+: {
          spec+:
            antiaffinity(
            $.antiaffinity(
              $.blackboxExporter._config.selectorLabels,
              $.values.common.namespace,
              $.values.blackboxExporter.podAntiAffinity,
@@ -86,7 +86,7 @@
      spec+: {
        template+: {
          spec+:
            antiaffinity(
            $.antiaffinity(
              $.prometheusAdapter._config.selectorLabels,
              $.values.common.namespace,
              $.values.prometheusAdapter.podAntiAffinity,

110  jsonnet/kube-prometheus/addons/aws-vpc-cni.libsonnet  Normal file
@@ -0,0 +1,110 @@
{
  values+:: {
    awsVpcCni: {
      // `minimumWarmIPs` should be less than or equal to `WARM_IP_TARGET`.
      //
      // References:
      // https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/docs/eni-and-ip-target.md
      // https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/pkg/ipamd/ipamd.go#L61-L71
      minimumWarmIPs: 10,
      minimumWarmIPsTime: '10m',
    },
  },
  kubernetesControlPlane+: {
    serviceAwsVpcCni: {
      apiVersion: 'v1',
      kind: 'Service',
      metadata: {
        name: 'aws-node',
        namespace: 'kube-system',
        labels: { 'app.kubernetes.io/name': 'aws-node' },
      },
      spec: {
        ports: [
          {
            name: 'cni-metrics-port',
            port: 61678,
            targetPort: 61678,
          },
        ],
        selector: { 'app.kubernetes.io/name': 'aws-node' },
        clusterIP: 'None',
      },
    },

    serviceMonitorAwsVpcCni: {
      apiVersion: 'monitoring.coreos.com/v1',
      kind: 'ServiceMonitor',
      metadata: {
        name: 'aws-node',
        namespace: $.values.common.namespace,
        labels: {
          'app.kubernetes.io/name': 'aws-node',
        },
      },
      spec: {
        jobLabel: 'app.kubernetes.io/name',
        selector: {
          matchLabels: {
            'app.kubernetes.io/name': 'aws-node',
          },
        },
        namespaceSelector: {
          matchNames: [
            'kube-system',
          ],
        },
        endpoints: [
          {
            port: 'cni-metrics-port',
            interval: '30s',
            path: '/metrics',
            relabelings: [
              {
                action: 'replace',
                regex: '(.*)',
                replacement: '$1',
                sourceLabels: ['__meta_kubernetes_pod_node_name'],
                targetLabel: 'instance',
              },
            ],
          },
        ],
      },
    },

    prometheusRuleAwsVpcCni: {
      apiVersion: 'monitoring.coreos.com/v1',
      kind: 'PrometheusRule',
      metadata: {
        labels: $.prometheus._config.commonLabels + $.prometheus._config.mixin.ruleLabels,
        name: 'aws-vpc-cni-rules',
        namespace: $.prometheus._config.namespace,
      },
      spec: {
        groups: [
          {
            name: 'aws-vpc-cni.rules',
            rules: [
              {
                expr: 'sum by(instance) (awscni_total_ip_addresses) - sum by(instance) (awscni_assigned_ip_addresses) < %s' % $.values.awsVpcCni.minimumWarmIPs,
                labels: {
                  severity: 'critical',
                },
                annotations: {
                  summary: 'AWS VPC CNI has a low warm IP pool',
                  description: |||
                    Instance {{ $labels.instance }} has only {{ $value }} warm IPs which is lower than set threshold of %s.
                    It could mean the current subnet is out of available IP addresses or the CNI is unable to request them from the EC2 API.
                  ||| % $.values.awsVpcCni.minimumWarmIPs,
                },
                'for': $.values.awsVpcCni.minimumWarmIPsTime,
                alert: 'AwsVpcCniWarmIPsLow',
              },
            ],
          },
        ],
      },
    },
  },
}
@@ -18,13 +18,15 @@ local imageName(image) =
// quay.io/coreos/addon-resizer -> $repository/addon-resizer
// grafana/grafana -> grafana $repository/grafana
local withImageRepository(repository) = {
  local oldRepos = super._config.imageRepos,
  local oldRepos = super.values.common.images,
  local substituteRepository(image, repository) =
    if repository == null then image else repository + '/' + imageName(image),
  values+:: {
    imageRepos:: {
      [field]: substituteRepository(oldRepos[field], repository)
      for field in std.objectFields(oldRepos)
    common+:: {
      images:: {
        [field]: substituteRepository(oldRepos[field], repository)
        for field in std.objectFields(oldRepos)
      },
    },
  },
};

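// A usage sketch (assuming this addon is exposed as
// 'kube-prometheus/addons/config-mixins.libsonnet', per the project's
// customization docs; the registry name is illustrative):
//
//   local mixins = import 'kube-prometheus/addons/config-mixins.libsonnet';
//   local kp = (import 'kube-prometheus/main.libsonnet') +
//              mixins.withImageRepository('internal-registry.example.com/organization');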
@@ -32,7 +32,7 @@
// Drop all etcd metrics which are deprecated in kubernetes.
{
  sourceLabels: ['__name__'],
  regex: 'etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)',
  regex: 'etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)',
  action: 'drop',
},
// Drop all transformation metrics which are deprecated in kubernetes.

@@ -117,7 +117,11 @@ local restrictedPodSecurityPolicy = {
        },
      }
    else
      {};
      {
        metadata+: {
          name: 'blackbox-exporter-psp',
        },
      };

  restrictedPodSecurityPolicy + blackboxExporterPspPrivileged,
},

@@ -37,7 +37,7 @@
spec+: {
  local addArgs(c) =
    if c.name == 'prometheus-operator'
    then c { args+: ['--config-reloader-cpu=0'] }
    then c { args+: ['--config-reloader-cpu-limit=0', '--config-reloader-memory-limit=0'] }
    else c,
  containers: std.map(addArgs, super.containers),
},

@@ -1,5 +1,5 @@
local windowsdashboards = import 'kubernetes-mixin/dashboards/windows.libsonnet';
local windowsrules = import 'kubernetes-mixin/rules/windows.libsonnet';
local windowsdashboards = import 'github.com/kubernetes-monitoring/kubernetes-mixin/dashboards/windows.libsonnet';
local windowsrules = import 'github.com/kubernetes-monitoring/kubernetes-mixin/rules/windows.libsonnet';

{
  values+:: {

@@ -64,7 +64,7 @@ local defaults = {
      alertmanagerName: '{{ $labels.namespace }}/{{ $labels.pod}}',
      alertmanagerClusterLabels: 'namespace,service',
      alertmanagerSelector: 'job="alertmanager-' + defaults.name + '",namespace="' + defaults.namespace + '"',
      runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
      runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/alertmanager/%s',
    },
  },
};
@@ -78,7 +78,7 @@ function(params) {
  assert std.isObject(am._config.mixin._config),

  mixin:: (import 'github.com/prometheus/alertmanager/doc/alertmanager-mixin/mixin.libsonnet') +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
    _config+:: am._config.mixin._config,
  },


@@ -201,6 +201,7 @@ function(params) {
  local kubeRbacProxy = krp({
    name: 'kube-rbac-proxy',
    upstream: 'http://127.0.0.1:' + bb._config.internalPort + '/',
    resources: bb._config.resources,
    secureListenAddress: ':' + bb._config.port,
    ports: [
      { name: 'https', containerPort: bb._config.port },

@@ -27,7 +27,9 @@ local defaults = {
  containers: [],
  datasources: [],
  config: {},
  ldap: null,
  plugins: [],
  env: [],
};

function(params) {
@@ -56,7 +58,9 @@ function(params) {
    folderDashboards: g._config.folderDashboards,
    containers: g._config.containers,
    config+: g._config.config,
    ldap: g._config.ldap,
    plugins+: g._config.plugins,
    env: g._config.env,
  } + (
    // Conditionally overwrite default setting.
    if std.length(g._config.datasources) > 0 then
@@ -74,7 +78,9 @@ function(params) {
  dashboardDatasources: glib.grafana.dashboardDatasources,
  dashboardSources: glib.grafana.dashboardSources,

  dashboardDefinitions: if std.length(g._config.dashboards) > 0 then {
  dashboardDefinitions: if std.length(g._config.dashboards) > 0 ||
                           std.length(g._config.rawDashboards) > 0 ||
                           std.length(g._config.folderDashboards) > 0 then {
    apiVersion: 'v1',
    kind: 'ConfigMapList',
    items: glib.grafana.dashboardDefinitions,

@@ -17,11 +17,12 @@ local defaults = {
      kubeControllerManagerSelector: 'job="kube-controller-manager"',
      kubeApiserverSelector: 'job="apiserver"',
      podLabel: 'pod',
      runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
      runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/kubernetes/%s',
      diskDeviceSelector: 'device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"',
      hostNetworkInterfaceSelector: 'device!~"veth.+"',
    },
  },
  kubeProxy: false,
};

function(params) {
@@ -126,9 +127,7 @@ function(params) {
  action: 'drop',
  regex: '(' + std.join('|',
    [
      'container_fs_.*', // add filesystem read/write data (nodes*disks*services*4)
      'container_spec_.*', // everything related to cgroup specification and thus static data (nodes*services*5)
      'container_blkio_device_usage_total', // useful for containers, but not for system services (nodes*disks*services*operations*2)
      'container_file_descriptors', // file descriptors limits and global numbers are exposed via (nodes*services)
      'container_sockets', // used sockets in cgroup. Usually not important for system services (nodes*services)
      'container_threads_max', // max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
@@ -137,6 +136,14 @@ function(params) {
      'container_last_seen', // not needed as system services are always running (nodes*services)
    ]) + ');;',
},
{
  sourceLabels: ['__name__', 'container'],
  action: 'drop',
  regex: '(' + std.join('|',
    [
      'container_blkio_device_usage_total',
    ]) + ');.+',
},
],
},
{
@@ -251,6 +258,45 @@ function(params) {
  },
},

[if (defaults + params).kubeProxy then 'podMonitorKubeProxy']: {
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'PodMonitor',
  metadata: {
    labels: {
      'k8s-app': 'kube-proxy',
    },
    name: 'kube-proxy',
    namespace: k8s._config.namespace,
  },
  spec: {
    jobLabel: 'k8s-app',
    namespaceSelector: {
      matchNames: [
        'kube-system',
      ],
    },
    selector: {
      matchLabels: {
        'k8s-app': 'kube-proxy',
      },
    },
    podMetricsEndpoints: [{
      honorLabels: true,
      targetPort: 10249,
      relabelings: [
        {
          action: 'replace',
          regex: '(.*)',
          replacement: '$1',
          sourceLabels: ['__meta_kubernetes_pod_node_name'],
          targetLabel: 'instance',
        },
      ],
    }],
  },
},
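// This object is only rendered when kube-proxy scraping is opted in, as in
// examples/kubeProxy.jsonnet above:
//   values+:: { kubernetesControlPlane+: { kubeProxy: true } },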

serviceMonitorCoreDNS: {
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'ServiceMonitor',
@@ -262,7 +308,7 @@ function(params) {
  spec: {
    jobLabel: 'app.kubernetes.io/name',
    selector: {
      matchLabels: { 'app.kubernetes.io/name': 'kube-dns' },
      matchLabels: { 'k8s-app': 'kube-dns' },
    },
    namespaceSelector: {
      matchNames: ['kube-system'],

@@ -12,6 +12,12 @@ local defaults = {
    limits: { cpu: '100m', memory: '250Mi' },
  },

  kubeRbacProxyMain: {
    resources+: {
      limits+: { cpu: '40m' },
      requests+: { cpu: '20m' },
    },
  },
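  // A sketch of tuning this from a user jsonnet (field path assumed from the
  // defaults above, since params flow in via values.kubeStateMetrics):
  //   values+:: { kubeStateMetrics+: { kubeRbacProxyMain+: { resources+: { limits+: { cpu: '60m' } } } } },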
  scrapeInterval: '30s',
  scrapeTimeout: '30s',
  commonLabels:: {
@@ -29,7 +35,7 @@ local defaults = {
  ruleLabels: {},
  _config: {
    kubeStateMetricsSelector: 'job="' + defaults.name + '"',
    runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
    runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/%s',
  },
},
};
@@ -49,7 +55,7 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
  podLabels:: ksm._config.selectorLabels,

  mixin:: (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-state-metrics-mixin/mixin.libsonnet') +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
    _config+:: ksm._config.mixin._config,
  },

@@ -85,17 +91,13 @@ function(params) (import 'github.com/kubernetes/kube-state-metrics/jsonnet/kube-
    },
  },

  local kubeRbacProxyMain = krp({
  local kubeRbacProxyMain = krp(ksm._config.kubeRbacProxyMain {
    name: 'kube-rbac-proxy-main',
    upstream: 'http://127.0.0.1:8081/',
    secureListenAddress: ':8443',
    ports: [
      { name: 'https-main', containerPort: 8443 },
    ],
    resources+: {
      limits+: { cpu: '40m' },
      requests+: { cpu: '20m' },
    },
    image: ksm._config.kubeRbacProxyImage,
  }),


@@ -7,7 +7,8 @@
{
  alert: 'NodeNetworkInterfaceFlapping',
  annotations: {
    message: 'Network interface "{{ $labels.device }}" changing it\'s up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}',
    summary: 'Network interface is often changing its status',
    description: 'Network interface "{{ $labels.device }}" changing its up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}',
  },
  expr: |||
    changes(node_network_up{%(nodeExporterSelector)s,%(hostNetworkInterfaceSelector)s}[2m]) > 2

@@ -1,157 +0,0 @@
# TODO(metalmatze): This file is temporarily saved here for later reference
# until we find out how to integrate the tests into our jsonnet stack.

rule_files:
  - rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.0",namespace="monitoring",pod="alertmanager-main-0",service="alertmanager-main"}'
        values: '3 3 3 3 3 2 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0'
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.1",namespace="monitoring",pod="alertmanager-main-1",service="alertmanager-main"}'
        values: '3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3'
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.2",namespace="monitoring",pod="alertmanager-main-2",service="alertmanager-main"}'
        values: '3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3'
    alert_rule_test:
      - eval_time: 5m
        alertname: AlertmanagerMembersInconsistent
      - eval_time: 11m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
      - eval_time: 17m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
      - eval_time: 23m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
  - interval: 1m
    input_series:
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.0",namespace="monitoring",pod="alertmanager-main-0",service="alertmanager-main"}'
        values: '3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.1",namespace="monitoring",pod="alertmanager-main-1",service="alertmanager-main"}'
        values: '3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2'
      - series: 'alertmanager_cluster_members{job="alertmanager-main",instance="10.10.10.2",namespace="monitoring",pod="alertmanager-main-2",service="alertmanager-main"}'
        values: '3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2'
    alert_rule_test:
      - eval_time: 5m
        alertname: AlertmanagerMembersInconsistent
      - eval_time: 11m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.1
              namespace: monitoring
              pod: alertmanager-main-1
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.2
              namespace: monitoring
              pod: alertmanager-main-2
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
      - eval_time: 17m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.1
              namespace: monitoring
              pod: alertmanager-main-1
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.2
              namespace: monitoring
              pod: alertmanager-main-2
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
      - eval_time: 23m
        alertname: AlertmanagerMembersInconsistent
        exp_alerts:
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.0
              namespace: monitoring
              pod: alertmanager-main-0
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.1
              namespace: monitoring
              pod: alertmanager-main-1
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
          - exp_labels:
              service: 'alertmanager-main'
              severity: critical
              job: 'alertmanager-main'
              instance: 10.10.10.2
              namespace: monitoring
              pod: alertmanager-main-2
            exp_annotations:
              message: 'Alertmanager has not found all other members of the cluster.'
@@ -11,7 +11,7 @@ local defaults = {
  _config: {
    nodeExporterSelector: 'job="node-exporter"',
    hostNetworkInterfaceSelector: 'device!~"veth.+"',
    runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
    runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/general/%s',
  },
},
};
@@ -23,7 +23,7 @@ function(params) {
  local alertsandrules = (import './alerts/alerts.libsonnet') + (import './rules/rules.libsonnet'),

  mixin:: alertsandrules +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
    _config+:: m._config.mixin._config,
  },

@@ -30,7 +30,7 @@ local defaults = {
    nodeExporterSelector: 'job="' + defaults.name + '"',
    fsSpaceFillingUpCriticalThreshold: 15,
    diskDeviceSelector: 'device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"',
    runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
    runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/node/%s',
  },
},
};
@@ -44,7 +44,7 @@ function(params) {
  assert std.isObject(ne._config.mixin._config),

  mixin:: (import 'github.com/prometheus/node_exporter/docs/node-mixin/mixin.libsonnet') +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
    _config+:: ne._config.mixin._config,
  },


@@ -22,13 +22,40 @@ local defaults = {
    for labelName in std.objectFields(defaults.commonLabels)
    if !std.setMember(labelName, ['app.kubernetes.io/version'])
  },
  // Default range intervals are equal to 4 times the default scrape interval.
  // This is done in order to follow Prometheus rule of thumb with irate().
  rangeIntervals: {
    kubelet: '4m',
    nodeExporter: '4m',
    windowsExporter: '4m',
  },
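  // A hypothetical override for a cluster that scrapes every 2m would set
  // these to four times that interval:
  //   rangeIntervals+: { kubelet: '8m', nodeExporter: '8m', windowsExporter: '8m' },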

  prometheusURL: error 'must provide prometheusURL',
  config: {
    resourceRules: {
      cpu: {
        containerQuery: 'sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[5m])) by (<<.GroupBy>>)',
        nodeQuery: 'sum(1 - irate(node_cpu_seconds_total{mode="idle"}[5m]) * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum (1- irate(windows_cpu_time_total{mode="idle", job="windows-exporter",<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)',
        containerQuery: |||
          sum by (<<.GroupBy>>) (
            irate (
              container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[%(kubelet)s]
            )
          )
        ||| % $.rangeIntervals,
        nodeQuery: |||
          sum by (<<.GroupBy>>) (
            1 - irate(
              node_cpu_seconds_total{mode="idle"}[%(nodeExporter)s]
            )
            * on(namespace, pod) group_left(node) (
              node_namespace_pod:kube_pod_info:{<<.LabelMatchers>>}
            )
          )
          or sum by (<<.GroupBy>>) (
            1 - irate(
              windows_cpu_time_total{mode="idle", job="windows-exporter",<<.LabelMatchers>>}[%(windowsExporter)s]
            )
          )
        ||| % $.rangeIntervals,
        resources: {
          overrides: {
            node: { resource: 'node' },
@@ -39,8 +66,23 @@ local defaults = {
          containerLabel: 'container',
        },
        memory: {
          containerQuery: 'sum(container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}) by (<<.GroupBy>>)',
          nodeQuery: 'sum(node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>} - node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}) by (<<.GroupBy>>) or sum(windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>} - windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}) by (<<.GroupBy>>)',
          containerQuery: |||
            sum by (<<.GroupBy>>) (
              container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}
            )
          |||,
          nodeQuery: |||
            sum by (<<.GroupBy>>) (
              node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
              -
              node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
            )
            or sum by (<<.GroupBy>>) (
              windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>}
              -
              windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}
            )
          |||,
          resources: {
            overrides: {
              instance: { resource: 'node' },
@@ -53,6 +95,23 @@ local defaults = {
      window: '5m',
    },
  },
  tlsCipherSuites: [
    'TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305',
    'TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305',
    'TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256',
    'TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384',
    'TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256',
    'TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384',
    'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA',
    'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256',
    'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA',
    'TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA',
    'TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA',
    'TLS_RSA_WITH_AES_128_GCM_SHA256',
    'TLS_RSA_WITH_AES_256_GCM_SHA384',
    'TLS_RSA_WITH_AES_128_CBC_SHA',
    'TLS_RSA_WITH_AES_256_CBC_SHA',
  ],
};

function(params) {
@@ -145,7 +204,9 @@ function(params) {
  '--metrics-relist-interval=1m',
  '--prometheus-url=' + pa._config.prometheusURL,
  '--secure-port=6443',
  '--tls-cipher-suites=' + std.join(',', pa._config.tlsCipherSuites),
],
resources: pa._config.resources,
ports: [{ containerPort: 6443 }],
volumeMounts: [
  { name: 'tmpfs', mountPath: '/tmp', readOnly: false },

@@ -31,7 +31,7 @@ local defaults = {
  },
  _config: {
    prometheusOperatorSelector: 'job="prometheus-operator",namespace="' + defaults.namespace + '"',
    runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
    runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/prometheus-operator/%s',
  },
},
};
@@ -46,7 +46,7 @@ function(params)
  // declare variable as a field to allow overriding options and to have unified API across all components
  _config:: config,
  mixin:: (import 'github.com/prometheus-operator/prometheus-operator/jsonnet/mixin/mixin.libsonnet') +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') {
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') {
    _config+:: po._config.mixin._config,
  },


@@ -12,6 +12,7 @@ local defaults = {
  namespaces: ['default', 'kube-system', defaults.namespace],
  replicas: 2,
  externalLabels: {},
  enableFeatures: [],
  commonLabels:: {
    'app.kubernetes.io/name': 'prometheus',
    'app.kubernetes.io/version': defaults.version,
@@ -23,22 +24,17 @@ local defaults = {
    for labelName in std.objectFields(defaults.commonLabels)
    if !std.setMember(labelName, ['app.kubernetes.io/version'])
  } + { prometheus: defaults.name },
  ruleSelector: {
    matchLabels: defaults.mixin.ruleLabels,
  },
  ruleSelector: {},
  mixin: {
    ruleLabels: {
      role: 'alert-rules',
      prometheus: defaults.name,
    },
    ruleLabels: {},
    _config: {
      prometheusSelector: 'job="prometheus-' + defaults.name + '",namespace="' + defaults.namespace + '"',
      prometheusName: '{{$labels.namespace}}/{{$labels.pod}}',
      thanosSelector: 'job="thanos-sidecar"',
      runbookURLPattern: 'https://github.com/prometheus-operator/kube-prometheus/wiki/%s',
      runbookURLPattern: 'https://runbooks.prometheus-operator.dev/runbooks/prometheus/%s',
    },
  },
  thanos: {},
  thanos: null,
};

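// With `thanos: null` as the default, enabling the sidecar means passing a
// non-null object, e.g. (a sketch; field names follow prometheus-operator's
// thanos spec, and the version is illustrative):
//   values+:: { prometheus+: { thanos: { version: '0.22.0', image: 'quay.io/thanos/thanos:v0.22.0' } } },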

@@ -49,18 +45,22 @@ function(params) {
  assert std.isObject(p._config.resources),
  assert std.isObject(p._config.mixin._config),

  mixin:: (import 'github.com/prometheus/prometheus/documentation/prometheus-mixin/mixin.libsonnet') +
          (import 'github.com/kubernetes-monitoring/kubernetes-mixin/alerts/add-runbook-links.libsonnet') + (
            if p._config.thanos != {} then
              (import 'github.com/thanos-io/thanos/mixin/alerts/sidecar.libsonnet') + {
                sidecar: {
                  selector: p._config.mixin._config.thanosSelector,
                },
              }
            else {}
          ) {
    _config+:: p._config.mixin._config,
  },
  mixin::
    (import 'github.com/prometheus/prometheus/documentation/prometheus-mixin/mixin.libsonnet') +
    (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') + {
      _config+:: p._config.mixin._config,
    },

  mixinThanos::
    (import 'github.com/thanos-io/thanos/mixin/alerts/sidecar.libsonnet') +
    (import 'github.com/kubernetes-monitoring/kubernetes-mixin/lib/add-runbook-links.libsonnet') + {
      _config+:: p._config.mixin._config,
      targetGroups: {},
      sidecar: {
        selector: p._config.mixin._config.thanosSelector,
        dimensions: std.join(', ', ['job', 'instance']),
      },
    },

  prometheusRule: {
    apiVersion: 'monitoring.coreos.com/v1',
@@ -100,7 +100,7 @@ function(params) {
  { name: 'web', targetPort: 'web', port: 9090 },
] +
(
  if p._config.thanos != {} then
  if p._config.thanos != null then
    [{ name: 'grpc', port: 10901, targetPort: 10901 }]
  else []
),
@@ -276,15 +276,17 @@ function(params) {
    labels: p._config.commonLabels,
  },
  externalLabels: p._config.externalLabels,
  enableFeatures: p._config.enableFeatures,
  serviceAccountName: 'prometheus-' + p._config.name,
  serviceMonitorSelector: {},
  podMonitorSelector: {},
  probeSelector: {},
  serviceMonitorNamespaceSelector: {},
  podMonitorNamespaceSelector: {},
  probeSelector: {},
  probeNamespaceSelector: {},
  nodeSelector: { 'kubernetes.io/os': 'linux' },
  ruleNamespaceSelector: {},
  ruleSelector: p._config.ruleSelector,
  serviceMonitorSelector: {},
  serviceMonitorNamespaceSelector: {},
  nodeSelector: { 'kubernetes.io/os': 'linux' },
  resources: p._config.resources,
  alerting: {
    alertmanagers: [{
@@ -322,8 +324,24 @@ function(params) {
    },
  },

  // Include thanos sidecar PrometheusRule only if thanos config was passed by user
  [if std.objectHas(params, 'thanos') && params.thanos != null then 'prometheusRuleThanosSidecar']: {
    apiVersion: 'monitoring.coreos.com/v1',
    kind: 'PrometheusRule',
    metadata: {
      labels: p._config.commonLabels + p._config.mixin.ruleLabels,
      name: 'prometheus-' + p._config.name + '-thanos-sidecar-rules',
      namespace: p._config.namespace,
    },
    spec: {
      local r = if std.objectHasAll(p.mixinThanos, 'prometheusRules') then p.mixinThanos.prometheusRules.groups else [],
      local a = if std.objectHasAll(p.mixinThanos, 'prometheusAlerts') then p.mixinThanos.prometheusAlerts.groups else [],
      groups: a + r,
    },
  },

  // Include thanos sidecar Service only if thanos config was passed by user
  [if std.objectHas(params, 'thanos') && std.length(params.thanos) > 0 then 'serviceThanosSidecar']: {
  [if std.objectHas(params, 'thanos') && params.thanos != null then 'serviceThanosSidecar']: {
    apiVersion: 'v1',
    kind: 'Service',
    metadata+: {
@@ -348,7 +366,7 @@ function(params) {
  },

  // Include thanos sidecar ServiceMonitor only if thanos config was passed by user
  [if std.objectHas(params, 'thanos') && std.length(params.thanos) > 0 then 'serviceMonitorThanosSidecar']: {
  [if std.objectHas(params, 'thanos') && params.thanos != null then 'serviceMonitorThanosSidecar']: {
    apiVersion: 'monitoring.coreos.com/v1',
    kind: 'ServiceMonitor',
    metadata+: {

@@ -8,7 +8,7 @@
      "subdir": "grafana"
    }
  },
  "version": "8ea4e7bc04b1bf5e9bd99918ca28c6271b42be0e"
  "version": "90f38916f1f8a310a715d18e36f787f84df4ddf5"
},
{
  "source": {
@@ -17,7 +17,7 @@
      "subdir": "contrib/mixin"
    }
  },
  "version": "562d645ac923388ff5b8d270b0536764d34b0e0f"
  "version": "release-3.5"
},
{
  "source": {
@@ -26,7 +26,7 @@
      "subdir": "jsonnet/prometheus-operator"
    }
  },
  "version": "release-0.47"
  "version": "release-0.50"
},
{
  "source": {
@@ -35,7 +35,7 @@
      "subdir": "jsonnet/mixin"
    }
  },
  "version": "release-0.47",
  "version": "release-0.50",
  "name": "prometheus-operator-mixin"
},
{
@@ -45,7 +45,7 @@
      "subdir": ""
    }
  },
  "version": "release-0.8"
  "version": "release-0.9"
},
{
  "source": {
@@ -54,7 +54,7 @@
      "subdir": "jsonnet/kube-state-metrics"
    }
  },
  "version": "release-2.0"
  "version": "release-2.1"
},
{
  "source": {
@@ -63,7 +63,7 @@
      "subdir": "jsonnet/kube-state-metrics-mixin"
    }
  },
  "version": "release-2.0"
  "version": "release-2.1"
},
{
  "source": {
@@ -72,7 +72,7 @@
      "subdir": "docs/node-mixin"
    }
  },
  "version": "release-1.1"
  "version": "832909dd257eb368cf83363ffcae3ab84cb4bcb1"
},
{
  "source": {
@@ -81,7 +81,7 @@
      "subdir": "documentation/prometheus-mixin"
    }
  },
  "version": "release-2.26",
  "version": "751ca03faddc9c64089c41d0da370a3a0b477742",
  "name": "prometheus"
},
{
@@ -91,7 +91,7 @@
      "subdir": "doc/alertmanager-mixin"
    }
  },
  "version": "99f64e944b1043c790784cf5373c8fb349816fc4",
  "version": "b408b522bc653d014e53035e59fa394cc1edd762",
  "name": "alertmanager"
},
{
@@ -101,7 +101,7 @@
      "subdir": "mixin"
    }
  },
  "version": "release-0.19",
  "version": "release-0.22",
  "name": "thanos-mixin"
}
],

@@ -8,29 +8,29 @@ local defaults = {
};

function(params) {
-config:: defaults + params,
+_config:: defaults + params,

local m = self,

-local prometheusRules = if std.objectHasAll(m.config.mixin, 'prometheusRules') || std.objectHasAll(m.config.mixin, 'prometheusAlerts') then {
+local prometheusRules = if std.objectHasAll(m._config.mixin, 'prometheusRules') || std.objectHasAll(m._config.mixin, 'prometheusAlerts') then {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
-labels: m.config.labels,
-name: m.config.name,
-namespace: m.config.namespace,
+labels: m._config.labels,
+name: m._config.name,
+namespace: m._config.namespace,
},
spec: {
-local r = if std.objectHasAll(m.config.mixin, 'prometheusRules') then m.config.mixin.prometheusRules.groups else [],
-local a = if std.objectHasAll(m.config.mixin, 'prometheusAlerts') then m.config.mixin.prometheusAlerts.groups else [],
+local r = if std.objectHasAll(m._config.mixin, 'prometheusRules') then m._config.mixin.prometheusRules.groups else [],
+local a = if std.objectHasAll(m._config.mixin, 'prometheusAlerts') then m._config.mixin.prometheusAlerts.groups else [],
groups: a + r,
},
},

-local grafanaDashboards = if std.objectHasAll(m.config.mixin, 'grafanaDashboards') then (
-if std.objectHas(m.config, 'dashboardFolder') then {
-[m.config.dashboardFolder]+: m.config.mixin.grafanaDashboards,
-} else (m.config.mixin.grafanaDashboards)
+local grafanaDashboards = if std.objectHasAll(m._config.mixin, 'grafanaDashboards') then (
+if std.objectHas(m._config, 'dashboardFolder') then {
+[m._config.dashboardFolder]+: m._config.mixin.grafanaDashboards,
+} else (m._config.mixin.grafanaDashboards)
),

prometheusRules: prometheusRules,
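This hunk renames the hidden config field from config to _config, matching the convention the other kube-prometheus components already use. A minimal sketch of the defaults-plus-params pattern it relies on; the field names here are illustrative only:

local defaults = { name: 'custom', namespace: 'monitoring', labels: {} };

local component = function(params) {
  // '::' makes the merged config hidden, so it never renders into the
  // generated manifests; it is still reachable through self.
  _config:: defaults + params,
  metadata: {
    name: self._config.name,
    namespace: self._config.namespace,
  },
};

component({ namespace: 'example' })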
jsonnet/kube-prometheus/lib/utils.libsonnet (new file, +7)
@@ -0,0 +1,7 @@
+{
+// rangeInterval takes a scrape interval and converts it to a range interval
+// following Prometheus rule of thumb for rate() and irate().
+rangeInterval(i='1m'):
+local interval = std.parseInt(std.substr(i, 0, std.length(i) - 1));
+interval * 4 + i[std.length(i) - 1],
+}
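rangeInterval multiplies the numeric part of the interval by four and re-appends the unit character, per the rule of thumb that a rate() range should span at least four scrape intervals. A hedged usage sketch; the import path assumes the usual vendored directory layout:

local utils = import 'kube-prometheus/lib/utils.libsonnet';

{
  kubelet: utils.rangeInterval('30s'),  // "120s" (30 * 4, unit re-appended)
  default: utils.rangeInterval(),       // "4m"
}

Note the helper assumes a single-unit interval such as '30s' or '1m'; a compound value like '1m30s' would not parse.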
@@ -11,11 +11,14 @@ local prometheus = import './components/prometheus.libsonnet';

local platformPatch = import './platforms/platforms.libsonnet';

+local utils = import './lib/utils.libsonnet';

{
// using `values` as this is similar to helm
values:: {
common: {
namespace: 'default',
+platform: null,
ruleLabels: {
role: 'alert-rules',
prometheus: $.values.prometheus.name,
@@ -40,7 +43,7 @@ local platformPatch = import './platforms/platforms.libsonnet';
kubeStateMetrics: 'k8s.gcr.io/kube-state-metrics/kube-state-metrics:v' + $.values.common.versions.kubeStateMetrics,
nodeExporter: 'quay.io/prometheus/node-exporter:v' + $.values.common.versions.nodeExporter,
prometheus: 'quay.io/prometheus/prometheus:v' + $.values.common.versions.prometheus,
-prometheusAdapter: 'directxman12/k8s-prometheus-adapter:v' + $.values.common.versions.prometheusAdapter,
+prometheusAdapter: 'k8s.gcr.io/prometheus-adapter/prometheus-adapter:v' + $.values.common.versions.prometheusAdapter,
prometheusOperator: 'quay.io/prometheus-operator/prometheus-operator:v' + $.values.common.versions.prometheusOperator,
prometheusOperatorReloader: 'quay.io/prometheus-operator/prometheus-config-reloader:v' + $.values.common.versions.prometheusOperator,
kubeRbacProxy: 'quay.io/brancz/kube-rbac-proxy:v' + $.values.common.versions.kubeRbacProxy,
@@ -67,7 +70,7 @@ local platformPatch = import './platforms/platforms.libsonnet';
image: $.values.common.images.grafana,
prometheusName: $.values.prometheus.name,
// TODO(paulfantom) This should be done by iterating over all objects and looking for object.mixin.grafanaDashboards
-dashboards: $.nodeExporter.mixin.grafanaDashboards + $.prometheus.mixin.grafanaDashboards + $.kubernetesControlPlane.mixin.grafanaDashboards,
+dashboards: $.nodeExporter.mixin.grafanaDashboards + $.prometheus.mixin.grafanaDashboards + $.kubernetesControlPlane.mixin.grafanaDashboards + $.alertmanager.mixin.grafanaDashboards,
},
kubeStateMetrics: {
namespace: $.values.common.namespace,
@@ -96,15 +99,16 @@ local platformPatch = import './platforms/platforms.libsonnet';
version: $.values.common.versions.prometheusAdapter,
image: $.values.common.images.prometheusAdapter,
prometheusURL: 'http://prometheus-' + $.values.prometheus.name + '.' + $.values.common.namespace + '.svc.cluster.local:9090/',
+rangeIntervals+: {
+kubelet: utils.rangeInterval($.kubernetesControlPlane.serviceMonitorKubelet.spec.endpoints[0].interval),
+nodeExporter: utils.rangeInterval($.nodeExporter.serviceMonitor.spec.endpoints[0].interval),
+},
},
prometheusOperator: {
namespace: $.values.common.namespace,
version: $.values.common.versions.prometheusOperator,
image: $.values.common.images.prometheusOperator,
configReloaderImage: $.values.common.images.prometheusOperatorReloader,
commonLabels+: {
'app.kubernetes.io/part-of': 'kube-prometheus',
},
mixin+: { ruleLabels: $.values.common.ruleLabels },
kubeRbacProxyImage: $.values.common.images.kubeRbacProxy,
},
@@ -112,11 +116,6 @@ local platformPatch = import './platforms/platforms.libsonnet';
namespace: $.values.common.namespace,
mixin+: { ruleLabels: $.values.common.ruleLabels },
},
-kubePrometheus: {
-namespace: $.values.common.namespace,
-mixin+: { ruleLabels: $.values.common.ruleLabels },
-platform: null,
-},
},

alertmanager: alertmanager($.values.alertmanager),
@@ -128,12 +127,17 @@ local platformPatch = import './platforms/platforms.libsonnet';
prometheusAdapter: prometheusAdapter($.values.prometheusAdapter),
prometheusOperator: prometheusOperator($.values.prometheusOperator),
kubernetesControlPlane: kubernetesControlPlane($.values.kubernetesControlPlane),
-kubePrometheus: customMixin($.values.kubePrometheus) + {
+kubePrometheus: customMixin(
+{
+namespace: $.values.common.namespace,
+mixin+: { ruleLabels: $.values.common.ruleLabels },
+}
+) + {
namespace: {
apiVersion: 'v1',
kind: 'Namespace',
metadata: {
-name: $.values.kubePrometheus.namespace,
+name: $.values.common.namespace,
},
},
},
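Net effect of the main.libsonnet changes: the platform selector and the kubePrometheus defaults move under values.common, prometheus-adapter picks up the relocated k8s.gcr.io image and scrape-interval-derived range intervals, and the Alertmanager overview dashboard is wired into Grafana. A hedged consumer-side override, assuming the vendored import path:

local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: {
      namespace: 'observability',
      platform: 'eks',  // previously values.kubePrometheus.platform
    },
  },
};

// Emit the namespace and rules objects, mirroring the usual example layout.
{ ['setup/' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) }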
@@ -1,10 +1,5 @@
-{
-values+:: {
-eks: {
-minimumAvailableIPs: 10,
-minimumAvailableIPsTime: '10m',
-},
-},
+(import '../addons/aws-vpc-cni.libsonnet') +
+(import '../addons/managed-cluster.libsonnet') + {
kubernetesControlPlane+: {
serviceMonitorCoreDNS+: {
spec+: {
@@ -17,82 +12,5 @@
],
},
},
-AwsEksCniMetricService: {
-apiVersion: 'v1',
-kind: 'Service',
-metadata: {
-name: 'aws-node',
-namespace: 'kube-system',
-labels: { 'app.kubernetes.io/name': 'aws-node' },
-},
-spec: {
-ports: [
-{ name: 'cni-metrics-port', port: 61678, targetPort: 61678 },
-],
-selector: { 'app.kubernetes.io/name': 'aws-node' },
-clusterIP: 'None',
-},
-},
-
-serviceMonitorAwsEksCNI: {
-apiVersion: 'monitoring.coreos.com/v1',
-kind: 'ServiceMonitor',
-metadata: {
-name: 'awsekscni',
-namespace: $.values.common.namespace,
-labels: {
-'app.kubernetes.io/name': 'eks-cni',
-},
-},
-spec: {
-jobLabel: 'app.kubernetes.io/name',
-selector: {
-matchLabels: {
-'app.kubernetes.io/name': 'aws-node',
-},
-},
-namespaceSelector: {
-matchNames: [
-'kube-system',
-],
-},
-endpoints: [
-{
-port: 'cni-metrics-port',
-interval: '30s',
-path: '/metrics',
-},
-],
-},
-},
-prometheusRuleEksCNI: {
-apiVersion: 'monitoring.coreos.com/v1',
-kind: 'PrometheusRule',
-metadata: {
-labels: $.prometheus._config.commonLabels + $.prometheus._config.mixin.ruleLabels,
-name: 'eks-rules',
-namespace: $.prometheus._config.namespace,
-},
-spec: {
-groups: [
-{
-name: 'kube-prometheus-eks.rules',
-rules: [
-{
-expr: 'sum by(instance) (awscni_ip_max) - sum by(instance) (awscni_assigned_ip_addresses) < %s' % $.values.eks.minimumAvailableIPs,
-labels: {
-severity: 'critical',
-},
-annotations: {
-message: 'Instance {{ $labels.instance }} has less than 10 IPs available.',
-},
-'for': $.values.eks.minimumAvailableIPsTime,
-alert: 'EksAvailableIPs',
-},
-],
-},
-],
-},
-},
},
}
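The EKS platform file is reduced to a composition of addons: the aws-node Service, ServiceMonitor, and EksAvailableIPs rule shown as removed here move into addons/aws-vpc-cni.libsonnet. The override surface should stay the same; a sketch, assuming the addon still defines values.eks with the defaults shown above:

local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    common+: { platform: 'eks' },
    eks+: { minimumAvailableIPs: 25 },  // default shown above is 10
  },
};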
@@ -1,56 +1 @@
-local service(name, namespace, labels, selector, ports) = {
-apiVersion: 'v1',
-kind: 'Service',
-metadata: {
-name: name,
-namespace: namespace,
-labels: labels,
-},
-spec: {
-ports+: ports,
-selector: selector,
-clusterIP: 'None',
-},
-};
-
-{
-
-kubernetesControlPlane+: {
-kubeControllerManagerPrometheusDiscoveryService: service(
-'kube-controller-manager-prometheus-discovery',
-'kube-system',
-{ 'app.kubernetes.io/name': 'kube-controller-manager' },
-{ 'app.kubernetes.io/name': 'kube-controller-manager' },
-[{ name: 'https-metrics', port: 10257, targetPort: 10257 }]
-),
-
-kubeSchedulerPrometheusDiscoveryService: service(
-'kube-scheduler-prometheus-discovery',
-'kube-system',
-{ 'app.kubernetes.io/name': 'kube-scheduler' },
-{ 'app.kubernetes.io/name': 'kube-scheduler' },
-[{ name: 'https-metrics', port: 10259, targetPort: 10259 }],
-),
-
-serviceMonitorKubeScheduler+: {
-spec+: {
-selector+: {
-matchLabels: {
-'app.kubernetes.io/name': 'kube-scheduler',
-},
-},
-},
-},
-
-serviceMonitorKubeControllerManager+: {
-spec+: {
-selector+: {
-matchLabels: {
-'app.kubernetes.io/name': 'kube-controller-manager',
-},
-},
-},
-},
-
-},
-}
+(import './kubeadm.libsonnet')
@@ -26,7 +26,7 @@ local platformPatch(p) = if p != null && std.objectHas(platforms, p) then platfo
prometheusOperator: {},
kubernetesControlPlane: {},
kubePrometheus: {},
-} + platformPatch($.values.kubePrometheus.platform),
+} + platformPatch($.values.common.platform),

alertmanager+: p.alertmanager,
blackboxExporter+: p.blackboxExporter,
@@ -1,12 +1,12 @@
{
-"alertmanager": "0.21.0",
-"blackboxExporter": "0.18.0",
-"grafana": "7.5.4",
-"kubeStateMetrics": "2.0.0",
-"nodeExporter": "1.1.2",
-"prometheus": "2.26.0",
-"prometheusAdapter": "0.8.4",
-"prometheusOperator": "0.47.0",
-"kubeRbacProxy": "0.8.0",
+"alertmanager": "0.22.2",
+"blackboxExporter": "0.19.0",
+"grafana": "8.1.1",
+"kubeStateMetrics": "2.1.1",
+"nodeExporter": "1.2.2",
+"prometheus": "2.29.1",
+"prometheusAdapter": "0.9.0",
+"prometheusOperator": "0.49.0",
+"kubeRbacProxy": "0.11.0",
"configmapReload": "0.5.0"
}
@@ -8,8 +8,8 @@
"subdir": "grafana"
}
},
-"version": "8ea4e7bc04b1bf5e9bd99918ca28c6271b42be0e",
-"sum": "muenICtKXABk6MZZHCZD2wCbmtiE96GwWRMGa1Rg+wA="
+"version": "90f38916f1f8a310a715d18e36f787f84df4ddf5",
+"sum": "0kZ1pnuIirDtbg6F9at5+NQOwKNONIGEPq0eECzvRkI="
},
{
"source": {
@@ -18,7 +18,7 @@
"subdir": "contrib/mixin"
}
},
-"version": "562d645ac923388ff5b8d270b0536764d34b0e0f",
+"version": "e8732fb5f35d4f5229c983fea478ed13b11d729e",
"sum": "W/Azptf1PoqjyMwJON96UY69MFugDA4IAYiKURscryc="
},
{
@@ -28,7 +28,7 @@
"subdir": "grafonnet"
}
},
-"version": "6db00c292d3a1c71661fc875f90e0ec7caa538c2",
+"version": "3626fc4dc2326931c530861ac5bebe39444f6cbf",
"sum": "gF8foHByYcB25jcUOBqP6jxk0OPifQMjPvKY0HaCk6w="
},
{
@@ -38,19 +38,8 @@
"subdir": "grafana-builder"
}
},
-"version": "98c3060877aa178f6bdfc6ac618fbe0043fc3de7",
-"sum": "0KkygBQd/AFzUvVzezE4qF/uDYgrwUXVpZfINBti0oc="
-},
-{
-"source": {
-"git": {
-"remote": "https://github.com/ksonnet/ksonnet-lib.git",
-"subdir": ""
-}
-},
-"version": "0d2f82676817bbf9e4acf6495b2090205f323b9f",
-"sum": "h28BXZ7+vczxYJ2sCt8JuR9+yznRtU/iA6DCpQUrtEg=",
-"name": "ksonnet"
+"version": "2ed138b205717af721af57b572bc7cd63bda62fd",
+"sum": "U34Nd1ViO2LZ3D8IzygPPRfUcy6zOgCnTMVHZ+9O/QE="
},
{
"source": {
@@ -59,8 +48,8 @@
"subdir": ""
}
},
-"version": "7d3bb79a4983052d421264a7e0f3c9b0d4a22268",
-"sum": "DFo3YX4xc6GJTSZDaG5XRE/ixY/5GZJwdyqBkvons4M="
+"version": "1163ea85e45e1f7edf6d4f83758d44c6fef1f2fa",
+"sum": "4H2pzHd6A47rQIZcQ3B0o+nFMeNgLE9dGYJv7ZP7m2s="
},
{
"source": {
@@ -69,7 +58,7 @@
"subdir": "lib/promgrafonnet"
}
},
-"version": "0f0f3dc472ff2a8cdc6a6c6f938a2c450cb493ec",
+"version": "06d00e40b43e4e618afbebe8e453b5650c659015",
"sum": "zv7hXGui6BfHzE9wPatHI/AGZa4A2WKo6pq7ZdqBsps="
},
{
@@ -79,7 +68,7 @@
"subdir": "jsonnet/kube-state-metrics"
}
},
-"version": "b1889aa1561ee269f628e2b9659155e7714dbbf0",
+"version": "d60e6f7ba1719045edc0f60857faadeb87280421",
"sum": "S5qI+PJUdNeYOv76jH5nxwYS9N6U7CRxvyuB1wI4cTE="
},
{
@@ -89,8 +78,8 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
-"version": "b1889aa1561ee269f628e2b9659155e7714dbbf0",
-"sum": "Yf8mNAHrV1YWzrdV8Ry5dJ8YblepTGw3C0Zp10XIYLo="
+"version": "d60e6f7ba1719045edc0f60857faadeb87280421",
+"sum": "u8gaydJoxEjzizQ8jY8xSjYgWooPmxw+wIWdDxifMAk="
},
{
"source": {
@@ -99,7 +88,7 @@
"subdir": "jsonnet/mixin"
}
},
-"version": "b7ca32169844f0b5143f3e5e318fc05fa025df18",
+"version": "83fe36566f4e0894eb5ffcd2638a0f039a17bdeb",
"sum": "6reUygVmQrLEWQzTKcH8ceDbvM+2ztK3z2VBR2K2l+U=",
"name": "prometheus-operator-mixin"
},
@@ -110,8 +99,8 @@
"subdir": "jsonnet/prometheus-operator"
}
},
-"version": "b7ca32169844f0b5143f3e5e318fc05fa025df18",
-"sum": "MRwyChXdKG3anL2OWpbUu3qWc97w9J6YsjUWjLFQyB0="
+"version": "83fe36566f4e0894eb5ffcd2638a0f039a17bdeb",
+"sum": "J1G++A8hrtr3+OZQMmcNeb1w/C30bXqqwpwHL/Xhsd4="
},
{
"source": {
@@ -120,8 +109,8 @@
"subdir": "doc/alertmanager-mixin"
}
},
-"version": "99f64e944b1043c790784cf5373c8fb349816fc4",
-"sum": "V8jcZQ1Qrlm7AQ6wjbuQQsacPb0NvrcZovKyplmzW5w=",
+"version": "b408b522bc653d014e53035e59fa394cc1edd762",
+"sum": "pep+dHzfIjh2SU5pEkwilMCAT/NoL6YYflV4x8cr7vU=",
"name": "alertmanager"
},
{
@@ -131,8 +120,8 @@
"subdir": "docs/node-mixin"
}
},
-"version": "b597c1244d7bef49e6f3359c87a56dd7707f6719",
-"sum": "cZTNXQMUCLB5FGYpMn845dcqGdkcYt58qCqOFIV/BoQ="
+"version": "832909dd257eb368cf83363ffcae3ab84cb4bcb1",
+"sum": "MmxGhE2PJ1a52mk2x7vDpMT2at4Jglbud/rK74CB5i0="
},
{
"source": {
@@ -141,8 +130,8 @@
"subdir": "documentation/prometheus-mixin"
}
},
-"version": "6eeded0fdf760e81af75d9c44ce539ab77da4505",
-"sum": "VK0c3sQ3ksiM6JQsAVfWmL5NbzGv9llMfXFNXfFdJ+A=",
+"version": "751ca03faddc9c64089c41d0da370a3a0b477742",
+"sum": "AS8WYFi/z10BZSF6DFkKBscjB32XDMM7iIso7CO/FyI=",
"name": "prometheus"
},
{
@@ -152,8 +141,8 @@
"subdir": "mixin"
}
},
-"version": "09b36547e5ed61a32a309648a8913bd02c08d3cc",
-"sum": "XP3uq7xcfKHsnWsz1v992csZhhZR3jQma6hFOfSViTs=",
+"version": "ff363498fc95cfe17de894d7237bcf38bdd0bc36",
+"sum": "cajthvLKDjYgYHCKQU2g/pTMRkxcbuJEvTnCyJOihl8=",
"name": "thanos-mixin"
},
{
@@ -6,11 +6,11 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: main
namespace: monitoring
spec:
-image: quay.io/prometheus/alertmanager:v0.21.0
+image: quay.io/prometheus/alertmanager:v0.22.2
nodeSelector:
kubernetes.io/os: linux
podMetadata:
@@ -18,7 +18,7 @@ spec:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
replicas: 3
resources:
limits:
@@ -32,4 +32,4 @@ spec:
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: alertmanager-main
-version: 0.21.0
+version: 0.22.2

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
spec:

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
prometheus: k8s
role: alert-rules
name: alertmanager-main-rules
@@ -17,7 +17,7 @@ spec:
- alert: AlertmanagerFailedReload
annotations:
description: Configuration has failed to load for {{ $labels.namespace }}/{{ $labels.pod}}.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedreload
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedreload
summary: Reloading an Alertmanager configuration has failed.
expr: |
# Without max_over_time, failed scrapes could create false negatives, see
@@ -29,7 +29,7 @@ spec:
- alert: AlertmanagerMembersInconsistent
annotations:
description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} has only found {{ $value }} members of the {{$labels.job}} cluster.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagermembersinconsistent
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagermembersinconsistent
summary: A member of an Alertmanager cluster has not found all other cluster members.
expr: |
# Without max_over_time, failed scrapes could create false negatives, see
@@ -37,13 +37,13 @@ spec:
max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m])
< on (namespace,service) group_left
count by (namespace,service) (max_over_time(alertmanager_cluster_members{job="alertmanager-main",namespace="monitoring"}[5m]))
-for: 10m
+for: 15m
labels:
severity: critical
- alert: AlertmanagerFailedToSendAlerts
annotations:
description: Alertmanager {{ $labels.namespace }}/{{ $labels.pod}} failed to send {{ $value | humanizePercentage }} of notifications to {{ $labels.integration }}.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedtosendalerts
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerfailedtosendalerts
summary: An Alertmanager instance failed to send notifications.
expr: |
(
@@ -58,7 +58,7 @@ spec:
- alert: AlertmanagerClusterFailedToSendAlerts
annotations:
description: The minimum notification failure rate to {{ $labels.integration }} sent from any instance in the {{$labels.job}} cluster is {{ $value | humanizePercentage }}.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterfailedtosendalerts
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a critical integration.
expr: |
min by (namespace,service, integration) (
@@ -73,7 +73,7 @@ spec:
- alert: AlertmanagerClusterFailedToSendAlerts
annotations:
description: The minimum notification failure rate to {{ $labels.integration }} sent from any instance in the {{$labels.job}} cluster is {{ $value | humanizePercentage }}.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterfailedtosendalerts
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
summary: All Alertmanager instances in a cluster failed to send notifications to a non-critical integration.
expr: |
min by (namespace,service, integration) (
@@ -88,7 +88,7 @@ spec:
- alert: AlertmanagerConfigInconsistent
annotations:
description: Alertmanager instances within the {{$labels.job}} cluster have different configurations.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerconfiginconsistent
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerconfiginconsistent
summary: Alertmanager instances within the same cluster have different configurations.
expr: |
count by (namespace,service) (
@@ -101,7 +101,7 @@ spec:
- alert: AlertmanagerClusterDown
annotations:
description: '{{ $value | humanizePercentage }} of Alertmanager instances within the {{$labels.job}} cluster have been up for less than half of the last 5m.'
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclusterdown
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterdown
summary: Half or more of the Alertmanager instances within the same cluster are down.
expr: |
(
@@ -120,7 +120,7 @@ spec:
- alert: AlertmanagerClusterCrashlooping
annotations:
description: '{{ $value | humanizePercentage }} of Alertmanager instances within the {{$labels.job}} cluster have restarted at least 5 times in the last 10m.'
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerclustercrashlooping
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclustercrashlooping
summary: Half or more of the Alertmanager instances within the same cluster are crashlooping.
expr: |
(
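Two recurring changes run through these rule files: every runbook_url moves from the GitHub wiki to runbooks.prometheus-operator.dev, and AlertmanagerMembersInconsistent gains a longer 15m tolerance. Consumers who prefer different thresholds can patch the generated groups instead of forking the mixin; a sketch, with field paths assumed from the component layout shown earlier:

local kp = (import 'kube-prometheus/main.libsonnet') + {
  alertmanager+: {
    mixin+: {
      prometheusAlerts+: {
        groups: std.map(
          function(g) g {
            rules: std.map(
              function(r)
                if std.objectHas(r, 'alert') && r.alert == 'AlertmanagerMembersInconsistent'
                then r { 'for': '10m' }  // restore the pre-0.9 tolerance
                else r,
              g.rules
            ),
          },
          super.groups
        ),
      },
    },
  },
};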
@@ -6,7 +6,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
stringData:

@@ -6,7 +6,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring
spec:

@@ -6,6 +6,6 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: alertmanager-main
namespace: monitoring

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: alert-router
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.21.0
+app.kubernetes.io/version: 0.22.2
name: alertmanager
namespace: monitoring
spec:

@@ -46,6 +46,6 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.18.0
+app.kubernetes.io/version: 0.19.0
name: blackbox-exporter-configuration
namespace: monitoring

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.18.0
+app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:
@@ -23,13 +23,13 @@ spec:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.18.0
+app.kubernetes.io/version: 0.19.0
spec:
containers:
- args:
- --config.file=/etc/blackbox_exporter/config.yml
- --web.listen-address=:19115
-image: quay.io/prometheus/blackbox-exporter:v0.18.0
+image: quay.io/prometheus/blackbox-exporter:v0.19.0
name: blackbox-exporter
ports:
- containerPort: 19115
@@ -74,7 +74,7 @@ spec:
- --secure-listen-address=:9115
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:19115/
-image: quay.io/brancz/kube-rbac-proxy:v0.8.0
+image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy
ports:
- containerPort: 9115

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.18.0
+app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: blackbox-exporter
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 0.18.0
+app.kubernetes.io/version: 0.19.0
name: blackbox-exporter
namespace: monitoring
spec:

@@ -7,7 +7,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
name: grafana-datasources
namespace: monitoring
type: Opaque

(File diff suppressed because it is too large)

@@ -21,6 +21,6 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
name: grafana-dashboards
namespace: monitoring

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:
@@ -18,16 +18,16 @@ spec:
template:
metadata:
annotations:
-checksum/grafana-datasources: bff02b6fd55e414ce7cf08a5ea2a85e3
+checksum/grafana-datasources: fbf9c3b28f5667257167c2cec0ac311a
labels:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
spec:
containers:
- env: []
-image: grafana/grafana:7.5.4
+image: grafana/grafana:8.1.1
name: grafana
ports:
- containerPort: 3000
@@ -53,6 +53,9 @@ spec:
- mountPath: /etc/grafana/provisioning/dashboards
name: grafana-dashboards
readOnly: false
+- mountPath: /grafana-dashboard-definitions/0/alertmanager-overview
+name: grafana-dashboard-alertmanager-overview
+readOnly: false
- mountPath: /grafana-dashboard-definitions/0/apiserver
name: grafana-dashboard-apiserver
readOnly: false
@@ -116,14 +119,11 @@ spec:
- mountPath: /grafana-dashboard-definitions/0/scheduler
name: grafana-dashboard-scheduler
readOnly: false
-- mountPath: /grafana-dashboard-definitions/0/statefulset
-name: grafana-dashboard-statefulset
-readOnly: false
- mountPath: /grafana-dashboard-definitions/0/workload-total
name: grafana-dashboard-workload-total
readOnly: false
nodeSelector:
-beta.kubernetes.io/os: linux
+kubernetes.io/os: linux
securityContext:
fsGroup: 65534
runAsNonRoot: true
@@ -138,6 +138,9 @@ spec:
- configMap:
name: grafana-dashboards
name: grafana-dashboards
+- configMap:
+name: grafana-dashboard-alertmanager-overview
+name: grafana-dashboard-alertmanager-overview
- configMap:
name: grafana-dashboard-apiserver
name: grafana-dashboard-apiserver
@@ -201,9 +204,6 @@ spec:
- configMap:
name: grafana-dashboard-scheduler
name: grafana-dashboard-scheduler
-- configMap:
-name: grafana-dashboard-statefulset
-name: grafana-dashboard-statefulset
- configMap:
name: grafana-dashboard-workload-total
name: grafana-dashboard-workload-total
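The Grafana deployment gains the alertmanager-overview dashboard (the mount and ConfigMap volume added above) and drops the statefulset dashboard, alongside the 7.5.4 to 8.1.1 image bump. Extra dashboards can still be layered on by extending the values shown in main.libsonnet earlier; a sketch, where the dashboard file and its import path are hypothetical:

local kp = (import 'kube-prometheus/main.libsonnet') + {
  values+:: {
    grafana+: {
      // Merge one more rendered dashboard into the generated set.
      dashboards+: { 'my-team.json': (import 'dashboards/my-team.jsonnet') },
    },
  },
};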
@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: grafana
app.kubernetes.io/name: grafana
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 7.5.4
+app.kubernetes.io/version: 8.1.1
name: grafana
namespace: monitoring
spec:

@@ -16,7 +16,7 @@ spec:
- alert: TargetDown
annotations:
description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service }} targets in {{ $labels.namespace }} namespace are down.'
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/targetdown
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/targetdown
summary: One or more targets are unreachable.
expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
for: 10m
@@ -30,7 +30,7 @@ spec:
and always fire against a receiver. There are integrations with various notification
mechanisms that send a notification when this alert is not firing. For example the
"DeadMansSnitch" integration in PagerDuty.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/watchdog
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/watchdog
summary: An alert that should always be firing to certify that Alertmanager is working properly.
expr: vector(1)
labels:
@@ -39,8 +39,9 @@ spec:
rules:
- alert: NodeNetworkInterfaceFlapping
annotations:
-message: Network interface "{{ $labels.device }}" changing it's up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworkinterfaceflapping
+description: Network interface "{{ $labels.device }}" changing its up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/nodenetworkinterfaceflapping
+summary: Network interface is often changing its status
expr: |
changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2
for: 2m

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
rules:
- apiGroups:

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:
@@ -23,7 +23,7 @@ spec:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
spec:
containers:
- args:
@@ -31,7 +31,7 @@ spec:
- --port=8081
- --telemetry-host=127.0.0.1
- --telemetry-port=8082
-image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
+image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.1.1
name: kube-state-metrics
resources:
limits:
@@ -47,7 +47,7 @@ spec:
- --secure-listen-address=:8443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8081/
-image: quay.io/brancz/kube-rbac-proxy:v0.8.0
+image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy-main
ports:
- containerPort: 8443
@@ -68,7 +68,7 @@ spec:
- --secure-listen-address=:9443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8082/
-image: quay.io/brancz/kube-rbac-proxy:v0.8.0
+image: quay.io/brancz/kube-rbac-proxy:v0.11.0
name: kube-rbac-proxy-self
ports:
- containerPort: 9443

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
prometheus: k8s
role: alert-rules
name: kube-state-metrics-rules
@@ -17,7 +17,7 @@ spec:
- alert: KubeStateMetricsListErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in list operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricslisterrors
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricslisterrors
summary: kube-state-metrics is experiencing errors in list operations.
expr: |
(sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m]))
@@ -30,7 +30,7 @@ spec:
- alert: KubeStateMetricsWatchErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in watch operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
-runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricswatcherrors
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricswatcherrors
summary: kube-state-metrics is experiencing errors in watch operations.
expr: |
(sum(rate(kube_state_metrics_watch_total{job="kube-state-metrics",result="error"}[5m]))
@@ -40,3 +40,26 @@ spec:
for: 15m
labels:
severity: critical
+- alert: KubeStateMetricsShardingMismatch
+annotations:
+description: kube-state-metrics pods are running with different --total-shards configuration, some Kubernetes objects may be exposed multiple times or not exposed at all.
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardingmismatch
+summary: kube-state-metrics sharding is misconfigured.
+expr: |
+stdvar (kube_state_metrics_total_shards{job="kube-state-metrics"}) != 0
+for: 15m
+labels:
+severity: critical
+- alert: KubeStateMetricsShardsMissing
+annotations:
+description: kube-state-metrics shards are missing, some Kubernetes objects are not being exposed.
+runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kube-state-metrics/kubestatemetricsshardsmissing
+summary: kube-state-metrics shards are missing.
+expr: |
+2^max(kube_state_metrics_total_shards{job="kube-state-metrics"}) - 1
+-
+sum( 2 ^ max by (shard_ordinal) (kube_state_metrics_shard_ordinal{job="kube-state-metrics"}) )
+!= 0
+for: 15m
+labels:
+severity: critical
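The new KubeStateMetricsShardsMissing expression treats shard ordinals as a bitmask: a complete set of ordinals 0..n-1 sums to 2^n - 1, so any non-zero difference means a shard is absent. A worked check of the identity in Jsonnet; the shard counts are illustrative:

local n = 4;
{
  expected: std.pow(2, n) - 1,  // 15: all four shard bits set
  allPresent: std.foldl(function(acc, o) acc + std.pow(2, o), [0, 1, 2, 3], 0),    // 15, alert stays silent
  shardTwoMissing: std.foldl(function(acc, o) acc + std.pow(2, o), [0, 1, 3], 0),  // 11 != 15, alert fires
}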
@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:

@@ -5,6 +5,6 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring

@@ -5,7 +5,7 @@ metadata:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: kube-state-metrics
app.kubernetes.io/part-of: kube-prometheus
-app.kubernetes.io/version: 2.0.0
+app.kubernetes.io/version: 2.1.1
name: kube-state-metrics
namespace: monitoring
spec:

(File diff suppressed because it is too large)

@@ -31,7 +31,7 @@ spec:
sourceLabels:
- __name__
- action: drop
-regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
+regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop

@@ -16,4 +16,4 @@ spec:
- kube-system
selector:
matchLabels:
-app.kubernetes.io/name: kube-dns
+k8s-app: kube-dns

@@ -31,7 +31,7 @@ spec:
sourceLabels:
- __name__
- action: drop
-regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
+regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop

@@ -32,7 +32,7 @@ spec:
sourceLabels:
- __name__
- action: drop
-regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
+regex: etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|object_counts|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)
sourceLabels:
- __name__
- action: drop
@@ -61,11 +61,16 @@ spec:
sourceLabels:
- __name__
- action: drop
-regex: (container_fs_.*|container_spec_.*|container_blkio_device_usage_total|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
+regex: (container_spec_.*|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
sourceLabels:
- __name__
- pod
- namespace
+- action: drop
+regex: (container_blkio_device_usage_total);.+
+sourceLabels:
+- __name__
+- container
path: /metrics/cadvisor
port: https-metrics
relabelings:
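When a relabel rule lists several sourceLabels, Prometheus joins their values with ';' before matching, so the ';;' suffix above drops a series only when both pod and namespace are empty, while the new rule drops container_blkio_device_usage_total only when a container label is present. The shape of such a rule, rendered as Jsonnet; the metric list is shortened for illustration:

{
  action: 'drop',
  // __name__, pod and namespace are joined as 'name;pod;namespace', so
  // ';;' matches only series whose pod and namespace labels are empty.
  regex: '(container_spec_.*|container_last_seen);;',
  sourceLabels: ['__name__', 'pod', 'namespace'],
}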
@@ -5,7 +5,7 @@ metadata:
|
||||
app.kubernetes.io/component: exporter
|
||||
app.kubernetes.io/name: node-exporter
|
||||
app.kubernetes.io/part-of: kube-prometheus
|
||||
app.kubernetes.io/version: 1.1.2
|
||||
app.kubernetes.io/version: 1.2.2
|
||||
name: node-exporter
|
||||
rules:
|
||||
- apiGroups:
|
||||
|
||||
@@ -5,7 +5,7 @@ metadata:
|
||||
app.kubernetes.io/component: exporter
|
||||
app.kubernetes.io/name: node-exporter
|
||||
app.kubernetes.io/part-of: kube-prometheus
|
||||
app.kubernetes.io/version: 1.1.2
|
||||
app.kubernetes.io/version: 1.2.2
|
||||
name: node-exporter
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
|
||||
@@ -5,7 +5,7 @@ metadata:
|
||||
app.kubernetes.io/component: exporter
|
||||
app.kubernetes.io/name: node-exporter
|
||||
app.kubernetes.io/part-of: kube-prometheus
|
||||
app.kubernetes.io/version: 1.1.2
|
||||
app.kubernetes.io/version: 1.2.2
|
||||
name: node-exporter
|
||||
namespace: monitoring
|
||||
spec:
|
||||
@@ -20,7 +20,7 @@ spec:
|
||||
app.kubernetes.io/component: exporter
|
||||
app.kubernetes.io/name: node-exporter
|
||||
app.kubernetes.io/part-of: kube-prometheus
|
||||
app.kubernetes.io/version: 1.1.2
|
||||
app.kubernetes.io/version: 1.2.2
|
||||
spec:
|
||||
containers:
|
||||
- args:
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
|
||||
- --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$
|
||||
- --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$
|
||||
image: quay.io/prometheus/node-exporter:v1.1.2
|
||||
image: quay.io/prometheus/node-exporter:v1.2.2
|
||||
name: node-exporter
|
||||
resources:
|
||||
limits:
|
||||
@@ -60,7 +60,7 @@ spec:
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: status.podIP
|
||||
image: quay.io/brancz/kube-rbac-proxy:v0.8.0
|
||||
image: quay.io/brancz/kube-rbac-proxy:v0.11.0
|
||||
name: kube-rbac-proxy
|
||||
ports:
|
||||
- containerPort: 9100
|
||||
|
||||
@@ -5,7 +5,7 @@ metadata:
|
||||
app.kubernetes.io/component: exporter
|
||||
app.kubernetes.io/name: node-exporter
|
||||
app.kubernetes.io/part-of: kube-prometheus
|
||||
app.kubernetes.io/version: 1.1.2
|
||||
app.kubernetes.io/version: 1.2.2
|
||||
prometheus: k8s
|
||||
role: alert-rules
|
||||
name: node-exporter-rules
|
||||
@@ -17,7 +17,7 @@ spec:
|
||||
- alert: NodeFilesystemSpaceFillingUp
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemspacefillingup
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
|
||||
summary: Filesystem is predicted to run out of space within the next 24 hours.
|
||||
expr: |
|
||||
(
|
||||
@@ -33,7 +33,7 @@ spec:
|
||||
- alert: NodeFilesystemSpaceFillingUp
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up fast.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemspacefillingup
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
|
||||
summary: Filesystem is predicted to run out of space within the next 4 hours.
|
||||
expr: |
|
||||
(
|
||||
@@ -49,7 +49,7 @@ spec:
|
||||
- alert: NodeFilesystemAlmostOutOfSpace
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutofspace
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace
|
||||
summary: Filesystem has less than 5% space left.
|
||||
expr: |
|
||||
(
|
||||
@@ -57,13 +57,13 @@ spec:
|
||||
and
|
||||
node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
|
||||
)
|
||||
for: 1h
|
||||
for: 30m
|
||||
labels:
|
||||
severity: warning
|
||||
- alert: NodeFilesystemAlmostOutOfSpace
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutofspace
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutofspace
|
||||
summary: Filesystem has less than 3% space left.
|
||||
expr: |
|
||||
(
|
||||
@@ -71,13 +71,13 @@ spec:
|
||||
and
|
||||
node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
|
||||
)
|
||||
for: 1h
|
||||
for: 30m
|
||||
labels:
|
||||
severity: critical
|
||||
- alert: NodeFilesystemFilesFillingUp
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemfilesfillingup
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup
|
||||
summary: Filesystem is predicted to run out of inodes within the next 24 hours.
|
||||
expr: |
|
||||
(
|
||||
@@ -93,7 +93,7 @@ spec:
|
||||
- alert: NodeFilesystemFilesFillingUp
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up fast.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemfilesfillingup
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemfilesfillingup
|
||||
summary: Filesystem is predicted to run out of inodes within the next 4 hours.
|
||||
expr: |
|
||||
(
|
||||
@@ -109,7 +109,7 @@ spec:
|
||||
- alert: NodeFilesystemAlmostOutOfFiles
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutoffiles
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles
|
||||
summary: Filesystem has less than 5% inodes left.
|
||||
expr: |
|
||||
(
|
||||
@@ -123,7 +123,7 @@ spec:
|
||||
- alert: NodeFilesystemAlmostOutOfFiles
|
||||
annotations:
|
||||
description: Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemalmostoutoffiles
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemalmostoutoffiles
|
||||
summary: Filesystem has less than 3% inodes left.
|
||||
expr: |
|
||||
(
|
||||
@@ -137,7 +137,7 @@ spec:
|
||||
- alert: NodeNetworkReceiveErrs
|
||||
annotations:
|
||||
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} receive errors in the last two minutes.'
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworkreceiveerrs
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworkreceiveerrs
|
||||
summary: Network interface is reporting many receive errors.
|
||||
expr: |
|
||||
rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01
|
||||
@@ -147,7 +147,7 @@ spec:
|
||||
- alert: NodeNetworkTransmitErrs
|
||||
annotations:
|
||||
description: '{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} transmit errors in the last two minutes.'
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworktransmiterrs
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodenetworktransmiterrs
|
||||
summary: Network interface is reporting many transmit errors.
|
||||
expr: |
|
||||
rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01
|
||||
@@ -157,7 +157,7 @@ spec:
|
||||
- alert: NodeHighNumberConntrackEntriesUsed
|
||||
annotations:
|
||||
description: '{{ $value | humanizePercentage }} of conntrack entries are used.'
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodehighnumberconntrackentriesused
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused
|
||||
summary: Number of conntrack are getting close to the limit.
|
||||
expr: |
|
||||
(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75
|
||||
@@ -166,7 +166,7 @@ spec:
|
||||
- alert: NodeTextFileCollectorScrapeError
|
||||
annotations:
|
||||
description: Node Exporter text file collector failed to scrape.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodetextfilecollectorscrapeerror
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodetextfilecollectorscrapeerror
|
||||
summary: Node Exporter text file collector failed to scrape.
|
||||
expr: |
|
||||
node_textfile_scrape_error{job="node-exporter"} == 1
|
||||
@@ -175,7 +175,7 @@ spec:
|
||||
- alert: NodeClockSkewDetected
|
||||
annotations:
|
||||
description: Clock on {{ $labels.instance }} is out of sync by more than 300s. Ensure NTP is configured correctly on this host.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodeclockskewdetected
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclockskewdetected
|
||||
summary: Clock skew detected.
|
||||
expr: |
|
||||
(
|
||||
@@ -195,7 +195,7 @@ spec:
|
||||
- alert: NodeClockNotSynchronising
|
||||
annotations:
|
||||
description: Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodeclocknotsynchronising
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodeclocknotsynchronising
|
||||
summary: Clock not synchronising.
|
||||
expr: |
|
||||
min_over_time(node_timex_sync_status[5m]) == 0
|
||||
@@ -207,7 +207,7 @@ spec:
|
||||
- alert: NodeRAIDDegraded
|
||||
annotations:
|
||||
description: RAID array '{{ $labels.device }}' on {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/noderaiddegraded
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddegraded
|
||||
summary: RAID Array is degraded
|
||||
expr: |
|
||||
node_md_disks_required - ignoring (state) (node_md_disks{state="active"}) > 0
|
||||
@@ -217,12 +217,36 @@ spec:
|
||||
- alert: NodeRAIDDiskFailure
|
||||
annotations:
|
||||
description: At least one device in RAID array on {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap.
|
||||
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/noderaiddiskfailure
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/noderaiddiskfailure
|
||||
summary: Failed device in RAID array
|
||||
expr: |
|
||||
node_md_disks{state="failed"} > 0
|
||||
labels:
|
||||
severity: warning
|
||||
- alert: NodeFileDescriptorLimit
|
||||
annotations:
|
||||
description: File descriptors limit at {{ $labels.instance }} is currently at {{ printf "%.2f" $value }}%.
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit
|
||||
summary: Kernel is predicted to exhaust file descriptors limit soon.
|
||||
expr: |
|
||||
(
|
||||
node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 70
|
||||
)
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
- alert: NodeFileDescriptorLimit
|
||||
annotations:
|
||||
description: File descriptors limit at {{ $labels.instance }} is currently at {{ printf "%.2f" $value }}%.
|
||||
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefiledescriptorlimit
|
||||
summary: Kernel is predicted to exhaust file descriptors limit soon.
|
||||
expr: |
|
||||
(
|
||||
node_filefd_allocated{job="node-exporter"} * 100 / node_filefd_maximum{job="node-exporter"} > 90
|
||||
)
|
||||
for: 15m
|
||||
labels:
|
||||
severity: critical
|
||||
- name: node-exporter.rules
|
||||
rules:
|
||||
- expr: |
|
||||
@@ -234,9 +258,9 @@ spec:
|
||||
record: instance:node_num_cpu:sum
|
||||
- expr: |
|
||||
1 - avg without (cpu, mode) (
|
||||
rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[1m])
|
||||
rate(node_cpu_seconds_total{job="node-exporter", mode="idle"}[5m])
|
||||
)
|
||||
record: instance:node_cpu_utilisation:rate1m
|
||||
record: instance:node_cpu_utilisation:rate5m
|
||||
- expr: |
|
||||
(
|
||||
node_load1{job="node-exporter"}
|
||||
@@ -252,31 +276,31 @@ spec:
         )
       record: instance:node_memory_utilisation:ratio
     - expr: |
-        rate(node_vmstat_pgmajfault{job="node-exporter"}[1m])
-      record: instance:node_vmstat_pgmajfault:rate1m
+        rate(node_vmstat_pgmajfault{job="node-exporter"}[5m])
+      record: instance:node_vmstat_pgmajfault:rate5m
     - expr: |
-        rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[1m])
-      record: instance_device:node_disk_io_time_seconds:rate1m
+        rate(node_disk_io_time_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m])
+      record: instance_device:node_disk_io_time_seconds:rate5m
     - expr: |
-        rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[1m])
-      record: instance_device:node_disk_io_time_weighted_seconds:rate1m
+        rate(node_disk_io_time_weighted_seconds_total{job="node-exporter", device=~"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+"}[5m])
+      record: instance_device:node_disk_io_time_weighted_seconds:rate5m
     - expr: |
         sum without (device) (
-          rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[1m])
+          rate(node_network_receive_bytes_total{job="node-exporter", device!="lo"}[5m])
         )
-      record: instance:node_network_receive_bytes_excluding_lo:rate1m
+      record: instance:node_network_receive_bytes_excluding_lo:rate5m
     - expr: |
         sum without (device) (
-          rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[1m])
+          rate(node_network_transmit_bytes_total{job="node-exporter", device!="lo"}[5m])
         )
-      record: instance:node_network_transmit_bytes_excluding_lo:rate1m
+      record: instance:node_network_transmit_bytes_excluding_lo:rate5m
     - expr: |
         sum without (device) (
-          rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[1m])
+          rate(node_network_receive_drop_total{job="node-exporter", device!="lo"}[5m])
         )
-      record: instance:node_network_receive_drop_excluding_lo:rate1m
+      record: instance:node_network_receive_drop_excluding_lo:rate5m
     - expr: |
         sum without (device) (
-          rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[1m])
+          rate(node_network_transmit_drop_total{job="node-exporter", device!="lo"}[5m])
         )
-      record: instance:node_network_transmit_drop_excluding_lo:rate1m
+      record: instance:node_network_transmit_drop_excluding_lo:rate5m
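All remaining 1m-window recording rules get the same treatment: the `:rate1m` series stop being produced and `:rate5m` series replace them, so any dashboard or alert that references the old names must be updated. For example, for one of the series renamed in this hunk:

    # Old name, no longer recorded: instance:node_network_receive_bytes_excluding_lo:rate1m
    instance:node_network_receive_bytes_excluding_lo:rate5m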

@@ -5,7 +5,7 @@ metadata:
     app.kubernetes.io/component: exporter
     app.kubernetes.io/name: node-exporter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 1.1.2
+    app.kubernetes.io/version: 1.2.2
   name: node-exporter
   namespace: monitoring
 spec:

@@ -5,6 +5,6 @@ metadata:
     app.kubernetes.io/component: exporter
     app.kubernetes.io/name: node-exporter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 1.1.2
+    app.kubernetes.io/version: 1.2.2
   name: node-exporter
   namespace: monitoring

@@ -5,7 +5,7 @@ metadata:
     app.kubernetes.io/component: exporter
     app.kubernetes.io/name: node-exporter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 1.1.2
+    app.kubernetes.io/version: 1.2.2
   name: node-exporter
   namespace: monitoring
 spec:
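The three hunks above bump the node-exporter version label from 1.1.2 to 1.2.2 across its manifests. After rollout, the running version can be confirmed from the exporter's own build-info metric, which carries the version as a label:

    # One series per exporter pod; an empty result means no pod runs 1.2.2 yet.
    node_exporter_build_info{version="1.2.2"}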

@@ -5,7 +5,7 @@ metadata:
     app.kubernetes.io/component: metrics-adapter
     app.kubernetes.io/name: prometheus-adapter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 0.8.4
+    app.kubernetes.io/version: 0.9.0
   name: v1beta1.metrics.k8s.io
 spec:
   group: metrics.k8s.io

@@ -5,7 +5,7 @@ metadata:
     app.kubernetes.io/component: metrics-adapter
     app.kubernetes.io/name: prometheus-adapter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 0.8.4
+    app.kubernetes.io/version: 0.9.0
   name: prometheus-adapter
 rules:
 - apiGroups:

@@ -5,7 +5,7 @@ metadata:
     app.kubernetes.io/component: metrics-adapter
     app.kubernetes.io/name: prometheus-adapter
     app.kubernetes.io/part-of: kube-prometheus
-    app.kubernetes.io/version: 0.8.4
+    app.kubernetes.io/version: 0.9.0
     rbac.authorization.k8s.io/aggregate-to-admin: "true"
     rbac.authorization.k8s.io/aggregate-to-edit: "true"
     rbac.authorization.k8s.io/aggregate-to-view: "true"
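Likewise, prometheus-adapter moves from 0.8.4 to 0.9.0 on its APIService and ClusterRole manifests. If kube-state-metrics is running, the image actually deployed can be cross-checked with a query like the following sketch (the `monitoring` namespace and `prometheus-adapter` container name are the kube-prometheus defaults assumed here):

    # Reports the image label for each prometheus-adapter container.
    kube_pod_container_info{namespace="monitoring", container="prometheus-adapter"}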
Some files were not shown because too many files have changed in this diff.