Compare commits

33 Commits

Author SHA1 Message Date
Paweł Krupa
382f16c2ea Merge pull request #1496 from PhilipGough/revert-1397-dropped-cadvisor-metrics 2021-11-24 13:13:23 +01:00
Philip Gough
9adf60f2b3 Revert "Adjust dropped metrics from cAdvisor" 2021-11-10 10:39:11 +00:00
Paweł Krupa
276fae57a6 Merge pull request #1444 from prometheus-operator/automated-updates-release-0.7 2021-10-18 10:42:10 +02:00
dgrisonnet
9b3270d3cf [bot] [release-0.7] Automated version update 2021-10-18 07:39:42 +00:00
Damien Grisonnet
37730a20c5 Merge pull request #1431 from prometheus-operator/automated-updates-release-0.7
[bot] [release-0.7] Automated version update
2021-10-12 09:20:19 +02:00
dgrisonnet
f42546d547 [bot] [release-0.7] Automated version update 2021-10-11 07:39:28 +00:00
Damien Grisonnet
6c63d2f1cb Merge pull request #1397 from PhilipGough/dropped-cadvisor-metrics
Adjust dropped metrics from cAdvisor
2021-09-28 12:01:12 +02:00
Philip Gough
138b7bf9e7 Adjust dropped metrics from cAdvisor
This change drops pod-centric metrics without a non-empty 'container' label.
Previously we dropped pod-centric metrics without a (pod, namespace) label set
however these can be critical for debugging.
2021-09-28 10:47:59 +01:00
Arthur Silva Sens
55df3c1e20 Merge pull request #1354 from PhilipGough/bz-1999072
jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while …
2021-09-02 17:17:01 -03:00
Philip Gough
0df52893e9 jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage
The following provides a description and cardinality estimation based on the tests in a local cluster:

container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.*                    - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors         - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max              - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads                  - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets                  - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds       - container start. Possibly not needed for system services (nodes*services)
container_last_seen                - Not needed as system services are always running (nodes*services)
container_spec_.*                  - Everything related to cgroup specification and thus static data (nodes*services*5)
2021-08-30 12:29:32 +01:00
Paweł Krupa
123326004b Merge pull request #1319 from prometheus-operator/automated-updates-release-0.7
[bot] [release-0.7] Automated version update
2021-08-16 10:13:07 +02:00
paulfantom
58775cedf8 [bot] [release-0.7] Automated version update 2021-08-16 08:04:46 +00:00
Paweł Krupa
4ae24f3726 Merge pull request #1295 from dgrisonnet/1245-release-0.7
release-0.7: *: add "update" target to makefile and use it in automatic updater
2021-08-16 10:03:42 +02:00
Paweł Krupa
4f02cea7a0 Merge pull request #1309 from dgrisonnet/backport-1237-0.7 2021-08-11 14:00:38 +02:00
Damien Grisonnet
f9727df11a .github: bump kind version and images
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-09 18:09:50 +02:00
Philip Gough
cdab4847e0 ci: Harden action to wait for kind cluster readiness 2021-08-05 12:03:06 +02:00
paulfantom
f12a8a3d5c *: add "update" target to makefile and use it in automatic updater
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-02 12:41:04 +02:00
Matthias Loibl
8a150fbe5f Merge pull request #1009 from paulfantom/etcdmixin
jsonnet: pin to correct etcd commit
2021-03-09 15:07:09 +01:00
paulfantom
e1ad2ddf55 jsonnet: pin to correct etcd commit 2021-03-05 19:12:39 +01:00
Paweł Krupa
4dbc23c923 Merge pull request #980 from patricio-dorantes/hotfix/etcd-route-fix 2021-03-05 15:37:36 +01:00
Patricio M Dorantes Jamarne
e04abe8551 docker run --rm -it -v /home/patricio.dorantes/ocp-train/install-prometheus/kube-prometheus:/home/patricio.dorantes/ocp-train/install-prometheus/kube-prometheus:z -w /home/patricio.dorantes/ocp-train/install-prometheus/kube-prometheus --entrypoint=bash golang:1.16.0 -c 'make generate --always-generate' 2021-02-25 12:07:55 -06:00
Patricio M Dorantes Jamarne
1ae6f1b679 missing /etcd/ 2021-02-25 09:25:23 -06:00
Patricio M Dorantes Jamarne
60c33ff10a fix etcd mixin new path 2021-02-24 20:31:19 -06:00
Paweł Krupa
c1130442d6 Merge pull request #977 from vshn/0.7/pin-dependencies
[release-0.7] Pin Jsonnet dependencies
2021-02-24 14:49:26 +01:00
Simon Rüegg
da8928452e [release-0.7] Pin Jsonnet dependencies
Pin all Jsonnet dependencies to current commit SHA.

Signed-off-by: Simon Rüegg <simon@rueggs.ch>
2021-02-24 13:49:29 +01:00
Frederic Branczyk
5cf88fa121 Merge pull request #957 from xFragger/patch-1
fixing release-0.7 etcd repository
2021-02-22 13:25:37 +01:00
Michael Freund
e3bc5885bc fixing release-0.7 etcd repository
as in #935 #934 for https://github.com/prometheus-operator/kube-prometheus/issues/933
2021-02-22 11:34:53 +01:00
Frederic Branczyk
15fc8ca4bf Merge pull request #947 from a8j8i8t8/patch-1
Fix: service names in kops for scheduler and kube-dns
2021-02-18 15:01:08 +01:00
Ajit
8d57dedf8e Fix #946
Fix for service names
2021-02-16 14:30:18 +01:00
Paweł Krupa
7433943d6f Merge pull request #916 from aveyrenc/release-0.7
Fix scheduler and controller selectors for Kubespray
2021-02-08 09:53:53 +01:00
Alexandre Veyrenc
093ff90c3c Fix scheduler and controller selectors for Kubespray 2021-02-05 11:42:30 +01:00
Lili Cosic
84f24095d6 Merge pull request #830 from dgrisonnet/pin-release-0.7
Pin jsonnet dependencies in release-0.7
2020-12-10 15:11:01 +01:00
Damien Grisonnet
df8ef58246 jsonnet,manifests: pin jsonnet deps in release-0.7
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2020-12-10 15:02:28 +01:00
14 changed files with 197 additions and 162 deletions

View File

@@ -4,7 +4,7 @@ on:
   - pull_request
 env:
   golang-version: '1.15'
-  kind-version: 'v0.9.0'
+  kind-version: 'v0.11.1'
 jobs:
   generate:
     runs-on: ${{ matrix.os }}
@@ -32,8 +32,8 @@ jobs:
     strategy:
       matrix:
         kind-image:
-          - 'kindest/node:v1.19.0'
-          - 'kindest/node:v1.20.0'
+          - 'kindest/node:v1.19.11'
+          - 'kindest/node:v1.20.7'
     steps:
       - uses: actions/checkout@v2
       - name: Start KinD
@@ -41,13 +41,9 @@ jobs:
         with:
           version: ${{ env.kind-version }}
           image: ${{ matrix.kind-image }}
+          wait: 300s
       - name: Wait for cluster to finish bootstraping
-        run: |
-          until [ "$(kubectl get pods --all-namespaces --no-headers | grep -cEv '([0-9]+)/\1')" -eq 0 ]; do
-            sleep 5s
-          done
-          kubectl cluster-info
-          kubectl get pods -A
+        run: kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s
       - name: Create kube-prometheus stack
         run: |
           kubectl create -f manifests/setup
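
The polling loop removed in this hunk counted pods whose READY column did not read `n/n`, using the backreference pattern `([0-9]+)/\1`. The following is a minimal Python sketch of that check only, not code from the repository. It assumes input shaped like `kubectl get pods --all-namespaces --no-headers` output; the function name and the sample lines are made up for illustration.

```python
import re

# Same idea as the removed `grep -cEv '([0-9]+)/\1'`: a line is treated as
# ready when it contains some "<n>/<n>" pair. The backreference \1 forces the
# number after the slash to repeat the number before it.
READY = re.compile(r"([0-9]+)/\1")


def count_not_ready(pod_lines):
    """Count non-empty lines that contain no matching n/n pair."""
    return sum(
        1 for line in pod_lines
        if line.strip() and not READY.search(line)
    )


# Hypothetical sample output, not captured from a real cluster.
sample = [
    "kube-system   coredns-abc      1/1   Running   0   2m",
    "kube-system   kube-proxy-xyz   0/1   Pending   0   2m",
    "monitoring    prometheus-k8s   2/2   Running   0   1m",
]

if __name__ == "__main__":
    print(count_not_ready(sample))
```

Like the shell pattern, this regular expression is unanchored, so a READY value such as `1/10` would also be counted as ready. The replacement step in the hunk asks the API server for the `Ready` condition instead of parsing text.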

View File

@@ -31,6 +31,10 @@ vendor: $(JB_BIN) jsonnetfile.json jsonnetfile.lock.json
rm -rf vendor
$(JB_BIN) install
+.PHONY: update
+update: $(JB_BIN)
+$(JB_BIN) update
.PHONY: fmt
fmt: $(JSONNETFMT_BIN)
find . -name 'vendor' -prune -o -name '*.libsonnet' -print -o -name '*.jsonnet' -print | \

View File

@@ -8,16 +8,16 @@
"subdir": "grafana"
}
},
-"version": "master"
+"version": "release-0.2"
},
{
"source": {
"git": {
"remote": "https://github.com/etcd-io/etcd",
-"subdir": "Documentation/etcd-mixin"
+"subdir": "contrib/mixin"
}
},
-"version": "master"
+"version": "60d5159091ab06e80ad446ce9e4f415e5f53439e"
},
{
"source": {
@@ -35,7 +35,7 @@
"subdir": "jsonnet/mixin"
}
},
-"version": "master"
+"version": "release-0.44"
},
{
"source": {
@@ -44,7 +44,7 @@
"subdir": ""
}
},
-"version": "master"
+"version": "release-0.6"
},
{
"source": {
@@ -62,7 +62,7 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
-"version": "master"
+"version": "release-1.9"
},
{
"source": {
@@ -71,7 +71,7 @@
"subdir": "docs/node-mixin"
}
},
-"version": "master"
+"version": "8b466360a35581e0301bd22918be7011cf4203c3"
},
{
"source": {
@@ -90,7 +90,7 @@
"subdir": "doc/alertmanager-mixin"
}
},
-"version": "master",
+"version": "193ebba04d1e70d971047e983a0b489112610460",
"name": "alertmanager"
},
{
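
The hunks above replace floating `master` references with release branches or commit SHAs. The following is a hedged Python sketch of a check for leftover floating versions. It assumes the file has a top-level `dependencies` list (not visible in the diff) whose entries carry `source.git.remote`, `source.git.subdir` and `version` as shown above. The helper names and the set of names treated as floating are illustrative and not part of the repository.

```python
import json
import re

# Assumption: branch names treated as "floating" for this check.
FLOATING = {"master", "main"}
# A full 40-character hexadecimal Git commit SHA.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def classify(version):
    """Classify a version string as 'commit', 'floating' or 'branch-or-tag'."""
    if SHA_RE.match(version):
        return "commit"
    if version in FLOATING:
        return "floating"
    return "branch-or-tag"


def floating_dependencies(jsonnetfile_text):
    """Return (remote, subdir) pairs whose version is a floating ref."""
    data = json.loads(jsonnetfile_text)
    found = []
    for dep in data.get("dependencies", []):
        git = dep.get("source", {}).get("git", {})
        if classify(dep.get("version", "")) == "floating":
            found.append((git.get("remote", ""), git.get("subdir", "")))
    return found


if __name__ == "__main__":
    with open("jsonnetfile.json", encoding="utf-8") as handle:
        for remote, subdir in floating_dependencies(handle.read()):
            print(f"floating version: {remote} ({subdir})")
```

A release branch such as `release-0.44` can still move, so this sketch reports it separately as `branch-or-tag` rather than as pinned. In the diff, the exact commit for each dependency is recorded in the lock file.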

View File

@@ -31,6 +31,23 @@
regex: 'container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)',
action: 'drop',
},
+// Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage (cardinality estimation)
+{
+sourceLabels: ['__name__', 'pod', 'namespace'],
+action: 'drop',
+regex: '(' + std.join('|',
+[
+'container_fs_.*', // add filesystem read/write data (nodes*disks*services*4)
+'container_spec_.*', // everything related to cgroup specification and thus static data (nodes*services*5)
+'container_blkio_device_usage_total', // useful for containers, but not for system services (nodes*disks*services*operations*2)
+'container_file_descriptors', // file descriptors limits and global numbers are exposed via (nodes*services)
+'container_sockets', // used sockets in cgroup. Usually not important for system services (nodes*services)
+'container_threads_max', // max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
+'container_threads', // used threads in cgroup. Usually not important for system services (nodes*services)
+'container_start_time_seconds', // container start. Possibly not needed for system services (nodes*services)
+'container_last_seen', // not needed as system services are always running (nodes*services)
+]) + ');;',
+},
],
},
],
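
The relabeling rule added in this hunk relies on how Prometheus evaluates `sourceLabels`: the label values are joined with `;` (the default separator) and the regex must match the whole joined string. The trailing `;;` therefore matches only when both `pod` and `namespace` are empty. The following is a minimal Python sketch of that matching logic, using `re.fullmatch` as a stand-in for Prometheus's anchored RE2 matching. The function name `is_dropped` is hypothetical.

```python
import re

# Metric name patterns copied from the jsonnet list in the hunk above.
DROPPED = [
    "container_fs_.*",
    "container_spec_.*",
    "container_blkio_device_usage_total",
    "container_file_descriptors",
    "container_sockets",
    "container_threads_max",
    "container_threads",
    "container_start_time_seconds",
    "container_last_seen",
]

# Mirrors the jsonnet expression: '(' + std.join('|', [...]) + ');;'
PATTERN = re.compile("(" + "|".join(DROPPED) + ");;")


def is_dropped(name, pod="", namespace=""):
    """Return True if the 'drop' rule would match this series."""
    # sourceLabels: ['__name__', 'pod', 'namespace'], joined with ';'.
    joined = ";".join([name, pod, namespace])
    # Prometheus anchors relabel regexes at both ends; fullmatch mimics that.
    return PATTERN.fullmatch(joined) is not None


if __name__ == "__main__":
    print(is_dropped("container_fs_reads_total"))
    print(is_dropped("container_fs_reads_total", "my-pod", "default"))
    print(is_dropped("container_cpu_usage_seconds_total"))
```

Under these assumptions, a listed metric scraped for a system service (no `pod` or `namespace` label) is dropped, while the same metric for a pod is kept, as are metrics that are not on the list.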

View File

@@ -23,14 +23,14 @@ local service(name, namespace, labels, selector, ports) = {
[{ name: 'https-metrics', port: 10257, targetPort: 10257 }]
),
kubeSchedulerPrometheusDiscoveryService: service(
-'kube-controller-manager-prometheus-discovery',
+'kube-scheduler-prometheus-discovery',
'kube-system',
{ 'k8s-app': 'kube-scheduler' },
{ 'k8s-app': 'kube-scheduler' },
[{ name: 'https-metrics', port: 10259, targetPort: 10259 }]
),
kubeDnsPrometheusDiscoveryService: service(
-'kube-controller-manager-prometheus-discovery',
+'kube-dns-prometheus-discovery',
'kube-system',
{ 'k8s-app': 'kube-dns' },
{ 'k8s-app': 'kube-dns' },

View File

@@ -20,7 +20,7 @@ local service(name, namespace, labels, selector, ports) = {
'kube-controller-manager-prometheus-discovery',
'kube-system',
{ 'k8s-app': 'kube-controller-manager' },
-{ 'k8s-app': 'kube-controller-manager' },
+{ 'component': 'kube-controller-manager' },
[{ name: 'https-metrics', port: 10257, targetPort: 10257 }]
),
@@ -28,7 +28,7 @@ local service(name, namespace, labels, selector, ports) = {
'kube-scheduler-prometheus-discovery',
'kube-system',
{ 'k8s-app': 'kube-scheduler' },
-{ 'k8s-app': 'kube-scheduler' },
+{ 'component': 'kube-scheduler' },
[{ name: 'https-metrics', port: 10259, targetPort: 10259 }],
),

View File

@@ -1,4 +1,4 @@
-(import 'github.com/etcd-io/etcd/Documentation/etcd-mixin/mixin.libsonnet') + {
+(import 'github.com/etcd-io/etcd/contrib/mixin/mixin.libsonnet') + {
_config+:: {
etcd: {
ips: [],

View File

@@ -322,6 +322,23 @@ local relabelings = import 'kube-prometheus/dropping-deprecated-metrics-relabeli
regex: 'container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)',
action: 'drop',
},
+// Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage (cardinality estimation)
+{
+sourceLabels: ['__name__', 'pod', 'namespace'],
+action: 'drop',
+regex: '(' + std.join('|',
+[
+'container_fs_.*', // add filesystem read/write data (nodes*disks*services*4)
+'container_spec_.*', // everything related to cgroup specification and thus static data (nodes*services*5)
+'container_blkio_device_usage_total', // useful for containers, but not for system services (nodes*disks*services*operations*2)
+'container_file_descriptors', // file descriptors limits and global numbers are exposed via (nodes*services)
+'container_sockets', // used sockets in cgroup. Usually not important for system services (nodes*services)
+'container_threads_max', // max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
+'container_threads', // used threads in cgroup. Usually not important for system services (nodes*services)
+'container_start_time_seconds', // container start. Possibly not needed for system services (nodes*services)
+'container_last_seen', // not needed as system services are always running (nodes*services)
+]) + ');;',
+},
],
},
{

View File

@@ -8,18 +8,18 @@
"subdir": "grafana"
}
},
-"version": "7176a6d54b3b19e0529ce574ab5ed427f1c721e9",
-"sum": "IrxVMYJrTbDliaVMXX72jUKm8Ju2Za8cAbds7d26wuY="
+"version": "014301fd5f71d8305a395b2fb437089a7b1a3999",
+"sum": "RHtpk2c0CcliWyt6F4DIgwpi4cEfHADK7nAxIw6RTGs="
},
{
"source": {
"git": {
"remote": "https://github.com/etcd-io/etcd.git",
-"subdir": "Documentation/etcd-mixin"
+"subdir": "contrib/mixin"
}
},
-"version": "ca866c02422ff3f3d1f0876898a30c33dd7bcccf",
-"sum": "bLqTqEr0jky9zz5MV/7ucn6H5mph2NlXas0TVnGNB1Y="
+"version": "60d5159091ab06e80ad446ce9e4f415e5f53439e",
+"sum": "EgKKzxcW3ttt7gjPMX//DNTqNcn/0o2VAIaWJ/HSLEc="
},
{
"source": {
@@ -28,8 +28,8 @@
"subdir": "grafonnet"
}
},
-"version": "356bd73e4792ffe107725776ca8946895969c191",
-"sum": "CSMZ3dJrpJpwvffie8BqcfrIVVwiKNqdPEN+1XWRBGU="
+"version": "3626fc4dc2326931c530861ac5bebe39444f6cbf",
+"sum": "gF8foHByYcB25jcUOBqP6jxk0OPifQMjPvKY0HaCk6w="
},
{
"source": {
@@ -38,8 +38,8 @@
"subdir": "grafana-builder"
}
},
-"version": "9c3fb8096e1f80e2f3a84566566906ff187f5a8c",
-"sum": "9/eJqljTTtJeq9QRjabdKWL6yD8a7VzLmGKBK3ir77k="
+"version": "2ed138b205717af721af57b572bc7cd63bda62fd",
+"sum": "U34Nd1ViO2LZ3D8IzygPPRfUcy6zOgCnTMVHZ+9O/QE="
},
{
"source": {
@@ -59,8 +59,8 @@
"subdir": ""
}
},
-"version": "ead45674dba3c8712e422d99223453177aac6bf4",
-"sum": "3i0NkntlBluDS1NRF+iSc2e727Alkv3ziuVjAP12/kE="
+"version": "1941868d86a7c37e5505a14e3d567bda90e80357",
+"sum": "ypWxhZVFWF53k7qIkSpUvnI6IGyFBNKmgrzjNtLwMIM="
},
{
"source": {
@@ -69,7 +69,7 @@
"subdir": "lib/promgrafonnet"
}
},
-"version": "ead45674dba3c8712e422d99223453177aac6bf4",
+"version": "06d00e40b43e4e618afbebe8e453b5650c659015",
"sum": "zv7hXGui6BfHzE9wPatHI/AGZa4A2WKo6pq7ZdqBsps="
},
{
@@ -79,7 +79,7 @@
"subdir": "jsonnet/kube-state-metrics"
}
},
-"version": "89aaf6c524ee891140c4c8f2a05b1b16f5847309",
+"version": "e72315512a38653b19dcfe4429f93eadedc0ea96",
"sum": "zD/pbQLnQq+5hegEelaheHS8mn1h09GTktFO74iwlBI="
},
{
@@ -89,8 +89,8 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
-"version": "7bdd62593c9273b5179cf3c9d2d819e9d997aaa4",
-"sum": "Yf8mNAHrV1YWzrdV8Ry5dJ8YblepTGw3C0Zp10XIYLo="
+"version": "e72315512a38653b19dcfe4429f93eadedc0ea96",
+"sum": "E1GGavnf9PCWBm4WVrxWnc0FIj72UcbcweqGioWrOdU="
},
{
"source": {
@@ -99,7 +99,7 @@
"subdir": "jsonnet/mixin"
}
},
-"version": "22aaf848a27f6e45702131e22a596778686068d5",
+"version": "d8b7d3766225908d0239fd0d78258892cd0fc384",
"sum": "6reUygVmQrLEWQzTKcH8ceDbvM+2ztK3z2VBR2K2l+U="
},
{
@@ -151,7 +151,7 @@
"subdir": "mixin"
}
},
-"version": "37e6ef61566c7c70793ba6d128f00c4c66cb2402",
+"version": "79d8cfdc1a00f8a96475d5d1ff1a852b184b146e",
"sum": "OptiWUMOHFrRGTZhSfxV1RCeXZ90qsefGNTD4lDYVG0="
},
{

View File

@@ -4807,7 +4807,7 @@ items:
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -4891,7 +4891,7 @@ items:
"title": "CPU Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -4975,7 +4975,7 @@ items:
"title": "CPU Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -5059,7 +5059,7 @@ items:
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -5143,7 +5143,7 @@ items:
"title": "Memory Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -5227,7 +5227,7 @@ items:
"title": "Memory Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -5325,7 +5325,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -5653,7 +5653,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -5752,7 +5752,7 @@ items:
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6080,7 +6080,7 @@ items:
"title": "Requests by Namespace",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -6382,7 +6382,7 @@ items:
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -6481,7 +6481,7 @@ items:
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6579,7 +6579,7 @@ items:
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6677,7 +6677,7 @@ items:
"title": "Average Container Bandwidth by Namespace: Received",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6775,7 +6775,7 @@ items:
"title": "Average Container Bandwidth by Namespace: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6873,7 +6873,7 @@ items:
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -6971,7 +6971,7 @@ items:
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -7069,7 +7069,7 @@ items:
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -7167,7 +7167,7 @@ items:
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -7221,7 +7221,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -7372,7 +7372,7 @@ items:
"title": "CPU Utilisation (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -7456,7 +7456,7 @@ items:
"title": "CPU Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -7540,7 +7540,7 @@ items:
"title": "Memory Utilization (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -7624,7 +7624,7 @@ items:
"title": "Memory Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "singlestat",
@@ -7757,7 +7757,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -8029,7 +8029,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -8163,7 +8163,7 @@ items:
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -8519,7 +8519,7 @@ items:
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -8821,7 +8821,7 @@ items:
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -8920,7 +8920,7 @@ items:
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9018,7 +9018,7 @@ items:
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9116,7 +9116,7 @@ items:
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9214,7 +9214,7 @@ items:
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9312,7 +9312,7 @@ items:
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9410,7 +9410,7 @@ items:
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9464,7 +9464,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -9644,7 +9644,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -9916,7 +9916,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -10015,7 +10015,7 @@ items:
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -10371,7 +10371,7 @@ items:
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -10426,7 +10426,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -10639,7 +10639,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -10744,7 +10744,7 @@ items:
"title": "CPU Throttling",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -11016,7 +11016,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -11150,7 +11150,7 @@ items:
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -11506,7 +11506,7 @@ items:
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -11606,7 +11606,7 @@ items:
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -11705,7 +11705,7 @@ items:
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -11804,7 +11804,7 @@ items:
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -11903,7 +11903,7 @@ items:
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -12002,7 +12002,7 @@ items:
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -12101,7 +12101,7 @@ items:
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -12155,7 +12155,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -12362,7 +12362,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -12634,7 +12634,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -12733,7 +12733,7 @@ items:
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13005,7 +13005,7 @@ items:
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -13307,7 +13307,7 @@ items:
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -13406,7 +13406,7 @@ items:
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13504,7 +13504,7 @@ items:
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13602,7 +13602,7 @@ items:
"title": "Average Container Bandwidth by Pod: Received",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13700,7 +13700,7 @@ items:
"title": "Average Container Bandwidth by Pod: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13798,7 +13798,7 @@ items:
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13896,7 +13896,7 @@ items:
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -13994,7 +13994,7 @@ items:
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -14092,7 +14092,7 @@ items:
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -14146,7 +14146,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -14415,7 +14415,7 @@ items:
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -14734,7 +14734,7 @@ items:
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -14868,7 +14868,7 @@ items:
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -15187,7 +15187,7 @@ items:
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -15508,7 +15508,7 @@ items:
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -15607,7 +15607,7 @@ items:
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -15705,7 +15705,7 @@ items:
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -15803,7 +15803,7 @@ items:
"title": "Average Container Bandwidth by Workload: Received",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -15901,7 +15901,7 @@ items:
"title": "Average Container Bandwidth by Workload: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -15999,7 +15999,7 @@ items:
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -16097,7 +16097,7 @@ items:
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -16195,7 +16195,7 @@ items:
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -16293,7 +16293,7 @@ items:
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -16347,7 +16347,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -16657,7 +16657,7 @@ items:
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_pods{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"}) OR sum(kubelet_running_pod_count{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"})",
"expr": "sum(kubelet_running_pods{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -16741,7 +16741,7 @@ items:
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_containers{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"}) OR sum(kubelet_running_container_count{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"})",
"expr": "sum(kubelet_running_containers{cluster=\"$cluster\", job=\"kubelet\", metrics_path=\"/metrics\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -22244,7 +22244,7 @@ items:
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22330,7 +22330,7 @@ items:
"title": "CPU Saturation (load1 per CPU)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22428,7 +22428,7 @@ items:
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22514,7 +22514,7 @@ items:
"title": "Memory Saturation (Major Page Faults)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22628,7 +22628,7 @@ items:
"title": "Net Utilisation (Bytes Receive/Transmit)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22730,7 +22730,7 @@ items:
"title": "Net Saturation (Drops Receive/Transmit)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22828,7 +22828,7 @@ items:
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -22914,7 +22914,7 @@ items:
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23012,7 +23012,7 @@ items:
"title": "Disk Space Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23066,7 +23066,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -23192,7 +23192,7 @@ items:
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23278,7 +23278,7 @@ items:
"title": "CPU Saturation (Load1 per CPU)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23376,7 +23376,7 @@ items:
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23462,7 +23462,7 @@ items:
"title": "Memory Saturation (Major Page Faults)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23576,7 +23576,7 @@ items:
"title": "Net Utilisation (Bytes Receive/Transmit)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23678,7 +23678,7 @@ items:
"title": "Net Saturation (Drops Receive/Transmit)",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23776,7 +23776,7 @@ items:
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23862,7 +23862,7 @@ items:
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -23960,7 +23960,7 @@ items:
"title": "Disk Space Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -24014,7 +24014,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -28707,7 +28707,7 @@ items:
"title": "Prometheus Stats",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@@ -28806,7 +28806,7 @@ items:
"title": "Target Sync",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -28892,7 +28892,7 @@ items:
"title": "Targets",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -28990,7 +28990,7 @@ items:
"title": "Average Scrape Interval Duration",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29100,7 +29100,7 @@ items:
"title": "Scrape failures",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29186,7 +29186,7 @@ items:
"title": "Appended Samples",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29284,7 +29284,7 @@ items:
"title": "Head Series",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29370,7 +29370,7 @@ items:
"title": "Head Chunks",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29468,7 +29468,7 @@ items:
"title": "Query Rate",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29554,7 +29554,7 @@ items:
"title": "Stage Duration",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@@ -29608,7 +29608,7 @@ items:
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -29619,7 +29619,7 @@ items:
"type": "datasource"
},
{
"allValue": null,
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
@@ -29647,7 +29647,7 @@ items:
"useTags": false
},
{
"allValue": null,
"allValue": ".+",
"current": {
"selected": true,
"text": "All",

@@ -13,7 +13,6 @@ spec:
template:
metadata:
annotations:
checksum/grafana-dashboards: ce13f0b50d04c73fb01da858eb1fb608
checksum/grafana-datasources: 48faab41f579fc8efde6034391496f6a
labels:
app: grafana
@@ -118,7 +117,6 @@ spec:
nodeSelector:
beta.kubernetes.io/os: linux
securityContext:
fsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
serviceAccountName: grafana

@@ -12,4 +12,3 @@ spec:
targetPort: http
selector:
app: grafana
type: NodePort

@@ -770,9 +770,8 @@ spec:
rules:
- alert: KubeStateMetricsListErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in list operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
message: kube-state-metrics is experiencing errors at an elevated rate in list operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricslisterrors
summary: kube-state-metrics is experiencing errors in list operations.
expr: |
(sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m]))
/
@@ -783,9 +782,8 @@ spec:
severity: critical
- alert: KubeStateMetricsWatchErrors
annotations:
description: kube-state-metrics is experiencing errors at an elevated rate in watch operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
message: kube-state-metrics is experiencing errors at an elevated rate in watch operations. This is likely causing it to not be able to expose metrics about Kubernetes objects correctly or at all.
runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubestatemetricswatcherrors
summary: kube-state-metrics is experiencing errors in watch operations.
expr: |
(sum(rate(kube_state_metrics_watch_total{job="kube-state-metrics",result="error"}[5m]))
/

@@ -60,6 +60,12 @@ spec:
regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
sourceLabels:
- __name__
- action: drop
regex: (container_fs_.*|container_spec_.*|container_blkio_device_usage_total|container_file_descriptors|container_sockets|container_threads_max|container_threads|container_start_time_seconds|container_last_seen);;
sourceLabels:
- __name__
- pod
- namespace
path: /metrics/cadvisor
port: https-metrics
relabelings:
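
Note on the `drop` rule in the hunk above: Prometheus joins the values of `sourceLabels` with `;` (the default separator) and matches the fully anchored `regex` against the result. The trailing `;;` therefore only matches when both `pod` and `namespace` are empty. The following Python is a minimal sketch of that matching logic, not Prometheus code; `is_dropped` and the sample label sets are hypothetical names introduced for illustration.

```python
import re

# Regex copied from the relabeling rule in the hunk above.
DROP_REGEX = (
    r"(container_fs_.*|container_spec_.*|container_blkio_device_usage_total"
    r"|container_file_descriptors|container_sockets|container_threads_max"
    r"|container_threads|container_start_time_seconds|container_last_seen);;"
)

# Order matters: it mirrors the sourceLabels list in the rule.
SOURCE_LABELS = ("__name__", "pod", "namespace")


def is_dropped(labels: dict) -> bool:
    """Hypothetical helper: True if the rule would drop this series.

    Missing labels are treated as empty strings, the values are joined
    with ';', and the regex must match the whole joined string.
    """
    joined = ";".join(labels.get(name, "") for name in SOURCE_LABELS)
    return re.fullmatch(DROP_REGEX, joined) is not None


# A system-service series with no pod or namespace label.
print(is_dropped({"__name__": "container_fs_reads_total"}))

# The same metric attached to a pod.
print(is_dropped({
    "__name__": "container_fs_reads_total",
    "pod": "grafana-0",
    "namespace": "monitoring",
}))
```

Under these assumptions, the first series is dropped and the second is kept, because a non-empty `pod` or `namespace` value prevents the joined string from ending in `;;`. Metrics outside the alternation are kept regardless of their labels.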