Commit Graph

645 Commits

Author SHA1 Message Date
Philip Gough 4a40a2a11c Adjust dropped metrics from cAdvisor
This change drops pod-centric metrics without a non-empty 'container' label.

Previously we dropped pod-centric metrics without a (pod, namespace) label set;
however, these can be critical for debugging.

Keep 'container_fs_.*' metrics from cAdvisor
2021-09-28 10:18:58 +01:00
Philip Gough 74594f2170 jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage
The following provides a description and cardinality estimation based on the tests in a local cluster:

container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.*                    - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors         - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max              - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads                  - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets                  - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds       - container start. Possibly not needed for system services (nodes*services)
container_last_seen                - Not needed as system services are always running (nodes*services)
container_spec_.*                  - Everything related to cgroup specification and thus static data (nodes*services*5)
2021-08-30 12:16:04 +01:00
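The cardinality formulas in the commit above can be sketched as a small calculator. This is only an illustration: the cluster sizes below are hypothetical and do not come from the commit, and the function name `estimate_series` is invented for this sketch.

```python
# Rough series-count estimate per metric family, following the formulas
# quoted in the commit message. All input sizes are hypothetical.
def estimate_series(nodes, disks, services, operations):
    per_service = nodes * services  # the (nodes*services) term
    return {
        "container_blkio_device_usage_total": nodes * disks * services * operations * 2,
        "container_fs_.*": nodes * disks * services * 4,
        "container_file_descriptors": per_service,
        "container_threads_max": per_service,
        "container_threads": per_service,
        "container_sockets": per_service,
        "container_start_time_seconds": per_service,
        "container_last_seen": per_service,
        "container_spec_.*": per_service * 5,
    }


# Hypothetical small cluster: 3 nodes, 2 disks, 10 system services, 4 operations.
estimates = estimate_series(nodes=3, disks=2, services=10, operations=4)
total = sum(estimates.values())

for metric, count in estimates.items():
    print(f"{metric:<40} {count}")
print(f"{'total':<40} {total}")
```

Under these assumed sizes the block I/O and filesystem families dominate, since they are the only ones multiplied by the number of disks.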
Philip Gough 710f6aa24d jsonnet: The node exporter should not export data about veth interfaces.
In the case of OVN, the regex was incorrect, so veth metrics were still being exported.
2021-08-16 10:26:36 +01:00
dgrisonnet b983b579d3 [bot] [release-0.8] Automated version update 2021-08-02 13:37:18 +00:00
Arunprasad Rajkumar 4dfa6f6bc8 sync: Update 0.8 dependencies for kubernetes-mixin and generate
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
2021-07-22 18:52:11 +05:30
Sunil Thaha ed87db34b6 jsonnet: kube-prometheus adapt to changes to veth interfaces names
With OVN, the container veth network interface names that used to start
with `veth` have now changed to `<rand-hex>{15}@if<number>` (see Related
Links below).

This patch adapts to the change introduced in OVN and ignores the network
interfaces that match `[a-z0-9]{15}@if\d+` in addition to those starting
with `veth`.

Related Links:
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/vendor/github.com/containernetworking/plugins/pkg/ip/link_linux.go#L107
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/helper_linux.go#L148

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
(cherry picked from commit 0280f4ddf9)
2021-07-05 22:40:47 +10:00
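The interface-name rule described in the commit above can be checked with a short sketch. Python's `re` module stands in here for whatever regex engine the exporter configuration actually uses. The combined pattern, the full-string anchoring, and the sample device names are assumptions of this sketch, not taken from the patch.

```python
import re

# Combines the older `veth` prefix rule with the OVN-style pattern quoted in
# the commit message. Matching the whole device name is an assumption.
IGNORED_DEVICE = re.compile(r"(veth.*|[a-z0-9]{15}@if\d+)")


def is_ignored(device):
    """Return True if the device name should be excluded from metrics."""
    return IGNORED_DEVICE.fullmatch(device) is not None


# Hypothetical device names for illustration only.
for name in ["veth1a2b3c4d", "0123456789abcde@if42", "eth0", "lo"]:
    print(name, is_ignored(name))
```

Names with fewer than 15 characters before `@if` are not matched, so ordinary interfaces such as `eth0` stay visible.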
paulfantom 4bef6d2736 manifests: regenerate 2021-07-05 11:18:11 +01:00
Philip Gough 2d1ffd6459 sync: Update 0.8 dependencies for kubernetes-mixin and generate 2021-07-05 11:01:13 +01:00
Damien Grisonnet 7760c2b801 jsonnet: add PDB to prometheus-adapter
Adding a PodDisruptionBudget to prometheus-adapter ensures that at least
one replica of the adapter is always available. This makes sure that even
during disruption the aggregated API is available and thus does not
impact the availability of the apiserver.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-05 17:08:18 +02:00
paulfantom 415afa4cc0 *: cut release-0.8
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-04-27 13:08:03 +02:00
Paweł Krupa a3d67f5219 Merge pull request #1095 from dgrisonnet/prometheus-adapter-ha
Make prometheus-adapter highly-available
2021-04-22 12:00:39 +02:00
Damien Grisonnet 4c6a06cf7e jsonnet: make prometheus-adapter highly-available
Prometheus-adapter is a component of the monitoring stack that in most
cases needs to be highly available. For instance, we most likely
always want the autoscaling pipeline to be available and we also want to
avoid having no available backends serving the metrics API apiservices,
as it would result in both the AggregatedAPIDown alert firing and the
kubectl top command not working anymore.

In order to make the adapter highly available, we need to increase its
replica count to 2 and come up with a rolling update strategy and a
pod anti-affinity rule based on the kubernetes hostname to prevent the
adapters from being scheduled on the same node. The default rolling update
strategy for deployments isn't enough as the default maxUnavailable value
is 25% and is rounded down to 0. This means that during rolling updates
scheduling will fail if there aren't more nodes than the number of
replicas. As for the maxSurge, the default should be fine as it is
rounded up to 1, but for clarity it might be better to just set it to 1.
For the pod anti-affinity constraints, it would be best if they were hard,
but having them soft should be good enough and fit most use-cases.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-04-22 09:57:14 +02:00
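The rounding argument in the commit above can be verified with a few lines of arithmetic. This is a sketch of the percentage rounding only (maxUnavailable rounds down, maxSurge rounds up), not of the deployment controller itself. The function name `rolling_update_bounds` is invented for illustration.

```python
# Resolve percentage-based rolling-update settings to absolute pod counts.
# maxUnavailable is rounded down and maxSurge is rounded up, as the commit
# message describes; 25% is the default for both.
def rolling_update_bounds(replicas, max_unavailable_pct=25, max_surge_pct=25):
    max_unavailable = (replicas * max_unavailable_pct) // 100   # floor
    max_surge = -((-replicas * max_surge_pct) // 100)           # ceiling
    return max_unavailable, max_surge


# With the 2 replicas proposed in the commit, 25% of 2 is 0.5.
unavailable, surge = rolling_update_bounds(replicas=2)
print(unavailable, surge)
```

With two replicas the defaults resolve to 0 unavailable and 1 surge pod, so an update must start a third pod before removing an old one.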
paulfantom 412061ef51 manifests: regenerate 2021-04-21 18:43:01 +02:00
Paweł Krupa 752d1a7fdc Merge pull request #1093 from ArthurSens/as/custom-alerts-description 2021-04-20 19:13:48 +02:00
Jan Fajerski 8b39a459fa update generated assets
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2021-04-20 14:35:31 +02:00
ArthurSens 72b742d7e8 Regenerate manifests
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-04-16 18:06:47 +00:00
Kristijan Sedlak 28d58a9dbc Update versions 2021-04-14 20:19:00 +02:00
Jan Fajerski 1cefb18e55 update generated manifests
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2021-04-09 11:53:06 +02:00
Paweł Krupa 2ba8d8aca2 Merge pull request #1058 from mansikulkarni96/windows_exporter 2021-04-07 10:07:33 +02:00
mansikulkarni96 7ba0479433 jsonnet: Add windows_exporter queries for adapter
This commit includes windows_exporter metrics in the
node queries for the prometheus adapter configuration.
This will help obtain the resource metrics: memory and
CPU for Windows nodes. This change will also help in
displaying metrics reported through the 'kubectl top'
command which currently reports 'unknown' status for
Windows nodes.
2021-03-29 14:55:11 -04:00
Lili Cosic 0df93109d4 manifests: Regenerate files 2021-03-29 14:32:08 +02:00
viperstars d1f401a73d add cluster role to list and watch ingresses in api group "networking.k8s.io" 2021-03-29 14:19:35 +08:00
paulfantom c960da64bb manifests: regenerate 2021-03-25 14:22:38 +01:00
Paweł Krupa ea12911e4f Merge pull request #1041 from lilic/ksm-2.0.0-rc.0 2021-03-25 14:18:27 +01:00
Jan Fajerski 9966c37573 update generated manifests
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2021-03-24 16:52:30 +01:00
Paweł Krupa 63e20afe98 Merge pull request #1038 from paulfantom/prom-op-0.46 2021-03-19 16:11:41 +01:00
ArthurSens 2fa7ef162f Add externalLabels on Prometheus defaults
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-18 18:36:10 +00:00
Lili Cosic 09b30e124f manifests: Regenerate 2021-03-18 09:30:35 +01:00
paulfantom 8b877c1753 manifests: regenerate 2021-03-16 18:48:58 +01:00
paulfantom 8b30b2b669 manifests: regenerate 2021-03-16 15:19:18 +01:00
paulfantom 9268851d8b *: regenerate 2021-03-15 16:34:29 +01:00
Matthias Loibl 8e5bf00c54 Merge pull request #984 from paulfantom/am_resources
jsonnet/alertmanager: add default alertmanager resource requirements
2021-03-08 10:20:20 +01:00
ArthurSens bb2971e874 Add runbook_url annotation for custom mixins
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-05 14:07:01 +00:00
ArthurSens e586afb280 Add runbook_url annotation to all alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-05 13:39:40 +00:00
s-urbaniak 654283a048 Auto-updated dependencies 2021-03-03 08:38:39 +00:00
paulfantom e13ec2e448 manifests: regenerate 2021-03-01 13:27:17 +01:00
paulfantom d753169176 manifests: regenerate 2021-02-25 18:52:31 +01:00
paulfantom c229d9d34c manifests: regenerate 2021-02-23 12:09:07 +01:00
Paweł Krupa f691421c91 Merge pull request #960 from paulfantom/k8s-control-plane
Do not modify $.prometheus object when it is not needed (k8s control plane)
2021-02-23 10:30:17 +01:00
Frederic Branczyk da05d36c31 Merge pull request #941 from paulfantom/ksm-krp-cpu
increase default CPU values for main kube-rbac-proxy sidecar in kube-state-metrics
2021-02-23 09:50:16 +01:00
paulfantom 390f2d72db manifests: regenerate 2021-02-23 09:36:35 +01:00
paulfantom 66e4a7ba15 *: regenerate 2021-02-22 16:38:34 +01:00
Maxime Brunet f039fc94cf Ensure Prometheus ServiceMonitor is unique 2021-02-19 17:09:52 -08:00
Paweł Krupa daad0e1fae Merge pull request #925 from shreyashah1903/fix-kubelet-label
kubelet: Update label selector
2021-02-19 10:19:35 +01:00
paulfantom 0fbf8e03e0 manifests: regenerate 2021-02-12 09:40:22 +01:00
paulfantom e40e42cf72 manifests: regenerate 2021-02-10 12:07:32 +01:00
Shreya Shah ff3e0e1ee4 Update kubelet label selector 2021-02-09 17:52:54 +05:30
paulfantom fc1a03053d manifests: regenerate 2021-02-06 19:58:55 +01:00
Lili Cosic 73db89874e Merge pull request #914 from paulfantom/typo
jsonnet: remove superfluous quotation mark
2021-02-05 16:48:13 +01:00
paulfantom f8bae9fd96 manifests: regenerate 2021-02-04 14:43:23 +01:00