Commit Graph

1513 Commits

Author SHA1 Message Date
Frederic Branczyk 2539ba9548 Merge pull request #621 from tafkam/master
secure metrics port for scheduler and controller-manager
2020-07-27 10:46:17 +02:00
root 3a6a0d0837 make generate 2020-07-27 10:29:31 +02:00
tafkam 6dfbcf35f2 port https-metrics 2020-07-27 10:27:14 +02:00
tafkam c1304caa28 update secure ports for other cluster 2020-07-25 18:30:07 +02:00
tafkam 4410a80e4e secure scheduler/controller metrics ports, kubeadm discovery services 2020-07-25 18:27:17 +02:00
Frederic Branczyk 40adbfae6c Merge pull request #617 from paulfantom/node_filesystem_usage
Remove instance:node_filesystem_usage:sum
2020-07-23 21:25:55 +02:00
Frederic Branczyk ba5c6e2e6a Merge pull request #618 from simonpasquier/bump-thanos
jsonnet: update component versions
2020-07-23 21:24:48 +02:00
Frederic Branczyk d67c5da75e Merge pull request #620 from adinhodovic/regenerate-dashboards-rules
Regenerate dashboards and prometheus alerts
2020-07-23 21:04:47 +02:00
Adin Hodovic 6a34239786 Regenerate dashboards and alerts
Merged https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/463 to remove duplicate entries for memory usage; however, I'd like to move these changes to the Prometheus-Operator helm chart (https://github.com/helm/charts/pull/23024#issuecomment-661967101). I've regenerated the dashboards/alerts.
2020-07-23 18:36:41 +02:00
Simon Pasquier a9ffdaa35c manifests: regenerate 2020-07-23 18:04:56 +02:00
Simon Pasquier fcf7a2fcbf jsonnet: update component versions 2020-07-23 17:06:48 +02:00
paulfantom 550d42d95b manifests: regenerate 2020-07-23 16:51:35 +02:00
paulfantom 4e116aa7e2 jsonnet: remove incorrect instance:node_filesystem_usage:sum rule 2020-07-23 16:50:27 +02:00
Frederic Branczyk b55c2825f7 Merge pull request #610 from lilic/add-more-alerts
Add PrometheusOperatorListErrors and fix PrometheusOperatorWatchErrors threshold
2020-07-15 13:19:45 +02:00
Lili Cosic d88cb26377 manifests/prometheus-rules.yaml: Regenerate 2020-07-15 10:28:03 +02:00
Lili Cosic 5743540fbb prometheus-operator.libsonnet: Add List error alert and fix threshold to
Watch error alert
2020-07-15 10:24:45 +02:00
Frederic Branczyk 1917a57280 Merge pull request #608 from ghostsquad/chore/update-go-jsonnet
chore(jsonnet): ⬆️ update jsonnet to master
2020-07-14 10:10:36 +02:00
Frederic Branczyk 2421e8cbe9 Merge pull request #609 from lilic/add-prom-operator-alerts
prometheus-operator.libsonnet: Add PrometheusOperatorWatchErrors alert
2020-07-14 08:17:32 +02:00
Lili Cosic a5b71282cd manifests/prometheus-rules.yaml: Regenerate 2020-07-13 17:35:36 +02:00
Lili Cosic dfe9184c9b prometheus-operator.libsonnet: Add PrometheusOperatorWatchErrors alert 2020-07-13 17:35:36 +02:00
Weston McNamee 6f4a9e5233 chore(jsonnet): ⬆️ update jsonnet to master
pulls in recent performance improvement changes to speed up rendering

resolves #537
2020-07-12 23:27:36 -07:00
Lili Cosic a87f322edc Merge pull request #605 from lilic/bump-prom-version
jsonnet/kube-prometheus: Bump default versions of prometheus and alertmanager
2020-07-09 12:03:01 +02:00
Lili Cosic 617003a583 manifests: Regenerate files 2020-07-09 11:48:30 +02:00
Lili Cosic 3865eacdb3 jsonnet/kube-prometheus: Bump default versions of prometheus and alertmanager 2020-07-09 11:48:22 +02:00
Frederic Branczyk bce16b41eb Merge pull request #600 from tkashem/etcd-latency-metrics
enable etcd latency metrics in kube-apiserver
2020-07-03 16:20:52 +02:00
Abu Kashem 4d6e3d5c19 enable etcd latency metrics in kube-apiserver
kube-apiserver has a histogram etcd_request_duration_seconds that
measures latency between the kube-apiserver and etcd instance.
This metric is currently dropped by cluster-prometheus. Enable
this metric so we have visibility into etcd latency.

We ensured that this does not enable other unwanted metrics:

count by(__name__) ({__name__=~"etcd_request.+"})

etcd_request_duration_seconds_bucket
etcd_request_duration_seconds_count
etcd_request_duration_seconds_sum
2020-07-03 09:49:56 -04:00
Matthias Loibl f4568b06dc Merge pull request #594 from metalmatze/discussions
Update the Issue templates to redirect to GitHub Discussions.
2020-06-30 12:58:59 +02:00
Matthias Loibl cc7583fefb Update the Issue templates to redirect to GitHub Discussions. 2020-06-30 10:38:28 +02:00
Frederic Branczyk 176e9659f3 Merge pull request #590 from metalmatze/update-kubernetes-mixin
Update kubernetes-mixin to remove KubeAPILatencyHigh & KubeAPIErrorsHigh
2020-06-30 09:09:53 +02:00
Matthias Loibl ea7a834755 Update kubernetes-mixin to remove KubeAPILatencyHigh & KubeAPIErrorsHigh 2020-06-29 19:43:34 +02:00
Lucas Servén Marín 2c1fc1cc11 Merge pull request #587 from andresterba/fix-typo
Fix typo
2020-06-26 12:58:22 +02:00
André Sterba 829a553e7a Fix typo 2020-06-26 12:17:49 +02:00
Simon Pasquier de9591cbb0 Merge pull request #584 from simonpasquier/bump-grafana-6.7.4
Bump Grafana to v6.7.4
2020-06-24 13:32:26 +02:00
Simon Pasquier 83ebd535e6 manifests: regenerate 2020-06-24 10:55:13 +02:00
Simon Pasquier bbd4e61fc1 Bump Grafana version to v6.7.4 2020-06-24 10:51:35 +02:00
Frederic Branczyk 1d41243b54 Merge pull request #579 from tommyjmquinn/master
Updated prometheus adapter deployment to use a multi arch image repo
2020-06-23 16:09:32 +02:00
Frederic Branczyk b707a94314 Merge pull request #577 from kradalby/master
Make node-exporter listening address configurable
2020-06-23 16:00:51 +02:00
Tom Quinn e82acdb253 Updated prometheus adapter deployment to use a multi arch image repo 2020-06-22 13:57:41 +01:00
Kristoffer Dalby f55a17718d Allow nodeExporter address to be configured 2020-06-21 09:11:16 +01:00
Kristoffer Dalby 6b4bc0bb26 Allow nodeExporter address to be configured 2020-06-21 08:28:48 +01:00
Frederic Branczyk 6f488250fd Merge pull request #576 from simonpasquier/fix-alertmanager-config-inconsistent-alert
Fix AlertmanagerConfigInconsistent alert
2020-06-19 16:20:40 +02:00
Frederic Branczyk 97ca4616ff Merge pull request #575 from stafot/update_adapter_endpoint
Update prometheus-adapter endpoint
2020-06-19 16:08:30 +02:00
Simon Pasquier 0a43e85917 manifests: regenerate 2020-06-19 14:41:11 +02:00
Simon Pasquier c3ea4675da Fix AlertmanagerConfigInconsistent alert
Previously the alert would fire when the number of Alertmanager pods
didn't match the number of replicas defined in the Alertmanager spec
even though all the running pods had the same configuration hash. This
type of issue is already covered by KubeStatefulSetUpdateNotRolledOut
(and possibly KubePodNotReady), so having AlertmanagerConfigInconsistent
also active in this situation creates unnecessary noise.

With this change, the alert expression returns results only when Alertmanager
pods have different configuration hash values, irrespective of the number
of pod replicas. The message annotation has also been enhanced to report
the configuration hash for each pod.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-19 14:30:55 +02:00
Stavros Foteinopoulos 3cbc97d782 Update prometheus-adapter endpoint 2020-06-19 15:27:26 +03:00
Lili Cosic 17989b42aa Merge pull request #574 from lilic/bump-prom-op-40
Bump prometheus-operator to v0.40
2020-06-19 11:55:50 +02:00
Lili Cosic beaba9f4da docs, manifests: Regenerate files 2020-06-19 10:30:50 +02:00
Lili Cosic c5ecc42244 jsonnetfile.lock.json: jb update 2020-06-19 10:27:34 +02:00
Lili Cosic 53bb3431ad jsonnet/kube-prometheus/jsonnetfile.json: Bump prometheus-operator to
v0.40
2020-06-19 10:26:55 +02:00
Frederic Branczyk 7e0c503b13 Merge pull request #553 from atmosx/update-grafana-dashboard-docs
Update grafana dashboard docs
2020-05-27 19:09:32 +02:00