The default value of 30 is too low for the repository and the action
fails to process all issues and pull requests.
This change also modifies when the job is executed to avoid running it
at the same time than prometheus-operator/prometheus-operator and
experience quota issues with the GitHub API.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
This commit adds the options `containerMetricsPrefix`
to the prometheus-adapter config-map generator. By default this option
is the empty string and doesn't change the current behavior. If set
however to e.g. `pa_`, the prometheus-adapter configuration will add
this prefix to all container_ queries in the resource rules.
This enables users of kube-prometheus to define a specialised service
monitor, that only expose the prometheus-adapter related container metrics with a
different configuration, like `honorTimestamps: true` or a tighter
scrape interval.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
The `kube` page index gets a weight of 300 and its child pages should
get weight values starting > 300.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Problem: Currently the prometheus-adapter pods are restarted at the same
time even though the deployment is configured with strategy RollingUpdate.
This happens because the kubelet does not know when the prometheus-adapter
pods are ready to start receiving requests.
Solution: Add both readinessProbe and livenessProbe to the
prometheus-adapter, this way the kubelet will know when either the pod
stoped working and should be restarted or simply when it ready to start
receiving requests.
Issue: https://bugzilla.redhat.com/show_bug.cgi?id=2048333
The following metrics are missing from kube-state-metrics:
- kube_pod_container_status_terminated_reason
- kube_pod_init_container_status_terminated_reason
- kube_pod_status_scheduled_time
Previously, some metrics were removed from kube-state-metrics by adding the following --metric-denylist argument to the kube-state-metrics container
```
--metric-denylist=
kube_.+_created,
kube_.+_metadata_resource_version,
kube_replicaset_metadata_generation,
kube_replicaset_status_observed_generation,
kube_pod_restart_policy,
kube_pod_init_container_status_terminated,
kube_pod_init_container_status_running,
kube_pod_container_status_terminated,
kube_pod_container_status_running,
kube_pod_completion_time,
kube_pod_status_scheduled
```
--metric-denylist: Comma-separated list of metrics not to be enabled. This list comprises of exact metric names and/or regex patterns. The allowlist and denylist are mutually exclusive.
However, all the list of metrics is managed as RegEx, thus "kube_pod_container_status_terminated" denies .*kube_pod_container_status_terminated.*, that's why kube_pod_init_container_status_terminated_reason is missing
Co-authored-by: Florian Gleizes <fgleizes@redhat.com>
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
Although containers that do not run as privileged already have this set to false by kubernetes
Kubespace [asks us](https://hub.armo.cloud/docs/c-0016) to explicitly declare it to false where not needed.
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
This commit removes existing `thanosSelector` and exposes a single config variable `mixin._config.thanos` to customize thanos sidecar mixins. It follows same structure as d2d74dac98/mixin/config.libsonnet
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
With the objective of improving our README, customization examples are being moved to a dedicated folder under `docs/`.
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
Since `release-0.8` resources has become a first-class object to all components of kube-prometheus. Therefore, we're simplifying this addon to reflect those changes.
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
Add the links to the changelogs of the freshly updated components to the
automated PR that does the update. This allow verifying that we are not
missing any important changes before merging the update. This happened
recently with node-exporter 1.2 which changed some flag names that we
took 3 months to update.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
The current implementation of prometheus-adapter exposes a lot of
metrics about the health of its aggregated apiserver. The issue is that
the some of these metrics are not very useful in the context of
prometheus-adapter, and we currently can't avoid exposing them since
they are registered to the Kubernetes global Prometheus registry. Until
this is improved in upstream Kubernetes, we could benefit from dropping
some of the metrics that are not very useful.
Before this change, in a default kube-prometheus installation, we would
have 800+ series for prometheus-adapter against 400+, so we divided the
number of series by two will focusing on the most valuable metrics for
prometheus-adapter.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
This commit pulls latest changes from thanos mixins and sets `thanosPrometheusCommonDimensions`
to `namespace, pod` for k8s use case.
Refer https://github.com/thanos-io/thanos/pull/4508 for more details.
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
This change drops pod-centric metrics without a non-empty 'container' label.
Previously we dropped pod-centric metrics without a (pod, namespace) label set
however these can be critical for debugging.
Instead of running kubeconform on only one version of Kubernetes, it
would be better to run it against the 2 latests versions of Kubernetes
that kube-prometheus supports, so that the validation will be in line
with our support policy.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Reduce threshold of NodeFilesystemSpaceFillingUp warning alert to 20% space available, instead of 40% (default).
This will align the threshold according to default kubelet GC values
below[1],
"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,
[1] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
According to our policy, main branch of kube-prometheus should support
the 2 latest versions of Kubernetes. These changes update the tests and
the compatibility matrix to reflect that.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Automated dependencies update in CI was failing whenever no new changes
were detected since git diff was returning 1.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Temporarily switch to manual dependencies update workflow to test if it
is updated correctly across the different release branch.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
The following provides a description and cardinality estimation based on the tests in a local cluster:
container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.* - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds - container start. Possibly not needed for system services (nodes*services)
container_last_seen - Not needed as system services are always running (nodes*services)
container_spec_.* - Everything related to cgroup specification and thus static data (nodes*services*5)
Previously, prometheus-adapter configuration wasn't taking into account
the scrape interval of kubelet, node-exporter and windows-exporter
leading to getting non fresh results, and even negative results from the
CPU queries when the irate() function was extrapolating data.
To fix that, we want to set the interval used in the irate() function in
the CPU queries to 4x scrape interval in order to extrapolate data
between the last two scrapes. This will improve the freshness of the cpu
usage exposed and prevent incorrect extrapolations.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Make target `manifests` has a dependency on build.sh which if untouched
wouldn't generate the manifests after the first run. This patch fixes it
by removing the `build.sh` dependency
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
Alertmanager changed its default branch to main.
This commit updates the alertmanager branch to track the new default.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Running sslscan against the prometheus adapter secure port reports two
insecure SSL ciphers, ECDHE-RSA-DES-CBC3-SHA and DES-CBC3-SHA.
This commit removes those ciphers from the list.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
`imageRepos` field was removed and the project no longer tries to compose image strings. Now the libraries use `$.values.common.images` to override default images.
This change updates the version of kubeconform to 0.4.7. It simplifies the
`validate` Makefile target and extracts the kubernetes version into a variable.
Adding a PodDisruptionBudget to prometheus-adapter ensure that at least
one replica of the adapter is always available. This make sure that even
during disruption the aggregated API is available and thus does not
impact the availability of the apiserver.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Export the antiaffinity function of the anti-affinity addon to make it
possible to extend the addon to component that are not present in the
kube-prometheus stack.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Prometheus-adapter is a component of the monitoring stack that in most
cases require to be highly available. For instance, we most likely
always want the autoscaling pipeline to be available and we also want to
avoid having no available backends serving the metrics API apiservices
has it would result in both the AggregatedAPIDown alert firing and the
kubectl top command not working anymore.
In order to make the adapter highly-avaible, we need to increase its
replica count to 2 and come up with a rolling update strategy and a
pod anti-affinity rule based on the kubernetes hostname to prevent the
adapters to be scheduled on the same node. The default rolling update
strategy for deployments isn't enough as the default maxUnavaible value
is 25% and is rounded down to 0. This means that during rolling-updates
scheduling will fail if there isn't more nodes than the number of
replicas. As for the maxSurge, the default should be fine as it is
rounded up to 1, but for clarity it might be better to just set it to 1.
For the pod anti-affinity constraints, it would be best if it was hard,
but having it soft should be good enough and fit most use-cases.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
This commit adds a relabeling config to the scrape config of
windows-exporter using the 'replace' action field to replace
the node endpoint address with node name. The windows_exporter
returns endpoint target as node IP but we need it to be node name
to use the prometheus adapter queries and collect resource metrics
information.
This commit includes windows_exporter metrics in the
node queries for the prometheus adapter configuration.
This will help obtain the resource metrics: memory and
CPU for Windows nodes. This change will also help in
displaying metrics reported through the 'kubectl top'
command which currently reports 'unknown' status for
Windows nodes.
Replace Watchdog alerts part of the `example-group` in some examples by
ExampleAlert alerts to reinforce the fact that this is just an example.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* Extract ServiceMonitors related to k8s control plane from prometheus
object into separate one
* Add kubernetes-mixin to new object
Signed-off-by: paulfantom <pawel@krupa.net.pl>
Alertmanager API v2 is available for more than 2 years now, there's no
reason to not use it by default.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The Kubelet's Service/Endpoints object maintained by the Prometheus
Operator does not have the recommended app label (yet). Therefore we
need to use the old label until a Prometheus Operator version has been
released and integrated in kube-promteheus that does use it.
This change documents where to find documentation and support for the
various components of kube-prometheus.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The `find` call in the Makefile doesn't actually output any `*.libsonnet` file due to the way `find` handles operators. This was discovered using GNU `find` on a Mac. From the manpages:
> Please note that -a when specified implicitly (for example by two tests appearing without an explicit operator between them) or explicitly has higher precedence than -o. This means that find . -name afile -o -name bfile -print will never print afile.
A simple addition of `-print` to force the print fixes the issue.
There are still some dependencies that we need to make work to fully
deactivate the legacyImports in the future. I'll start opening PRs
against those other repositories.
This could have been achieved either by switching to stringData, or doing
`std.base64(std.encodeUTF8($._config.alertmanager.config))` as per
google/jsonnet#575
I went with the former, because it's:
1. Easier to read existing config
2. Consistent with the way jsonnet object-based config is written just above
Expose the Thanos Sidecar's HTTP metrics and gRPC StoreAPI interfaces via a dedicated service. ClusterIP is set to none to allow for full discovery of all endpoints behind the service via in-cluster DNS. This allows for a new ServiceMonitor to then scrape the metrics available on the Sidecar's HTTP port.
A new service has been used so as to separate out the metrics that Prometheus makes available, and the metrics the Thanos Sidecar makes available. The Thanos Mixins from thanos-io/thanos default to a job label of 'thanos-sidecar', and hence the service here has had this label applied.
kube-apiserver has a histogram etcd_request_duration_seconds that
measures latency between the kube-apiserver and etcd instance.
This metrics is currently dropped by cluster-prometheus. Enable
this metrics so we have visibility into etcd latency.
We ensured that this does not enable other unwanted metrcis
count by(name) ({name=~"etcd_request.+"})
etcd_request_duration_seconds_bucket
etcd_request_duration_seconds_count
etcd_request_duration_seconds_sum
Previously the alert would fire when the number of Alertmanager pods
didn't match the number of replicas defined in the Alertmanager spec
even though all the running pods had the same configuration hash. This
type of issue is already covered by KubeStatefulSetUpdateNotRolledOut
(and possibly KubePodNotReady), having AlertmanagerConfigInconsistent
also active in this situation creates unnecessary noise.
With this change, the alert expression only returns when Alertmanager
pods have different configuration hash values irrespective of the number
of pod replicas. The message annotation has also been enhanced to report
the configuration hash for each pod.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Instructions to add Grafana dashboard do not work. The proposed
functions are wrong, according to
[grafana.libsonnet](https://github.com/brancz/kubernetes-grafana/blob/master/grafana/grafana.libsonnet)
`dashboards` and `rawDashboards` should be used in `grafana+::`
field.
This PR updates the existing documentation and fixes minor typos.
This readme update includes two changes:
1) Update the kubelet config requirements to mention the modern (non-deprecated) kubelet configuration values that can be used in place of the flags
2) Update the compatibility matrix to mention the issue running release-0.4 on kubernetes versions 1.16.2 through 1.16.4, including a workaround.
Ignore kubelet pod filesystem mounts of the form:
/var/lib/kubelet/pods/1b260ce7-e75d-44d4-8409-922d2bd0851f/volumes...
Metrics for these volumes are available via the kubelet_volume_stats*
metrics.
stale-issue-message:'This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.'
close-issue-message:'This issue was closed because it has not had any activity in the last 120 days. Please reopen if you feel this is still valid.'
* [CHANGE] Updates Prometheus Adapater version to 0.10.0 [#1865](https://github.com/prometheus-operator/kube-prometheus/pull/1865)
* [FEATURE] Added a AKS platform [#1869](https://github.com/prometheus-operator/kube-prometheus/pull/1869)
* [BUGFIX] Update Pyrra to 0.4.2 [#1800](https://github.com/prometheus-operator/kube-prometheus/pull/1800)
* [BUGFIX] Jsonnet: enable automountServiceAccountToken for prometheus service account [#1808](https://github.com/prometheus-operator/kube-prometheus/pull/1808)
* [BUGFIX] Fix diskDeviceSelector regex for aks and eks [#1810](https://github.com/prometheus-operator/kube-prometheus/pull/1810)
* [BUGFIX] Set path.udev.data Argument of Node Exporter [#1913](https://github.com/prometheus-operator/kube-prometheus/pull/1913)
* [BUGFIX] Include RAID device md.* in disk seletor [#1945](https://github.com/prometheus-operator/kube-prometheus/pull/1945)
* [ENHANCEMENT] Prometheus-adapter: add prefix option to config for container metrics [#1844](https://github.com/prometheus-operator/kube-prometheus/pull/1844)
* [ENHANCEMENT] Switch kube-state-metrics registry to registry.k8s.io [#1914](https://github.com/prometheus-operator/kube-prometheus/pull/1914)
* [FEATURE] Add example usage of prometheus-agent [#1472](https://github.com/prometheus-operator/kube-prometheus/pull/1472)
* [FEATURE] Add Pyrra as (optional) component [#1667](https://github.com/prometheus-operator/kube-prometheus/pull/1667)
* [ENHANCEMENT] Adds NetworkPolicies to all components of Kube-prometheus [#1650](https://github.com/prometheus-operator/kube-prometheus/pull/1650)
* [ENHANCEMENT] Scan generated manifests with kubescape in CI [#1584](https://github.com/prometheus-operator/kube-prometheus/pull/1584)
* [ENHANCEMENT] Explicitly declare allowPrivilegeEscalation to false in all components [#1593](https://github.com/prometheus-operator/kube-prometheus/pull/1593)
* [ENHANCEMENT] Forbid write access to root filesystem [#1600](https://github.com/prometheus-operator/kube-prometheus/pull/1600)
* [ENHANCEMENT] Drop Linux capabilities, , just keeping CAP_SYS_TIME for node-exporter [#1610](https://github.com/prometheus-operator/kube-prometheus/pull/1610)
* [ENHANCEMENT] Remove hostPort from node-export daemonset [#1612](https://github.com/prometheus-operator/kube-prometheus/pull/1612)
* [ENHANCEMENT] Add priorityClassName as system-cluster-critical for node_exporter [#1649](https://github.com/prometheus-operator/kube-prometheus/pull/1649)
* [ENHANCEMENT] Added custom overrides for kube-rbac-proxy-self [#1637](https://github.com/prometheus-operator/kube-prometheus/pull/1637)
* [ENHANCEMENT] Adds readinessProbe and livenessProbe to prometheus-adapter jsonnet [#1696](https://github.com/prometheus-operator/kube-prometheus/pull/1696)
* [BUGFIX] Update kubeadm integration of kube-prometheus [#1569](https://github.com/prometheus-operator/kube-prometheus/pull/1569)
* [BUGFIX] Add projected volumes permission to addon/podsecuritypolicie [#1572](https://github.com/prometheus-operator/kube-prometheus/pull/1572)
* [BUGFIX] Hide namespace for prometheus clusterRole and clusterRolebinding [#1566](https://github.com/prometheus-operator/kube-prometheus/pull/1566)
* [BUGFIX] Fix accidentally broken thanosSelector after #1543 [#1556](https://github.com/prometheus-operator/kube-prometheus/pull/1556)
* [BUGFIX] Jsonnet: filter out kube-proxy alerts when kube-proxy is disabled [#1609](https://github.com/prometheus-operator/kube-prometheus/pull/1609)
* [BUGFIX] Sanitize regex denylist in ksm-lite addon [#1613](https://github.com/prometheus-operator/kube-prometheus/pull/1613)
* [BUGFIX] Sanitize all regex denylist in ksm-lite addon [#1614](https://github.com/prometheus-operator/kube-prometheus/pull/1614)
* [BUGFIX] Add extra-volume mount for plugins downloads [#1624](https://github.com/prometheus-operator/kube-prometheus/pull/1624)
* [BUGFIX] Added allowedCapabilities to node-exporter psp [#1642](https://github.com/prometheus-operator/kube-prometheus/pull/1642)
* [CHANGE] Make filesystem ignored mount points configurable for node-exporter [#1376](https://github.com/prometheus-operator/kube-prometheus/pull/1376)
* [CHANGE] Drop some high cardinality cAdvisor metrics [#1406](https://github.com/prometheus-operator/kube-prometheus/pull/1406), [#1396](https://github.com/prometheus-operator/kube-prometheus/pull/1396)
* [CHANGE] Use `--collector.filesystem.mount-points-exclude` instead of deprecated `--collector.filesystem.ignored-mount-points` argument for `node-exporter` [#1407](https://github.com/prometheus-operator/kube-prometheus/pull/1407)
* [CHANGE] Drop some of prometheus-adapter metrics that are inherited from the apiserver code but aren't useful in the context of prometheus-adapter [#1409](https://github.com/prometheus-operator/kube-prometheus/pull/1409)
* [CHANGE] Remove "app" label selector deprecated by Prometheus-operator [#1420](https://github.com/prometheus-operator/kube-prometheus/pull/1420)
* [CHANGE] Use recommended instance label for Prometheus/Alertmanager resources [#1520](https://github.com/prometheus-operator/kube-prometheus/pull/1520)
* [CHANGE] Drop deprecated apiserver_longrunning_gauge and apiserver_registered_watchers metrics [#1553](https://github.com/prometheus-operator/kube-prometheus/pull/1553)
* [CHANGE] Drop deprecated coredns_cache_misses_total [#1553](https://github.com/prometheus-operator/kube-prometheus/pull/1553)
* [ENHANCEMENT] Add support for LDAP authentication in Grafana [#1455](https://github.com/prometheus-operator/kube-prometheus/pull/1445)
* [ENHANCEMENT] Include rewritten kubernetes-grafana for easier usage of new library features [#1450](https://github.com/prometheus-operator/kube-prometheus/pull/1450)
* [ENHANCEMENT] Specify default container in node-exporter pod [#1462](https://github.com/prometheus-operator/kube-prometheus/pull/1462)
* [ENHANCEMENT] Make metadata consistent across objects in the same component [#1471](https://github.com/prometheus-operator/kube-prometheus/pull/1471)
* [ENHANCEMENT] Establish convention for default field types [#1475](https://github.com/prometheus-operator/kube-prometheus/pull/1475)
* [ENHANCEMENT] Alertmanager now uses the new `matcher` syntax in the routing tree and inhibition rules [#1508](https://github.com/prometheus-operator/kube-prometheus/pull/1508)
* [ENHANCEMENT] Deprecate `thanosSelector` and expose `mixin._config.thanos` config variable for thanos sidecar [#1543](https://github.com/prometheus-operator/kube-prometheus/pull/1543)
* [ENHANCEMENT] Added configurable default values for sidecar container kube-rbac-proxy-self in deployment kube-statate-metrics. [#1637](https://github.com/prometheus-operator/kube-prometheus/pull/1637)
* [FEATURE] Support scraping config-reloader sidecar for Prometheus and AlertManager StatefulSets [#1344](https://github.com/prometheus-operator/kube-prometheus/pull/1344)
* [FEATURE] Expose prometheus alerting configuration in $.values.prometheus configuration [#1476](https://github.com/prometheus-operator/kube-prometheus/pull/1476)
* [BUGFIX] Remove deprecated policy/v1beta1 Kubernetes API [#1433](https://github.com/prometheus-operator/kube-prometheus/pull/1433)
* [BUGFIX] Fix prometheus URL in prometheus-adapter [#1463](https://github.com/prometheus-operator/kube-prometheus/pull/1463)
* [BUGFIX] Always use proper values scope for namespace in addons [#1518](https://github.com/prometheus-operator/kube-prometheus/pull/1518)
* [BUGFIX] Fix default empty groups for k8s PrometheusRule [#1534](https://github.com/prometheus-operator/kube-prometheus/pull/1534)
## release-0.9 / 2021-08-19
* [CHANGE] Test against Kubernetes 1.21 and 1,22. #1161#1337
* [CHANGE] Drop cAdvisor metrics without (pod, namespace) label pairs. #1250
> Note that everything is experimental and may change significantly at any time.
This repository collects Kubernetes manifests, [Grafana](http://grafana.com/) dashboards, and [Prometheus rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with [Prometheus](https://prometheus.io/) using the Prometheus Operator.
@@ -8,75 +12,80 @@ The content of this project is written in [jsonnet](http://jsonnet.org/). This p
Components included in this package:
* The [Prometheus Operator](https://github.com/coreos/prometheus-operator)
* The [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
* Highly available [Prometheus](https://prometheus.io/)
* Highly available [Alertmanager](https://github.com/prometheus/alertmanager)
This stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components. In addition to that it delivers a default set of dashboards and alerting rules. Many of the useful dashboards and alerts come from the [kubernetes-mixin project](https://github.com/kubernetes-monitoring/kubernetes-mixin), similar to this project it provides composable jsonnet as a library for users to customize to their needs.
You will need a Kubernetes cluster, that's it! By default it is assumed, that the kubelet uses token authentication and authorization, as otherwise Prometheus needs a client certificate, which gives it full access to the kubelet, rather than just the metrics. Token authentication and authorization allows more fine grained and easier access control.
This means the kubelet configuration must contain these flags:
*`--authentication-token-webhook=true` This flag enables, that a `ServiceAccount` token can be used to authenticate against the kubelet(s).
*`--authorization-mode=Webhook` This flag enables, that the kubelet will perform an RBAC request with the API to determine, whether the requesting entity (Prometheus in this case) is allow to access a resource, in specific for this project the `/metrics` endpoint.
*`--authentication-token-webhook=true` This flag enables, that a `ServiceAccount` token can be used to authenticate against the kubelet(s). This can also be enabled by setting the kubelet configuration value `authentication.webhook.enabled` to `true`.
*`--authorization-mode=Webhook` This flag enables, that the kubelet will perform an RBAC request with the API to determine, whether the requesting entity (Prometheus in this case) is allowed to access a resource, in specific for this project the `/metrics` endpoint. This can also be enabled by setting the kubelet configuration value `authorization.mode` to `Webhook`.
This stack provides [resource metrics](https://github.com/kubernetes/metrics#resource-metrics-api) by deploying the [Prometheus Adapter](https://github.com/DirectXMan12/k8s-prometheus-adapter/).
This adapter is an Extension API Server and Kubernetes needs to be have this feature enabled, otherwise the adapter has no effect, but is still deployed.
This stack provides [resource metrics](https://github.com/kubernetes/metrics#resource-metrics-api) by deploying
the [Prometheus Adapter](https://github.com/kubernetes-sigs/prometheus-adapter).
This adapter is an Extension API Server and Kubernetes needs to be have this feature enabled, otherwise the adapter has
no effect, but is still deployed.
## Compatibility
The following Kubernetes versions are supported and work as we test against these versions in their respective branches. But note that other versions might work!
> Note: For versions before Kubernetes v1.21.z refer to the [Kubernetes compatibility matrix](#compatibility) in order to choose a compatible branch.
This project is intended to be used as a library (i.e. the intent is not for you to create your own modified copy of this repository).
Though for a quickstart a compiled version of the Kubernetes [manifests](manifests) generated with this library (specifically with `example.jsonnet`) is checked into this repository in order to try the content out quickly. To try out the stack un-customized run:
* Create the monitoring stack using the config in the `manifests` directory:
```shell
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
# Note that due to some CRD size we are using kubectl server-side apply feature which is generally available since kubernetes 1.22.
# If you are using previous kubernetes versions this feature may not be available and you would need to use kubectl create instead.
kubectl apply --server-side -f manifests/setup
kubectl wait\
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring
kubectl apply -f manifests/
```
We create the namespace and CustomResourceDefinitions first to avoid race conditions when deploying the monitoring components.
Alternatively, the resources in both folders can be applied with a single command
`kubectl apply --server-side -f manifests/setup -f manifests`, but it may be necessary to run the command multiple times for all components to
The kube-prometheus stack includes a resource metrics API server, so the metrics-server addon is not necessary. Ensure the metrics-server addon is disabled on minikube:
@@ -85,635 +94,29 @@ The kube-prometheus stack includes a resource metrics API server, so the metrics
$ minikube addons disable metrics-server
```
## Compatibility
## Getting started
### Kubernetes compatibility matrix
Before deploying kube-prometheus in a production environment, read:
>Note: For versions before Kubernetes v1.17.0 refer to the [Kubernetes compatibility matrix](#kubernetes-compatibility-matrix) in order to choose a compatible branch.
This project is intended to be used as a library (i.e. the intent is not for you to create your own modified copy of this repository).
Though for a quickstart a compiled version of the Kubernetes [manifests](manifests) generated with this library (specifically with `example.jsonnet`) is checked into this repository in order to try the content out quickly. To try out the stack un-customized run:
* Create the monitoring stack using the config in the `manifests` directory:
```shell
# Create the namespace and CRDs, and then wait for them to be availble before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ;do date; sleep 1;echo"";done
kubectl create -f manifests/
```
We create the namespace and CustomResourceDefinitions first to avoid race conditions when deploying the monitoring components.
Alternatively, the resources in both folders can be applied with a single command
`kubectl create -f manifests/setup -f manifests`, but it may be necessary to run the command multiple times for all components to
Prometheus, Grafana, and Alertmanager dashboards can be accessed quickly using `kubectl port-forward` after running the quickstart via the commands below. Kubernetes 1.10 or later is required.
> Note: There are instructions on how to route to these pods behind an ingress controller in the [Exposing Prometheus/Alermanager/Grafana via Ingress](#exposing-prometheusalermanagergrafana-via-ingress) section.
Then access via [http://localhost:9093](http://localhost:9093)
## Customizing Kube-Prometheus
This section:
* describes how to customize the kube-prometheus library via compiling the kube-prometheus manifests yourself (as an alternative to the [Quickstart section](#Quickstart)).
* still doesn't require you to make a copy of this entire repository, but rather only a copy of a few select files.
### Installing
The content of this project consists of a set of [jsonnet](http://jsonnet.org/) files making up a library to be consumed.
Install this library in your own project with [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler#install) (the jsonnet package manager):
```shell
$ mkdir my-kube-prometheus;cd my-kube-prometheus
$ jb init # Creates the initial/empty `jsonnetfile.json`
# Install the kube-prometheus dependency
$ jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@release-0.4 # Creates `vendor/` & `jsonnetfile.lock.json`, and fills in `jsonnetfile.json`
```
> `jb` can be installed with `go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb`
> An e.g. of how to install a given version of this library: `jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@release-0.4`
In order to update the kube-prometheus dependency, simply use the jsonnet-bundler update functionality:
```shell
$ jb update
```
### Compiling
e.g. of how to compile the manifests: `./build.sh example.jsonnet`
> before compiling, install `gojsontoyaml` tool with `go get github.com/brancz/gojsontoyaml`
Here's [example.jsonnet](example.jsonnet):
> Note: some of the following components must be configured beforehand. See [configuration](#configuration) and [customization-examples](#customization-examples).
> Note you need `jsonnet` (`go get github.com/google/go-jsonnet/cmd/jsonnet`) and `gojsontoyaml` (`go get github.com/brancz/gojsontoyaml`) installed to run `build.sh`. If you just want json output, not yaml, then you can skip the pipe and everything afterwards.
This script runs the jsonnet code, then reads each key of the generated json and uses that as the file name, and writes the value of that key to that file, and converts each json manifest to yaml.
### Apply the kube-prometheus stack
The previous steps (compilation) has created a bunch of manifest files in the manifest/ folder.
Now simply use `kubectl` to install Prometheus and Grafana as per your configuration:
```shell
# Update the namespace and CRDs, and then wait for them to be availble before creating the remaining resources
$ kubectl apply -f manifests/setup
$ kubectl apply -f manifests/
```
Alternatively, the resources in both folders can be applied with a single command
`kubectl apply -Rf manifests`, but it may be necessary to run the command multiple times for all components to
be created successfullly.
Check the monitoring namespace (or the namespace you have specific in `namespace: `) and make sure the pods are running. Prometheus and Grafana should be up and running soon.
### Containerized Installing and Compiling
If you don't care to have `jb` nor `jsonnet` nor `gojsontoyaml` installed, then use `quay.io/coreos/jsonnet-ci` container image. Do the following from this `kube-prometheus` directory:
You may wish to fetch changes made on this project so they are available to you.
### Update jb
`jb` may have been updated so it's a good idea to get the latest version of this binary:
```shell
$ go get -u github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
```
### Update kube-prometheus
The command below will sync with upstream project:
```shell
$ jb update
```
### Compile the manifests and apply
Once updated, just follow the instructions under "Compiling" and "Apply the kube-prometheus stack" to apply the changes to your cluster.
## Configuration
Jsonnet has the concept of hidden fields. These are fields, that are not going to be rendered in a result. This is used to configure the kube-prometheus components in jsonnet. In the example jsonnet code of the above [Customizing Kube-Prometheus section](#customizing-kube-prometheus), you can see an example of this, where the `namespace` is being configured to be `monitoring`. In order to not override the whole object, use the `+::` construct of jsonnet, to merge objects, this way you can override individual settings, but retain all other settings and defaults.
These are the available fields with their respective default values:
collectors: '', // empty string gets a default set
scrapeInterval: '30s',
scrapeTimeout: '30s',
baseCPU: '100m',
baseMemory: '150Mi',
},
nodeExporter+:: {
port: 9100,
},
},
}
```
The grafana definition is located in a different project (https://github.com/brancz/kubernetes-grafana), but needed configuration can be customized from the same top level `_config` field. For example to allow anonymous access to grafana, add the following `_config` section:
Jsonnet is a turing complete language, any logic can be reflected in it. It also has powerful merge functionalities, allowing sophisticated customizations of any kind simply by merging it into the object the library provides.
### Cluster Creation Tools
A common example is that not all Kubernetes clusters are created exactly the same way, meaning the configuration to monitor them may be slightly different. For [kubeadm](examples/jsonnet-snippets/kubeadm.jsonnet), [bootkube](examples/jsonnet-snippets/bootkube.jsonnet), [kops](examples/jsonnet-snippets/kops.jsonnet) and [kubespray](examples/jsonnet-snippets/kubespray.jsonnet) clusters there are mixins available to easily configure these:
Some Kubernetes installations source all their images from an internal registry. kube-prometheus supports this use case and helps the user synchronize every image it uses to the internal registry and generate manifests pointing at the internal registry.
To produce the `docker pull/tag/push` commands that will synchronize upstream images to `internal-registry.com/organization` (after having run the `jb` command to populate the vendor directory):
Standard Kubernetes manifests are all written using [ksonnet-lib](https://github.com/ksonnet/ksonnet-lib/), so they can be modified with the mixins supplied by ksonnet-lib. For example to override the namespace of the node-exporter DaemonSet:
The Alertmanager configuration is located in the `_config.alertmanager.config` configuration field. In order to set a custom Alertmanager configuration simply set this field.
In order to monitor additional namespaces, the Prometheus server requires the appropriate `Role` and `RoleBinding` to be able to discover targets from that namespace. By default the Prometheus server is limited to the three namespaces it requires: default, kube-system and the namespace you configure the stack to run in via `$._config.namespace`. This is specified in `$._config.prometheus.namespaces`, to add new namespaces to monitor, simply append the additional namespaces:
#### Defining the ServiceMonitor for each addional Namespace
In order to Prometheus be able to discovery and scrape services inside the additional namespaces specified in previous step you need to define a ServiceMonitor resource.
> Typically it is up to the users of a namespace to provision the ServiceMonitor resource, but in case you want to generate it with the same tooling as the rest of the cluster monitoring infrastructure, this is a guide on how to achieve this.
You can define ServiceMonitor resources in your `jsonnet` spec. See the snippet bellow:
> NOTE: make sure your service resources has the right labels (eg. `'app': 'myapp'`) applied. Prometheus use kubernetes labels to discovery resources inside the namespaces.
### Static etcd configuration
In order to configure a static etcd cluster to scrape there is a simple [kube-prometheus-static-etcd.libsonnet](jsonnet/kube-prometheus/kube-prometheus-static-etcd.libsonnet) mixin prepared - see [etcd.jsonnet](examples/etcd.jsonnet) for an example of how to use that mixin, and [Monitoring external etcd](docs/monitoring-external-etcd.md) for more information.
> Note that monitoring etcd in minikube is currently not possible because of how etcd is setup. (minikube's etcd binds to 127.0.0.1:2379 only, and within host networking namespace.)
### Pod Anti-Affinity
To prevent `Prometheus` and `Alertmanager` instances from being deployed onto the same node when
possible, one can include the [kube-prometheus-anti-affinity.libsonnet](jsonnet/kube-prometheus/kube-prometheus-anti-affinity.libsonnet) mixin:
### Customizing Prometheus alerting/recording rules and Grafana dashboards
See [developing Prometheus rules and Grafana dashboards](docs/developing-prometheus-rules-and-grafana-dashboards.md) guide.
### Exposing Prometheus/Alermanager/Grafana via Ingress
See [exposing Prometheus/Alertmanager/Grafana](docs/exposing-prometheus-alertmanager-grafana-ingress.md) guide.
## Minikube Example
To use an easy to reproduce example, see [minikube.jsonnet](examples/minikube.jsonnet), which uses the minikube setup as demonstrated in [Prerequisites](#prerequisites). Because we would like easy access to our Prometheus, Alertmanager and Grafana UIs, `minikube.jsonnet` exposes the services as NodePort type services.
## Troubleshooting
### Error retrieving kubelet metrics
Should the Prometheus `/targets` page show kubelet targets, but not able to successfully scrape the metrics, then most likely it is a problem with the authentication and authorization setup of the kubelets.
As described in the [Prerequisites](#prerequisites) section, in order to retrieve metrics from the kubelet token authentication and authorization must be enabled. Some Kubernetes setup tools do not enable this by default.
- If you are using Google's GKE product, see [cAdvisor support](docs/GKE-cadvisor-support.md).
- If you are using AWS EKS, see [AWS EKS CNI support](docs/EKS-cni-support.md).
- If you are using Weave Net, see [Weave Net support](docs/weave-net-support.md).
#### Authentication problem
The Prometheus `/targets` page will show the kubelet job with the error `403 Unauthorized`, when token authentication is not enabled. Ensure, that the `--authentication-token-webhook=true` flag is enabled on all kubelet configurations.
#### Authorization problem
The Prometheus `/targets` page will show the kubelet job with the error `401 Unauthorized`, when token authorization is not enabled. Ensure that the `--authorization-mode=Webhook` flag is enabled on all kubelet configurations.
### kube-state-metrics resource usage
In some environments, kube-state-metrics may need additional
resources. One driver for more resource needs, is a high number of
namespaces. There may be others.
kube-state-metrics resource allocation is managed by
If you have any questions or feedback regarding kube-prometheus, join the [kube-prometheus discussion](https://github.com/prometheus-operator/kube-prometheus/discussions). Alternatively, consider joining [the kubernetes slack #prometheus-operator channel](http://slack.k8s.io/) or project's bi-weekly [Contributor Office Hours](https://docs.google.com/document/d/1-fjJmzrwRpKmSPHtXN5u6VZnn39M28KqyQGBEJsqUOk/edit#).
## License
Apache License 2.0, see [LICENSE](https://github.com/prometheus-operator/kube-prometheus/blob/main/LICENSE).
Once the workflow is completed, the prometheus-operator bot will create some
PRs. You should merge the one prefixed by `[bot][main]` if created before
proceeding. If the bot didn't create the PR, it is either because the workflow
failed or because the main branch was already up-to-date.
## Update Kubernetes supported versions
The main branch of kube-prometheus should support the last 2 versions of
Kubernetes. We need to make sure that the CI on the main branch is testing the
kube-prometheus configuration against both of these versions by updating the [CI
worklow](.github/workflows/ci.yaml) to include the latest kind version and the
2 latest images versions that are attached to the kind release. Once that is
done, the [compatibility matrix](README.md#compatibility) in
the README should also be updated to reflect the CI changes.
## Create pull request to cut the release
### Pin Jsonnet dependencies
Pin jsonnet dependencies in
[jsonnetfile.json](jsonnet/kube-prometheus/jsonnetfile.json). Each dependency
should be pinned to the latest release branch or if it doesn't have one, pinned
to the latest commit.
### Start with a fresh environment
```bash
make clean
```
### Update Jsonnet dependencies
```bash
make update
```
### Generate manifests
```bash
make generate
```
### Update the compatibility matrix
Update the [compatibility matrix](README.md#compatibility) in
the README, by adding the new release based on the `main` branch compatibility
and removing the oldest release branch to only keep the latest 5 branches in the
matrix.
### Update changelog
Iterate over the PRs that were merged between the latest release of kube-prometheus and the HEAD and add the changelog entries to the [CHANGELOG](CHANGELOG.md).
## Create release branch
Once the PR cutting the release is merged, pull the changes, create a new
release branch named `release-x.y` based on the latest changes and push it to
the upstream repository.
## Create follow-up pull request
### Unpin Jsonnet dependencies
Revert previous changes made when pinning the jsonnet dependencies since we want
the main branch to be in sync with the latest changes of its dependencies.
### Update CI workflow
Update the [versions workflow](.github/workflows/versions.yaml) to include the latest release branch and remove the oldest one to reflect the list of supported releases.
### Update Kubernetes versions used by kubeconform
Update the versions of Kubernetes used when validating manifests with
kubeconform in the [Makefile](Makefile) to align with the compatibility
Aiming to provide better developer experience when making contributions to kube-prometheus, whether by actively developing new features/bug fixes or by reviewing pull requests, we want to provide ephemeral developer workspaces with everything already configured (as far as tooling makes it possible).
Those developer workspaces should provide a brand new kubernetes cluster, where kube-prometheus can be easily deployed and the contributor can easily see the impact that a pull request is proposing.
Unfortunately, Codespaces is not available for everyone. If you are fortunate to have access to it, you can open a new workspace from a specific branch, or even from Pull Requests.
After your workspace start, you can deploy a kube-prometheus inside a Kind cluster inside by running `make deploy`.
If you are reviewing a PR, you'll have a fully-functional kubernetes cluster, generating real monitoring data that can be used to review if the proposed changes works as described.
If you are working on new features/bug fixes, you can regenerate kube-prometheus's YAML manifests with `make generate` and deploy it again with `make deploy`.
## Gitpod
Gitpod is already available to everyone to use for free. It can also run commands that we speficy in the `.gitpod.yml` file located in the root directory of the git repository, so even the cluster creation can be fully automated.
You can use the same workflow as mentioned in the [Codespaces](#codespaces) section, however Gitpod doesn't have native support for any kubernetes distribution. The workaround is to create a full QEMU Virtual Machine and deploy [k3s](https://github.com/k3s-io/k3s) inside this VM. Don't worry, this whole process is already fully automated, but due to the workaround the whole workspace may be very slow.
To open up a workspace with Gitpod, you can install the [Google Chrome extension](https://www.gitpod.io/docs/browser-extension/) to add a new button to Github UI and use it on PRs or from the main page. Or by directly typing in the browser `http://gitpod.io/#https://github.com/prometheus-operator/kube-prometheus/pull/<Pull Request Number>` or just `http://gitpod.io/#https://github.com/prometheus-operator/kube-prometheus`
One fatal issue that can occur is that you run out of IP addresses in your eks cluster. (Generally happens due to error configs where pods keep scheduling).
You can monitor the `awscni` using kube-promethus with :
Prometheus, Grafana, and Alertmanager dashboards can be accessed quickly using `kubectl port-forward` after running the quickstart via the commands below. Kubernetes 1.10 or later is required.
> Note: There are instructions on how to route to these pods behind an ingress controller in the [Exposing Prometheus/Alermanager/Grafana via Ingress](customizations/exposing-prometheus-alertmanager-grafana-ingress.md) section.
lead: This guide will help you deploying the blackbox-exporter with the Probe custom resource definition.
images: []
draft: false
description: This guide will help you deploying the blackbox-exporter with the Probe custom resource definition.
date: "2021-03-08T08:49:31+00:00"
---
# Setting up a blackbox exporter
The `prometheus-operator` defines a `Probe` resource type that can be used to describe blackbox checks. To execute these, a separate component called [`blackbox_exporter`](https://github.com/prometheus/blackbox_exporter) has to be deployed, which can be scraped to retrieve the results of these checks. You can use `kube-prometheus` to set up such a blackbox exporter within your Kubernetes cluster.
## Adding blackbox exporter manifests to an existing `kube-prometheus` configuration
1. Override blackbox-related configuration parameters as needed.
2. Add the following to the list of renderers to render the blackbox exporter manifests:
```
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) }
```
## Configuration parameters influencing the blackbox exporter
*`_config.namespace`: the namespace where the various generated resources (`ConfigMap`, `Deployment`, `Service`, `ServiceAccount` and `ServiceMonitor`) will reside. This does not affect where you can place `Probe` objects; that is determined by the configuration of the `Prometheus` resource. This option is shared with other `kube-prometheus` components; defaults to `default`.
*`_config.imageRepos.blackboxExporter`: the name of the blackbox exporter image to deploy. Defaults to `quay.io/prometheus/blackbox-exporter`.
*`_config.versions.blackboxExporter`: the tag of the blackbox exporter image to deploy. Defaults to the version `kube-prometheus` was tested with.
*`_config.imageRepos.configmapReloader`: the name of the ConfigMap reloader image to deploy. Defaults to `jimmidyson/configmap-reload`.
*`_config.versions.configmapReloader`: the tag of the ConfigMap reloader image to deploy. Defaults to the version `kube-prometheus` was tested with.
*`_config.resources.blackbox-exporter.requests`: the requested resources; this is used for each container. Defaults to `10m` CPU and `20Mi` RAM. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ for details.
*`_config.resources.blackbox-exporter.limits`: the resource limits; this is used for each container. Defaults to `20m` CPU and `40Mi` RAM. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ for details.
*`_config.blackboxExporter.port`: the exposed HTTPS port of the exporter. This is what Prometheus can scrape for metrics related to the blackbox exporter itself. Defaults to `9115`.
*`_config.blackboxExporter.internalPort`: the internal plaintext port of the exporter. Prometheus scrapes configured via `Probe` objects cannot access the HTTPS port right now, so you have to specify this port in the `url` field. Defaults to `19115`.
*`_config.blackboxExporter.replicas`: the number of exporter replicas to be deployed. Defaults to `1`.
*`_config.blackboxExporter.matchLabels`: map of the labels to be used to select resources belonging to the instance deployed. Defaults to `{ 'app.kubernetes.io/name': 'blackbox-exporter' }`
*`_config.blackboxExporter.assignLabels`: map of the labels applied to components of the instance deployed. Defaults to all the labels included in the `matchLabels` option, and additionally `app.kubernetes.io/version` is set to the version of the blackbox exporter.
*`_config.blackboxExporter.modules`: the modules available in the blackbox exporter installation, i.e. the types of checks it can perform. The default value includes most of the modules defined in the default blackbox exporter configuration: `http_2xx`, `http_post_2xx`, `tcp_connect`, `pop3s_banner`, `ssh_banner`, and `irc_banner`. `icmp` is omitted so the exporter can be run with minimum privileges, but you can add it back if needed - see the example below. See https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md for the configuration format, except you have to use JSON instead of YAML here.
*`_config.blackboxExporter.privileged`: whether the `blackbox-exporter` container should be running as non-root (`false`) or root with heavily-restricted capability set (`true`). Defaults to `true` if you have any ICMP modules defined (which need the extra permissions) and `false` otherwise.
For bugs, you can use the GitHub [issue tracker](https://github.com/prometheus-operator/kube-prometheus/issues/new/choose).
For questions, you can use the GitHub [discussions forum](https://github.com/prometheus-operator/kube-prometheus/discussions).
Many of the `kube-prometheus` project's contributors and users can also be found on the #prometheus-operator channel of the [Kubernetes Slack](https://slack.k8s.io/).
`kube-prometheus` is the aggregation of many projects that all have different
channels to reach out for help and support. This community strives at
supporting all users and you should never be afraid of asking us first. However
if your request relates specifically to one of the projects listed below, it is
often more efficient to reach out to the project directly. If you are unsure,
please feel free to open an issue in this repository and we will redirect you
if applicable.
## prometheus-operator
For documentation, check the project's [documentation directory](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation).
For questions, use the #prometheus-operator channel on the [Kubernetes Slack](https://slack.k8s.io/).
For bugs, use the GitHub [issue tracker](https://github.com/prometheus-operator/prometheus-operator/issues/new/choose).
## Prometheus, Alertmanager, node_exporter
For documentation, check the Prometheus [online docs](https://prometheus.io/docs/). There is a
[section](https://prometheus.io/docs/introduction/media/) with links to blog
posts, recorded talks and presentations. This [repository](https://github.com/roaldnefs/awesome-prometheus)
(not affiliated to the Prometheus project) has also a list of curated resources
related to the Prometheus ecosystem.
For questions, see the Prometheus [community page](https://prometheus.io/community/) for the various channels.
There is also a #prometheus channel on the [CNCF Slack](https://slack.cncf.io/).
## kube-state-metrics
For documentation, see the project's [docs directory](https://github.com/kubernetes/kube-state-metrics/tree/main/docs).
For questions, use the #kube-state-metrics channel on the [Kubernetes Slack](https://slack.k8s.io/).
For bugs, use the GitHub [issue tracker](https://github.com/kubernetes/kube-state-metrics/issues/new/choose).
## Kubernetes
For documentation, check the [Kubernetes docs](https://kubernetes.io/docs/home/).
For questions, use the [community forums](https://discuss.kubernetes.io/) and the [Kubernetes Slack](https://slack.k8s.io/). Check also the [community page](https://kubernetes.io/community/#discuss).
For bugs, use the GitHub [issue tracker](https://github.com/kubernetes/kubernetes/issues/new/choose).
## Prometheus adapter
For documentation, check the project's [README](https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/README.md).
For questions, use the #sig-instrumentation channel on the [Kubernetes Slack](https://slack.k8s.io/).
For bugs, use the GitHub [issue tracker](https://github.com/DirectXMan12/k8s-prometheus-adapter/issues/new).
## Grafana
For documentation, check the [Grafana docs](https://grafana.com/docs/grafana/latest/).
For questions, use the [community forums](https://community.grafana.com/).
For bugs, use the GitHub [issue tracker](https://github.com/grafana/grafana/issues/new/choose).
## kubernetes-mixin
For documentation, check the project's [README](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/README.md).
For questions, use #monitoring-mixins channel on the [Kubernetes Slack](https://slack.k8s.io/).
For bugs, use the GitHub [issue tracker](https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/new).
## Jsonnet
For documentation, check the [Jsonnet](https://jsonnet.org/) website.
For questions, use the [mailing list](https://groups.google.com/forum/#!forum/jsonnet).
The Alertmanager configuration is located in the `values.alertmanager.config` configuration field. In order to set a custom Alertmanager configuration simply set this field.
It is possible to override the namespace where kube-prometheus is going to be deployed, like the example below:
```jsonnet
localkp=(import'kube-prometheus/main.libsonnet')+
{
values+::{
common+:{
namespace:'monitoring',
},
},
};
```
If prefered, it can be changed individually by component. It is also possible to change the name of Prometheus and Alertmanager Custom Resources, like shown below:
lead: This guide will help you adding Prometheus Rules and Grafana Dashboards on top of kube-prometheus
images: []
draft: false
description: This guide will help you adding Prometheus Rules and Grafana Dashboards on top of kube-prometheus
---
`kube-prometheus` ships with a set of default [Prometheus rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) and [Grafana](http://grafana.com/) dashboards. At some point one might like to extend them, the purpose of this document is to explain how to do this.
All manifests of kube-prometheus are generated using [jsonnet](https://jsonnet.org/).
Prometheus rules and Grafana dashboards in specific follow the
For both the Prometheus rules and the Grafana dashboards Kubernetes `ConfigMap`s are generated within kube-prometheus. In order to add additional rules and dashboards simply merge them onto the existing json objects. This document illustrates examples for rules as well as dashboards.
As a basis, all examples in this guide are based on the base example of the kube-prometheus [readme](https://github.com/prometheus-operator/kube-prometheus/blob/main/README.md):
```jsonnet mdox-exec="cat example.jsonnet"
local kp =
(import 'kube-prometheus/main.libsonnet') +
// Uncomment the following imports to enable its patches
local kp = (import 'kube-prometheus/main.libsonnet') + {
values+:: {
common+: {
namespace: 'monitoring',
},
},
exampleApplication: {
prometheusRuleExample: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'my-prometheus-rule',
namespace: $.values.common.namespace,
},
spec: {
groups: [
{
name: 'example-group',
rules: [
{
record: 'some_recording_rule_name',
expr: 'vector(1)',
},
],
},
],
},
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['example-application-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
```
### Pre-rendered rules
We acknowledge, that users may need to transition existing rules, and therefore allow an option to add additional pre-rendered rules. Luckily the yaml and json formats are very close so the yaml rules just need to be converted to json without any manual interaction needed. Just a tool to convert yaml to json is needed:
local kp = (import 'kube-prometheus/main.libsonnet') + {
values+:: {
common+: {
namespace: 'monitoring',
},
},
exampleApplication: {
prometheusRuleExample: {
apiVersion: 'monitoring.coreos.com/v1',
kind: 'PrometheusRule',
metadata: {
name: 'my-prometheus-rule',
namespace: $.values.common.namespace,
},
spec: {
groups: (import 'existingrule.json').groups,
},
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['example-application-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
```
### Changing default rules
Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/main.libsonnet`.
The recording rules can be found in [kube-prometheus/components/mixin/rules](https://github.com/prometheus-operator/kube-prometheus/tree/main/jsonnet/kube-prometheus/components/mixin/rules)
and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules).
The alerting rules can be found in [kube-prometheus/components/mixin/alerts](https://github.com/prometheus-operator/kube-prometheus/tree/main/jsonnet/kube-prometheus/components/mixin/alerts)
and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).
Knowing which rules to change, the user can now use functions from the [Jsonnet standard library](https://jsonnet.org/ref/stdlib.html) to make these changes.
Below are examples of both a filter and an adjustment being made to the default rules.
These changes can be assigned to a local variable and then added to the `local kp` object as seen in the examples above.
#### Filter
Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`.
The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
You first need to find out in which component the rule is defined (here it is kuberentesControlPlane).
```jsonnet
local filter = {
kubernetesControlPlane+: {
prometheusRule+: {
spec+: {
groups: std.map(
function(group)
if group.name == 'kubernetes-apps' then
group {
rules: std.filter(
function(rule)
rule.alert != 'KubeStatefulSetReplicasMismatch',
group.rules
),
}
else
group,
super.groups
),
},
},
},
};
```
#### Adjustment
Here the expression for another alert in the same component is updated from its previous value.
The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
### Pre-rendered Grafana dashboards
As jsonnet is a superset of json, the jsonnet `import` function can be used to include Grafana dashboard json blobs.
In this example we are importing a [provided example dashboard](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples/example-grafana-dashboard.json).
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
### Mixins
Kube-prometheus comes with a couple of default mixins as the Kubernetes-mixin and the Node-exporter mixin,
however there [are many more mixins](https://monitoring.mixins.dev/).
To use other mixins, kube-prometheus has a jsonnet library for creating a PrometheusRule CRD and Grafana dashboards from a mixin.
Below is an example of creating a mixin object that has Prometheus rules and Grafana dashboards:
```jsonnet
// Import the library function for adding mixins
local addMixin = (import 'kube-prometheus/lib/mixin.libsonnet');
// Create your mixin
local myMixin = addMixin({
name: 'myMixin',
mixin: import 'my-mixin/mixin.libsonnet',
});
```
The myMixin object will have two objects - `prometheusRules` and `grafanaDashboards`. The `grafanaDashboards` object will be needed to be added to the `dashboards` field as in the example below:
```jsonnet
values+:: {
grafana+:: {
dashboards+:: myMixin.grafanaDashboards
```
The `prometheusRules` object is a PrometheusRule CRD. It should be defined as its own jsonnet object.
If you define multiple mixins in a single jsonnet object, there is a possibility that they will overwrite each others'
configuration and there will be unintended effects.
Therefore, use the `prometheusRules` object as its own jsonnet object:
```jsonnet
...
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) }
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ 'external-mixins/my-mixin-prometheus-rules': myMixin.prometheusRules } // one object for each mixin
```
As mentioned above each mixin is configurable and you would configure the mixin as in the example below:
```jsonnet
local myMixin = addMixin({
name: 'myMixin',
mixin: (import 'my-mixin/mixin.libsonnet') + {
_config+:: {
myMixinSelector: 'my-selector',
interval: '30d', // example
},
},
});
```
The library has also two optional parameters - the namespace for the `PrometheusRule` CRD and the dashboard folder for the Grafana dashboards.
The below example shows how to use both:
```jsonnet
local myMixin = addMixin({
name: 'myMixin',
namespace: 'prometheus', // default is monitoring
dashboardFolder: 'Observability',
mixin: (import 'my-mixin/mixin.libsonnet') + {
_config+:: {
myMixinSelector: 'my-selector',
interval: '30d', // example
},
},
});
```
The created `prometheusRules` object will have the metadata field `namespace` added and the usage will remain the same.
However, the `grafanaDasboards` will be added to the `folderDashboards` field instead of the `dashboards` field as shown in the example below:
When deploying kube-prometheus, your Grafana instance is deployed with a lot of dashboards by default. All those dashboards are comming from upstream projects like [kubernetes-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin), [prometheus-mixin](https://github.com/prometheus/prometheus/tree/main/documentation/prometheus-mixin) and [node-exporter-mixin](https://github.com/prometheus/node_exporter/tree/master/docs/node-mixin), among others.
In case you find out that you don't need some of them, you can choose to remove those dashboards like in the example below, which removes the [`alertmanager-overview.json`](https://github.com/prometheus/alertmanager/blob/main/doc/alertmanager-mixin/dashboards/overview.libsonnet) dashboard.
# Exposing Prometheus, Alertmanager and Grafana UIs via Ingress
---
weight: 303
toc: true
title: Expose via Ingress
menu:
docs:
parent: kube
lead: This guide will help you deploying a Kubernetes Ingress to expose Prometheus, Alertmanager and Grafana.
images: []
draft: false
description: This guide will help you deploying a Kubernetes Ingress to expose Prometheus, Alertmanager and Grafana.
---
In order to access the web interfaces via the Internet [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a popular option. This guide explains, how Kubernetes Ingress can be setup, in order to expose the Prometheus, Alertmanager and Grafana UIs, that are included in the [kube-prometheus](https://github.com/coreos/kube-prometheus) project.
In order to access the web interfaces via the Internet [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a popular option. This guide explains, how Kubernetes Ingress can be setup, in order to expose the Prometheus, Alertmanager and Grafana UIs, that are included in the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project.
Note: before continuing, it is recommended to first get familiar with the [kube-prometheus](https://github.com/coreos/kube-prometheus) stack by itself.
Note: before continuing, it is recommended to first get familiar with the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) stack by itself.
## Prerequisites
Apart from a running Kubernetes cluster with a running [kube-prometheus](https://github.com/coreos/kube-prometheus) stack, a Kubernetes Ingress controller must be installed and functional. This guide was tested with the [nginx-ingress-controller](https://github.com/kubernetes/ingress-nginx). If you wish to reproduce the exact result in as depicted in this guide we recommend using the nginx-ingress-controller.
Apart from a running Kubernetes cluster with a running [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) stack, a Kubernetes Ingress controller must be installed and functional. This guide was tested with the [nginx-ingress-controller](https://github.com/kubernetes/ingress-nginx). If you wish to reproduce the exact result in as depicted in this guide we recommend using the nginx-ingress-controller.
## Setting up Ingress
@@ -27,13 +38,6 @@ In order to use this a secret needs to be created containing the name of the `ht
Also, the applications provide external links to themselves in alerts and various places. When an ingress is used in front of the applications these links need to be based on the external URL's. This can be configured for each application in jsonnet.
```jsonnet
local k = import 'ksonnet/ksonnet.beta.3/k.libsonnet';
local secret = k.core.v1.secret;
local ingress = k.extensions.v1beta1.ingress;
local ingressTls = ingress.mixin.spec.tlsType;
local ingressRule = ingress.mixin.spec.rulesType;
local httpIngressPath = ingressRule.mixin.http.pathsType;
In order to expose Alertmanager and Grafana, simply create additional fields containing an ingress object, but simply pointing at the `alertmanager` or `grafana` instead of the `prometheus-k8s` Service. Make sure to also use the correct port respectively, for Alertmanager it is also `web`, for Grafana it is `http`. Be sure to also specify the appropriate external URL. Note that the external URL for grafana is set in a different way than the external URL for Prometheus or Alertmanager. See [ingress.jsonnet](../examples/ingress.jsonnet) for how to set the Grafana external URL.
In order to expose Alertmanager and Grafana, simply create additional fields containing an ingress object, but simply pointing at the `alertmanager` or `grafana` instead of the `prometheus-k8s` Service. Make sure to also use the correct port respectively, for Alertmanager it is also `web`, for Grafana it is `http`. Be sure to also specify the appropriate external URL. Note that the external URL for grafana is set in a different way than the external URL for Prometheus or Alertmanager. See [ingress.jsonnet](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples/ingress.jsonnet) for how to set the Grafana external URL.
In order to render the ingress objects similar to the other objects use as demonstrated in the [main readme](../README.md#usage):
In order to render the ingress objects similar to the other objects use as demonstrated in the [main readme](https://github.com/prometheus-operator/kube-prometheus/tree/main/README.md):
```
```jsonnet
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
@@ -98,4 +118,36 @@ In order to render the ingress objects similar to the other objects use as demon
Note, that in comparison only the last line was added, the rest is identical to the original.
See [ingress.jsonnet](../examples/ingress.jsonnet) for an example implementation.
See [ingress.jsonnet](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples/ingress.jsonnet) for an example implementation.
## Adding Ingress namespace to NetworkPolicies
NetworkPolicies restricting access to the components are added by default. These can either be removed as in
[networkpolicies-disabled.jsonnet](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples/networkpolicies-disabled.jsonnet) or modified as
described here.
This is an example for grafana, but the same can be applied to alertmanager and prometheus.
In order to monitor additional namespaces, the Prometheus server requires the appropriate `Role` and `RoleBinding` to be able to discover targets from that namespace. By default the Prometheus server is limited to the three namespaces it requires: default, kube-system and the namespace you configure the stack to run in via `$.values.namespace`. This is specified in `$.values.prometheus.namespaces`, to add new namespaces to monitor, simply append the additional namespaces:
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
#### Defining the ServiceMonitor for each additional Namespace
In order to Prometheus be able to discovery and scrape services inside the additional namespaces specified in previous step you need to define a ServiceMonitor resource.
> Typically it is up to the users of a namespace to provision the ServiceMonitor resource, but in case you want to generate it with the same tooling as the rest of the cluster monitoring infrastructure, this is a guide on how to achieve this.
You can define ServiceMonitor resources in your `jsonnet` spec. See the snippet bellow:
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['example-application-' + name]: kp.exampleApplication[name] for name in std.objectFields(kp.exampleApplication) }
```
> NOTE: make sure your service resources have the right labels (eg. `'app': 'myapp'`) applied. Prometheus uses kubernetes labels to discover resources inside the namespaces.
In case you want to monitor all namespaces in a cluster, you can add the following mixin. Also, make sure to empty the namespaces defined in prometheus so that roleBindings are not created against them.
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
> NOTE: This configuration can potentially make your cluster insecure especially in a multi-tenant cluster. This is because this gives Prometheus visibility over the whole cluster which might not be expected in a scenario when certain namespaces are locked down for security reasons.
Proceed with [creating ServiceMonitors for the services in the namespaces](monitoring-additional-namespaces.md#defining-the-servicemonitor-for-each-additional-namespace) you actually want to monitor
A common example is that not all Kubernetes clusters are created exactly the same way, meaning the configuration to monitor them may be slightly different. For the following clusters there are mixins available to easily configure them:
* aws
* bootkube
* eks
* gke
* kops
* kops_coredns
* kubeadm
* kubespray
These mixins are selectable via the `platform` field of kubePrometheus:
To prevent `Prometheus` and `Alertmanager` instances from being deployed onto the same node when
possible, one can include the [kube-prometheus-anti-affinity.libsonnet](https://github.com/prometheus-operator/kube-prometheus/tree/main/jsonnet/kube-prometheus/addons/anti-affinity.libsonnet) mixin:
***ATTENTION***: Although it is possible to run Prometheus in Agent mode with Prometheus-Operator, it requires strategic merge patches. This practice is not recommended and we do not provide support if Prometheus doesn't work as you expect. **Try it at your own risk!**
In order to configure a static etcd cluster to scrape there is a simple [static-etcd.libsonnet](https://github.com/prometheus-operator/kube-prometheus/tree/main/jsonnet/kube-prometheus/addons/static-etcd.libsonnet) mixin prepared.
An example of how to use it can be seen below:
```jsonnet mdox-exec="cat examples/etcd.jsonnet"
local kp = (import 'kube-prometheus/main.libsonnet') +
// Set these three variables to the fully qualified directory path on your work machine to the certificate files that are valid to scrape etcd metrics with (check the apiserver container).
// Most likely these certificates are generated somewhere in an infrastructure repository, so using the jsonnet `importstr` function can
// be useful here. (Kube-aws stores these three files inside the credential folder.)
// All the sensitive information on the certificates will end up in a Kubernetes Secret.
clientCA: importstr 'etcd-client-ca.crt',
clientKey: importstr 'etcd-client.key',
clientCert: importstr 'etcd-client.crt',
// Note that you should specify a value EITHER for 'serverName' OR for 'insecureSkipVerify'. (Don't specify a value for both of them, and don't specify a value for neither of them.)
// * Specifying serverName: Ideally you should provide a valid value for serverName (and then insecureSkipVerify should be left as false - so that serverName gets used).
// * Specifying insecureSkipVerify: insecureSkipVerify is only to be used (i.e. set to true) if you cannot (based on how your etcd certificates were created) use a Subject Alternative Name.
// * If you specify a value:
// ** for both of these variables: When 'insecureSkipVerify: true' is specified, then also specifying a value for serverName won't hurt anything but it will be ignored.
// ** for neither of these variables: then you'll get authentication errors on the prom '/targets' page with your etcd targets.
// A valid name (DNS or Subject Alternative Name) that the client (i.e. prometheus) will use to verify the etcd TLS certificate.
// * Note that doing `nslookup etcd.kube-system.svc.cluster.local` (on a pod in a K8s cluster where kube-prometheus has been installed) shows that kube-prometheus sets up this hostname.
// * `openssl x509 -noout -text -in etcd-client.pem` will print the Subject Alternative Names.
serverName: 'etcd.kube-system.svc.cluster.local',
// When insecureSkipVerify isn't specified, the default value is "false".
//insecureSkipVerify: true,
// In case you have generated the etcd certificate with kube-aws:
// * If you only have one etcd node, you can use the value from 'etcd.internalDomainName' (specified in your kube-aws cluster.yaml) as the value for 'serverName'.
// * But if you have multiple etcd nodes, you will need to use 'insecureSkipVerify: true' (if using default certificate generators method), as the valid certificate domain
// will be different for each etcd node. (kube-aws default certificates are not valid against the IP - they were created for the DNS.)
},
},
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
If you'd like to monitor an etcd instance that lives outside the cluster, see [Monitoring external etcd](../monitoring-external-etcd.md) for more information.
> Note that monitoring etcd in minikube is currently not possible because of how etcd is setup. (minikube's etcd binds to 127.0.0.1:2379 only, and within host networking namespace.)
Sometimes in small clusters, the CPU/memory limits can get high enough for alerts to be fired continuously. To prevent this, one can strip off the predefined limits.
Some Kubernetes installations source all their images from an internal registry. kube-prometheus supports this use case and helps the user synchronize every image it uses to the internal registry and generate manifests pointing at the internal registry.
To produce the `docker pull/tag/push` commands that will synchronize upstream images to `internal-registry.com/organization` (after having run the `jb` command to populate the vendor directory):
* describes how to customize the kube-prometheus library via compiling the kube-prometheus manifests yourself (as an alternative to the [README.md quickstart section](../README.md#quickstart)).
* still doesn't require you to make a copy of this entire repository, but rather only a copy of a few select files.
## Installing
The content of this project consists of a set of [jsonnet](http://jsonnet.org/) files making up a library to be consumed.
Install this library in your own project with [jsonnet-bundler](https://github.com/jsonnet-bundler/jsonnet-bundler#install) (the jsonnet package manager):
```shell
$ mkdir my-kube-prometheus;cd my-kube-prometheus
$ jb init # Creates the initial/empty `jsonnetfile.json`
# Install the kube-prometheus dependency
$ jb install github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@main # Creates `vendor/` & `jsonnetfile.lock.json`, and fills in `jsonnetfile.json`
> `jb` can be installed with `go install -a github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest`
> An e.g. of how to install a given version of this library: `jb install github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus@main`
In order to update the kube-prometheus dependency, simply use the jsonnet-bundler update functionality:
```shell
$ jb update
```
## Generating
e.g. of how to compile the manifests: `./build.sh example.jsonnet`
> before compiling, install `gojsontoyaml` tool with `go install github.com/brancz/gojsontoyaml@latest` and `jsonnet` with `go install github.com/google/go-jsonnet/cmd/jsonnet@latest`
Here's [example.jsonnet](../example.jsonnet):
> Note: some of the following components must be configured beforehand. See [configuration](#configuring) and [customization-examples](customizations).
```jsonnet mdox-exec="cat example.jsonnet"
local kp =
(import 'kube-prometheus/main.libsonnet') +
// Uncomment the following imports to enable its patches
> Note you need `jsonnet` (`go install github.com/google/go-jsonnet/cmd/jsonnet@latest`) and `gojsontoyaml` (`go install github.com/brancz/gojsontoyaml@latest`) installed to run `build.sh`. If you just want json output, not yaml, then you can skip the pipe and everything afterwards.
This script runs the jsonnet code, then reads each key of the generated json and uses that as the file name, and writes the value of that key to that file, and converts each json manifest to yaml.
## Configuring
Jsonnet has the concept of hidden fields. These are fields, that are not going to be rendered in a result. This is used to configure the kube-prometheus components in jsonnet. In the example jsonnet code of the above [Generating section](#generating), you can see an example of this, where the `namespace` is being configured to be `monitoring`. In order to not override the whole object, use the `+::` construct of jsonnet, to merge objects, this way you can override individual settings, but retain all other settings and defaults.
The available fields and their default values can be seen in [main.libsonnet](../jsonnet/kube-prometheus/main.libsonnet). Note that many of the fields get their default values from variables, and for example the version numbers are imported from [versions.json](../jsonnet/kube-prometheus/versions.json).
Configuration is mainly done in the `values` map. You can see this being used in the `example.jsonnet` to set the namespace to `monitoring`. This is done in the `common` field, which all other components take their default value from. See for example how Alertmanager is configured in `main.libsonnet`:
```
alertmanager: {
name: 'main',
// Use the namespace specified under values.common by default.
The grafana definition is located in a different project (https://github.com/brancz/kubernetes-grafana ), but needed configuration can be customized from the same top level `values` field. For example to allow anonymous access to grafana, add the following `values` section:
The previous generation step has created a bunch of manifest files in the manifest/ folder.
Now simply use `kubectl` to install Prometheus and Grafana as per your configuration:
```shell
# Update the namespace and CRDs, and then wait for them to be available before creating the remaining resources
$ kubectl apply --server-side -f manifests/setup
$ kubectl apply -f manifests/
```
> Note that due to some CRD size we are using kubeclt server-side apply feature which is generally available since
> kubernetes 1.22. If you are using previous kubernetes versions this feature may not be available and you would need to
> use `kubectl create` instead.
Alternatively, the resources in both folders can be applied with a single command
`kubectl apply --server-side -Rf manifests`, but it may be necessary to run the command multiple times for all components to
be created successfully.
Check the monitoring namespace (or the namespace you have specific in `namespace: `) and make sure the pods are running. Prometheus and Grafana should be up and running soon.
## Minikube Example
To use an easy to reproduce example, see [minikube.jsonnet](../examples/minikube.jsonnet), which uses the minikube setup as demonstrated in [Prerequisites](../README.md#prerequisites). Because we would like easy access to our Prometheus, Alertmanager and Grafana UIs, `minikube.jsonnet` exposes the services as NodePort type services.
# Developing Prometheus Rules and Grafana Dashboards
`kube-prometheus` ships with a set of default [Prometheus rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) and [Grafana](http://grafana.com/) dashboards. At some point one might like to extend them, the purpose of this document is to explain how to do this.
All manifests of kube-prometheus are generated using [jsonnet](https://jsonnet.org/) and Prometheus rules and Grafana dashboards in specific follow the [Prometheus Monitoring Mixins proposal](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/).
For both the Prometheus rules and the Grafana dashboards Kubernetes `ConfigMap`s are generated within kube-prometheus. In order to add additional rules and dashboards simply merge them onto the existing json objects. This document illustrates examples for rules as well as dashboards.
As a basis, all examples in this guide are based on the base example of the kube-prometheus [readme](../README.md):
According to the [Prometheus Monitoring Mixins proposal](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/) Prometheus alerting rules are under the key `prometheusAlerts` in the top level object, so in order to add an additional alerting rule, we can simply merge an extra rule into the existing object.
The format is exactly the Prometheus format, so there should be no changes necessary should you have existing rules that you want to include.
> Note that alerts can just as well be included into this file, using the jsonnet `import` function. In this example it is just inlined in order to demonstrate their use in a single file.
In order to add a recording rule, simply do the same with the `prometheusRules` field.
> Note that rules can just as well be included into this file, using the jsonnet `import` function. In this example it is just inlined in order to demonstrate their use in a single file.
We acknowledge, that users may need to transition existing rules, and therefore allow an option to add additional pre-rendered rules. Luckily the yaml and json formats are very close so the yaml rules just need to be converted to json without any manual interaction needed. Just a tool to convert yaml to json is needed:
Along with adding additional rules, we give the user the option to filter or adjust the existing rules imported by `kube-prometheus/kube-prometheus.libsonnet`. The recording rules can be found in [kube-prometheus/rules](../jsonnet/kube-prometheus/rules) and [kubernetes-mixin/rules](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/rules) while the alerting rules can be found in [kube-prometheus/alerts](../jsonnet/kube-prometheus/alerts) and [kubernetes-mixin/alerts](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/alerts).
Knowing which rules to change, the user can now use functions from the [Jsonnet standard library](https://jsonnet.org/ref/stdlib.html) to make these changes. Below are examples of both a filter and an adjustment being made to the default rules. These changes can be assigned to a local variable and then added to the `local kp` object as seen in the examples above.
#### Filter
Here the alert `KubeStatefulSetReplicasMismatch` is being filtered out of the group `kubernetes-apps`. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
```jsonnet
localfilter={
prometheusAlerts+::{
groups:std.map(
function(group)
ifgroup.name=='kubernetes-apps'then
group{
rules:std.filter(function(rule)
rule.alert!="KubeStatefulSetReplicasMismatch",
group.rules
)
}
else
group,
super.groups
),
},
};
```
#### Adjustment
Here the expression for the alert used above is updated from its previous value. The default rule can be seen [here](https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/apps_alerts.libsonnet).
Werecommendusingthe[grafonnet](https://github.com/grafana/grafonnet-lib/) library for jsonnet, which gives you a simple DSL to generate Grafana dashboards. Following the [Prometheus Monitoring Mixins proposal](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/) additional dashboards are added to the `grafanaDashboards` key, located in the top level object. To add new jsonnet dashboards, simply add one.
As jsonnet is a superset of json, the jsonnet `import` function can be used to include Grafana dashboard json blobs. In this example we are importing a [provided example dashboard](../examples/example-grafana-dashboard.json).
Incase you have lots of json dashboard exported out from grafan UI the above approch is going to take lots of time. to improve performance we can use `rawGrafanaDashboards` field and provide it's value as json string by using importstr
<i class="fa fa-exclamation-triangle"></i><b> Note:</b> Starting with v0.12.0, Prometheus Operator requires use of Kubernetes v1.7.x and up.
</div>
---
weight: 302
toc: true
title: Deploy to kubeadm
menu:
docs:
parent: kube
lead: This guide will help you deploying kube-prometheus on Kubernetes kubeadm.
images: []
draft: false
description: This guide will help you deploying kube-prometheus on Kubernetes kubeadm.
---
# Kube Prometheus on Kubeadm
The [kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/) tool is linked by Kubernetes as the offical way to deploy and manage self-hosted clusters. Kubeadm does a lot of heavy lifting by automatically configuring your Kubernetes cluster with some common options. This guide is intended to show you how to deploy Prometheus, Prometheus Operator and Kube Prometheus to get you started monitoring your cluster that was deployed with Kubeadm.
The [kubeadm](https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/) tool is linked by Kubernetes as the offical way to deploy and manage self-hosted clusters. kubeadm does a lot of heavy lifting by automatically configuring your Kubernetes cluster with some common options. This guide is intended to show you how to deploy Prometheus, Prometheus Operator and Kube Prometheus to get you started monitoring your cluster that was deployed with kubeadm.
This guide assumes you have a basic understanding of how to use the functionality the Prometheus Operator implements. If you haven't yet, we recommend reading through the [getting started guide](https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) as well as the [alerting guide](https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md).
## Kubeadm Pre-requisites
## kubeadm Pre-requisites
This guide assumes you have some familiarity with `kubeadm` or at least have deployed a cluster using `kubeadm`. By default, `kubeadm` does not expose two of the services that we will be monitoring. Therefore, in order to get the most out of the `kube-prometheus` package, we need to make some quick tweaks to the Kubernetes cluster. Since we will be monitoring the `kube-controller-manager` and `kube-scheduler`, we must expose them to the cluster.
By default, `kubeadm` runs these pods on your master and bound to `127.0.0.1`. There are a couple of ways to change this. The recommended way to change these features is to use the [kubeadm config file](https://kubernetes.io/docs/reference/generated/kubeadm/#config-file). An example configuration file can be used:
```yaml
apiVersion:kubeadm.k8s.io/v1alpha1
kind:MasterConfiguration
api:
advertiseAddress:192.168.1.173
bindPort:6443
authorizationModes:
- Node
- RBAC
certificatesDir:/etc/kubernetes/pki
cloudProvider:
apiVersion:kubeadm.k8s.io/v1beta2
kind:ClusterConfiguration
controlPlaneEndpoint:"192.168.1.173:6443"
apiServer:
extraArgs:
authorization-mode:"Node,RBAC"
controllerManager:
extraArgs:
bind-address:"0.0.0.0"
scheduler:
extraArgs:
bind-address:"0.0.0.0"
certificatesDir:"/etc/kubernetes/pki"
etcd:
dataDir:/var/lib/etcd
endpoints:null
imageRepository:gcr.io/google_containers
kubernetesVersion:v1.8.3
# one of local or external
local:
dataDir:"/var/lib/etcd"
kubernetesVersion:"v1.23.1"
networking:
dnsDomain:cluster.local
serviceSubnet:10.96.0.0/12
nodeName:your-dev
tokenTTL:24h0m0s
controllerManagerExtraArgs:
address:0.0.0.0
schedulerExtraArgs:
address:0.0.0.0
dnsDomain:"cluster.local"
serviceSubnet:"10.96.0.0/12"
imageRepository:"registry.k8s.io"
```
Notice the `schedulerExtraArgs` and `controllerManagerExtraArgs`. This exposes the `kube-controller-manager` and `kube-scheduler` services to the rest of the cluster. If you have kubernetes core components as pods in the kube-system namespace, ensure that the `kube-prometheus-exporter-kube-scheduler` and `kube-prometheus-exporter-kube-controller-manager` services' `spec.selector` values match those of pods.
Notice the `.scheduler.extraArgs` and `.controllerManager.extraArgs`. This exposes the `kube-controller-manager` and `kube-scheduler` services to the rest of the cluster. If you have kubernetes core components as pods in the kube-system namespace, ensure that the `kube-prometheus-exporter-kube-scheduler` and `kube-prometheus-exporter-kube-controller-manager` services' `spec.selector` values match those of pods.
In addition, we will be using `node-exporter` to monitor the `cAdvisor` service on all the nodes. This, however requires a change to the `kubelet` service on the master as well as all the nodes. According to the Kubernetes documentation
> The kubeadm deb package ships with configuration for how the kubelet should be run. Note that the `kubeadm` CLI command will never touch this drop-in file. This drop-in file belongs to the kubeadm deb/rpm package.
Again, we need to expose the `cadvisor` that is installed and managed by the `kubelet` daemon and allow webhook token authentication. To do so, we do the following on all the masters and nodes:
sed -e "/cadvisor-port=0/d" -i "$KUBEADM_SYSTEMD_CONF"
if ! grep -q "authentication-token-webhook=true""$KUBEADM_SYSTEMD_CONF";then
sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i "$KUBEADM_SYSTEMD_CONF"
fi
systemctl daemon-reload
systemctl restart kubelet
```
In previous versions of Kubernetes, we had to make a change to the `kubelet` setting with regard to `cAdvisor` monitoring on the control-plane as well as all the nodes. But this is **no longer required due to [the change of Kubernetes](https://github.com/kubernetes/kubernetes/issues/56523)**.
In case you already have a Kubernetes deployed with kubeadm, change the address kube-controller-manager and kube-scheduler listens in addition to previous kubelet change:
```
sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-scheduler.yaml
sed -e "s/- --bind-address=127.0.0.1/- --bind-address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -e "s/- --bind-address=127.0.0.1/- --bind-address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-scheduler.yaml
```
With these changes, your Kubernetes cluster is ready.
@@ -86,7 +76,6 @@ Once you complete this guide you will monitor the following:
* kube-scheduler
* kube-controller-manager
## Getting Up and Running Fast with Kube-Prometheus
To help get started more quickly with monitoring Kubernetes clusters, [kube-prometheus](https://github.com/coreos/kube-prometheus) was created. It is a collection of manifests including dashboards and alerting rules that can easily be deployed. It utilizes the Prometheus Operator and all the manifests demonstrated in this guide.
Thanks to our community we identified a lot of short-commings of previous design, varying from issues with global state to UX problems. Hoping to fix at least part of those issues we decided to do a complete refactor of the codebase.
## Overview
### Breaking Changes
- global `_config` object is removed and the new `values` object is a partial replacement
-`imageRepos` field was removed and the project no longer tries to compose image strings. Use `$.values.common.images` to override default images.
- prometheus alerting and recording rules are split into multiple `PrometheusRule` objects
- kubernetes control plane ServiceMonitors and Services are now part of the new `kubernetesControlPlane` top-level object instead of `prometheus` object
-`jsonnet/kube-prometheus/kube-prometheus.libsonnet` file was renamed to `jsonnet/kube-prometheus/main.libsonnet` and slimmed down to bare minimum
-`jsonnet/kube-prometheus/kube-prometheus*-.libsonnet` files were move either to `jsonnet/kube-prometheus/addons/` or `jsonnet/kube-prometheus/platforms/` depending on the feature they provided
- all component libraries are now function- and not object-based
- monitoring-mixins are included inside each component and not globally. `prometheusRules`, `prometheusAlerts`, and `grafanaDashboards` are accessible only per component via `mixin` object (ex. `$.alertmanager.mixin.prometheusAlerts`)
- default repository branch changed from `master` to `main`
- labels on resources have changes, `kubectl apply` will not work correctly due to those field being immutable. Deleting the resource first before applying is a workaround if you are using the kubectl CLI. (This only applies to `Deployments` and `DaemonSets`.)
### New Features
- concept of `addons`, `components`, and `platforms` was introduced
- all main `components` are now represented internally by a function with default values and required parameters (see #Component-configuration for more information)
-`$.values` holds main configuration parameters and should be used to set basic stack configuration.
- common parameters across all `components` are stored now in `$.values.common`
- removed dependency on deprecated ksonnet library
## Details
### Components, Addons, Platforms
Those concepts were already present in the repository but it wasn't clear which file is holding what. After refactoring we categorized jsonnet code into 3 buckets and put them into separate directories:
-`components` - main building blocks for kube-prometheus, written as functions responsible for creating multiple objects representing kubernetes manifests. For example all objects for node_exporter deployment are bundled in `components/node_exporter.libsonnet` library
-`addons` - everything that can enhance kube-prometheus deployment. Those are small snippets of code adding a small feature, for example adding anti-affinity to pods via [`addons/anti-affinity.libsonnet`](https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/anti-affinity.libsonnet). Addons are meant to be used in object-oriented way like `local kp = (import 'kube-prometheus/main.libsonnet') + (import 'kube-prometheus/addons/all-namespaces.libsonnet')`
-`platforms` - currently those are `addons` specialized to allow deploying kube-prometheus project on a specific platform.
### Component configuration
Refactoring main components to use functions allowed us to define APIs for said components. Each function has a default set of parameters that can be overridden or that are required to be set by a user. Those default parameters are represented in each component by `defaults` map at the top of each library file, for example in [`node_exporter.libsonnet`](https://github.com/prometheus-operator/kube-prometheus/blob/1d2a0e275af97948667777739a18b24464480dc8/jsonnet/kube-prometheus/components/node-exporter.libsonnet#L3-L34).
This API is meant to ease the use of kube-prometheus as parameters can be passed from a JSON file and don't need to be in jsonnet format. However, if you need to modify particular parts of the stack, jsonnet allows you to do this and we are also not restricting such access in any way. An example of such modifications can be seen in any of our `addons`, like the [`addons/anti-affinity.libsonnet`](https://github.com/prometheus-operator/kube-prometheus/blob/main/jsonnet/kube-prometheus/addons/anti-affinity.libsonnet) one.
### Mixin integration
Previously kube-prometheus project joined all mixins on a global level. However with a wider adoption of monitoring mixins this turned out to be a problem, especially apparent when two mixins started to use the same configuration field for different purposes. To fix this we moved all mixins into their own respective components:
- kube-prometheus alerts and rules -> `components/mixin/custom.libsonnet`
> etcd mixin is a special case as we add it inside an `addon` in `addons/static-etcd.libsonnet`
This results in creating multiple `PrometheusRule` objects instead of having one giant object as before. It also means each mixin is configured separately and accessing mixin objects is done via `$.<component>.mixin`.
## Examples
All examples from `examples/` directory were adapted to the new codebase. [Please take a look at them for guideance](https://github.com/prometheus-operator/kube-prometheus/tree/main/examples)
## Legacy migration
An example of conversion of a legacy release-0.3 my.jsonnet file to release-0.8 can be found in [migration-example](migration-example)
## Advanced usage examples
For more advanced usage examples you can take a look at those two, open to public, implementations:
- [thaum-xyz/ankhmorpork](https://github.com/thaum-xyz/ankhmorpork/blob/master/apps/monitoring/jsonnet) - extending kube-prometheus to adapt to a required environment
- [openshift/cluster-monitoring-operator](https://github.com/openshift/cluster-monitoring-operator/pull/1044) - using kube-prometheus components as standalone libraries to build a custom solution
## Final note
Refactoring was a huge undertaking and possibly this document didn't describe in enough detail how to help you with migration to the new stack. If that is the case, please reach out to us by using [GitHub discussions](https://github.com/prometheus-operator/kube-prometheus/discussions) feature or directly on [#prometheus-operator kubernetes slack channel](http://slack.k8s.io/).
This guide will help you monitor an external etcd cluster. When the etcd cluster is not hosted inside Kubernetes.
---
weight: 305
toc: true
title: Monitoring external etcd
menu:
docs:
parent: kube
lead: This guide will help you monitoring an external etcd cluster.
images: []
draft: false
description: This guide will help you monitoring an external etcd cluster.
---
When the etcd cluster is not hosted inside Kubernetes.
This is often the case with Kubernetes setups. This approach has been tested with kube-aws but the same principals apply to other tools.
Note that [etcd.jsonnet](../examples/etcd.jsonnet) & [kube-prometheus-static-etcd.libsonnet](../jsonnet/kube-prometheus/kube-prometheus-static-etcd.libsonnet) (which are described by a section of the [Readme](../README.md#static-etcd-configuration)) do the following:
* Put the three etcd TLS client files (CA & cert & key) into a secret in the namespace, and have Prometheus Operator load the secret.
* Create the following (to expose etcd metrics - port 2379): a Service, Endpoint, & ServiceMonitor.
Note that [etcd.jsonnet](../examples/etcd.jsonnet) & [static-etcd.libsonnet](../jsonnet/kube-prometheus/addons/static-etcd.libsonnet) (which are described by a section of the [customization](customizations/static-etcd-configuration.md)) do the following:
* Put the three etcd TLS client files (CA & cert & key) into a secret in the namespace, and have Prometheus Operator load the secret.
* Create the following (to expose etcd metrics - port 2379): a Service, Endpoint, & ServiceMonitor.
# Step 1: Open the port
@@ -13,6 +25,7 @@ You now need to allow the nodes Prometheus are running on to talk to the etcd on
If using kube-aws, you will need to edit the etcd security group inbound, specifying the security group of your Kubernetes node (worker) as the source.
## kube-aws and EIP or ENI inconsistency
With kube-aws, each etcd node has two IP addresses:
* EC2 instance IP
@@ -27,6 +40,7 @@ Another idea woud be to use the DNS entries of etcd, but those are not currently
# Step 2: verify
Go to the Prometheus UI on :9090/config and check that you have an etcd job entry:
```
- job_name: monitoring/etcd-k8s/0
scrape_interval: 30s
@@ -35,6 +49,5 @@ Go to the Prometheus UI on :9090/config and check that you have an etcd job entr
```
On the :9090/targets page:
* You should see "etcd" with the UP state. If not, check the Error column for more information.
* If no "etcd" targets are even shown on this page, prometheus isn't attempting to scrape it.
* You should see "etcd" with the UP state. If not, check the Error column for more information.
* If no "etcd" targets are even shown on this page, prometheus isn't attempting to scrape it.
This guide will help you monitor applications in other Namespaces. By default the RBAC rules are only enabled for the `Default` and `kube-system` Namespace during Install.
---
weight: 306
toc: true
title: Monitoring other Namespaces
menu:
docs:
parent: kube
lead: This guide will help you monitoring applications in other namespaces.
images: []
draft: false
description: This guide will help you monitoring applications in other namespaces.
---
By default the RBAC rules are only enabled for the `Default` and `kube-system` namespaces.
# Setup
You have to give the list of the Namespaces that you want to be able to monitor.
You have to give the list of the namespaces that you want to be able to monitor.
This is done in the variable `prometheus.roleSpecificNamespaces`. You usually set this in your `.jsonnet` file when building the manifests.
Example to create the needed `Role` and `RoleBinding` for the Namespace `foo` :
```
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
_config+:: {
namespace: 'monitoring',
Example to create the needed `Role` and `RoleBinding` for the Namespace `foo` :
The manifests generated in this repository are subject to a security audit in CI via [kubescape](https://github.com/armosec/kubescape).
The scan can be run locally via `make kubescape`.
While we aim for best practices in terms of security by default, due to the nature of the project, we are required to make the exceptions in the following components:
#### node-exporter
* Host Port is set. [Kubernetes already sets a Host Port by default when Host Network is enabled.](https://github.com/kubernetes/kubernetes/blob/1945829906546caf867992669a0bfa588edf8be6/pkg/apis/core/v1/defaults.go#L402-L411). Since nothing can be done here, we configure it to our preference port.
* Host PID is set to `true`, since node-exporter requires direct access to the host namespace to gather statistics.
* Host Network is set to `true`, since node-exporter requires direct access to the host network to gather statistics.
*`automountServiceAccountToken` is set to `true` on Pod level as kube-rbac-proxy sidecar requires connection to kubernetes API server.
#### prometheus-adapter
*`automountServiceAccountToken` is set to `true` on Pod level as application requires connection to kubernetes API server.
#### blackbox-exporter
*`automountServiceAccountToken` is set to `true` on Pod level as kube-rbac-proxy sidecar requires connection to kubernetes API server.
#### kube-state-metrics
*`automountServiceAccountToken` is set to `true` on Pod level as kube-rbac-proxy sidecars requires connection to kubernetes API server.
#### prometheus-operator
*`automountServiceAccountToken` is set to `true` on Pod level as kube-rbac-proxy sidecars requires connection to kubernetes API server.
See the general [guidelines](community-support.md) for getting support from the community.
## Error retrieving kubelet metrics
Should the Prometheus `/targets` page show kubelet targets, but not able to successfully scrape the metrics, then most likely it is a problem with the authentication and authorization setup of the kubelets.
As described in the [README.md Prerequisites](../README.md#prerequisites) section, in order to retrieve metrics from the kubelet token authentication and authorization must be enabled. Some Kubernetes setup tools do not enable this by default.
- If you are using Google's GKE product, see [cAdvisor support](GKE-cadvisor-support.md).
- If you are using AWS EKS, see [AWS EKS CNI support](EKS-cni-support.md).
- If you are using Weave Net, see [Weave Net support](weave-net-support.md).
### Authentication problem
The Prometheus `/targets` page will show the kubelet job with the error `403 Unauthorized`, when token authentication is not enabled. Ensure, that the `--authentication-token-webhook=true` flag is enabled on all kubelet configurations.
### Authorization problem
The Prometheus `/targets` page will show the kubelet job with the error `401 Unauthorized`, when token authorization is not enabled. Ensure that the `--authorization-mode=Webhook` flag is enabled on all kubelet configurations.
## kube-state-metrics resource usage
In some environments, kube-state-metrics may need additional
resources. One driver for more resource needs, is a high number of
namespaces. There may be others.
kube-state-metrics resource allocation is managed by
You can control it's parameters by setting variables in the
config. They default to:
```jsonnet
kubeStateMetrics+::{
baseCPU:'100m',
cpuPerNode:'2m',
baseMemory:'150Mi',
memoryPerNode:'30Mi',
}
```
## Error retrieving kube-proxy metrics
By default, kubeadm will configure kube-proxy to listen on 127.0.0.1 for metrics. Because of this prometheus would not be able to scrape these metrics. This would have to be changed to 0.0.0.0 in one of the following two places:
1. Before cluster initialization, the config file passed to kubeadm init should have KubeProxyConfiguration manifest with the field metricsBindAddress set to 0.0.0.0:10249
2. If the k8s cluster is already up and running, we'll have to modify the configmap kube-proxy in the namespace kube-system and set the metricsBindAddress field. After this kube-proxy daemonset would have to be restarted with
You may wish to fetch changes made on this project so they are available to you.
## Update jb
`jb` may have been updated so it's a good idea to get the latest version of this binary:
```shell
$ go install -a github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
```
## Update kube-prometheus
The command below will sync with upstream project:
```shell
$ jb update
```
## Compile the manifests and apply
Once updated, just follow the instructions under [Generating](customizing.md#generating) and [Apply the kube-prometheus stack](customizing.md#apply-the-kube-prometheus-stack) from [customizing.md doc](customizing.md) to apply the changes to your cluster.
## Migration from previous versions
If you are migrating from `release-0.7` branch or earlier please read [what changed and how to migrate in our guide](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/migration-guide.md).
Refer to [migration document](migration-example) for more information about migration from 0.3 and 0.8 versions of kube-prometheus.
# Setup Weave Net monitoring using kube-prometheus
[Weave Net](https://kubernetes.io/docs/concepts/cluster-administration/networking/#weave-net-from-weaveworks) is a resilient and simple to use CNI provider for Kubernetes. A well monitored and observed CNI provider helps in troubleshooting Kubernetes networking problems. [Weave Net](https://www.weave.works/docs/net/latest/concepts/how-it-works/) emits [prometheus metrics](https://www.weave.works/docs/net/latest/tasks/manage/metrics/) for monitoring Weave Net. There are many ways to install Weave Net in your cluster. One of them is using [kops](https://github.com/kubernetes/kops/blob/master/docs/networking.md).
Following this document, you can setup Weave Net monitoring for your cluster using kube-prometheus.
## Contents
Using kube-prometheus and kubectl you will be able install the following for monitoring Weave Net in your cluster:
1. [Service for Weave Net](https://gist.github.com/alok87/379c6234b582f555c141f6fddea9fbce) The service which the [service monitor](https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html) scrapes.
1. [Service for Weave Net](https://gist.github.com/alok87/379c6234b582f555c141f6fddea9fbce) The service which the ServiceMonitor scrapes.
2. [ServiceMonitor for Weave Net](https://gist.github.com/alok87/e46a7f9a79ef6d1da6964a035be2cfb9) Service monitor to scrape the Weave Net metrics and bring it to Prometheus.
3. [Prometheus Alerts for Weave Net](https://stackoverflow.com/a/60447864) This will setup all the important Weave Net metrics you should be alerted on.
4. [Grafana Dashboard for Weave Net](https://grafana.com/grafana/dashboards/11789) This will setup the per Weave Net pod level monitoring for Weave Net.
@@ -15,38 +17,43 @@ Using kube-prometheus and kubectl you will be able install the following for mon
## Instructions
- You can monitor Weave Net using an example like below. **Please note that some alert configurations are environment specific and may require modifications of alert thresholds**. For example: The FastDP flows have never gone below 15000 for us. But if this value is say 20000 for you then you can use an example like below to update the alert. The alerts which may require threshold modifications are `WeaveNetFastDPFlowsLow` and `WeaveNetIPAMUnreachable`.
The [Windows addon](../examples/windows.jsonnet) adds the dashboards and rules from [kubernetes-monitoring/kubernetes-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin#dashboards-for-windows-nodes).
Currently, Docker based Windows does not support running with [windows_exporter](https://github.com/prometheus-community/windows_exporter) in a pod so this add on uses [additional scrape configuration](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md) to set up a static config to scrape the node ports where windows_exporter is configured.
The addon requires you to specify the node ips and ports where it can find the windows_exporter. See the [full example](../examples/windows.jsonnet) for setup.
```
local kp = (import 'kube-prometheus/main.libsonnet') +
// Set these three variables to the fully qualified directory path on your work machine to the certificate files that are valid to scrape etcd metrics with (check the apiserver container).
// Most likely these certificates are generated somewhere in an infrastructure repository, so using the jsonnet `importstr` function can
{"groups":[{"name":"example-group","rules":[{"alert":"Watchdog","annotations":{"description":"This is a Watchdog meant to ensure that the entire alerting pipeline is functional."},"expr":"vector(1)","labels":{"severity":"none"}}]}]}
{"groups":[{"name":"example-group","rules":[{"alert":"ExampleAlert","annotations":{"description":"This is an example alert."},"expr":"vector(1)","labels":{"severity":"warning"}}]}]}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.