Fix README
README.md
@@ -1,8 +1,8 @@
 # kube-prometheus
 
 This repository collects Kubernetes manifests, dashboards, and alerting rules
-combined with documentation and scripts to deploy them to get a full cluster
-monitoring setup working.
+combined with documentation and scripts to provide single-command deployments
+of end-to-end Kubernetes cluster monitoring.
 
 ## Prerequisites
 
@@ -12,13 +12,92 @@ instructions of [bootkube](https://github.com/kubernetes-incubator/bootkube) or
 repository are adapted to work with a [multi-node setup](https://github.com/kubernetes-incubator/bootkube/tree/master/hack/multi-node)
 using [bootkube](https://github.com/kubernetes-incubator/bootkube).
 
-Prometheus discovers targets via Kubernetes endpoints objects, which are automatically
-populated by Kubernetes services. Therefore Prometheus can
-automatically find and pick up all services within a cluster. By
-default there is a service for the Kubernetes API server. For other Kubernetes
-core components to be monitored, headless services must be setup for them to be
-discovered by Prometheus as they may be deployed differently depending
-on the cluster.
+## Monitoring Kubernetes
+
+The manifests used here use the [Prometheus Operator](https://github.com/coreos/prometheus-operator),
+which manages Prometheus servers and their configuration in a cluster. With a single command we can install:
+
+* The Operator itself
+* The Prometheus [node_exporter](https://github.com/prometheus/node_exporter)
+* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
+* The [Prometheus specification](https://github.com/coreos/prometheus-operator/blob/master/Documentation/prometheus.md) based on which the Operator deploys a Prometheus setup
+* A Prometheus configuration covering monitoring of all Kubernetes core components and exporters
+* A default set of alerting rules on the cluster components' health
+* A Grafana instance serving dashboards on cluster metrics
+
+Simply run:
+
+```bash
+export KUBECONFIG=<path> # defaults to "~/.kube/config"
+hack/cluster-monitoring/deploy
+```
+
+After all pods are ready, you can reach:
+
+* Prometheus UI on node port `30900`
+* Grafana on node port `30902`
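For reference, one way to check that the pods came up and to find a node address for the node ports above (a sketch; the `monitoring` namespace is an assumption based on the manifests in this repository):

```bash
# Check that the monitoring pods came up; the "monitoring" namespace is an
# assumption, adjust it if your manifests place the stack elsewhere.
kubectl --kubeconfig="$KUBECONFIG" get pods --namespace=monitoring

# Find a node address, then open http://<node-ip>:30900 (Prometheus)
# or http://<node-ip>:30902 (Grafana) in a browser.
kubectl --kubeconfig="$KUBECONFIG" get nodes -o wide
```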
+
+To tear it all down again, run:
+
+```bash
+hack/cluster-monitoring/teardown
+```
+
+> All services in the manifest still contain the `prometheus.io/scrape = true`
+> annotations. They are not used by the Prometheus Operator and remain only for
+> pre-v1.3.0 Prometheus deployments, as in [this example configuration](https://github.com/prometheus/prometheus/blob/6703404cb431f57ca4c5097bc2762438d3c1968e/documentation/examples/prometheus-kubernetes.yml).
+
+## Monitoring custom services
+
+The example manifests in [/manifests/examples/example-app](/manifests/examples/example-app)
+deploy a fake service exposing Prometheus metrics. They additionally define a new Prometheus
+server and a [`ServiceMonitor`](https://github.com/coreos/prometheus-operator/blob/master/Documentation/service-monitor.md),
+which specifies how the example service should be monitored.
+The Prometheus Operator will deploy and configure the desired Prometheus instance and continuously
+manage its life cycle.
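To illustrate the shape of such an object, a minimal `ServiceMonitor` might look roughly like the following. This is a sketch only; the `apiVersion`, labels, and port name are assumptions that depend on the Operator version and on the example app's Service, and the manifests in /manifests/examples/example-app are the actual reference:

```bash
# Illustrative only: a ServiceMonitor selecting a Service by label and scraping
# its "web" port. apiVersion, labels, and port name are assumptions.
cat <<'EOF' | kubectl --kubeconfig="$KUBECONFIG" create -f -
apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web
EOF
```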
+
+```bash
+hack/example-service-monitoring/deploy
+```
+
+After all pods are ready, you can reach the Prometheus server on node port `30100` and observe
+how it monitors the service as specified.
+
+Teardown:
+
+```bash
+hack/example-service-monitoring/teardown
+```
+
+## Dashboarding
+
+The provided manifests deploy a Grafana instance serving dashboards provided via a ConfigMap.
+To modify, delete, or add dashboards, the `grafana-dashboards` ConfigMap must be modified.
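For example, one way to rebuild that ConfigMap from a local directory of dashboard JSON files (a sketch; the `assets/grafana/` path and the `monitoring` namespace are assumptions):

```bash
# Rebuild the grafana-dashboards ConfigMap from local dashboard files; the
# grafana-watcher sidecar then syncs Grafana with the new contents.
# The "assets/grafana/" path and the "monitoring" namespace are assumptions.
kubectl --kubeconfig="$KUBECONFIG" --namespace=monitoring delete configmap grafana-dashboards
kubectl --kubeconfig="$KUBECONFIG" --namespace=monitoring create configmap grafana-dashboards \
  --from-file=assets/grafana/
```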
+
+Currently, Grafana does not support serving dashboards from static files. Instead, the `grafana-watcher`
+sidecar container aims to emulate that behavior by keeping the Grafana database always in sync
+with the provided ConfigMap. Hence, the Grafana pod is effectively stateless.
+This allows managing dashboards via `git` etc. and easily deploying them via CD pipelines.
+
+In the future, a separate Grafana operator will support gathering dashboards from multiple
+ConfigMaps based on label selection.
+
+## Roadmap
+
+* Alertmanager Operator automatically handling HA clusters
+* Grafana Operator that dynamically discovers and deploys dashboards from ConfigMaps
+* KPM/Helm packages to easily provide production-ready cluster-monitoring setup (essentially contents of `hack/cluster-monitoring`)
+* Add meta-monitoring to default cluster monitoring setup
+* Build out the provided dashboards and alerts for cluster monitoring to have full coverage of all system aspects
+
+## Monitoring other Cluster Components
+
+Discovery of API servers and kubelets works the same across all clusters.
+Depending on a cluster's setup, several other core components, such as etcd or the
+scheduler, may be deployed in different ways.
+The easiest integration point is for the cluster operator to provide headless services
+for all of those components, giving Prometheus a common interface for discovering them. With that
+setup they will automatically be discovered by the provided Prometheus configuration.
+
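To illustrate the shape of such a headless service, a sketch for the scheduler might look like the following. The selector labels are assumptions and must match how the scheduler is actually deployed; where they apply, prefer the prepared manifests referenced just below:

```bash
# Illustrative only: a headless Service (clusterIP: None) exposing the
# scheduler's metrics port (10251) so Prometheus can discover its endpoints.
# The selector labels are assumptions; match them to your scheduler pods.
cat <<'EOF' | kubectl --kubeconfig="$KUBECONFIG" create -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler-prometheus-discovery
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  clusterIP: None
  selector:
    k8s-app: kube-scheduler
  ports:
    - name: http-metrics
      port: 10251
      targetPort: 10251
EOF
```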
 For the `kube-scheduler` and `kube-controller-manager` there are headless
 services prepared, simply add them to your running cluster:
@@ -44,14 +123,8 @@ An example for bootkube's multi-node vagrant setup is [here](/manifests/etcd/etc
 > Hint: this is merely an example for a local setup. The addresses will have to
 > be adapted for a setup that is not a single-etcd bootkube-created cluster.
 
-Before you continue, you should have endpoints objects for:
-
-* `apiserver` (called `kubernetes` here)
-* `kube-controller-manager`
-* `kube-scheduler`
-* `etcd` (called `etcd-k8s` to make clear this is the etcd used by kubernetes)
-
-For example:
-
+With that setup the headless services provide endpoint lists consumed by
+Prometheus to discover the endpoints as targets:
+
 ```bash
 $ kubectl get endpoints --all-namespaces
@@ -60,75 +133,4 @@ default kubernetes 172.17.4.101:443
 kube-system   kube-controller-manager-prometheus-discovery   10.2.30.2:10252    1h
 kube-system   kube-scheduler-prometheus-discovery             10.2.30.4:10251    1h
 monitoring    etcd-k8s                                        172.17.4.51:2379   1h
 ```
-
-## Monitoring Kubernetes
-
-The manifests used here use the [Prometheus Operator](https://github.com/coreos/prometheus-operator),
-which manages Prometheus servers and their configuration in your cluster. To install the
-controller, the [node_exporter](https://github.com/prometheus/node_exporter),
-[Grafana](https://grafana.org) including default dashboards, and the Prometheus server, run:
-
-```bash
-export KUBECONFIG=<path> # defaults to "~/.kube/config"
-hack/cluster-monitoring/deploy
-```
-
-After all pods are ready, you can reach:
-
-* Prometheus UI on node port `30900`
-* Grafana on node port `30902`
-
-To tear it all down again, run:
-
-```bash
-hack/cluster-monitoring/teardown
-```
-
-> All services in the manifest still contain the `prometheus.io/scrape = true`
-> annotations. It is not used by the Prometheus controller. They remain for
-> pre Prometheus v1.3.0 deployments as in [this example configuration](https://github.com/prometheus/prometheus/blob/6703404cb431f57ca4c5097bc2762438d3c1968e/documentation/examples/prometheus-kubernetes.yml).
-
-## Monitoring custom services
-
-The example manifests in [/manifests/examples/example-app](/manifests/examples/example-app)
-deploy a fake service into the `production` and `development` namespaces and define
-a Prometheus server monitoring them.
-
-```bash
-kubectl --kubeconfig="$KUBECONFIG" create namespace production
-kubectl --kubeconfig="$KUBECONFIG" create namespace development
-hack/example-service-monitoring/deploy
-```
-
-After all pods are ready you can reach the Prometheus server monitoring your services
-on node port `30100`.
-
-Teardown:
-
-```bash
-hack/example-service-monitoring/teardown
-```
-
-## Dashboarding
-
-The provided manifests deploy a Grafana instance serving dashboards provided via a ConfigMap.
-To modify, delete, or add dashboards, the `grafana-dashboards` ConfigMap must be modified.
-
-Currently, Grafana does not support serving dashboards from static files. Instead, the `grafana-watcher`
-sidecar container aims to emulate the behavior, by keeping the Grafana database always in sync
-with the provided ConfigMap. Hence, the Grafana pod is effectively stateless.
-This allows managing dashboards via `git` etc. and easily deploying them via CD pipelines.
-
-In the future, a separate Grafana controller should support gathering dashboards from multiple
-ConfigMaps, which are selected by their labels.
-Prometheus servers deployed by the Prometheus controller should be automatically added as
-Grafana data sources.
-
-## Roadmap
-
-* Incorporate [Alertmanager controller](https://github.com/coreos/kube-alertmanager-controller)
-* Grafana controller that dynamically discovers and deploys dashboards from ConfigMaps
-* KPM/Helm packages to easily provide production-ready cluster-monitoring setup (essentially contents of `hack/cluster-monitoring`)
-* Add meta-monitoring to default cluster monitoring setup
-
@@ -4,4 +4,5 @@ metadata:
   name: prometheus-k8s
   labels:
     prometheus: k8s
-spec: {}
+spec:
+  version: v1.3.0