Weave Net name inconsistencies resolved

https://github.com/coreos/kube-prometheus/pull/425#pullrequestreview-368779890
Alok Kumar Singh
2020-03-04 19:57:23 +05:30
parent c942d6b837
commit 7a85d7d8a6
6 changed files with 42 additions and 42 deletions

@@ -658,7 +658,7 @@ As described in the [Prerequisites](#prerequisites) section, in order to retriev
- If you are using Google's GKE product, see [cAdvisor support](docs/GKE-cadvisor-support.md).
- If you are using AWS EKS, see [AWS EKS CNI support](docs/EKS-cni-support.md).
- If you are using WeaveNet as the CNI, see [weave-net support](docs/weave-net-support.md).
- If you are using Weave Net, see [Weave Net support](docs/weave-net-support.md).
#### Authentication problem

@@ -1,24 +1,24 @@
# Setup WeaveNet CNI monitoring using kube-prometheus
[WeaveNet](https://kubernetes.io/docs/concepts/cluster-administration/networking/#weave-net-from-weaveworks) is a resilient and simple to use CNI for Kubernetes. A well monitored and observed CNI helps in troubleshooting Kubernetes networking problems. [WeaveNet](https://www.weave.works/docs/net/latest/concepts/how-it-works/) emits [prometheus metrics](https://www.weave.works/docs/net/latest/tasks/manage/metrics/) for monitoring WeaveNet. There are many ways to install WeaveNet in your cluster. One of them is using [kops](https://github.com/kubernetes/kops/blob/master/docs/networking.md).
# Set up Weave Net monitoring using kube-prometheus
[Weave Net](https://kubernetes.io/docs/concepts/cluster-administration/networking/#weave-net-from-weaveworks) is a resilient and simple-to-use CNI provider for Kubernetes. A well-monitored and observed CNI provider helps in troubleshooting Kubernetes networking problems. [Weave Net](https://www.weave.works/docs/net/latest/concepts/how-it-works/) emits [Prometheus metrics](https://www.weave.works/docs/net/latest/tasks/manage/metrics/) for monitoring Weave Net. There are many ways to install Weave Net in your cluster. One of them is using [kops](https://github.com/kubernetes/kops/blob/master/docs/networking.md).
Following this document, you can setup weave net CNI monitoring for your cluster using kube-prometheus.
Following this document, you can set up Weave Net monitoring for your cluster using kube-prometheus.
## Contents
Using kube-prometheus and kubectl you will be able install the following for monitoring weave-net in your cluster:
Using kube-prometheus and kubectl you will be able to install the following for monitoring Weave Net in your cluster:
1. [Service for WeaveNet](https://gist.github.com/alok87/379c6234b582f555c141f6fddea9fbce) The service which the [service monitor](https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html) scraps.
2. [ServiceMonitor for WeaveNet](https://gist.github.com/alok87/e46a7f9a79ef6d1da6964a035be2cfb9) Service monitor to scraps the weavenet metrics and bring it to Prometheus.
3. [Prometheus Alerts for WeaveNet](https://stackoverflow.com/a/60447864) This will setup all the important weave net metrics you should be alerted on.
4. [Grafana Dashboard for WeaveNet](https://grafana.com/grafana/dashboards/11789) This will setup the per CNI pod level monitoring for weave net.
5. [Grafana Dashboard for WeaveNet(Cluster)](https://grafana.com/grafana/dashboards/11789) This will setup the cluster level monitoring for weave net.
1. [Service for Weave Net](https://gist.github.com/alok87/379c6234b582f555c141f6fddea9fbce) The service which the [service monitor](https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html) scrapes.
2. [ServiceMonitor for Weave Net](https://gist.github.com/alok87/e46a7f9a79ef6d1da6964a035be2cfb9) Service monitor to scrape the Weave Net metrics and bring them to Prometheus.
3. [Prometheus Alerts for Weave Net](https://stackoverflow.com/a/60447864) This will set up alerts for the important Weave Net metrics you should be alerted on.
4. [Grafana Dashboard for Weave Net](https://grafana.com/grafana/dashboards/11789) This will set up per-pod monitoring for Weave Net.
5. [Grafana Dashboard for Weave Net (Cluster)](https://grafana.com/grafana/dashboards/11789) This will set up cluster-level monitoring for Weave Net.
## Instructions
- You can monitor weave-net CNI using an example like below. **Please note that some alert configurations are environment specific and may require modifications of alert thresholds**. For example: The FastDP flows have never gone below 1500 for us. But if this value is say 2000 for you then you can use an example like below to update the alert. The alerts which may require threshold modifications are `WeaveNetFastDPFlowsLow` and `WeaveNetIPAMUnreachable`.
- You can monitor Weave Net using an example like the one below. **Please note that some alert configurations are environment specific and may require modifications of alert thresholds**. For example, the FastDP flows have never gone below 15000 for us. If this value is, say, 20000 for you, then you can use an example like the one below to update the alert. The alerts which may require threshold modifications are `WeaveNetFastDPFlowsLow` and `WeaveNetIPAMUnreachable`.
[embedmd]:# (../examples/weavenet-example.jsonnet)
[embedmd]:# (../examples/weave-net-example.jsonnet)
```jsonnet
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-weavenet.libsonnet') + {
(import 'kube-prometheus/kube-prometheus-weave-net.libsonnet') + {
_config+:: {
namespace: 'monitoring',
},
@@ -30,7 +30,7 @@ local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
rules: std.map(function(rule)
if rule.alert == "WeaveNetFastDPFlowsLow" then
rule {
expr: "sum(weave_flows) < 2000"
expr: "sum(weave_flows) < 20000"
}
else if rule.alert == "WeaveNetIPAMUnreachable" then
rule {

@@ -1,5 +1,5 @@
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-weavenet.libsonnet') + {
(import 'kube-prometheus/kube-prometheus-weave-net.libsonnet') + {
_config+:: {
namespace: 'monitoring',
},
@@ -11,7 +11,7 @@ local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
rules: std.map(function(rule)
if rule.alert == "WeaveNetFastDPFlowsLow" then
rule {
expr: "sum(weave_flows) < 2000"
expr: "sum(weave_flows) < 20000"
}
else if rule.alert == "WeaveNetIPAMUnreachable" then
rule {

@@ -45,7 +45,7 @@
}
]
},
"description": "WeaveNet metrics at the cluster level. It was made on top of the weavenet prometheus metrics. Please check this for more details https://www.weave.works/docs/net/latest/tasks/manage/metrics",
"description": "Weave Net metrics at the cluster level, built on top of the Weave Net Prometheus metrics. For more details, see https://www.weave.works/docs/net/latest/tasks/manage/metrics",
"editable": true,
"gnetId": null,
"graphTooltip": 0,

@@ -39,7 +39,7 @@
}
]
},
"description": "WeaveNet metrics. It was made on top of the weavenet prometheus metrics. Please check this for more details https://www.weave.works/docs/net/latest/tasks/manage/metrics",
"description": "Weave Net metrics, built on top of the Weave Net Prometheus metrics. For more details, see https://www.weave.works/docs/net/latest/tasks/manage/metrics",
"editable": true,
"gnetId": null,
"graphTooltip": 0,

@@ -54,8 +54,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNetIPAM has a split brain. Go to the below prometheus link for details.',
description: 'Actionable: Every node should see same unreachability percentage. Please check and fix why it is not so.',
summary: 'Percentage of all IP addresses owned by unreachable peers is not the same on every node.',
description: 'actionable: Weave Net network has a split brain problem. Please find the problem and fix it.',
},
},
{
@@ -66,8 +66,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNetIPAM unreachability percentage is above threshold. Go to the below prometheus link for details.',
description: 'Actionable: Find why the unreachability threshold have increased from threshold and fix it. WeaveNet is responsible to keep it under control. Weave rm peer deployment can help clean things.',
summary: 'Percentage of all IP addresses owned by unreachable peers is above threshold.',
description: 'actionable: Please find the problem and fix it.',
},
},
{
@@ -78,8 +78,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNet IPAM has pending allocates. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason for IPAM allocates to be in pending state and fix it.',
summary: 'Number of pending allocates is above the threshold.',
description: 'actionable: Please find the problem and fix it.',
},
},
{
@@ -90,8 +90,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNet IPAM has pending claims. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason for IPAM claims to be in pending state and fix it.',
summary: 'Number of pending claims is above the threshold.',
description: 'actionable: Please find the problem and fix it.',
},
},
{
@@ -102,8 +102,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNet total FastDP flows is below threshold. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason for fast dp flows dropping below the threshold.',
summary: 'Number of FastDP flows is below the threshold.',
description: 'actionable: Please find the reason for FastDP flows to go below the threshold and fix it.',
},
},
{
@@ -114,8 +114,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'WeaveNet FastDP flows is not happening in some or all nodes. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason for fast dp being off.',
summary: 'Number of FastDP flows is zero.',
description: 'actionable: Please find the reason for FastDP flows to be off and fix it.',
},
},
{
@@ -126,8 +126,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'A lot of connections are getting terminated. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason for high connection termination rate and fix it.',
summary: 'A lot of connections are getting terminated.',
description: 'actionable: Please find the reason for the high connection termination rate and fix it.',
},
},
{
@@ -138,8 +138,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'A lot of connections are in connecting state. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason and fix it.',
summary: 'A lot of connections are in connecting state.',
description: 'actionable: Please find the reason for this and fix it.',
},
},
{
@@ -150,8 +150,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'A lot of connections are in retrying state. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason and fix it.',
summary: 'A lot of connections are in retrying state.',
description: 'actionable: Please find the reason for this and fix it.',
},
},
{
@@ -162,8 +162,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'A lot of connections are in pending state. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason and fix it.',
summary: 'A lot of connections are in pending state.',
description: 'actionable: Please find the reason for this and fix it.',
},
},
{
@@ -174,8 +174,8 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
severity: 'critical',
},
annotations: {
summary: 'A lot of connections are in failed state. Go to the below prometheus link for details.',
description: 'Actionable: Find the reason and fix it.',
summary: 'A lot of connections are in failed state.',
description: 'actionable: Please find the reason and fix it.',
},
},
],
@@ -183,7 +183,7 @@ local servicePort = k.core.v1.service.mixin.spec.portsType;
],
},
grafanaDashboards+:: {
'weavenet.json': (import 'grafana-weavenet.json'),
'weavenet-cluster.json': (import 'grafana-weavenet-cluster.json'),
'weave-net.json': (import 'grafana-weave-net.json'),
'weave-net-cluster.json': (import 'grafana-weave-net-cluster.json'),
},
}
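
The threshold override that the example hunks above apply with `std.map` can be tried in isolation. The following is a minimal sketch, not the code shipped in kube-prometheus: `defaultGroup`, its rule list, the `SomeOtherAlert` rule, and the `overrideExpr` helper are hypothetical names introduced here for illustration, and the default expressions are placeholders. Only `WeaveNetFastDPFlowsLow` and the `sum(weave_flows) < 20000` expression come from the diff.

```jsonnet
// Hypothetical stand-in for a rule group provided by the Weave Net mixin.
// The real group contains more rules and its own default expressions.
local defaultGroup = {
  name: 'weave-net',
  rules: [
    { alert: 'WeaveNetFastDPFlowsLow', expr: 'sum(weave_flows) < 15000' },
    { alert: 'SomeOtherAlert', expr: 'up == 0' },
  ],
};

// Hypothetical helper: return a copy of `group` in which the rule named
// `alertName` gets a new expression. All other rules pass through unchanged.
local overrideExpr(group, alertName, newExpr) = group {
  rules: std.map(
    function(rule)
      if rule.alert == alertName then rule { expr: newExpr } else rule,
    group.rules
  ),
};

// Evaluates to the group with only the FastDP threshold changed.
overrideExpr(defaultGroup, 'WeaveNetFastDPFlowsLow', 'sum(weave_flows) < 20000')
```

Saving this to a file and evaluating it with the `jsonnet` command prints the patched group as JSON, which makes it easy to check a threshold change before wiring it into the full kube-prometheus build. The sketch assumes every rule has an `alert` field. A group that also holds recording rules would need a `std.objectHas(rule, 'alert')` guard first.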