The following provides a description and cardinality estimation based on the tests in a local cluster:
container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.* - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds - container start. Possibly not needed for system services (nodes*services)
container_last_seen - Not needed as system services are always running (nodes*services)
container_spec_.* - Everything related to cgroup specification and thus static data (nodes*services*5)
Adding a PodDisruptionBudget to prometheus-adapter ensure that at least
one replica of the adapter is always available. This make sure that even
during disruption the aggregated API is available and thus does not
impact the availability of the apiserver.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Prometheus-adapter is a component of the monitoring stack that in most
cases require to be highly available. For instance, we most likely
always want the autoscaling pipeline to be available and we also want to
avoid having no available backends serving the metrics API apiservices
has it would result in both the AggregatedAPIDown alert firing and the
kubectl top command not working anymore.
In order to make the adapter highly-avaible, we need to increase its
replica count to 2 and come up with a rolling update strategy and a
pod anti-affinity rule based on the kubernetes hostname to prevent the
adapters to be scheduled on the same node. The default rolling update
strategy for deployments isn't enough as the default maxUnavaible value
is 25% and is rounded down to 0. This means that during rolling-updates
scheduling will fail if there isn't more nodes than the number of
replicas. As for the maxSurge, the default should be fine as it is
rounded up to 1, but for clarity it might be better to just set it to 1.
For the pod anti-affinity constraints, it would be best if it was hard,
but having it soft should be good enough and fit most use-cases.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>