kubernetes组件指标梳理
本文梳理指标对应的的kubernetes版本为1.23.1, etcd版本为3.5.1
kube-apiserver 指标
kuber-apiserver暴露了148个指标,梳理后比较重要的指标如下。
# HELP apiserver_request_duration_seconds [STABLE] Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component.
# TYPE apiserver_request_duration_seconds histogram
apiserver响应的时间分布,按照url 和 verb 分类
一般按照instance和verb+时间 汇聚
# HELP apiserver_request_total [STABLE] Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code.
# TYPE apiserver_request_total counter
apiserver的请求总数,按照verb、 version、 group、resource、scope、component、 http返回码分类统计
# HELP apiserver_current_inflight_requests [STABLE] Maximal number of currently used inflight request limit of this apiserver per request kind in last second.
# TYPE apiserver_current_inflight_requests gauge
当前最大请求数(利用channel大小限流), 按mutating(非get list watch的请求) 和 readOnly (get list watch)分别限制
超过max-requests-inflight(默认值400) 和 max-mutating-requests-inflight(默认200) 的请求会被限流
apiserver变更时要注意观察,也是反馈集群容量的一个重要指标
# HELP apiserver_response_sizes [STABLE] Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component.
# TYPE apiserver_response_sizes histogram
apiserver 响应大小,单位byte, 按照verb、 version、 group、resource、scope、component分类统计
# HELP watch_cache_capacity [ALPHA] Total capacity of watch cache broken by resource type.
# TYPE watch_cache_capacity gauge
按照资源类型统计的watch缓存大小
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
每秒钟用户态和系统态cpu消耗时间, 计算apiserver进程的cpu的使用率
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
apiserver的内存使用量(单位:Byte)
# HELP workqueue_adds_total [ALPHA] Total number of adds handled by workqueue
# TYPE workqueue_adds_total counter
apiserver中包含的controller的工作队列,已处理的任务总数
# HELP workqueue_depth [ALPHA] Current depth of workqueue
# TYPE workqueue_depth gauge
apiserver中包含的controller的工作队列深度,表示当前队列中要处理的任务的数量,数值越小越好
例如APIServiceRegistrationController admission_quota_controller