you are better than you think

Notes

Kubernetes Monitoring

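Cloud native combines elements of open-source software, cloud computing, and application architecture. Cloud computing lowers the barrier to running open-source software and cuts both operations and infrastructure costs; cloud native adds more conventions for application architecture and focuses on capabilities and ecosystem. As Kubernetes is rolled out at scale as infrastructure, monitoring requirements have changed too, and building a monitoring system that fits cloud-native conventions has become necessary.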

1 Changes in monitoring requirements

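  • Shorter metric lifetimes: compared with the bare-metal era, infrastructure is dynamic and Pods are destroyed and recreated very frequently, so a metric lives only as long as the Pod that emits it.
  • More metrics: with the rise of microservices, the number of metrics has grown sharply. Developers are more willing to instrument their code to observe service state, and collectors keep multiplying, so whatever can be collected gets collected.
  • Richer dimensions: in the bare-metal era, monitoring mostly took a resource view and focused on collecting from machines, switches, and middleware. Today a metric often carries dozens or even hundreds of dimension labels, and their combinations can cause high-cardinality problems.
  • More complex infrastructure makes monitoring harder: the Kubernetes components and the application architecture model both take time to learn. The Kubernetes components all expose monitoring data through a /metrics endpoint, but systematic documentation and a summary of best practices are lacking.
  • Automatic discovery matters more: compared with the static collection of the bare-metal era, the ability to discover collection targets automatically has become more important.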
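
To make the high-cardinality point concrete, here is a minimal sketch that estimates the worst-case number of time series for a single metric as the product of the distinct values of each label. The label names and values are made up for illustration; they do not come from any real cluster.

```python
from math import prod


def max_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical labels on one HTTP request counter.
labels = {
    "namespace": ["prod", "staging"],
    "pod": [f"api-{i}" for i in range(50)],
    "status_code": ["200", "404", "500"],
    "method": ["GET", "POST"],
}

print(max_series(labels))  # 2 * 50 * 3 * 2 = 600
```

Because Pods are recreated frequently, the "pod" label keeps gaining new values over time, so the real number of series seen over a retention window can grow well beyond this snapshot.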

Problem description

We received an error alert for a base component; on inspection, ES had hit an OOM.

Troubleshooting

At 10:21:50, vmtoolsd requested 4 kB of memory (order=0), which triggered the OOM killer.

Apr 11 10:21:50  kernel: ##vmtoolsd## invoked oom-killer: gfp_mask=0x280da, ##order=0##, oom_score_adj=0
Apr 11 10:21:50  kernel: vmtoolsd cpuset=/ mems_allowed=0
Apr 11 10:21:50  kernel: CPU: 6 PID: 867 Comm: vmtoolsd Not tainted 3.10.0-693.21.1.el7.x86_64 #1
Apr 11 10:21:50  kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Apr 11 10:21:50  kernel: Call Trace:
Apr 11 10:21:50  kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Apr 11 10:21:50  kernel: [<ffffffff816a9b90>] dump_header+0x90/0x229
Apr 11 10:21:50  kernel: [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0
Apr 11 10:21:50  kernel: [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0
Apr 11 10:21:50  kernel: [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0
Apr 11 10:21:50  kernel: [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120
Apr 11 10:21:50  kernel: [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0
Apr 11 10:21:50  kernel: [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0
Apr 11 10:21:50  kernel: [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724
Apr 11 10:21:50  kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Apr 11 10:21:50  kernel: [<ffffffff811d8a75>] alloc_pages_vma+0xb5/0x200
Apr 11 10:21:50  kernel: [<ffffffff811b6c50>] handle_mm_fault+0xb60/0xfa0
Apr 11 10:21:50  kernel: [<ffffffff81333563>] ? number.isra.2+0x323/0x360
Apr 11 10:21:50  kernel: [<ffffffff816bb504>] __do_page_fault+0x154/0x450
Apr 11 10:21:50  kernel: [<ffffffff816bb835>] do_page_fault+0x35/0x90
Apr 11 10:21:50  kernel: [<ffffffff816b7768>] page_fault+0x28/0x30
Apr 11 10:21:50  kernel: [<ffffffff81336539>] ? copy_user_generic_unrolled+0x89/0xc0
Apr 11 10:21:50  kernel: [<ffffffff8122b369>] ? seq_read+0x2c9/0x3e0
Apr 11 10:21:50  kernel: [<ffffffff812756b0>] proc_reg_read+0x40/0x80
Apr 11 10:21:50  kernel: [<ffffffff812054ef>] vfs_read+0x9f/0x170
Apr 11 10:21:50  kernel: [<ffffffff812063bf>] SyS_read+0x7f/0xe0
Apr 11 10:21:50  kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
Apr 11 10:21:50  kernel: Mem-Info:
Apr 11 10:21:50  kernel: active_anon:2911137 inactive_anon:58987 isolated_anon:0#012 active_file:231 inactive_file:1006 isolated_file:193#012 unevictable:941692 dirty:28 writeback:0 unstable:0#012 slab_reclaimable:57650 slab_unreclaimable:23641#012 mapped:69758 shmem:206054 pagetables:15540 bounce:0#012 free:33111 free_pcp:150 free_cma:0
Apr 11 10:21:50  kernel: Node 0 DMA ##free:15840kB## min:64kB low:80kB ##high:96kB## active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 2977 ##16015## 16015
Apr 11 10:21:50  kernel: Node 0 DMA32 ##free:61692kB## min:12544kB low:15680kB ##high:18816kB## active_anon:2234420kB inactive_anon:41828kB active_file:0kB inactive_file:28kB unevictable:526724kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048960kB mlocked:526724kB dirty:0kB writeback:0kB mapped:55232kB shmem:147496kB slab_reclaimable:70336kB slab_unreclaimable:44232kB kernel_stack:12192kB pagetables:9352kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:88 all_unreclaimable? yes
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 0 ##13037## 13037
Apr 11 10:21:50  kernel: Node 0 Normal ##free:54912kB min:54968kB## low:68708kB high:82452kB active_anon:9410128kB inactive_anon:194120kB active_file:924kB inactive_file:3996kB unevictable:3240044kB isolated(anon):0kB isolated(file):772kB present:13631488kB managed:13350504kB mlocked:3240044kB dirty:112kB writeback:0kB mapped:223800kB shmem:676720kB slab_reclaimable:160264kB slab_unreclaimable:50300kB kernel_stack:9600kB pagetables:52808kB unstable:0kB bounce:0kB free_pcp:600kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:650 all_unreclaimable? no
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 0 0 0
Apr 11 10:21:50  kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Apr 11 10:21:50  kernel: Node 0 DMA32: 4174*4kB (UEM) 1746*8kB (UEM) 1143*16kB (UEM) 387*32kB (EM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61336kB
Apr 11 10:21:50  kernel: Node 0 Normal: 13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55312kB
Apr 11 10:21:50  kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 11 10:21:50  kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 11 10:21:50  kernel: 273226 total pagecache pages
Apr 11 10:21:50  kernel: 0 pages in swap cache
Apr 11 10:21:50  kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 11 10:21:50  kernel: Free swap  = 0kB
Apr 11 10:21:50  kernel: Total swap = 0kB
Apr 11 10:21:50  kernel: 4194157 pages RAM
Apr 11 10:21:50  kernel: 0 pages HighMem/MovableOnly
Apr 11 10:21:50  kernel: 90315 pages reserved

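At this point the Node 0 Normal zone still has free 4 kB blocks, including pages of type E, so why did the allocation fail? Let's work through the numbers first. (U, E, and M stand for the UNMOVABLE, RECLAIMABLE, and MOVABLE migrate types.)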
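
One way to run the numbers is to compare each zone's free memory with its min watermark plus its lowmem reserve. The sketch below is a simplified model of the kernel's watermark check, not the kernel code itself: it ignores the adjustments the kernel makes for allocation flags, and it assumes the request's class zone is Normal, so the third lowmem_reserve[] entry of each zone applies. All values are copied from the log above; the function names are hypothetical.

```python
import re

PAGE_KB = 4  # base page size on x86_64, in kB

# (free_kb, min_kb, lowmem_reserve_pages) copied from the Mem-Info lines above.
# Assumption: the class zone is Normal, so the third lowmem_reserve[] entry
# printed under each zone is the one that applies.
ZONES = {
    "DMA": (15840, 64, 16015),
    "DMA32": (61692, 12544, 13037),
    "Normal": (54912, 54968, 0),
}


def watermark_ok(free_kb, mark_kb, reserve_pages, order=0):
    """Simplified watermark test: free memory must exceed mark + reserve."""
    free_kb -= ((1 << order) - 1) * PAGE_KB
    return free_kb > mark_kb + reserve_pages * PAGE_KB


def buddy_total_kb(line):
    """Sum the 'count*sizekB' entries of one per-zone free-list line."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


# Non-zero part of the "Node 0 Normal:" free-list line.
normal = "13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB"
print(buddy_total_kb(normal))  # 55312, the total printed in the log

for name, (free_kb, min_kb, reserve_pages) in ZONES.items():
    print(name, watermark_ok(free_kb, min_kb, reserve_pages))
# DMA False
# DMA32 False
# Normal False
```

Under this model every zone fails: Normal has 54912 kB free against a min watermark of 54968 kB, while DMA and DMA32 have free pages but fall short once their lowmem reserve (16015 and 13037 pages) is added to the watermark. That would be consistent with an order-0 request failing even though free 4 kB blocks are listed.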

Background

Last November I studied the source code of the open-source Datadog Agent (7.32.1) and wrote up these notes.

1 Log data flow

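  1. The tailer reads from the log file and continuously sends what it reads to the decoder's input channel.
  2. The decoder reads from its own input channel, decides whether the data needs to be truncated, and writes it to the line parser's input channel.
  3. The line parser reads from its own input channel, parses the content, status, timestamp, and so on, and writes the result to the line handler's input channel.
  4. The line handler reads from its own input channel, trims whitespace, and sends the result to its own output channel.
  5. The tailer's forwardMessage reads from the decoder's output channel (shared with the line handler), adds tags, and sends the message to the pipeline's input channel.
  6. The processor reads from its own input channel (the pipeline's input channel), encodes the message (for example as JSON or protobuf), and sends it to the sender's input channel.
  7. The sender reads from its input channel and, at the end, writes the message to the pipeline's output channel.
  8. The sender sends the message content to the Datadog backend. By default the data is sent uncompressed; HTTP supports gzip compression, TCP does not.
  9. The pipeline's output channel is initially set to the auditor's input channel. The auditor reads from its input channel and writes to memory; one timer flushes the buffer to disk, and another timer periodically removes expired entries from memory.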
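
The chain of channels above can be sketched in a few lines. This is an illustrative model only, not the Datadog Agent's code (the agent is written in Go and uses Go channels): here each stage is a Python thread connected by queue.Queue, and every function name, field name, and the max_len parameter are assumptions made for the example. Network sending, compression, and the auditor's timers are left out.

```python
import json
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def start_stage(func, inbox, outbox):
    """Apply func to every item read from inbox and forward the result to outbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)
                return
            outbox.put(func(item))
    threading.Thread(target=loop, daemon=True).start()


def run_pipeline(raw_lines, tags, max_len=64):
    sent = []  # payloads the sender would deliver to the backend

    def send(msg):
        sent.append(msg["encoded"])  # stand-in for the real network send
        return msg                   # then pass the message on to the auditor

    steps = [
        lambda raw: raw[:max_len],                                # decoder: truncate
        lambda line: {"content": line, "status": "info"},         # line parser
        lambda msg: {**msg, "content": msg["content"].strip()},   # line handler
        lambda msg: {**msg, "tags": list(tags)},                  # forwardMessage
        lambda msg: {**msg, "encoded": json.dumps(
            {"message": msg["content"], "tags": msg["tags"]})},   # processor
        send,                                                     # sender
    ]
    chans = [queue.Queue() for _ in range(len(steps) + 1)]
    for func, inbox, outbox in zip(steps, chans, chans[1:]):
        start_stage(func, inbox, outbox)

    for raw in raw_lines:  # tailer: feed the decoder's input channel
        chans[0].put(raw)
    chans[0].put(_STOP)

    audited = []  # auditor: record what came out of the pipeline
    while (msg := chans[-1].get()) is not _STOP:
        audited.append(msg)
    return audited, sent


audited, sent = run_pipeline(["  hello  \n", "x" * 100], tags=["env:dev"], max_len=10)
print([m["content"] for m in audited])  # ['hello', 'xxxxxxxxxx']
```

Each queue plays the role of one channel in the list, so a slow stage simply lets its inbox fill up; giving the queues a maxsize would add back-pressure in the same way a bounded channel does.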