Kubernetes Monitoring
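Cloud native combines elements of open-source software, cloud computing, and application architecture. Cloud computing lowers the barrier to running open-source software while reducing operations and infrastructure costs; cloud native adds more application architecture conventions and focuses more on capabilities and ecosystem. As Kubernetes has been adopted at scale as infrastructure, monitoring requirements have changed as well, and building a monitoring system that better fits cloud-native conventions has become essential.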
1 Changes in monitoring requirements
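- Shorter metric lifetimes: compared with the physical-machine era, infrastructure is now dynamic. Pods are destroyed and recreated very frequently, and a metric lives only as long as its Pod.
- More metrics: as microservices have become popular, the number of metrics has grown sharply. Developers are also more willing to instrument their code to observe service state, and collectors keep multiplying, gathering everything that can be gathered.
- Richer metric dimensions: in the physical-machine era, monitoring mostly took a resource view and focused on collecting from machines, switches, and middleware. Today's dimensions are much richer: a metric can easily carry dozens or even hundreds of labels, and their combinations can lead to high-cardinality problems.
- More complex infrastructure, harder monitoring: both the Kubernetes components and the application architecture model take time to learn. Kubernetes components all expose monitoring data through the /metrics endpoint, but systematic documentation and summarized best practices are lacking.
- Auto-discovery matters more: compared with the static collection of the physical-machine era, the ability to discover collection targets automatically has become more important.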
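To make the high-cardinality point concrete, here is a minimal sketch of how label combinations multiply. The label names and value counts are hypothetical and chosen only for illustration. The result is a worst-case upper bound, since in practice not every combination occurs.

```python
from math import prod

# Hypothetical label set: label name -> number of distinct values seen.
label_values = {
    "namespace": 20,
    "pod": 500,
    "container": 3,
    "status_code": 5,
}


def worst_case_series(labels):
    """Upper bound on time series for one metric: the product of the
    number of distinct values of each label."""
    return prod(labels.values())


total = worst_case_series(label_values)
print(f"worst-case series for one metric: {total}")
```

Adding a single label with 10 distinct values would multiply this bound by 10, which is why a few unbounded labels (pod names, request IDs) dominate the series count.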
Problem description
We received an error alert for a base component. On inspection, Elasticsearch (ES) had hit an OOM.
Troubleshooting
At 10:21:50, vmtoolsd requested 4 kB of memory (order=0), which triggered the OOM killer.
Apr 11 10:21:50 kernel: ##vmtoolsd## invoked oom-killer: gfp_mask=0x280da, ##order=0##, oom_score_adj=0
Apr 11 10:21:50 kernel: vmtoolsd cpuset=/ mems_allowed=0
Apr 11 10:21:50 kernel: CPU: 6 PID: 867 Comm: vmtoolsd Not tainted 3.10.0-693.21.1.el7.x86_64 #1
Apr 11 10:21:50 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Apr 11 10:21:50 kernel: Call Trace:
Apr 11 10:21:50 kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Apr 11 10:21:50 kernel: [<ffffffff816a9b90>] dump_header+0x90/0x229
Apr 11 10:21:50 kernel: [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0
Apr 11 10:21:50 kernel: [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0
Apr 11 10:21:50 kernel: [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0
Apr 11 10:21:50 kernel: [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120
Apr 11 10:21:50 kernel: [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0
Apr 11 10:21:50 kernel: [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0
Apr 11 10:21:50 kernel: [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724
Apr 11 10:21:50 kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Apr 11 10:21:50 kernel: [<ffffffff811d8a75>] alloc_pages_vma+0xb5/0x200
Apr 11 10:21:50 kernel: [<ffffffff811b6c50>] handle_mm_fault+0xb60/0xfa0
Apr 11 10:21:50 kernel: [<ffffffff81333563>] ? number.isra.2+0x323/0x360
Apr 11 10:21:50 kernel: [<ffffffff816bb504>] __do_page_fault+0x154/0x450
Apr 11 10:21:50 kernel: [<ffffffff816bb835>] do_page_fault+0x35/0x90
Apr 11 10:21:50 kernel: [<ffffffff816b7768>] page_fault+0x28/0x30
Apr 11 10:21:50 kernel: [<ffffffff81336539>] ? copy_user_generic_unrolled+0x89/0xc0
Apr 11 10:21:50 kernel: [<ffffffff8122b369>] ? seq_read+0x2c9/0x3e0
Apr 11 10:21:50 kernel: [<ffffffff812756b0>] proc_reg_read+0x40/0x80
Apr 11 10:21:50 kernel: [<ffffffff812054ef>] vfs_read+0x9f/0x170
Apr 11 10:21:50 kernel: [<ffffffff812063bf>] SyS_read+0x7f/0xe0
Apr 11 10:21:50 kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
Apr 11 10:21:50 kernel: Mem-Info:
Apr 11 10:21:50 kernel: active_anon:2911137 inactive_anon:58987 isolated_anon:0#012 active_file:231 inactive_file:1006 isolated_file:193#012 unevictable:941692 dirty:28 writeback:0 unstable:0#012 slab_reclaimable:57650 slab_unreclaimable:23641#012 mapped:69758 shmem:206054 pagetables:15540 bounce:0#012 free:33111 free_pcp:150 free_cma:0
Apr 11 10:21:50 kernel: Node 0 DMA ##free:15840kB## min:64kB low:80kB ##high:96kB## active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 2977 ##16015## 16015
Apr 11 10:21:50 kernel: Node 0 DMA32 ##free:61692kB## min:12544kB low:15680kB ##high:18816kB## active_anon:2234420kB inactive_anon:41828kB active_file:0kB inactive_file:28kB unevictable:526724kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048960kB mlocked:526724kB dirty:0kB writeback:0kB mapped:55232kB shmem:147496kB slab_reclaimable:70336kB slab_unreclaimable:44232kB kernel_stack:12192kB pagetables:9352kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:88 all_unreclaimable? yes
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 0 ##13037## 13037
Apr 11 10:21:50 kernel: Node 0 Normal ##free:54912kB min:54968kB## low:68708kB high:82452kB active_anon:9410128kB inactive_anon:194120kB active_file:924kB inactive_file:3996kB unevictable:3240044kB isolated(anon):0kB isolated(file):772kB present:13631488kB managed:13350504kB mlocked:3240044kB dirty:112kB writeback:0kB mapped:223800kB shmem:676720kB slab_reclaimable:160264kB slab_unreclaimable:50300kB kernel_stack:9600kB pagetables:52808kB unstable:0kB bounce:0kB free_pcp:600kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:650 all_unreclaimable? no
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 0 0 0
Apr 11 10:21:50 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Apr 11 10:21:50 kernel: Node 0 DMA32: 4174*4kB (UEM) 1746*8kB (UEM) 1143*16kB (UEM) 387*32kB (EM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61336kB
Apr 11 10:21:50 kernel: Node 0 Normal: 13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55312kB
Apr 11 10:21:50 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 11 10:21:50 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 11 10:21:50 kernel: 273226 total pagecache pages
Apr 11 10:21:50 kernel: 0 pages in swap cache
Apr 11 10:21:50 kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 11 10:21:50 kernel: Free swap = 0kB
Apr 11 10:21:50 kernel: Total swap = 0kB
Apr 11 10:21:50 kernel: 4194157 pages RAM
Apr 11 10:21:50 kernel: 0 pages HighMem/MovableOnly
Apr 11 10:21:50 kernel: 90315 pages reserved
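At this point the Node 0 Normal zone still has free 4 kB pages, including pages of type E, so why did the allocation fail? Let's work through the numbers first.
(U, E, and M stand for UNMOVABLE, RECLAIMABLE, and MOVABLE, respectively.)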
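The arithmetic can be sketched as follows. This is a rough reconstruction, not the kernel's exact code. It assumes that an allocation from a zone is refused when `free_pages <= watermark + lowmem_reserve`, which is my understanding of the check in `__zone_watermark_ok` for kernels of this era. It also assumes 4 kB pages, the `min` watermark, and the `lowmem_reserve` entry at index 2 (the one highlighted in the log). All numbers are copied from the log above.

```python
PAGE_KB = 4  # assumption: 4 kB pages (x86_64 default)

# (zone, free kB, min watermark kB, lowmem_reserve in pages), from the log.
ZONES = [
    ("DMA", 15840, 64, 16015),
    ("DMA32", 61692, 12544, 13037),
    ("Normal", 54912, 54968, 0),
]


def watermark_ok(free_kb, min_kb, reserve_pages, order=0):
    """Return True if the zone may satisfy the request under the assumed
    rule: free pages must exceed watermark + lowmem_reserve."""
    free_pages = free_kb // PAGE_KB - ((1 << order) - 1)
    min_pages = min_kb // PAGE_KB
    return free_pages > min_pages + reserve_pages


results = {}
for name, free_kb, min_kb, reserve in ZONES:
    results[name] = watermark_ok(free_kb, min_kb, reserve)
    print(f"{name}: free={free_kb // PAGE_KB} pages, "
          f"threshold={min_kb // PAGE_KB + reserve} pages, "
          f"ok={results[name]}")
```

Under these assumptions every zone fails the check: DMA has 3960 free pages against a threshold of 16031, DMA32 has 15423 against 16173, and Normal has 13728 against 13742. So although free 4 kB blocks exist in the buddy lists, no zone is above its watermark, and the order-0 request ends in the OOM killer.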
Background
Last November I studied the open-source Datadog Agent code (version 7.32.1) and wrote up these notes.
1 Log data flow
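- The tailer first reads from the log file and continuously sends what it reads to the decoder's input channel.
- The decoder reads from its own input channel, decides whether the data needs to be truncated, and writes it to the line parser's input channel.
- The line parser reads from its own input channel, parses the content, status, timestamp, and so on, and writes to the line handler's input channel.
- The line handler reads from its own input channel, trims whitespace, and sends the result to its own output channel.
- The tailer's forwardMessage reads from the decoder's output channel (shared with the line handler), adds tags, and sends the message to the pipeline's input channel.
- The processor reads from its own input channel (the pipeline's input channel), encodes the message (for example as JSON or protobuf), and sends it to the sender's input channel.
- The sender reads from its input channel and, at the end, writes the message to the pipeline's output channel.
- The sender sends the message content to the Datadog backend. By default the content is sent uncompressed; HTTP supports gzip compression, while TCP does not support compression.
- The pipeline's output channel is initially set to the auditor's input channel. The auditor reads from its input channel and writes to memory; one timer flushes the buffer to disk, and another timer periodically removes expired entries from memory.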
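The chain of channels described above can be sketched in a few lines. This is an illustration only, not the agent's code: the real agent is written in Go and uses Go channels, whereas this sketch uses Python queues and threads. Every function name, `MAX_LEN`, and the `source:demo` tag is hypothetical. The decoder simply truncates, the "backend" is a list, and the auditor keeps messages in memory without the flush and cleanup timers.

```python
import json
import queue
import threading

DONE = object()   # sentinel marking the end of the stream
MAX_LEN = 32      # hypothetical truncation limit


def stage(fn, inp, out):
    """Start a worker that applies fn to each item from inp and puts
    the result on out, mimicking one goroutine between two channels."""
    def run():
        while True:
            item = inp.get()
            if item is DONE:
                out.put(DONE)
                return
            out.put(fn(item))
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def decode(raw):        # decoder: truncate overlong data
    return raw[:MAX_LEN]


def parse(line):        # line parser: extract content and status
    return {"content": line, "status": "info"}


def handle(msg):        # line handler: trim whitespace
    return {**msg, "content": msg["content"].strip()}


def forward(msg):       # tailer forwardMessage: add tags
    return {**msg, "tags": ["source:demo"]}


def process(msg):       # processor: encode as JSON
    return json.dumps(msg, sort_keys=True)


def run_pipeline(lines):
    sent = []       # stands in for the Datadog backend
    registry = []   # stands in for the auditor's in-memory state

    def send(payload):  # sender: deliver, then pass to pipeline output
        sent.append(payload)
        return payload

    fns = [decode, parse, handle, forward, process, send]
    chans = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [stage(fn, chans[i], chans[i + 1]) for i, fn in enumerate(fns)]

    for line in lines:          # tailer: feed the decoder's input channel
        chans[0].put(line)
    chans[0].put(DONE)

    while True:                 # auditor: read the pipeline's output channel
        item = chans[-1].get()
        if item is DONE:
            break
        registry.append(item)
    for worker in workers:
        worker.join()
    return sent, registry


sent, registry = run_pipeline(["  hello  ", "world\n"])
print(sent)
```

Each stage owns only its input queue and the next stage's input queue, which mirrors the description above: a component's output channel is the next component's input channel, and the auditor sees a message only after the sender has handled it.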