you are better than you think

Memo

last update:

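Problem description

We received an error alert for a base component; on inspection, ES (Elasticsearch) had hit an OOM.

Troubleshooting

At 10:21:50, vmtoolsd requested 4 kB of memory (order=0), which triggered the OOM killer.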

Apr 11 10:21:50  kernel: ##vmtoolsd## invoked oom-killer: gfp_mask=0x280da, ##order=0##, oom_score_adj=0
Apr 11 10:21:50  kernel: vmtoolsd cpuset=/ mems_allowed=0
Apr 11 10:21:50  kernel: CPU: 6 PID: 867 Comm: vmtoolsd Not tainted 3.10.0-693.21.1.el7.x86_64 #1
Apr 11 10:21:50  kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Apr 11 10:21:50  kernel: Call Trace:
Apr 11 10:21:50  kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Apr 11 10:21:50  kernel: [<ffffffff816a9b90>] dump_header+0x90/0x229
Apr 11 10:21:50  kernel: [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0
Apr 11 10:21:50  kernel: [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0
Apr 11 10:21:50  kernel: [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0
Apr 11 10:21:50  kernel: [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120
Apr 11 10:21:50  kernel: [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0
Apr 11 10:21:50  kernel: [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0
Apr 11 10:21:50  kernel: [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724
Apr 11 10:21:50  kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Apr 11 10:21:50  kernel: [<ffffffff811d8a75>] alloc_pages_vma+0xb5/0x200
Apr 11 10:21:50  kernel: [<ffffffff811b6c50>] handle_mm_fault+0xb60/0xfa0
Apr 11 10:21:50  kernel: [<ffffffff81333563>] ? number.isra.2+0x323/0x360
Apr 11 10:21:50  kernel: [<ffffffff816bb504>] __do_page_fault+0x154/0x450
Apr 11 10:21:50  kernel: [<ffffffff816bb835>] do_page_fault+0x35/0x90
Apr 11 10:21:50  kernel: [<ffffffff816b7768>] page_fault+0x28/0x30
Apr 11 10:21:50  kernel: [<ffffffff81336539>] ? copy_user_generic_unrolled+0x89/0xc0
Apr 11 10:21:50  kernel: [<ffffffff8122b369>] ? seq_read+0x2c9/0x3e0
Apr 11 10:21:50  kernel: [<ffffffff812756b0>] proc_reg_read+0x40/0x80
Apr 11 10:21:50  kernel: [<ffffffff812054ef>] vfs_read+0x9f/0x170
Apr 11 10:21:50  kernel: [<ffffffff812063bf>] SyS_read+0x7f/0xe0
Apr 11 10:21:50  kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
Apr 11 10:21:50  kernel: Mem-Info:
Apr 11 10:21:50  kernel: active_anon:2911137 inactive_anon:58987 isolated_anon:0#012 active_file:231 inactive_file:1006 isolated_file:193#012 unevictable:941692 dirty:28 writeback:0 unstable:0#012 slab_reclaimable:57650 slab_unreclaimable:23641#012 mapped:69758 shmem:206054 pagetables:15540 bounce:0#012 free:33111 free_pcp:150 free_cma:0
Apr 11 10:21:50  kernel: Node 0 DMA ##free:15840kB## min:64kB low:80kB ##high:96kB## active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 2977 ##16015## 16015
Apr 11 10:21:50  kernel: Node 0 DMA32 ##free:61692kB## min:12544kB low:15680kB ##high:18816kB## active_anon:2234420kB inactive_anon:41828kB active_file:0kB inactive_file:28kB unevictable:526724kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048960kB mlocked:526724kB dirty:0kB writeback:0kB mapped:55232kB shmem:147496kB slab_reclaimable:70336kB slab_unreclaimable:44232kB kernel_stack:12192kB pagetables:9352kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:88 all_unreclaimable? yes
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 0 ##13037## 13037
Apr 11 10:21:50  kernel: Node 0 Normal ##free:54912kB min:54968kB## low:68708kB high:82452kB active_anon:9410128kB inactive_anon:194120kB active_file:924kB inactive_file:3996kB unevictable:3240044kB isolated(anon):0kB isolated(file):772kB present:13631488kB managed:13350504kB mlocked:3240044kB dirty:112kB writeback:0kB mapped:223800kB shmem:676720kB slab_reclaimable:160264kB slab_unreclaimable:50300kB kernel_stack:9600kB pagetables:52808kB unstable:0kB bounce:0kB free_pcp:600kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:650 all_unreclaimable? no
Apr 11 10:21:50  kernel: lowmem_reserve[]: 0 0 0 0
Apr 11 10:21:50  kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Apr 11 10:21:50  kernel: Node 0 DMA32: 4174*4kB (UEM) 1746*8kB (UEM) 1143*16kB (UEM) 387*32kB (EM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61336kB
Apr 11 10:21:50  kernel: Node 0 Normal: 13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55312kB
Apr 11 10:21:50  kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 11 10:21:50  kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 11 10:21:50  kernel: 273226 total pagecache pages
Apr 11 10:21:50  kernel: 0 pages in swap cache
Apr 11 10:21:50  kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 11 10:21:50  kernel: Free swap  = 0kB
Apr 11 10:21:50  kernel: Total swap = 0kB
Apr 11 10:21:50  kernel: 4194157 pages RAM
Apr 11 10:21:50  kernel: 0 pages HighMem/MovableOnly
Apr 11 10:21:50  kernel: 90315 pages reserved

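At this point the Node 0 Normal zone still has free 4 kB blocks, including type E pages, so why did the allocation fail? Let's do the arithmetic first. (U, E, and M stand for UNMOVABLE, RECLAIMABLE, and MOVABLE, respectively.)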
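One way to do that arithmetic is sketched below. It is a simplified version of the kernel's zone watermark check, not the exact code path: it assumes a 4 kB page size, compares against the `min` watermark only, ignores the `ALLOC_HIGH`/`ALLOC_HARDER` adjustments, and takes the third column of each `lowmem_reserve[]` line as the reserve for this request (the fourth column has the same values in this log). The numbers are copied from the Mem-Info dump above; the function and variable names are illustrative.

```python
PAGE_KB = 4  # assumed page size on x86_64

# Per-zone values copied from the log: free and min are in kB,
# lowmem_reserve is already expressed in pages.
zones = {
    "DMA":    {"free_kb": 15840, "min_kb": 64,    "reserve_pages": 16015},
    "DMA32":  {"free_kb": 61692, "min_kb": 12544, "reserve_pages": 13037},
    "Normal": {"free_kb": 54912, "min_kb": 54968, "reserve_pages": 0},
}


def watermark_ok(free_kb, min_kb, reserve_pages, order=0):
    """Simplified check: the zone is usable only if the free pages
    exceed the watermark plus the lowmem reserve."""
    free_pages = free_kb // PAGE_KB - ((1 << order) - 1)
    threshold = min_kb // PAGE_KB + reserve_pages
    return free_pages > threshold


results = {name: watermark_ok(**z) for name, z in zones.items()}

for name, ok in results.items():
    z = zones[name]
    print(name,
          "free_pages =", z["free_kb"] // PAGE_KB,
          "threshold =", z["min_kb"] // PAGE_KB + z["reserve_pages"],
          "ok =", ok)
```

Under these assumptions every zone fails the check: DMA and DMA32 hold free pages, but they sit below the watermark plus the lowmem reserve, and Normal has free (54912 kB) below min (54968 kB). Free 4 kB blocks being listed in the buddy summary is therefore not enough for the order=0 request to succeed.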

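Background

Last November I studied the open-source datadog agent code (7.32.1) and wrote up these notes.

1 Log data flow

  1. The tailer reads from the log file and continuously sends what it reads to the decoder's input channel.
  2. The decoder reads from its own input channel, decides whether the data needs to be truncated, and writes it to the line parser's input channel.
  3. The line parser reads from its own input channel, parses the content, status, timestamp, etc., and writes to the line handler's input channel.
  4. The line handler reads from its own input channel, trims whitespace, and sends the result to its own output channel.
  5. The tailer's forwardMessage reads from the decoder's output channel (shared with the line handler), adds tags, and sends the message to the pipeline's input channel.
  6. The processor reads from its own input channel (the pipeline's input channel), encodes the data (for example as json/pb), and sends it to the sender's input channel.
  7. The sender reads from its input channel and, at the end, writes the message to the pipeline's output channel.
  8. The sender sends the message content to the datadog backend. Transfers are uncompressed by default; http supports gzip compression, tcp does not.
  9. The pipeline's output channel is initially set to the auditor's input channel. The auditor reads from its input channel and writes to an in-memory buffer; one timer flushes the buffer to disk periodically, and another timer periodically purges expired entries from memory.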
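
The chain of channels in steps 1 to 6 can be sketched as a series of stages connected by queues. This is an illustrative Python model only, not the agent's Go code: the stage functions, the `MAX_LEN` truncation limit, and the `source:demo` tag are all hypothetical, and the sender and auditor (steps 7 to 9) are left out.

```python
import json
import queue
import threading

STOP = object()   # sentinel that shuts a stage down
MAX_LEN = 32      # hypothetical truncation limit


def stage(fn, inbox, outbox):
    """Start a worker that applies fn to every item from inbox
    and puts the result on outbox, like one channel hop."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown downstream
                return
            outbox.put(fn(item))
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def decode(raw):      # step 2: truncate over-long lines
    return raw[:MAX_LEN]


def parse(line):      # step 3: split status from content
    status, _, content = line.partition(" ")
    return {"status": status, "content": content}


def handle(msg):      # step 4: trim whitespace
    return {**msg, "content": msg["content"].strip()}


def forward(msg):     # step 5: add tags
    return {**msg, "tags": ["source:demo"]}


def encode(msg):      # step 6: encode as JSON
    return json.dumps(msg, sort_keys=True)


def run_pipeline(lines):
    fns = [decode, parse, handle, forward, encode]
    chans = [queue.Queue() for _ in range(len(fns) + 1)]
    for fn, inbox, outbox in zip(fns, chans, chans[1:]):
        stage(fn, inbox, outbox)
    for line in lines:            # step 1: the tailer feeds the decoder
        chans[0].put(line)
    chans[0].put(STOP)
    out = []
    while (item := chans[-1].get()) is not STOP:
        out.append(item)
    return out


encoded = run_pipeline(["INFO  hello world  ", "WARN disk almost full"])
```

Each stage owns exactly one input queue and writes to the next stage's input, which mirrors how every component in the list reads from "its own input channel". Because each stage is a single worker on a FIFO queue, message order is preserved end to end.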

kubernetes e2e test

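I Background

Our largest production cluster is about to reach the maximum node count supported by the community. The team wants to know how large a single cluster can grow under our current production usage pattern, so we scheduled this test.

II Test version and configuration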

kubernetes 1.12.4

Hardware configuration:

cpu                  mem      disk
2*Intel-E5-2670v3    8*16G    12*300G

The clusters are built from 294 M10 machines in total. Unit: machines

Cluster            etcd    master    nodes
Hosting cluster    3       2         283
Test cluster       3       3         8k (hollow-node)

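III Model overview:

1 The community treats 30 pods per node as a normal load, so the saturation test ends up creating 8k*30 = 240k pods. We record how long it takes to create the 240k pods and compute the scheduling throughput from that. This process mainly simulates how long it takes to restore all services after a large-scale cluster failure. => Scheduling throughput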
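
The throughput figure is just the pod count divided by the elapsed time. A minimal sketch of that arithmetic follows; the function names are illustrative, and the elapsed time in the usage line is a placeholder, not a measured result from this test.

```python
def saturation_pod_count(nodes, pods_per_node=30):
    """Total pods created by the saturation test."""
    return nodes * pods_per_node


def scheduling_throughput(pods, elapsed_seconds):
    """Average pods scheduled per second over the whole run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return pods / elapsed_seconds


total_pods = saturation_pod_count(8000)  # 8k hollow nodes * 30 pods
# Placeholder duration for illustration only; substitute the measured value.
example_rate = scheduling_throughput(total_pods, elapsed_seconds=2400)
```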