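Problem description
We received an error alert from a base component. On inspection, ES had hit an OOM.
Troubleshooting
At 10:21:50, vmtoolsd requested 4 kB of memory (order=0), which triggered the OOM killer.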
Apr 11 10:21:50 kernel: ##vmtoolsd## invoked oom-killer: gfp_mask=0x280da, ##order=0##, oom_score_adj=0
Apr 11 10:21:50 kernel: vmtoolsd cpuset=/ mems_allowed=0
Apr 11 10:21:50 kernel: CPU: 6 PID: 867 Comm: vmtoolsd Not tainted 3.10.0-693.21.1.el7.x86_64 #1
Apr 11 10:21:50 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Apr 11 10:21:50 kernel: Call Trace:
Apr 11 10:21:50 kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Apr 11 10:21:50 kernel: [<ffffffff816a9b90>] dump_header+0x90/0x229
Apr 11 10:21:50 kernel: [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0
Apr 11 10:21:50 kernel: [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0
Apr 11 10:21:50 kernel: [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0
Apr 11 10:21:50 kernel: [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120
Apr 11 10:21:50 kernel: [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0
Apr 11 10:21:50 kernel: [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0
Apr 11 10:21:50 kernel: [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724
Apr 11 10:21:50 kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Apr 11 10:21:50 kernel: [<ffffffff811d8a75>] alloc_pages_vma+0xb5/0x200
Apr 11 10:21:50 kernel: [<ffffffff811b6c50>] handle_mm_fault+0xb60/0xfa0
Apr 11 10:21:50 kernel: [<ffffffff81333563>] ? number.isra.2+0x323/0x360
Apr 11 10:21:50 kernel: [<ffffffff816bb504>] __do_page_fault+0x154/0x450
Apr 11 10:21:50 kernel: [<ffffffff816bb835>] do_page_fault+0x35/0x90
Apr 11 10:21:50 kernel: [<ffffffff816b7768>] page_fault+0x28/0x30
Apr 11 10:21:50 kernel: [<ffffffff81336539>] ? copy_user_generic_unrolled+0x89/0xc0
Apr 11 10:21:50 kernel: [<ffffffff8122b369>] ? seq_read+0x2c9/0x3e0
Apr 11 10:21:50 kernel: [<ffffffff812756b0>] proc_reg_read+0x40/0x80
Apr 11 10:21:50 kernel: [<ffffffff812054ef>] vfs_read+0x9f/0x170
Apr 11 10:21:50 kernel: [<ffffffff812063bf>] SyS_read+0x7f/0xe0
Apr 11 10:21:50 kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
Apr 11 10:21:50 kernel: Mem-Info:
Apr 11 10:21:50 kernel: active_anon:2911137 inactive_anon:58987 isolated_anon:0#012 active_file:231 inactive_file:1006 isolated_file:193#012 unevictable:941692 dirty:28 writeback:0 unstable:0#012 slab_reclaimable:57650 slab_unreclaimable:23641#012 mapped:69758 shmem:206054 pagetables:15540 bounce:0#012 free:33111 free_pcp:150 free_cma:0
Apr 11 10:21:50 kernel: Node 0 DMA ##free:15840kB## min:64kB low:80kB ##high:96kB## active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 2977 ##16015## 16015
Apr 11 10:21:50 kernel: Node 0 DMA32 ##free:61692kB## min:12544kB low:15680kB ##high:18816kB## active_anon:2234420kB inactive_anon:41828kB active_file:0kB inactive_file:28kB unevictable:526724kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048960kB mlocked:526724kB dirty:0kB writeback:0kB mapped:55232kB shmem:147496kB slab_reclaimable:70336kB slab_unreclaimable:44232kB kernel_stack:12192kB pagetables:9352kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:88 all_unreclaimable? yes
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 0 ##13037## 13037
Apr 11 10:21:50 kernel: Node 0 Normal ##free:54912kB min:54968kB## low:68708kB high:82452kB active_anon:9410128kB inactive_anon:194120kB active_file:924kB inactive_file:3996kB unevictable:3240044kB isolated(anon):0kB isolated(file):772kB present:13631488kB managed:13350504kB mlocked:3240044kB dirty:112kB writeback:0kB mapped:223800kB shmem:676720kB slab_reclaimable:160264kB slab_unreclaimable:50300kB kernel_stack:9600kB pagetables:52808kB unstable:0kB bounce:0kB free_pcp:600kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:650 all_unreclaimable? no
Apr 11 10:21:50 kernel: lowmem_reserve[]: 0 0 0 0
Apr 11 10:21:50 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Apr 11 10:21:50 kernel: Node 0 DMA32: 4174*4kB (UEM) 1746*8kB (UEM) 1143*16kB (UEM) 387*32kB (EM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61336kB
Apr 11 10:21:50 kernel: Node 0 Normal: 13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55312kB
Apr 11 10:21:50 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 11 10:21:50 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 11 10:21:50 kernel: 273226 total pagecache pages
Apr 11 10:21:50 kernel: 0 pages in swap cache
Apr 11 10:21:50 kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 11 10:21:50 kernel: Free swap = 0kB
Apr 11 10:21:50 kernel: Total swap = 0kB
Apr 11 10:21:50 kernel: 4194157 pages RAM
Apr 11 10:21:50 kernel: 0 pages HighMem/MovableOnly
Apr 11 10:21:50 kernel: 90315 pages reserved
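To read the per-order free lists in the log above, here is a minimal sketch that parses one of the "Node 0 ...: N*4kB ..." lines and checks that the blocks add up to the reported total. It assumes 4 kB base pages (so order=0 means one 4 kB page); the helper names order_to_kb and parse_buddy_line are made up for illustration and are not kernel or library APIs.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, so order=0 is a single 4 kB page

# Matches entries such as "13786*4kB (UE)" or "0*32kB" (type letters optional).
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")
TOTAL_RE = re.compile(r"= (\d+)kB")

NORMAL_LINE = (
    "Node 0 Normal: 13786*4kB (UE) 19*8kB (U) 1*16kB (M) 0*32kB 0*64kB "
    "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55312kB"
)


def order_to_kb(order: int) -> int:
    """Size in kB of one block of the given order."""
    return PAGE_KB << order


def parse_buddy_line(line: str):
    """Return ([(count, block_kb, types), ...], reported_total_kb)."""
    blocks = [(int(c), int(kb), t) for c, kb, t in BLOCK_RE.findall(line)]
    total_kb = int(TOTAL_RE.search(line).group(1))
    return blocks, total_kb


if __name__ == "__main__":
    blocks, total_kb = parse_buddy_line(NORMAL_LINE)
    computed_kb = sum(count * kb for count, kb, _ in blocks)
    count0, kb0, types0 = blocks[0]
    print(f"order=0 block size: {order_to_kb(0)} kB")
    print(f"free order-0 blocks in Normal: {count0} (types {types0})")
    print(f"sum of blocks: {computed_kb} kB, reported: {total_kb} kB")
```

For the Normal zone the blocks sum to the reported 55312 kB, and almost all of it sits in the 4 kB list, so a shortage of order-0 blocks is not what made the request fail.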
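At this point the 4 kB free list of the Node 0 Normal zone still contains pages of type E, so why did the allocation fail? Let's work through the numbers first.
(U, M, and E stand for UNMOVABLE, MOVABLE, and RECLAIMABLE, respectively.)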
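One way to work through the numbers is a simplified watermark check per zone, using only values from the log above. This is a sketch under stated assumptions, not the kernel's exact logic: it is loosely modelled on the zone watermark test, compares against the min watermark, handles order 0 only, and assumes the request may use zones up to Normal (index 2 in lowmem_reserve[]). The function name watermark_ok and the ZONES table are hypothetical.

```python
PAGE_KB = 4        # assumption: 4 kB base pages
CLASSZONE_IDX = 2  # assumption: the request may fall back from Normal (index 2)

# (zone, free_kB, min_kB, lowmem_reserve[] in pages), copied from the log.
ZONES = [
    ("DMA",    15840,    64, (0, 2977, 16015, 16015)),
    ("DMA32",  61692, 12544, (0, 0, 13037, 13037)),
    ("Normal", 54912, 54968, (0, 0, 0, 0)),
]


def watermark_ok(free_kb, mark_kb, lowmem_reserve, classzone_idx=CLASSZONE_IDX):
    """Simplified order-0 check: free pages must exceed mark + reserve."""
    free_pages = free_kb // PAGE_KB
    mark_pages = mark_kb // PAGE_KB
    return free_pages > mark_pages + lowmem_reserve[classzone_idx]


def check_all(zones=ZONES):
    """Return {zone_name: True/False} for the simplified check."""
    return {
        name: watermark_ok(free_kb, min_kb, reserve)
        for name, free_kb, min_kb, reserve in zones
    }


if __name__ == "__main__":
    for name, free_kb, min_kb, reserve in ZONES:
        need = min_kb // PAGE_KB + reserve[CLASSZONE_IDX]
        ok = watermark_ok(free_kb, min_kb, reserve)
        print(f"{name}: free={free_kb // PAGE_KB} pages, "
              f"needs more than {need} pages, ok={ok}")
```

Under these assumptions all three zones fail: DMA has 3960 free pages against 16031 required, DMA32 has 15423 against 16173, and Normal has 13728 against 13742. Free pages exist in every zone, but none of them clears its watermark plus reserve.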