you are better than you think

Disabling kmem in docker-runc

· by thur
runc kmem docker-runc

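Problem description

The on-call engineer reported that on a production CentOS 7 physical machine, kmem was enabled at the container level of the cgroup hierarchy, while it was disabled at the kubepods level.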

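Investigation

If kubelet had enabled kmem, the slabinfo content would already be visible at the kubepods level, so an outdated kubelet version can be ruled out as the cause. Continuing with the docker-runc code of Docker 18, it turns out that runc enables kmem by default.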
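The slabinfo check described above can be sketched in Go. This is only a minimal sketch: the helper name kmemActive and the default path /sys/fs/cgroup/memory/kubepods are assumptions for illustration, and it assumes that a readable memory.kmem.slabinfo means kmem accounting is active for that cgroup, treating any read error as "not active".

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// kmemActive reports whether memory.kmem.slabinfo under the given cgroup
// directory can be read. Assumption: a readable file means kmem accounting
// is active for that cgroup; any read error is treated as "not active".
func kmemActive(cgroupDir string) bool {
	_, err := os.ReadFile(filepath.Join(cgroupDir, "memory.kmem.slabinfo"))
	return err == nil
}

func main() {
	// Hypothetical default path; adjust it to the cgroup layout of the host,
	// or pass one or more cgroup directories as arguments.
	dirs := []string{"/sys/fs/cgroup/memory/kubepods"}
	if len(os.Args) > 1 {
		dirs = os.Args[1:]
	}
	for _, dir := range dirs {
		fmt.Printf("%s: kmem active = %v\n", dir, kmemActive(dir))
	}
}
```

Running it against both the kubepods directory and a container's directory would show the kind of mismatch described in the problem report.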

Comparison of the kmem-related code in runc for Docker 13 and Docker 18

(Screenshots: docker13 runc, docker18 runc)

Docker 18.09 provides a build option to disable kmem. Following the commit "libcontainer: ability to compile without kmem", download the runc code. Note that the repository is not https://github.com/opencontainers/runc.git

git clone https://github.com/docker/runc.git
cd runc
git checkout 18.06

This commit relies on build tags. When the nokmem tag is specified at build time, the file that gets compiled is libcontainer/cgroups/fs/kmem_disable.go,

// +build linux,nokmem

package fs

func EnableKernelMemoryAccounting(path string) error {
	return nil
}

func setKernelMemory(path string, kernelMemoryLimit int64) error {
	return nil
}

Otherwise the file that gets compiled is libcontainer/cgroups/fs/kmem.go

// +build linux,!nokmem

package fs

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"strconv"
	"syscall" // for Errno type only

	"github.com/opencontainers/runc/libcontainer/cgroups"
	"golang.org/x/sys/unix"
)

const cgroupKernelMemoryLimit = "memory.kmem.limit_in_bytes"

func EnableKernelMemoryAccounting(path string) error {
	// Check if kernel memory is enabled
	// We have to limit the kernel memory here as it won't be accounted at all
	// until a limit is set on the cgroup and limit cannot be set once the
	// cgroup has children, or if there are already tasks in the cgroup.
	for _, i := range []int64{1, -1} {
		if err := setKernelMemory(path, i); err != nil {
			return err
		}
	}
	return nil
}

func setKernelMemory(path string, kernelMemoryLimit int64) error {
	if path == "" {
		return fmt.Errorf("no such directory for %s", cgroupKernelMemoryLimit)
	}
	if !cgroups.PathExists(filepath.Join(path, cgroupKernelMemoryLimit)) {
		// kernel memory is not enabled on the system so we should do nothing
		return nil
	}
	if err := ioutil.WriteFile(filepath.Join(path, cgroupKernelMemoryLimit), []byte(strconv.FormatInt(kernelMemoryLimit, 10)), 0700); err != nil {
		// Check if the error number returned by the syscall is "EBUSY"
		// The EBUSY signal is returned on attempts to write to the
		// memory.kmem.limit_in_bytes file if the cgroup has children or
		// once tasks have been attached to the cgroup
		if pathErr, ok := err.(*os.PathError); ok {
			if errNo, ok := pathErr.Err.(syscall.Errno); ok {
				if errNo == unix.EBUSY {
					return fmt.Errorf("failed to set %s, because either tasks have already joined this cgroup or it has children", cgroupKernelMemoryLimit)
				}
			}
		}
		return fmt.Errorf("failed to write %v to %v: %v", kernelMemoryLimit, cgroupKernelMemoryLimit, err)
	}
	return nil
}
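As a side note on the EBUSY handling in kmem.go: with Go 1.13 or later, the nested type assertions can likely be expressed with errors.Is, which unwraps *os.PathError on its own. The following is a minimal sketch of that idea, not the runc code itself; isCgroupBusy and fakeBusyError are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isCgroupBusy reports whether err, or any error it wraps, is EBUSY.
func isCgroupBusy(err error) bool {
	return errors.Is(err, syscall.EBUSY)
}

// fakeBusyError is a hypothetical helper that builds the kind of error a
// failed write to memory.kmem.limit_in_bytes would produce when the cgroup
// already has tasks or children.
func fakeBusyError() error {
	return &os.PathError{Op: "write", Path: "memory.kmem.limit_in_bytes", Err: syscall.EBUSY}
}

func main() {
	fmt.Println("wrapped EBUSY:", isCgroupBusy(fakeBusyError()))
	fmt.Println("unrelated error:", isCgroupBusy(os.ErrNotExist))
}
```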

Along the way, a kmem tag was added to the Makefile, so running docker-runc -version outputs

a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-nokmem
or
a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-kmem

Comparison of docker info output before and after the replacement

(Screenshots: before replacement, after replacement)

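Testing

Tested in two scenarios, centos7+docker18 and centos8+docker18; both behaved as expected.

P.S. The following error appeared during testing: ./docker-runc: symbol lookup error: ./docker-runc: undefined symbol: seccomp_version. It is caused by a libseccomp version that is too old; upgrading to 2.3 or later resolves it.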
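The "2.3 or later" requirement can be checked programmatically. Below is a minimal sketch of such a comparison, assuming the version has already been extracted as a dotted string such as "2.3.1" (for example from the package name printed by rpm); the function name atLeast is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a dotted version string such as "2.3.1" is at
// least major.minor. Only the first two numeric components are compared.
func atLeast(version string, major, minor int) (bool, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", version)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major component in %q: %v", version, err)
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor component in %q: %v", version, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	for _, v := range []string{"2.2.1", "2.3.1", "2.10.0"} {
		ok, err := atLeast(v, 2, 3)
		fmt.Println(v, ok, err)
	}
}
```

Comparing the components numerically rather than as strings matters here: a plain string comparison would rank "2.10" below "2.3".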

Rough replacement steps

  1. Check the libseccomp version on the target machine; if it is lower than 2.3, upgrade libseccomp

    rpm -qa | grep libseccomp
    yum update -y libseccomp
    
  2. Replace docker-runc (it is advisable to stop kubelet first so that no container starts during the replacement, then start kubelet again afterwards)

    cd /root/ && wget 10.85.99.38:8008/docker-runc -O docker-runc  && chmod u+x docker-runc  && mv /usr/bin/docker-runc{,.bak} && mv docker-runc /usr/bin/docker-runc
    
  3. Check the runc version

    docker-runc -version |awk '{if ($0 ~ "commit") print $2}'
    

    Expected output: a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-nokmem

  4. Other checks

    a. Existing containers have not been recreated
    b. kmem is disabled for newly created containers on centos7
    c. The node status is Ready
    d. The kubelet status is running
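The version check in step 3 could also be scripted in Go instead of awk. This is a minimal sketch under assumptions: the function names are hypothetical, the sample input only assumes a line of the form "commit: <hash>" (which is what the awk filter above implies), and only the first matching line is used.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// commitFromVersion returns the second whitespace-separated field of the
// first line containing "commit", similar to the awk filter in step 3.
// It returns an empty string when no such line exists.
func commitFromVersion(output string) string {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "commit") {
			continue
		}
		if fields := strings.Fields(line); len(fields) >= 2 {
			return fields[1]
		}
	}
	return ""
}

// builtWithoutKmem reports whether the commit string carries the -nokmem
// suffix that the Makefile change appends.
func builtWithoutKmem(commit string) bool {
	return strings.HasSuffix(commit, "-nokmem")
}

func main() {
	// Assumed sample line; real input would be the output of docker-runc -version.
	sample := "commit: a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-nokmem\n"
	commit := commitFromVersion(sample)
	fmt.Println(commit, builtWithoutKmem(commit))
}
```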
    
