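Disabling kmem in docker-runc
Problem description
The on-call engineer reported that on one production CentOS 7 physical machine, kmem accounting was enabled at the container cgroup level but disabled at the kubepods level.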
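Investigation
If kubelet had enabled kmem, the slabinfo content would be readable at the kubepods level. It is not, so an outdated kubelet is ruled out as the cause. We went on to inspect the docker-runc code shipped with docker18 and found that runc enables kmem by default.
Comparison of the kmem-related code in the docker13 and docker18 runc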
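The slabinfo check described above can be sketched as a small shell helper. This is a rough heuristic, not the procedure used during the investigation: it assumes a cgroup v1 memory hierarchy and treats a readable, non-empty memory.kmem.slabinfo as a sign that kmem accounting is active. The function name kmem_state and the example paths are hypothetical.

```shell
# Report whether kmem accounting looks active for a cgroup v1 memory directory.
# Assumption: an active kmem cgroup exposes a readable, non-empty
# memory.kmem.slabinfo; otherwise the read fails or returns nothing.
kmem_state() {
    slab="$1/memory.kmem.slabinfo"
    if [ -n "$(cat "$slab" 2>/dev/null)" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example (paths are assumptions; adjust to the host's cgroup layout):
#   kmem_state /sys/fs/cgroup/memory/kubepods
#   kmem_state /sys/fs/cgroup/memory/kubepods/<pod-dir>/<container-id>
```

Running it once on the kubepods directory and once on a container directory should reproduce the mismatch reported by the on-call engineer.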
docker13 runc | docker18 runc |
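Docker 18.09 provides a build parameter for disabling kmem. Following the commit "libcontainer: ability to compile without kmem", download the runc code. Note that the repository to clone is not https://github.com/opencontainers/runc.git: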
git clone https://github.com/docker/runc.git
cd runc
git checkout 18.06
This commit relies on a Go build tag. When the nokmem tag is specified at build time, the file that gets compiled is libcontainer/cgroups/fs/kmem_disable.go:
// +build linux,nokmem

package fs

func EnableKernelMemoryAccounting(path string) error {
	return nil
}

func setKernelMemory(path string, kernelMemoryLimit int64) error {
	return nil
}
Otherwise, the file that gets compiled is libcontainer/cgroups/fs/kmem.go:
// +build linux,!nokmem

package fs

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"strconv"
	"syscall" // for Errno type only

	"github.com/opencontainers/runc/libcontainer/cgroups"
	"golang.org/x/sys/unix"
)

const cgroupKernelMemoryLimit = "memory.kmem.limit_in_bytes"

func EnableKernelMemoryAccounting(path string) error {
	// Check if kernel memory is enabled
	// We have to limit the kernel memory here as it won't be accounted at all
	// until a limit is set on the cgroup and limit cannot be set once the
	// cgroup has children, or if there are already tasks in the cgroup.
	for _, i := range []int64{1, -1} {
		if err := setKernelMemory(path, i); err != nil {
			return err
		}
	}
	return nil
}

func setKernelMemory(path string, kernelMemoryLimit int64) error {
	if path == "" {
		return fmt.Errorf("no such directory for %s", cgroupKernelMemoryLimit)
	}
	if !cgroups.PathExists(filepath.Join(path, cgroupKernelMemoryLimit)) {
		// kernel memory is not enabled on the system so we should do nothing
		return nil
	}
	if err := ioutil.WriteFile(filepath.Join(path, cgroupKernelMemoryLimit), []byte(strconv.FormatInt(kernelMemoryLimit, 10)), 0700); err != nil {
		// Check if the error number returned by the syscall is "EBUSY"
		// The EBUSY signal is returned on attempts to write to the
		// memory.kmem.limit_in_bytes file if the cgroup has children or
		// once tasks have been attached to the cgroup
		if pathErr, ok := err.(*os.PathError); ok {
			if errNo, ok := pathErr.Err.(syscall.Errno); ok {
				if errNo == unix.EBUSY {
					return fmt.Errorf("failed to set %s, because either tasks have already joined this cgroup or it has children", cgroupKernelMemoryLimit)
				}
			}
		}
		return fmt.Errorf("failed to write %v to %v: %v", kernelMemoryLimit, cgroupKernelMemoryLimit, err)
	}
	return nil
}
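Which of the two files is compiled depends only on whether the nokmem tag is passed to the Go build. A minimal sketch of a build wrapper follows. It assumes the Makefile in this tree honours a BUILDTAGS variable in the same way the upstream runc Makefile does. The names build_runc and RUNC_MAKE are hypothetical and exist only for illustration.

```shell
# Build runc with or without the nokmem tag.
# $1: "nokmem" to compile the no-op kmem file, anything else for the default.
# RUNC_MAKE (hypothetical) lets the make command be swapped out for a dry run.
build_runc() {
    tags="seccomp"
    if [ "$1" = "nokmem" ]; then
        tags="$tags nokmem"
    fi
    ${RUNC_MAKE:-make} BUILDTAGS="$tags"
}

# Example, run from the root of the runc checkout:
#   build_runc nokmem
# Dry run that only prints the arguments passed to make:
#   RUNC_MAKE=echo build_runc nokmem
```

Keeping the seccomp tag alongside nokmem matters here, because the resulting binary is later shown to depend on libseccomp.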
A kmemtag was also added to the Makefile. With it, the commit reported by docker-runc -version is
a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-nokmem
or
a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-kmem
Comparison of docker info output before and after the replacement
Before replacement | After replacement |
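Testing
Tested on CentOS 7 + docker18 and on CentOS 8 + docker18. Both scenarios behaved as expected.
P.S. The following error appeared during testing: ./docker-runc: symbol lookup error: ./docker-runc: undefined symbol: seccomp_version
This error is caused by a libseccomp version that is too old. Upgrading to 2.3 or later resolves it.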
Rough replacement steps
Check the libseccomp version on the target machine. If it is lower than 2.3, upgrade libseccomp.
rpm -qa | grep libseccomp
yum update -y libseccomp
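The version check in this step can be automated so that the upgrade runs only when needed. The following is a sketch under the assumption that only the major and minor numbers matter for the 2.3 threshold. The helper name seccomp_needs_upgrade is hypothetical.

```shell
# Succeeds (exit status 0) when the given libseccomp version is below 2.3
# and therefore needs an upgrade before the new docker-runc can start.
seccomp_needs_upgrade() {
    echo "$1" | awk -F. '{ exit !($1 < 2 || ($1 == 2 && $2 < 3)) }'
}

# Example (the surrounding flow is a sketch, not the original procedure):
#   ver=$(rpm -q --queryformat '%{VERSION}' libseccomp)
#   if seccomp_needs_upgrade "$ver"; then yum update -y libseccomp; fi
```

Comparing the fields numerically avoids the string-comparison trap where a version such as 2.10 would sort below 2.3.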
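Replace docker-runc. Stopping kubelet beforehand is recommended so that no container starts during the replacement; start kubelet again once the replacement is done.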
cd /root/ && wget 10.85.99.38:8008/docker-runc -O docker-runc && chmod u+x docker-runc && mv /usr/bin/docker-runc{,.bak} && mv docker-runc /usr/bin/docker-runc
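The one-liner above can be broken into a helper that fails early and keeps a backup. This is a sketch rather than the exact procedure used: unlike the original command, it copies the old binary to the .bak file instead of moving it, so the target path is never missing during the swap. The name replace_binary is hypothetical, and the systemctl calls in the example assume kubelet is managed by systemd.

```shell
# Replace a binary while keeping a .bak copy of the previous version.
# $1: path of the new binary, $2: path of the binary to replace.
replace_binary() {
    new="$1"
    target="$2"
    # Refuse to continue if the download is missing or empty.
    [ -s "$new" ] || { echo "missing or empty: $new" >&2; return 1; }
    chmod u+x "$new" || return 1
    # Copy rather than move, so the target exists until the final rename.
    cp -p "$target" "$target.bak" || return 1
    mv -f "$new" "$target"
}

# Example (assumes kubelet runs as a systemd unit):
#   systemctl stop kubelet
#   replace_binary /root/docker-runc /usr/bin/docker-runc
#   systemctl start kubelet
```

If the new binary misbehaves, moving the .bak file back over the target restores the previous docker-runc.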
Check the runc version
docker-runc -version |awk '{if ($0 ~ "commit") print $2}'
Expected output
a592beb5bc4c4092b1b1bac971afed27687340c5-dirty-nokmem
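For a scripted rollout, the same check can be reduced to a single word. The sketch below reuses the awk filter from the command above and assumes the build appends the -nokmem or -kmem suffix to the commit, as the modified Makefile does. The name runc_kmem_mode is hypothetical.

```shell
# Classify a runc build from the text printed by `docker-runc -version`,
# read on stdin. Builds without the kmemtag suffix are reported as "unknown".
runc_kmem_mode() {
    commit=$(awk '{if ($0 ~ "commit") print $2}')
    case "$commit" in
        *-nokmem) echo "nokmem" ;;
        *-kmem)   echo "kmem" ;;
        *)        echo "unknown" ;;
    esac
}

# Example:
#   docker-runc -version | runc_kmem_mode
```

Matching on the suffix rather than the full hash keeps the check valid when the binary is rebuilt from a different commit.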
Other checks
a. Existing containers have not been recreated
b. kmem is disabled for newly created containers on CentOS 7
c. The node status is Ready
d. The kubelet status is running