GitHub - rootless-containers/rootlesskit: Linux-native "fake root" for implementing rootless containers

RootlessKit: Linux-native fakeroot using user namespaces

RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).

The purpose of RootlessKit is to run Docker and Kubernetes as an unprivileged user (known as "Rootless mode"), so as to protect the real root on the host from potential container-breakout attacks.

What RootlessKit actually does

RootlessKit creates user_namespaces(7) and mount_namespaces(7), and executes newuidmap(1)/newgidmap(1) to apply the ID ranges configured in subuid(5) and subgid(5).

RootlessKit also supports isolating network_namespaces(7) with userspace NAT using "slirp". Kernel-mode NAT using SUID-enabled lxc-user-nic(1) is also experimentally supported.

Similar projects

Tools based on LD_PRELOAD (not sufficient to run rootless containers, and lacking support for static binaries):

Tools based on ptrace(2) (not sufficient to run rootless containers, and slow):

Tools based on user_namespaces(7) (as in RootlessKit, but without support for --copy-up, --net, ...):

Projects using RootlessKit

Container engines:

Container image builders:

Kubernetes distributions:

Setup

Run make && sudo make install.

The following binaries will be installed:

Requirements

subuid

    $ id -u
    1001
    $ whoami
    penguin
    $ grep "^$(whoami):" /etc/subuid
    penguin:231072:65536
    $ grep "^$(whoami):" /etc/subgid
    penguin:231072:65536

See also https://rootlesscontaine.rs/getting-started/common/subuid/
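
The entries above follow the subuid(5) format user:start:count. As a minimal sketch (the helper name subid_range is hypothetical and not part of RootlessKit), the range assigned to a user could be inspected like this:

```shell
#!/bin/sh
# subid_range FILE USER
# Hypothetical helper, not shipped with RootlessKit.
# Prints "<first> <last> <count>" for every range that FILE
# (subuid(5) format: user:start:count) assigns to USER.
subid_range() {
    awk -F: -v user="$2" '
        $1 == user { print $2, $2 + $3 - 1, $3 }
    ' "$1"
}
```

For example, subid_range /etc/subuid "$(whoami)" would print the first ID, last ID, and size of each subordinate UID range for the current user; the same helper works on /etc/subgid.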

sysctl

Some distros require setting up sysctl:

To persist sysctl configurations, edit /etc/sysctl.conf or add a file under /etc/sysctl.d.

See also https://rootlesscontaine.rs/getting-started/common/sysctl/
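
This section does not name the sysctl keys involved. As a hedged sketch, the following reports two keys that commonly affect unprivileged user namespaces: user.max_user_namespaces, and kernel.unprivileged_userns_clone, which exists only on some distro kernels. Both the choice of keys and the helper name check_userns_sysctl are assumptions, not something RootlessKit provides.

```shell
#!/bin/sh
# check_userns_sysctl [ROOT]
# Hypothetical helper: prints the value of each key if it is readable
# under ROOT (default: /proc/sys), or notes that the key is absent.
# The two keys below are an assumption about what a distro may require.
check_userns_sysctl() {
    root=${1:-/proc/sys}
    for key in user/max_user_namespaces kernel/unprivileged_userns_clone; do
        if [ -r "$root/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "$root/$key")"
        else
            printf '%s: not present\n' "$key"
        fi
    done
}
```

Calling check_userns_sysctl with no argument reads the live values under /proc/sys. A key reported as not present simply does not exist on that kernel.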

Usage

Inside rootlesskit bash, your UID is mapped to 0 but it is not the real root:

    (host)$ rootlesskit bash
    (rootlesskit)# id
    uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
    (rootlesskit)# ls -l /etc/shadow
    -rw-r----- 1 nobody nogroup 1050 Aug 21 19:02 /etc/shadow
    (rootlesskit)# cat /etc/shadow
    cat: /etc/shadow: Permission denied
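
The kernel exposes a user namespace's ID mapping in /proc/self/uid_map, one range per line as "inside outside length". As a rough sketch (the helper name ns_to_host_uid is hypothetical), a UID seen inside the namespace can be translated back to the host UID like this:

```shell
#!/bin/sh
# ns_to_host_uid MAP_FILE NS_UID
# Hypothetical helper: MAP_FILE uses the /proc/<pid>/uid_map format,
# one "inside outside length" range per line.
# Prints the host UID for NS_UID, or fails if NS_UID is unmapped.
ns_to_host_uid() {
    awk -v id="$2" '
        $1 <= id && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "$1"
}

# Assumed sample mapping for the "penguin" user shown earlier
# (the real content of /proc/self/uid_map may differ):
#   0      1001      1
#   1    231072  65536
```

With a mapping like the assumed one, UID 0 inside the namespace resolves to the unprivileged host UID 1001. Host UID 0 has no entry in such a map, which is why root-owned files such as /etc/shadow appear as owned by nobody and stay unreadable.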

Environment variables are kept untouched:

    (host)$ rootlesskit bash
    (rootlesskit)# echo $USER
    penguin
    (rootlesskit)# echo $HOME
    /home/penguin
    (rootlesskit)# echo $XDG_RUNTIME_DIR
    /run/user/1001

Filesystems can be isolated from the host with --copy-up:

    (host)$ rootlesskit --copy-up=/etc bash
    (rootlesskit)# rm /etc/resolv.conf
    (rootlesskit)# vi /etc/resolv.conf

You can even create network namespaces with Slirp:

    (host)$ rootlesskit --copy-up=/etc --copy-up=/run --net=slirp4netns --disable-host-loopback bash
    (rootlesskit)# ip netns add foo
    ...

Full CLI options

    $ rootlesskit --help
    NAME:
       rootlesskit - Linux-native fakeroot using user namespaces

    USAGE:
       rootlesskit [global options] [arguments...]

    VERSION:
       2.0.0-alpha.0

    DESCRIPTION:
       RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).

       Web site: https://github.com/rootless-containers/rootlesskit

       Examples:
         # spawn a shell with a new user namespace and a mount namespace
         rootlesskit bash

         # make /etc writable
         rootlesskit --copy-up=/etc bash

         # set mount propagation to rslave
         rootlesskit --propagation=rslave bash

         # create a network namespace with slirp4netns, and expose 80/tcp on the namespace as 8080/tcp on the host
         rootlesskit --copy-up=/etc --net=slirp4netns --disable-host-loopback --port-driver=builtin -p 127.0.0.1:8080:80/tcp bash

       Note: RootlessKit requires /etc/subuid and /etc/subgid to be configured by the real root user. See https://rootlesscontaine.rs/getting-started/common/ .

    OPTIONS:
       Misc:
       --debug  debug mode (default: false)
       --print-semver value  print a version component as a decimal integer [major, minor, patch]
       --help, -h  show help
       --version, -v  print the version

       Mount:
       --copy-up value [ --copy-up value ]  mount a filesystem and copy-up the contents. e.g. "--copy-up=/etc" (typically required for non-host network)
       --copy-up-mode value  copy-up mode [tmpfs+symlink]
       --propagation value  mount propagation [rprivate, rslave]

       Network:
       --net value  network driver [host, pasta(experimental), slirp4netns, vpnkit, lxc-user-nic(experimental)]
       --mtu value  MTU for non-host network (default: 65520 for pasta and slirp4netns, 1500 for others) (default: 0)
       --cidr value  CIDR for pasta and slirp4netns networks (default: 10.0.2.0/24)
       --ifname value  Network interface name (default: tap0 for pasta, slirp4netns, and vpnkit; eth0 for lxc-user-nic)
       --disable-host-loopback  prohibit connecting to 127.0.0.1:* on the host namespace (default: false)
       --ipv6  enable IPv6 routing. Unrelated to port forwarding. Only supported for pasta and slirp4netns. (experimental) (default: false)
       --detach-netns  detach network namespaces (default: false)

       Network [lxc-user-nic]:
       --lxc-user-nic-binary value  path of lxc-user-nic binary for --net=lxc-user-nic
       --lxc-user-nic-bridge value  lxc-user-nic bridge name

       Network [pasta]:
       --pasta-binary value  path of pasta binary for --net=pasta

       Network [slirp4netns]:
       --slirp4netns-binary value  path of slirp4netns binary for --net=slirp4netns
       --slirp4netns-sandbox value  enable slirp4netns sandbox (experimental) [auto, true, false] (the default is planned to be "auto" in future)
       --slirp4netns-seccomp value  enable slirp4netns seccomp (experimental) [auto, true, false] (the default is planned to be "auto" in future)

       Network [vpnkit]:
       --vpnkit-binary value  path of VPNKit binary for --net=vpnkit

       Port:
       --port-driver value  port driver for non-host network. [none, implicit (for pasta), builtin, slirp4netns]
       --publish value, -p value [ --publish value, -p value ]  publish ports. e.g. "127.0.0.1:8080:80/tcp"

       Process:
       --pidns  create a PID namespace (default: false)
       --cgroupns  create a cgroup namespace (default: false)
       --utsns  create a UTS namespace (default: false)
       --ipcns  create an IPC namespace (default: false)
       --reaper value  enable process reaper. Requires --pidns. [auto,true,false]
       --evacuate-cgroup2 value  evacuate processes into the specified subgroup. Requires --pidns and --cgroupns

       State:
       --state-dir value  state directory

       SubID:
       --subid-source value  the source of the subids. "dynamic" executes /usr/bin/getsubids. "static" reads /etc/{subuid,subgid}. [auto,dynamic,static]
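
The --publish example in the help text uses the form 127.0.0.1:8080:80/tcp, where 80/tcp in the namespace is exposed as 8080/tcp on the host. As an illustrative sketch only, the helper below (parse_publish, a hypothetical name) splits a spec of exactly that four-field form. It makes no claim about other forms RootlessKit may accept, and it does not handle bracketed IPv6 addresses.

```shell
#!/bin/sh
# parse_publish SPEC
# Hypothetical helper: splits "<host ip>:<host port>:<namespace port>/<proto>",
# the form used in the help text example, into labelled fields.
parse_publish() {
    spec=$1
    proto=${spec##*/}          # text after the last "/"
    rest=${spec%/*}            # text before the last "/"
    child_port=${rest##*:}     # port inside the namespace
    rest=${rest%:*}
    parent_port=${rest##*:}    # port on the host
    parent_ip=${rest%:*}       # listening address on the host
    printf 'parent_ip=%s parent_port=%s child_port=%s proto=%s\n' \
        "$parent_ip" "$parent_port" "$child_port" "$proto"
}
```

For the spec from the help text, parse_publish 127.0.0.1:8080:80/tcp prints parent_ip=127.0.0.1 parent_port=8080 child_port=80 proto=tcp.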

State directory

The following files will be created in the state directory, which can be specified with --state-dir:

If --state-dir is not specified, RootlessKit creates a temporary state directory under /tmp and removes it on exit.

Undocumented files are subject to change.

Environment variables

The following environment variables will be set for the child process:

Undocumented environment variables are subject to change.

Additional documents