Namespaces and Cgroups in Docker |BLOG.INJUN.DEV

Linux container technology has evolved steadily since mount namespaces first appeared in kernel 2.4.19 in 2002. Namespaces provide process isolation, while Cgroups (Control Groups) provide resource control. Together, they form a core foundation of modern cloud infrastructure. Container tools such as Docker and Podman, and orchestrators such as Kubernetes through their underlying runtimes, rely on these two kernel features to provide isolation that is much lighter and faster than virtual machines. Understanding them is the first step toward a deeper grasp of container technology.

Historical Background of Container Technology

Why Are Containers Needed?
Traditional virtual machines virtualize an entire hardware platform to run a complete guest operating system, which brings large resource overhead and long startup times. Containers instead share the host kernel and isolate at the process level, so they can start in milliseconds and use far fewer resources, at the cost of a weaker isolation boundary than a virtual machine provides.

The concept of Linux namespaces was inspired by the Plan 9 operating system from Bell Labs. The first implementation appeared as the mount namespace in Linux kernel 2.4.19 in 2002. Broader expansion began in 2006 with the UTS and IPC namespaces in kernel 2.6.19. PID namespaces followed in kernel 2.6.24 (2008), network namespaces were completed in kernel 2.6.29 (2009), and the memory cgroup controller appeared in 2008 to strengthen resource control. The technical foundation for full container support was completed when user namespaces were finished in kernel 3.8 (2013). LXC (Linux Containers) exposed these features through user tools in 2008, and Docker popularized container technology in 2013 by combining them with image-building and deployment workflows.

Namespaces: Isolation of Kernel Resources

What is a Namespace?
A namespace is a Linux kernel feature that partitions specific system resources by process group, making each group appear to have its own independent instance of that resource. Unlike virtual machines that virtualize hardware, namespaces partition kernel functionality itself to provide lighter and more efficient isolation.

Linux kernels from 5.6 onward provide 8 types of namespaces, each responsible for isolating specific system resources. Container runtimes combine the namespaces they need when creating containers, giving processes inside each container their own view of those resources.

Detailed Explanation by Namespace Type

Namespace	Isolation Target	Kernel Version	Description
Mount (mnt)	Filesystem mount points	Kernel 2.4.19 (2002)	Each namespace has independent mount point list
UTS	Hostname, domain name	Kernel 2.6.19 (2006)	Different hostname per container possible
IPC	System V IPC, POSIX message queues	Kernel 2.6.19 (2006)	Semaphores, message queues, shared memory isolation
PID	Process IDs	Kernel 2.6.24 (2008)	Independent numbering starting from PID 1 per namespace
Network (net)	Network stack	Kernel 2.6.29 (2009)	IP addresses, routing tables, sockets, firewall rules isolation
User	UID/GID mapping	Kernel 3.8 (2013)	Maps container root to regular user on host
Cgroup	Cgroup hierarchy view	Kernel 4.6 (2016)	Each container sees isolated cgroup hierarchy
Time	Monotonic and boot-time clocks	Kernel 5.6 (2020)	Per-namespace offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME; the wall clock (CLOCK_REALTIME) is not isolated
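
The namespaces in the table above are visible for any process as symbolic links under /proc/<PID>/ns/, where each link target looks like `pid:[4026531836]`. The following is a minimal sketch for reading them, assuming a Linux host with /proc mounted. The helper names `parse_ns_link` and `list_namespaces` are made up for this example.

```python
import os
import re

# A namespace link target looks like "pid:[4026531836]": type, then inode number.
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target into (kind, inode); return None if it is malformed."""
    match = NS_LINK.match(target)
    if match is None:
        return None
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {entry name: (kind, inode)} for one process.

    Returns an empty dict when the directory cannot be read, for example on a
    non-Linux system or when permission is denied.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    result = {}
    for name in entries:
        try:
            parsed = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except OSError:
            continue  # link vanished or is unreadable
        if parsed is not None:
            result[name] = parsed
    return result


for name, (kind, inode) in list_namespaces().items():
    print(f"{name:20s} {kind}:[{inode}]")
```

Two processes are in the same namespace of a given type when the inode numbers match, so comparing the output for a container process and for a host shell shows which namespaces differ.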

How PID Namespace Works

The first process created in a PID namespace is assigned PID 1 and receives the same special treatment as the traditional init process. When this process terminates, the kernel sends SIGKILL to every remaining process in that namespace. Orphaned processes (processes whose parent has terminated) are re-parented to this PID 1 process, allowing each container to have its own process tree. PID namespaces can be nested, which makes it possible to run another container inside a container, and a process is visible under a different PID in each enclosing namespace.
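
One way to see this numbering from the host is the `NSpid` line in /proc/<PID>/status (available since kernel 4.1), which lists a process's PID in each nested PID namespace, outermost first. The sketch below parses that line. The sample text and the PID 48211 are invented for illustration.

```python
def parse_nspid(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first.

    Returns an empty list when the line is absent (kernels older than 4.1).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


# Hypothetical excerpt of /proc/48211/status for a container's init process.
sample = "Name:\tnginx\nPid:\t48211\nNSpid:\t48211\t1\n"

pids = parse_nspid(sample)
print(pids)                       # [48211, 1]
print("host PID:", pids[0])       # PID as seen from the outer namespace
print("container PID:", pids[-1]) # PID inside the innermost namespace
```

A process that is not in a nested PID namespace has a single value on this line, so a list longer than one entry is a quick hint that the process runs inside a container.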

How Network Namespace Works

When a network namespace is created, it initially contains only the loopback interface (lo), and to communicate externally, a virtual network interface (veth pair) must be created and connected to the host namespace. Each network namespace has independent IP addresses, routing tables, iptables rules, and sockets. Physical or virtual network interfaces can belong to exactly one namespace but can be moved between namespaces. Docker implements network connectivity between containers and host using veth pairs in bridge network mode.
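
The steps above can be reproduced by hand with the iproute2 `ip` tool. The sketch below only builds the command list and prints it, so it is safe to run without root. The namespace name, interface names, and addresses are arbitrary placeholders, and Docker itself attaches the host end to a bridge rather than assigning it an address directly.

```python
import shlex


def veth_setup_commands(ns="demo", host_if="veth-host", peer_if="veth-demo",
                        host_addr="10.0.0.1/24", peer_addr="10.0.0.2/24"):
    """Return iproute2 commands that link a new network namespace to the host."""
    in_ns = ["ip", "netns", "exec", ns]
    return [
        # 1. Create the namespace; it starts with only a (down) loopback device.
        ["ip", "netns", "add", ns],
        # 2. Create a veth pair: two virtual interfaces joined like a cable.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer_if],
        # 3. Move one end into the new namespace.
        ["ip", "link", "set", peer_if, "netns", ns],
        # 4. Configure and enable the host end.
        ["ip", "addr", "add", host_addr, "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        # 5. Configure and enable the end inside the namespace, plus loopback.
        in_ns + ["ip", "addr", "add", peer_addr, "dev", peer_if],
        in_ns + ["ip", "link", "set", peer_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
    ]


for command in veth_setup_commands():
    print(shlex.join(command))
```

Running the printed commands as root on a test machine should leave the two addresses able to reach each other, while the new namespace still has no route to the outside until forwarding or a bridge is added.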

User Namespace and Security

User namespaces play a central role in container security: they implement privilege separation by mapping UIDs and GIDs inside the container to different UIDs and GIDs on the host. For example, a process running as root (UID 0) inside a container can be mapped to an unprivileged user such as UID 100000 on the host, so even if a container escape succeeds, the attacker holds only that unprivileged user's rights on the host rather than root. Rootless containers build on this mechanism: Podman runs rootless when invoked by a non-root user, and Docker also supports a rootless mode.
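
The mapping itself is exposed in /proc/<PID>/uid_map and /proc/<PID>/gid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The following sketch translates a container UID to a host UID under that format. The function names are invented here, and the 100000/65536 range is just the example value from the paragraph above.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(field) for field in fields))
    return ranges


def to_host_id(ranges, inside_id):
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside_start, outside_start, length in ranges:
        if inside_start <= inside_id < inside_start + length:
            return outside_start + (inside_id - inside_start)
    return None


# Example mapping: container IDs 0..65535 map to host IDs 100000..165535.
ranges = parse_id_map("         0     100000      65536\n")

print(to_host_id(ranges, 0))      # 100000 (container root)
print(to_host_id(ranges, 1000))   # 101000
print(to_host_id(ranges, 70000))  # None (outside the mapped range)
```

An unmapped ID has no counterpart on the host, which is why files owned by such IDs show up as the overflow user (typically `nobody`) inside the container.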

Cgroups: Resource Allocation and Limitation

What are Cgroups?
Cgroups (Control Groups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage of process groups. Google engineers began developing the feature in 2006 under the name "process containers", and it was merged into kernel 2.6.24 as control groups. Cgroups can finely control resources such as CPU, memory, disk I/O, and network bandwidth, and they are the core mechanism for preventing the “noisy neighbor” problem in container environments.

Cgroups are organized in a hierarchical structure, and a child cgroup can never exceed the resource limits of its parent. Each cgroup can contain multiple processes, and each process belongs to exactly one cgroup in each hierarchy. Resource controllers (subsystems) perform the actual resource limiting; major controllers include cpu, cpuacct, memory, blkio, net_cls, and pids.
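
A process's cgroup membership can be read from /proc/<PID>/cgroup, where each line has the form `hierarchy-ID:controller-list:path`. On cgroups v1 there is one line per hierarchy, while on v2 there is a single line with ID 0 and an empty controller list. The sketch below parses both forms. The container ID `abc123` in the samples is a placeholder, not real output.

```python
from typing import NamedTuple, Tuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: Tuple[str, ...]
    path: str


def parse_proc_cgroup(text):
    """Parse /proc/<PID>/cgroup content into a list of CgroupEntry records."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        hierarchy, controllers, path = parts
        names = tuple(name for name in controllers.split(",") if name)
        entries.append(CgroupEntry(int(hierarchy), names, path))
    return entries


# Hypothetical v1 sample: one line per hierarchy.
v1_sample = "5:cpu,cpuacct:/docker/abc123\n4:memory:/docker/abc123\n"
# Hypothetical v2 sample: a single unified hierarchy.
v2_sample = "0::/system.slice/docker-abc123.scope\n"

for entry in parse_proc_cgroup(v1_sample) + parse_proc_cgroup(v2_sample):
    print(entry)
```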

Major Cgroups Resource Controllers

Controller	Function	Key Parameters (cgroups v1 names)
cpu	CPU time allocation ratio adjustment	cpu.shares, cpu.cfs_quota_us
cpuacct	CPU usage accounting	cpuacct.usage, cpuacct.stat
memory	Memory usage limitation	memory.limit_in_bytes, memory.soft_limit_in_bytes
blkio	Block I/O bandwidth limitation	blkio.throttle.read_bps_device
pids	Process count limitation	pids.max
devices	Device access control	devices.allow, devices.deny

Differences Between Cgroups v1 and v2

Cgroups v1 allowed each resource controller to have a separate hierarchy, providing configuration flexibility but increasing complexity and reducing consistency between controllers. Cgroups v2 was introduced in kernel 4.5 in 2016 and uses a single unified hierarchy in which all controllers operate together, simplifying management. In v2, processes can generally be attached only to leaf nodes (the "no internal processes" rule, from which the root cgroup is exempt). Control is applied per process by default, with an opt-in threaded mode (kernel 4.14 and later) for controllers that support thread granularity, and the memory controller always applies limits hierarchically.

Characteristic	Cgroups v1	Cgroups v2
Hierarchy	Multiple hierarchies per controller	Single unified hierarchy
Process Attachment	All nodes possible	Leaf nodes only
Thread Support	Per-thread cgroup allocation possible	Process granularity by default; opt-in threaded mode
Memory Hierarchy	Optional	Default support
Kubernetes Support	Maintenance mode (v1.31~)	Recommended

The Kubernetes community transitioned cgroups v1 support to maintenance mode starting from v1.31, and RHEL 10 only supports cgroups v2. Using cgroups v2 is recommended for new deployments.
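
To find out which version a host is using, a common heuristic is to look for the `cgroup.controllers` file, which exists only at the root of a v2 hierarchy. The following sketch applies that check. The "hybrid" branch assumes the systemd convention of mounting v2 at /sys/fs/cgroup/unified alongside v1 controllers, and the function name is made up for this example.

```python
from pathlib import Path

# Directory names that a v1 layout typically mounts under /sys/fs/cgroup.
V1_CONTROLLER_DIRS = ("memory", "cpu", "pids", "blkio")


def detect_cgroup_layout(root="/sys/fs/cgroup"):
    """Return "v2", "hybrid", "v1", or "unknown" for the given cgroup mount root."""
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "v2"      # unified hierarchy mounted directly at the root
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"  # v1 controllers plus a v2 tree under unified/
    if any((root / name).is_dir() for name in V1_CONTROLLER_DIRS):
        return "v1"      # one directory per controller hierarchy
    return "unknown"     # not Linux, or cgroups not mounted here


print(detect_cgroup_layout())
```

This is a heuristic rather than an authoritative test; checking the filesystem type of the mount (cgroup2 versus tmpfs) is a stricter alternative.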

Docker and Container Isolation

Docker uses mount, UTS, IPC, PID, and network namespaces by default when creating containers, with user namespace optionally enabled for enhanced security. When a container starts, the Docker daemon creates a dedicated namespace set and cgroup for that container, and runc (OCI runtime) actually calls the kernel interface to configure the isolation environment.

[Figure: Docker Container Isolation Architecture]

Docker’s Resource Limiting Options

Docker abstracts cgroups to allow resource limits to be set with simple flags, using options like --memory, --cpus, and --blkio-weight in the docker run command to control resources per container. For example, to limit memory to 512MB and CPU to 1.5 cores, use the --memory=512m --cpus=1.5 flags, and these settings are directly reflected in the cgroup parameters for that container.
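
To make the translation concrete, the sketch below computes roughly what `--memory=512m --cpus=1.5` turns into at the cgroup level. It assumes the default CFS period of 100000 microseconds and binary units for the memory suffix. The function names are invented here, and the real conversion happens inside Docker and the OCI runtime rather than in code like this.

```python
CFS_PERIOD_US = 100_000  # assumed default scheduler period: 100 ms

# Docker memory suffixes are binary multiples.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(value):
    """Convert a string such as "512m" into a byte count."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def cgroup_values(memory, cpus, period_us=CFS_PERIOD_US):
    """Return the approximate cgroup file contents for v1 and v2."""
    limit = parse_memory(memory)
    quota = round(cpus * period_us)  # CPU time allowed per period, in microseconds
    return {
        "v1": {
            "memory.limit_in_bytes": str(limit),
            "cpu.cfs_quota_us": str(quota),
            "cpu.cfs_period_us": str(period_us),
        },
        "v2": {
            "memory.max": str(limit),
            "cpu.max": f"{quota} {period_us}",
        },
    }


for version, files in cgroup_values("512m", 1.5).items():
    for name, content in files.items():
        print(f"{version}  {name} = {content}")
```

For the example flags this yields a memory limit of 536870912 bytes and a quota of 150000 microseconds per 100000-microsecond period, meaning the container may use up to one and a half CPUs' worth of time in each period.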

How to Verify Isolation Mechanisms

To verify namespaces and cgroups at the system level, use the lsns command to list all namespaces on the current system, and inspect the namespace symbolic links of a specific process in the /proc/<PID>/ns/ directory. Cgroups settings can be verified in the /sys/fs/cgroup/ directory, and on systemd-based systems using cgroups v2, resource settings for each service can be viewed under /sys/fs/cgroup/system.slice/. For Docker containers, the docker inspect command reports the container's host PID and cgroup configuration, which can then be used to locate its entries under /proc/<PID>/ns/ and /sys/fs/cgroup/.
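
Once a cgroup directory is known, its limits are plain text files. The sketch below reads single-value cgroups v2 files such as memory.max or pids.max, where the literal string `max` means "no limit". It assumes the path taken from /proc/<PID>/cgroup is relative to the host's /sys/fs/cgroup mount; both helper names are hypothetical, and multi-field files such as cpu.max are deliberately not handled.

```python
from pathlib import Path


def cgroup_dir(cgroup_path, root="/sys/fs/cgroup"):
    """Join the path from /proc/<PID>/cgroup onto the cgroup v2 mount point."""
    return Path(root) / cgroup_path.lstrip("/")


def read_limit(directory, filename):
    """Read a single-value limit file; return None when the value is "max"."""
    raw = (Path(directory) / filename).read_text().strip()
    return None if raw == "max" else int(raw)


def describe(directory, filenames=("memory.max", "pids.max")):
    """Return {filename: limit} for the files that exist in the directory."""
    limits = {}
    for name in filenames:
        try:
            limits[name] = read_limit(directory, name)
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a single integer
    return limits


# Example path; the result is empty on hosts where this directory is absent.
print(describe(cgroup_dir("/system.slice")))
```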

Security Considerations and Limitations

Fundamental Limitations of Containers
Unlike virtual machines, containers share the host kernel, so a single kernel vulnerability can affect every container on the host. As the saying goes, “Containers don’t contain”: namespaces and cgroups alone cannot provide a complete security boundary, and additional security layers are needed.

Beyond the risks that come from sharing the kernel, default settings often place no limit on process creation, which leaves containers vulnerable to resource-exhaustion attacks such as fork bombs. Improper cgroups settings can also cause noisy neighbor problems, where one container consumes excessive resources and degrades the performance of others. Additionally, when containers are configured to access privileged host resources through the --privileged flag, isolation is effectively neutralized.

Security Enhancement Methods

A defense-in-depth approach is needed to strengthen container security. Enable user namespaces so container root maps to a non-privileged user on the host. Use MAC (Mandatory Access Control) systems such as AppArmor or SELinux to restrict process behavior. Apply seccomp profiles to limit containers to approved system calls, remove unnecessary Linux capabilities, and use read-only filesystems alongside the principle of least privilege. eBPF-based runtime security tools (Falco, Cilium, Tetragon, etc.) can detect and block anomalous container behavior in real time.

Security Layer	Technology	Effect
User Isolation	User namespace, Rootless mode	Limit host privileges on container escape
Access Control	AppArmor, SELinux	Restrict process behavior
System Call Filtering	Seccomp	Block dangerous system calls
Capability Limitation	--cap-drop	Remove unnecessary privileges
Runtime Security	eBPF, Falco	Real-time anomaly detection
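
Several of these layers correspond directly to docker run flags. The following sketch assembles a hardened command line from them. It only builds and prints the argument list. The image name, the PID limit of 256, and the profile path are illustrative placeholders, and `--security-opt=no-new-privileges` is an extra hardening flag not discussed above that is included here as an assumption about what a least-privilege setup might want.

```python
import shlex


def hardened_run_args(image, keep_caps=(), pids_limit=256,
                      seccomp_profile=None, apparmor_profile=None):
    """Build a `docker run` argument list that applies several security layers."""
    args = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # drop every Linux capability first
        "--security-opt=no-new-privileges",  # block privilege gain via setuid binaries
        "--read-only",                       # mount the root filesystem read-only
        f"--pids-limit={pids_limit}",        # cap process count (fork-bomb guard)
    ]
    for cap in keep_caps:
        args.append(f"--cap-add={cap}")      # add back only what the workload needs
    if seccomp_profile:
        args.append(f"--security-opt=seccomp={seccomp_profile}")
    if apparmor_profile:
        args.append(f"--security-opt=apparmor={apparmor_profile}")
    args.append(image)
    return args


# Hypothetical workload: a web server that must bind to a port below 1024.
command = hardened_run_args("nginx:alpine", keep_caps=["NET_BIND_SERVICE"],
                            seccomp_profile="/etc/docker/seccomp-web.json")
print(shlex.join(command))
```

Starting from `--cap-drop=ALL` and adding capabilities back one at a time keeps the allow-list explicit, which is usually easier to audit than removing individual capabilities from the default set. A real workload may also need writable tmpfs mounts to run with a read-only root filesystem.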

Conclusion

Linux namespaces and cgroups have evolved since 2002 into the core foundation of modern container technology, and the broader container ecosystem, including Docker and Kubernetes, is built on these two kernel features. Namespaces isolate 8 types of resources, including processes, networks, filesystems, and user IDs, so each container can behave like an independent system. Cgroups allocate and limit CPU, memory, and I/O resources to support fair sharing and prevent noisy neighbor issues. Because containers still share the host kernel, they do not provide complete security isolation on their own, so combining user namespaces, MAC, seccomp, and eBPF is essential for a more secure container environment.
