Linux container technology has steadily evolved since mount namespaces first appeared in kernel 2.4.19 in 2002. Namespaces provide process isolation, while Cgroups (Control Groups) provide resource control. Together, they form a core foundation of modern cloud infrastructure. Container runtimes such as Docker, Kubernetes, and Podman rely on these two kernel features to provide isolation that is much lighter and faster than virtual machines, and understanding them is the first step toward a deeper grasp of container technology.
Historical Background of Container Technology
Why Are Containers Needed?
Traditional virtual machines emulate entire hardware to run complete operating systems, resulting in large resource overhead and long startup times. Containers share the host kernel while providing process-level isolation, achieving the same isolation effect with millisecond-level fast startup and minimal resource usage.
The concept of Linux namespaces was inspired by the Plan 9 operating system from Bell Labs. The first implementation appeared as the mount namespace in Linux kernel 2.4.19 in 2002. Broader expansion began in 2006: PID and network namespaces were added in 2007, and memory cgroups appeared in 2008 to strengthen resource control. The technical foundation for full container support was completed when user namespace was introduced in kernel 3.8. LXC (Linux Containers) exposed these features through user tools in 2008, and Docker popularized container technology in 2013 by combining them with image-building and deployment workflows.
Namespaces: Isolation of Kernel Resources
What is a Namespace?
A namespace is a Linux kernel feature that partitions specific system resources by process group, making each group appear to have its own independent instance of that resource. Unlike virtual machines that virtualize hardware, namespaces partition kernel functionality itself to provide lighter and more efficient isolation.
The current Linux kernel (6.1 and above) provides 8 types of namespaces, each responsible for isolating specific system resources. Container runtimes combine the namespaces they need when creating containers, giving processes inside each container their own view of those resources.
Detailed Explanation by Namespace Type
| Namespace | Isolation Target | Kernel Version | Description |
|---|---|---|---|
| Mount (mnt) | Filesystem mount points | Kernel 2.4.19 (2002) | Each namespace has independent mount point list |
| UTS | Hostname, domain name | Kernel 2.6.19 (2006) | Different hostname per container possible |
| IPC | System V IPC, POSIX message queues | Kernel 2.6.19 (2006) | Semaphores, message queues, shared memory isolation |
| PID | Process IDs | Kernel 2.6.24 (2008) | Independent numbering starting from PID 1 per namespace |
| Network (net) | Network stack | Kernel 2.6.29 (2009) | IP addresses, routing tables, sockets, firewall rules isolation |
| User | UID/GID mapping | Kernel 3.8 (2013) | Maps container root to regular user on host |
| Cgroup | Cgroup hierarchy view | Kernel 4.6 (2016) | Each container sees isolated cgroup hierarchy |
| Time | System time | Kernel 5.6 (2020) | Different system time per process possible |
How PID Namespace Works
The first process created in a PID namespace is assigned PID 1 and receives the same special treatment as the traditional init process. When this process terminates, all processes in that namespace terminate immediately. Orphaned processes (processes whose parent has terminated) are re-parented to this PID 1 process, allowing each container to have its own process tree. Nested PID namespaces can be used, making it possible to run another container inside a container.
How Network Namespace Works
When a network namespace is created, it initially contains only the loopback interface (lo), and to communicate externally, a virtual network interface (veth pair) must be created and connected to the host namespace. Each network namespace has independent IP addresses, routing tables, iptables rules, and sockets. Physical or virtual network interfaces can belong to exactly one namespace but can be moved between namespaces. Docker implements network connectivity between containers and host using veth pairs in bridge network mode.
User Namespace and Security
User namespace plays the most important role in container security, implementing privilege separation by mapping UID/GID inside the container to different UID/GID on the host. For example, a process running as root (UID 0) inside a container can be mapped to a non-privileged user like UID 100000 on the host, so even if a container escape attack succeeds, it cannot gain root privileges on the host. This feature is called “rootless containers,” with Podman using this mode by default and Docker also supporting rootless mode.
Cgroups: Resource Allocation and Limitation
What are Cgroups?
Cgroups (Control Groups) are a Linux kernel feature that limit, account for, and isolate the resource usage of process groups. Developed by Google engineers in 2007, they were merged into kernel 2.6.24. Cgroups can finely control resources such as CPU, memory, disk I/O, and network bandwidth, and they are the core mechanism for preventing the “noisy neighbor” problem in container environments.
Cgroups are organized in a hierarchical structure, with child cgroups inheriting resource limits from parent cgroups. Each cgroup can contain multiple processes, and each process belongs to exactly one cgroup. Resource controllers (subsystems) perform actual resource limiting, with major controllers including cpu, cpuacct, memory, blkio, net_cls, and pids.
Major Cgroups Resource Controllers
| Controller | Function | Key Parameters |
|---|---|---|
| cpu | CPU time allocation ratio adjustment | cpu.shares, cpu.cfs_quota_us |
| cpuacct | CPU usage accounting | cpuacct.usage, cpuacct.stat |
| memory | Memory usage limitation | memory.limit_in_bytes, memory.soft_limit_in_bytes |
| blkio | Block I/O bandwidth limitation | blkio.throttle.read_bps_device |
| pids | Process count limitation | pids.max |
| devices | Device access control | devices.allow, devices.deny |
Differences Between Cgroups v1 and v2
Cgroups v1 allowed each resource controller to have a separate hierarchy, providing configuration flexibility but increasing complexity and reducing consistency between controllers. Cgroups v2 was introduced in kernel 4.5 in 2016 and uses a single unified hierarchy in which all controllers operate together, simplifying management. In v2, processes can only be attached to leaf nodes, control is applied at the process level rather than per thread, and the memory controller supports hierarchical memory limits by default.
| Characteristic | Cgroups v1 | Cgroups v2 |
|---|---|---|
| Hierarchy | Multiple hierarchies per controller | Single unified hierarchy |
| Process Attachment | All nodes possible | Leaf nodes only |
| Thread Support | Per-thread cgroup allocation possible | Process granularity only |
| Memory Hierarchy | Optional | Default support |
| Kubernetes Support | Maintenance mode (v1.31~) | Recommended |
The Kubernetes community transitioned cgroups v1 support to maintenance mode starting from v1.31, and RHEL 10 only supports cgroups v2. Using cgroups v2 is recommended for new deployments.
Docker and Container Isolation
Docker uses mount, UTS, IPC, PID, and network namespaces by default when creating containers, with user namespace optionally enabled for enhanced security. When a container starts, the Docker daemon creates a dedicated namespace set and cgroup for that container, and runc (OCI runtime) actually calls the kernel interface to configure the isolation environment.

Docker’s Resource Limiting Options
Docker abstracts cgroups to allow resource limits to be set with simple flags, using options like --memory, --cpus, and --blkio-weight in the docker run command to control resources per container. For example, to limit memory to 512MB and CPU to 1.5 cores, use the --memory=512m --cpus=1.5 flags, and these settings are directly reflected in the cgroup parameters for that container.
How to Verify Isolation Mechanisms
To verify namespaces and cgroups at the system level, use the lsns command to query all namespaces on the current system, and check specific process namespace symbolic links in the /proc/<PID>/ns/ directory. Cgroups settings can be verified in the /sys/fs/cgroup/ directory, and on systems using cgroups v2, resource settings for each service can be viewed under /sys/fs/cgroup/system.slice/. For Docker containers, the docker inspect command can be used to check that container’s namespace IDs and cgroup paths.
Security Considerations and Limitations
Fundamental Limitations of Containers
Unlike virtual machines, containers share the host kernel, so when kernel vulnerabilities are discovered, all containers can be affected. As the saying goes, “Containers don’t contain” - namespaces and cgroups alone cannot provide complete security boundaries, and additional security layers are needed.
Beyond the risks that come from sharing the kernel, default settings often place no limit on process creation, which leaves containers vulnerable to resource-exhaustion attacks such as fork bombs. Improper cgroups settings can also cause noisy neighbor problems, where one container consumes excessive resources and degrades the performance of others. Additionally, when containers are configured to access privileged host resources through the --privileged flag, isolation is effectively neutralized.
Security Enhancement Methods
A defense-in-depth approach is needed to strengthen container security. Enable user namespaces so container root maps to a non-privileged user on the host. Use MAC (Mandatory Access Control) systems such as AppArmor or SELinux to restrict process behavior. Apply seccomp profiles to limit containers to approved system calls, remove unnecessary Linux capabilities, and use read-only filesystems alongside the principle of least privilege. eBPF-based runtime security tools (Falco, Cilium, Tetragon, etc.) can detect and block anomalous container behavior in real time.
| Security Layer | Technology | Effect |
|---|---|---|
| User Isolation | User namespace, Rootless mode | Limit host privileges on container escape |
| Access Control | AppArmor, SELinux | Restrict process behavior |
| System Call Filtering | Seccomp | Block dangerous system calls |
| Capability Limitation | –cap-drop | Remove unnecessary privileges |
| Runtime Security | eBPF, Falco | Real-time anomaly detection |
Conclusion
Linux namespaces and cgroups have evolved since 2002 into the core foundation of modern container technology, and the broader container ecosystem, including Docker and Kubernetes, is built on these two kernel features. Namespaces isolate 8 types of resources, including processes, networks, filesystems, and user IDs, so each container can behave like an independent system. Cgroups allocate and limit CPU, memory, and I/O resources to support fair sharing and prevent noisy neighbor issues. Because containers still share the host kernel, they do not provide complete security isolation on their own, so combining user namespaces, MAC, seccomp, and eBPF is essential for a more secure container environment.