Docker image optimization directly affects build times, deployment speed, storage costs, and the security attack surface of container-based applications. By applying techniques such as choosing an appropriate base image, multi-stage builds, and layer optimization, you can often reduce image size by 10x or more, which also has a significant impact on CI/CD pipeline efficiency and cloud infrastructure costs.

Understanding Docker Image Size Problems

Why is Image Size Important?

Docker image size directly affects build time, push/pull time, container startup time, storage costs, and security attack surface. Image optimization is essential for efficient operations in production environments.

Main Causes of Image Size Increase

Typically, unoptimized Docker images become bloated for the following reasons. Understanding each cause is necessary to establish appropriate optimization strategies.

| Cause | Description | Impact |
|---|---|---|
| Heavy base image | Using debian, ubuntu, etc. with full OS packages | Adds hundreds of MB ~ 1GB |
| Development tools included | Compilers, build tools unnecessarily included at runtime | Adds hundreds of MB |
| Development dependencies included | devDependencies, test libraries included | Adds tens to hundreds of MB |
| Unnecessary files copied | .git, node_modules, test files copied | Adds tens of MB ~ several GB |
| Layer inefficiency | Temporary files created in each RUN command not deleted | Accumulates per layer |
| Cache files accumulated | apt, pip, npm caches included in image | Adds tens to hundreds of MB |

Analyzing Images Before Optimization

Before starting optimization, you need to analyze the current image size and layer composition.

# Check image size
docker images myapp:latest

# Analyze size by layer
docker history myapp:latest

# Detailed analysis with dive tool
dive myapp:latest

Here is an example of a typical Node.js application Dockerfile before optimization.

FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["npm", "start"]

Images built with this Dockerfile reach approximately 1.2GB ~ 1.5GB because node_modules and build artifacts are added on top of the node:20 base image (approximately 1GB).

Base Image Optimization

Importance of Base Image Selection

The base image accounts for the largest portion of final image size. Selecting the minimal base image that meets application requirements is the first step in optimization.

Comparison by Base Image Type

| Image Type | Size Range | Features | Suitable Use Cases |
|---|---|---|---|
| Regular images (debian, ubuntu) | 100MB ~ 1GB | Full package manager, shell, debugging tools | Development environment, debugging needed |
| slim images (node:slim, python:slim) | 50MB ~ 200MB | Essential runtime only, some tools removed | General production environment |
| Alpine images (node:alpine, python:alpine) | 5MB ~ 50MB | musl libc based, minimal packages | Size optimization critical |
| distroless images (gcr.io/distroless) | 2MB ~ 20MB | No shell, application runtime only | Security-critical production |
| scratch | 0MB | Completely empty image | Static binaries (Go, Rust) |
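
You can check these differences yourself by pulling a few variants of the same base image and comparing the sizes Docker reports (exact numbers vary by tag and architecture):

# Pull several variants of the Node.js base image
docker pull node:20
docker pull node:20-slim
docker pull node:20-alpine

# Compare their sizes side by side
docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}" node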

Alpine Linux Based Images

Alpine Linux is a lightweight Linux distribution based on musl libc and BusyBox. With a base image size of only about 5MB, it is widely used for Docker image optimization.

# Before: ~1GB
FROM node:20

# After: ~180MB
FROM node:20-alpine

When using Alpine images, note that it uses musl libc instead of glibc, which may cause compatibility issues with some native modules. In such cases, additional package installation may be required during build.

FROM node:20-alpine

# Install packages for native module builds
RUN apk add --no-cache python3 make g++

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

distroless Images

Google's distroless images are minimal images that contain only the application runtime, with no shell or package manager. This both shrinks the attack surface, improving security, and keeps the image size small.

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# The build may need devDependencies, so prune them afterwards for the runtime image
RUN npm prune --omit=dev

# Runtime stage - using distroless
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
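# The distroless Node.js image already uses node as its entrypoint,
# so CMD only needs the path to the script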
CMD ["dist/index.js"]

Multi-Stage Builds

What is Multi-Stage Build?

Multi-stage build is a technique that uses multiple FROM instructions in a single Dockerfile to separate build and runtime environments. By excluding build tools and intermediate artifacts from the final image, it dramatically reduces image size.

Multi-Stage Build Principles

Multi-stage build is a feature introduced in Docker 17.05 that operates in the following stages:

  1. Build stage: Performs build tasks such as source code compilation, dependency installation, and test execution
  2. Runtime stage: Creates final image by copying only artifacts generated from build stage
  3. Layer separation: Each stage has independent layers, and only the last stage’s layers are included in the final image
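
Each stage can also be built on its own with the --target flag, which is handy for inspecting or testing the build stage without producing the runtime image. For example, assuming a stage named builder as in the Dockerfiles that follow:

# Build only the "builder" stage (e.g., to run tests against it in CI)
docker build --target builder -t myapp:build .

# Build the complete image; only the final stage's layers are included
docker build -t myapp:latest .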

Node.js Application Optimization

Here is a multi-stage build example for Node.js applications.

Before optimization (~1.2GB):

FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["npm", "start"]

After optimization (~150MB):

# ===== Build Stage =====
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency files first (cache utilization)
COPY package.json package-lock.json ./
RUN npm ci

# Copy source code and build
COPY . .
RUN npm run build

# Reinstall production dependencies only
RUN rm -rf node_modules && npm ci --omit=dev

# ===== Runtime Stage =====
FROM node:20-alpine

# Create non-root user for security
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001

WORKDIR /app

# Copy only build artifacts and production dependencies
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package.json ./

# Switch to non-root user
USER nextjs

EXPOSE 3000
CMD ["node", "dist/index.js"]

Go Application Optimization

Go can produce statically linked binaries, which makes it possible to build the smallest images of all on top of the scratch image.

Before optimization (~800MB):

FROM golang:1.22
WORKDIR /app
COPY . .
RUN go build -o main .
CMD ["./main"]

After optimization (~10MB):

# ===== Build Stage =====
FROM golang:1.22-alpine AS builder

WORKDIR /app

# Download dependencies
COPY go.mod go.sum ./
RUN go mod download

# Copy source code and build static binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s" \
    -o main .

# ===== Runtime Stage =====
FROM scratch

# Copy SSL certificates (needed for HTTPS requests)
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy only the binary
COPY --from=builder /app/main /main

EXPOSE 8080
ENTRYPOINT ["/main"]

The -ldflags="-w -s" option omits the DWARF debugging information (-w) and the symbol table (-s), further reducing binary size.
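
To see the effect on a given project, you can build the binary with and without these flags and compare the file sizes (the savings depend on the codebase):

# Build with and without stripped debug info and symbol table, then compare
go build -o main-debug .
go build -ldflags="-w -s" -o main-stripped .
ls -lh main-debug main-stripped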

Python Application Optimization

Python applications can be optimized using virtual environments.

Before optimization (~900MB):

FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

After optimization (~120MB):

# ===== Build Stage =====
FROM python:3.12-alpine AS builder

WORKDIR /app

# Install build tools (for native extension modules)
RUN apk add --no-cache gcc musl-dev libffi-dev

# Create virtual environment and install dependencies
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ===== Runtime Stage =====
FROM python:3.12-alpine

# Install only runtime libraries
RUN apk add --no-cache libffi

WORKDIR /app

# Copy virtual environment
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy source code
COPY . .

# Create and switch to non-root user
RUN adduser -D appuser
USER appuser

EXPOSE 8000
CMD ["python", "app.py"]

Java Application Optimization

Java applications can use runtime images that contain only a JRE, and further optimization is possible by building a custom runtime with jlink (see the sketch at the end of this section).

Before optimization (~700MB):

FROM maven:3.9-eclipse-temurin-21
WORKDIR /app
COPY . .
RUN mvn package -DskipTests
CMD ["java", "-jar", "target/app.jar"]

After optimization (~150MB):

# ===== Build Stage =====
FROM maven:3.9-eclipse-temurin-21-alpine AS builder

WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline

COPY src ./src
RUN mvn package -DskipTests

# ===== Runtime Stage =====
FROM eclipse-temurin:21-jre-alpine

WORKDIR /app

# Copy only JAR file
COPY --from=builder /app/target/*.jar app.jar

# Create and switch to non-root user
RUN adduser -D appuser
USER appuser

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Layer Optimization

Layer Optimization Principles

Each instruction (RUN, COPY, ADD) in a Dockerfile creates a new layer. Files added in one layer continue to count toward the image size even if a later layer deletes them, so temporary files must be removed in the same layer that creates them.
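
The following contrived Dockerfile fragment illustrates the point, assuming a base image that provides dd: the file written in the first RUN stays in that layer even though a later RUN deletes it.

# The 100MB file is stored in this layer...
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100

# ...and deleting it in a separate layer does not shrink the image
RUN rm /tmp/big.bin

# Creating and deleting the file within a single layer avoids the problem
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100 && rm /tmp/big.bin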

Command Consolidation

Connect multiple RUN instructions with && and delete temporary files in the same layer.

Inefficient approach:

RUN apt-get update
RUN apt-get install -y nginx curl
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

Optimized approach:

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        nginx \
        curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Using .dockerignore

Using a .dockerignore file excludes unnecessary files from the build context, improving build speed and image size.

# Version control
.git
.gitignore

# Dependencies (reinstalled during build)
node_modules
vendor
__pycache__

# Development environment
.env.local
.env.development
*.log
.vscode
.idea

# Tests
tests
test
coverage
.nyc_output

# Documentation
docs
README.md
CHANGELOG.md

# Build artifacts (regenerated in multi-stage)
dist
build
target

Cache Cleanup

Delete package manager caches in the same layer.

# apt (Debian/Ubuntu)
RUN apt-get update && \
    apt-get install -y --no-install-recommends package && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# apk (Alpine)
RUN apk add --no-cache package

# pip (Python)
RUN pip install --no-cache-dir -r requirements.txt

# npm (Node.js)
RUN npm ci --omit=dev && \
    npm cache clean --force

Optimization Results Comparison

The table below compares the results of applying these optimization techniques to applications in different languages:

| Language | Before | After | Reduction | Key Techniques |
|---|---|---|---|---|
| Node.js | 1.2GB | 150MB | 87% | Alpine + multi-stage |
| Go | 800MB | 10MB | 99% | scratch + static build |
| Python | 900MB | 120MB | 87% | Alpine + virtual env |
| Java | 700MB | 150MB | 79% | JRE-alpine + multi-stage |
| Rust | 1.5GB | 8MB | 99% | scratch + static build |

Benefits of Optimization

Docker image size optimization provides the following practical benefits:

| Benefit | Description |
|---|---|
| Reduced build time | Faster CI/CD pipelines thanks to smaller base images and efficient layer caching |
| Improved deployment speed | Shorter deployment cycles by reducing image push/pull time |
| Storage cost savings | Reduced storage usage on container registries and nodes |
| Enhanced security | Smaller attack surface and fewer CVE vulnerabilities by removing unnecessary packages |
| Faster container startup | Faster scale-out by reducing image pull time |
| Network bandwidth savings | Especially effective in edge or bandwidth-limited environments |

Conclusion

Docker image optimization can reduce image size by 10x or more by combining techniques such as appropriate base image selection, multi-stage builds, layer optimization, and cache cleanup. This leads to shorter build times, faster deployments, lower storage costs, and improved security. Since the optimal strategy differs by language and framework, it is important to choose the techniques suited to your application's characteristics and to monitor and improve image size continuously.