Everything I got wrong about Docker the first time

I shipped my first Dockerized app to production in 2021. It worked on my machine. It worked in staging. Then production happened.

The container ignored SIGTERM and had to be force-killed on every deploy. The image was 1.8GB. We were running as root. Our "health check" was pinging a port that responded 200 even when the database was on fire. And my docker build took 9 minutes because I invalidated the layer cache every single time by copying package.json at the wrong step.

Here's everything I learned the hard way so you don't have to.

Your image is 2GB and it doesn't need to be

This is the Dockerfile I wrote the first time. I was proud of it.

# ❌ The "it works" Dockerfile
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]

Looks fine, right? It builds. It runs. It's also 1.8GB because node:20 is based on Debian with a full toolchain, your node_modules includes dev dependencies, and you copied your entire repo — tests, docs, .git folder, everything.

Here's what it should look like:

# ✅ Multi-stage build, production-only
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
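RUN npm run build
# Drop dev dependencies so only production packages get copied into the final stage
RUN npm prune --omit=dev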
 
FROM node:20-alpine AS production
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package*.json ./
ENV NODE_ENV=production
EXPOSE 3000
CMD ["node", "dist/index.js"]

And a .dockerignore that actually does its job:

node_modules
.git
.gitignore
*.md
dist
.env*
tests
coverage
.vscode

The multi-stage build means your final image only contains the compiled output and production dependencies. Switching from node:20 (Debian) to node:20-alpine drops the base image from ~900MB to ~130MB. With both changes, that 1.8GB image becomes ~150MB.

For Go or Rust apps, you can go even further with distroless or scratch base images — your final image can literally be just your compiled binary. I've shipped Go services at 12MB.

The PID 1 problem will haunt you

This one is subtle and it will absolutely bite you during deploys.

When your app runs as PID 1 inside a container, the kernel doesn't apply the default signal actions to it. A signal only does something if the process has installed its own handler for it. So when Kubernetes (or Docker, or ECS) sends SIGTERM to gracefully shut down your container and your app never registered a SIGTERM handler, your process just... ignores it. The orchestrator waits for the grace period, then sends SIGKILL. Your app gets murdered mid-request. Your database connections don't close cleanly. Your users see 502s.

I spent a full Saturday debugging "random" 502 errors during deployments before I figured this out.

The fix is stupid simple. Use tini or dumb-init as your entrypoint:

# ✅ Proper signal handling
FROM node:20-alpine
 
# Install tini
RUN apk add --no-cache tini
 
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package*.json ./
 
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]

tini runs as PID 1, forwards signals to your app, and reaps zombie processes. Three lines of config saved me hours of debugging.

Alternatively, if you're on Docker 1.13+, you can use docker run --init which bundles tini automatically. But I prefer being explicit in the Dockerfile — it's documentation that travels with the image.
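One caveat: tini only delivers the signal, and what happens next is up to the app. Here is a minimal sketch of a SIGTERM handler that stops accepting connections and exits once in-flight requests finish. The helper name installGracefulShutdown, the 10-second default, and the cleanup hook are assumptions for illustration, not part of the setup above.

```javascript
// Hypothetical helper (not from the original setup): drain the server on SIGTERM.
function installGracefulShutdown(server, options = {}) {
  const {
    timeoutMs = 10000,          // assumed value; keep it below the orchestrator's grace period
    cleanup = async () => {},   // e.g. close database pools
    exit = (code) => process.exit(code),
  } = options;

  let shuttingDown = false;

  function shutdown() {
    if (shuttingDown) return;   // ignore repeated signals
    shuttingDown = true;

    // Safety net: exit non-zero if draining takes too long.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // Stop accepting new connections; the callback runs once the server has closed.
    server.close(() => {
      Promise.resolve()
        .then(cleanup)
        .then(() => 0, () => 1)  // exit code 1 if cleanup failed
        .then((code) => {
          clearTimeout(timer);
          exit(code);
        });
    });
  }

  process.on('SIGTERM', shutdown);
  process.on('SIGINT', shutdown);
  return shutdown;
}
```

In an Express-style app this might be wired up as `const server = app.listen(3000); installGracefulShutdown(server, { cleanup: () => db.end() });`, where `db.end()` stands in for whatever your database client uses to close its pool.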

Stop running as root

Unless a base image switches users itself, your container runs as root, and most base images don't switch. That means your Node.js web server has full root access inside the container. If someone exploits a vulnerability in your app, they have root. Inside a container, sure, but container escapes are real, and defense in depth matters.

# ❌ Running as root (the default)
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci --production
CMD ["node", "index.js"]
# Congrats, your web server is root

# ✅ Non-root user
FROM node:20-alpine
WORKDIR /app
 
# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
 
# Copy files already owned by that user (a separate chown -R layer would store them twice)
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package*.json ./
 
USER appuser
 
EXPOSE 3000
CMD ["node", "dist/index.js"]

One gotcha: if your app writes to the filesystem (logs, temp files, uploads), make sure those directories exist and are owned by your non-root user before you switch with USER. I've had containers crash on startup because the app tried to write to a directory it didn't have permissions for.

Also — if you're using the official node images, they already ship with a node user (UID 1000). You can just do USER node instead of creating your own. But I like being explicit about the group and permissions.
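The writable-directory gotcha above is easier to debug if the app checks for it at startup instead of crashing on the first write. A minimal sketch follows; the helper names findUnwritableDirs and isRunningAsRoot are hypothetical, and the directories you pass in depend on your app.

```javascript
const fs = require('node:fs');

// Hypothetical startup check: return the directories the current user cannot write to.
function findUnwritableDirs(dirs) {
  return dirs.filter((dir) => {
    try {
      fs.accessSync(dir, fs.constants.W_OK);
      return false;
    } catch {
      return true;  // missing directory or no write permission
    }
  });
}

// Hypothetical helper: true when the process runs as root.
// process.getuid is not available on Windows, hence the typeof guard.
function isRunningAsRoot() {
  return typeof process.getuid === 'function' && process.getuid() === 0;
}
```

At startup you could call `findUnwritableDirs(['/app/uploads', '/tmp'])` (example paths, not from the Dockerfile above), log the result, and exit with a clear message if the list isn't empty. Logging a warning when `isRunningAsRoot()` returns true catches images that lost their USER line.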

Layer caching: order matters more than you think

Docker caches each layer. When a layer changes, every layer after it gets rebuilt. This means the order of your Dockerfile instructions directly impacts build speed.

Here's the mistake almost everyone makes:

# ❌ Cache-busting on every code change
FROM node:20-alpine
WORKDIR /app
# ANY file change invalidates this layer
COPY . .
# So this runs every. single. time.
RUN npm ci
RUN npm run build

Every time you change a single line of code, COPY . . invalidates, and npm ci runs from scratch. On a project with heavy dependencies, that's 2-5 minutes wasted per build.

# ✅ Dependencies cached separately from source code
FROM node:20-alpine
WORKDIR /app
# Only changes when deps change
COPY package.json package-lock.json ./
# Cached until deps change
RUN npm ci
# Source code changes don't bust dep cache
COPY . .
RUN npm run build

By copying package.json and package-lock.json first, the npm ci layer only rebuilds when your dependencies actually change. Source code changes only invalidate the COPY . . and npm run build layers. My build times went from 8 minutes to 45 seconds.

Same principle applies to any language. For Go: copy go.mod and go.sum first, run go mod download, then copy source. For Rust: copy Cargo.toml and Cargo.lock first, build dependencies, then copy source.
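If the invalidation rule feels abstract, a toy model helps. This sketch is only an illustration of the "first changed layer rebuilds everything after it" rule, not how Docker implements its cache. The function name stepsToRebuild and the input labels 'manifests' and 'source' are made up for the example.

```javascript
// Toy model of layer caching (an illustration, not Docker's implementation).
// Each step lists the inputs it reads; a step rebuilds if one of its inputs
// changed or if any earlier step rebuilt.
function stepsToRebuild(steps, changedInputs) {
  const changed = new Set(changedInputs);
  const rebuilt = [];
  let cacheBroken = false;
  for (const step of steps) {
    if (!cacheBroken && step.inputs.some((input) => changed.has(input))) {
      cacheBroken = true;
    }
    if (cacheBroken) rebuilt.push(step.instruction);
  }
  return rebuilt;
}

const sourceFirst = [
  { instruction: 'COPY . .', inputs: ['manifests', 'source'] },
  { instruction: 'RUN npm ci', inputs: [] },
  { instruction: 'RUN npm run build', inputs: [] },
];

const manifestsFirst = [
  { instruction: 'COPY package.json package-lock.json ./', inputs: ['manifests'] },
  { instruction: 'RUN npm ci', inputs: [] },
  { instruction: 'COPY . .', inputs: ['manifests', 'source'] },
  { instruction: 'RUN npm run build', inputs: [] },
];

// A source-only change rebuilds all three steps in the first ordering...
console.log(stepsToRebuild(sourceFirst, ['source']));
// ...but only COPY . . and RUN npm run build in the second.
console.log(stepsToRebuild(manifestsFirst, ['source']));
```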

Health checks that don't lie

Neither Docker nor Kubernetes checks your app's health unless you configure it, and the check most people configure first is "is the port responding?" That's a terrible health check. Your server can return 200 on port 3000 while your database connection is dead, your Redis cache is gone, and your app is serving stale data from memory.

# ❌ The "technically alive" health check
HEALTHCHECK CMD curl -f http://localhost:3000/ || exit 1

This tells you the HTTP server is up. It tells you nothing about whether the app is actually working.

# ✅ Health check that verifies real dependencies
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

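One catch: curl isn't included in Alpine-based images, so install it (apk add --no-cache curl) or the check fails every time. But the real work is in your /health endpoint: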

// A health check that actually means something
app.get('/health', async (req, res) => {
  try {
    // Check database
    await db.query('SELECT 1');
 
    // Check Redis
    await redis.ping();
 
    // Check any other critical external services here, then report uptime
    const uptime = process.uptime();
 
    res.status(200).json({
      status: 'healthy',
      uptime: Math.floor(uptime),
      checks: {
        database: 'connected',
        cache: 'connected',
      },
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
    });
  }
});

The --start-period=10s is important: failures during that window don't count toward the retry limit, so your app gets time to start up. Without it, slow-starting apps get marked unhealthy and killed in a restart loop. I learned this one on a cold Sunday morning when our Java service kept restarting because it took 8 seconds to initialize its connection pool.
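One more thing to watch: if a dependency hangs instead of failing, the /health handler hangs with it until the HEALTHCHECK timeout fires, and you learn nothing about which dependency was at fault. A possible refinement, sketched under assumptions: give each check its own deadline and report them individually. The names withTimeout and runHealthChecks and the 2-second default are hypothetical; the default is picked only to stay under the 5s --timeout used above.

```javascript
// Hypothetical helper: reject if a check takes longer than `ms` milliseconds.
function withTimeout(promise, ms, name) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${name} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run every check in parallel and report each one individually.
// `checks` maps a name to an async function, e.g. { database: () => db.query('SELECT 1') }.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) =>
      // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
      withTimeout(Promise.resolve().then(checks[name]), timeoutMs, name)
    )
  );

  const report = {};
  results.forEach((result, i) => {
    report[names[i]] =
      result.status === 'fulfilled'
        ? 'ok'
        : `failed: ${result.reason?.message ?? result.reason}`;
  });

  const healthy = results.every((result) => result.status === 'fulfilled');
  return { healthy, checks: report };
}
```

Inside the route handler this would replace the try/catch: call `runHealthChecks({ database: () => db.query('SELECT 1'), cache: () => redis.ping() })`, then respond with 200 when `healthy` is true and 503 otherwise, sending the per-check report as the body.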

Docker Compose: local dev ≠ production

I see people using the same docker-compose.yml for local development and production. Please don't. They have completely different needs.

# docker-compose.yml — LOCAL DEV
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile.dev  # Dev-specific Dockerfile
    volumes:
      - .:/app                    # Hot reload via bind mount
      - /app/node_modules         # Don't override node_modules
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://postgres:postgres@db:5432/myapp
    depends_on:
      - db
      - redis
 
  db:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres  # Fine for local dev
      - POSTGRES_DB=myapp
    ports:
      - "5432:5432"                 # Expose for local tools
 
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
 
volumes:
  postgres_data:

# docker-compose.prod.yml — PRODUCTION
services:
  app:
    image: ghcr.io/myorg/myapp:${APP_VERSION}  # Pre-built image, not build
    restart: unless-stopped
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    environment:
      - NODE_ENV=production
    env_file:
      - .env.production              # Secrets from file, not inline
    networks:
      - internal
 
  # DB and Redis are managed services in prod, not containers
 
networks:
  internal:
    driver: bridge

Key differences: local dev uses bind mounts for hot reload and builds from source. Production uses pre-built images, resource limits, health checks, restart policies, and doesn't expose ports unnecessarily. Your database should be a managed service in production, not a container with a Docker volume.

Secrets in layers: the silent leak

Here's something that caught me off guard. Every RUN, COPY, and ADD instruction creates a layer. Layers are part of the image. Anyone who can pull your image can inspect every layer.

# ❌ Your NPM token is now baked into the image forever
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
ARG NPM_TOKEN
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc
RUN npm ci
RUN rm .npmrc    # ← Too late. It's in a previous layer.

That rm .npmrc does nothing for security. The .npmrc with your token is permanently in the layer created by the RUN echo command, and because NPM_TOKEN is a build argument, its value also shows up in the image's build history. Anyone with docker history or dive can find it.

The fix is Docker BuildKit's --mount=type=secret:

# syntax=docker/dockerfile:1
# ✅ Secrets never touch the layer cache
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN --mount=type=secret,id=npmrc,target=/app/.npmrc \
    npm ci
COPY . .
RUN npm run build

# Build with the secret
DOCKER_BUILDKIT=1 docker build \
  --secret id=npmrc,src=$HOME/.npmrc \
  -t myapp .

The secret is mounted during the RUN command but never written to a layer. It exists only during that build step. This works for any secret — API keys, private SSH keys for Git repos, auth tokens.

For environment variables at runtime, use env_file in Compose or your orchestrator's secret management (Kubernetes Secrets, AWS Secrets Manager, Vault). Never put secrets in ENV instructions in your Dockerfile. Those values are stored in the image configuration, where docker inspect and docker history show them to anyone who can pull the image.
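On the application side, orchestrators often hand secrets over as mounted files rather than environment variables. A minimal sketch of a loader that accepts either is below. The readSecret name and the NAME_FILE suffix are assumptions; the suffix mirrors a convention some official images follow (postgres reads POSTGRES_PASSWORD_FILE, for example), not anything required by Docker.

```javascript
const fs = require('node:fs');

// Hypothetical helper: prefer a mounted secret file over a plain environment variable.
// If NAME_FILE is set, its value is treated as a path and the file contents are returned.
function readSecret(name, env = process.env) {
  const filePath = env[`${name}_FILE`];
  if (filePath) {
    return fs.readFileSync(filePath, 'utf8').trim();
  }
  const value = env[name];
  if (value === undefined || value === '') {
    // Fail fast at startup instead of on the first request that needs the secret.
    throw new Error(`Missing required secret: ${name}`);
  }
  return value;
}
```

With this in place, `readSecret('DATABASE_URL')` works unchanged whether the value arrives through env_file locally or through a file mounted by the orchestrator in production.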

The complete Dockerfile

Putting it all together — here's what a production-ready Dockerfile actually looks like:

# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=secret,id=npmrc,target=/app/.npmrc \
    npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev
 
FROM node:20-alpine AS production
RUN apk add --no-cache tini curl
WORKDIR /app
 
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
 
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
 
USER appuser
 
ENV NODE_ENV=production
EXPOSE 3000
 
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
 
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]

Multi-stage build. Alpine base. Non-root user. Proper signal handling. Layer caching optimized. Secrets handled properly. Health check included. Under 150MB.

It's not complicated. But every single one of these things was a lesson I learned from something breaking in production. Hopefully you can skip the 2am debugging sessions and just do it right the first time.

Google Distroless Images: minimal container images without package managers or shells, as small as it gets

Docker BuildKit Secrets: official docs on mounting secrets during builds without leaking them into layers

Dive (Explore Docker Image Layers): tool for inspecting every layer of your image, great for auditing size and accidental secret leaks