# docker compose – Guide

```text
$ docker compose up -d
[+] Running 5/5
 ⠿ Network infrastructure_default  Created    0.1s
 ⠿ Container postgres-db           Started    0.5s
 ⠿ Container redis-cache           Started    0.4s
 ⠿ Container backend-api           Error      2.1s
 ⠿ Container frontend-app          Started    0.8s

Error response from daemon: failed to mount local volume: mount /home/sre_hell/app/config:/etc/app/config, flags: 0x1000: no such file or directory

$ docker compose logs --tail=20 backend-api
backend-api | 2024-05-22T03:14:11Z CRITICAL: Failed to load configuration from /etc/app/config/settings.yaml
backend-api | 2024-05-22T03:14:11Z CRITICAL: Traceback (most recent call last):
backend-api | 2024-05-22T03:14:11Z CRITICAL:   File "/app/main.py", line 42, in <module>
backend-api | 2024-05-22T03:14:11Z CRITICAL:     config = load_config()
backend-api | 2024-05-22T03:14:11Z CRITICAL: FileNotFoundError: [Errno 2] No such file or directory: '/etc/app/config/settings.yaml'
backend-api | 2024-05-22T03:14:11Z INFO: Exiting with code 1

$ docker compose ps
NAME           IMAGE                COMMAND                   SERVICE    STATUS       PORTS
backend-api    myapp-backend        "python3 main.py"         backend    exited (1)
frontend-app   myapp-frontend       "nginx -g 'daemon of…"    frontend   running      0.0.0.0:80->80/tcp
postgres-db    postgres:15-alpine   "docker-entrypoint.s…"    postgres   running      5432/tcp
redis-cache    redis:7-alpine       "docker-entrypoint.s…"    redis      running      6379/tcp
```

Dammit. Another volume mount failure because some "full-stack" wizard decided to use absolute paths in a shared `docker-compose.yml` without checking if the directory exists on the host. It’s 4:00 AM. I’ve been awake for 72 hours because our staging environment—which some genius decided should "just run docker compose on an EC2 instance"—imploded.

## The Rant: Why Your Local Environment is a House of Cards

I am tired of hearing about "developer experience." You know what provides a good experience? A system that doesn't catch fire when you look at it sideways. For the last three days, I’ve been cleaning up the mess left behind by a team that thinks "DevOps" is a buzzword you put on a LinkedIn profile rather than a grueling discipline of managing state and failure modes. We had a "magic" setup script. It was 4,000 lines of Bash that wrapped Docker commands in a warm blanket of false security. It used `sed` to inject environment variables. It used `awk` to parse container IDs. It was a nightmare.

When that script inevitably failed because someone updated their macOS to a version that changed how `/tmp` permissions work, the entire engineering team ground to a halt. That’s when I stepped in. I stripped away the "magic." I deleted the Bash scripts. I forced everyone back to raw `docker compose`. 

Why? Because `docker compose` is the least-worst way to define a multi-container application. It isn't perfect. YAML is a formatting trap designed by people who hate parsers. But at least it’s a standard. When I look at a `docker-compose.yml` file, I don't have to wonder what a custom wrapper script is doing to my iptables. I can see the networking, the volumes, and the resource constraints—or the lack thereof, which is usually why the backend service is hitting Exit 137 (OOMKilled) because someone tried to run a Java heap inside a 512MB container.

The problem isn't the tool; it's the philosophy. The "Move Fast and Break Things" crowd treats the local environment like a playground. I treat it like a laboratory. If your local setup doesn't mirror the failure modes of production, your local setup is a lie. If you aren't using `healthcheck` keys, you aren't doing local development; you're just hoping for the best. If you aren't defining `mem_limit`, you're just waiting for a memory leak to freeze your entire laptop. I rebuilt our stack using `docker compose` because I needed a declarative source of truth that didn't require a PhD in shell scripting to debug. I needed to see exactly how the Redis sidecar was talking to the main API, and I needed to ensure that the Postgres container actually waited for the migrations to finish before the backend started spamming connection requests.

## The Deep Dive: Anatomy of a Stable YAML

Let’s look at how a professional sets up a multi-container environment. We aren't using the "version 3" syntax anymore—that’s deprecated. We are using the Compose Specification. If I see `version: '3.8'` at the top of your file, I already know you haven't read the documentation in two years.

Here is the base layer of a functional backend service. Notice the lack of "latest" tags. If you use `image: postgres:latest`, you are a liability to this company.

```yaml
services:
  postgres:
    image: postgres:15.6-bookworm # Specific version, Debian-based for stability
    container_name: postgres-db
    environment:
      POSTGRES_USER: ${DB_USER:-postgres}
      POSTGRES_PASSWORD: "${DB_PASSWORD:?DB_PASSWORD not set}"
      POSTGRES_DB: application_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - backend-network
    deploy:
      resources:
        limits:
          memory: 1gb

  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile.dev
      target: development
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    env_file: .env
    volumes:
      - ./backend:/app:delegated # 'delegated' for macOS performance
    networks:
      - backend-network
    ports:
      - "8080:8080"

Let’s break down why this doesn’t suck. First, the `healthcheck`. Most developers just use the short-form `depends_on: [postgres]`. That only tells Docker to wait until the Postgres container is running; it doesn’t mean the database is ready to accept connections. The `condition: service_healthy` flag ensures the backend doesn’t even attempt to start until `pg_isready` returns a 0 exit code.

Second, the environment variables. Using `${VAR:?error}` is a lifesaver. It forces the `docker compose` command to fail immediately if a required variable is missing from the `.env` file or left empty, rather than letting the container start and crash with an obscure “Connection Refused” error because the password was an empty string.

Third, the `delegated` flag on the bind mount. If you are on macOS, the virtiofs or gRPC FUSE overhead for file syncing is a performance killer. Using `delegated` tells Docker that the container’s view of the mount is authoritative and it’s okay if the host’s view is slightly out of sync for a few milliseconds. It’s the difference between a 2-second hot reload and a 20-second one.

## The Networking Loophole That Cost Me My Saturday

Docker networking is where dreams go to die. By default, `docker compose` creates a single bridge network. That’s fine for a “Hello World” app, but we are running a complex stack. I’ve seen developers try to connect to `localhost:5432` from inside a container to reach the database. That doesn’t work. `localhost` inside a container is the container itself.
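The fix is boring: address the database by its Compose service name, which Docker's embedded DNS resolves on the shared network. A minimal sketch (the `DATABASE_URL` variable name and URL format are illustrative, not something the stack above actually defines):

```yaml
services:
  backend:
    environment:
      # "postgres" here is the Compose service name, resolved by the embedded
      # DNS server on the shared bridge network. Never localhost.
      DATABASE_URL: postgres://${DB_USER:-postgres}:${DB_PASSWORD}@postgres:5432/application_db
```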

Then you have the people who use `network_mode: host`. Don’t do that. You’re throwing away the entire isolation layer because you’re too lazy to figure out DNS resolution.

```yaml
networks:
  backend-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/24

services:
  redis:
    image: redis:7.2-alpine
    networks:
      backend-network:
        ipv4_address: 172.20.0.10 # Static IP for legacy services that hate DNS
```

I prefer using service names for discovery, but occasionally you deal with legacy garbage that requires static IPs. The real “gotcha” here is the MTU (Maximum Transmission Unit). If you are running Docker inside a VM or over a VPN (like WireGuard), the default MTU of 1500 will cause packets to drop silently. You’ll spend six hours debugging why small API calls work but large JSON payloads hang indefinitely. You have to manually set the MTU in the network driver options. It’s a nightmare, and no one talks about it in the “Get Started” guides.
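If that’s your situation, the knob lives in the network’s driver options. A sketch assuming a bridge network and a tunnel MTU of 1400 (1400 is just a common tunnel-friendly value; measure your actual path MTU):

```yaml
networks:
  backend-network:
    driver: bridge
    driver_opts:
      # Must not exceed the MTU of the underlying VPN/VM interface, or large
      # packets get fragmented or silently dropped inside the bridge.
      com.docker.network.driver.mtu: 1400
```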

## The “Gotchas” Gallery: Volume Permissions and Alpine Woes

If I had a dollar for every time a volume permission error broke a build, I’d retire and never look at a terminal again. On Linux, the Docker daemon runs as root. When it mounts a volume, it preserves the UID/GID of the host files. If your host user is UID 1000 and the container user is `node` (UID 1000), you’re fine. But if you’re using a specialized image where the service runs as UID 999, and you mount a folder owned by root, the service will crash with Permission Denied.

Then there’s the Alpine Linux trap. Everyone loves Alpine because it’s small. “Look at me, my image is only 5MB!” Great. Now try to run a Python library that requires C extensions, like `pandas` or `psycopg2`. Alpine uses musl instead of glibc. You’ll spend three hours waiting for the image to compile from source, only for it to fail because of a missing header file. Use a Debian-based `slim` or `bookworm` image. The 50MB you save isn’t worth the gray hair.
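To make that concrete, here’s a hypothetical service pinned to a Debian-based image (the exact tag is an example; pin whatever specific version you actually test against):

```yaml
services:
  worker:
    # glibc-based image: prebuilt manylinux wheels for pandas/psycopg2 install
    # in seconds instead of compiling against musl for an hour.
    image: python:3.12-slim-bookworm
```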

# The "I hate permissions" hack
services:
  app:
    build: .
    user: "${UID:-1000}:${GID:-1000}"
    volumes:
      - .:/app

This snippet in your `docker-compose.yml` allows you to pass your host’s UID/GID into the container so that files created by the app aren’t owned by root. One caveat: most shells treat `UID` and `GID` as shell variables rather than exported environment variables, so put them in your `.env` file (or export them) or the 1000 defaults will silently kick in. It’s a basic requirement for any sane development workflow, yet I see it missing in 90% of the repos I audit.

## The Versioning War: Why I’m Staying on v2.24.0

The Docker team loves moving things around. We went from the Python-based `docker-compose` to the Go-based `docker compose` (the “Compose V2” plugin). While the performance improvement is real, the regressions are infuriating.

I am currently refusing to move the team past `docker compose` v2.24.0. Why? Because in later versions, they messed with how `include` works and how environment variables are interpolated in nested stacks. I don’t care about the “shiny new features” in v2.27.x if they break my ability to use `${VARIABLE:-default}` in a sub-file.
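For context, this is the kind of layout I mean; a minimal sketch with hypothetical file names, using the top-level `include` element and a default-valued variable inside the included file:

```yaml
# docker-compose.yml
include:
  - path: ./compose.postgres.yml   # sub-file that relies on ${PG_TAG:-15.6-bookworm}

services:
  backend:
    build: ./backend
    depends_on:
      - postgres
```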

Stability is a feature. In an SRE role, my job is to reduce variance. If I have 50 developers all running different versions of the Compose binary, I can’t guarantee that the build context will behave the same way. I’ve seen versions where `docker compose up --build` would randomly ignore the cache, and others where it would fail to pull updated base images. We pin the Docker Engine version, and we pin the Compose version. If you want to upgrade, you submit a PR with a 10-page justification and a full test suite run.

## The Hard Truths: Compose is Not Kubernetes

Stop trying to make `docker compose` happen in production. I don’t care if “it’s just a small internal tool.” I don’t care if “Kubernetes is too complex.”

`docker compose` lacks a real orchestration loop. It doesn’t have self-healing in the way K8s does. If a node dies, Compose doesn’t care. It doesn’t have native secret management (no, mounting a `.env` file is not secret management). It doesn’t have horizontal pod autoscaling.

When people try to use docker compose for production, they end up writing a bunch of “glue” scripts to handle deployments, rollbacks, and monitoring. Congratulations, you’ve just built a worse version of Kubernetes using Bash and hope.

Use `docker compose` for what it’s good at: defining a local, reproducible environment that developers can spin up in ten seconds. Use it to ensure that the database version in dev matches the database version in prod. Use it to run integration tests in CI. But the moment that code leaves a developer’s machine, it should be packaged as an OCI-compliant image and handed off to a real orchestrator.
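Here’s roughly what that CI use looks like; a sketch, with the service name and test command as stand-ins, assuming the healthcheck-gated `postgres` service from earlier:

```yaml
services:
  integration-tests:
    build:
      context: ./backend
      target: development
    depends_on:
      postgres:
        condition: service_healthy
    # One-shot container: the test suite's exit status becomes the container's
    # exit code, which the CI runner can turn into a pass/fail.
    command: ["pytest", "-x", "tests/integration"]
```

Run it with `docker compose run --rm integration-tests` (or `docker compose up --exit-code-from integration-tests`) and let the exit code fail the pipeline.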

I’m going to go drink a fifth cup of coffee and try to figure out why the frontend-app container is suddenly reporting Exit 137. Oh, wait. I know why. Someone removed the memory limits I set, and the Webpack dev server just ate 4GB of RAM.

Back to work.

```yaml
# Final sanity check configuration
services:
  frontend:
    image: node:20-slim
    working_dir: /app
    volumes:
      - ./frontend:/app
      - /app/node_modules # Anonymous volume to prevent host override
    environment:
      - NODE_ENV=development
    ports:
      - "3000:3000"
    deploy:
      resources:
        limits:
          memory: 2gb # Stop the bleeding
    stop_grace_period: 1m # Give the dev server time to shut down
```

If you ever remove those resource limits again, I will revoke your sudo access. I’m not joking. My pager has a very loud alarm, and I’m a very light sleeper. Fix your YAML, or I’ll fix your employment status.

Actually, I’ll just go back to sleep. Or try to. I can still see the scrolling logs behind my eyelids. `Error response from daemon... Error response from daemon...` It never ends. Just make sure you use `docker compose down -v` when you’re done. If I find one more orphaned volume cluttering up the CI runner’s disk space, there will be consequences. Technical, irritable, and very, very loud consequences.
