How Docker Image Layers Work — Explained
This explanation is simplified for beginners and may be slightly inaccurate as a result; the last section lists exactly what was simplified.
When you build a Docker image, you're not just packaging code — you're creating a series of filesystem snapshots. Here's exactly how that works.
What Is a Docker Image Layer?
Imagine you install Ubuntu on a fresh, clean machine. The state of the disk at that point — all the files and directories that Ubuntu put there — is what Docker calls a layer.
Now run mkdir test on that machine. The disk state changes. That change is another layer.
A Docker image is just a stack of these layers, each one representing a snapshot of the filesystem after a specific command ran.
These snapshots are real — they are actually saved on disk. Think of it like a machine that's switched off: the filesystem is still there, all the directories and files exist on the disk, but nothing is running. The layers are like that — real saved filesystem states, just not active.
A container is when the final image comes to life. It's like switching that machine on — real RAM, real processes, an actual running filesystem.
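The mental model above can be sketched in a few lines of Python. This is a toy illustration, not how Docker actually stores data (the last section explains the real storage): an image is an ordered stack of saved filesystem snapshots, and a container is a live, mutable copy of the last one.

```python
# Toy model: a "snapshot" is just a dict mapping paths to contents.

# Layer 1: the files Ubuntu put on disk (tiny stand-in)
ubuntu_snapshot = {"/bin/bash": "ubuntu's bash", "/etc/os-release": "Ubuntu"}

# Layer 2: the same filesystem after `mkdir test`
after_mkdir = dict(ubuntu_snapshot)
after_mkdir["/test/"] = "<directory>"

image = [ubuntu_snapshot, after_mkdir]   # the image: a stack of snapshots

container_fs = dict(image[-1])           # "switching the machine on"
container_fs["/tmp/scratch"] = "data"    # a running container can write freely

assert "/test/" in image[-1]             # the image snapshot is unchanged...
assert "/tmp/scratch" not in image[-1]   # ...container writes don't touch it
```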
How Docker Builds Each Layer
Take this Dockerfile:
    FROM ubuntu
    RUN mkdir test
Here's what Docker actually does when building this:
Layer 1 — FROM ubuntu
Docker pulls the Ubuntu image from a remote registry. This snapshot was already created by someone (the Ubuntu image maintainers) and published to the registry — Docker simply pulls and reuses it directly. No temporary container is created. Call this saved state A.
Layer 2 — RUN mkdir test
Docker takes A, starts a temporary container with it, runs mkdir test inside that container, saves the resulting filesystem state as a new snapshot B, then deletes the temporary container.
RUN specifically needs a temporary container because it can execute any command — not just filesystem changes. The container provides the isolated environment needed to run it safely.
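The `RUN` sequence above can be sketched in the same toy model. The helper names here are invented for illustration; real builds execute the command in an isolated container rather than a Python function.

```python
def run_instruction(parent_snapshot, command, apply_command):
    container = dict(parent_snapshot)   # start a temporary container from A
    apply_command(container, command)   # run the command inside it
    new_snapshot = dict(container)      # save the resulting state as B
    del container                       # delete the temporary container
    return new_snapshot

def apply_command(fs, command):
    # toy interpreter that only understands `mkdir <name>`
    if command.startswith("mkdir "):
        fs["/" + command.split(" ", 1)[1] + "/"] = "<directory>"

A = {"/etc/os-release": "Ubuntu"}                 # saved state from FROM ubuntu
B = run_instruction(A, "mkdir test", apply_command)

assert "/test/" in B and "/test/" not in A        # A untouched; B is a new layer
```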
Why COPY and ADD Are Different from RUN
COPY and ADD only change the filesystem — they copy files into the image. Unlike RUN, they don't execute arbitrary commands, so they don't need an executing environment like a temporary container. Docker can apply these changes directly to the saved snapshot. They still produce a new layer (because the filesystem changed), but no container is involved.
| Dockerfile instruction | Creates temporary container? | Creates new layer? |
|---|---|---|
| `FROM` | No | No (uses existing) |
| `RUN` | Yes | Yes |
| `COPY` / `ADD` | No | Yes |
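In the same toy spirit, a `COPY` step can be modeled as applying the file change directly to the saved snapshot, with no container in between. This is a hypothetical sketch, not Docker's real code path.

```python
def copy_instruction(parent_snapshot, src_content, dest_path):
    new_snapshot = dict(parent_snapshot)   # start from the saved snapshot
    new_snapshot[dest_path] = src_content  # drop the file in directly
    return new_snapshot                    # a new layer, no container involved

A = {"/etc/os-release": "Ubuntu"}
B = copy_instruction(A, "print('hi')", "/app/main.py")

assert "/app/main.py" in B and "/app/main.py" not in A
```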
Layer Caching: Why Builds Are Fast
Because each layer is a saved snapshot, Docker doesn't rebuild what it already has.
Say you update your Dockerfile to:
    FROM ubuntu
    RUN mkdir test
    RUN mkdir abc
Docker goes through the instructions one by one:
- `FROM ubuntu` — do I have a saved snapshot for this? Yes, that's A. Reuse it.
- `RUN mkdir test` — do I have a saved snapshot for Ubuntu with `mkdir test` applied? Yes, that's B. Reuse it.
- `RUN mkdir abc` — do I have a saved snapshot for this on top of B? No. So Docker starts a temporary container from B, runs `mkdir abc`, saves the result as C, deletes the container.
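This lookup can be sketched as a cache keyed on the pair (parent snapshot, instruction). This is a simplified model of the behaviour described above, not Docker's actual implementation.

```python
cache = {}

def build_step(parent_id, instruction):
    key = (parent_id, instruction)
    if key in cache:
        return cache[key], True        # cache hit: reuse the saved snapshot
    snapshot_id = f"{parent_id}+{instruction!r}"   # pretend we built it fresh
    cache[key] = snapshot_id
    return snapshot_id, False          # cache miss: built fresh

# First build: FROM ubuntu / RUN mkdir test
a, _ = build_step("scratch", "FROM ubuntu")
b, _ = build_step(a, "RUN mkdir test")

# Second build adds RUN mkdir abc at the end
a2, hit1 = build_step("scratch", "FROM ubuntu")   # hit -> reuse A
b2, hit2 = build_step(a2, "RUN mkdir test")       # hit -> reuse B
c,  hit3 = build_step(b2, "RUN mkdir abc")        # miss -> build C

assert (hit1, hit2, hit3) == (True, True, False)
```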
This is also why when you pull or build an image, only the layers that changed — and everything after them — are downloaded or rebuilt. Layers before the change are reused.
Order Matters for Caching
What happens if you swap the order?
    FROM ubuntu
    RUN mkdir abc
    RUN mkdir test
Docker checks its snapshots again:
- `FROM ubuntu` — saved as A. Reuse it.
- `RUN mkdir abc` — do I have a saved snapshot for Ubuntu with `mkdir abc`? No. Build it fresh as D.
- `RUN mkdir test` — even though Docker has B (Ubuntu + `mkdir test`), it does not reuse it here. It runs `mkdir test` on top of D and saves a new snapshot.
Why doesn't Docker reuse B for the `mkdir test` step? Because it cannot guarantee the result would be the same. Suppose that instead of `RUN mkdir abc`, the command were `RUN rm -rf test` — delete the `test` directory if it exists:
    FROM ubuntu
    RUN rm -rf test
    RUN mkdir test
If Docker reused B (Ubuntu plus the `test` directory) as the starting point and ran `RUN rm -rf test` on top of it, `test` would be deleted — the wrong result, because this Dockerfile expects `RUN mkdir test` to run afterwards and create `test` fresh. Docker has no way to know in advance whether a previous snapshot is safe to reuse once a different command has run before it, so it always rebuilds from the point where the order changed.
The rule is: if any layer changes or shifts position, that layer and everything after it is rebuilt from scratch.
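That rule falls out naturally if, as sketched earlier, the build cache is keyed on (parent snapshot, instruction): swapping two lines changes the parent half of every later key, so nothing after the swap can hit the cache. Again a toy model, with an assumed cache-key scheme.

```python
cache = {}

def build_step(parent_id, instruction):
    key = (parent_id, instruction)
    hit = key in cache
    cache.setdefault(key, f"{parent_id}+{instruction!r}")
    return cache[key], hit

# First build, original order: FROM ubuntu / RUN mkdir test / RUN mkdir abc
a, _ = build_step("scratch", "FROM ubuntu")
b, _ = build_step(a, "RUN mkdir test")
build_step(b, "RUN mkdir abc")

# Second build, swapped order: abc first, then test
a, _        = build_step("scratch", "FROM ubuntu")   # hit: reuse A
d, hit_abc  = build_step(a, "RUN mkdir abc")         # miss: built fresh as D
_, hit_test = build_step(d, "RUN mkdir test")        # miss: B is NOT reused

assert hit_abc is False and hit_test is False
```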
What Happens When a Container Runs
When you start a container from an image, Docker does not modify the image's saved layers. Instead, it makes a copy of the final image snapshot, and that copy becomes the container's writable layer.

The container reads and modifies this copy — the original image layers underneath are reused untouched. When the container is deleted, this writable layer (a snapshot created just for that container) is discarded. The image layers remain exactly as they were.
This is why running a container a hundred times never changes the image.
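A minimal sketch of that run-and-discard cycle, in the same toy model (the writable layer is really an OverlayFS diff, as the next section explains):

```python
image_top = {"/etc/os-release": "Ubuntu", "/test/": "<directory>"}
before = dict(image_top)

for i in range(100):
    writable = dict(image_top)                 # the container's writable copy
    writable[f"/tmp/run-{i}"] = "scratch"      # the container writes here
    del writable                               # container deleted: copy discarded

assert image_top == before                     # a hundred runs, image unchanged
```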
Summary
- An image layer is a real filesystem snapshot saved on disk — like a machine that's switched off, filesystem intact but not running
- A container is the final image switched on — running with real RAM and processes
- `FROM` reuses an existing snapshot; `RUN` needs a temporary container; `COPY`/`ADD` modify the snapshot directly without one
- Docker reuses cached snapshots layer by layer — as soon as a layer changes or moves, everything from that point is rebuilt
- Containers get a writable copy of the final image snapshot on top; the image layers themselves are never modified
What Was Simplified — For the Curious
1. Docker uses OverlayFS — each layer stores only a diff of the previous layer, not a full snapshot
The main article says Docker makes a copy of the final image snapshot for each container, and describes image layers as full snapshots. That's the right mental model to start with, but the actual storage is more efficient in both cases.
2. How the diff actually works
Say the existing layer has this structure:
    abc/
      pqr/
        XYZ/
          def
Now a new layer runs a command that creates lmn inside XYZ. The new layer's diff stores only the new thing — just the path to lmn:
    abc/
      pqr/
        XYZ/
          lmn   ← only this is new
abc/, pqr/, XYZ/ appear here only as the path structure needed to reach lmn. They are not copies of those directories with all their contents. def is nowhere in this diff — it stays only in the layer below.
When a container starts (or when a temporary container is created during build), Docker merges all the layer diffs together into a single unified view — and that merged view is the complete filesystem the container sees:
    abc/
      pqr/
        XYZ/
          lmn   ← from new layer
          def   ← from layer below
This same merging happens for both temporary containers created during build and actual containers at runtime — same mechanism either way.
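Under the assumption that each layer is a plain mapping of paths to contents, the merge can be sketched as laying later diffs over earlier ones. The real merging is done by OverlayFS inside the kernel, not in code like this.

```python
lower = {"abc/pqr/XYZ/def": "contents of def"}   # existing layer
upper = {"abc/pqr/XYZ/lmn": "contents of lmn"}   # new layer's diff

def merged_view(*layers):
    view = {}
    for layer in layers:      # apply diffs bottom-up; later entries win
        view.update(layer)
    return view

view = merged_view(lower, upper)

assert view == {"abc/pqr/XYZ/def": "contents of def",
                "abc/pqr/XYZ/lmn": "contents of lmn"}
```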
Both image layers and the container's writable layer follow the same principle — each is a diff of the layer below, storing only what changed. The difference is that image layers are fixed once created and never change, while the container's writable layer grows as the container runs: when it creates or modifies files, only those files are added to its diff. Files the container never touches are never copied — they are read directly from the image layers below. When the container is deleted, this writable-layer diff is discarded.
For a deleted file, Docker stores a special hidden marker file (called a whiteout file) named .wh.filename in that layer's diff. When building the merged view, Docker sees the marker and hides that file from the layers below — making it appear deleted.
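A toy version of that whiteout handling. The `.wh.` prefix matches the marker naming described above; everything else here is a simplified sketch of the merge, not real OverlayFS code.

```python
import os.path

lower = {"app/config.txt": "old config", "app/keep.txt": "keep me"}
upper = {"app/.wh.config.txt": ""}     # whiteout marker for config.txt

def merged_view(lower, upper):
    view = dict(lower)
    view.update(upper)
    for path in list(view):
        d, name = os.path.split(path)
        if name.startswith(".wh."):
            view.pop(path)                              # hide the marker itself
            view.pop(os.path.join(d, name[4:]), None)   # hide the deleted file
    return view

view = merged_view(lower, upper)

assert view == {"app/keep.txt": "keep me"}   # config.txt appears deleted
```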
3. docker commit saves container state as a new image: You can take a running or stopped container — runtime changes included — and save its current filesystem state as a new image using docker commit. This is separate from the normal build process and generally discouraged in favour of Dockerfiles.