My Thoughts on Docker🔗

Docker is a tool that uses kernel-level namespaces to isolate processes from each other. It allows users to create a consistent* runtime environment for their program that can run on any computer with the right kernel and hardware, without having to consider the rest of the userspace. This is what most people, including me, would largely consider a good thing.

In addition to this, docker's container format has since evolved into the OCI standards, and you can now run docker containers with a myriad of tools, such as podman or kubernetes. In this post I'm going to argue quite heavily against using docker for "normal" small server deployments. If you have a microservice-oriented software stack which you designed to run on a big cluster orchestrated via containers, most of these arguments won't apply. But that's not what most people using docker in a homelab environment do. I'm also not going to talk about swarm and bigger cluster orchestration services; they require setups too homogeneous to be useful to me personally, and I might do a follow-up post on why I feel this way.

What does docker provide?🔗

As mentioned before, docker provides namespace isolation. It's important to note that this is not the same as providing security: it probably helps if you set it up correctly, but it's not explicitly designed to provide security between the host platform and the running environment. To accomplish that you need some sort of hypervisor.

Containers get spun up from container images, which can either be downloaded from the internet or built by you.

Docker provides a way to build container images via a "Dockerfile", a little recipe docker can follow to build the required environment and configure the command which should run on start-up.
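For illustration, a minimal Dockerfile might look something like this (the base image and file names are just placeholders for a hypothetical Python app):

```dockerfile
# Build an image for a hypothetical Python app (all names illustrative).
FROM python:3.12-slim

WORKDIR /app

# Copy and install dependencies first so this layer caches between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# The command that runs when the container starts.
CMD ["python", "app.py"]
```

Running docker build -t myapp . against this recipe produces an image you can then spin containers up from.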

When it comes to container orchestration, docker allows you to manage your containers: start new ones, delete old ones, stop them, attach filesystems ("volumes" in docker-speak), set up networking connections, and so on. It has a few different ways of accomplishing this, but the most used one is probably docker compose.

docker-compose files are a declarative way to define what containers should be running, what volumes they should have, which virtual networks they should each be in, and the whole #!.

Declarative systems like this make it very easy to move the configuration from one place to another. You can just move the docker-compose.yml and whatever volumes held your state, and everything will hopefully just work!
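As a sketch, a small docker-compose.yml might look like this (image names, ports, and paths are made up for illustration):

```yaml
# docker-compose.yml -- illustrative sketch, all names are placeholders.
services:
  web:
    image: myapp:1.2
    ports:
      - "8000:8000"
    volumes:
      - app-data:/var/lib/myapp    # state lives in a named volume
    networks:
      - backend
    depends_on:
      - db                         # "running", not necessarily "healthy"
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

volumes:
  app-data:
  db-data:

networks:
  backend:
```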

Problems with Docker🔗

I used docker for my homelab for a couple of years and was heavily invested in the ecosystem.

Nowadays very little of that setup remains, and I will touch upon some of the reasons why here.

Yo dawg, I heard you like init systems🔗

Docker requires an always running daemon which manages the state of the running containers. It'll do things like restart containers if they crash. It will also make sure the right containers are started when your computer boots up, and probably lots of other internally important tasks.

That should probably remind you of another daemon on your system which does many of those same tasks, namely systemd. We'll get back to that later.

Dependency management🔗

In any kind of infrastructure there are a lot of interconnected systems, which very often depend on each other: a network interface needs an IP address, a filesystem needs mounting, a database needs to go through an initialization sequence and become available.

Docker has a good answer for dependencies on other containers (often limited to whether or not the container is running, not whether it's healthy), but it has little to no knowledge of any externalities. This will lead to problems if you let docker decide whether or not to start your container.

You already have an init system that knows the status of most everything in your system; wouldn't it be nice if you could use that knowledge to start your services?
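For contrast, here's a sketch of how a plain systemd service can declare those external dependencies directly (unit, path, and binary names are made up):

```ini
# /etc/systemd/system/myapp.service -- illustrative names throughout.
[Unit]
Description=My application
# Wait for the network to actually be up, not just configured.
Wants=network-online.target
After=network-online.target
# Don't start before the data filesystem is mounted.
RequiresMountsFor=/srv/myapp
# And order after the database on the same host.
Requires=postgresql.service
After=postgresql.service

[Service]
ExecStart=/usr/bin/myapp --data /srv/myapp

[Install]
WantedBy=multi-user.target
```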

There are even docker images out there which themselves include systemd or another light-weight init system. Some alarm bells should probably be ringing once you're three levels deep.

Again, podman actually plays fairly nicely here. It's possible to use podman in systemd units and reap the benefits of systemd for lifecycle management and containers for namespace isolation.
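With a reasonably recent podman (4.4 or newer) this can be as simple as dropping a Quadlet file on disk and letting podman generate the systemd service for you; a minimal sketch, with illustrative names:

```ini
# /etc/containers/systemd/myapp.container -- a Quadlet file (podman >= 4.4),
# from which podman generates myapp.service. All names are placeholders.
[Unit]
Description=My containerized app
Wants=network-online.target
After=network-online.target

[Container]
Image=docker.io/library/myapp:1.2
PublishPort=127.0.0.1:8000:8000
Volume=/srv/myapp:/var/lib/myapp

[Install]
WantedBy=multi-user.target
```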

Logging🔗

journald has blessed us with nice extensible binary logging; unfortunately, without some workarounds, docker will output all logs from the containers it runs to stdout.

Most docker images are also set up to log with time and date, log level, and other such metadata in each log message, something we've moved away from on "real linux" since syslog.

Not being able to quickly filter on log levels, and the "unit" responsible for the logs often being some weird cryptic container ID that changes from run to run, means any docker container on your system will spam your logs, requiring much fancier log-ingestion services that try to parse the output and reclassify it with metadata.

Just using the normal logging features available with journald is a lot better.
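For example, filtering is a one-liner when services log natively to journald (myapp.service is a placeholder):

```sh
journalctl -u myapp.service -p warning     # warnings and worse from one unit
journalctl -u myapp.service --since today  # time-based filtering
journalctl -p err -b                       # all errors since boot, across units
```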

Network mangling🔗

Docker uses virtual network card bridges, and a whole lot of them. This is kind of aesthetically unpleasing, but there's nothing directly technically wrong with it. One gotcha, however, is that all that port forwarding magic needs to happen in iptables/nftables before traffic reaches those virtual NICs. Those rules get added to whatever configuration you had from before and will in most distros take priority.

That means any firewall settings you have mean nothing to docker. If you say -p 8000:8000, port 8000 will be open to the internet; perhaps without you intending it. That's how it has to work, so this is understandable, but it's fair to mention since it trips people up and makes your firewall rules a bit of a mess.
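If exposure to the internet isn't what you want, binding the published port to localhost sidesteps the problem (myapp is a placeholder image):

```sh
# Publishes port 8000 on ALL interfaces, bypassing your usual firewall rules:
docker run -d -p 8000:8000 myapp

# Binding explicitly to localhost keeps the port off the public internet:
docker run -d -p 127.0.0.1:8000:8000 myapp
```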

Inspectability/tractability is important here!

Debugging🔗

Debugging containers can be annoying. You need to optimize for image size since this has a multiplicative effect on storage, bandwidth, and time.

Unfortunately that means when you shell into a container, you might just end up with a Bourne shell and not much else.

With "normal" unix tools, you can just inspect all files as root, it's not perfect, especially not when you need to debug the environment itself, but in most cases it is easier to use your familiar tools from the "outside" of the service.

User and group ids🔗

All our programs end up running as root, or alternatively with uids managed by hand. Programs generally don't need to run as root, and managing uids by hand is annoying at best and impossible at worst, as some images make a lot of assumptions around this.

Managing state can also be painful: there's nothing built in to take care of chowning data to the right ids, and sharing files between containers can be very complicated.

Some of this is inherent, but systemd has good alternatives with DynamicUser= and Group=.
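A sketch of what that looks like in a unit file (the service and group names are made up):

```ini
# Illustrative service fragment.
[Service]
ExecStart=/usr/bin/myapp
# systemd allocates a transient, unprivileged UID for each run...
DynamicUser=yes
# ...and hands the service a persistent state directory it owns,
# available at /var/lib/myapp and via $STATE_DIRECTORY.
StateDirectory=myapp
# A pre-existing shared group makes sharing files between services manageable.
Group=myapp-data
```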

Turtles all the way up?🔗

You start with an OS, but install docker to deal with the difficulty of managing correct runtimes and state. Then you start orchestrating with compose, but as you realize the shortcomings of that approach, you spin up a single-node kubernetes cluster. Kubernetes turns out not to solve everything either, and now you're using rancher, or some other init-ish system like helm, or alternatively maybe something simpler like a templating language you can use with kubectl.

I think it's healthy to be skeptical of solutions which claim to solve your woes simply by adding another layer of abstraction. Sometimes infrastructure and software are just complicated, and only adding layers isn't a sustainable approach. Do it right from the start and you might not need so many management layers on top.

Using docker as your package manager🔗

This is why most people actually want to use docker.

I'm sure if you ask a homelabber why they use docker, the answer will be because it makes setting up software easy. You only need one look at docker hub to see this. A lot of the most popular images are software stacks wrapped in shell scripts that, for better or for worse, make spinning up and configuring them very simple.

Poor Quality Images🔗

Images are made by anyone. Sometimes that's the upstream developer of a program, who might be motivated by making the software easier to test. Sometimes it's a user who just wanted to make their use case work. Sometimes it's an organization of packagers, like linuxserver.io.

This leads to users having a lot of choice, which can be good! But in practice it also means it's on the user to make sure they're using a good image which is kept up to date, is secure, and exposes the right configuration options.

There are no standards here, so each container you spin up comes with its own gotchas, and sometimes acts like an entirely new ecosystem to dive into.

This isn't a big deal when you start, but becomes one whenever something goes wrong.

Not to mention having to vet all the image providers for trustworthiness!

Updates🔗

Updating software is a very important aspect of package management. Generally the simplest way is to rely on the image tag: latest, or, if you're lucky, the maintainer of your image could be making tags like <major>-<flavor>, allowing you to pin your images to certain channels. Then you use docker pull to re-deploy your software. These tags are different for every image you use.
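In practice that workflow looks something like this (the image name and tag scheme are hypothetical):

```sh
# Pin to a release channel, if the maintainer happens to provide such tags:
docker pull myimage:2-alpine

# Then re-create the containers on top of the freshly pulled image:
docker compose up -d
```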

That's not the whole story, however. All software has dependencies, and these also require updating. This relationship isn't captured by docker infrastructure. The program you're running might not have gotten an update in a couple of months - so there aren't any new releases of the image either. That means you end up running months-old versions of openssl, or glibc, or any other library or program your software depends on.

Relying on each image maintainer to rebuild images routinely is a poor solution for a package-management system (something docker therefore does not replace!).

Inspectability🔗

This is related to the issues above. Whenever there is a security issue somewhere in the stack, how can you make sure you're patched?

You have to look into how the end-user software is built and included in the container:

From another package-manager? Built from source then cleaned up? Downloaded from a binary release somewhere?

Then your containers are built at different times: was the patch included in the repository of the base-image at that time?

Then your containers are built with different base-images following different release channels, or even completely different package-managers and repositories!

None of this information is recorded anywhere, so you must piece it together from Dockerfiles (if you have them), upstream repos, logs, and build timestamps.
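Some of the digging this entails, sketched with a placeholder image name (the dpkg query assumes a Debian-based image, and the exact package name varies by release):

```sh
# When was this image actually built?
docker image inspect -f '{{.Created}}' myimage:latest

# How was it assembled? Only as helpful as the layer metadata that survived.
docker history myimage:latest

# Which openssl actually ended up inside?
docker run --rm myimage:latest dpkg -s libssl3
```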

What to use instead🔗

I want to say: you should be using NixOS... The union of systemd and Nix solves all the problems docker sets out to solve, while also avoiding the pitfalls of docker's approach.
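To give a flavor of why: a NixOS module declares the service, its configuration, and even the firewall hole in one place (nginx here is just a stand-in example, and the hostname and paths are made up):

```nix
# Sketch of a NixOS configuration snippet.
{ config, pkgs, ... }:
{
  services.nginx = {
    enable = true;
    virtualHosts."example.org".root = "/var/www/example";
  };

  # The port is opened explicitly, in the same declarative config.
  networking.firewall.allowedTCPPorts = [ 80 443 ];
}
```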

This is a hard ask, though, of beginner homelabbers or people who just need their stuff to "work".

Maybe the answer is to just not deploy so much software, or to prefer things available in your distribution. Or, better yet, to make your own images/packages.

In any case, you probably need to learn more about the systems that run your applications, and take more direct ownership of them.

Not because doing so is something extra docker provides, but because these are things you should be doing, and docker does not do them for you!