Breaking down containers

Gabe Weiss
9 min read · Feb 6, 2020
No Docker whales were harmed in the creation of this image.

Hi friends!

This post is an addendum to a blog post I wrote about running a Kubernetes application with the Cloud SQL Proxy in a sidecar container.

In writing that post, I found myself bouncing all over the place to learn enough about containers to go from "I don't know anything about containers or Kubernetes" (I know, I know, I've lived a sheltered life) to confidently getting an application running in the Cloud with these tools.

What I want to do is break down creating a container for two very different applications. The first is from a codelab about connecting Kubernetes applications to a Cloud SQL instance. The second is a Python script I wrote to dump randomized employee data into a MySQL instance.

I work mostly on MacOS these days. If you do too, install Docker Desktop for Mac. Linux has Docker natively, and if you're on Windows, head here.

Troubleshooting note if you're on MacOS: the Docker documentation does call out that Docker Desktop for Mac uses HyperKit to run containers. What it doesn't say (at least not in an easily found location) is that this means Docker is running an embedded hypervisor, so the container is isolated from your Mac's networking. If you set up the local Cloud SQL Proxy in the previous steps, the app won't be able to connect to it when running in the container at the end of this section, because the proxy listens on localhost.

Step one: Building the container, which means breaking down the Dockerfiles. The two I’m talking about are the codelab’s, and mine.

The reason I'm breaking down two is that I've always found it easier to understand which pieces I need to change for my own apps when I have two data points for how to do something. The codelab configuration, which I'm breaking down first, is the more complex of the two, even though its Dockerfile has close to the same number of lines as my app's.

Codelab Dockerfile:

FROM tiangolo/uwsgi-nginx-flask:python3.6

# Customize uWSGI webserver port
ENV LISTEN_PORT 8080
EXPOSE 8080
# Copy App and Install requirements
COPY ./app /app
RUN pip install -r /app/requirements.txt
# Customize Postgres Connection
#ENV DB_HOST 127.0.0.1
#ENV DB_USER postgres
#ENV DB_PASS password
#ENV DB_NAME memegen

The FROM line is your baseline image. Just as when you create a virtual machine and want to start from an image beyond basic Linux, this is where you define your starting point.

FROM tiangolo/uwsgi-nginx-flask:python3.6 is specifying someone else’s container image from Docker Hub. This particular configuration does a lot, which is why this Dockerfile is more complex. It’s inheriting a ton of behavior.

The short-ish explanation is that it starts up an nginx server with Flask and Python 3.6 support, then initializes a whole framework for running an app; if you want to dig in, it's the uWSGI framework. It initializes and runs a main.py script, which is why you don't see any command to run on container start in the Dockerfile (more on this later). If you want, the Docker Hub link in the previous paragraph has a nice README that explains the framework in more detail.

# Customize uWSGI webserver port
ENV LISTEN_PORT 8080
EXPOSE 8080

This sets an environment variable the nginx server uses to customize which port it listens on. By default that container listens on port 80; this changes it to 8080 instead. If you're already running a web server locally, there's a good chance something is already listening on 80 and/or 8080, so this is where you'd change it for the codelab example to avoid collisions. EXPOSE tells the container to allow connections to 8080 from outside. If you change one, change the other to match.
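
For example, if 8080 is already taken on your machine, the change would look something like this (9090 here is just an arbitrary free port for illustration, not anything the codelab requires):

ENV LISTEN_PORT 9090
EXPOSE 9090

Keep in mind EXPOSE is mostly documentation between you and Docker; if you run this image locally with plain Docker you still publish the port explicitly, e.g. docker run -p 9090:9090 <image>.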

# Copy App and Install requirements
COPY ./app /app
RUN pip install -r /app/requirements.txt

This takes the application code from where it lives in the current directory and copies it to the /app absolute path inside the container, to match the configuration of the uWSGI application. If you change this, you'll also need to change the callable field in the uwsgi.ini, the path for the pip install, and potentially some other places. I didn't change it, so it's quite possible more might break. Then it installs all the Python modules the application uses.

# Customize Postgres Connection
#ENV DB_HOST 127.0.0.1
#ENV DB_USER postgres
#ENV DB_PASS password
#ENV DB_NAME memegen

Finally, this sets some environment variables the app uses to connect to the database itself. They're commented out because in the codelab, the values are passed in on the command line as arguments when the container is run, instead of being read from here.
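
In plain Docker terms, passing them at run time would look something like this (the image name and values here are placeholders for illustration, not the codelab's actual ones):

docker run -e DB_HOST=127.0.0.1 -e DB_USER=postgres -e DB_PASS=password -e DB_NAME=memegen codelab-image

Anything passed with -e overrides a matching ENV value baked into the image.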

Compare that to the Dockerfile for my app:

My app's Dockerfile:

FROM python:3

LABEL maintainer="Gabe Weiss"

COPY ./requirements.txt /app/
COPY ./mysql_faker.py /app/

WORKDIR /app

RUN pip install -r requirements.txt

ENV DB_USER "<user>"
ENV DB_PASS "<password>"
ENV DB_NAME "<db name>"
# Note that SQL_HOST is not needed IF you're connecting to
# a localhost db or Cloud SQL Proxy AND you're not using Docker on MacOS
# Docker on MacOS uses hypervisor and doesn't share network with
# the host machine even when you set -net=host
# Uncomment SQL_HOST line and specify the IP to connect to
#ENV SQL_HOST "<database IP>"
CMD [ "python", "mysql_faker.py", "--auto", "--locations=10", "--employees=100", "--dontclean" ]

This FROM line imports an official Docker base image. You can find the full list of them here. Docker Hub has a bajillion images covering almost everything. You can limit the search to only the officially supported images, only images from verified publishers, or roll the dice and just search everything. Suffice it to say, you shouldn't ever really need to start from nothing unless you want to. In my case, I'm starting with a base Python 3 image, so it comes with everything needed to use and run Python 3. The script doesn't need anything beyond that, so this is enough for me.
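
One thing worth knowing: you can pin the base more precisely than python:3 if you want more reproducible builds. Purely as an illustration (not what my Dockerfile uses):

FROM python:3.8-slim

The -slim variants are smaller, at the cost of leaving out some build tooling you might need for modules with native extensions.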

LABEL maintainer="Gabe Weiss"

LABEL is an instruction to add metadata to the image as key-value pairs. These are then visible when someone runs docker inspect against your image. It's good practice to always add a maintainer label. If this were "real", the value here would be an email address instead of just a name. For folks who have been using Docker for a while: this used to be handled by the dedicated MAINTAINER instruction, but that was deprecated a couple of years ago.
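
To see the labels once the image is built (randomizer is the tag we give it later in this post), you can do something like:

docker inspect --format '{{ json .Config.Labels }}' randomizer

That should print the maintainer key-value pair, plus anything inherited from the base image if it set labels of its own.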

COPY ./requirements.txt /app/
COPY ./mysql_faker.py /app/

I'm being extra explicit here. In most Dockerfiles you'll usually see something like COPY . /myapp or similar. Because I'm building straight out of the root folder of my GitHub repo, I didn't want to copy everything in that folder into the container, since it's not all needed. This also has the benefit of showing how slim this whole thing is: no supporting files, no frameworks, just a simple Python script and a requirements.txt to make installing its dependencies easier.

WORKDIR /app

A lazy construct. This just sets the working directory so I didn't have to preface my commands with /app/. It's more useful when you have a complicated folder structure and need to run things in different folders; you can use WORKDIR to jump around.
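
As a purely hypothetical sketch of that (the /app/tools folder and script name are made up for illustration):

WORKDIR /app
RUN pip install -r requirements.txt
WORKDIR /app/tools
RUN python generate_assets.py

Each instruction after a WORKDIR runs relative to the most recent one.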

RUN pip install -r requirements.txt

Installs the prerequisite modules for the Python app. RUN directives happen once, at build time, not at run time. We could do this with a CMD directive instead (which happens at run time), but then it would add that time to every container start. It's also a best practice to do it this way because Docker builds containers in layers: the dependency layer here won't change unless this line, or requirements.txt itself, changes, so that part of your container stays stable over time. That matters because, down the road, one of your dependencies might get deleted or become unreachable for some reason. If you didn't install it at build time, you could break at run time.
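
That layering is also why I copy requirements.txt (and run pip install) before anything that changes frequently. A hedged sketch of that cache-friendly ordering for a bigger app (the ./src path is illustrative, not from my repo):

COPY ./requirements.txt /app/
RUN pip install -r /app/requirements.txt
# Everything below changes often; edits here won't invalidate the pip layer above
COPY ./src /app/src

If you copied the whole source tree first, every code change would invalidate the cache and force the pip install to re-run on the next build.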

ENV DB_USER "<user>"
ENV DB_PASS "<password>"
ENV DB_NAME "<db name>"
# Uncomment SQL_HOST line and specify the IP to connect to
#ENV SQL_HOST "<database IP>"

These establish environment variables that will be set on container startup. Don't forget to change the values to match your database. If you're running my script against a database that isn't local (or you're using the Cloud SQL Proxy against a Cloud SQL instance), then you should also uncomment the SQL_HOST variable and set it to the address of your database. And make sure you can actually reach your database instance from the machine you're running the script on.

Reminder time: if you're on MacOS and you don't change the SQL_HOST variable to the public IP address of your instance, this won't work locally. It'll tell you it can't connect to your MySQL instance, because what your container thinks of as localhost is the hypervisor's localhost, not your machine's. The two shall not meet! There are ways around this locally using Kubernetes. These docs talk about it, and if you were to run this container in the Cloud, that's ultimately the solution that ends up being used (TL;DR it runs two containers side-by-side in the same pod with shared volumes/networking).
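
If you want to experiment locally anyway, Docker Desktop for Mac provides the special hostname host.docker.internal, which resolves from inside the container to your Mac. Whether it actually reaches your proxy depends on how the proxy is bound, so treat this as something to try rather than a guarantee (randomizer is the image we build a few paragraphs down):

docker run --rm -e SQL_HOST=host.docker.internal randomizer

Passing -e at run time overrides whatever the Dockerfile set (or didn't set) for that variable.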

CMD [ "python", "mysql_faker.py", "--auto", "--locations=10", "--employees=100", "--dontclean" ]

Like I mentioned before, CMD means this is run when the container is executed, rather than at build time. The first argument is the command to run, and everything else is passed as arguments to that command. The extra flags I'm adding to my script (there's an example of overriding these at run time right after the list):

  1. --auto removes the interactivity of the script asking whether we want to create the database/tables, since I won't be interacting with the container while it's running
  2. --locations and --employees increase the defaults for how many locations and employees we want to create, so it's not just over before it starts. I'm setting 10 locations with 100 employees at each location, so 1,000 employees per script run.
  3. --dontclean prevents the script from starting fresh each time it runs. By default, my script drops the employee and location tables in the MySQL instance before adding new data. Once Kubernetes is scaling this script out across multiple nodes to insert data into our database faster, it kind of defeats the purpose to have each copy deleting the data the other nodes are dumping into the db.
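
Here's what overriding those flags at run time looks like; anything you put after the image name replaces the whole CMD array, so you repeat the python mysql_faker.py part too (the smaller flag values are just for illustration):

docker run --rm randomizer python mysql_faker.py --auto --locations=2 --employees=10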

To test that this works, you should be able to run docker build -t randomizer . in the root directory of my repo to build the container image. The -t randomizer gives the image a logical name to refer to later for things like running it or inspecting it. It shouldn't take too long, and at the end things should complete without any big angry error text.
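
For reference, the build plus a quick sanity check that the tag exists:

docker build -t randomizer .
docker images randomizer

The first build pulls the python:3 base image, so it's slower than later builds, which can reuse cached layers.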

If you’re seeing errors, comment below with what you’re seeing and I can try to help and then edit the post to include what went wrong.

If the build completes cleanly, then assuming you set the ENV variables in your Dockerfile, you can run it like so:

docker run --net="host" --rm --name randomizer randomizer

The --rm removes the container once it’s done running so you aren’t left with a bunch of containers lying around on your machine.

The --name <string> flag gives your running container a logical name so you can poke at it more easily.

Lastly, the final argument is the image name we gave it during the build step with the -t flag.
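
While it's running, a couple of standard Docker commands are handy for keeping an eye on it:

docker ps
docker logs -f randomizer

Because of --rm, the container disappears from docker ps as soon as the script finishes.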

If all goes well, you should see messages reporting success at each step of the run. You can confirm it worked by connecting directly to your database with something like the mysql CLI and querying the employee table; you should see the generated employee names and information.
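
If you want the exact commands, something along these lines (host, user, and database name are whatever you configured):

mysql -h <database IP> -u <user> -p
USE <db name>;
SELECT COUNT(*) FROM employee;
SELECT * FROM employee LIMIT 10;

You should get back the row count and a sample of the generated employees.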

And lo! You have a container, and it is good. Hopefully this helped demystify Dockerfiles a bit and gets you up to speed creating your own containers.

If you came here from my Kubernetes sidecar post, head back here for the next piece of the puzzle.

If not and you got what you want, thanks for reading! Run into any problems? Please let me know! Respond in comments below, or reach out to me on Twitter. My DMs are open!
