Build your first Container with this essential guide to Docker for Data Scientists
I’ve lost count of how many times my code broke when someone else ran it. The strange part was that it worked on my machine. That’s where Docker saves the day: if it works in a container on your machine, it should work the same way wherever that container runs.
As of late 2020, knowing Docker is almost mandatory for data science jobs. No one says you should become an expert, but learning the basics can’t hurt. Today you’ll learn what Docker is and how to build your first container.
Docker is a tool that makes it easy to create, deploy, and run applications by using containers. You can package applications with their dependencies and deploy them as a single package.
Why should you care? Because saying “It works on my machine” doesn’t mean it will work on anyone else’s. With Docker containers, you can be confident that an application working on your machine will work the same way on others.
Think of a Docker container as a lightweight alternative to a virtual machine. Instead of bundling a full guest operating system, containers share the kernel of the system they are running on. As a result, you get both an increase in performance and a decrease in file size. Win-win.
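If you want to see the shared kernel for yourself, here is a minimal sketch that prints the kernel the current Python process is running on. The helper name `kernel_info` is made up for this example. Run it on the host and then inside a container: on a Linux host the two outputs should match, while on macOS or Windows the container reports the kernel of the lightweight Linux VM that Docker Desktop runs behind the scenes.

```python
import platform


def kernel_info() -> str:
    """Return the OS name and kernel release seen by this Python process."""
    return f"{platform.system()} {platform.release()}"


# Compare this line's output on the host and inside a container.
print(kernel_info())
```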
Here’s a bit of Docker terminology you should know before starting:
- Container: a running, isolated instance of an image
- Image: a read-only template that packages the code and its dependencies
- Dockerfile: a text file with the instructions used to build your images
Let’s see how to build your first container.
How to build your first Docker Container
You’ll be surprised by how easy it is.
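To use Docker you’ll need to install it. Download Docker Desktop from the official Docker website, install it, and open up the application.
Now create a project folder anywhere on your computer. It should contain three files, app.py, requirements.txt, and Dockerfile, plus a templates folder holding index.html (Flask looks for templates in a folder with that name).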
Let’s start with what you’re familiar with: Python. The app.py file should contain the following code:
from flask import Flask, render_template

app = Flask(__name__)


@app.route('/')
def hello():
    return render_template('index.html')


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
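The whole purpose of this file is to instantiate the Flask application and serve it on port 5000. Binding to host 0.0.0.0 rather than the default 127.0.0.1 makes the app reachable from outside the container. Once the app is opened in the web browser, the index.html template is shown: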
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My Python App</title>
</head>
<body>
    <h1>Python App</h1>
    <p>Docker is awesome!</p>
</body>
</html>
It’s a plain old boring file containing a heading and a paragraph. Still, enough to verify our app is working.
Next on the list is the requirements.txt file. It contains all the libraries needed for your app. We only need Flask, so ours has a single entry, but the file can get much longer for real-world applications.
It’s a good idea to specify the library version, so you are entirely sure nothing will break in production.
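If you’re not sure which version to pin, one option is to ask Python which version is installed in the environment where your app already works. Here is a minimal sketch using the standard library’s `importlib.metadata` (available from Python 3.8). The helper name `pinned_requirement` is made up for this example.

```python
from importlib import metadata
from typing import Optional


def pinned_requirement(package: str) -> Optional[str]:
    """Return a 'name==version' line for an installed package, or None."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None


if __name__ == "__main__":
    # Prints the line to paste into requirements.txt,
    # or None if Flask is not installed here.
    print(pinned_requirement("flask"))
```

Running `pip freeze` in the Terminal gives you the same information for every installed package at once.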
And finally, we have the Dockerfile (notice how it doesn’t have a file extension). This file is used to build Docker images. Here’s what you should put inside:
FROM python:3.8
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 5000
CMD python ./app.py
So, what’s going on here? Here’s an overview:
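- FROM python:3.8 – specifies we want to use the official Python 3.8 Docker image as a base
- COPY . /app – copies our files to the /app directory inside the image
- WORKDIR /app – sets /app as the working directory for the instructions that follow and for the running container
- RUN pip install -r requirements.txt – installs every library listed in the requirements.txt file
- EXPOSE 5000 – tells Docker that the container listens on port 5000 at runtime (the port is only published to your machine when you run the container with the -p flag)
- CMD python ./app.py – specifies how to run our Python application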
And that’s it! You can now build and run the Docker image. Let’s build it first. From the Terminal, execute the following:
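docker build --tag my-flask-app .
This command will build an image called my-flask-app. The trailing dot tells Docker to use the current folder, the one containing the Dockerfile, as the build context, so run the command from your project folder. To run the image, execute the following: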
docker run -p 5000:5000 my-flask-app
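The -p 5000:5000 flag maps port 5000 on your machine to port 5000 in the container, so the app is now running on localhost:5000. Open that address in your browser to verify everything is okay. You should see the “Python App” heading and the “Docker is awesome!” paragraph.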
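Besides looking at the page in a browser, you can script the same check. Here is a minimal sketch using only the standard library. It assumes the container from the previous step is running and mapped to port 5000; the helper name `app_is_up` and its default arguments are made up for this example.

```python
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:5000/",
              expected: str = "Docker is awesome!",
              timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 and the expected text."""
    # Ignore any system-wide proxy settings, since we are talking to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected in body
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as "not up".
        return False


if __name__ == "__main__":
    print("App is up" if app_is_up() else "App is not reachable")
```

If the script reports that the app is not reachable, check that the container is still running and that you passed the -p 5000:5000 flag.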
Wasn’t this easy? You could now deploy this Docker container to the cloud, and it will work like it did on your machine.
Today you’ve learned what Docker is and why it is useful in data science. You’ve also built your first container and verified it works. Docker is one of the easiest ways to deploy applications and machine learning models to production.
Knowing Docker is almost always a prerequisite for data science jobs. I’m not that big of a fan of data scientists doing DevOps, but learning the basics can’t hurt.
Are you a Data Scientist using Docker? Please let me know.