Build your first Container with this essential guide to Docker for Data Scientists
I’ve lost count of how many times my code broke when someone else ran it. The strange part was that it worked on my machine. That’s where Docker saves the day: if it works in a container on your machine, it will work the same way on any machine that runs Docker.
As of late 2020, knowing Docker is almost mandatory for data science jobs. No one says you should become an expert, but learning the basics can’t hurt. Today you’ll learn what Docker is and how to build your first container.
Docker is a tool that makes it easy to create, deploy, and run applications by using containers. You can package applications with their dependencies and deploy them as a single package.
Why should you care? Because “It works on my machine” doesn’t mean it will work on anyone else’s. With Docker containers, the application runs in the same environment everywhere, so what works on your machine will work on other machines too.
Think of a Docker container as a lightweight virtual machine that doesn’t ship its own operating system kernel. Containers share the kernel of the system they are running on. As a result, compared to a full virtual machine you get both better performance and smaller file sizes. Win-win.
Here’s a bit of Docker terminology you should know before starting:
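- Container: a running instance of an image; an isolated software unit that holds the code and its dependencies
- Image: a read-only template, containing the packaged code and dependencies, from which containers are started
- Dockerfile: a text file with the instructions used to build your images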
Let’s see how to build your first container.
How to build your first Docker Container
You’ll be surprised by how easy it is.
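To use Docker you’ll need to install it. Download Docker Desktop from the official Docker website, install it, and open the application.
Now create the following project structure anywhere on your computer: a project folder containing app.py, requirements.txt, and Dockerfile, plus a templates folder that holds index.html.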
Let’s start with what you’re familiar with: Python. The app.py file should contain the following code:
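from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def hello():
    # Render templates/index.html for the root URL
    return render_template('index.html')

if __name__ == '__main__':
    # 0.0.0.0 makes the app reachable from outside the container
    app.run(host='0.0.0.0', port=5000)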
The whole purpose of this file is to instantiate the Flask application and run it on localhost:5000. Once the app is opened in the web browser, the index.html template is shown.
Here’s what templates/index.html contains:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>My Python App</title>
</head>
<body>
<h1>Python App</h1>
<p>Docker is awesome!</p>
</body>
</html>
It’s a plain old boring file containing a heading and a paragraph. Still, enough to verify our app is working.
Next on the list is the requirements.txt file. It contains all the libraries needed for your app. We only need Flask, but the file can get much longer for real-world applications:
flask==1.1.2
It’s a good idea to pin the library version, so a newer release can’t silently change your app’s behavior in production.
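To see what pinning means in practice, here is a minimal Python sketch that reads pinned lines such as flask==1.1.2 and compares them with what is actually installed. The helper names parse_pin and check_pins are hypothetical, and the parser only understands the simple name==version form, not the full requirements file syntax.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_pin(line):
    """Split a 'name==version' line into (name, version); None if not pinned."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if "==" not in line:
        return None
    name, version = line.split("==", 1)
    return name.strip(), version.strip()


def check_pins(lines):
    """Return {name: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for line in lines:
        pin = parse_pin(line)
        if pin is None:
            continue
        name, pinned = pin
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical usage with the single pin from this article
    print(check_pins(["flask==1.1.2"]))
```

An empty dictionary means the environment matches the pins. Anything else lists the pinned version next to the installed one (or None if the package is missing).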
And finally, we have the Dockerfile (notice how it doesn’t have a file extension). This file is used to build Docker images. Here’s what you should put inside:
FROM python:3.8
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 5000
CMD python ./app.py
So, what’s going on here? Here’s an overview:
- FROM python:3.8: specifies that we want to use the official Python 3.8 Docker image as a base
- COPY . /app: copies our files to the /app folder inside the image
- WORKDIR /app: sets the working directory for the instructions that follow and for the running container
- RUN pip install -r requirements.txt: installs every library listed in the requirements.txt file
- EXPOSE 5000: documents that the container listens on port 5000 at runtime; the port is actually published with the -p flag of docker run
- CMD python ./app.py: specifies the command that runs our Python application when the container starts
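To make the line-by-line reading concrete, here is a minimal Python sketch that splits the Dockerfile from this article into instruction and argument pairs. The function name parse_dockerfile is hypothetical, and the parsing is deliberately simplified: it skips blank lines and comments and does not handle line continuations or the rest of the Dockerfile syntax.

```python
# The Dockerfile from this article, kept inline so the sketch is self-contained
DOCKERFILE = """\
FROM python:3.8
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 5000
CMD python ./app.py
"""


def parse_dockerfile(text):
    """Return a list of (instruction, arguments) tuples in file order."""
    steps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction, _, arguments = line.partition(" ")
        steps.append((instruction.upper(), arguments.strip()))
    return steps


if __name__ == "__main__":
    for instruction, arguments in parse_dockerfile(DOCKERFILE):
        print(f"{instruction:8} {arguments}")
```

The order of the pairs matters, because Docker processes the instructions from top to bottom when it builds the image.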
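And that’s it! You can now build and run the Docker image. Let’s build it first. From the Terminal, execute the following inside the project folder (the trailing dot tells Docker to use the current directory as the build context):
docker build --tag my-flask-app .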
This command will build an image called my-flask-app. To run it, you have to execute the following:
docker run -p 5000:5000 my-flask-app
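If you prefer to drive the build and run steps from Python, for example from a small automation script, the sketch below assembles the commands as argument lists for subprocess. The function names and the dry_run switch are hypothetical helpers, and actually running the commands assumes the docker CLI is installed and on your PATH.

```python
import shlex
import subprocess


def build_command(tag, context="."):
    """Argument list for `docker build --tag <tag> <context>`."""
    # "." means: use the current directory as the build context
    return ["docker", "build", "--tag", tag, context]


def run_command(tag, host_port=5000, container_port=5000):
    """Argument list for `docker run -p <host>:<container> <tag>`."""
    return ["docker", "run", "-p", f"{host_port}:{container_port}", tag]


def execute(command, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the two commands
    execute(build_command("my-flask-app"))
    execute(run_command("my-flask-app"))
```

Passing the command as a list avoids shell quoting problems, and the dry run default lets you inspect the commands before anything is built or started.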
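The -p 5000:5000 flag maps port 5000 on your machine to port 5000 in the container, so the app is now running on localhost:5000. Let’s verify everything is okay: open that address in your browser, and you should see the “Python App” heading with the “Docker is awesome!” paragraph.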
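The same check can be scripted. Here is a minimal sketch using only Python’s standard library. The function name app_is_up is hypothetical, and the expected text is the paragraph from the index.html template shown earlier.

```python
import urllib.error
import urllib.request


def app_is_up(url="http://localhost:5000/", expected="Docker is awesome!",
              timeout=5):
    """Return True if the URL answers with HTTP 200 and contains `expected`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected in body
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar failures
        return False


if __name__ == "__main__":
    print("App is up" if app_is_up() else "App is not reachable")
```

Run it while the container is up. If it reports that the app is not reachable, check that the container is still running and that the port mapping matches the URL.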
Wasn’t this easy? You could now deploy this Docker container to the cloud, and it would work like it did on your machine.
Conclusion
Today you’ve learned what Docker is and why it is useful in data science. You’ve also built your first app and verified it works. Docker is one of the easiest ways to deploy applications and machine learning models to production.
Knowing Docker is almost always a prerequisite for data science jobs. I’m not that big of a fan of data scientists doing DevOps, but learning the basics can’t hurt.
Are you a Data Scientist using Docker? Please let me know.