Configuring Ubuntu 18.04 + CUDA 10.0 + NVIDIA GPU For Deep Learning With Tensorflow & OpenCV Python Bindings

Tags: linux hardware python machine learning GPU Ubuntu OpenCV Deep Learning tensorflow

Guides Exist for Ubuntu 16.04; Less so for 18.04

This guide will essentially adapt existing guides for 16.04 and address the areas where those guides need to be altered or worked around to achieve the desired results. This setup isn't as well supported as 16.04, so if you're just looking for an easy way to get started with machine learning, you should probably look elsewhere. Here we're all about breaking things and figuring out how to fix them.

This guide draws especially heavily from one written by Adrian Rosebrock over at PyImageSearch.com. And if you're reading this post, you might also be interested in his latest book, Deep Learning For Computer Vision With Python.

Connecting To Your Development Machine

I'm going to assume you'll be interacting with your development machine remotely over SSH. This is better than running everything locally anyway, because the X11 server that runs your Graphical User Interface (GUI) can eff things up while you're setting up your development environment.

If you have no other choice but to run these commands locally, you can switch to another virtual console by pressing:

CTRL + ALT + F3

...logging in and then running:

sudo service lightdm stop

Updating Your Environment

Make sure your package lists are up-to-date:

sudo apt-get update && sudo apt-get upgrade -y

Installing Necessary Packages

This is the first area where we start to run into differences between configuring 16.04 and 18.04.

First, let's install all of the packages that install painlessly for both systems:

sudo apt-get install build-essential cmake git unzip pkg-config
sudo apt-get install libjpeg-dev libtiff5-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libhdf5-serial-dev graphviz
sudo apt-get install libopenblas-dev libatlas-base-dev gfortran
sudo apt-get install python-tk python3-tk python-imaging-tk

The following packages require a bit more work:

  • libjasper-dev
  • libpng12-dev

libjasper-dev for Ubuntu 18.04

Thanks to a handy Stack Exchange answer, we can fix this issue pretty easily:

cd ~/Downloads
wget http://security.ubuntu.com/ubuntu/pool/main/j/jasper/libjasper-dev_1.900.1-debian1-2.4ubuntu1.2_amd64.deb
wget http://security.ubuntu.com/ubuntu/pool/main/j/jasper/libjasper1_1.900.1-debian1-2.4ubuntu1.2_amd64.deb
sudo apt-get install ./libjasper-dev_1.900.1-debian1-2.4ubuntu1.2_amd64.deb ./libjasper1_1.900.1-debian1-2.4ubuntu1.2_amd64.deb

libpng12-dev for Ubuntu 18.04

This package has since been dropped. Instead, we use libpng-dev, which was conveniently already installed as a dependency of libgtk-3-dev.

Installing Python 2.7 and Python 3 Header Files

We'll be compiling OpenCV with Python bindings, so we'll need the corresponding development packages:

sudo apt-get install python2.7-dev python3-dev

Preparing NVIDIA Drivers

We'll need to swap out the default graphics driver, so let's install some packages in preparation for that change:

sudo apt-get install linux-image-generic linux-image-extra-virtual
sudo apt-get install linux-source linux-headers-generic

Now we need to disable the default Nouveau driver by creating a blacklist file. Open up the nano text editor with this command:

sudo nano /etc/modprobe.d/blacklist-nouveau.conf

To this file, add the following lines:

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

You can now close out the editor by hitting CTRL + X, being careful to select 'Y' when prompted to save your changes.

Now we disable Nouveau in the Kernel Mode Settings (KMS):

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

The above command edits the file without us having to open a text editor. The first part of the command, echo, prints everything after it, in this case options nouveau modeset=0. The vertical pipe character | in the middle feeds that output into the second part of the command, tee. We prefix tee with sudo because we need admin rights to edit the file located at /etc/modprobe.d/nouveau-kms.conf. The tee command splits whatever is fed to it between the given file and the terminal output, so we can see what it is doing. Here we've also included the -a flag, which instructs tee to append its input to the file rather than overwriting the file's contents.
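You can see tee's overwrite-versus-append behavior for yourself with a throwaway file; a quick sketch using a temp file rather than anything under /etc:

```shell
# Create a scratch file so we don't touch anything under /etc.
tmp=$(mktemp)

# Without -a, tee overwrites the file with its input.
echo "first line" | tee "$tmp" > /dev/null

# With -a, tee appends instead of overwriting.
echo "second line" | tee -a "$tmp" > /dev/null

cat "$tmp"    # prints both lines
rm "$tmp"
```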

Now we run the command that records our previous changes so they will be respected at boot time:

sudo update-initramfs -u

And now we can restart the machine:

sudo reboot

Downloading CUDA Toolkit

Now we download the CUDA Toolkit from the NVIDIA website. Because we like the pain and frustration of the bleeding edge, we'll be using CUDA Toolkit 10.0 so we'll select Linux > x86_64 > Ubuntu > 18.04 > deb (network).

If you've navigated to the website on a machine that isn't the one you're configuring, just copy the link address to the download file and use wget on your Ubuntu machine:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

Now we just install it:

sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-10.0

We need to update our PATH variable so the system knows where CUDA lives.

Note that I'm editing my ~/.profile file which only gets read when the user logs in. You could replace the ~/.profile part of the following command with ~/.bashrc which I imagine some people would recommend, but those people aren't me.

echo "export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}" >> ~/.profile
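The ${PATH:+:${PATH}} part of that line is a shell parameter expansion: it expands to a colon plus the old value only when PATH is already set and non-empty, so you never end up with a stray colon. A quick illustration with a hypothetical variable named DEMO:

```shell
# DEMO is a hypothetical variable used only for illustration.
unset DEMO
echo "new${DEMO:+:${DEMO}}"   # DEMO unset: expansion is empty, prints "new"

DEMO="old"
echo "new${DEMO:+:${DEMO}}"   # DEMO set: expansion is ":old", prints "new:old"
```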

To refresh the PATH variable in your current session, you can either close and reopen a new terminal session or use the source command:

source ~/.profile

You can check to make sure /usr/local/cuda-10.0/bin is included in your path variable by echoing it:

echo $PATH
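If you'd rather check from a script than eyeball the output, the usual trick is to wrap both the path list and the directory in colons and pattern-match; a sketch, run here against a canned sample string so it works on any machine:

```shell
# sample_path stands in for "$PATH"; swap it out in a real check.
sample_path="/usr/bin:/usr/local/cuda-10.0/bin:/bin"
dir="/usr/local/cuda-10.0/bin"

# Wrapping both sides in colons makes the match exact, so
# /usr/local/cuda-10.0/bin2 would not count as a hit.
case ":${sample_path}:" in
  *":${dir}:"*) echo "on PATH" ;;
  *)            echo "not on PATH" ;;
esac
```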

Reboot the machine once more so the new driver loads, then we can verify that things are working properly. Check that the NVIDIA Persistence Daemon is running:

sudo systemctl status nvidia-persistenced

It should say something like:

nvidia-persistenced.service - NVIDIA Persistence Daemon
  Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static; vendor preset: enabled)
  Active: active (running)

Now check the NVIDIA driver version:

cat /proc/driver/nvidia/version

You would expect something like:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.48  Thu Sep  6 06:36:33 CDT 2018
GCC version:  gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

Check the version of CUDA Toolkit:

nvcc -V

It should say something like:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
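If a provisioning script needs just the release number from that output, you can extract it with sed; a sketch, shown here against the sample line above so it runs even without nvcc installed:

```shell
# In a real check you'd pipe `nvcc -V` in; here we use the canned line.
sample='Cuda compilation tools, release 10.0, V10.0.130'

# Capture the digits-and-dots token that follows "release".
printf '%s\n' "$sample" | sed -n 's/.*release \([0-9.]*\),.*/\1/p'   # prints 10.0
```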

Now let's get the CUDA samples and make sure they run correctly. Change into whatever directory you'd like to download them to and run:

cuda-install-samples-10.0.sh .

There should now be a directory called NVIDIA_CUDA-10.0_Samples in your current directory. Let's cd into it and compile the samples:

Note that the make command uses a single CPU core by default but will go much faster if you tell it to use more. You can find out how many cores your system has available by using the command:

lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('

That will give you output like:

CPU(s):              12
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1

So in my make command I can say make -j6 to make things go much faster than if I had just executed make without any additional arguments.

cd NVIDIA_CUDA-10.0_Samples/
make
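If you'd rather not count cores by hand, GNU coreutils includes nproc, which prints the logical core count directly and can be passed straight to make; a sketch:

```shell
# nproc reports the number of logical cores available to this process.
cores=$(nproc)
echo "would run: make -j${cores}"
# make -j"${cores}"   # uncomment when inside the samples directory
```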

Once that's done, we'll execute the deviceQuery binary to check whether our GPU is working with CUDA. From the same directory you ran the make command, you can run it like this:

Note the period (.) at the beginning of the command.

./bin/x86_64/linux/release/deviceQuery

This command gives you a bunch of info and at the bottom it should hopefully say:

Result = PASS
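In a provisioning script you may want to check for that line automatically rather than scrolling through the output; a sketch that greps for it, shown here against a canned sample string standing in for the real deviceQuery output:

```shell
# sample stands in for the real `./deviceQuery` output.
sample="deviceQuery, CUDA Driver Version = 10.0
Result = PASS"

if printf '%s\n' "$sample" | grep -q "Result = PASS"; then
  echo "GPU check passed"
else
  echo "GPU check failed"
fi
```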

Installing cuDNN (CUDA Deep Neural Network library)

Downloading cuDNN requires the creation of an account on the developer.nvidia.com website so it's probably easier to download the files on the machine you're sshing from and then copy them over using scp.

The three downloads we're looking for are cuDNN v7.3.1 for CUDA 10.0. Specifically we'll be installing the runtime and development libraries plus the code samples and user guide so look for:

  • cuDNN v7.3.1 Runtime Library for Ubuntu18.04 (Deb)
  • cuDNN v7.3.1 Developer Library for Ubuntu18.04 (Deb)
  • cuDNN v7.3.1 Code Samples and User Guide for Ubuntu18.04 (Deb)

When you have all of those downloaded, you can copy them to your development machine. You'll need to replace <user> and <dev-machine-ip> with their appropriate values. Make sure you run this command from the same directory you downloaded the files to.

scp libcudnn7* <user>@<dev-machine-ip>:/home/<user>/Downloads/

Now, back on the development machine, let's install them. We copied them into the user's Downloads directory, so look for the files there. When you find them, install them like so:

sudo dpkg -i libcudnn7_7.3.1.20-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.3.1.20-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.3.1.20-1+cuda10.0_amd64.deb

We can verify the installation by running some sample code. First, we copy it over to our home directory:

cp -r /usr/src/cudnn_samples_v7/ $HOME

Now we can change into the directory and compile the samples. As before we can speed up the make command by giving it the number of CPU cores we have available (e.g. make -j4).

cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make

We then run the sample (again, note the period at the beginning):

./mnistCUDNN

If all went well, you should see at the bottom of the output:

Test passed!

Setting Up Our Virtual Python Environments

Back in our home directory, let's set up a virtual environment for our Python code to run in. This is standard practice since it separates our own Python libraries from the system-level ones and also helps us stay organized.

There are quite a few ways to do this, so feel free to use whatever method you want or do what I do:

cd ~
sudo apt-get install python3-venv
python3 -m venv venv/dl4cv

If you followed my way, we created our virtual environment in a directory called dl4cv, which you could have named whatever you like. That directory itself lives inside the venv directory in your home directory.

Now, whenever you want to work inside of your virtual environment, you'll need to activate it:

source venv/dl4cv/bin/activate

When activated, you'll notice that the name of your virtual environment appears in parentheses to the left side of your command prompt.
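You can also confirm from inside Python that the interpreter you're running belongs to a virtual environment rather than the system install; a small sketch:

```python
import sys

# In an activated venv (Python 3.3+), sys.prefix points at the
# environment directory while sys.base_prefix still points at the
# base system install; outside a venv the two are equal.
print(sys.prefix)
print(sys.prefix != sys.base_prefix)
```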

Once inside our virtual python environment, we'll be using the python package manager pip to install new python packages. First, let's upgrade it to the latest version:

pip install --upgrade pip

Next, let's install the python package NumPy which we'll need to compile and install OpenCV:

pip install numpy

We now need to download both opencv and opencv_contrib to our Downloads directory:

cd ~/Downloads
wget -O opencv.zip https://github.com/opencv/opencv/archive/3.4.3.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/3.4.3.zip

Unzip both of the archives we just downloaded:

unzip opencv.zip
unzip opencv_contrib.zip

Now we change into the ~/Downloads/opencv-3.4.3/ directory, create a new directory called build, and configure OpenCV inside of it:

cd ~/Downloads/opencv-3.4.3/
mkdir build && cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D WITH_CUDA=ON \
      -D INSTALL_PYTHON_EXAMPLES=ON \
      -D OPENCV_EXTRA_MODULES_PATH=~/Downloads/opencv_contrib-3.4.3/modules \
      -D BUILD_EXAMPLES=ON \
      -D BUILD_opencv_cudacodec=OFF ..

When it's done make note of the output, focusing on the regions as highlighted in the image below.

OpenCV Build Terminal Output

You'll want to make sure that both your Python and NumPy versions are the ones from your virtual environment.

If for some reason they aren't, you can always delete the build folder and repeat the steps above, making sure the commands are correct and that your virtual Python environment is activated before running cmake.

As an aside, I had originally tried to compile the cudacodec module as well, but it doesn't play well with CUDA 10 at compile time, which is why BUILD_opencv_cudacodec is set to OFF in this example.

Now we are ready to compile. This command again runs faster if you specify more cores, but if you run into errors, try running make clean and then rerunning make without specifying additional cores.

make -j4

Then we install and update our file links before returning to our home directory:

sudo make install
sudo ldconfig
cd ~

To make our OpenCV bindings available to our virtual python environment, we need to symbolically link them, which is kind of like copying but without the copying part:

cd ~/venv/dl4cv/lib/python3.6/site-packages/
ln -s /usr/local/lib/python3.6/site-packages/cv2.cpython-36m-x86_64-linux-gnu.so cv2.so
cd ~
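If the symlink idea is new, here's what ln -s does in miniature, using throwaway files in a temp directory; a sketch:

```shell
tmpdir=$(mktemp -d)
echo "hello from the original" > "$tmpdir/original.txt"

# ln -s TARGET LINK_NAME: the link is a pointer to the target,
# not a copy of it.
ln -s "$tmpdir/original.txt" "$tmpdir/link.txt"

cat "$tmpdir/link.txt"    # reads through the link to the original file
rm -r "$tmpdir"
```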

Time to test it out! Make sure you have your virtual Python environment activated, then just type python and hit Enter to start an interactive Python shell.

From there you can try importing the OpenCV library:

import cv2
cv2.__version__

No errors? Congrats!
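If a setup script needs to assert a minimum OpenCV version, comparing version strings lexically can mislead ("3.10" sorts before "3.4" as a string), so compare tuples of integers instead; a sketch using a canned version string so it runs even without cv2 installed:

```python
def version_tuple(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

version = "3.4.3"  # in practice you'd use cv2.__version__
print(version_tuple(version) >= version_tuple("3.4.0"))  # True
```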

We'll install a few more python packages before adding Tensorflow support:

pip install scipy matplotlib pillow
pip install imutils h5py requests progressbar2
pip install scikit-learn scikit-image

TensorFlow with GPU support can be pip installed for earlier versions of CUDA + cuDNN, but not for the latest versions that we've installed, so we'll need to build TensorFlow from source ourselves.

Let's start by downloading the source into our Downloads/ folder:

cd ~/Downloads
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

By default, we're on the master branch of the source code which is more likely to have bugs and stability issues. Let's switch to the latest release version:

git checkout r1.11

We can test out the code branch with Bazel, but first we'll need to install it. The dependencies come first:

sudo apt-get install openjdk-8-jdk

Then we can install Bazel itself:

cd ~/Downloads
wget https://github.com/bazelbuild/bazel/releases/download/0.18.0/bazel_0.18.0-linux-x86_64.deb
sudo dpkg -i bazel_0.18.0-linux-x86_64.deb

...and change back into our tensorflow directory to run the test (expect it to take a while):

bazel test -c opt -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/lite/...

You'll see plenty of WARNING messages, but we just want to make sure things don't completely go to shit. For me, my results were:

INFO: Build completed, 23 tests FAILED, 27775 total actions

The documentation offers no clues as to whether 23 failed tests is good or bad, so I'll just assume it's totally fine.

Now we run the configure script, which essentially surveys the user for the values to compile with. The things we care about are pointing it at the Python executable in our virtual environment, enabling CUDA support, and telling it the correct versions of our tools. We'll leave everything else at its default value.

So the only non-default answers we need to give are:

  1. Do you wish to build TensorFlow with CUDA support? [y/N]: Y
  2. Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0
  3. Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.3
  4. Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3

Now we make the package builder:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

...build the package:

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Then we install it using pip and change back to our home directory:

pip install /tmp/tensorflow_pkg/tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl
cd ~

We can test by opening up another python interactive shell and trying to import tensorflow:

import tensorflow
tensorflow.__version__

If you don't get any errors, you can move onto installing Keras:

pip install keras

If you open up another Python shell and import keras, you should get a message saying:

Using TensorFlow backend.

If you do, you're done! Go play with some deep learning models. Maybe start here?