March 7, 2019

custom Sagemaker algorithms

Sagemaker - the machine learning platform offered by AWS - has both prebuilt algorithms and a “bring-your-own” capability. To bring your own algorithm onto Sagemaker, you need to package the code in a Docker image. The Dockerfile below is one way to do that for a GPU-backed Tensorflow algorithm.

FROM nvidia/cuda:9.0-base-ubuntu16.04

LABEL maintainer="Jared Davis <[email protected]>"

ARG gitsha
ARG gituser
ARG gitproject
ARG gitserver

# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-command-line-tools-9-0 \
        cuda-cublas-9-0 \
        cuda-cufft-9-0 \
        cuda-curand-9-0 \
        cuda-cusolver-9-0 \
        cuda-cusparse-9-0 \
        curl \
        libcudnn7=7.2.1.38-1+cuda9.0 \
        libnccl2=2.2.13-1+cuda9.0 \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        python3 \
        python3-dev \
        rsync \
        software-properties-common \
        unzip \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# -y keeps apt-get from stopping at a confirmation prompt during the non-interactive build
RUN apt-get update && \
        apt-get install -y nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 && \
        apt-get update && \
        apt-get install -y libnvinfer4=4.1.2-1+cuda9.0

RUN ln -s $(which python3) /usr/bin/python && \
        curl -O https://bootstrap.pypa.io/get-pip.py && \
        python get-pip.py && \
        rm get-pip.py

RUN pip --no-cache-dir install \
        Pillow \
        h5py \
        ipykernel \
        jupyter \
        keras_applications \
        keras_preprocessing \
        matplotlib \
        numpy \
        pandas \
        scipy \
        scikit-learn \
        tensorflow-gpu==1.12 \
        "sagemaker-tensorflow>=1.12,<1.13"

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

WORKDIR /opt/program

ADD https://${gitserver}/${gituser}/${gitproject}/archive/${gitsha}.zip algorithm.zip
# GitHub-style archives unpack into a <project>-<sha> directory
RUN unzip algorithm.zip && \
        mv ${gitproject}-${gitsha} algorithm && \
        cd algorithm && \
        python setup.py
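
The four ARG values at the top of the Dockerfile have no defaults, so they have to be supplied at build time with --build-arg. As a rough sketch, a small Python helper (for example in a CI script) could assemble that command. The function name, image tag, user, and sha here are placeholder values for illustration, not anything taken from this project.

```python
import shlex


def docker_build_command(tag, gitserver, gituser, gitproject, gitsha, context="."):
    """Assemble a `docker build` command that fills the Dockerfile's four ARGs."""
    build_args = [
        ("gitserver", gitserver),
        ("gituser", gituser),
        ("gitproject", gitproject),
        ("gitsha", gitsha),
    ]
    command = ["docker", "build", "--tag", tag]
    for name, value in build_args:
        # Each ARG in the Dockerfile is set with one --build-arg name=value pair.
        command += ["--build-arg", "{}={}".format(name, value)]
    command.append(context)  # build context: the directory holding the Dockerfile
    return command


if __name__ == "__main__":
    cmd = docker_build_command(
        tag="my-algorithm:latest",   # hypothetical image tag
        gitserver="github.com",
        gituser="example-user",      # hypothetical account
        gitproject="algorithm",
        gitsha="0123abc",            # hypothetical commit
    )
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning the command as a list rather than a single string means it can be handed to subprocess.run without any shell quoting concerns.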

The image builds on NVIDIA’s CUDA 9.0 base image, so it relies on the NVIDIA Container Runtime for Docker on the host machine (in this case, a Sagemaker instance) to gain access to GPU resources. The Dockerfile also installs the stock Python 3 apt package for Ubuntu 16.04, Tensorflow 1.12 for the machine learning code, and the Sagemaker Tensorflow package to make use of additional Sagemaker APIs.

I’m also assuming in the Dockerfile that your actual Tensorflow algorithm lives in a separate git repository, identified by the gitserver, gituser, gitproject, and gitsha build arguments. An archive of that repo at the given commit is downloaded and unpacked into the container image on build. The algorithm code should also provide a setup.py script that prepares the project for the calls Sagemaker makes when it launches a training job.
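
For context, Sagemaker starts a training job by running the container with the command train, so an executable with that name has to be on the PATH (here, /opt/program). It passes hyperparameters in /opt/ml/input/config/hyperparameters.json, mounts input data under /opt/ml/input/data, and collects whatever is written to /opt/ml/model. Below is a minimal sketch of what such an entry point might look like, assuming setup.py is what puts it in place. The run_training and fit names are hypothetical and not part of any Sagemaker API.

```python
#!/usr/bin/env python
import json
import os
import traceback

PREFIX = "/opt/ml"  # root of the directory tree Sagemaker mounts into the container


def load_hyperparameters(prefix=PREFIX):
    """Read the job's hyperparameters; Sagemaker passes every value as a string."""
    path = os.path.join(prefix, "input", "config", "hyperparameters.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def run_training(prefix=PREFIX, fit=None):
    """Call fit(hyperparameters, data_dir, model_dir) and return a process exit code.

    `fit` is a hypothetical callable standing in for the real Tensorflow code.
    """
    data_dir = os.path.join(prefix, "input", "data")
    model_dir = os.path.join(prefix, "model")
    output_dir = os.path.join(prefix, "output")
    try:
        hyperparameters = load_hyperparameters(prefix)
        os.makedirs(model_dir, exist_ok=True)
        if fit is not None:
            fit(hyperparameters, data_dir, model_dir)
        return 0
    except Exception:
        # Sagemaker reports the contents of output/failure when a job fails.
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w") as f:
            f.write(traceback.format_exc())
        return 1


# The executable named `train` would end with something like:
#     sys.exit(run_training(fit=my_fit_function))
```

Keeping the prefix as a parameter makes it possible to exercise the entry point locally against a temporary directory before building the image.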

Content by © Jared Davis 2019-2020
