As GPU-accelerated computing becomes essential in AI, HPC, and scientific computing, developers increasingly turn to containers for reproducible, scalable, and efficient development environments. Docker, when paired with NVIDIA’s CUDA, provides a clean, consistent, and portable platform for building and deploying GPU-powered applications.
This guide walks you through how to compile and run CUDA code using nvcc inside a Docker container, leveraging NVIDIA's official CUDA images.
Prerequisites
Before you start, ensure you have the following installed:
- Docker Engine
- NVIDIA Container Toolkit (nvidia-docker)
Required to provide GPU access to Docker containers:
# For Debian/Ubuntu
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker
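On some setups you may also need to register the NVIDIA runtime with Docker explicitly before the verification step below will work. The nvidia-ctk helper ships with the toolkit; a typical sequence is:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker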
Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Example: CUDA Vector Addition Program
Create a simple CUDA program, vector_add.cu:
// vector_add.cu
#include <iostream>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) c[idx] = a[idx] + b[idx];
}

int main() {
    const int N = 512;
    size_t size = N * sizeof(float);

    // Allocate and initialize host buffers.
    float *h_a = new float[N];
    float *h_b = new float[N];
    float *h_c = new float[N];
    for (int i = 0; i < N; ++i) {
        h_a[i] = i;
        h_b[i] = i * 2;
    }

    // Allocate device buffers and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);
    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements.
    vector_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // Copy the result back (this memcpy waits for the kernel to finish).
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < 5; ++i)
        std::cout << h_a[i] << " + " << h_b[i] << " = " << h_c[i] << std::endl;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
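The listing above omits error handling for brevity. One common pattern, shown here only as a minimal sketch (CUDA_CHECK is an illustrative name, not part of the program above), is to wrap each CUDA API call and check the kernel launch:
// error checking sketch, for illustration only
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA API call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",          \
                         cudaGetErrorString(err), __FILE__, __LINE__);\
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc(&d_a, size));
//   vector_add<<<blocks, threads>>>(d_a, d_b, d_c, N);
//   CUDA_CHECK(cudaGetLastError());        // catches launch errors
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches runtime errors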
Create a Dockerfile
Here's a Dockerfile to build the CUDA code inside a container:
# Dockerfile
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04
RUN apt-get update && apt-get install -y build-essential
COPY vector_add.cu /workspace/vector_add.cu
WORKDIR /workspace
RUN nvcc -o vector_add vector_add.cu
CMD ["./vector_add"]
Build and Run the Container
Step 1: Build the image
docker build -t cuda-vector-add .
Step 2: Run with GPU access
docker run --rm --gpus all cuda-vector-add
Expected output:
0 + 0 = 0
1 + 2 = 3
2 + 4 = 6
3 + 6 = 9
4 + 8 = 12
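On a multi-GPU host you can also target a specific device instead of all, using the device selector syntax from Docker's --gpus documentation (device indices follow nvidia-smi ordering):
docker run --rm --gpus '"device=0"' cuda-vector-add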
Interactive Development Using Volumes
You can mount your source code dynamically for faster iteration:
docker run --rm -it --gpus all -v "$PWD":/workspace -w /workspace nvidia/cuda:12.3.2-devel-ubuntu22.04 bash
Inside the container:
nvcc -o vector_add vector_add.cu
./vector_add
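You can also compile and run in a single step without opening an interactive shell, for example:
docker run --rm --gpus all -v "$PWD":/workspace -w /workspace \
  nvidia/cuda:12.3.2-devel-ubuntu22.04 \
  bash -c "nvcc -o vector_add vector_add.cu && ./vector_add"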
Feature Comparison: Local vs. Docker-Based CUDA Development
| Feature | Native CUDA Development | Docker + CUDA Container |
| --- | --- | --- |
| Portability | Tied to local setup | Cross-platform and replicable |
| Isolation | Shared environment | Fully isolated and reproducible |
| Environment Setup Time | Manual (may vary by system) | One-time Dockerfile |
| Ease of Scaling to Cloud | Needs reconfiguration | Plug-and-play with container images |
| GPU Access | Direct | Requires nvidia-container-toolkit |
| Version Control of Toolchain | Manual version tracking | Fixed by Docker image tag |
Pro Tips
- Use nvidia/cuda:<version>-devel-ubuntu<version> for full development with nvcc.
- For runtime-only containers, use nvidia/cuda:<version>-runtime.
- Use .dockerignore to avoid copying unnecessary files.
- Consider using multi-stage builds to separate compilation and runtime for leaner images (see the sketch after this list).
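As a starting point for the multi-stage approach, a minimal sketch might look like this (image tags mirror the ones used earlier; adjust to your CUDA version):
# Dockerfile - illustrative multi-stage sketch
# Stage 1: compile with the full devel image
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04 AS build
WORKDIR /workspace
COPY vector_add.cu .
RUN nvcc -O2 -o vector_add vector_add.cu

# Stage 2: ship only the binary on the slimmer runtime image
FROM nvidia/cuda:12.3.2-runtime-ubuntu22.04
WORKDIR /workspace
COPY --from=build /workspace/vector_add .
CMD ["./vector_add"]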
Conclusion
Using Docker with nvcc is a powerful way to simplify your CUDA development workflow. It eliminates environment inconsistencies and provides a reproducible, scalable path from local development to deployment, whether on bare-metal servers, Kubernetes clusters, or the cloud.