As GPU-accelerated computing becomes essential in AI, HPC, and scientific computing, developers increasingly turn to containers for reproducible, scalable, and efficient development environments. Docker, when paired with NVIDIA’s CUDA, provides a clean, consistent, and portable platform for building and deploying GPU-powered applications.

This guide walks you through how to compile and run CUDA code using nvcc inside a Docker container, leveraging NVIDIA’s official CUDA images.

✅ Prerequisites

Before you start, ensure you have the following installed:

  • Docker Engine
  • NVIDIA Container Toolkit (formerly nvidia-docker)
    Required to provide GPU access to Docker containers:
# For Debian/Ubuntu (assumes NVIDIA's apt repository is already configured
# per the Container Toolkit installation guide)
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU access:

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

🧾 Example: CUDA Vector Addition Program

Create a simple CUDA program vector_add.cu:

// vector_add.cu
#include <cuda_runtime.h>   // explicit include; nvcc adds it implicitly
#include <iostream>

// Kernel: each thread adds one pair of elements (the bounds check
// covers the last, partially filled block).
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) c[idx] = a[idx] + b[idx];
}

int main() {
    const int N = 512;
    size_t size = N * sizeof(float);

    float *h_a = new float[N];
    float *h_b = new float[N];
    float *h_c = new float[N];

    for (int i = 0; i < N; ++i) {
        h_a[i] = i;
        h_b[i] = i * 2;
    }

    // Allocate device buffers for the inputs and the result
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    // Launch ceil(N / 256) blocks of 256 threads to cover all N elements
    vector_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // Copy the result back; this call also synchronizes with the kernel
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < 5; ++i)
        std::cout << h_a[i] << " + " << h_b[i] << " = " << h_c[i] << std::endl;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;

    return 0;
}
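
For brevity, the example skips error handling. Real CUDA code should check every runtime API call; below is a minimal, illustrative helper (the CUDA_CHECK name is our own convention, not part of the toolkit):

// Minimal error-checking helper (illustrative sketch)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap any CUDA runtime call; abort with a readable message on failure.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Usage: CUDA_CHECK(cudaMalloc(&d_a, size));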

🐳 Create a Dockerfile

Here’s a Dockerfile to build the CUDA code inside a container:

# Dockerfile
# The -devel image ships the full CUDA toolkit, including nvcc
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04

# build-essential supplies the host C++ compiler that nvcc drives
RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*

COPY vector_add.cu /workspace/vector_add.cu
WORKDIR /workspace

RUN nvcc -o vector_add vector_add.cu
CMD ["./vector_add"]

πŸ—οΈ Build and Run the Container

Step 1: Build the image

docker build -t cuda-vector-add .

Step 2: Run with GPU access

docker run --rm --gpus all cuda-vector-add

Expected output:

0 + 0 = 0
1 + 2 = 3
2 + 4 = 6
3 + 6 = 9
4 + 8 = 12
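
If you want a shell inside the built image instead of running the compiled binary, you can override the default command (bash ships with the ubuntu22.04 base):

docker run --rm -it --gpus all cuda-vector-add bash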

🧰 Interactive Development Using Volumes

You can mount your source code dynamically for faster iteration:

docker run --rm -it --gpus all -v "$PWD":/workspace -w /workspace nvidia/cuda:12.3.2-devel-ubuntu22.04 bash

Inside the container:

nvcc -o vector_add vector_add.cu
./vector_add
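
By default nvcc compiles for a generic set of GPU architectures. With recent toolkits, including the CUDA 12.3.2 image used here, you can target the GPU that is actually visible inside the container:

nvcc -arch=native -o vector_add vector_add.cu
./vector_add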

πŸ”§ Feature Comparison: Local vs. Docker-Based CUDA Development

Feature | Native CUDA Development | Docker + CUDA Container
------- | ----------------------- | -----------------------
Portability | Tied to local setup | Cross-platform and replicable
Isolation | Shared environment | Fully isolated and reproducible
Environment Setup Time | Manual (may vary by system) | One-time Dockerfile
Ease of Scaling to Cloud | Needs reconfiguration | Plug-and-play with container images
GPU Access | Direct | Requires nvidia-container-toolkit
Version Control of Toolchain | Manual version tracking | Fixed by Docker image tag

πŸš€ Pro Tips

  • Use nvidia/cuda:<version>-devel-ubuntu<version> for full development with nvcc.
  • For runtime-only containers, use nvidia/cuda:<version>-runtime-ubuntu<version> (or the even smaller -base variant).
  • Use .dockerignore to avoid copying unnecessary files.
  • Consider using multi-stage builds to separate compilation and runtime for leaner images, as sketched below.
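
Expanding on that last tip, here is a minimal multi-stage sketch: the -devel image compiles the binary, and only the far smaller -runtime image ships it.

# Dockerfile (multi-stage, illustrative sketch)
# Stage 1: compile with the devel image, which provides nvcc
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04 AS build
WORKDIR /workspace
COPY vector_add.cu .
RUN nvcc -o vector_add vector_add.cu

# Stage 2: ship only the compiled binary on the runtime image
FROM nvidia/cuda:12.3.2-runtime-ubuntu22.04
COPY --from=build /workspace/vector_add /usr/local/bin/vector_add
CMD ["/usr/local/bin/vector_add"]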

πŸ’‘ Conclusion

Using Docker with nvcc is a powerful way to simplify your CUDA development workflow. It eliminates environment inconsistencies and provides a reproducible, scalable path from local development to deployment β€” whether on bare-metal servers, Kubernetes clusters, or the cloud.
