Your homelab server has an NVIDIA GPU sitting idle in the PCIe slot. Docker containers run fine without it, but every AI query runs on CPU at a crawl, every Jellyfin stream pegs the processor for transcoding, and Frigate object detection chugs through frames at single-digit FPS.
The NVIDIA Container Toolkit fixes this. It exposes the GPU to Docker containers as a first-class resource: no privileged mode, no hand-written device mappings, no manually mounted driver libraries. Install the runtime, set a single Compose option, and your containers use the GPU as if they were running on bare metal.
This guide covers the complete setup from a fresh install through production workloads on Ubuntu/Debian and Proxmox LXC hosts.
Why GPU-Accelerate Docker Containers?
A GPU in a Docker workflow is not about gaming or rendering. In a homelab, the three workloads that benefit most are:
- AI inference — Ollama runs LLMs 5–10× faster on GPU than CPU. A $200 Tesla P4 or RTX 3060 handles 7B–13B parameter models comfortably.
- Media transcoding — Jellyfin (or Plex) uses NVIDIA NVENC for hardware-accelerated video encoding. A single transcode stream on CPU uses 60–80% of a modern processor; NVENC does it for under 5%.
- Computer vision — Frigate uses TensorRT or OpenCV with CUDA for real-time object detection on camera feeds. GPU detection slashes inference latency from 500ms to under 30ms per frame.
All three share the same underlying setup: the NVIDIA Container Toolkit. Once it is running, any Docker container can request GPU access with a single line in its Compose file.
Prerequisites
- Linux host (Debian 12, Ubuntu 24.04, or Proxmox LXC)
- NVIDIA GPU with NVENC support for media workloads (Pascal or newer, e.g. Tesla P4/P40, GTX 1650, RTX 2060+, A-series)
- NVIDIA proprietary driver installed (nvidia-smi works)
- Docker Engine 24+ with containerd
Verify the Driver
Run nvidia-smi on the host (not inside a container).
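The check is a single command. The second line is an optional compact view that prints one CSV row per GPU. The query fields are standard nvidia-smi fields, but this particular selection is only illustrative.

```shell
# Run on the host: driver banner, CUDA version and GPU table
nvidia-smi

# Optional: one CSV row per GPU (index, model, driver version, total VRAM)
nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv
```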
You should see the driver version, CUDA version, and your GPU:
```text
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.120      Driver Version: 550.120      CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================|======================|======================|
|   0  Tesla P4            Off  | 00000000:02:00.0 Off |                  Off |
| N/A   48C    P0    23W /  75W |     10MiB /  7680MiB |      0%      Default |
+-----------------------------------------------------------------------------+
```
If nvidia-smi returns a command-not-found or driver error, install the NVIDIA driver first and reboot so the kernel module loads.
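This is a minimal sketch that assumes a stock Debian 12 kernel with the contrib, non-free and non-free-firmware components already enabled in your apt sources. On Ubuntu, ubuntu-drivers can choose the recommended package for you. Proxmox hosts run their own kernel, so they need the matching Proxmox kernel headers in place of linux-headers-amd64.

```shell
# Debian 12 (stock kernel): headers for the module build, driver and firmware
sudo apt-get update
sudo apt-get install -y linux-headers-amd64 nvidia-driver firmware-misc-nonfree

# Ubuntu 24.04: let ubuntu-drivers pick the recommended driver instead
# sudo ubuntu-drivers install

# Afterwards, reboot (sudo reboot) so the nvidia kernel module loads,
# then run nvidia-smi again to confirm.
```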
Install the NVIDIA Container Toolkit
The toolkit provides the nvidia-container-runtime that Docker
uses to inject GPU devices into containers.
Add the Repository
Add NVIDIA's package signing key and the stable apt repository for the toolkit.
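These commands follow the pattern in NVIDIA's install guide at the time of writing. They import the signing key into a dedicated keyring, then add the repository list with a signed-by option that points at that key. If the URLs have moved, check the official instructions.

```shell
# Import NVIDIA's signing key into its own keyring file
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the stable repository, restricted to packages signed by that key
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```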
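Install the Toolkit and Configure Docker
Install the nvidia-container-toolkit package, then register its runtime with Docker and restart the daemon.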
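Below is a sketch of the install-and-configure sequence. nvidia-ctk runtime configure only edits /etc/docker/daemon.json. Docker does not load the new runtime until the daemon restarts.

```shell
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the nvidia runtime in /etc/docker/daemon.json, then reload Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```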
The configure step adds an nvidia entry under runtimes in /etc/docker/daemon.json. Restarting Docker makes the daemon pick it up.
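To confirm the change, inspect the file and ask Docker which runtimes it knows about. The exact JSON layout varies between toolkit versions, so the comments describe what to look for instead of quoting the file.

```shell
# Expect a "runtimes" object containing "nvidia" with
# "path": "nvidia-container-runtime"
sudo cat /etc/docker/daemon.json

# After the restart, the Runtimes line should include nvidia
docker info 2>/dev/null | grep -i 'runtimes'
```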
Verify the Installation
Run nvidia-smi in a throwaway container with the NVIDIA runtime and all GPUs attached.
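This mirrors the sample workload in NVIDIA's documentation. The plain ubuntu image ships no NVIDIA software of its own, so a working nvidia-smi shows that the toolkit injected the driver utilities.

```shell
# --rm removes the container when nvidia-smi exits
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```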
You should see the same nvidia-smi output inside the container.
This confirms the GPU is accessible within the container
namespace with no manual device mapping.
Docker Compose GPU Configuration
The modern way to expose GPUs in Docker Compose is the
deploy.resources.reservations.devices block. This works with
Compose v2.23+ and Docker Engine 24+.
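Here is a minimal test stack that assumes only the steps above. The service name gpu-test is arbitrary. The plain ubuntu image works because the toolkit injects nvidia-smi into it. count: all reserves every GPU, and an integer such as 1 reserves that many.

```shell
# Write a throwaway Compose file that reserves all GPUs for one service
cat > compose.yaml <<'EOF'
services:
  gpu-test:
    image: ubuntu
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or an integer such as 1
              capabilities: [gpu]
EOF

# Runs nvidia-smi once and prints its output; the container then exits
docker compose up
```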
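For a specific GPU in a multi-GPU system, replace count with device_ids and list GPU indexes or UUIDs as reported by nvidia-smi -L. The two keys are mutually exclusive.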
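As an example, this pins the test service to a single card. The index "1" is hypothetical, so use an index or UUID from nvidia-smi -L on your own host.

```shell
# device_ids replaces count; a GPU UUID works in place of the index
cat > compose.yaml <<'EOF'
services:
  gpu-test:
    image: ubuntu
    command: nvidia-smi -L
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
EOF

docker compose up
```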
To request specific driver capabilities, list them explicitly under capabilities.
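This sketch requests only what a typical CUDA job needs: compute for the CUDA libraries and utility for nvidia-smi. The service name cuda-job is illustrative.

```shell
cat > compose.yaml <<'EOF'
services:
  cuda-job:
    image: ubuntu
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, compute, utility]
EOF

docker compose up
```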
The capabilities list controls what is exposed:
- gpu: basic GPU device access
- utility: nvidia-smi and monitoring counters
- compute: CUDA compute API
- video: NVENC/NVDEC video encode/decode
- display: display output
For Jellyfin and Frigate, always include video so that the NVENC/NVDEC libraries are mounted into the container.
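If the media service is already defined in a compose.yaml, you can add the GPU reservation with a compose.override.yaml. docker compose merges that file automatically when both files sit in the same directory. This sketch assumes a service named jellyfin; rename it for Frigate.

```shell
# compose.override.yaml is read together with compose.yaml by default
cat > compose.override.yaml <<'EOF'
services:
  jellyfin:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu, video, compute, utility]
EOF

# Print the merged configuration to check the reservation landed on the service
docker compose config
```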
Real Workload: Ollama with GPU Acceleration
Ollama is the most straightforward GPU-accelerated workload for a homelab. It detects the GPU automatically when the NVIDIA runtime is configured.
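Here is a minimal Ollama service. It assumes the official ollama/ollama image, its default API port 11434 and a named volume for downloaded models. Adjust the ports and storage to suit your setup.

```shell
cat > compose.yaml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama   # model storage survives container rebuilds
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
volumes:
  ollama:
EOF

docker compose up -d
```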
Pull and run a model to confirm that inference actually runs on the GPU.
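The model tag below is only an example, and any model from the Ollama library works. The --verbose flag prints timing statistics, including the eval rate in tokens per second. That is the figure to compare against a CPU-only run.

```shell
# One-shot prompt; the model is pulled on first use. --verbose adds timing stats
docker exec ollama ollama run --verbose llama3.1:8b "Explain NVENC in one sentence."

# PROCESSOR should read "100% GPU" when the model fits entirely in VRAM
docker exec ollama ollama ps
```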
Check GPU utilization from the host while the model is generating tokens.
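nvidia-smi dmon streams per-GPU counters. The -s u option limits it to utilization (SM, memory, encoder, decoder), and -c stops it after a fixed number of one-second samples. That makes it safe to paste while a prompt runs in another terminal.

```shell
# 30 one-second samples; the sm column is CUDA compute utilization
nvidia-smi dmon -s u -c 30
```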
You should see 60–95% GPU compute utilization during token generation. Compare this to the same model running on CPU — typical speedup is 5–10× depending on the model size.
Real Workload: Jellyfin with NVENC Transcoding
Jellyfin uses NVIDIA NVENC for hardware-accelerated video
transcoding. This requires the video capability in addition
to gpu.
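Here is a minimal Jellyfin service. It assumes the official jellyfin/jellyfin image and a hypothetical media library at /srv/media on the host. The capabilities list adds video so the NVENC/NVDEC libraries are mounted, plus compute and utility for CUDA-based filters and nvidia-smi.

```shell
cat > compose.yaml <<'EOF'
services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    restart: unless-stopped
    ports:
      - "8096:8096"
    volumes:
      - ./config:/config
      - ./cache:/cache
      - /srv/media:/media:ro   # hypothetical library path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu, video, compute, utility]
EOF

docker compose up -d
```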
After deployment:
- Open Jellyfin admin dashboard → Playback → Transcoding
- Set Hardware acceleration to NVIDIA NVENC
- Select the GPU device from the dropdown
- Enable Hardware encoding and Hardware decoding
- Save and test by playing a 4K HEVC file through the web client — check the playback info for “Hardware: yes”
To verify that transcoding is GPU-accelerated, watch the GPU from the host while a transcode is running.
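First start a transcode, for example by forcing a lower bitrate in the player, then check from the host. The encoder.stats.sessionCount query field counts active NVENC sessions. If it stays at 0 while Jellyfin reports a transcode, encoding is still happening on the CPU.

```shell
# The process table at the bottom should list Jellyfin's ffmpeg
nvidia-smi

# Number of active NVENC encode sessions
nvidia-smi --query-gpu=encoder.stats.sessionCount --format=csv

# enc/dec columns show NVENC/NVDEC load over 10 one-second samples
nvidia-smi dmon -s u -c 10
```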
A single Tesla P4 handles 4–6 simultaneous 1080p transcodes or 2–3 4K → 1080p transcodes without breaking a sweat.
Real Workload: Frigate with TensorRT Detection
Frigate 0.15+ supports NVIDIA TensorRT for hardware-accelerated object detection. This dramatically reduces inference latency compared to the default CPU detector, and it gives NVIDIA card owners an alternative to OpenVINO or a Coral TPU.
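This is a sketch of the Frigate service. It assumes the -tensorrt image variant and the YOLO_MODELS variable, which Frigate releases of this era used to build a TensorRT engine on first start. Both have changed across versions, so check the Frigate docs for your release. The paths, shm_size and published port are placeholders.

```shell
mkdir -p config storage
cat > compose.yaml <<'EOF'
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable-tensorrt
    container_name: frigate
    restart: unless-stopped
    shm_size: "256mb"             # scale with camera count and resolution
    ports:
      - "8971:8971"
    environment:
      - YOLO_MODELS=yolov7-320    # engine generated on first start
    volumes:
      - ./config:/config
      - ./storage:/media/frigate
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, compute, video, utility]
EOF

docker compose up -d
```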
The Frigate config then needs a TensorRT detector and a model definition that matches the generated engine.
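The detector and model keys below follow the TensorRT example in Frigate's documentation and match the yolov7-320 engine from the service definition. Merge them into your existing config.yml next to your mqtt and cameras sections. The engine path in particular has moved between releases. The sketch writes the keys to a separate, hypothetically named file so that an existing config is not overwritten.

```shell
mkdir -p config
# Hypothetical snippet file: copy these keys into config/config.yml by hand
cat > config/tensorrt-detector.yml <<'EOF'
detectors:
  tensorrt:
    type: tensorrt
    device: 0            # first GPU

model:
  path: /config/model_cache/tensorrt/yolov7-320.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 320
  height: 320
EOF
```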
With GPU detection, frame processing goes from 2–3 FPS on CPU to 25–30 FPS on a Tesla P4 — the difference between missing motion events and catching everything.
Proxmox LXC Considerations
Running Docker inside a Proxmox LXC requires an extra step to pass the GPU into the container. With the NVIDIA driver already working on the Proxmox host, add the NVIDIA device nodes to the LXC config file (/etc/pve/lxc/<ID>.conf).
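This sketch targets a cgroup v2 Proxmox host, the default since Proxmox VE 7. Major number 195 is fixed for the main NVIDIA devices. The nvidia-uvm major is assigned dynamically, so the 508 below is only a placeholder; read the real value from ls -l. Container ID 101 is hypothetical.

```shell
# On the Proxmox host. In "crw-rw-rw- 1 root root 195, 0 ...", 195 is the major
ls -l /dev/nvidia*

CTID=101   # hypothetical container ID
cat >> "/etc/pve/lxc/${CTID}.conf" <<'EOF'
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
EOF
```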
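Then run pct reboot <ID> to apply. After the reboot, install the NVIDIA user-space driver inside the LXC, followed by the Container Toolkit as described above. The driver must be the same version as on the host and installed without the kernel module, because the container shares the host's kernel module.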
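Inside the LXC only the user-space driver is installed, and its version must match the host exactly. This sketch uses NVIDIA's .run installer with a placeholder version number. The no-cgroups setting is the one NVIDIA documents for rootless setups, and unprivileged LXCs commonly need it as well.

```shell
# Inside the LXC. Placeholder: set this to the host's nvidia-smi driver version
DRIVER_VERSION=550.120
curl -fLO "https://us.download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

# User-space components only; the kernel module comes from the host
sudo sh "NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" --no-kernel-module

# After installing the Container Toolkit as on a normal host, stop it from
# managing device cgroups, which an unprivileged LXC does not permit
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
sudo systemctl restart docker
```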
Troubleshooting
“No available runtime” or “Unknown runtime nvidia”
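The Docker daemon does not see the NVIDIA runtime. Check which runtimes Docker has registered, re-run nvidia-ctk runtime configure --runtime=docker, and restart Docker.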
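A quick check and fix, assuming Docker is managed by systemd. If nvidia is missing from the Runtimes line, re-register it and restart the daemon.

```shell
# Expect "nvidia" somewhere on this line
docker info 2>/dev/null | grep -i 'runtimes'

# (Re)register the runtime and reload the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```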
“could not select device driver” error
The Compose file references driver: nvidia but the runtime is
not registered. This is the same fix as above — run
nvidia-ctk runtime configure and restart Docker.
Container sees no GPU despite runtime
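Start with the host side. The nvidia kernel module must be loaded, the /dev/nvidia* device nodes must exist, and nvidia-container-cli info must list the GPU.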
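Here is one way to run those checks from the host. The container name ollama is only an example, so substitute any GPU service you run.

```shell
# Kernel module loaded and device nodes present?
lsmod | grep '^nvidia'
ls -l /dev/nvidia*

# The toolkit's view of the driver and GPUs (needs root)
sudo nvidia-container-cli info

# The container's view; "ollama" is an example container name
docker exec ollama nvidia-smi -L
```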
Fabric Manager only matters on NVSwitch-based systems such as HGX/DGX A100 and H100 platforms. PCIe cards like the P40, A2, or T4 do not use it.
NVENC not available in Jellyfin
The video capability must be in the capabilities list, and the GPU must support NVENC. Check your card against NVIDIA's Video Encode and Decode GPU Support Matrix, or run a short test encode inside the Jellyfin container.
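A practical test is to encode one second of generated video with NVENC, using the ffmpeg build shipped in the official Jellyfin image (the path below assumes that image). A clean exit means NVENC works. An NVENC error such as "No capable devices found" means it does not.

```shell
# Encode 1 s of a generated 720p test pattern with NVENC and discard the output
docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner \
  -f lavfi -i testsrc=duration=1:size=1280x720:rate=30 \
  -c:v h264_nvenc -f null -

# Repeat with -c:v hevc_nvenc to check HEVC encode support
```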
Some GPUs have no NVENC at all, for example the GT 1030 and compute-only data-center cards such as the A100, and cannot accelerate encoding. If your card lacks NVENC, use the Intel QuickSync path instead.
Summary
The NVIDIA Container Toolkit transforms a homelab GPU from an idle PCIe occupant into a shared accelerator for every container on the host. The setup comes down to four steps:
- Install nvidia-container-toolkit
- Configure the Docker runtime
- Add deploy.resources.reservations.devices to your Compose files
- Deploy GPU-accelerated Ollama, Jellyfin, or Frigate
No Kubernetes, no custom images, no privileged containers. Just a GPU that actually earns its slot in the case.