Author: Igor Khozhanov
Contact: khozhanov@gmail.com
Copyright: © 2026 Igor Khozhanov. All Rights Reserved.
Note for Reviewers: This repository is currently under active development. The pipeline is being implemented in stages to ensure memory safety and to verify zero-host-copy behavior at each stage.
| Module / Stage | Status | Notes |
|---|---|---|
| FFMpeg Source | ✅ Stable | Handles stream connection and packet extraction. |
| Stub Detector | ✅ Stable | Pass-through module, validated for pipeline latency profiling. |
| Output / NVJpeg | ✅ Stable | Saves frames from GPU memory to disk as separate *.jpg images. |
| Inference Pipeline | ✅ Stable | Connects all the stages together. |
| ONNX Detector | 🛠 Integration | Implemented with Ort::MemoryInfo for Zero-Copy input (see the first sketch below the table). |
| TensorRT Detector | 🛠 Integration | Engine builder & enqueueV3 implemented; dynamic shapes supported (see the second sketch below the table). |
| Object Tracker | 🚧 WIP | Kernels for position prediction & IOU matching. |
| Post-Processing | 🚧 WIP | Custom CUDA kernels for YOLOv8 decoding & NMS. |
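For reviewers, here is a minimal sketch of the Ort::MemoryInfo zero-copy pattern referenced above. The function and buffer names (`MakeDeviceInput`, `d_frame`) are illustrative assumptions, not the repository's actual API:

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>

// Sketch: wrap a frame that already lives in VRAM as an ORT tensor without
// copying it to the host. `d_frame` (a float32 NCHW buffer on GPU 0) and
// `MakeDeviceInput` are illustrative names, not the repository's API.
Ort::Value MakeDeviceInput(float* d_frame, int64_t batch, int64_t h, int64_t w) {
    // Describe the buffer as CUDA device memory; with the CUDA execution
    // provider active, ORT consumes the pointer directly instead of copying.
    Ort::MemoryInfo cuda_info("Cuda", OrtDeviceAllocator, /*device_id=*/0,
                              OrtMemTypeDefault);
    const std::array<int64_t, 4> shape{batch, 3, h, w};
    const size_t count = static_cast<size_t>(batch) * 3 * h * w;
    return Ort::Value::CreateTensor<float>(cuda_info, d_frame, count,
                                           shape.data(), shape.size());
}
```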
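Likewise, a sketch of the enqueueV3 path with dynamic shapes. The tensor names ("images", "output0") and the 640×640 input size are assumptions for illustration, not the repository's actual bindings:

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>

// Sketch of the TensorRT 10 enqueueV3 path with a dynamic batch dimension.
bool RunBatch(nvinfer1::IExecutionContext* ctx, cudaStream_t stream,
              void* d_input, void* d_output, int batch) {
    // Dynamic shapes: fix the runtime batch size before enqueueing.
    if (!ctx->setInputShape("images", nvinfer1::Dims4{batch, 3, 640, 640}))
        return false;
    // Bind device pointers directly; no host staging buffers are involved.
    ctx->setTensorAddress("images", d_input);
    ctx->setTensorAddress("output0", d_output);
    // Launch asynchronously on the pipeline's CUDA stream.
    return ctx->enqueueV3(stream);
}
```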
This project implements a high-performance video inference pipeline designed to minimize CPU-GPU bandwidth usage. Unlike standard OpenCV implementations, this pipeline keeps data entirely in VRAM (Zero-Host-Copy) from decoding to inference.
The current build verifies the Decoding, Memory Allocation, and Data Saving stages.
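As a rough illustration of the zero-host-copy contract (a hypothetical type, not the repository's actual interface), stages exchange device pointers rather than host buffers:

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Conceptual sketch: a frame is a device pointer plus metadata. Stages hand
// this struct to each other, so pixel data never crosses the PCIe bus and
// the CPU only orchestrates kernel launches and stream synchronization.
struct GpuFrame {
    uint8_t*     data   = nullptr;  // device pointer into a VRAM pool
    int          width  = 0;
    int          height = 0;
    size_t       pitch  = 0;        // row stride of the decoded surface
    cudaStream_t stream = nullptr;  // stage work is ordered on this stream
};
```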
- ✅ Linux x64 (Verified on Ubuntu 24.04 / RTX 3060 Ti)
- 🚧 Windows 10/11 (Build scripts implemented, pending validation)
- 🚧 Nvidia Jetson Orin (CMake configuration ready, pending hardware tests)
Note: The CMakeLists.txt contains specific logic for vcpkg (Windows) and aarch64 (Jetson), but these targets are currently experimental.
- CMake 3.19+
- CUDA Toolkit (12.x)
- TensorRT 10.x+
- FFmpeg: Required.
  - Linux Users: Install via package manager or build from source with `--enable-shared`.
- NVIDIA cuDNN: Required by the ONNX Runtime CUDA provider.
  - Note: Ensure `libcudnn.so` is in your `LD_LIBRARY_PATH` or installed system-wide.
Build:

```bash
mkdir build && cd build
cmake ..
make -j$(nproc)
```

Run on a sample video:

```bash
./ZeroCopyInference -i ../video/Moving.mp4 --backend stub -b 16 -o Moving
```

or via Docker:

```bash
docker run --rm --gpus all \
  -v $(pwd)/video:/app/video \
  ghcr.io/igkho/zerohostcopyinference:main \
  -i video/Moving.mp4 --backend stub -b 16 -o video/output
```

Run the test suite:

```bash
./ZeroCopyInferenceTests
```

or via Docker:

```bash
docker run --rm --gpus all \
  --entrypoint ./build/ZeroCopyInferenceTests \
  ghcr.io/igkho/zerohostcopyinference:main
```

Infrastructure overhead measured on NVIDIA RTX 3060 Ti (1440p video):
| Metric | Result | Notes |
|---|---|---|
| Max Pipeline Capacity | ~300 FPS (no model) | Measured with the stub/pass-through detector. Represents the I/O ceiling (Decode → GPU Memory → Encode) before adding model latency. |
| I/O Latency | ~3.3 ms | Time spent on non-inference tasks, leaving 13+ ms (at 60 FPS) purely for AI models. |
| CPU Usage | Low | Zero-Host-Copy ensures CPU only handles orchestration, not pixels. |
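For context, the 13 ms figure follows directly from the 60 FPS frame budget:

$$
\frac{1000\,\text{ms}}{60\,\text{frames}} \approx 16.7\,\text{ms per frame}, \qquad 16.7\,\text{ms} - 3.3\,\text{ms} \approx 13.4\,\text{ms available for the model}
$$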
The source code of this project is licensed under the MIT License. You are free to use, modify, and distribute this infrastructure code for any purpose, including commercial applications.
While the C++ pipeline code is MIT-licensed, the models you run on it may have their own restrictive licenses.
- Example: If you use YOLOv8 (Ultralytics) with this pipeline, be aware that YOLOv8 is licensed under AGPL-3.0.
- Implication: Integrating an AGPL-3.0 model may legally require your entire combined application to comply with AGPL-3.0 terms (i.e., open-sourcing your entire project).
User Responsibility: This repository provides the execution engine only. No models are bundled. You are responsible for verifying and complying with the license of any specific ONNX/TensorRT model you choose to load.