Commit 31dcfde

V3.8.2 Update (#87)
* Update Model List v3.8.2
  Re-added the ONNX model that was missing from the depth inference list.
* Add files via upload
  Adapters for DAV3 and Video Depth Anything depth models for integration
* Add files via upload
  Video Depth Anything backend
* Add files via upload
* Add files via upload
  Config files for DAV3
* Add files via upload
  Model backend for integration
* Add files via upload
  DAV3 model backend
* Add files via upload
  Windows Updater for latest release
* v3.8.2 - Main GUI & Workflow Improvements
  ### Main GUI & Workflow Improvements
  - Renamed **Depth Estimation** tab to **Depth Engine** to reflect multi-backend depth processing.
  - Added native DA3 and Video Depth Anything engines directly into the unified depth selector.
  - Improved model list consistency so UI options always match available backends.
  - Added clearer ONNX model identification in the console during load.
  - Fixed mismatched slider labels and tooltips in the 3D Generator tab.
  - Reworked **Encoding Settings** dialog layout for cleaner spacing and readability.
  - Moved **Clip Range** controls into Processing Options with translated labels and tooltips.
  - Added optional **Convergence Crosshairs** overlay in Preview GUI for faster tuning.
  - Fixed File menu actions failing to trigger dialogs (Load Preset and Output Path).
  - Simplified File menu by removing redundant Save/Load Settings in favor of presets.
  - Integrated built-in **VisionDepth3D Updater** accessible from Help → Check Updates.
  - Added confirmation prompt before launching updater for safe auto-closing behavior.
  - Reduced console warning spam for cleaner runtime output.
* v3.8.2 - Depth Estimation Improvements & Fixes
  ### Depth Estimation Improvements & Fixes
  - Introduced native **Depth Anything 3 (DA3)** backend with full integration into image and video workflows.
  - Added native **Video Depth Anything (VDA)** backend with sequence-aware temporal inference.
  - Unified DA3, VDA, ONNX, and Hugging Face models under a single depth engine pipeline.
  - Normalized all depth outputs into a consistent 0–1 range for reliable blending and 3D rendering.
  - Added warm-up passes for DA3 and VDA to eliminate first-frame hitching.
  - Improved batching support and fallback handling for multi-frame depth inference.
  - Added configurable target FPS control for VDA to reduce inference load on high-FPS sources.
  #### ONNX Stability & Model Fixes
  - Fixed Distill-Any-Depth ONNX models failing due to tensor shape mismatches.
  - Enforced correct 518×518 inference resolution for Distill-Any-Depth models.
  - Added automatic ONNX model detection and resolution enforcement.
  - Switched ONNX preprocessing to aspect-ratio-preserving padding instead of stretching.
  - Enabled safe ONNX Runtime graph optimizations for improved stability and performance.
  - Fixed ONNX warm-up errors and broadcast failures.
  #### Video Depth Handling Improvements
  - Fixed letterbox (black bar) regions incorrectly affecting depth inference.
  - Improved multi-frame letterbox detection to prevent flicker.
  - Filled letterbox areas with neutral depth to prevent pop-out artifacts and white banding.
  #### Performance Optimizations
  - Removed redundant image resizing during video inference.
  - Consolidated resizing into a single pass per frame.
  - Enabled CUDA `channels_last` memory layout for supported Hugging Face models.
  - Improved FP16 inference handling for faster CUDA performance.
  - Optimized ONNX session configuration to reduce memory overhead.
  - Improved batch handling to reduce per-frame processing cost.
  - Reduced console warning spam for cleaner runtime output.
* v3.8.2 Preview GUI & Live View Improvements
  - Added optional **Convergence Crosshairs overlay** to the Preview GUI for faster and more precise convergence tuning.
  - Significantly improved real-time Preview GUI smoothness by resetting render state between sessions to prevent drift and jitter.
  - Eliminated “settling” artifacts at the start of previews by reinitializing depth normalization and convergence trackers per render.
  - Improved floating window behavior during the first frames of preview playback for more stable stereo alignment.
  - Increased live preview FPS by reducing GPU memory churn and reusing persistent buffers.
  - Reduced preview stutter caused by warm-up spikes and redundant tensor allocations.
  - Improved frame pacing for smoother SBS output during live preview.
  - Enhanced stability when mixing screen capture with GPU depth inference.
* v3.8.2 - VD3D Live 3D Performance & Stability Upgrades
  - Major real-time performance boost across GPUs, with live 3D preview running approximately **40 to 70 percent faster** depending on resolution and hardware.
  - Eliminated frequent GPU memory reallocations by introducing persistent CUDA buffers for depth inference and stereo rendering.
  - Smoother live depth updates through optimized GPU tensor reuse and reduced CPU-to-GPU transfer overhead.
  - Added independent **Depth FPS control**, allowing depth inference to run at a lower rate than preview rendering for better responsiveness and stability.
  - Reduced temporal jitter in live depth maps using improved EMA smoothing while preserving depth responsiveness.
  - Minimized preview hitching caused by first-frame warm-up and inference spikes.
  - Improved frame pacing for more consistent SBS output in live mode.
  - Increased stability when combining screen capture with GPU depth inference.
* v3.8.2 - 3D Generator Performance, Stability & Quality Improvements
  - Significantly smoother offline and real-time 3D rendering by fully resetting internal render state at the start of each render session.
  - Eliminated temporal drift, convergence carry-over, and accumulated smoothing artifacts between consecutive renders.
  - Improved depth range calibration per clip with fresh percentile normalization for more consistent parallax response.
  - Stabilized floating window behavior and convergence transitions during the first frames of each render.
  - Increased real-time preview FPS and reduced jitter across long renders.
  - Fixed output sizing across all 3D modes including:
    - VR formats
    - Passive Interlaced displays
    - Single-eye exports
  - Corrected floating window calculations to operate per-eye instead of full SBS width.
  - Added safety resizing to guarantee final encoded frames always match target output resolution.
  - Added optional **Convergence Crosshairs overlay** in the Preview GUI for faster and more precise tuning.
  - Cleaned up UI inconsistencies:
    - Foreground and Background shift labels now match their actual sliders
    - Tooltips correctly reflect each control’s function
  - Reworked Encoding Settings layout for better readability and workflow.
  - Moved Clip Range controls into Processing Options for a cleaner main interface.
  - Fixed File menu actions:
    - Preset loading now works correctly from the dropdown
    - Output path dialog now opens properly from both menu and hotkey
  - Removed redundant Save/Load Settings in favor of streamlined Preset workflow
* Revise changelog for VisionDepth3D v3.8.2 release
  Updated changelog for VisionDepth3D v3.8.2 with performance improvements, new depth engines, and various fixes.
* Update requirements.txt with new packages
  Added new dependencies to requirements.txt for additional functionality.
* Update copyright year in LICENSE.txt
* Delete presets/Best3DSettings.json
  old
* Delete presets/balanced_depth.json
  old
* Add files via upload
1 parent e2987aa commit 31dcfde

171 files changed

Lines changed: 16338 additions & 575 deletions

Changelog.md

Lines changed: 138 additions & 42 deletions
@@ -1,68 +1,164 @@
-# VisionDepth3D v3.8 – Changelog
+# VisionDepth3D v3.8.2 - Changelog
+
+---
+
+> This release delivers major performance improvements to both live 3D preview and offline rendering, alongside new depth engines, stability fixes, and encoding reliability upgrades.

 ---

 ## 1) Depth Estimation Tab

-### Depth Models
+### UI Depth Tab Labelling

-- Fixed ONNX model loading:
-  - Distill-Any-Depth (inference resolution 518×518, batch size 8)
-  - Video Depth Anything (inference resolution 512×288, batch size 8)
-- Implemented LBM depth model (development version). Thanks to Aether for the implementation fix.
-- Removed depth models from the dropdown that returned no `d_type`.
-- Fixed Hugging Face model downloads and caching so zoo models consistently save inside the app `weights/` directory (no more extra `.cache` downloads).
-- Updated Transformers image processor loading to prefer `use_fast=True` when available (with automatic fallback when unsupported).
+- Renamed the Depth Estimation tab to **Depth Engine** to better reflect multi-backend depth processing.
+- Reduced console warning spam related to sequential pipeline usage by suppressing the specific Hugging Face warning message.

-### Depth Backend
+### Depth Anything 3 (DA3) Adapter Integration

-- Implemented temporal smoothing in the depth pipeline to reduce flicker and improve temporal stability of depth map output.
-- Packaged VisionDepth3D.exe with Distill-Any-Depth (ONNX), Video Depth Anything (ONNX), and Depth Anything v2 Giant weights.
+- Added native Depth Anything 3 backend support via a dedicated DA3 adapter (separate from Hugging Face pipeline models).
+- Implemented DA3 model loading through Hugging Face `from_pretrained` with VD3D cache routing into the `weights/` directory.
+- Added DA3 model entries to the model selector (DA3-SMALL / BASE / LARGE / GIANT and DA3METRIC variants).
+- Wired DA3 inference into the unified depth pipeline so it works with both image and video depth workflows.
+- Mapped the UI “Inference Resolution” dropdown into DA3’s `process_res` logic (single max-side target resolution), with a video-friendly cap applied to prevent excessive internal upscaling.
+- Normalized DA3 depth outputs into a consistent 0–1 range to match existing VD3D depth handling and export logic.
+- Depth polarity handling for DA3 metric models remains user-controlled via the “Invert Depth” toggle.
+- Improved DA3 batching compatibility by supporting list-of-PIL inference and ensuring returned depth frame counts match input batch size (with a per-image fallback if needed).
+- Added a DA3 warm-up pass during model load to reduce first-frame hitching and confirm the backend is initialized correctly.
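The DA3 entries above describe two small transformations: collapsing the UI "Inference Resolution" value into a single max-side `process_res` target (capped for video) and min-max normalizing the output into the 0–1 range. The pure-Python sketch below shows one way to do both. It is an illustration under assumptions, not VD3D's adapter: the `"WIDTHxHEIGHT"` dropdown format, the cap of 1024 and both function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical cap for video input; the changelog does not state the real value.
VIDEO_MAX_SIDE_CAP = 1024


def ui_resolution_to_process_res(ui_value: str, is_video: bool,
                                 cap: int = VIDEO_MAX_SIDE_CAP) -> int:
    """Collapse a hypothetical 'WIDTHxHEIGHT' dropdown value into one max-side target."""
    parts = ui_value.lower().replace("×", "x").split("x")
    width, height = (int(part) for part in parts)
    max_side = max(width, height)
    # Video input is clamped so the backend never works above the cap.
    return min(max_side, cap) if is_video else max_side


def normalize_depth(depth: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max normalize a 2D depth map into the 0-1 range."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no relative depth information.
        return [[0.0 for _ in row] for row in depth]
    return [[(value - lo) / span for value in row] for row in depth]
```

Min-max scaling is the simplest way to reach a shared 0–1 range; a percentile-based variant is more robust to outlier pixels but needs extra state.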

----
+### Video Depth Anything (VDA) Adapter Integration

-## 2) 3D Render Tab
+- Added native **Video Depth Anything** backend support via a dedicated VDA adapter for sequence-based video depth inference.
+- Implemented VDA model loading directly from Hugging Face repositories (e.g. `depth-anything/Video-Depth-Anything-*`) with automatic checkpoint download and caching.
+- Integrated VDA into the unified depth pipeline so it can be selected and used alongside DA3, ONNX, and Hugging Face depth models.
+- Enabled sequence-aware inference for video input, allowing VDA to process temporal frame batches instead of independent per-frame depth estimation.
+- Added configurable target FPS handling for VDA to reduce inference load on high-FPS sources by running depth inference at a lower temporal rate.
+- Ensured VDA output depth frames are normalized into VD3D’s standard 0–1 depth range for compatibility with existing export, blending, and 3D rendering logic.
+- Wired VDA output into the same post-processing, temporal normalization, and letterbox-handling pipeline used by other depth engines.
+- Added VDA model warm-up during load to verify backend initialization and reduce first-inference latency.
+- Depth polarity for VDA models remains user-controlled via the existing “Invert Depth” toggle for consistency across all depth engines.
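Running VDA at a lower target FPS can be pictured as choosing which source frames get a fresh depth inference and letting the frames in between reuse the most recent result. This sketch assumes a simple stride over the source timeline and a hold-last strategy; both helpers are hypothetical, and the actual VDA integration may sample or interpolate differently.

```python
import math
from typing import List


def select_inference_frames(num_frames: int, src_fps: float,
                            target_fps: float) -> List[int]:
    """Pick the source frame indices that get a fresh depth inference."""
    if target_fps <= 0 or target_fps >= src_fps:
        return list(range(num_frames))  # no subsampling needed
    step = src_fps / target_fps  # source frames per inferred frame
    indices: List[int] = []
    k = 0
    while True:
        index = math.floor(k * step)
        if index >= num_frames:
            break
        indices.append(index)
        k += 1
    return indices


def hold_last_mapping(num_frames: int, inferred: List[int]) -> List[int]:
    """For every source frame, the position in `inferred` whose depth it reuses."""
    mapping: List[int] = []
    j = 0
    for i in range(num_frames):
        # Advance to the most recent inferred frame at or before frame i.
        while j + 1 < len(inferred) and inferred[j + 1] <= i:
            j += 1
        mapping.append(j)
    return mapping
```

For a 60 fps source and a 30 fps target, only every second frame is inferred, roughly halving the depth workload at the cost of coarser temporal detail.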

-### UI Fixes
+### ONNX Model Fixes & Stability Improvements

-- Added buttons for encoder settings and processing options.
-- Implemented multi-language support and tooltips for new dialog boxes.
-- Adjusted preview image window size and video info layout to prevent window overflow.
-- 3D tab columns now stack correctly when resizing the window on smaller screens.
+- Fixed Distill-Any-Depth ONNX models (Small / Base / Large) failing to run due to internal tensor shape mismatch.
+- Distill-Any-Depth ONNX models now correctly use a fixed 518×518 inference size, matching their exported positional embedding grid.
+- Added automatic detection for Distill-Any-Depth ONNX models and enforced fixed input resolution internally.
+- Updated ONNX image preprocessing to preserve aspect ratio using padding instead of stretching, improving depth stability and quality on widescreen content.
+- ONNX warm-up now succeeds reliably for Distill-Any-Depth models without broadcast or Add-node errors.
+- Enabled safe ONNX Runtime graph optimizations to reduce unnecessary memory copies and warning spam.
+- Added clearer ONNX model identification output in the console so users can see exactly which ONNX model is being loaded.
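Aspect-ratio-preserving padding means scaling the frame so it fits inside the fixed 518×518 input, then padding the remainder instead of stretching. The geometry can be sketched as follows, assuming centered padding; the names are hypothetical and only the arithmetic is shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PadGeometry:
    scaled_w: int
    scaled_h: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def pad_to_square(src_w: int, src_h: int, target: int = 518) -> PadGeometry:
    """Fit (src_w, src_h) inside target x target without stretching, centered."""
    scale = min(target / src_w, target / src_h)
    scaled_w = min(target, max(1, round(src_w * scale)))
    scaled_h = min(target, max(1, round(src_h * scale)))
    pad_w, pad_h = target - scaled_w, target - scaled_h
    return PadGeometry(
        scaled_w, scaled_h,
        pad_left=pad_w // 2, pad_top=pad_h // 2,
        pad_right=pad_w - pad_w // 2, pad_bottom=pad_h - pad_h // 2,
    )


def content_box(geom: PadGeometry) -> Tuple[int, int, int, int]:
    """(x0, y0, x1, y1) of the real image inside the padded model output."""
    return (geom.pad_left, geom.pad_top,
            geom.pad_left + geom.scaled_w, geom.pad_top + geom.scaled_h)
```

With OpenCV, the frame would then be resized to `(scaled_w, scaled_h)` with `cv2.resize` and padded with `cv2.copyMakeBorder(img, pad_top, pad_bottom, pad_left, pad_right, cv2.BORDER_CONSTANT, value=0)`; the model's depth output is cropped to `content_box` before being resized back to the source size. Centering the content is a design choice of this sketch, not something the changelog specifies.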

-### 3D Backend
+### Model List Consistency

-- Reworked Auto Crop Black Bars to use first-frame detection with cached crop reuse.
-- Prevents per-frame crop jitter and depth/frame misalignment.
-- Improves stability for cinema content with subtle letterboxing.
-- Keep Audio checkbox now respects the user-selected output container instead of forcing MP4.
+- Fixed missing Distill-Any-Depth ONNX models in the depth inference script while still being listed in the UI.
+- Ensured ONNX model availability in the UI now correctly matches backend support.

----
+### Video Encoding / Codec Handling

-## Frametool Backend
+- Fixed CPU and GPU FFmpeg codecs (libx264, libx265, NVENC, AMF, QSV) being incorrectly routed through OpenCV’s VideoWriter.
+- Non-OpenCV-safe codecs are now encoded via FFmpeg piping, preventing OpenH264 DLL errors and codec initialization failures.
+- OpenCV VideoWriter is now limited to compatible FourCC codecs (mp4v, XVID, DIVX) with automatic fallback handling.
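The codec routing rule can be expressed as a small decision plus an FFmpeg command that reads raw frames from standard input. In this sketch the OpenCV-safe set is taken from the changelog entry above, while the function names and the exact set of FFmpeg options are assumptions; the flags themselves (`-f rawvideo`, `-pix_fmt`, `-s`, `-r`, `-i -`, `-c:v`) are standard FFmpeg options.

```python
from typing import List

# FourCCs the changelog lists as safe for OpenCV's VideoWriter.
OPENCV_SAFE_FOURCCS = {"mp4v", "XVID", "DIVX"}


def choose_writer(codec: str) -> str:
    """Route a codec to 'opencv' only if it is an OpenCV-safe FourCC."""
    return "opencv" if codec in OPENCV_SAFE_FOURCCS else "ffmpeg"


def ffmpeg_pipe_command(width: int, height: int, fps: float, codec: str,
                        output_path: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an FFmpeg command that reads raw BGR frames from stdin."""
    return [
        ffmpeg, "-y",
        "-f", "rawvideo", "-pix_fmt", "bgr24",   # input: raw OpenCV-style frames
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                               # "-" means read from stdin
        "-c:v", codec,                           # e.g. libx264, hevc_nvenc
        "-pix_fmt", "yuv420p",                   # widely playable output format
        output_path,
    ]
```

A caller would start the process with `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, write each `uint8` BGR frame with `proc.stdin.write(frame.tobytes())`, then close `proc.stdin` and call `proc.wait()`. Every frame must match the `-s` size exactly, which is one reason a final safety resize before encoding is useful.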

-- Reworked Frametool backend to support SSResNet models for feature model integration.
+### Depth Inference Performance & Pipeline Optimizations

----
+- Reduced redundant image resizing during video depth inference to avoid double-scaling overhead.
+- Consolidated resize to a single pass per frame, reducing CPU overhead.
+- Enabled CUDA-optimized memory layout (`channels_last`) for Hugging Face depth models when running on GPU.
+- Improved FP16 inference handling for supported Hugging Face models to increase throughput on CUDA devices.
+- Optimized ONNX Runtime session configuration using safe graph optimizations and memory arena usage.
+- Improved batch handling logic to reduce per-frame overhead during video processing.
+- FFmpeg piping is now preferred by default for video output, significantly reducing encoding bottlenecks.

-## Console Improvements
+### Letterbox & Black Bar Handling (Video)

-- Standardized startup console messages to clearly reflect which subsystems are initializing (Torch, depth estimation, upscaler, external 3D pipeline, language, settings).
-- Unified compute device reporting across pipelines for consistent and clearer console output.
-- Suppressed optional xFormers dependency warning on startup.
-- Prevented duplicate language loading during settings restore.
+- Fixed letterbox (black bar) regions incorrectly contributing to depth inference.
+- Depth estimation now consistently ignores top and bottom letterbox bars instead of assigning artificial depth.
+- Improved letterbox detection with multi-frame fallback probing and stabilization to prevent flicker.
+- Letterbox regions are now filled with a neutral depth value, preventing pop-out artifacts and white banding in 3D renders.
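Multi-frame letterbox detection and neutral filling can be illustrated with frames stored as rows of luminance values. This pure-Python sketch assumes a dark-row threshold of 16, keeps the smallest bar size seen across the probed frames so that a dark scene cannot enlarge the bars, and fills bar rows with a hypothetical neutral value of 0.5. VD3D's actual thresholds, probing strategy and fill value are not given in the changelog, and a production version would operate on NumPy arrays or GPU tensors.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # rows of luminance values, 0-255


def bar_rows(frame: Frame, threshold: float = 16.0) -> Tuple[int, int]:
    """Count near-black rows at the top and bottom of one frame."""
    def is_dark(row: Sequence[float]) -> bool:
        return sum(row) / len(row) < threshold

    height = len(frame)
    top = 0
    while top < height and is_dark(frame[top]):
        top += 1
    bottom = 0
    while bottom < height - top and is_dark(frame[height - 1 - bottom]):
        bottom += 1
    return top, bottom


def detect_letterbox(frames: Sequence[Frame],
                     threshold: float = 16.0) -> Tuple[int, int]:
    """Probe several frames and keep the smallest bars seen, so a dark
    scene cannot temporarily 'grow' the bars and cause flicker."""
    probes = [bar_rows(frame, threshold) for frame in frames]
    return min(t for t, _ in probes), min(b for _, b in probes)


def fill_bars(depth: List[List[float]], top: int, bottom: int,
              neutral: float = 0.5) -> List[List[float]]:
    """Overwrite the bar rows of a 0-1 depth map with a neutral value."""
    height = len(depth)
    out = [list(row) for row in depth]
    for y in list(range(top)) + list(range(height - bottom, height)):
        out[y] = [neutral] * len(out[y])
    return out
```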

 ---

-## Summary
+## 2) 3D Video Generator Tab
+
+### 3D Rendering Pipeline Performance & Stability
+
+- Implemented full render-state reset at the start of each video and image render to prevent temporal drift and accumulated smoothing artifacts between sessions.
+- Reset internal pixel shift EMA buffers per render, ensuring clean disparity initialization and improved real-time stability.
+- Reset floating window convergence trackers and easing states to eliminate carry-over offsets and unintended masking behavior across renders.
+- Reinitialized depth percentile normalization per render, allowing depth range calibration to adapt cleanly to each clip for more consistent parallax response.
+- Improved convergence and floating window behavior during the first frames of each render, eliminating “settling” artifacts and jitter.
+- Resulted in significantly smoother live 3D playback and notable FPS improvements during real-time rendering.
+
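Resetting render state per session amounts to clearing every piece of state that smooths across frames. The class below is a hypothetical sketch, assuming an EMA on the pixel shift and an EMA on per-frame percentile bounds (2nd and 98th by default); calling `reset()` at the start of a render is what keeps values from a previous clip out of the next one.

```python
from typing import List, Optional, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


class RenderState:
    """Per-render smoothing state that must not leak between renders."""

    def __init__(self, alpha: float = 0.2, low_q: float = 2.0,
                 high_q: float = 98.0) -> None:
        self.alpha, self.low_q, self.high_q = alpha, low_q, high_q
        self.reset()

    def reset(self) -> None:
        # Call at the start of every render session.
        self.ema_shift: Optional[float] = None
        self.depth_lo: Optional[float] = None
        self.depth_hi: Optional[float] = None

    def _ema(self, prev: Optional[float], new: float) -> float:
        # The first value after a reset is taken as-is, with no carry-over.
        return new if prev is None else self.alpha * new + (1 - self.alpha) * prev

    def smooth_shift(self, shift: float) -> float:
        self.ema_shift = self._ema(self.ema_shift, shift)
        return self.ema_shift

    def normalize(self, depth: Sequence[float]) -> List[float]:
        """Map depth to 0-1 using smoothed percentile bounds, clamped."""
        self.depth_lo = self._ema(self.depth_lo, percentile(depth, self.low_q))
        self.depth_hi = self._ema(self.depth_hi, percentile(depth, self.high_q))
        span = max(self.depth_hi - self.depth_lo, 1e-6)
        return [min(1.0, max(0.0, (v - self.depth_lo) / span)) for v in depth]
```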
+### Output Geometry & Eye Mode Fixes
+
+- Fixed output sizing logic for VR, Passive Interlaced, and single-eye export modes.
+- Ensured per-eye resolution handling remains consistent across all 3D formats.
+- Corrected floating window width calculations to always operate on per-eye dimensions instead of SBS frame width.
+- Added safety resizing to guarantee encoded frames always match target output resolution.
+
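The floating-window fix is easiest to see as arithmetic: the band width has to be a fraction of one eye's width, not of the packed output frame. In this sketch the mode names and the `fraction` parameter are hypothetical, and VR formats are left out because the changelog does not describe their layout.

```python
from typing import Tuple


def per_eye_size(mode: str, out_w: int, out_h: int) -> Tuple[int, int]:
    """Per-eye (width, height) for a final output frame of out_w x out_h."""
    if mode == "sbs":            # left and right eyes share the frame width
        return out_w // 2, out_h
    if mode == "interlaced":     # full width, each eye supplies every other row
        return out_w, out_h // 2
    if mode == "single_eye":     # one eye fills the whole frame
        return out_w, out_h
    raise ValueError(f"unknown output mode: {mode!r}")


def floating_window_px(mode: str, out_w: int, out_h: int, fraction: float) -> int:
    """Floating-window band width, measured against the per-eye width."""
    eye_w, _ = per_eye_size(mode, out_w, out_h)
    return round(eye_w * fraction)
```

Measured against the full width of a 3840-pixel SBS frame, a 2% band would come out at 77 pixels instead of the 38 pixels that 2% of one 1920-pixel eye gives.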
+### Preview GUI
+
+- Preview GUI now supports an optional Convergence Crosshairs overlay for faster convergence tuning.
+
+### UI Label Consistency
+
+- Fixed mismatched labels for Foreground Shift and Background Shift.
+- Sliders now correctly match their tooltips.
+
+### Encoding Settings Layout
+
+- Reworked the Encoding Settings dialog layout for improved spacing and readability.
+- Grouped checkboxes, dropdowns, and quality controls into clearer rows.

-v3.8 focuses on stabilizing depth estimation, improving model compatibility,
-and refining the 3D Render tab UI with better layout behavior, clearer diagnostics, and improved localization support.
+### Processing Options

-> Back up your `weights/` and `presets/` folders before uninstalling v3.7.
-> Then run VisionDepth3D_Setup_Downloader to download the official
-> VisionDepth3D v3.8 Windows installer and required `.bin` files.
+- Moved Clip Range (start/end time) controls into the Processing Options dialog.
+- Clip range settings respect the selected UI language and include translated labels and tooltips.
+
+### Menu Fixes, Presets, and Updater Integration
+
+- Help → Check Updates now launches the bundled **VisionDepth3D Updater** window (`VisionDepth3D_Updater.exe`) to download and install the latest official Windows release.
+- Added a confirmation prompt before launching the updater, since VisionDepth3D closes itself to allow safe updating.
+- Fixed **File → Load Preset** failing from the dropdown due to the preset apply function not being available in scope.
+- Fixed **File → Output Path** dropdown not opening the save dialog while the hotkey worked, by routing the menu action through the same handler used by `Ctrl+O`.
+- Removed **Save Settings** and **Load Settings** from the File menu since preset save/load already covers the same workflow and simplifies the UI.
+
+## 3) VD3D Live 3D (Real-Time Depth + SBS Pipeline)
+
+### Live Depth Inference Performance Overhaul
+
+- Implemented persistent GPU tensor staging for live frame uploads, eliminating per-frame CUDA allocations and significantly reducing memory transfer overhead.
+- Optimized live depth input preprocessing to reuse GPU buffers instead of recreating tensors each inference cycle.
+- Reduced redundant CPU to GPU conversions during live depth updates.
+- Improved FP16 autocast handling for Depth Anything V2 live inference to ensure stable mixed-precision execution on CUDA.
+
+### Real-Time Pixel Shift Pipeline Optimization
+
+- Added persistent CUDA frame buffers for the live pixel-shift SBS renderer to avoid per-frame GPU reallocations.
+- Reduced per-frame normalization overhead by using in-place GPU operations.
+- Improved handling of mixed return types from `pixel_shift_cuda` (CUDA tensors or NumPy fallback), ensuring stable live output without crashes.
+- Prevented pipeline stalls caused by repeated tensor construction and shape revalidation.
+
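Persistent buffers boil down to allocating once per shape and reusing that allocation on every following frame. The generic cache below is a sketch; `PersistentBuffers` and its `allocate` callback are hypothetical names, and the test uses a plain `bytearray` allocator so it runs without a GPU.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")
Shape = Tuple[int, ...]


class PersistentBuffers(Generic[T]):
    """Keep one buffer per name and reallocate only when its shape changes."""

    def __init__(self, allocate: Callable[[Shape], T]) -> None:
        self._allocate = allocate
        self._buffers: Dict[str, Tuple[Shape, T]] = {}
        self.allocations = 0  # counts real allocations, for diagnostics

    def get(self, name: str, shape: Shape) -> T:
        cached = self._buffers.get(name)
        if cached is not None and cached[0] == shape:
            return cached[1]  # reuse: no new allocation this frame
        buffer = self._allocate(shape)
        self.allocations += 1
        self._buffers[name] = (shape, buffer)
        return buffer
```

With PyTorch, the allocator could be `lambda s: torch.empty(s, device="cuda", dtype=torch.float16)`, and each frame would be written into the returned tensor in place (for example with `Tensor.copy_`) instead of constructing a new tensor per frame.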
+### Live Depth Update Scheduling & Stability
+
+- Implemented controlled depth refresh rate (Depth FPS) to decouple depth inference from preview frame rate for smoother live playback.
+- Improved EMA depth smoothing behavior for live mode to reduce temporal jitter while preserving responsiveness.
+- Reduced live preview hitching caused by first-frame warm-up and inference spikes.
+
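Decoupling Depth FPS from the preview rate means the preview loop runs every frame while depth inference runs only when its own interval has elapsed, with EMA smoothing applied to each new depth map. The scheduler below is a hypothetical sketch of that idea, assuming depth maps are flat lists and `infer` is a caller-supplied function; between inferences the last smoothed depth is reused.

```python
from typing import Callable, List, Optional


class DepthScheduler:
    """Run depth inference at depth_fps, independent of the preview rate."""

    def __init__(self, depth_fps: float, alpha: float = 0.3) -> None:
        self.interval = 1.0 / depth_fps
        self.alpha = alpha  # EMA weight given to the newest depth map
        self._last_run: Optional[float] = None
        self._depth: Optional[List[float]] = None

    def due(self, now: float) -> bool:
        return self._last_run is None or now - self._last_run >= self.interval

    def update(self, now: float, infer: Callable[[], List[float]]) -> List[float]:
        """Call once per preview frame; infer() runs only when due."""
        if self.due(now):
            fresh = infer()
            self._last_run = now
            if self._depth is None:
                self._depth = list(fresh)
            else:
                a = self.alpha
                self._depth = [a * f + (1 - a) * p for f, p in zip(fresh, self._depth)]
        return self._depth
```

A lower `alpha` suppresses more jitter but makes depth react more slowly to scene changes, which is the trade-off the changelog refers to as preserving responsiveness.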
+### Live Capture & Preview Improvements
+
+- Reduced capture overhead by allowing lower capture FPS without affecting SBS rendering smoothness.
+- Improved screen capture pacing using high-precision timers to prevent uneven frame delivery.
+- Improved live preview stability when mixing screen capture and GPU depth inference.
+
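High-precision capture pacing is commonly done with deadlines on a monotonic, high-resolution clock such as Python's `time.perf_counter`, advancing each deadline by a fixed period so timing errors do not accumulate. This `FramePacer` is a hypothetical sketch of that approach, not VD3D's capture code; the clock and sleep functions are injectable so it can be tested deterministically.

```python
import time
from typing import Callable, Optional


class FramePacer:
    """Deadline-based pacing on a high-resolution monotonic clock."""

    def __init__(self, fps: float,
                 clock: Callable[[], float] = time.perf_counter,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.period = 1.0 / fps
        self._clock, self._sleep = clock, sleep
        self.next_deadline: Optional[float] = None

    def wait(self) -> None:
        """Block until the next frame slot; call once per captured frame."""
        now = self._clock()
        if self.next_deadline is None:
            self.next_deadline = now + self.period
            return
        remaining = self.next_deadline - now
        if remaining > 0:
            self._sleep(remaining)
            # Advance from the deadline, not from "now", so errors do not accumulate.
            self.next_deadline += self.period
        else:
            # Running late: skip missed slots instead of bursting to catch up.
            missed = int(-remaining // self.period) + 1
            self.next_deadline += missed * self.period
```

`time.sleep` can overshoot the requested duration, so real-time loops sometimes sleep slightly short and busy-wait the final fraction of a millisecond; that refinement is omitted here.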
+### Overall Live Mode Gains
+
+- Live 3D preview performance increased by approximately 40 to 70 percent depending on GPU and inference resolution.
+- Significantly reduced stutter caused by GPU memory churn.
+- More consistent frame pacing for real-time SBS output.
+
+---

-> (Optional but recommended) Clear the Hugging Face cache to free space and
-> avoid duplicate model downloads:
-> `C:\Users\YOUR_USERNAME\.cache\huggingface`
+> **Upgrade Note**
+> Back up your `weights/` and `presets/` folders before uninstalling v3.8.1
+> Then run **VisionDepth3D_Setup_Downloader** to download the official
+> VisionDepth3D v3.8.2 Windows installer and required `.bin` files.

LICENSE.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,4 @@
-Copyright (c) 2025 Johnathan Carpenter. All rights reserved
+Copyright (c) 2026 Johnathan Carpenter. All rights reserved

 This License Agreement ("Agreement") is a legal agreement between you ("User") and VisionDepth ("Licensor") regarding the use of VisionDepth3D ("Software"). By downloading, installing, or using the Software, you acknowledge and agree to be bound by the terms of this Agreement.

@@ -45,3 +45,4 @@ This Agreement shall be governed by and interpreted in accordance with the laws

 9. Contact
 For inquiries regarding this Agreement, contact: redsky90@gmail.com
+