diff --git a/README.ja.md b/README.ja.md index 3d924f1..bf227f4 100644 --- a/README.ja.md +++ b/README.ja.md @@ -34,6 +34,38 @@ cargo check --workspace cargo test --workspace ``` +> 注意: 一部の GPU テストはホスト GPU を共有します。ワークスペース一括テストを並列実行すると +> GPU 競合でフレーク化することがあるため、安定確認はクレート個別(例: `cargo test -p cozip`)で +> 行ってください。 + +## Linux デスクトップ連携 + +`packaging/linux/install.sh` は、デスクトップエントリ・MIME タイプ・KDE(Dolphin) サービスメニュー +に加えて、GNOME(Nautilus)/Cinnamon(Nemo)/MATE(Caja) の右クリック「スクリプト」連携をインストール +します。`packaging/linux/uninstall.sh` で削除できます。 + +```bash +./packaging/linux/install.sh # 現在のユーザー向けにビルド+インストール +./packaging/linux/uninstall.sh +``` + +GNOME/Cinnamon/MATE では、圧縮/展開の各アクションが右クリックの **スクリプト** サブメニューに +表示されます。Windows で作成された非UTF-8(例: Shift-JIS)のファイル名は、展開時に復号し、作成時に +UTF-8 へ再エンコードして Windows と一致した挙動になります。 + +## GPU キルスイッチ + +`COZIP_DISABLE_GPU=1` を設定すると CPU のみで動作します。ヘッドレスサーバ、GPU ドライバ不調、CI 向けの +横断的な退避手段で、圧縮は透過的に CPU 経路へフォールバックします。 + +## 独自形式の圧縮率(Huffman) + +PDeflate(`CoZip`)形式は、マッチ探索の後段に任意のチャンク単位 正準 Huffman エントロピー符号化を適用します。 +**後方互換**(フラグはチャンク単位で、旧ストリームは引き続き解凍可能)かつ **既定で有効** です。各チャンクは +推定削減率がしきい値を超えたときだけ Huffman 化されるため、圧縮しにくい/効果の薄いデータは高速経路を維持し、 +GPU 加速のマッチ段にも影響しません。一方、偏ったデータでは目に見えて縮みます(ローカル計測で偏ったリテラル +データに対し約8〜17%小さく)。GPU 解凍は Huffman チャンクを直接デコードします。 + ## `cozip_desktop` の引数 引数なしで `cozip_desktop` を実行すると、デスクトップアプリの圧縮画面を開きます。 diff --git a/README.md b/README.md index 92234ad..3e13ec2 100644 --- a/README.md +++ b/README.md @@ -34,6 +34,41 @@ cargo check --workspace cargo test --workspace ``` +> Note: several GPU tests share the host GPU. Running the whole workspace test in +> parallel can flake under GPU contention; run per crate (e.g. `cargo test -p cozip`) +> for stable results. + +## Linux desktop integration + +`packaging/linux/install.sh` installs the desktop entry, MIME type, KDE/Dolphin +service menus, and right-click "Scripts" entries for GNOME (Nautilus), Cinnamon +(Nemo) and MATE (Caja). `packaging/linux/uninstall.sh` removes them. 
+ +```bash +./packaging/linux/install.sh # build + install for the current user +./packaging/linux/uninstall.sh +``` + +On GNOME/Cinnamon/MATE the compress/extract actions appear under the right-click +**Scripts** submenu. Entry names that Windows tools stored in a legacy encoding (e.g. +Shift-JIS) are decoded on extraction; on creation, non-UTF-8 local file names +(e.g. Shift-JIS names copied from Windows) are stored as UTF-8 entry names, matching +Windows behavior. + +## GPU kill switch + +Set `COZIP_DISABLE_GPU=1` to force CPU-only operation. This is the cross-platform +escape hatch for headless servers, broken GPU drivers, or CI; compression falls back +to the CPU path transparently. + +## Custom-format compression ratio (Huffman) + +The PDeflate (`CoZip`) format applies an optional per-chunk canonical-Huffman entropy +stage after match-finding. It is **backward compatible** (the flag is per-chunk; older +streams keep decoding) and **on by default**: a chunk is Huffman-coded only when the +estimated saving exceeds about 5% and the encoded chunk is actually smaller than the +plain layout, so incompressible/marginal data keeps the fast path and the +GPU-accelerated match stage is unaffected, while skewed data shrinks noticeably +(~8–17% smaller on biased literal data in local measurements). GPU decompression +decodes Huffman chunks directly.
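The per-chunk gate described above can be illustrated in isolation. This is a minimal sketch, not the crate's code. `estimated_huffman_bytes` and `worth_encoding` are hypothetical names. The code lengths are assumed to come from an already-built canonical codebook, and the LUT and index sizes are assumed to be known in advance. The 19/20 ratio mirrors the `saturating_mul(20) < saturating_mul(19)` check in `finalize_chunk_from_table`.

```rust
/// Hypothetical helper: estimated size in bytes of a Huffman-coded chunk.
/// `code_lens[s]` is the code length of symbol `s` (0 = unused); the LUT and
/// section-index sizes are assumed to be known up front.
fn estimated_huffman_bytes(
    freqs: &[u32; 256],
    code_lens: &[u8; 256],
    lut_bytes: u64,
    index_bytes: u64,
) -> u64 {
    // Sum of frequency * code length over all symbols = payload size in bits.
    let payload_bits: u64 = freqs
        .iter()
        .zip(code_lens.iter())
        .map(|(&f, &len)| u64::from(f) * u64::from(len))
        .sum();
    // Round the payload up to whole bytes, then add the side tables.
    payload_bits.div_ceil(8) + lut_bytes + index_bytes
}

/// Hypothetical gate: encode only when the estimate is below 19/20 of the
/// plain size, i.e. the predicted saving exceeds about 5%.
fn worth_encoding(estimated_bytes: u64, plain_bytes: u64) -> bool {
    estimated_bytes.saturating_mul(20) < plain_bytes.saturating_mul(19)
}
```

This sketch rounds the payload up with `div_ceil`, whereas the diff uses floor division (`/ 8`). Rounding up makes the estimate slightly more conservative. In both cases, the exact post-encode size comparison is what keeps a chunk from growing.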
+ ## `cozip_desktop` Arguments Running `cozip_desktop` without arguments opens the desktop application on the diff --git a/docs/context-log.md b/docs/context-log.md index 3d7415e..35542a7 100644 --- a/docs/context-log.md +++ b/docs/context-log.md @@ -1990,3 +1990,31 @@ mode別GPU品質パラメータ: - PDeflate 単一ファイル stream header に任意の UTF-8 `file_name` metadata を追加した - 単一ファイル圧縮では元ファイル名を埋め込み、解凍時はその名前を優先して復元する - 旧ストリームで metadata がない場合は、従来どおりアーカイブ名 stem から復元名を推定する + +## 2026-05-29 Linux 完全対応・独自形式圧縮率向上・GPU速度維持 + +### Linux を Windows と相違なくする対応 +- 作成側の Unix 非UTF8 パス名を Shift-JIS 復号して UTF-8 ZIP 名へ正規化(抽出側は既に UTF-8/Shift-JIS/CP437 対応済み)。Windows 相互運用を両方向で成立させた。 +- GNOME(Nautilus)/Cinnamon(Nemo)/MATE(Caja) 向けの右クリック「Scripts」連携を新規追加(`packaging/linux/filemanager-scripts/`)。KDE(Dolphin) サービスメニューと同等の 圧縮(ZIP/CoZip/詳細)・展開(ここに展開/詳細) を提供。 + - 共通ヘルパーは scripts ディレクトリに置くとメニューへ露出するため、データディレクトリへ配置し各スクリプトから参照する方式にした。 +- `install.sh` / `uninstall.sh` を上記3デスクトップ環境へ拡張。一時 HOME での疑似インストール/アンインストールで検証。 +- GPU 堅牢性: `COZIP_DISABLE_GPU` キルスイッチを `cozip_deflate` と `cozip_pdeflate` 双方の GPU 初期化経路へ追加。ヘッドレス/ドライバ不良/CI で確実に CPU 経路へフォールバックする。 + +### 独自形式(PDeflate)の圧縮率向上(後方互換) +- 従来 `huffman_encode_enabled` 経路は identity LUT(全シンボル8bit固定=実質無圧縮)だった。これを本物の正準 Huffman へ配線した。 + - 基盤(頻度→符号長、正準コードブック、root+subtable LUT、LSB デコード、`encode_symbols_with_huffman_codebook`、GPU デコードシェーダの Huffman 復号)は実装済みで、finalize への配線のみが欠けていた。 +- 256シンボル alphabet で深さが 15bit を超え得るため、zlib `gen_bitlen` 方式の長さ制限 Huffman(`limit_code_lengths`)を追加。 +- セクションのビットストリーム枠組みを Huffman 用に拡張: 各セクションはバイト境界整列で格納し、正確な(非バイト整列の)bit_len を記録。デコード/preprocess のスライスを `huffman` 時 `ceil(bits/8)` へ変更。 +- チャンク単位で「Huffman版 vs 素版」のサイズを厳密比較し、小さい方を採用。フラグはチャンク単位のため旧ストリームは引き続き解凍可能(後方互換)。 + +### GPU 速度を落とさないための工夫(バランス) +- GPU はマッチ探索、CPU が finalize でエントロピー符号化する構造のため、Huffman 追加でも GPU 加速のマッチ段は不変。 +- 全チャンクで実エンコードすると圧縮が +75% 遅くなったため、頻度から利得を事前推定し、推定削減率が約5%以上のチャンクだけ実エンコードする賢いゲートを導入。 + - bench データ(削減~2%)はゲートでスキップ → 圧縮速度・比率ともベースライン同等(速度劣化なし)。 + - 偏ったリテラルデータでは 8〜17% 圧縮率向上(ローカル計測)。 +- 
`huffman_encode_enabled` を既定 ON 化。 +- 計測(RTX 4070 SUPER, size 1GiB, GPU compress, bench データ): comp_ms 242→245(誤差内)、ratio 0.3944→0.3945(スキップにつき不変)。 + +### 確認 +- クレート個別テストは全て通過(cozip 16, cozip_pdeflate 22, cozip_deflate 11)。GPU+Huffman の CPU/GPU デコード一致テストを追加。 +- ワークスペース一括テストは複数テストバイナリが同一 GPU を同時利用するため GPU 競合でフレーク化することがある。安定確認はクレート個別実行で行う。 diff --git a/packaging/linux/filemanager-scripts/CoZip Compress (options) b/packaging/linux/filemanager-scripts/CoZip Compress (options) new file mode 100644 index 0000000..e46c307 --- /dev/null +++ b/packaging/linux/filemanager-scripts/CoZip Compress (options) @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +set -euo pipefail +source "@COZIP_FM_COMMON@" + +cozip_read_selection_into paths +[[ ${#paths[@]} -eq 0 ]] && exit 0 + +exec "$(cozip_desktop_bin)" ui compress-details "${paths[@]}" diff --git a/packaging/linux/filemanager-scripts/CoZip Compress to CoZip b/packaging/linux/filemanager-scripts/CoZip Compress to CoZip new file mode 100644 index 0000000..17e59e7 --- /dev/null +++ b/packaging/linux/filemanager-scripts/CoZip Compress to CoZip @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +set -euo pipefail +source "@COZIP_FM_COMMON@" + +cozip_read_selection_into paths +[[ ${#paths[@]} -eq 0 ]] && exit 0 + +exec "$(cozip_desktop_bin)" compress --format cozip --hybrid "${paths[@]}" diff --git a/packaging/linux/filemanager-scripts/CoZip Compress to ZIP b/packaging/linux/filemanager-scripts/CoZip Compress to ZIP new file mode 100644 index 0000000..6671298 --- /dev/null +++ b/packaging/linux/filemanager-scripts/CoZip Compress to ZIP @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +set -euo pipefail +source "@COZIP_FM_COMMON@" + +cozip_read_selection_into paths +[[ ${#paths[@]} -eq 0 ]] && exit 0 + +exec "$(cozip_desktop_bin)" compress --format zip --hybrid "${paths[@]}" diff --git a/packaging/linux/filemanager-scripts/CoZip Extract (options) b/packaging/linux/filemanager-scripts/CoZip Extract (options) new file mode 100644 index 0000000..c7e58e7 --- /dev/null +++ 
b/packaging/linux/filemanager-scripts/CoZip Extract (options) @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +set -euo pipefail +source "@COZIP_FM_COMMON@" + +cozip_read_selection_into paths +[[ ${#paths[@]} -eq 0 ]] && exit 0 + +exec "$(cozip_desktop_bin)" ui extract-details "${paths[@]}" diff --git a/packaging/linux/filemanager-scripts/CoZip Extract Here b/packaging/linux/filemanager-scripts/CoZip Extract Here new file mode 100644 index 0000000..a97fc95 --- /dev/null +++ b/packaging/linux/filemanager-scripts/CoZip Extract Here @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +set -euo pipefail +source "@COZIP_FM_COMMON@" + +cozip_read_selection_into paths +[[ ${#paths[@]} -eq 0 ]] && exit 0 + +exec "$(cozip_desktop_bin)" extract --here "${paths[@]}" diff --git a/packaging/linux/filemanager-scripts/_cozip_common.sh b/packaging/linux/filemanager-scripts/_cozip_common.sh new file mode 100644 index 0000000..6b7d972 --- /dev/null +++ b/packaging/linux/filemanager-scripts/_cozip_common.sh @@ -0,0 +1,39 @@ +# Shared helper sourced by CoZip file-manager scripts. +# +# Supports Nautilus (GNOME), Nemo (Cinnamon) and Caja (MATE). Each of those +# file managers exports the selection through its own *_SCRIPT_SELECTED_FILE_PATHS +# environment variable (newline separated, absolute paths). +# +# The installer rewrites @COZIP_DESKTOP@ to the absolute cozip_desktop path. + +cozip_desktop_bin() { + printf '%s' "@COZIP_DESKTOP@" +} + +# Collects the selected paths from whichever file manager invoked the script and +# prints them, one per line. Empty lines are dropped. 
+cozip_selected_paths() { + local raw="" + if [[ -n "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then + raw="${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS}" + elif [[ -n "${NEMO_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then + raw="${NEMO_SCRIPT_SELECTED_FILE_PATHS}" + elif [[ -n "${CAJA_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then + raw="${CAJA_SCRIPT_SELECTED_FILE_PATHS}" + fi + + local line + while IFS= read -r line; do + [[ -n "$line" ]] && printf '%s\n' "$line" + done <<< "$raw" +} + +# Reads selected paths into the array named by $1. +cozip_read_selection_into() { + local -n _out="$1" + _out=() + local line + while IFS= read -r line; do + [[ -n "$line" ]] && _out+=("$line") + done < <(cozip_selected_paths) +} diff --git a/packaging/linux/install.sh b/packaging/linux/install.sh index 467dc10..a4dae51 100755 --- a/packaging/linux/install.sh +++ b/packaging/linux/install.sh @@ -13,6 +13,18 @@ ICON_DIR="${DATA_DIR}/icons" COZIP_DESKTOP_BIN="${BIN_DIR}/cozip_desktop" COZIP_COMP_ICON="${ICON_DIR}/comp.ico" COZIP_DECOMP_ICON="${ICON_DIR}/decomp.ico" +# GNOME (Nautilus) / Cinnamon (Nemo) / MATE (Caja) right-click "Scripts" support. 
+COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh" +NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts" +NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts" +CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts" +FM_SCRIPT_NAMES=( + "CoZip Compress to ZIP" + "CoZip Compress to CoZip" + "CoZip Compress (options)" + "CoZip Extract Here" + "CoZip Extract (options)" +) build=1 if [[ "${1:-}" == "--no-build" ]]; then @@ -44,6 +56,30 @@ install_desktop_template() { chmod 0644 "$dst" } +install_filemanager_common() { + local escaped_bin + escaped_bin="$(escape_sed_replacement "$COZIP_DESKTOP_BIN")" + mkdir -p "$(dirname "$COZIP_FM_COMMON")" + sed \ + -e "s|@COZIP_DESKTOP@|${escaped_bin}|g" \ + "$SCRIPT_DIR/filemanager-scripts/_cozip_common.sh" > "$COZIP_FM_COMMON" + chmod 0644 "$COZIP_FM_COMMON" +} + +install_filemanager_scripts_into() { + local dst_dir="$1" + local escaped_common + escaped_common="$(escape_sed_replacement "$COZIP_FM_COMMON")" + mkdir -p "$dst_dir" + local name + for name in "${FM_SCRIPT_NAMES[@]}"; do + sed \ + -e "s|@COZIP_FM_COMMON@|${escaped_common}|g" \ + "$SCRIPT_DIR/filemanager-scripts/${name}" > "${dst_dir}/${name}" + chmod 0755 "${dst_dir}/${name}" + done +} + refresh_desktop_caches() { command -v update-desktop-database >/dev/null 2>&1 \ && update-desktop-database "$APP_DIR" 2>/dev/null || true @@ -91,9 +127,16 @@ chmod +x \ "$SERVICEMENU_DIR/cozip-10-extract-here.desktop" \ "$SERVICEMENU_DIR/cozip-20-extract-details.desktop" +echo "==> Installing file-manager scripts (GNOME/Cinnamon/MATE)..." +install_filemanager_common +install_filemanager_scripts_into "$NAUTILUS_SCRIPT_DIR" +install_filemanager_scripts_into "$NEMO_SCRIPT_DIR" +install_filemanager_scripts_into "$CAJA_SCRIPT_DIR" + echo "==> Refreshing desktop caches..." refresh_desktop_caches echo "" echo "Done! Installed $COZIP_DESKTOP_BIN." echo "You may need to restart Dolphin (or log out/in) for the service menus to appear." 
+echo "On GNOME/Cinnamon/MATE the actions appear under the right-click \"Scripts\" submenu." diff --git a/packaging/linux/uninstall.sh b/packaging/linux/uninstall.sh index eb204be..ded9a9e 100755 --- a/packaging/linux/uninstall.sh +++ b/packaging/linux/uninstall.sh @@ -8,6 +8,17 @@ MIME_ROOT="${HOME}/.local/share/mime" MIME_DIR="${MIME_ROOT}/packages" DATA_DIR="${HOME}/.local/share/cozip" ICON_DIR="${DATA_DIR}/icons" +COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh" +NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts" +NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts" +CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts" +FM_SCRIPT_NAMES=( + "CoZip Compress to ZIP" + "CoZip Compress to CoZip" + "CoZip Compress (options)" + "CoZip Extract Here" + "CoZip Extract (options)" +) refresh_desktop_caches() { command -v update-desktop-database >/dev/null 2>&1 \ @@ -38,6 +49,13 @@ remove_file "$SERVICEMENU_DIR/cozip-10-extract-here.desktop" remove_file "$SERVICEMENU_DIR/cozip-20-extract-details.desktop" remove_file "$ICON_DIR/comp.ico" remove_file "$ICON_DIR/decomp.ico" +remove_file "$COZIP_FM_COMMON" + +for dir in "$NAUTILUS_SCRIPT_DIR" "$NEMO_SCRIPT_DIR" "$CAJA_SCRIPT_DIR"; do + for name in "${FM_SCRIPT_NAMES[@]}"; do + remove_file "$dir/$name" + done +done if [[ -d "$ICON_DIR" ]]; then rmdir "$ICON_DIR" 2>/dev/null || true diff --git a/src/cozip/src/lib.rs b/src/cozip/src/lib.rs index 40f7954..5741801 100644 --- a/src/cozip/src/lib.rs +++ b/src/cozip/src/lib.rs @@ -1,5 +1,6 @@ use std::collections::{BTreeMap, VecDeque}; use std::env; +use std::ffi::OsStr; use std::fs::{File as StdFile, OpenOptions}; use std::io::{self, BufReader, BufWriter, Cursor, Read, Seek, SeekFrom, Write}; use std::path::{Component, Path, PathBuf}; @@ -5160,8 +5161,7 @@ fn zip_name_from_relative_path(path: &Path) -> Result { for component in path.components() { match component { Component::Normal(part) => { - let part = part.to_str().ok_or(CoZipError::NonUtf8Name)?; - 
parts.push(part.to_string()); + parts.push(zip_name_part_from_os_str(part)?); } Component::CurDir => {} Component::ParentDir | Component::RootDir | Component::Prefix(_) => { @@ -5191,8 +5191,56 @@ fn file_name_from_path(path: &Path) -> Result { let file_name = path .file_name() .ok_or(CoZipError::InvalidEntryName("file name is missing"))?; - let file_name = file_name.to_str().ok_or(CoZipError::NonUtf8Name)?; - normalize_zip_entry_name(file_name) + let file_name = zip_name_part_from_os_str(file_name)?; + normalize_zip_entry_name(&file_name) +} + +fn zip_name_part_from_os_str(part: &OsStr) -> Result { + if let Some(value) = part.to_str() { + return Ok(value.to_string()); + } + + #[cfg(unix)] + { + use std::os::unix::ffi::OsStrExt; + return decode_unix_filename_bytes(part.as_bytes()); + } + + #[cfg(not(unix))] + { + let _ = part; + Err(CoZipError::NonUtf8Name) + } +} + +#[cfg(unix)] +fn decode_unix_filename_bytes(bytes: &[u8]) -> Result { + if bytes.is_empty() { + return Err(CoZipError::InvalidEntryName("entry name is empty")); + } + + let (shift_jis_decoded, _, shift_jis_had_errors) = SHIFT_JIS.decode(bytes); + if !shift_jis_had_errors { + let candidate = shift_jis_decoded.into_owned(); + let (reencoded, _, reencode_had_errors) = SHIFT_JIS.encode(&candidate); + if !reencode_had_errors + && reencoded.as_ref() == bytes + && contains_probably_japanese_text(&candidate) + { + inspect_trace_log(format!( + "[path_name] decode_unix_filename encoding=shift_jis value={}", + candidate + )); + return Ok(candidate); + } + } + + let candidate = String::from_utf8_lossy(bytes).into_owned(); + inspect_trace_log(format!( + "[path_name] decode_unix_filename encoding=utf8_lossy value={}", + candidate + )); + Ok(candidate) } fn normalize_zip_entry_name(name: &str) -> Result { @@ -5510,6 +5558,23 @@ mod tests { let _ = std::fs::remove_dir_all(base); } + #[cfg(unix)] + #[test] + fn shift_jis_unix_filename_bytes_become_utf8_zip_name() { + use std::ffi::OsString; + use 
std::os::unix::ffi::OsStringExt; + + let file_name = OsString::from_vec(vec![ + 0x83, 0x65, 0x83, 0x58, 0x83, 0x67, b'.', b't', b'x', b't', + ]); + let path = PathBuf::from(file_name); + + assert_eq!( + file_name_from_path(&path).expect("decode shift jis path"), + "テスト.txt" + ); + } + #[test] fn cozip_directory_roundtrip_many_files_self_verify() { let cozip = CoZip::init(CoZipOptions::Zip { diff --git a/src/cozip_deflate/src/gpu.rs b/src/cozip_deflate/src/gpu.rs index 9ecf45f..61df9af 100644 --- a/src/cozip_deflate/src/gpu.rs +++ b/src/cozip_deflate/src/gpu.rs @@ -184,12 +184,32 @@ struct DecodeScratch { bind_group: wgpu::BindGroup, } +/// Returns true when the user requested a hard GPU disable via `COZIP_DISABLE_GPU`. +/// +/// This is the cross-platform kill switch used to force CPU-only operation on +/// machines with broken/headless GPU drivers (common on Linux servers and CI). +/// Any value other than empty/`0`/`false` enables the disable. +fn gpu_disabled_by_env() -> bool { + match std::env::var("COZIP_DISABLE_GPU") { + Ok(value) => { + let value = value.trim(); + !(value.is_empty() || value == "0" || value.eq_ignore_ascii_case("false")) + } + Err(_) => false, + } +} + impl GpuAssist { pub(super) fn new(options: &HybridOptions) -> Result { pollster::block_on(Self::new_async(options)) } async fn new_async(options: &HybridOptions) -> Result { + if gpu_disabled_by_env() { + return Err(CozipDeflateError::GpuUnavailable( + "GPU disabled by COZIP_DISABLE_GPU".to_string(), + )); + } let instance = wgpu::Instance::default(); let adapter = instance .request_adapter(&wgpu::RequestAdapterOptions { diff --git a/src/cozip_pdeflate/src/pdeflate/gpu.rs b/src/cozip_pdeflate/src/pdeflate/gpu.rs index a829694..62c19ac 100644 --- a/src/cozip_pdeflate/src/pdeflate/gpu.rs +++ b/src/cozip_pdeflate/src/pdeflate/gpu.rs @@ -2301,6 +2301,11 @@ impl GpuSparsePackScratch { } fn init_runtime() -> Result { + // Cross-platform kill switch: force CPU-only on broken/headless GPU drivers + 
// (common on Linux servers and CI). Mirrors cozip_deflate's gpu_disabled_by_env. + if env_flag_enabled("COZIP_DISABLE_GPU") { + return Err("GPU disabled by COZIP_DISABLE_GPU".to_string()); + } let instance = wgpu::Instance::default(); let adapter = pollster::block_on(instance.request_adapter(&wgpu::RequestAdapterOptions { power_preference: wgpu::PowerPreference::HighPerformance, diff --git a/src/cozip_pdeflate/src/pdeflate/mod.rs b/src/cozip_pdeflate/src/pdeflate/mod.rs index 3c8b6be..55539d8 100644 --- a/src/cozip_pdeflate/src/pdeflate/mod.rs +++ b/src/cozip_pdeflate/src/pdeflate/mod.rs @@ -239,7 +239,7 @@ impl Default for PDeflateOptions { gpu_tail_stop_ratio: 1.0, parallel_read_threads: parallel_threads, parallel_write_threads: parallel_threads, - huffman_encode_enabled: false, + huffman_encode_enabled: true, compression_mode: PDeflateCompressionMode::Speed, hybrid_scheduler_policy: PDeflateHybridSchedulerPolicy::GlobalQueue, } @@ -2256,38 +2256,124 @@ fn finalize_chunk_from_table( let mut scratch = scratch.borrow_mut(); scratch.section_index.clear(); scratch.section_bitstream.clear(); - let huffman_enabled = options.huffman_encode_enabled; - let mut section_cmd_cursor = 0usize; - for &len_u32 in &section_cmd_lens { - let sec_cmd_len = - usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?; - let sec_cmd_end = section_cmd_cursor - .checked_add(sec_cmd_len) - .ok_or(PDeflateError::NumericOverflow)?; - let sec_cmd = section_cmd.get(section_cmd_cursor..sec_cmd_end).ok_or( - PDeflateError::InvalidStream( - "section command range out of bounds during bitstream encode", - ), - )?; - let bit_len = if huffman_enabled { - let symbols = logical_commands_to_huffman_symbols(sec_cmd); - encode_huffman_symbols_to_bitstream(symbols, &mut scratch.section_bitstream)?
- } else { - scratch.section_bitstream.extend_from_slice(sec_cmd); - u32::try_from( + // Always lay out the plain (non-entropy-coded) section index so the chunk + // can fall back to it when Huffman would not pay off. + let mut plain_index = Vec::with_capacity(section_count); + { + let mut cursor = 0usize; + for &len_u32 in &section_cmd_lens { + let sec_cmd_len = + usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?; + let plain_bit_len = u32::try_from( sec_cmd_len .checked_mul(8) .ok_or(PDeflateError::NumericOverflow)?, ) - .map_err(|_| PDeflateError::NumericOverflow)? - }; - write_varint_u32(&mut scratch.section_index, bit_len); - section_cmd_cursor = sec_cmd_end; + .map_err(|_| PDeflateError::NumericOverflow)?; + write_varint_u32(&mut plain_index, plain_bit_len); + cursor = cursor + .checked_add(sec_cmd_len) + .ok_or(PDeflateError::NumericOverflow)?; + } + if cursor != section_cmd.len() { + return Err(PDeflateError::InvalidStream( + "sum(section command len) != section command stream size", + )); + } } - if section_cmd_cursor != section_cmd.len() { - return Err(PDeflateError::InvalidStream( - "sum(section command len) != section command stream size", - )); + + // Build one canonical Huffman codebook for the whole chunk from the + // frequency distribution of the section command stream. This entropy + // stage runs on the CPU after GPU/CPU match-finding, so it improves the + // ratio without disturbing the GPU-accelerated match path. The codebook + // is skipped for empty chunks (no symbols) so they keep the plain layout. + let huffman_attempt = if options.huffman_encode_enabled && !section_cmd.is_empty() { + let mut frequencies = [0u32; 256]; + for &symbol in section_cmd.iter() { + let slot = &mut frequencies[symbol as usize]; + *slot = slot.saturating_add(1); + } + // If the codebook is ever rejected (not expected once code lengths are + // limited, since 15 bits leave room for far more than 256 symbols), fall + // back to the plain layout.
+ build_canonical_huffman_codebook_from_frequencies(&frequencies, HUFF_MAX_CODE_BITS) + .ok() + .map(|codebook| (frequencies, codebook)) + } else { + None + }; + + // Decide per chunk whether to spend the (slower) entropy pass. The estimate + // from the codebook is cheap; the actual section encoding is not, so it is + // only performed when the codebook predicts a worthwhile saving. The flag is + // per-chunk, so mixing Huffman and plain chunks stays backward compatible. + let plain_bytes = section_cmd.len() + plain_index.len(); + let huff_lut_storage = if let Some((frequencies, codebook)) = huffman_attempt.as_ref() { + let root_bits = HUFF_LUT_ROOT_BITS_DEFAULT + .min(codebook.max_code_bits) + .max(1); + let lut = build_huffman_lut(codebook, root_bits)?; + let serialized_lut = serialize_huffman_lut(&lut)?; + + let estimated_payload_bits: u64 = (0..256) + .filter_map(|symbol| { + codebook.codes[symbol] + .map(|code| u64::from(frequencies[symbol]) * u64::from(code.bit_len)) + }) + .sum(); + let estimated_huff_bytes = (estimated_payload_bits / 8) + .saturating_add(serialized_lut.len() as u64) + .saturating_add(plain_index.len() as u64); + // Require a clear win (an estimated saving of more than ~5%, i.e. an estimate + // below 19/20 of the plain size) before paying the entropy/encode cost, so + // incompressible or marginally-compressible chunks keep the fast path.
+ let worth_encoding = estimated_huff_bytes.saturating_mul(20) + < (plain_bytes as u64).saturating_mul(19); + + if worth_encoding { + let mut huff_bitstream = Vec::with_capacity(section_cmd.len()); + let mut huff_index = Vec::with_capacity(section_count); + let mut cursor = 0usize; + for &len_u32 in &section_cmd_lens { + let sec_cmd_len = + usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?; + let sec_cmd_end = cursor + .checked_add(sec_cmd_len) + .ok_or(PDeflateError::NumericOverflow)?; + let sec_cmd = section_cmd.get(cursor..sec_cmd_end).ok_or( + PDeflateError::InvalidStream( + "section command range out of bounds during bitstream encode", + ), + )?; + let (encoded, sec_bit_len) = + encode_symbols_with_huffman_codebook(sec_cmd, codebook)?; + huff_bitstream.extend_from_slice(&encoded); + write_varint_u32( + &mut huff_index, + u32::try_from(sec_bit_len) + .map_err(|_| PDeflateError::NumericOverflow)?, + ); + cursor = sec_cmd_end; + } + + // The exact comparison guarantees a Huffman chunk is never larger than the plain layout. + let huff_cost = huff_bitstream.len() + huff_index.len() + serialized_lut.len(); + if huff_cost < plain_bytes { + scratch.section_bitstream.extend_from_slice(&huff_bitstream); + scratch.section_index.extend_from_slice(&huff_index); + Some(serialized_lut) + } else { + None + } + } else { + None + } + } else { + None + }; + + let huffman_enabled = huff_lut_storage.is_some(); + if !huffman_enabled { + scratch.section_bitstream.extend_from_slice(&section_cmd); + scratch.section_index.extend_from_slice(&plain_index); } let (table_index, table_data): (&[u8], &[u8]) = if let (Some(idx), Some(data)) = @@ -2308,11 +2394,6 @@ fn finalize_chunk_from_table( } (&scratch.table_index, &scratch.table_data) }; - let huff_lut_storage = if huffman_enabled { - Some(build_identity_huffman_lut_block()?)
- } else { - None - }; let huff_lut = huff_lut_storage.as_deref().unwrap_or(&[]); let chunk_flags = if huffman_enabled { CHUNK_FLAG_HUFFMAN @@ -4375,7 +4456,13 @@ fn preprocess_chunk_for_gpu_decode(payload: &[u8]) -> Result Result usize::from(max_code_bits) { - return Err(PDeflateError::InvalidOptions( - "huffman code length exceeds max_code_bits", - )); - } - lengths[symbol] = u8::try_from(depth).map_err(|_| PDeflateError::NumericOverflow)?; + // Plain Huffman over a 256-symbol alphabet can exceed the format's code-length + // limit on strongly skewed data. Store the (bounded) depth here and let + // limit_code_lengths cap it below without rejecting the input. + lengths[symbol] = u8::try_from(depth.min(255)).unwrap_or(255); } + limit_code_lengths(&mut lengths, frequencies, max_code_bits); Ok(lengths) } +/// Caps Huffman code lengths to `max_code_bits` while keeping a valid (not +/// oversubscribed) prefix code. +/// +/// Over-long codes are clamped to the limit, which oversubscribes the Kraft sum; +/// the surplus is then repaired on the length histogram using zlib's `gen_bitlen` +/// redistribution, after which lengths are reassigned so the least frequent symbols +/// receive the longest codes. The result is at most a hair from optimal and always +/// satisfies the canonical-code constraints. 
+fn limit_code_lengths(lengths: &mut [u8], frequencies: &[u32], max_code_bits: u8) { + let l = usize::from(max_code_bits); + // Guard the shift width of the i64 Kraft arithmetic below; the format limit + // (15 bits) is far smaller. + if l == 0 || l >= 63 { + return; + } + + let mut clamped = false; + let mut bl_count = vec![0i64; l + 1]; + for &len in lengths.iter() { + let len = usize::from(len); + if len == 0 { + continue; + } + if len > l { + clamped = true; + bl_count[l] += 1; + } else { + bl_count[len] += 1; + } + } + if !clamped { + return; + } + + // Track the Kraft excess directly, in units of 2^-l. Counting clamped leaves + // (instead of zlib's count of clamped tree nodes) can under-estimate the + // repair when several leaves share an over-deep subtree, which would leave + // the code oversubscribed. + let mut excess: i64 = (1..=l) + .map(|bits| bl_count[bits] << (l - bits)) + .sum::<i64>() + - (1i64 << l); + while excess > 0 { + let mut bits = l - 1; + while bits > 0 && bl_count[bits] == 0 { + bits -= 1; + } + if bits == 0 { + break; + } + // Move one leaf from depth l to sit beside a leaf at `bits`, which drops to + // bits + 1: the symbol count is unchanged and the Kraft sum falls by 2^-l. + bl_count[bits] -= 1; + bl_count[bits + 1] += 2; + bl_count[l] -= 1; + excess -= 1; + } + + let mut syms: Vec<usize> = (0..lengths.len()).filter(|&s| lengths[s] > 0).collect(); + syms.sort_by_key(|&s| (frequencies[s], s)); + let mut pos = 0usize; + for bits in (1..=l).rev() { + let mut count = bl_count[bits]; + while count > 0 && pos < syms.len() { + lengths[syms[pos]] = bits as u8; + pos += 1; + count -= 1; + } + } +} + fn build_canonical_huffman_codebook_from_lengths( code_lengths: &[u8], max_code_bits: u8, @@ -6813,7 +6964,9 @@ fn decode_section_bitstream_to_huffman_symbols_with_lut( section_bit_len: usize, huff_lut: &HuffmanLut, ) -> Result, PDeflateError> { - let byte_len = section_bit_len_to_byte_len(section_bit_len)?; + // Huffman section bitstreams are byte-aligned in storage; the exact bit length + // may not be a multiple of 8, so the stored byte count is ceil(bits/8). + let byte_len = section_bit_len.div_ceil(8); if byte_len != section_bits.len() { return Err(PDeflateError::InvalidStream( "section bitstream length mismatch", @@ -6994,6 +7147,24 @@ mod tests { out } + /// Literal-heavy data with a strongly skewed byte distribution that resists the + /// match finder, used so the per-chunk entropy stage clearly pays off and the + /// Huffman layout is selected.
Symbols are squared (biased toward small values) and + /// folded into a moderate alphabet, then XORed with a position bit so identical + /// 3-grams rarely recur, leaving a skewed literal stream the match stage cannot + /// absorb, which Huffman compresses well (~15% on this generator). + fn skewed_literal_data(size: usize, seed: u32) -> Vec<u8> { + let mut out = Vec::with_capacity(size); + let mut state = seed; + while out.len() < size { + state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223); + let r = (state >> 24) as usize & 0xff; + let sym = ((r * r / 1024) as u8) & 0x3f; + out.push(sym ^ ((out.len() as u8) & 0x40)); + } + out + } + fn decompress_with_options(stream: &[u8], opts: &PDeflateOptions) -> Vec<u8> { let mut out = Vec::new(); pdeflate_decompress_into_with_stats_with_options(stream, &mut out, opts) @@ -7199,9 +7370,23 @@ mod tests { } #[test] - fn chunk_contains_identity_huffman_lut_block() { - let input = b"ABABABABABABABAB".to_vec(); - let payload = compress_single_chunk_payload_with_huffman(&input, 1, true); + fn chunk_contains_real_huffman_lut_block() { + // Skewed literal-heavy input must produce a real, frequency-optimised canonical + // Huffman chunk (strictly smaller than the plain layout, unlike the old identity + // placeholder) that still carries a valid LUT and round-trips. + let input = skewed_literal_data(512 * 1024, 0x51ce_d00d); + let payload = compress_single_chunk_payload_with_huffman(&input, 128, true); + let plain = compress_single_chunk_payload_with_huffman(&input, 128, false); + + // Real entropy coding must beat the plain layout. The old identity placeholder + // could only match or inflate it.
+ assert!( + payload.len() < plain.len(), + "huffman chunk ({}) should be smaller than plain ({})", + payload.len(), + plain.len() + ); + let ( table_index_offset, table_data_offset, @@ -7219,25 +7404,21 @@ mod tests { assert!(!huff_lut.is_empty()); let lut = deserialize_huffman_lut(huff_lut).expect("deserialize chunk lut"); assert_eq!(lut.symbol_count, 256); - assert_eq!(lut.root_bits, 8); - assert_eq!(lut.max_code_bits, 8); - assert_eq!(lut.root.len(), 256); - assert!(lut.subtables.is_empty()); - for (idx, entry) in lut.root.iter().copied().enumerate() { - match entry { - HuffmanLutEntry::Symbol { symbol, bit_len } => { - assert_eq!(usize::from(symbol), idx); - assert_eq!(bit_len, 8); - } - _ => panic!("identity lut root must contain direct symbols"), - } - } + assert!(lut.max_code_bits <= HUFF_MAX_CODE_BITS); + assert_eq!( + lut.root_bits, + HUFF_LUT_ROOT_BITS_DEFAULT.min(lut.max_code_bits).max(1) + ); + + // The chunk must still decode back to the original bytes. + let decoded = decode_chunk_payload_cpu(&payload); + assert_eq!(decoded, input); } #[test] fn reject_corrupted_chunk_huffman_lut_block() { - let input = b"ABABABABABABABAB".to_vec(); - let mut payload = compress_single_chunk_payload_with_huffman(&input, 1, true); + let input = skewed_literal_data(512 * 1024, 0x51ce_d00d); + let mut payload = compress_single_chunk_payload_with_huffman(&input, 128, true); let (_table_index_offset, _table_data_offset, huff_lut_offset, _section_index_offset, _) = chunk_offsets(&payload); // huff lut header: [symbol_count:u16][root_bits:u8][max_code_bits:u8][root_len:u32]... @@ -7481,6 +7662,57 @@ mod tests { assert_eq!(gpu_out, cpu_out); } + #[test] + fn gpu_decode_v2_matches_cpu_with_huffman() { + // A Huffman-encoded stream must decode identically on the CPU and the GPU + // (the decode_v2 shader has a canonical-Huffman LUT decoder), and must also + // be smaller than the same stream without Huffman. 
+ let _guard = gpu_test_lock(); + if !gpu::is_runtime_available() { + return; + } + gpu::reset_decode_slot_pool_for_test().expect("reset decode slot pool"); + let input = skewed_literal_data(512 * 1024, 0x0bad_f00d); + + // Compress on the CPU so the Huffman-vs-plain size comparison is deterministic + // (GPU match-finding can vary run to run). The Huffman flag is per-chunk, so the + // result can never be larger than the plain layout. + let huffman_opts = PDeflateOptions { + gpu_compress_enabled: false, + huffman_encode_enabled: true, + ..PDeflateOptions::default() + }; + let plain_opts = PDeflateOptions { + gpu_compress_enabled: false, + huffman_encode_enabled: false, + ..PDeflateOptions::default() + }; + let compressed = pdeflate_compress(&input, &huffman_opts).expect("compress huffman"); + let plain = pdeflate_compress(&input, &plain_opts).expect("compress plain"); + assert!( + compressed.len() < plain.len(), + "huffman stream ({}) should be smaller than plain ({})", + compressed.len(), + plain.len() + ); + + let cpu_opts = PDeflateOptions { + gpu_decompress_enabled: false, + gpu_decompress_force_gpu: false, + ..huffman_opts.clone() + }; + let gpu_opts = PDeflateOptions { + gpu_decompress_enabled: true, + gpu_decompress_force_gpu: true, + ..huffman_opts.clone() + }; + let cpu_out = decompress_with_options(&compressed, &cpu_opts); + let gpu_out = decompress_with_options(&compressed, &gpu_opts); + assert_eq!(cpu_out, input); + assert_eq!(gpu_out, input); + assert_eq!(gpu_out, cpu_out); + } + #[test] fn gpu_decode_v2_matches_cpu_random_inputs() { let _guard = gpu_test_lock();
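The length limiter can also be exercised outside the crate. The following is a minimal, std-only sketch that uses the hypothetical names `kraft_sum_scaled` and `limit_lengths`. It applies zlib's `gen_bitlen` redistribution step, but it stops once the Kraft excess reaches zero rather than after a count of clamped leaves. The usage below covers a seven-symbol case with a 3-bit limit. Here, a loop that stops after half the number of clamped leaves would end at a Kraft sum of 9/8, which is an oversubscribed code.

```rust
/// Sum of 2^(max_bits - len) over used symbols. A prefix code with these
/// lengths exists iff the result is <= 2^max_bits (Kraft inequality).
/// Assumes every length is already <= max_bits.
fn kraft_sum_scaled(lengths: &[u8], max_bits: u32) -> u64 {
    lengths
        .iter()
        .filter(|&&len| len > 0)
        .map(|&len| 1u64 << (max_bits - u32::from(len)))
        .sum()
}

/// Hypothetical limiter: caps code lengths at `max_bits` (1..=62), repairs
/// the Kraft sum with zlib's gen_bitlen step, then gives the longest codes
/// to the rarest symbols.
fn limit_lengths(lengths: &mut [u8], freqs: &[u32], max_bits: u8) {
    let l = usize::from(max_bits);
    if l == 0 || l >= 63 {
        return;
    }
    // Histogram of lengths, with over-long codes clamped to l.
    let mut bl_count = vec![0i64; l + 1];
    let mut clamped = false;
    for &len in lengths.iter().filter(|&&len| len > 0) {
        let len = usize::from(len);
        if len > l {
            clamped = true;
        }
        bl_count[len.min(l)] += 1;
    }
    if !clamped {
        return;
    }
    // Kraft excess measured in units of 2^-l.
    let mut excess: i64 = (1..=l).map(|b| bl_count[b] << (l - b)).sum::<i64>() - (1i64 << l);
    while excess > 0 {
        // Deepest non-empty length strictly shorter than l.
        let mut bits = l - 1;
        while bits > 0 && bl_count[bits] == 0 {
            bits -= 1;
        }
        if bits == 0 {
            break; // too many symbols for l bits; cannot repair
        }
        // Move one leaf from depth l beside a leaf at `bits` (which drops to
        // bits + 1): symbol count unchanged, Kraft sum lowered by 2^-l.
        bl_count[bits] -= 1;
        bl_count[bits + 1] += 2;
        bl_count[l] -= 1;
        excess -= 1;
    }
    // Reassign lengths: rarest symbols (ties broken by index) get the longest codes.
    let mut syms: Vec<usize> = (0..lengths.len()).filter(|&s| lengths[s] > 0).collect();
    syms.sort_by_key(|&s| (freqs[s], s));
    let mut pos = 0;
    for bits in (1..=l).rev() {
        let mut count = bl_count[bits];
        while count > 0 && pos < syms.len() {
            lengths[syms[pos]] = bits as u8;
            pos += 1;
            count -= 1;
        }
    }
}
```

For example, lengths `[1, 2, 3, 5, 5, 5, 5]` with frequencies `[100, 50, 25, 3, 3, 2, 1]` and a 3-bit limit become `[2, 3, 3, 3, 3, 3, 3]`, whose Kraft sum is exactly 8/8.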