diff --git a/README.ja.md b/README.ja.md
index 3d924f1..bf227f4 100644
--- a/README.ja.md
+++ b/README.ja.md
@@ -34,6 +34,38 @@ cargo check --workspace
 cargo test --workspace
 ```
 
+> 注意: 一部の GPU テストはホスト GPU を共有します。ワークスペース一括テストを並列実行すると
+> GPU 競合でフレーク化することがあるため、安定確認はクレート個別（例: `cargo test -p cozip`）で
+> 行ってください。
+
+## Linux デスクトップ連携
+
+`packaging/linux/install.sh` は、デスクトップエントリ・MIME タイプ・KDE(Dolphin) サービスメニュー
+に加えて、GNOME(Nautilus)/Cinnamon(Nemo)/MATE(Caja) の右クリック「スクリプト」連携をインストール
+します。`packaging/linux/uninstall.sh` で削除できます。
+
+```bash
+./packaging/linux/install.sh          # 現在のユーザー向けにビルド＋インストール
+./packaging/linux/uninstall.sh
+```
+
+GNOME/Cinnamon/MATE では、圧縮/展開の各アクションが右クリックの **スクリプト** サブメニューに
+表示されます。Windows で作成された非UTF-8（例: Shift-JIS）のファイル名は、展開時に復号し、作成時に
+UTF-8 へ再エンコードして Windows と一致した挙動になります。
+
+## GPU キルスイッチ
+
+`COZIP_DISABLE_GPU=1` を設定すると CPU のみで動作します。ヘッドレスサーバ、GPU ドライバ不調、CI 向けの
+横断的な退避手段で、圧縮は透過的に CPU 経路へフォールバックします。
+
+## 独自形式の圧縮率（Huffman）
+
+PDeflate(`CoZip`)形式は、マッチ探索の後段に任意のチャンク単位 正準 Huffman エントロピー符号化を適用します。
+**後方互換**（フラグはチャンク単位で、旧ストリームは引き続き解凍可能）かつ **既定で有効** です。各チャンクは
+推定削減率がしきい値を超えたときだけ Huffman 化されるため、圧縮しにくい/効果の薄いデータは高速経路を維持し、
+GPU 加速のマッチ段にも影響しません。一方、偏ったデータでは目に見えて縮みます（ローカル計測で偏ったリテラル
+データに対し約8〜17%小さく）。GPU 解凍は Huffman チャンクを直接デコードします。
+
 ## `cozip_desktop` の引数
 
 引数なしで `cozip_desktop` を実行すると、デスクトップアプリの圧縮画面を開きます。
diff --git a/README.md b/README.md
index 92234ad..3e13ec2 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,41 @@ cargo check --workspace
 cargo test --workspace
 ```
 
+> Note: several GPU tests share the host GPU. Running the whole workspace test in
+> parallel can flake under GPU contention; run per crate (e.g. `cargo test -p cozip`)
+> for stable results.
+
+## Linux desktop integration
+
+`packaging/linux/install.sh` installs the desktop entry, MIME type, KDE/Dolphin
+service menus, and right-click "Scripts" entries for GNOME (Nautilus), Cinnamon
+(Nemo) and MATE (Caja). `packaging/linux/uninstall.sh` removes them.
+
+```bash
+./packaging/linux/install.sh          # build + install for the current user
+./packaging/linux/uninstall.sh
+```
+
+On GNOME/Cinnamon/MATE the compress/extract actions appear under the right-click
+**Scripts** submenu. For Windows interoperability, extraction decodes Shift-JIS/CP437 entry
+names, and creation stores non-UTF-8 Unix file names (e.g. Shift-JIS bytes) as UTF-8 ZIP names.
+
+## GPU kill switch
+
+Set `COZIP_DISABLE_GPU=1` to force CPU-only operation. Both `cozip_deflate` and `cozip_pdeflate`
+check it before initializing the GPU, which makes it the cross-platform escape hatch for headless
+servers, broken GPU drivers, or CI; compression falls back to the CPU path transparently.
+
+## Custom-format compression ratio (Huffman)
+
+The PDeflate (`CoZip`) format applies an optional per-chunk canonical-Huffman entropy
+stage after match-finding. It is **backward compatible** (the flag is per chunk, so older
+streams keep decoding) and **on by default**. A chunk is Huffman-coded only when the estimated
+saving is roughly 5% or more and the actually encoded chunk is smaller than the plain layout, so
+incompressible or marginal data keeps the fast path. The entropy stage runs on the CPU during
+finalize, so the GPU-accelerated match stage is unaffected, while skewed data shrinks noticeably
+(~8–17% smaller on biased literal data in local measurements). GPU decompression decodes Huffman chunks directly.
+
 ## `cozip_desktop` Arguments
 
 Running `cozip_desktop` without arguments opens the desktop application on the
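The per-chunk Huffman gate described in the README above comes down to two comparisons. The sketch below restates the arithmetic in `finalize_chunk_from_table` from this diff. The estimate is the sum of freq(s) times len(s) bits, plus the serialized LUT and section-index bytes. The real encode runs only when that estimate is below 19/20 of the plain size, and an exact size check follows it. The function names and the sizes used in `main` are illustrative assumptions, not the crate's API.

```rust
/// Hypothetical helper mirroring the estimate in `finalize_chunk_from_table`:
/// payload bits are sum(freq[s] * len[s]); LUT and index bytes are added on top.
fn estimated_huffman_bytes(
    freqs: &[u32; 256],
    code_lens: &[u8; 256],
    lut_bytes: u64,
    index_bytes: u64,
) -> u64 {
    let payload_bits: u64 = freqs
        .iter()
        .zip(code_lens.iter())
        .map(|(&f, &len)| u64::from(f) * u64::from(len))
        .sum();
    payload_bits / 8 + lut_bytes + index_bytes
}

/// Cheap pre-check: encode only if the estimate is below 19/20 (95%) of the plain size.
fn worth_encoding(estimated: u64, plain: u64) -> bool {
    estimated.saturating_mul(20) < plain.saturating_mul(19)
}

/// Final decision after the real encode: keep Huffman only if it is strictly smaller.
fn keep_huffman(huff_cost: u64, plain_cost: u64) -> bool {
    huff_cost < plain_cost
}

fn main() {
    // Skewed chunk: 900 x 'a' and 100 x 'b'; with two symbols each gets a 1-bit code.
    let mut freqs = [0u32; 256];
    let mut lens = [0u8; 256];
    freqs[b'a' as usize] = 900;
    freqs[b'b' as usize] = 100;
    lens[b'a' as usize] = 1;
    lens[b'b' as usize] = 1;
    // Assumed sizes: 1000 plain command bytes, a 4-byte section index, a 64-byte LUT.
    let plain: u64 = 1000 + 4;
    let estimate = estimated_huffman_bytes(&freqs, &lens, 64, 4);
    println!("estimate {estimate} B vs plain {plain} B, try: {}", worth_encoding(estimate, plain));
    // A chunk that would only save 4% stays on the fast path.
    println!("960 vs 1000 -> {}", worth_encoding(960, 1000));
    println!("keep 200 vs {plain} -> {}", keep_huffman(200, plain));
}
```

Checking the estimate first is cheap. Running the full section encode on every chunk is what made compression noticeably slower.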
diff --git a/docs/context-log.md b/docs/context-log.md
index 3d7415e..35542a7 100644
--- a/docs/context-log.md
+++ b/docs/context-log.md
@@ -1990,3 +1990,31 @@ mode別GPU品質パラメータ:
 - PDeflate 単一ファイル stream header に任意の UTF-8 `file_name` metadata を追加した
 - 単一ファイル圧縮では元ファイル名を埋め込み、解凍時はその名前を優先して復元する
 - 旧ストリームで metadata がない場合は、従来どおりアーカイブ名 stem から復元名を推定する
+
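+## 2026-05-29 Full Linux support, better custom-format compression ratio, GPU speed preserved
+
+### Making Linux behave the same as Windows
+- On the creation side, non-UTF-8 Unix path names are decoded as Shift-JIS and normalized to UTF-8 ZIP names (names that do not round-trip as Shift-JIS fall back to lossy UTF-8). The extraction side already handled UTF-8/Shift-JIS/CP437, so Windows interop now works in both directions.
+- Added right-click "Scripts" integration for GNOME (Nautilus)/Cinnamon (Nemo)/MATE (Caja) (`packaging/linux/filemanager-scripts/`), offering the same actions as the KDE (Dolphin) service menus: compress (ZIP/CoZip/options) and extract (here/options).
+  - A shared helper placed in a scripts directory would itself show up in the menu, so it is installed into the data directory instead and sourced by each script.
+- Extended `install.sh` / `uninstall.sh` to the three desktop environments above. Verified with a mock install/uninstall under a temporary HOME.
+- GPU robustness: added the `COZIP_DISABLE_GPU` kill switch to the GPU initialization paths of both `cozip_deflate` and `cozip_pdeflate`, so headless machines, broken drivers, and CI reliably fall back to the CPU path.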
+
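+### Better compression ratio for the custom format (PDeflate), backward compatible
+- The existing `huffman_encode_enabled` path used an identity LUT (every symbol a fixed 8 bits, i.e. effectively uncompressed). It is now wired to a real canonical Huffman coder.
+  - The building blocks (frequencies → code lengths, canonical codebook, root+subtable LUT, LSB decoding, `encode_symbols_with_huffman_codebook`, Huffman decoding in the GPU decode shader) already existed; only the wiring into finalize was missing.
+- With a 256-symbol alphabet, code depth can exceed 15 bits, so length-limited Huffman in the style of zlib's `gen_bitlen` (`limit_code_lengths`) was added.
+- Extended the section bitstream framing for Huffman: each section is stored byte-aligned and records its exact (not necessarily byte-aligned) bit_len. Decode/preprocess now slice `ceil(bits/8)` bytes per section when `huffman` is set.
+- Each chunk compares the exact Huffman and plain sizes and keeps the smaller. The flag is per chunk, so older streams still decompress (backward compatible).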
+
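+### Keeping GPU speed (balance)
+- The GPU does match-finding and the CPU does entropy coding in finalize, so adding Huffman leaves the GPU-accelerated match stage unchanged.
+- Actually encoding every chunk made compression 75% slower, so a gate now estimates the gain from symbol frequencies and only encodes chunks whose estimated saving is about 5% or more.
+  - bench data (~2% saving) is skipped by the gate, so compression speed and ratio match the baseline (no slowdown).
+  - Biased literal data comes out 8–17% smaller (local measurements).
+- `huffman_encode_enabled` is now on by default.
+- Measurement (RTX 4070 SUPER, 1 GiB input, GPU compress, bench data): comp_ms 242→245 (within noise), ratio 0.3944→0.3945 (effectively unchanged because the chunks were skipped).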
+
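+### Verification
+- All per-crate tests pass (cozip 16, cozip_pdeflate 22, cozip_deflate 11). Added a test that CPU and GPU decoding of GPU+Huffman output agree.
+- The workspace-wide test run can flake from GPU contention because several test binaries use the same GPU at once; verify stability with per-crate runs.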
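The `limit_code_lengths` entry in this log can be illustrated with a standalone sketch. It builds unlimited Huffman lengths with a min-heap and clamps them to 15 bits. It then repairs the Kraft surplus with zlib's `gen_bitlen` step: a leaf at the deepest non-empty level below the limit moves down one level and is paired with a clamped leaf.

The helper names are hypothetical. The sketch also assumes the surplus is measured exactly, in units of 2^-15. If you count only clamped leaves and run ceil(n/2) repair steps, without zlib's internal-node bookkeeping, you under-correct the Fibonacci case below: it has 5 clamped leaves but a surplus of 4 units.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Unlimited Huffman code lengths (0 = unused symbol), built with a min-heap.
fn huffman_lengths(freqs: &[u32]) -> Vec<u8> {
    let mut lens = vec![0u8; freqs.len()];
    let used: Vec<usize> = (0..freqs.len()).filter(|&s| freqs[s] > 0).collect();
    if used.len() == 1 {
        lens[used[0]] = 1;
        return lens;
    }
    // Nodes 0..used.len() are leaves; merged nodes are appended. parent[i] links upward.
    let mut parent = vec![usize::MAX; used.len()];
    let mut heap = BinaryHeap::new();
    for (i, &s) in used.iter().enumerate() {
        heap.push(Reverse((u64::from(freqs[s]), i)));
    }
    while heap.len() > 1 {
        let Reverse((fa, a)) = heap.pop().unwrap();
        let Reverse((fb, b)) = heap.pop().unwrap();
        let id = parent.len();
        parent.push(usize::MAX);
        parent[a] = id;
        parent[b] = id;
        heap.push(Reverse((fa + fb, id)));
    }
    for (i, &s) in used.iter().enumerate() {
        let (mut node, mut depth) = (i, 0usize);
        while parent[node] != usize::MAX {
            node = parent[node];
            depth += 1;
        }
        lens[s] = depth.min(255) as u8;
    }
    lens
}

/// Caps lengths at `max_bits` and repairs the exact Kraft surplus (zlib gen_bitlen step).
fn limit_lengths(lens: &mut [u8], freqs: &[u32], max_bits: usize) {
    let mut bl_count = vec![0i64; max_bits + 1];
    let mut clamped = false;
    for &len in lens.iter().filter(|&&len| len > 0) {
        let len = usize::from(len);
        clamped |= len > max_bits;
        bl_count[len.min(max_bits)] += 1;
    }
    if !clamped {
        return;
    }
    // Surplus over a complete code, in units of 2^-max_bits.
    let mut surplus: i64 =
        (1..=max_bits).map(|b| bl_count[b] << (max_bits - b)).sum::<i64>() - (1i64 << max_bits);
    while surplus > 0 {
        let mut bits = max_bits - 1;
        while bits > 0 && bl_count[bits] == 0 {
            bits -= 1;
        }
        if bits == 0 {
            break; // more symbols than 2^max_bits: no valid code exists
        }
        bl_count[bits] -= 1; // one leaf moves down a level...
        bl_count[bits + 1] += 2; // ...next to a leaf taken from the clamped level
        bl_count[max_bits] -= 1;
        surplus -= 1; // the Kraft sum drops by exactly one unit per step
    }
    // Reassign lengths: least frequent symbols get the longest codes.
    let mut syms: Vec<usize> = (0..lens.len()).filter(|&s| lens[s] > 0).collect();
    syms.sort_by_key(|&s| (freqs[s], s));
    let mut pos = 0;
    for bits in (1..=max_bits).rev() {
        for _ in 0..bl_count[bits] {
            lens[syms[pos]] = bits as u8;
            pos += 1;
        }
    }
}

/// Kraft sum in units of 2^-max_bits (equals 2^max_bits for a complete code).
fn kraft_units(lens: &[u8], max_bits: usize) -> u64 {
    lens.iter().filter(|&&l| l > 0).map(|&l| 1u64 << (max_bits - usize::from(l))).sum()
}

/// Fibonacci frequencies give a maximally deep (depth n-1) Huffman tree for n symbols.
fn fibonacci_freqs(n: usize) -> Vec<u32> {
    let mut freqs = vec![0u32; 256];
    let (mut a, mut b) = (1u32, 1u32);
    for f in freqs.iter_mut().take(n) {
        *f = a;
        (a, b) = (b, a + b);
    }
    freqs
}

fn main() {
    let freqs = fibonacci_freqs(20);
    let mut lens = huffman_lengths(&freqs);
    let before = *lens.iter().max().unwrap();
    limit_lengths(&mut lens, &freqs, 15);
    let after = *lens.iter().max().unwrap();
    println!("max length {before} -> {after}, Kraft {} of {}", kraft_units(&lens, 15), 1u64 << 15);
}
```

Measuring the surplus directly makes the loop count exact, so the repaired lengths always sum to a complete prefix code. Like zlib's version, it is a heuristic and not the package-merge optimum.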
diff --git a/packaging/linux/filemanager-scripts/CoZip Compress (options) b/packaging/linux/filemanager-scripts/CoZip Compress (options)
new file mode 100644
index 0000000..e46c307
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/CoZip Compress (options)	
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "@COZIP_FM_COMMON@"
+
+cozip_read_selection_into paths
+[[ ${#paths[@]} -eq 0 ]] && exit 0
+
+exec "$(cozip_desktop_bin)" ui compress-details "${paths[@]}"
diff --git a/packaging/linux/filemanager-scripts/CoZip Compress to CoZip b/packaging/linux/filemanager-scripts/CoZip Compress to CoZip
new file mode 100644
index 0000000..17e59e7
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/CoZip Compress to CoZip	
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "@COZIP_FM_COMMON@"
+
+cozip_read_selection_into paths
+[[ ${#paths[@]} -eq 0 ]] && exit 0
+
+exec "$(cozip_desktop_bin)" compress --format cozip --hybrid "${paths[@]}"
diff --git a/packaging/linux/filemanager-scripts/CoZip Compress to ZIP b/packaging/linux/filemanager-scripts/CoZip Compress to ZIP
new file mode 100644
index 0000000..6671298
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/CoZip Compress to ZIP	
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "@COZIP_FM_COMMON@"
+
+cozip_read_selection_into paths
+[[ ${#paths[@]} -eq 0 ]] && exit 0
+
+exec "$(cozip_desktop_bin)" compress --format zip --hybrid "${paths[@]}"
diff --git a/packaging/linux/filemanager-scripts/CoZip Extract (options) b/packaging/linux/filemanager-scripts/CoZip Extract (options)
new file mode 100644
index 0000000..c7e58e7
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/CoZip Extract (options)	
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "@COZIP_FM_COMMON@"
+
+cozip_read_selection_into paths
+[[ ${#paths[@]} -eq 0 ]] && exit 0
+
+exec "$(cozip_desktop_bin)" ui extract-details "${paths[@]}"
diff --git a/packaging/linux/filemanager-scripts/CoZip Extract Here b/packaging/linux/filemanager-scripts/CoZip Extract Here
new file mode 100644
index 0000000..a97fc95
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/CoZip Extract Here	
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "@COZIP_FM_COMMON@"
+
+cozip_read_selection_into paths
+[[ ${#paths[@]} -eq 0 ]] && exit 0
+
+exec "$(cozip_desktop_bin)" extract --here "${paths[@]}"
diff --git a/packaging/linux/filemanager-scripts/_cozip_common.sh b/packaging/linux/filemanager-scripts/_cozip_common.sh
new file mode 100644
index 0000000..6b7d972
--- /dev/null
+++ b/packaging/linux/filemanager-scripts/_cozip_common.sh
@@ -0,0 +1,39 @@
+# Shared helper sourced by CoZip file-manager scripts.
+#
+# Supports Nautilus (GNOME), Nemo (Cinnamon) and Caja (MATE). Each of those
+# file managers exports the selection through its own *_SCRIPT_SELECTED_FILE_PATHS
+# environment variable (newline separated, absolute paths).
+#
+# The installer rewrites @COZIP_DESKTOP@ to the absolute cozip_desktop path.
+
+cozip_desktop_bin() {
+  printf '%s' "@COZIP_DESKTOP@"
+}
+
+# Collects the selected paths from whichever file manager invoked the script and
+# prints them, one per line. Empty lines are dropped.
+cozip_selected_paths() {
+  local raw=""
+  if [[ -n "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
+    raw="${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS}"
+  elif [[ -n "${NEMO_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
+    raw="${NEMO_SCRIPT_SELECTED_FILE_PATHS}"
+  elif [[ -n "${CAJA_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
+    raw="${CAJA_SCRIPT_SELECTED_FILE_PATHS}"
+  fi
+
+  local line
+  while IFS= read -r line; do
+    [[ -n "$line" ]] && printf '%s\n' "$line"
+  done <<< "$raw"
+}
+
+# Reads selected paths into the array named by $1.
+cozip_read_selection_into() {
+  local -n _out="$1"
+  _out=()
+  local line
+  while IFS= read -r line; do
+    [[ -n "$line" ]] && _out+=("$line")
+  done < <(cozip_selected_paths)
+}
diff --git a/packaging/linux/install.sh b/packaging/linux/install.sh
index 467dc10..a4dae51 100755
--- a/packaging/linux/install.sh
+++ b/packaging/linux/install.sh
@@ -13,6 +13,18 @@ ICON_DIR="${DATA_DIR}/icons"
 COZIP_DESKTOP_BIN="${BIN_DIR}/cozip_desktop"
 COZIP_COMP_ICON="${ICON_DIR}/comp.ico"
 COZIP_DECOMP_ICON="${ICON_DIR}/decomp.ico"
+# GNOME (Nautilus) / Cinnamon (Nemo) / MATE (Caja) right-click "Scripts" support.
+COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh"
+NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts"
+NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts"
+CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts"
+FM_SCRIPT_NAMES=(
+  "CoZip Compress to ZIP"
+  "CoZip Compress to CoZip"
+  "CoZip Compress (options)"
+  "CoZip Extract Here"
+  "CoZip Extract (options)"
+)
 
 build=1
 if [[ "${1:-}" == "--no-build" ]]; then
@@ -44,6 +56,30 @@ install_desktop_template() {
   chmod 0644 "$dst"
 }
 
+install_filemanager_common() {
+  local escaped_bin
+  escaped_bin="$(escape_sed_replacement "$COZIP_DESKTOP_BIN")"
+  mkdir -p "$(dirname "$COZIP_FM_COMMON")"
+  sed \
+    -e "s|@COZIP_DESKTOP@|${escaped_bin}|g" \
+    "$SCRIPT_DIR/filemanager-scripts/_cozip_common.sh" > "$COZIP_FM_COMMON"
+  chmod 0644 "$COZIP_FM_COMMON"
+}
+
+install_filemanager_scripts_into() {
+  local dst_dir="$1"
+  local escaped_common
+  escaped_common="$(escape_sed_replacement "$COZIP_FM_COMMON")"
+  mkdir -p "$dst_dir"
+  local name
+  for name in "${FM_SCRIPT_NAMES[@]}"; do
+    sed \
+      -e "s|@COZIP_FM_COMMON@|${escaped_common}|g" \
+      "$SCRIPT_DIR/filemanager-scripts/${name}" > "${dst_dir}/${name}"
+    chmod 0755 "${dst_dir}/${name}"
+  done
+}
+
 refresh_desktop_caches() {
   command -v update-desktop-database >/dev/null 2>&1 \
     && update-desktop-database "$APP_DIR" 2>/dev/null || true
@@ -91,9 +127,16 @@ chmod +x \
   "$SERVICEMENU_DIR/cozip-10-extract-here.desktop" \
   "$SERVICEMENU_DIR/cozip-20-extract-details.desktop"
 
+echo "==> Installing file-manager scripts (GNOME/Cinnamon/MATE)..."
+install_filemanager_common
+install_filemanager_scripts_into "$NAUTILUS_SCRIPT_DIR"
+install_filemanager_scripts_into "$NEMO_SCRIPT_DIR"
+install_filemanager_scripts_into "$CAJA_SCRIPT_DIR"
+
 echo "==> Refreshing desktop caches..."
 refresh_desktop_caches
 
 echo ""
 echo "Done! Installed $COZIP_DESKTOP_BIN."
 echo "You may need to restart Dolphin (or log out/in) for the service menus to appear."
+echo "On GNOME/Cinnamon/MATE the actions appear under the right-click \"Scripts\" submenu."
diff --git a/packaging/linux/uninstall.sh b/packaging/linux/uninstall.sh
index eb204be..ded9a9e 100755
--- a/packaging/linux/uninstall.sh
+++ b/packaging/linux/uninstall.sh
@@ -8,6 +8,17 @@ MIME_ROOT="${HOME}/.local/share/mime"
 MIME_DIR="${MIME_ROOT}/packages"
 DATA_DIR="${HOME}/.local/share/cozip"
 ICON_DIR="${DATA_DIR}/icons"
+COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh"
+NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts"
+NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts"
+CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts"
+FM_SCRIPT_NAMES=(
+  "CoZip Compress to ZIP"
+  "CoZip Compress to CoZip"
+  "CoZip Compress (options)"
+  "CoZip Extract Here"
+  "CoZip Extract (options)"
+)
 
 refresh_desktop_caches() {
   command -v update-desktop-database >/dev/null 2>&1 \
@@ -38,6 +49,13 @@ remove_file "$SERVICEMENU_DIR/cozip-10-extract-here.desktop"
 remove_file "$SERVICEMENU_DIR/cozip-20-extract-details.desktop"
 remove_file "$ICON_DIR/comp.ico"
 remove_file "$ICON_DIR/decomp.ico"
+remove_file "$COZIP_FM_COMMON"
+
+for dir in "$NAUTILUS_SCRIPT_DIR" "$NEMO_SCRIPT_DIR" "$CAJA_SCRIPT_DIR"; do
+  for name in "${FM_SCRIPT_NAMES[@]}"; do
+    remove_file "$dir/$name"
+  done
+done
 
 if [[ -d "$ICON_DIR" ]]; then
   rmdir "$ICON_DIR" 2>/dev/null || true
diff --git a/src/cozip/src/lib.rs b/src/cozip/src/lib.rs
index 40f7954..5741801 100644
--- a/src/cozip/src/lib.rs
+++ b/src/cozip/src/lib.rs
@@ -1,5 +1,6 @@
 use std::collections::{BTreeMap, VecDeque};
 use std::env;
+use std::ffi::OsStr;
 use std::fs::{File as StdFile, OpenOptions};
 use std::io::{self, BufReader, BufWriter, Cursor, Read, Seek, SeekFrom, Write};
 use std::path::{Component, Path, PathBuf};
@@ -5160,8 +5161,7 @@ fn zip_name_from_relative_path(path: &Path) -> Result<String, CoZipError> {
     for component in path.components() {
         match component {
             Component::Normal(part) => {
-                let part = part.to_str().ok_or(CoZipError::NonUtf8Name)?;
-                parts.push(part.to_string());
+                parts.push(zip_name_part_from_os_str(part)?);
             }
             Component::CurDir => {}
             Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
@@ -5191,8 +5191,56 @@ fn file_name_from_path(path: &Path) -> Result<String, CoZipError> {
     let file_name = path
         .file_name()
         .ok_or(CoZipError::InvalidEntryName("file name is missing"))?;
-    let file_name = file_name.to_str().ok_or(CoZipError::NonUtf8Name)?;
-    normalize_zip_entry_name(file_name)
+    let file_name = zip_name_part_from_os_str(file_name)?;
+    normalize_zip_entry_name(&file_name)
+}
+
+fn zip_name_part_from_os_str(part: &OsStr) -> Result<String, CoZipError> {
+    if let Some(value) = part.to_str() {
+        return Ok(value.to_string());
+    }
+
+    #[cfg(unix)]
+    {
+        use std::os::unix::ffi::OsStrExt;
+        return decode_unix_filename_bytes(part.as_bytes());
+    }
+
+    #[cfg(not(unix))]
+    {
+        let _ = part;
+        Err(CoZipError::NonUtf8Name)
+    }
+}
+
+#[cfg(unix)]
+fn decode_unix_filename_bytes(bytes: &[u8]) -> Result<String, CoZipError> {
+    if bytes.is_empty() {
+        return Err(CoZipError::InvalidEntryName("entry name is empty"));
+    }
+
+    let (shift_jis_decoded, _, shift_jis_had_errors) = SHIFT_JIS.decode(bytes);
+    if !shift_jis_had_errors {
+        let candidate = shift_jis_decoded.into_owned();
+        let (reencoded, _, reencode_had_errors) = SHIFT_JIS.encode(&candidate);
+        if !reencode_had_errors
+            && reencoded.as_ref() == bytes
+            && contains_probably_japanese_text(&candidate)
+        {
+            inspect_trace_log(format!(
+                "[path_name] decode_unix_filename encoding=shift_jis value={}",
+                candidate
+            ));
+            return Ok(candidate);
+        }
+    }
+
+    let candidate = String::from_utf8_lossy(bytes).into_owned();
+    inspect_trace_log(format!(
+        "[path_name] decode_unix_filename encoding=utf8_lossy value={}",
+        candidate
+    ));
+    Ok(candidate)
 }
 
 fn normalize_zip_entry_name(name: &str) -> Result<String, CoZipError> {
@@ -5510,6 +5558,23 @@ mod tests {
         let _ = std::fs::remove_dir_all(base);
     }
 
+    #[cfg(unix)]
+    #[test]
+    fn shift_jis_unix_filename_bytes_become_utf8_zip_name() {
+        use std::ffi::OsString;
+        use std::os::unix::ffi::OsStringExt;
+
+        let file_name = OsString::from_vec(vec![
+            0x83, 0x65, 0x83, 0x58, 0x83, 0x67, b'.', b't', b'x', b't',
+        ]);
+        let path = PathBuf::from(file_name);
+
+        assert_eq!(
+            file_name_from_path(&path).expect("decode shift jis path"),
+            "テスト.txt"
+        );
+    }
+
     #[test]
     fn cozip_directory_roundtrip_many_files_self_verify() {
         let cozip = CoZip::init(CoZipOptions::Zip {
diff --git a/src/cozip_deflate/src/gpu.rs b/src/cozip_deflate/src/gpu.rs
index 9ecf45f..61df9af 100644
--- a/src/cozip_deflate/src/gpu.rs
+++ b/src/cozip_deflate/src/gpu.rs
@@ -184,12 +184,32 @@ struct DecodeScratch {
     bind_group: wgpu::BindGroup,
 }
 
+/// Returns true when the user requested a hard GPU disable via `COZIP_DISABLE_GPU`.
+///
+/// This is the cross-platform kill switch used to force CPU-only operation on
+/// machines with broken/headless GPU drivers (common on Linux servers and CI).
+/// Any value other than empty, `0`, or `false` (case-insensitive, trimmed) disables the GPU.
+fn gpu_disabled_by_env() -> bool {
+    match std::env::var("COZIP_DISABLE_GPU") {
+        Ok(value) => {
+            let value = value.trim();
+            !(value.is_empty() || value == "0" || value.eq_ignore_ascii_case("false"))
+        }
+        Err(_) => false,
+    }
+}
+
 impl GpuAssist {
     pub(super) fn new(options: &HybridOptions) -> Result<Self, CozipDeflateError> {
         pollster::block_on(Self::new_async(options))
     }
 
     async fn new_async(options: &HybridOptions) -> Result<Self, CozipDeflateError> {
+        if gpu_disabled_by_env() {
+            return Err(CozipDeflateError::GpuUnavailable(
+                "GPU disabled by COZIP_DISABLE_GPU".to_string(),
+            ));
+        }
         let instance = wgpu::Instance::default();
         let adapter = instance
             .request_adapter(&wgpu::RequestAdapterOptions {
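The check added to `cozip_deflate` accepts more than `1`. The sketch below factors it into a pure function, a hypothetical `disables_gpu` that does not exist in the crate, so the accepted spellings are easy to test. Empty, `0`, and `false` in any case (after trimming) keep the GPU enabled. Anything else disables it, including `off`. An unset or non-Unicode value also keeps the GPU enabled, because `env::var` returns `Err` for both.

`cozip_pdeflate` uses `env_flag_enabled` instead, and its accepted values are not part of this diff. So `COZIP_DISABLE_GPU=1` is the safe spelling for both crates.

```rust
/// Pure restatement of `gpu_disabled_by_env`; `None` means `env::var` returned `Err`.
fn disables_gpu(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            !(v.is_empty() || v == "0" || v.eq_ignore_ascii_case("false"))
        }
        None => false,
    }
}

fn main() {
    let raw = std::env::var("COZIP_DISABLE_GPU").ok();
    println!("COZIP_DISABLE_GPU={raw:?} -> GPU disabled: {}", disables_gpu(raw.as_deref()));
}
```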
diff --git a/src/cozip_pdeflate/src/pdeflate/gpu.rs b/src/cozip_pdeflate/src/pdeflate/gpu.rs
index a829694..62c19ac 100644
--- a/src/cozip_pdeflate/src/pdeflate/gpu.rs
+++ b/src/cozip_pdeflate/src/pdeflate/gpu.rs
@@ -2301,6 +2301,11 @@ impl GpuSparsePackScratch {
 }
 
 fn init_runtime() -> Result<GpuMatchRuntime, String> {
+    // Cross-platform kill switch: force CPU-only on broken/headless GPU drivers
+    // (common on Linux servers and CI). Mirrors cozip_deflate's gpu_disabled_by_env.
+    if env_flag_enabled("COZIP_DISABLE_GPU") {
+        return Err("GPU disabled by COZIP_DISABLE_GPU".to_string());
+    }
     let instance = wgpu::Instance::default();
     let adapter = pollster::block_on(instance.request_adapter(&wgpu::RequestAdapterOptions {
         power_preference: wgpu::PowerPreference::HighPerformance,
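The `pdeflate/mod.rs` changes below store each Huffman section byte-aligned, while the section index records its exact bit length. Readers therefore advance by `ceil(bits/8)` bytes per section. The sketch below shows that slicing rule. It assumes that plain sections must be a whole number of bytes, which is how `section_bit_len_to_byte_len` appears to be used for them; its body is not in this diff. `stored_section_len` and `split_sections` are hypothetical names.

```rust
/// Bytes a section occupies in the chunk bitstream, given its recorded bit length.
fn stored_section_len(bit_len: usize, huffman: bool) -> Option<usize> {
    if huffman {
        Some(bit_len.div_ceil(8)) // the last byte may be only partly used
    } else if bit_len % 8 == 0 {
        Some(bit_len / 8) // plain sections are raw command bytes
    } else {
        None
    }
}

/// Splits a concatenated section bitstream using the per-section bit lengths.
fn split_sections<'a>(stream: &'a [u8], bit_lens: &[usize], huffman: bool) -> Option<Vec<&'a [u8]>> {
    let mut sections = Vec::with_capacity(bit_lens.len());
    let mut cursor = 0usize;
    for &bits in bit_lens {
        let end = cursor.checked_add(stored_section_len(bits, huffman)?)?;
        sections.push(stream.get(cursor..end)?);
        cursor = end;
    }
    // Mirrors the "sum(section len) == stream size" consistency check.
    (cursor == stream.len()).then_some(sections)
}

fn main() {
    let stream: [u8; 5] = [0xAB, 0xCD, 0x01, 0x23, 0x45];
    // 13 bits -> 2 bytes and 17 bits -> 3 bytes: exactly the 5 stored bytes.
    println!("{:?}", split_sections(&stream, &[13, 17], true));
    // The same bit lengths are invalid for a plain (byte-granular) chunk.
    println!("{:?}", split_sections(&stream, &[13, 17], false));
}
```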
diff --git a/src/cozip_pdeflate/src/pdeflate/mod.rs b/src/cozip_pdeflate/src/pdeflate/mod.rs
index 3c8b6be..55539d8 100644
--- a/src/cozip_pdeflate/src/pdeflate/mod.rs
+++ b/src/cozip_pdeflate/src/pdeflate/mod.rs
@@ -239,7 +239,7 @@ impl Default for PDeflateOptions {
             gpu_tail_stop_ratio: 1.0,
             parallel_read_threads: parallel_threads,
             parallel_write_threads: parallel_threads,
-            huffman_encode_enabled: false,
+            huffman_encode_enabled: true,
             compression_mode: PDeflateCompressionMode::Speed,
             hybrid_scheduler_policy: PDeflateHybridSchedulerPolicy::GlobalQueue,
         }
@@ -2256,38 +2256,124 @@ fn finalize_chunk_from_table(
             let mut scratch = scratch.borrow_mut();
             scratch.section_index.clear();
             scratch.section_bitstream.clear();
-            let huffman_enabled = options.huffman_encode_enabled;
-            let mut section_cmd_cursor = 0usize;
-            for &len_u32 in &section_cmd_lens {
-                let sec_cmd_len =
-                    usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?;
-                let sec_cmd_end = section_cmd_cursor
-                    .checked_add(sec_cmd_len)
-                    .ok_or(PDeflateError::NumericOverflow)?;
-                let sec_cmd = section_cmd.get(section_cmd_cursor..sec_cmd_end).ok_or(
-                    PDeflateError::InvalidStream(
-                        "section command range out of bounds during bitstream encode",
-                    ),
-                )?;
-                let bit_len = if huffman_enabled {
-                    let symbols = logical_commands_to_huffman_symbols(sec_cmd);
-                    encode_huffman_symbols_to_bitstream(symbols, &mut scratch.section_bitstream)?
-                } else {
-                    scratch.section_bitstream.extend_from_slice(sec_cmd);
-                    u32::try_from(
+            // Always lay out the plain (non-entropy-coded) section index so the chunk
+            // can fall back to it when Huffman would not pay off.
+            let mut plain_index = Vec::with_capacity(section_count);
+            {
+                let mut cursor = 0usize;
+                for &len_u32 in &section_cmd_lens {
+                    let sec_cmd_len =
+                        usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?;
+                    let plain_bit_len = u32::try_from(
                         sec_cmd_len
                             .checked_mul(8)
                             .ok_or(PDeflateError::NumericOverflow)?,
                     )
-                    .map_err(|_| PDeflateError::NumericOverflow)?
-                };
-                write_varint_u32(&mut scratch.section_index, bit_len);
-                section_cmd_cursor = sec_cmd_end;
+                    .map_err(|_| PDeflateError::NumericOverflow)?;
+                    write_varint_u32(&mut plain_index, plain_bit_len);
+                    cursor = cursor
+                        .checked_add(sec_cmd_len)
+                        .ok_or(PDeflateError::NumericOverflow)?;
+                }
+                if cursor != section_cmd.len() {
+                    return Err(PDeflateError::InvalidStream(
+                        "sum(section command len) != section command stream size",
+                    ));
+                }
             }
-            if section_cmd_cursor != section_cmd.len() {
-                return Err(PDeflateError::InvalidStream(
-                    "sum(section command len) != section command stream size",
-                ));
+
+            // Build one canonical Huffman codebook for the whole chunk from the
+            // frequency distribution of the section command stream. This entropy
+            // stage runs on the CPU after GPU/CPU match-finding, so it improves the
+            // ratio without disturbing the GPU-accelerated match path. The codebook
+            // is skipped for empty chunks (no symbols) so they keep the plain layout.
+            let huffman_attempt = if options.huffman_encode_enabled && !section_cmd.is_empty() {
+                let mut frequencies = [0u32; 256];
+                for &symbol in section_cmd.iter() {
+                    let slot = &mut frequencies[symbol as usize];
+                    *slot = slot.saturating_add(1);
+                }
+                // If the codebook is rejected anyway (limit_code_lengths should make this
+                // unreachable for a 256-symbol alphabet at 15 bits), keep the plain layout.
+                build_canonical_huffman_codebook_from_frequencies(&frequencies, HUFF_MAX_CODE_BITS)
+                    .ok()
+                    .map(|codebook| (frequencies, codebook))
+            } else {
+                None
+            };
+
+            // Decide per chunk whether to spend the (slower) entropy pass. The estimate
+            // from the codebook is cheap; the actual section encoding is not, so it is
+            // only performed when the codebook predicts a worthwhile saving. The flag is
+            // per-chunk, so mixing Huffman and plain chunks stays backward compatible.
+            let plain_bytes = section_cmd.len() + plain_index.len();
+            let huff_lut_storage = if let Some((frequencies, codebook)) = huffman_attempt.as_ref() {
+                let root_bits = HUFF_LUT_ROOT_BITS_DEFAULT
+                    .min(codebook.max_code_bits)
+                    .max(1);
+                let lut = build_huffman_lut(codebook, root_bits)?;
+                let serialized_lut = serialize_huffman_lut(&lut)?;
+
+                let estimated_payload_bits: u64 = (0..256)
+                    .filter_map(|symbol| {
+                        codebook.codes[symbol]
+                            .map(|code| u64::from(frequencies[symbol]) * u64::from(code.bit_len))
+                    })
+                    .sum();
+                let estimated_huff_bytes = (estimated_payload_bits / 8)
+                    .saturating_add(serialized_lut.len() as u64)
+                    .saturating_add(plain_index.len() as u64);
+                // Require a clear estimated win (> ~5%: estimate < 19/20 of plain) before paying
+                // the encode cost, so incompressible or marginal chunks keep the fast path.
+                let worth_encoding = estimated_huff_bytes.saturating_mul(20)
+                    < (plain_bytes as u64).saturating_mul(19);
+
+                if worth_encoding {
+                    let mut huff_bitstream = Vec::with_capacity(section_cmd.len());
+                    let mut huff_index = Vec::with_capacity(section_count);
+                    let mut cursor = 0usize;
+                    for &len_u32 in &section_cmd_lens {
+                        let sec_cmd_len =
+                            usize::try_from(len_u32).map_err(|_| PDeflateError::NumericOverflow)?;
+                        let sec_cmd_end = cursor
+                            .checked_add(sec_cmd_len)
+                            .ok_or(PDeflateError::NumericOverflow)?;
+                        let sec_cmd = section_cmd.get(cursor..sec_cmd_end).ok_or(
+                            PDeflateError::InvalidStream(
+                                "section command range out of bounds during bitstream encode",
+                            ),
+                        )?;
+                        let (encoded, sec_bit_len) =
+                            encode_symbols_with_huffman_codebook(sec_cmd, codebook)?;
+                        huff_bitstream.extend_from_slice(&encoded);
+                        write_varint_u32(
+                            &mut huff_index,
+                            u32::try_from(sec_bit_len)
+                                .map_err(|_| PDeflateError::NumericOverflow)?,
+                        );
+                        cursor = sec_cmd_end;
+                    }
+
+                    // The exact comparison guarantees the chunk never grows.
+                    let huff_cost = huff_bitstream.len() + huff_index.len() + serialized_lut.len();
+                    if huff_cost < plain_bytes {
+                        scratch.section_bitstream.extend_from_slice(&huff_bitstream);
+                        scratch.section_index.extend_from_slice(&huff_index);
+                        Some(serialized_lut)
+                    } else {
+                        None
+                    }
+                } else {
+                    None
+                }
+            } else {
+                None
+            };
+
+            let huffman_enabled = huff_lut_storage.is_some();
+            if !huffman_enabled {
+                scratch.section_bitstream.extend_from_slice(&section_cmd);
+                scratch.section_index.extend_from_slice(&plain_index);
             }
 
             let (table_index, table_data): (&[u8], &[u8]) = if let (Some(idx), Some(data)) =
@@ -2308,11 +2394,6 @@ fn finalize_chunk_from_table(
                 }
                 (&scratch.table_index, &scratch.table_data)
             };
-            let huff_lut_storage = if huffman_enabled {
-                Some(build_identity_huffman_lut_block()?)
-            } else {
-                None
-            };
             let huff_lut = huff_lut_storage.as_deref().unwrap_or(&[]);
             let chunk_flags = if huffman_enabled {
                 CHUNK_FLAG_HUFFMAN
@@ -4375,7 +4456,13 @@ fn preprocess_chunk_for_gpu_decode(payload: &[u8]) -> Result<ChunkDecodePreproce
         let sec_bit_len_u32 = read_varint_u32(section_index, &mut section_idx_cursor)?;
         let sec_bit_len =
             usize::try_from(sec_bit_len_u32).map_err(|_| PDeflateError::NumericOverflow)?;
-        let sec_cmd_len = section_bit_len_to_byte_len(sec_bit_len)?;
+        // Huffman sections are byte-aligned in storage with an exact bit length, so
+        // the stored byte count is ceil(bits/8); plain sections stay byte-aligned.
+        let sec_cmd_len = if huffman_enabled {
+            sec_bit_len.div_ceil(8)
+        } else {
+            section_bit_len_to_byte_len(sec_bit_len)?
+        };
         let sec_cmd_end = cmd_cursor
             .checked_add(sec_cmd_len)
             .ok_or(PDeflateError::NumericOverflow)?;
@@ -4539,7 +4626,13 @@ fn decompress_chunk_into(payload: &[u8], out: &mut [u8]) -> Result<ChunkDecoded,
             let sec_bit_len_u32 = read_varint_u32(section_index, &mut section_idx_cursor)?;
             let sec_bit_len =
                 usize::try_from(sec_bit_len_u32).map_err(|_| PDeflateError::NumericOverflow)?;
-            let sec_cmd_len = section_bit_len_to_byte_len(sec_bit_len)?;
+            // Huffman sections are byte-aligned in storage but carry an exact (possibly
+            // non-byte-aligned) bit length, so the stored byte count is ceil(bits/8).
+            let sec_cmd_len = if huffman_enabled {
+                sec_bit_len.div_ceil(8)
+            } else {
+                section_bit_len_to_byte_len(sec_bit_len)?
+            };
             let sec_cmd_end = cmd_cursor
                 .checked_add(sec_cmd_len)
                 .ok_or(PDeflateError::NumericOverflow)?;
@@ -6149,16 +6242,74 @@ fn build_huffman_code_lengths_from_frequencies(
         if depth == 0 {
             depth = 1;
         }
-        if depth > usize::from(max_code_bits) {
-            return Err(PDeflateError::InvalidOptions(
-                "huffman code length exceeds max_code_bits",
-            ));
-        }
-        lengths[symbol] = u8::try_from(depth).map_err(|_| PDeflateError::NumericOverflow)?;
+        // Plain Huffman over a 256-symbol alphabet can exceed the format's code-length
+        // limit on strongly skewed data. Store the (bounded) depth here and let
+        // limit_code_lengths cap it below without rejecting the input.
+        lengths[symbol] = u8::try_from(depth.min(255)).unwrap_or(255);
     }
+    limit_code_lengths(&mut lengths, frequencies, max_code_bits);
     Ok(lengths)
 }
 
+/// Caps Huffman code lengths to `max_code_bits` while keeping a valid (not
+/// oversubscribed) prefix code.
+///
+/// Over-long codes are clamped to the limit, which oversubscribes the Kraft sum;
+/// the surplus is then repaired on the length histogram using zlib's `gen_bitlen`
+/// redistribution, after which lengths are reassigned so the least frequent symbols
+/// receive the longest codes. The result is at most a hair from optimal and always
+/// satisfies the canonical-code constraints.
+fn limit_code_lengths(lengths: &mut [u8], frequencies: &[u32], max_code_bits: u8) {
+    let l = usize::from(max_code_bits);
+    if l == 0 {
+        return;
+    }
+
+    let mut overflow: i64 = 0;
+    let mut bl_count = vec![0i64; l + 1];
+    for &len in lengths.iter() {
+        let len = usize::from(len);
+        if len == 0 {
+            continue;
+        }
+        if len > l {
+            overflow += 1;
+            bl_count[l] += 1;
+        } else {
+            bl_count[len] += 1;
+        }
+    }
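+    if overflow == 0 {
+        return;
+    }
+    // Each step removes exactly 2^-l from the Kraft sum, so run once per unit of excess.
+    let mut excess = (1..=l).map(|b| bl_count[b] << (l - b)).sum::<i64>() - (1i64 << l);
+    while excess > 0 {
+        let mut bits = l - 1;
+        while bits > 0 && bl_count[bits] == 0 {
+            bits -= 1;
+        }
+        if bits == 0 {
+            break;
+        }
+        bl_count[bits] -= 1;
+        bl_count[bits + 1] += 2;
+        bl_count[l] -= 1;
+        excess -= 1;
+    }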
+    let mut syms: Vec<usize> = (0..lengths.len()).filter(|&s| lengths[s] > 0).collect();
+    syms.sort_by_key(|&s| (frequencies[s], s));
+    let mut pos = 0usize;
+    for bits in (1..=l).rev() {
+        let mut count = bl_count[bits];
+        while count > 0 && pos < syms.len() {
+            lengths[syms[pos]] = bits as u8;
+            pos += 1;
+            count -= 1;
+        }
+    }
+}
+
 fn build_canonical_huffman_codebook_from_lengths(
     code_lengths: &[u8],
     max_code_bits: u8,
@@ -6813,7 +6964,9 @@ fn decode_section_bitstream_to_huffman_symbols_with_lut(
     section_bit_len: usize,
     huff_lut: &HuffmanLut,
 ) -> Result<Vec<u8>, PDeflateError> {
-    let byte_len = section_bit_len_to_byte_len(section_bit_len)?;
+    // Huffman section bitstreams are byte-aligned in storage; the exact bit length
+    // may not be a multiple of 8, so the stored byte count is ceil(bits/8).
+    let byte_len = section_bit_len.div_ceil(8);
     if byte_len != section_bits.len() {
         return Err(PDeflateError::InvalidStream(
             "section bitstream length mismatch",
@@ -6994,6 +7147,24 @@ mod tests {
         out
     }
 
+    /// Literal-heavy data with a strongly skewed byte distribution that resists the
+    /// match finder, so the per-chunk entropy stage clearly pays off and the Huffman
+    /// layout is selected. A uniform LCG byte `r` is mapped to `r * r / 1024` (0..=63,
+    /// biased toward small values), then XORed with a bit that flips every 64 bytes so
+    /// repeated 3-grams are less frequent. The result is a skewed literal stream the match
+    /// stage cannot fully absorb, which Huffman compresses well (~15% on this generator).
+    fn skewed_literal_data(size: usize, seed: u32) -> Vec<u8> {
+        let mut out = Vec::with_capacity(size);
+        let mut state = seed;
+        while out.len() < size {
+            state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
+            let r = (state >> 24) as usize & 0xff;
+            let sym = ((r * r / 1024) as u8) & 0x3f;
+            out.push(sym ^ ((out.len() as u8) & 0x40));
+        }
+        out
+    }
+
     fn decompress_with_options(stream: &[u8], opts: &PDeflateOptions) -> Vec<u8> {
         let mut out = Vec::new();
         pdeflate_decompress_into_with_stats_with_options(stream, &mut out, opts)
@@ -7199,9 +7370,23 @@ mod tests {
     }
 
     #[test]
-    fn chunk_contains_identity_huffman_lut_block() {
-        let input = b"ABABABABABABABAB".to_vec();
-        let payload = compress_single_chunk_payload_with_huffman(&input, 1, true);
+    fn chunk_contains_real_huffman_lut_block() {
+        // Skewed literal-heavy input must produce a real, frequency-optimised canonical
+        // Huffman chunk (strictly smaller than the plain layout, unlike the old identity
+        // placeholder) that still carries a valid LUT and round-trips.
+        let input = skewed_literal_data(512 * 1024, 0x51ce_d00d);
+        let payload = compress_single_chunk_payload_with_huffman(&input, 128, true);
+        let plain = compress_single_chunk_payload_with_huffman(&input, 128, false);
+
+        // Real entropy coding must beat the plain layout. The old identity placeholder
+        // could only match or inflate it.
+        assert!(
+            payload.len() < plain.len(),
+            "huffman chunk ({}) should be smaller than plain ({})",
+            payload.len(),
+            plain.len()
+        );
+
         let (
             table_index_offset,
             table_data_offset,
@@ -7219,25 +7404,21 @@ mod tests {
         assert!(!huff_lut.is_empty());
         let lut = deserialize_huffman_lut(huff_lut).expect("deserialize chunk lut");
         assert_eq!(lut.symbol_count, 256);
-        assert_eq!(lut.root_bits, 8);
-        assert_eq!(lut.max_code_bits, 8);
-        assert_eq!(lut.root.len(), 256);
-        assert!(lut.subtables.is_empty());
-        for (idx, entry) in lut.root.iter().copied().enumerate() {
-            match entry {
-                HuffmanLutEntry::Symbol { symbol, bit_len } => {
-                    assert_eq!(usize::from(symbol), idx);
-                    assert_eq!(bit_len, 8);
-                }
-                _ => panic!("identity lut root must contain direct symbols"),
-            }
-        }
+        assert!(lut.max_code_bits <= HUFF_MAX_CODE_BITS);
+        assert_eq!(
+            lut.root_bits,
+            HUFF_LUT_ROOT_BITS_DEFAULT.min(lut.max_code_bits).max(1)
+        );
+
+        // The chunk must still decode back to the original bytes.
+        let decoded = decode_chunk_payload_cpu(&payload);
+        assert_eq!(decoded, input);
     }
 
     #[test]
     fn reject_corrupted_chunk_huffman_lut_block() {
-        let input = b"ABABABABABABABAB".to_vec();
-        let mut payload = compress_single_chunk_payload_with_huffman(&input, 1, true);
+        let input = skewed_literal_data(512 * 1024, 0x51ce_d00d);
+        let mut payload = compress_single_chunk_payload_with_huffman(&input, 128, true);
         let (_table_index_offset, _table_data_offset, huff_lut_offset, _section_index_offset, _) =
             chunk_offsets(&payload);
         // huff lut header: [symbol_count:u16][root_bits:u8][max_code_bits:u8][root_len:u32]...
@@ -7481,6 +7662,57 @@ mod tests {
         assert_eq!(gpu_out, cpu_out);
     }
 
+    #[test]
+    fn gpu_decode_v2_matches_cpu_with_huffman() {
+        // A Huffman-encoded stream must decode identically on the CPU and the GPU
+        // (the decode_v2 shader has a canonical-Huffman LUT decoder), and must also
+        // be smaller than the same stream without Huffman.
+        let _guard = gpu_test_lock();
+        if !gpu::is_runtime_available() {
+            return;
+        }
+        gpu::reset_decode_slot_pool_for_test().expect("reset decode slot pool");
+        let input = skewed_literal_data(512 * 1024, 0x0bad_f00d);
+
+        // Compress on the CPU so the Huffman-vs-plain size comparison is deterministic
+        // (GPU match-finding can vary run to run). The Huffman flag is per chunk, so chunks
+        // keep the plain layout where Huffman does not pay off; this input should shrink.
+        let huffman_opts = PDeflateOptions {
+            gpu_compress_enabled: false,
+            huffman_encode_enabled: true,
+            ..PDeflateOptions::default()
+        };
+        let plain_opts = PDeflateOptions {
+            gpu_compress_enabled: false,
+            huffman_encode_enabled: false,
+            ..PDeflateOptions::default()
+        };
+        let compressed = pdeflate_compress(&input, &huffman_opts).expect("compress huffman");
+        let plain = pdeflate_compress(&input, &plain_opts).expect("compress plain");
+        assert!(
+            compressed.len() < plain.len(),
+            "huffman stream ({}) should be smaller than plain ({})",
+            compressed.len(),
+            plain.len()
+        );
+
+        let cpu_opts = PDeflateOptions {
+            gpu_decompress_enabled: false,
+            gpu_decompress_force_gpu: false,
+            ..huffman_opts.clone()
+        };
+        let gpu_opts = PDeflateOptions {
+            gpu_decompress_enabled: true,
+            gpu_decompress_force_gpu: true,
+            ..huffman_opts.clone()
+        };
+        let cpu_out = decompress_with_options(&compressed, &cpu_opts);
+        let gpu_out = decompress_with_options(&compressed, &gpu_opts);
+        assert_eq!(cpu_out, input);
+        assert_eq!(gpu_out, input);
+        assert_eq!(gpu_out, cpu_out);
+    }
+
     #[test]
     fn gpu_decode_v2_matches_cpu_random_inputs() {
         let _guard = gpu_test_lock();