32 changes: 32 additions & 0 deletions README.ja.md
@@ -34,6 +34,38 @@ cargo check --workspace
cargo test --workspace
```

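> Note: some GPU tests share the host GPU. Running the whole workspace test suite in parallel
> can flake under GPU contention, so run per crate (e.g. `cargo test -p cozip`) for stable
> verification.

## Linux desktop integration

`packaging/linux/install.sh` installs the desktop entry, MIME type, and KDE (Dolphin) service menus,
plus right-click "Scripts" integration for GNOME (Nautilus), Cinnamon (Nemo), and MATE (Caja).
`packaging/linux/uninstall.sh` removes them.

```bash
./packaging/linux/install.sh   # build + install for the current user
./packaging/linux/uninstall.sh
```

On GNOME/Cinnamon/MATE, the compress/extract actions appear under the right-click **Scripts** submenu.
Non-UTF-8 file names (e.g. Shift-JIS names created on Windows) are decoded on extraction and
re-encoded as UTF-8 on creation, so behavior matches Windows.

## GPU kill switch

Set `COZIP_DISABLE_GPU=1` to run on the CPU only. This is a cross-cutting escape hatch for headless
servers, misbehaving GPU drivers, and CI; compression falls back to the CPU path transparently.

## Custom-format compression ratio (Huffman)

The PDeflate (`CoZip`) format applies an optional per-chunk canonical Huffman entropy stage after match finding.
It is **backward compatible** (the flag is per chunk, so older streams remain decodable) and **enabled by default**.
Each chunk is Huffman-coded only when the estimated saving exceeds a threshold, so hard-to-compress or low-gain
data stays on the fast path and the GPU-accelerated match stage is unaffected. Skewed data shrinks noticeably
(about 8–17% smaller on biased literal data in local measurements). GPU decompression decodes Huffman chunks directly.

## `cozip_desktop` arguments

Running `cozip_desktop` without arguments opens the compression screen of the desktop app.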
35 changes: 35 additions & 0 deletions README.md
@@ -34,6 +34,41 @@ cargo check --workspace
cargo test --workspace
```

> Note: several GPU tests share the host GPU. Running the whole workspace test suite in
> parallel can flake under GPU contention; run per crate (e.g. `cargo test -p cozip`)
> for stable results.

## Linux desktop integration

`packaging/linux/install.sh` installs the desktop entry, MIME type, KDE/Dolphin
service menus, and right-click "Scripts" entries for GNOME (Nautilus), Cinnamon
(Nemo) and MATE (Caja). `packaging/linux/uninstall.sh` removes them.

```bash
./packaging/linux/install.sh # build + install for the current user
./packaging/linux/uninstall.sh
```

On GNOME/Cinnamon/MATE the compress/extract actions appear under the right-click
**Scripts** submenu. Non-UTF-8 file names (e.g. Shift-JIS names produced on Windows) are
decoded when extracting an archive and normalized to UTF-8 when creating one, so
archives behave the same as on Windows.

## GPU kill switch

Set `COZIP_DISABLE_GPU=1` to force CPU-only operation. This is the cross-platform
escape hatch for headless servers, broken GPU drivers, or CI; compression falls back
to the CPU path transparently.
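
A minimal sketch of how such a kill switch might be checked, assuming only the documented value `1` disables the GPU. `gpu_disabled`, `select_backend`, `Backend`, and the `gpu_available` parameter are hypothetical names used for illustration; they are not taken from the CoZip source.

```rust
use std::env;

/// Hypothetical helper: decide from the raw variable value whether the GPU
/// should be skipped. Only the documented value `1` counts as "disabled";
/// how CoZip treats other values is not stated in this README.
fn gpu_disabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if v.trim() == "1")
}

/// Hypothetical backend selector used only for this illustration.
#[derive(Debug, PartialEq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Reads `COZIP_DISABLE_GPU` and picks a backend. `gpu_available` stands in
/// for whatever adapter probing the real initialization code performs.
fn select_backend(gpu_available: bool) -> Backend {
    let raw = env::var("COZIP_DISABLE_GPU").ok();
    if gpu_disabled(raw.as_deref()) || !gpu_available {
        Backend::Cpu
    } else {
        Backend::Gpu
    }
}
```

Keeping the decision in a pure function (`gpu_disabled`) makes it testable without touching the process environment, which matters when tests run in parallel.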

## Custom-format compression ratio (Huffman)

The PDeflate (`CoZip`) format applies an optional per-chunk canonical-Huffman entropy
stage after match-finding. It is **backward compatible** (the flag is per-chunk, so older
streams keep decoding) and **on by default**. Each chunk is Huffman-coded only when the
estimated saving clears a threshold, so incompressible or marginal data keeps the fast path
and the GPU-accelerated match stage is unaffected. Skewed data shrinks noticeably
(~8–17% smaller on biased literal data in local measurements). GPU decompression
decodes Huffman chunks directly.

## `cozip_desktop` Arguments

Running `cozip_desktop` without arguments opens the desktop application on the
28 changes: 28 additions & 0 deletions docs/context-log.md
@@ -1990,3 +1990,31 @@ Per-mode GPU quality parameters:
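- Added optional UTF-8 `file_name` metadata to the PDeflate single-file stream header
- Single-file compression embeds the original file name, and decompression prefers that name when restoring
- For older streams without the metadata, the restored name is still inferred from the archive name stem, as before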

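## 2026-05-29 Full Linux support, better custom-format compression ratio, GPU speed preserved

### Bringing Linux to parity with Windows
- On the creation side, non-UTF-8 Unix path names are decoded as Shift-JIS and normalized to UTF-8 ZIP names (the extraction side already handled UTF-8/Shift-JIS/CP437). Windows interoperability now works in both directions.
- Added right-click "Scripts" integration for GNOME (Nautilus), Cinnamon (Nemo), and MATE (Caja) (`packaging/linux/filemanager-scripts/`). It offers the same compress (ZIP/CoZip/options) and extract (Extract Here/options) actions as the KDE (Dolphin) service menus.
- A shared helper placed in the scripts directory would itself show up in the menu, so it is installed into the data directory and each script sources it from there.
- Extended `install.sh` / `uninstall.sh` to these three desktop environments. Verified with a mock install/uninstall under a temporary HOME.
- GPU robustness: added the `COZIP_DISABLE_GPU` kill switch to the GPU initialization paths of both `cozip_deflate` and `cozip_pdeflate`. Headless, broken-driver, and CI environments fall back to the CPU path reliably.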

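### Improving the custom-format (PDeflate) compression ratio (backward compatible)
- Previously the `huffman_encode_enabled` path used an identity LUT (every symbol fixed at 8 bits, i.e. effectively uncompressed). It is now wired to real canonical Huffman coding.
- The building blocks (frequencies → code lengths, canonical codebook, root+subtable LUT, LSB decoding, `encode_symbols_with_huffman_codebook`, Huffman decoding in the GPU decode shader) were already implemented; only the wiring into finalize was missing.
- With a 256-symbol alphabet the tree depth can exceed 15 bits, so a length-limited Huffman step (`limit_code_lengths`) in the style of zlib's `gen_bitlen` was added.
- Extended the section bitstream framing for Huffman: each section is stored byte-aligned, and its exact (non-byte-aligned) bit_len is recorded. Decode/preprocess slices now use `ceil(bits/8)` when `huffman` is set.
- Per chunk, the sizes of the Huffman and plain versions are compared exactly and the smaller one is kept. Because the flag is per chunk, older streams remain decodable (backward compatible).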
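
To make the "canonical codebook" step concrete, here is a minimal sketch that assigns canonical codes from code lengths using the rule in RFC 1951 section 3.2.2. It assumes lengths are already limited to 15 bits (the job `limit_code_lengths` is described as doing). The function name `canonical_codes` is hypothetical, and whether PDeflate uses exactly this code ordering is an assumption.

```rust
/// Assigns canonical Huffman codes from per-symbol code lengths, following
/// the rule in RFC 1951 section 3.2.2. A length of 0 means "symbol unused".
/// Returns one `(code, length)` pair per symbol. Hypothetical helper, not
/// taken from the CoZip source.
fn canonical_codes(lengths: &[u8]) -> Vec<(u16, u8)> {
    const MAX_BITS: usize = 15;

    // Step 1: count how many symbols use each code length.
    let mut bl_count = [0u32; MAX_BITS + 1];
    for &len in lengths {
        assert!(len as usize <= MAX_BITS, "code length exceeds 15 bits");
        bl_count[len as usize] += 1;
    }
    bl_count[0] = 0; // unused symbols do not consume code space

    // Step 2: compute the smallest code value for each length.
    let mut next_code = [0u32; MAX_BITS + 1];
    let mut code = 0u32;
    for bits in 1..=MAX_BITS {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: hand out consecutive codes to symbols in symbol order.
    lengths
        .iter()
        .map(|&len| {
            if len == 0 {
                (0, 0)
            } else {
                let assigned = next_code[len as usize];
                next_code[len as usize] += 1;
                (assigned as u16, len)
            }
        })
        .collect()
}
```

Because the codes are fully determined by the lengths, only the lengths need to be stored per chunk; the decoder can rebuild the same codebook.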

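### Keeping GPU speed intact (balance)
- The GPU does match finding and the CPU does entropy coding in finalize, so adding Huffman leaves the GPU-accelerated match stage unchanged.
- Actually encoding every chunk made compression 75% slower, so a gate was added: the gain is estimated up front from symbol frequencies, and only chunks with an estimated saving of about 5% or more are actually encoded.
- The bench data (saving ~2%) is skipped by the gate, so compression speed and ratio both match the baseline (no slowdown).
- Skewed literal data compresses 8–17% better (local measurement).
- `huffman_encode_enabled` is now ON by default.
- Measurement (RTX 4070 SUPER, size 1GiB, GPU compress, bench data): comp_ms 242→245 (within noise), ratio 0.3944→0.3945 (unchanged because the chunks are skipped).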
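
The gating idea above can be sketched as follows. This is an illustration under stated assumptions, not the project's estimator: it uses order-0 Shannon entropy over raw bytes as a stand-in for the Huffman cost and ignores codebook overhead. `estimated_saving` and `should_huffman_encode` are hypothetical names, and the 5% threshold is the approximate figure quoted in the log.

```rust
/// Estimates the fraction of bytes an ideal order-0 entropy coder could save
/// on `data`, as a cheap stand-in for the real Huffman cost. Returns a value
/// in [0.0, 1.0]. Hypothetical helper; codebook overhead is ignored.
fn estimated_saving(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }

    // Build a byte histogram in a single pass.
    let mut hist = [0u64; 256];
    for &b in data {
        hist[b as usize] += 1;
    }

    // Shannon entropy in bits per symbol.
    let total = data.len() as f64;
    let entropy_bits: f64 = hist
        .iter()
        .filter(|&&n| n > 0)
        .map(|&n| {
            let p = n as f64 / total;
            -p * p.log2()
        })
        .sum();

    // Raw symbols cost 8 bits each; entropy is the ideal average cost.
    (1.0 - entropy_bits / 8.0).clamp(0.0, 1.0)
}

/// Hypothetical gate: only pay for a real Huffman encode when the estimate
/// clears the threshold (e.g. 0.05 for roughly 5%).
fn should_huffman_encode(data: &[u8], threshold: f64) -> bool {
    estimated_saving(data) >= threshold
}
```

The estimate costs one pass over the chunk, so chunks that would gain little skip the full encode, and the exact size comparison only runs on chunks that pass the gate.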

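### Verification
- All per-crate tests pass (cozip 16, cozip_pdeflate 22, cozip_deflate 11). Added a test that CPU and GPU decoding agree for GPU+Huffman.
- The whole-workspace test run can flake under GPU contention because several test binaries use the same GPU at once. Run per crate for stable verification.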
8 changes: 8 additions & 0 deletions packaging/linux/filemanager-scripts/CoZip Compress (options)
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
source "@COZIP_FM_COMMON@"

cozip_read_selection_into paths
[[ ${#paths[@]} -eq 0 ]] && exit 0

exec "$(cozip_desktop_bin)" ui compress-details "${paths[@]}"
8 changes: 8 additions & 0 deletions packaging/linux/filemanager-scripts/CoZip Compress to CoZip
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
source "@COZIP_FM_COMMON@"

cozip_read_selection_into paths
[[ ${#paths[@]} -eq 0 ]] && exit 0

exec "$(cozip_desktop_bin)" compress --format cozip --hybrid "${paths[@]}"
8 changes: 8 additions & 0 deletions packaging/linux/filemanager-scripts/CoZip Compress to ZIP
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
source "@COZIP_FM_COMMON@"

cozip_read_selection_into paths
[[ ${#paths[@]} -eq 0 ]] && exit 0

exec "$(cozip_desktop_bin)" compress --format zip --hybrid "${paths[@]}"
8 changes: 8 additions & 0 deletions packaging/linux/filemanager-scripts/CoZip Extract (options)
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
source "@COZIP_FM_COMMON@"

cozip_read_selection_into paths
[[ ${#paths[@]} -eq 0 ]] && exit 0

exec "$(cozip_desktop_bin)" ui extract-details "${paths[@]}"
8 changes: 8 additions & 0 deletions packaging/linux/filemanager-scripts/CoZip Extract Here
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
source "@COZIP_FM_COMMON@"

cozip_read_selection_into paths
[[ ${#paths[@]} -eq 0 ]] && exit 0

exec "$(cozip_desktop_bin)" extract --here "${paths[@]}"
39 changes: 39 additions & 0 deletions packaging/linux/filemanager-scripts/_cozip_common.sh
@@ -0,0 +1,39 @@
# Shared helper sourced by CoZip file-manager scripts.
#
# Supports Nautilus (GNOME), Nemo (Cinnamon) and Caja (MATE). Each of those
# file managers exports the selection through its own *_SCRIPT_SELECTED_FILE_PATHS
# environment variable (newline separated, absolute paths).
#
# The installer rewrites @COZIP_DESKTOP@ to the absolute cozip_desktop path.

cozip_desktop_bin() {
  printf '%s' "@COZIP_DESKTOP@"
}

# Collects the selected paths from whichever file manager invoked the script and
# prints them, one per line. Empty lines are dropped.
cozip_selected_paths() {
  local raw=""
  if [[ -n "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
    raw="${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS}"
  elif [[ -n "${NEMO_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
    raw="${NEMO_SCRIPT_SELECTED_FILE_PATHS}"
  elif [[ -n "${CAJA_SCRIPT_SELECTED_FILE_PATHS:-}" ]]; then
    raw="${CAJA_SCRIPT_SELECTED_FILE_PATHS}"
  fi

  local line
  while IFS= read -r line; do
    [[ -n "$line" ]] && printf '%s\n' "$line"
  done <<< "$raw"
}

# Reads selected paths into the array named by $1.
cozip_read_selection_into() {
  local -n _out="$1"
  _out=()
  local line
  while IFS= read -r line; do
    [[ -n "$line" ]] && _out+=("$line")
  done < <(cozip_selected_paths)
}
43 changes: 43 additions & 0 deletions packaging/linux/install.sh
@@ -13,6 +13,18 @@ ICON_DIR="${DATA_DIR}/icons"
COZIP_DESKTOP_BIN="${BIN_DIR}/cozip_desktop"
COZIP_COMP_ICON="${ICON_DIR}/comp.ico"
COZIP_DECOMP_ICON="${ICON_DIR}/decomp.ico"
# GNOME (Nautilus) / Cinnamon (Nemo) / MATE (Caja) right-click "Scripts" support.
COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh"
NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts"
NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts"
CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts"
FM_SCRIPT_NAMES=(
  "CoZip Compress to ZIP"
  "CoZip Compress to CoZip"
  "CoZip Compress (options)"
  "CoZip Extract Here"
  "CoZip Extract (options)"
)

build=1
if [[ "${1:-}" == "--no-build" ]]; then
@@ -44,6 +56,30 @@ install_desktop_template() {
  chmod 0644 "$dst"
}

install_filemanager_common() {
  local escaped_bin
  escaped_bin="$(escape_sed_replacement "$COZIP_DESKTOP_BIN")"
  mkdir -p "$(dirname "$COZIP_FM_COMMON")"
  sed \
    -e "s|@COZIP_DESKTOP@|${escaped_bin}|g" \
    "$SCRIPT_DIR/filemanager-scripts/_cozip_common.sh" > "$COZIP_FM_COMMON"
  chmod 0644 "$COZIP_FM_COMMON"
}

install_filemanager_scripts_into() {
  local dst_dir="$1"
  local escaped_common
  escaped_common="$(escape_sed_replacement "$COZIP_FM_COMMON")"
  mkdir -p "$dst_dir"
  local name
  for name in "${FM_SCRIPT_NAMES[@]}"; do
    sed \
      -e "s|@COZIP_FM_COMMON@|${escaped_common}|g" \
      "$SCRIPT_DIR/filemanager-scripts/${name}" > "${dst_dir}/${name}"
    chmod 0755 "${dst_dir}/${name}"
  done
}

refresh_desktop_caches() {
  command -v update-desktop-database >/dev/null 2>&1 \
    && update-desktop-database "$APP_DIR" 2>/dev/null || true
@@ -91,9 +127,16 @@ chmod +x \
  "$SERVICEMENU_DIR/cozip-10-extract-here.desktop" \
  "$SERVICEMENU_DIR/cozip-20-extract-details.desktop"

echo "==> Installing file-manager scripts (GNOME/Cinnamon/MATE)..."
install_filemanager_common
install_filemanager_scripts_into "$NAUTILUS_SCRIPT_DIR"
install_filemanager_scripts_into "$NEMO_SCRIPT_DIR"
install_filemanager_scripts_into "$CAJA_SCRIPT_DIR"

echo "==> Refreshing desktop caches..."
refresh_desktop_caches

echo ""
echo "Done! Installed $COZIP_DESKTOP_BIN."
echo "You may need to restart Dolphin (or log out/in) for the service menus to appear."
echo "On GNOME/Cinnamon/MATE the actions appear under the right-click \"Scripts\" submenu."
18 changes: 18 additions & 0 deletions packaging/linux/uninstall.sh
@@ -8,6 +8,17 @@ MIME_ROOT="${HOME}/.local/share/mime"
MIME_DIR="${MIME_ROOT}/packages"
DATA_DIR="${HOME}/.local/share/cozip"
ICON_DIR="${DATA_DIR}/icons"
COZIP_FM_COMMON="${DATA_DIR}/filemanager-common.sh"
NAUTILUS_SCRIPT_DIR="${HOME}/.local/share/nautilus/scripts"
NEMO_SCRIPT_DIR="${HOME}/.local/share/nemo/scripts"
CAJA_SCRIPT_DIR="${HOME}/.config/caja/scripts"
FM_SCRIPT_NAMES=(
  "CoZip Compress to ZIP"
  "CoZip Compress to CoZip"
  "CoZip Compress (options)"
  "CoZip Extract Here"
  "CoZip Extract (options)"
)

refresh_desktop_caches() {
  command -v update-desktop-database >/dev/null 2>&1 \
@@ -38,6 +49,13 @@ remove_file "$SERVICEMENU_DIR/cozip-10-extract-here.desktop"
remove_file "$SERVICEMENU_DIR/cozip-20-extract-details.desktop"
remove_file "$ICON_DIR/comp.ico"
remove_file "$ICON_DIR/decomp.ico"
remove_file "$COZIP_FM_COMMON"

for dir in "$NAUTILUS_SCRIPT_DIR" "$NEMO_SCRIPT_DIR" "$CAJA_SCRIPT_DIR"; do
  for name in "${FM_SCRIPT_NAMES[@]}"; do
    remove_file "$dir/$name"
  done
done

if [[ -d "$ICON_DIR" ]]; then
  rmdir "$ICON_DIR" 2>/dev/null || true
73 changes: 69 additions & 4 deletions src/cozip/src/lib.rs
@@ -1,5 +1,6 @@
 use std::collections::{BTreeMap, VecDeque};
 use std::env;
+use std::ffi::OsStr;
 use std::fs::{File as StdFile, OpenOptions};
 use std::io::{self, BufReader, BufWriter, Cursor, Read, Seek, SeekFrom, Write};
 use std::path::{Component, Path, PathBuf};
@@ -5160,8 +5161,7 @@ fn zip_name_from_relative_path(path: &Path) -> Result<String, CoZipError> {
     for component in path.components() {
         match component {
             Component::Normal(part) => {
-                let part = part.to_str().ok_or(CoZipError::NonUtf8Name)?;
-                parts.push(part.to_string());
+                parts.push(zip_name_part_from_os_str(part)?);
             }
             Component::CurDir => {}
             Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
@@ -5191,8 +5191,56 @@ fn file_name_from_path(path: &Path) -> Result<String, CoZipError> {
     let file_name = path
         .file_name()
         .ok_or(CoZipError::InvalidEntryName("file name is missing"))?;
-    let file_name = file_name.to_str().ok_or(CoZipError::NonUtf8Name)?;
-    normalize_zip_entry_name(file_name)
+    let file_name = zip_name_part_from_os_str(file_name)?;
+    normalize_zip_entry_name(&file_name)
 }

+fn zip_name_part_from_os_str(part: &OsStr) -> Result<String, CoZipError> {
+    if let Some(value) = part.to_str() {
+        return Ok(value.to_string());
+    }
+
+    #[cfg(unix)]
+    {
+        use std::os::unix::ffi::OsStrExt;
+        return decode_unix_filename_bytes(part.as_bytes());
+    }
+
+    #[cfg(not(unix))]
+    {
+        let _ = part;
+        Err(CoZipError::NonUtf8Name)
+    }
+}
+
+#[cfg(unix)]
+fn decode_unix_filename_bytes(bytes: &[u8]) -> Result<String, CoZipError> {
+    if bytes.is_empty() {
+        return Err(CoZipError::InvalidEntryName("entry name is empty"));
+    }
+
+    let (shift_jis_decoded, _, shift_jis_had_errors) = SHIFT_JIS.decode(bytes);
+    if !shift_jis_had_errors {
+        let candidate = shift_jis_decoded.into_owned();
+        let (reencoded, _, reencode_had_errors) = SHIFT_JIS.encode(&candidate);
+        if !reencode_had_errors
+            && reencoded.as_ref() == bytes
+            && contains_probably_japanese_text(&candidate)
+        {
+            inspect_trace_log(format!(
+                "[path_name] decode_unix_filename encoding=shift_jis value={}",
+                candidate
+            ));
+            return Ok(candidate);
+        }
+    }
+
+    let candidate = String::from_utf8_lossy(bytes).into_owned();
+    inspect_trace_log(format!(
+        "[path_name] decode_unix_filename encoding=utf8_lossy value={}",
+        candidate
+    ));
+    Ok(candidate)
+}
+
 fn normalize_zip_entry_name(name: &str) -> Result<String, CoZipError> {
@@ -5510,6 +5558,23 @@ mod tests {
         let _ = std::fs::remove_dir_all(base);
     }

+    #[cfg(unix)]
+    #[test]
+    fn shift_jis_unix_filename_bytes_become_utf8_zip_name() {
+        use std::ffi::OsString;
+        use std::os::unix::ffi::OsStringExt;
+
+        let file_name = OsString::from_vec(vec![
+            0x83, 0x65, 0x83, 0x58, 0x83, 0x67, b'.', b't', b'x', b't',
+        ]);
+        let path = PathBuf::from(file_name);
+
+        assert_eq!(
+            file_name_from_path(&path).expect("decode shift jis path"),
+            "テスト.txt"
+        );
+    }
+
     #[test]
     fn cozip_directory_roundtrip_many_files_self_verify() {
         let cozip = CoZip::init(CoZipOptions::Zip {