Skip to content

zip64 values in central directory should be ceiled only if they exceed allowed int size#30

Merged
ananthakumaran merged 2 commits into
ananthakumaran:masterfrom
Borzik:small-zip64
Jun 29, 2025
Merged

zip64 values in central directory should be ceiled only if they exceed allowed int size#30
ananthakumaran merged 2 commits into
ananthakumaran:masterfrom
Borzik:small-zip64

Conversation

@Borzik
Copy link
Copy Markdown
Contributor

@Borzik Borzik commented Jun 8, 2025

Currently for zip64 files, the values that might overflow (e.g. size of central directory) will always be set to their max value. Some archiver utilities, such as BetterZip on Mac, don't seem to like it and instead expect the correct value to be set.

The change I'm proposing will write correct number of entries and correct directory size in case these values are smaller than their int limit.

This might also be causing #18, but I don't have a Xiaomi phone to test it.

I'll have to do a few more tests, such as zipping files larger than 4GB, zipping more than 65k files (which hits entry limit), but I would appreciate any feedback while I'm working on it.

@ananthakumaran
Copy link
Copy Markdown
Owner

If possible, check how other tools are doing it

@Borzik
Copy link
Copy Markdown
Contributor Author

Borzik commented Jun 9, 2025

@ananthakumaran

Packmatic (another elixir zip streaming lib) encodes 32bit values and puts the max value only when the real value doesn't fit:
https://github.com/evadne/packmatic/blob/develop/lib/packmatic/field/central/directory_end.ex#L106
https://github.com/evadne/packmatic/blob/develop/lib/packmatic/field/helpers.ex#L16

@Borzik
Copy link
Copy Markdown
Contributor Author

Borzik commented Jun 10, 2025

I've updated it to be aligned with rubyzip:
https://github.com/rubyzip/rubyzip/blob/1f3f84c88914b2b3c77c18b73f2ecb42225a54af/lib/zip/central_directory.rb#L72

In end of central directory, they have:

  • disk numbers set to 0
  • entries count, size, and offset are either real value or max integer value of the expected size (zip64 check is irrelevant then, because if these values overflow for non-zip64 archive, it will be corrupt anyway)

@ananthakumaran
Copy link
Copy Markdown
Owner

Thanks for checking rubyzip, the approach seems good, I will review the PR and fix the CI this weekend.

@ananthakumaran
Copy link
Copy Markdown
Owner

I checked rubyzip, it follows the same approach for the central directory header as well. Can you fix that as well. I have fixed the CI pipeline on master, please rebase your branch.

@Borzik
Copy link
Copy Markdown
Contributor Author

Borzik commented Jun 14, 2025

@ananthakumaran I've rebased it.

As for entry records in central directory, rubyzip always puts max integer value instead of real size (lines 572 and 573):
https://github.com/rubyzip/rubyzip/blob/master/lib/zip/entry.rb#L560

So I'm not sure if anything else needs to be done.

@ananthakumaran
Copy link
Copy Markdown
Owner

@Borzik can you check prep_cdir_zip64_extra method, which is called before calling pack_c_dir_entry, and it only sets the field if size > limit?

@Borzik
Copy link
Copy Markdown
Contributor Author

Borzik commented Jun 15, 2025

The method you mention sets the size in the extra field, and does not affect normal central directory record.

Some info re. extra field's zip64 spec which I probably misunderstood so they can be ignored If you want zstream's implementation to match that part, `Extra.zip64_extended_info` will need to be updated to reflect the spec. It currently always writes sizes even if they are small:
Extra.zip64_extended_info(entry.size, entry.c_size, entry.local_file_header_offset)

while the spec says the fields MUST only appear if the corresponding Local or Central directory record field is set to 0xFFFF or 0xFFFFFFFF, which rubyzip implementation follows by not setting those fields if they are small enough to fit in the normal directory.

However, it doesn't say what should be placed there if the field is not set. I assume it would look like this:

Extra.zip64_extended_info(
  if(entry.size >= 0xFFFFFFFF, do: entry.size, else: 0),
  entry.c_size, # same if condition here
  entry.local_file_header_offset # same if condition here
)

but it looks strange and I'm not sure it is correct.

Update:

I've tested this locally and it does not work.

Central directory's original and compressed size should always be set to max int for zip64 files, and this way we get zip unpacker to read from zip64 extra fields even if the size is smaller.

Golang's zip implementation might be a bit easier to understand here:
https://cs.opensource.google/go/go/+/refs/tags/go1.24.4:src/archive/zip/writer.go;l=90

@ananthakumaran
Copy link
Copy Markdown
Owner

Thanks for checking, I looked at the spec once more, end of central directory seems good.

For central directory, we already seem to be doing the correct thing (thanks for checking Golang as well).

I'll have to do a few more tests, such as zipping files larger than 4GB, zipping more than 65k files (which hits entry limit), but I would appreciate any feedback while I'm working on it.

I am ok to merge this PR. Let me know if you are done with your testing.

@Borzik
Copy link
Copy Markdown
Contributor Author

Borzik commented Jun 26, 2025

@ananthakumaran I did a couple of tests. Here's a script that will generate a zip file with 66k files 100KB each, and a total size of 6.6GB. I was able to open and unpack this file in BetterZip on Mac, and it didn't report any issues with the file.

1..66000
|> Stream.map(fn num ->
  {:ok, io} = String.duplicate("abcde", 20000) |> StringIO.open()
  Zstream.entry(
    "tmp/#{num}.txt",
    IO.binstream(io, 10000),
    coder: Zstream.Coder.Stored
  )
end)
|> Zstream.zip(zip64: true)
|> Stream.into(File.stream!("./test_zip64.zip"))
|> Stream.run()

@ananthakumaran ananthakumaran merged commit f9217d7 into ananthakumaran:master Jun 29, 2025
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants