zip64 values in central directory should be ceiled only if they exceed allowed int size#30
Conversation
|
If possible, check how other tools are doing it |
|
Packmatic (another elixir zip streaming lib) encodes 32bit values and puts the max value only when the real value doesn't fit: |
|
I've updated it to be aligned with rubyzip: In end of central directory, they have:
|
|
Thanks for checking rubyzip, the approach seems good, I will review the PR and fix the CI this weekend. |
|
I checked rubyzip, it follows the same approach for the central directory header as well. Can you fix that as well. I have fixed the CI pipeline on master, please rebase your branch. |
|
@ananthakumaran I've rebased it. As for entry records in central directory, rubyzip always puts max integer value instead of real size (lines 572 and 573): So I'm not sure if anything else needs to be done. |
|
@Borzik can you check |
|
The method you mention sets the size in the extra field, and does not affect normal central directory record. Some info re. extra field's zip64 spec which I probably misunderstood so they can be ignoredIf you want zstream's implementation to match that part, `Extra.zip64_extended_info` will need to be updated to reflect the spec. It currently always writes sizes even if they are small:Extra.zip64_extended_info(entry.size, entry.c_size, entry.local_file_header_offset)while the spec says the fields MUST only appear if the corresponding Local or Central directory record field is set to 0xFFFF or 0xFFFFFFFF, which rubyzip implementation follows by not setting those fields if they are small enough to fit in the normal directory. However, it doesn't say what should be placed there if the field is not set. I assume it would look like this: Extra.zip64_extended_info(
if(entry.size >= 0xFFFFFFFF, do: entry.size, else: 0),
entry.c_size, # same if condition here
entry.local_file_header_offset # same if condition here
)but it looks strange and I'm not sure it is correct. Update: I've tested this locally and it does not work. Central directory's original and compressed size should always be set to max int for zip64 files, and this way we get zip unpacker to read from zip64 extra fields even if the size is smaller. Golang's |
|
Thanks for checking, I looked at the spec once more, For
I am ok to merge this PR. Let me know if you are done with your testing. |
|
@ananthakumaran I did a couple of tests. Here's a script that will generate a zip file with 66k files 100KB each, and a total size of 6.6GB. I was able to open and unpack this file in BetterZip on Mac, and it didn't report any issues with the file. 1..66000
|> Stream.map(fn num ->
{:ok, io} = String.duplicate("abcde", 20000) |> StringIO.open()
Zstream.entry(
"tmp/#{num}.txt",
IO.binstream(io, 10000),
coder: Zstream.Coder.Stored
)
end)
|> Zstream.zip(zip64: true)
|> Stream.into(File.stream!("./test_zip64.zip"))
|> Stream.run() |
Currently for zip64 files, the values that might overflow (e.g. size of central directory) will always be set to their max value. Some archiver utilities, such as BetterZip on Mac, don't seem to like it and instead expect the correct value to be set.
The change I'm proposing will write correct number of entries and correct directory size in case these values are smaller than their int limit.
This might also be causing #18, but I don't have a Xiaomi phone to test it.
I'll have to do a few more tests, such as zipping files larger than 4GB, zipping more than 65k files (which hits entry limit), but I would appreciate any feedback while I'm working on it.