Stop mutating source IO state during magic-byte detection#143
Open
andreaslillebo wants to merge 2 commits intorails:mainfrom
Open
Stop mutating source IO state during magic-byte detection#143andreaslillebo wants to merge 2 commits intorails:mainfrom
andreaslillebo wants to merge 2 commits intorails:mainfrom
Conversation
Marcel::MimeType.for(io) called io.binmode + io.set_encoding(BINARY) before reading magic bytes, mutating the caller's IO. After detection the IO read back as ASCII-8BIT regardless of how the caller had opened it, which downstream trips json 3.0's "UTF-8 string passed as BINARY" error when the bytes are JSON-encoded. The mutation is redundant. magic_match_io reads via IO#read(n, buffer) with a binary-encoded buffer, which already produces a binary result regardless of the IO's mode. Adds a regression test confirming the source IO's external_encoding is preserved across Marcel::MimeType.for.
With io.binmode and io.set_encoding(Encoding::BINARY) removed in the previous commit, the workarounds added in PR rails#140 are no longer needed: - The unary + on io.to_s made the resulting StringIO's underlying string mutable so set_encoding could re-tag it. With no set_encoding call, a frozen string is fine. - The io.close_write call made the StringIO readonly before binmode/set_encoding so Ruby 3.4 wouldn't warn about modifying the frozen string. With no encoding modification, there's nothing to warn about. The "no Ruby 3.4 frozen string warnings with StringIO" regression test from PR rails#140 still passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stop mutating source IO state during magic-byte detection
Problem
Marcel::MimeType.for(io)permanently mutates the source IO's encoding. Callers that read the IO after mime detection get back ASCII-8BIT bytes regardless of how they originally opened it:For StringIO objects the underlying string is also re-tagged in place. When the bytes are then JSON-encoded, this trips the json gem's
UTF-8 string passed as BINARYwarning, which becomes an error in json 3.0.Cause
magic_matchcallsio.binmodeandio.set_encoding(Encoding::BINARY)before reading magic bytes:These two lines were carried over from mimemagic in #30 and predate the
bufferparameter pattern inmagic_match_io. They're redundant -magic_match_ioreads viaIO#read(n, buffer)with a binary-encoded buffer, which already produces a binary result regardless of the IO's mode.Solution
This PR has two commits.
Commit 1: remove the redundant mutations.
binmodeandset_encoding(Encoding::BINARY)are gone.magic_match_ioreads through the binary buffer.Commit 2: remove the workarounds those mutations required. With the mutations gone, the support code added to keep them safe under Ruby 3.4 is no longer needed:
+io.to_s(the mutable-string fallback, Fix Ruby 3.4 frozen string literal warnings with StringIO #140) - only needed becauseset_encodingwould later try to mutate the StringIO's underlying string. With noset_encodingcall, a frozen string is fine.io.close_write(Fix Ruby 3.4 frozen string literal warnings with StringIO #140) - only needed to make the StringIO read-only beforebinmode/set_encodingso Ruby 3.4 wouldn't warn about modifying the underlying frozen string. With no encoding modification, there's nothing to warn about.The "no Ruby 3.4 frozen string warnings with StringIO" regression test from #140 still passes.
Test
Adds a regression test asserting
external_encodingsurvives a call toMarcel::MimeType.for: