Skip to content

Stop mutating source IO state during magic-byte detection#143

Open
andreaslillebo wants to merge 2 commits intorails:mainfrom
andreaslillebo:mime-type-for-mutates-io-encoding
Open

Stop mutating source IO state during magic-byte detection#143
andreaslillebo wants to merge 2 commits intorails:mainfrom
andreaslillebo:mime-type-for-mutates-io-encoding

Conversation

@andreaslillebo
Copy link
Copy Markdown

Stop mutating source IO state during magic-byte detection

Problem

Marcel::MimeType.for(io) permanently mutates the source IO's encoding. Callers that read the IO after mime detection get back ASCII-8BIT bytes regardless of how they originally opened it:

require "stringio"
io = StringIO.new("hello world")
io.external_encoding   # => #<Encoding:UTF-8>
Marcel::MimeType.for(io)
io.external_encoding   # => #<Encoding:BINARY (ASCII-8BIT)>

For StringIO objects the underlying string is also re-tagged in place. When the bytes are then JSON-encoded, this trips the json gem's UTF-8 string passed as BINARY warning, which becomes an error in json 3.0.

Cause

magic_match calls io.binmode and io.set_encoding(Encoding::BINARY) before reading magic bytes:

io.binmode if io.respond_to?(:binmode)
io.set_encoding(Encoding::BINARY) if io.respond_to?(:set_encoding)
buffer = (+"").encode(Encoding::BINARY)

MAGIC.send(method) { |type, matches| magic_match_io(io, matches, buffer) }

These two lines were carried over from mimemagic in #30 and predate the buffer parameter pattern in magic_match_io. They're redundant - magic_match_io reads via IO#read(n, buffer) with a binary-encoded buffer, which already produces a binary result regardless of the IO's mode.

Solution

This PR has two commits.

Commit 1: remove the redundant mutations. binmode and set_encoding(Encoding::BINARY) are gone. magic_match_io reads through the binary buffer.

Commit 2: remove the workarounds those mutations required. With the mutations gone, the support code added to keep them safe under Ruby 3.4 is no longer needed:

The "no Ruby 3.4 frozen string warnings with StringIO" regression test from #140 still passes.

Test

Adds a regression test asserting external_encoding survives a call to Marcel::MimeType.for:

test "does not mutate the source IO's encoding" do
  io = StringIO.new("hello world")
  original_encoding = io.external_encoding

  Marcel::MimeType.for(io)

  assert_equal original_encoding, io.external_encoding
end

Marcel::MimeType.for(io) called io.binmode + io.set_encoding(BINARY)
before reading magic bytes, mutating the caller's IO. After detection
the IO read back as ASCII-8BIT regardless of how the caller had opened
it, which downstream trips json 3.0's "UTF-8 string passed as BINARY"
error when the bytes are JSON-encoded.

The mutation is redundant. magic_match_io reads via IO#read(n, buffer)
with a binary-encoded buffer, which already produces a binary result
regardless of the IO's mode.

Adds a regression test confirming the source IO's external_encoding is
preserved across Marcel::MimeType.for.
With io.binmode and io.set_encoding(Encoding::BINARY) removed in the
previous commit, the workarounds added in PR rails#140 are no longer needed:

- The unary + on io.to_s made the resulting StringIO's underlying
  string mutable so set_encoding could re-tag it. With no set_encoding
  call, a frozen string is fine.
- The io.close_write call made the StringIO readonly before
  binmode/set_encoding so Ruby 3.4 wouldn't warn about modifying the
  frozen string. With no encoding modification, there's nothing to
  warn about.

The "no Ruby 3.4 frozen string warnings with StringIO" regression test
from PR rails#140 still passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant