fix(scripts): write meta_data.json with explicit utf-8 encoding #1047

Open
luoxiu065-zjx wants to merge 1 commit into larksuite:main from luoxiu065-zjx:fix/fetch-meta-utf8-encoding

luoxiu065-zjx commented May 22, 2026

Summary

On Windows with a non-UTF-8 locale (e.g. CP936/GBK), scripts/fetch_meta.py crashes at json.dump(..., ensure_ascii=False) whenever the API metadata contains characters outside the locale's range (e.g. Thai U+0E42 from regional service descriptions). This breaks make build on every fresh checkout for affected users.
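
The failure mode can be reproduced on any platform by passing the locale encoding explicitly. The snippet below is a minimal sketch, not code from scripts/fetch_meta.py: the `meta` payload and the `try_dump` helper are hypothetical, and `encoding="gbk"` stands in for what `open()` would pick by default on a Windows zh-CN machine.

```python
import json
import os
import tempfile

# Hypothetical payload; the real metadata comes from the API.
meta = {"description": "\u0e42"}  # Thai character U+0E42, absent from GBK


def try_dump(encoding):
    """Dump meta to a temp file; return the UnicodeEncodeError, or None on success."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "meta_data.json")
        try:
            with open(path, "w", encoding=encoding) as f:
                # ensure_ascii=False writes the character itself, so the
                # file encoding must be able to represent it.
                json.dump(meta, f, ensure_ascii=False)
        except UnicodeEncodeError as exc:
            return exc
    return None


gbk_error = try_dump("gbk")      # simulates the CP936/GBK default
utf8_error = try_dump("utf-8")   # the encoding this PR pins
```

Under these assumptions the GBK attempt fails while the UTF-8 attempt succeeds, which matches the crash described above.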

Changes

  • Pin open(OUT_PATH, "w") to encoding="utf-8" so the write side matches the existing utf-8 read at line 77.
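
For reviewers who want to see the shape of the change in isolation, here is a minimal sketch of the write side after the fix. `write_meta` and the sample payload are illustrative names only; the actual script writes to `OUT_PATH` and may be structured differently.

```python
import json
import os
import tempfile


def write_meta(out_path, meta):
    """Write meta as JSON, always encoded as UTF-8 regardless of the locale."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(meta, f, ensure_ascii=False)


# Usage with a hypothetical payload containing a non-GBK character.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "meta_data.json")
    write_meta(path, {"description": "\u0e42"})
    with open(path, "rb") as f:
        raw = f.read()  # raw bytes, to inspect the on-disk encoding
```

Reading the file back in binary mode shows the character stored as its three-byte UTF-8 sequence, independent of the platform default.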

Test Plan

  • Reproduced the original UnicodeEncodeError: 'gbk' codec can't encode character 'โ' on Windows zh-CN before the fix.
  • After the fix, python3 scripts/fetch_meta.py writes a complete `internal/registry/meta_data.json` (~1.4 MB, 13 services) and exits 0 under the same locale.
  • No behavior change on platforms where the default encoding is already utf-8 (Linux, macOS).
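
To check whether a particular machine falls into the affected group, one option is to probe which encoding `open()` selects when none is given. This is a rough sketch; `default_encoding` and `affected` are names chosen here for illustration and do not appear in the script.

```python
import os
import tempfile

# Open a throwaway file without an encoding argument and ask the file
# object which encoding it ended up with.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "probe.txt"), "w") as f:
        default_encoding = f.encoding

# The machine is affected if that encoding cannot represent U+0E42.
try:
    "\u0e42".encode(default_encoding)
    affected = False
except UnicodeEncodeError:
    affected = True

print(default_encoding, "affected" if affected else "not affected")
```

On systems where the default is already UTF-8 the probe should report "not affected", which is consistent with the last point in the test plan.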

Related Issues

  • None

Summary by CodeRabbit

  • Chores
    • Improved consistency of internal data handling by specifying explicit character encoding for better reliability across different systems.

On Windows with a non-UTF-8 locale (e.g. CP936/GBK), the default open()
encoding crashes json.dump(..., ensure_ascii=False) whenever metadata
contains characters outside the locale's range (e.g. Thai U+0E42 in
regional service descriptions), breaking `make build`. The matching
read at line 77 already pins utf-8; align the write side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

github-actions bot added the size/M (Single-domain feat or fix with limited business impact) label May 22, 2026

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9caece0-5852-4e62-8e1e-db04c38a074d

📥 Commits

Reviewing files that changed from the base of the PR and between 4582dfd and 5fed622.

📒 Files selected for processing (1)
  • scripts/fetch_meta.py

📝 Walkthrough

The scripts/fetch_meta.py script now explicitly specifies UTF-8 encoding when writing the metadata JSON file, replacing reliance on platform defaults and ensuring consistent text encoding behavior across different systems.

Changes

Encoding specification

| Layer / File(s) | Summary |
|---|---|
| UTF-8 encoding in meta_data.json write: scripts/fetch_meta.py | The script now specifies encoding="utf-8" when opening internal/registry/meta_data.json for writing, ensuring platform-independent text encoding during JSON output. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A rabbit hops through scripts with care,
Adding UTF-8 everywhere!
No platform woes shall block the way—
Consistent encoding saves the day! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding explicit UTF-8 encoding to the meta_data.json write operation in scripts/fetch_meta.py. |
| Description check | ✅ Passed | The PR description includes all required template sections with comprehensive details: a clear summary of the Windows UTF-8 locale bug, specific changes made, thorough test plan with platform-specific verification, and related issues. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
