Before submitting
Summary
On Windows, browser-harness throws a UnicodeEncodeError when js() returns text from a web page containing Chinese characters, due to surrogate characters (U+DC80-U+DCFF) in the CDP response data. The error is raised during exec() in run.py, so page content from Chinese websites cannot be retrieved or printed through js(). page_info() and basic navigation still work.
The Chrome DevTools Protocol (CDP) returns page data (titles, URLs, text content) that may contain surrogate characters when the page uses certain encodings or special characters. Lone surrogates such as U+DC80-U+DCFF cannot be encoded as UTF-8, so Python raises an encoding error as soon as it tries to encode them.
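The encoding failure itself can be reproduced without a browser. The sketch below is a minimal illustration, not browser-harness code: the string `text` is a made-up stand-in for CDP data, and it assumes only that a lone surrogate from the reported range reaches a UTF-8 encode step.

```python
# A made-up string ending in a lone low surrogate from the reported range.
text = "title\udc80"

# Strict UTF-8 encoding (the default) rejects lone surrogates.
try:
    text.encode("utf-8")
    raised = False
    reason = None
except UnicodeEncodeError as exc:
    raised = True
    reason = exc.reason  # "surrogates not allowed"

# Error handlers that avoid the exception, each with a different trade-off.
replaced = text.encode("utf-8", errors="replace")        # lossy: b"title?"
escaped = text.encode("utf-8", errors="surrogateescape") # raw byte: b"title\x80"
passed = text.encode("utf-8", errors="surrogatepass")    # non-standard UTF-8 bytes

print(raised, reason)
print(replaced, escaped, passed)
```

U+DC80-U+DCFF is also the range that Python's surrogateescape error handler uses to carry undecodable bytes inside a str. That may be a hint about where the characters originate, but nothing in this report confirms it.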
Affected Operations
- js() - Cannot execute JavaScript that returns Chinese text content ❌
- page_info() - Works correctly ✅
- Basic navigation - Works correctly ✅
Repro
- Start Chrome with remote debugging enabled on Windows 11
- Run the following command to test basic page info (this works):
browser-harness <<'PY'
goto_url("https://cloud.tencent.com/developer/article/2663247")
wait_for_load()
info = page_info()
print(f"Title: {info['title']}")
PY
Result: ✅ Success - prints the page title correctly:
Title: 🐴 600行代码统治浏览器:Browser Harness如何让AI智能体获得完全自由-腾讯云开发者社区-腾讯云
- Now try to extract article content using js():
browser-harness <<'PY'
goto_url("https://cloud.tencent.com/developer/article/2663247")
wait_for_load()
# Try to extract article content
content = js("""
    const article = document.querySelector('.article-content, .content, article');
    if (article) {
        return article.innerText.substring(0, 500);
    }
    return "No article content found";
""")
print(content[:200])
PY
Result: ❌ Error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\.local\bin\browser-harness.exe\__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "C:\Users\Administrator\browser-harness\src\browser_harness\run.py", line 140, in main
    exec(code, exec_globals)
    ~~~~^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 125-126: surrogates not allowed
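Until the root cause is fixed, one possible caller-side workaround is to scrub strings before printing them. This is a sketch under assumptions: `scrub_surrogates` is a hypothetical helper, not part of browser-harness, and it only helps if the surrogates arrive in the value returned by js() rather than elsewhere in the pipeline. The traceback reports two adjacent positions (125-126), which would be consistent with a surrogate pair that was never joined into one character, though the report does not confirm this. The helper round-trips text through UTF-16 so that such pairs are rejoined, while lone surrogates become U+FFFD.

```python
def scrub_surrogates(text: str) -> str:
    """Return a copy of text that can always be encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    # "surrogatepass" lets the encoder emit surrogate code units as-is.
    raw = text.encode("utf-16", errors="surrogatepass")
    # Decoding rejoins valid high/low pairs into a single character and
    # replaces any surrogate left without a partner with U+FFFD.
    return raw.decode("utf-16", errors="replace")


# Two separate surrogate code units that together form U+1F434.
joined = scrub_surrogates("\ud83d\udc34")
# A lone low surrogate with no partner.
cleaned = scrub_surrogates("abc\udc80")

# Neither result raises when encoded as UTF-8.
joined_bytes = joined.encode("utf-8")
cleaned_bytes = cleaned.encode("utf-8")
```

In the repro script this would be used as `print(scrub_surrogates(content)[:200])`. The replacement is lossy for lone surrogates, so it is a stopgap for inspecting output rather than a fix for the underlying encoding problem.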
Environment
OS: Windows 11 Pro 10.0.26200
Chrome version: 148.0.7778.168
browser-harness --version: 0.1.0 (git)
browser-harness --doctor output:
browser-harness doctor
platform Windows 11
python 3.14.3
version 0.1.0 (git)
latest release (could not reach github)
[ok ] chrome running
[ok ] daemon alive
[ok ] active browser connections — 1
default — active page: 🐴 600行代码统治浏览器:Browser Harness如何让AI智能体获得完全自由-腾讯云开发者社区-腾讯云 — https://cloud.tencent.com/developer/article/2663247
[FAIL] profile-use installed — optional: curl -fsSL https://browser-use.com/profile.sh | sh
[FAIL] BROWSER_USE_API_KEY set — optional: needed only for cloud browsers / profile sync