fix: Fix Bilibili download 412 error and log / status-file race conditions #332

Closed
iorireal wants to merge 5 commits into JefferyHcool:master from iorireal:fix/bilibili-download-412-and-logger

Conversation

@iorireal

Summary

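  • Fix HTTP 412 errors on Bilibili downloads (yt-dlp upgrade + stronger anti-scraping request headers)
  • Fix lost downloader logs (switch to get_logger)
  • Fill in the missing cookies lookup paths for download_subtitles
  • Fix the status-file race condition and add an empty-file defense
  • Remove hardcoded absolute paths and add cookies.txt to .gitignore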

Closes #330

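Files involved

| File | Change |
| --- | --- |
| backend/app/downloaders/bilibili_downloader.py | logger switched to get_logger; cookies lookup extended to 4 locations (hardcoded path removed); headers strengthened in three places |
| backend/app/downloaders/youtube_downloader.py | logger switched to get_logger |
| backend/app/services/note.py | _update_status drops temp+rename and writes the file directly |
| backend/app/routers/note.py | get_task_status adds empty-file detection and catches JSON parse exceptions |
| .gitignore | adds **/cookies.txt |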

Test plan

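  • Submit a Bilibili video link and confirm the download no longer returns 412
  • Check backend/logs/app.log and confirm downloader logs are written normally
  • Query task_status repeatedly while a task is running and confirm JSONDecodeError no longer occurs
  • In an environment without cookies, confirm subtitle retrieval correctly falls back to audio transcription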

🤖 Generated with Claude Code

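- Fix downloader logger: bilibili_downloader and youtube_downloader switch to get_logger so download logs are written to app.log
- Fill in the cookies lookup paths for download_subtitles (extended from 2 locations to 4)
- Strengthen Bilibili anti-scraping request headers: add Origin, Sec-Fetch-*, Accept-Encoding; extractor_retries 3→5
- Fix the _update_status race: drop temp+rename and write directly to avoid the empty-file problem on Windows
- Fix the get_task_status empty-file defense: handle empty content and catch JSON parse exceptions
- Remove the hardcoded Windows absolute path from the cookies file lookup
- Switch to a lookup strategy based on an environment variable + relative paths
- Add **/cookies.txt to .gitignore
Remove unused imports, variables, and types; replace any types with concrete types;
eliminate all 44 lint errors, leaving 15 non-blocking warnings.
- note.py: trigger a full download when "original video screenshots" (原片截图) is checked in format, fixing screenshots not being displayed
- video_reader.py: add a MAX_GRIDS=25 limit to keep the Doubao model from exceeding its token limit
- prompt_builder.py: optimize the prompt construction logic
- Other backend files: code cleanup and optimization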

chore: initialize the AI Literacy development standards system

- CLAUDE.md, AGENTS.md, MODEL_ROUTING.md
- .claude/agents/ (5 standards agents)
- .github/workflows/ai-literacy.yml
- scripts/ai-literacy-check.sh

Copilot AI left a comment

Pull request overview

This PR primarily targets reliability improvements across the download + note-generation pipeline: it strengthens Bilibili download requests / cookie discovery, fixes task status JSON read robustness, and reduces several frontend any usages while adding Mermaid diagram rendering + richer export behavior.

Changes:

  • Backend: improve Bilibili downloader headers/cookie lookup, harden task status read path against empty/invalid JSON, and adjust media download logic to correctly force video download when screenshot format is selected.
  • Frontend: add Mermaid rendering in Markdown preview and embed images as base64 on Markdown export; tighten types (any → unknown / Record<string, unknown>) and remove unused code/imports.
  • Repo tooling/docs: add security lint script + harness/agent docs, ignore cookies.txt, and mount cookies.txt into backend containers.

Reviewed changes

Copilot reviewed 64 out of 67 changed files in this pull request and generated 12 comments.

Show a summary per file
| File | Description |
| --- | --- |
| scripts/lint.sh | Adds a pre-commit “security lint” script to catch sensitive files, absolute paths, and credential-like strings. |
| scripts/ai-literacy-check.sh | Adds a CI fallback script placeholder for harness constraint enforcement. |
| REFLECTION_LOG.md | Adds a reflection log template for pipeline learnings. |
| MODEL_ROUTING.md | Documents agent-to-model routing guidance for orchestrated workflows. |
| docker-compose.yml | Mounts cookies.txt into the backend container. |
| docker-compose.gpu.yml | Same cookies bind mount for GPU compose stack. |
| CLAUDE.md | Expands workflow/constraints documentation and local dev/test commands. |
| BillNote_frontend/src/utils/request.ts | Tightens response typing and updates comments. |
| BillNote_frontend/src/store/taskStore/index.ts | Replaces several any types with safer types. |
| BillNote_frontend/src/store/chatStore/index.ts | Avoids unused-var lint issue via rename + eslint disable. |
| BillNote_frontend/src/services/note.ts | Removes any from catch param. |
| BillNote_frontend/src/services/model.ts | Tightens request payload types. |
| BillNote_frontend/src/services/downloader.ts | Tightens platform type from any to string. |
| BillNote_frontend/src/pages/SettingPage/transcriber.tsx | Removes unused icon/state fields. |
| BillNote_frontend/src/pages/SettingPage/Monitor.tsx | Simplifies catch and localizes UI comments. |
| BillNote_frontend/src/pages/SettingPage/Downloader.tsx | Removes unused import. |
| BillNote_frontend/src/pages/SettingPage/components/menuBar.tsx | Fixes prop typing (any → IMenuProps) and removes unused imports. |
| BillNote_frontend/src/pages/SettingPage/about.tsx | Removes unused icons and localizes UI comments. |
| BillNote_frontend/src/pages/NotFoundPage/index.tsx | Updates file header comment. |
| BillNote_frontend/src/pages/HomePage/Home.tsx | Removes unused local variable. |
| BillNote_frontend/src/pages/HomePage/components/transcriptViewer.tsx | Removes unused imports and localizes comments. |
| BillNote_frontend/src/pages/HomePage/components/StepBar.tsx | Removes unused variable. |
| BillNote_frontend/src/pages/HomePage/components/NoteHistory.tsx | Removes unused imports/state/effect and cleans search logic. |
| BillNote_frontend/src/pages/HomePage/components/NoteForm.tsx | Adjusts defaults for video understanding controls and simplifies store usage. |
| BillNote_frontend/src/pages/HomePage/components/MarkmapComponent.tsx | Adds typings (unknown), improves XMind conversion typing. |
| BillNote_frontend/src/pages/HomePage/components/MarkdownViewer.tsx | Adds Mermaid rendering and base64-embedding images on Markdown download; improves component typing. |
| BillNote_frontend/src/pages/HomePage/components/MarkdownHeader.tsx | Simplifies version list rendering logic. |
| BillNote_frontend/src/pages/HomePage/components/History.tsx | Removes unused icons. |
| BillNote_frontend/src/pages/HomePage/components/ChatPanel.tsx | Tightens typing and removes any casts. |
| BillNote_frontend/src/layouts/SettingLayout.tsx | Localizes UI comments. |
| BillNote_frontend/src/hooks/useTaskPolling.ts | Removes unused store selectors. |
| BillNote_frontend/src/constant/note.ts | Adds mermaid as a selectable note format. |
| BillNote_frontend/src/components/LazyImage.tsx | Updates file header comment. |
| BillNote_frontend/src/components/Icons/iconMap.ts | Updates file header comment. |
| BillNote_frontend/src/components/Form/modelForm/ModelSelector.tsx | Simplifies catch param typing. |
| BillNote_frontend/src/components/Form/modelForm/Icons/iconMap.ts | Updates file header comment. |
| BillNote_frontend/src/components/Form/modelForm/Form.tsx | Removes unused logic/imports and simplifies provider form flow. |
| BillNote_frontend/src/components/Form/modelForm/components/providerCard.tsx | Removes debug logs and replaces ts-ignore with ts-expect-error. |
| BillNote_frontend/src/components/Form/DownloaderForm/providerCard.tsx | Tightens icon prop typing and removes debug logs/unused imports. |
| BillNote_frontend/src/components/Form/DownloaderForm/Options.tsx | Removes unused imports and dead navigation code. |
| BillNote_frontend/src/components/Form/DownloaderForm/Form.tsx | Simplifies catch param typing. |
| BillNote_frontend/package.json | Adds mermaid dependency. |
| BillNote_frontend/package-lock.json | Updates lockfile for new deps (including mermaid / ant-design x). |
| backend/app/utils/video_reader.py | Adds dense sampling + scene-change scoring to select keyframes; introduces numpy usage. |
| backend/app/transcriber/whisper.py | Adjusts model path handling after snapshot download. |
| backend/app/transcriber/transcriber_provider.py | Clarifies fallback comment wording. |
| backend/app/services/note.py | Fixes screenshot/video download decision logic, adjusts defaults for interval/grid, and changes status file write behavior. |
| backend/app/routers/note.py | Hardens task status read against empty/invalid JSON and avoids JSONDecodeError crashes. |
| backend/app/routers/config.py | Localizes comment for whisper download tracking dict. |
| backend/app/gpt/qwen_gpt.py | Updates comment wording for time format. |
| backend/app/gpt/prompt_builder.py | Adds mermaid format support and rewrites style instructions to more structured prompts. |
| backend/app/gpt/openai_gpt.py | Updates comment wording for time format. |
| backend/app/gpt/deepseek_gpt.py | Updates comment wording for time format. |
| backend/app/downloaders/youtube_downloader.py | Switches to project logger (get_logger). |
| backend/app/downloaders/douyin_helper/abogus.py | Localizes docstring types and removes English duplicates. |
| backend/app/downloaders/douyin_downloader.py | Localizes docstring by removing duplicated English text. |
| backend/app/downloaders/bilibili_downloader.py | Switches to project logger, strengthens headers, and expands cookies lookup paths. |
| AGENTS.md | Adds a project “persistent memory” doc for agent workflows/gotchas. |
| .gitignore | Ignores **/cookies.txt. |
| .github/workflows/ai-literacy.yml | Adds workflow to enforce harness constraints (currently mostly a placeholder). |
| .claude/HARNESS.md | Adds harness definition and constraints inventory. |
| .claude/agents/tdd-agent.md | Adds agent role description for test generation. |
| .claude/agents/spec-writer.md | Adds agent role description for spec updates. |
| .claude/agents/orchestrator.md | Adds orchestrator agent role description. |
| .claude/agents/integration-agent.md | Adds integration agent role description. |
| .claude/agents/code-reviewer.md | Adds code reviewer agent role description. |
Files not reviewed (1)
  • BillNote_frontend/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

BillNote_frontend/src/pages/HomePage/components/NoteForm.tsx:466

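  • Here the Controller's field is discarded and Checkbox changes are written manually via form.setValue('video_understanding', v), but in the Radix/Shadcn Checkbox the type of v is usually boolean | 'indeterminate'. If indeterminate occurs, a non-boolean value is written into the form state, and later validation or serialization may misbehave. Consider continuing to use field.value/field.onChange, or at least converting v explicitly with v === true and setting shouldDirty/shouldValidate in the setValue call.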

Comment on lines 163 to 169
if self.dedupe_enabled:
frame_hash = self._calculate_file_md5(output_path)
if frame_hash == last_hash:
os.remove(output_path)
# Delete redundant frames that were not selected
if ts not in selected_timestamps:
os.remove(output_path)
continue
Comment thread backend/app/utils/video_reader.py Outdated
Comment on lines +139 to +145
# Also make sure the whole video is covered: keep at least one frame per frame_interval
for ts in range(0, int(duration), self.frame_interval):
    # Find the highest-scoring frame within this interval
    candidates = [(t, p) for t, p in valid_frames if ts <= t < ts + self.frame_interval]
    if candidates:
        best = max(candidates, key=lambda x: scores.get(x[0], 0))
        selected_timestamps.add(best[0])
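The coverage rule in this hunk can be exercised on its own. The following is a minimal sketch, assuming `valid_frames` is a list of `(timestamp, path)` tuples and `scores` maps a timestamp to its scene-change score, as the hunk suggests; the function name `select_per_interval` and the sample data are hypothetical and not part of the PR.

```python
def select_per_interval(valid_frames, scores, duration, frame_interval):
    """Keep the highest-scoring frame from each frame_interval-second window."""
    selected = set()
    for start in range(0, int(duration), frame_interval):
        # Frames whose timestamp falls in [start, start + frame_interval)
        candidates = [(t, p) for t, p in valid_frames
                      if start <= t < start + frame_interval]
        if candidates:
            best = max(candidates, key=lambda x: scores.get(x[0], 0))
            selected.add(best[0])
    return selected


# Hypothetical sample: five frames over a 6-second clip, 3-second windows.
frames = [(0, "f0.jpg"), (1, "f1.jpg"), (2, "f2.jpg"), (3, "f3.jpg"), (5, "f5.jpg")]
scores = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.05, 5: 0.3}
picked = select_per_interval(frames, scores, duration=6, frame_interval=3)
print(sorted(picked))  # [1, 5]
```

One edge case worth checking against the real code: because the loop runs over `range(0, int(duration), frame_interval)`, a frame whose timestamp equals `int(duration)` when that value is an exact multiple of the interval (for example t=6 in a 6.5-second video with a 3-second interval) falls outside every window.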
Comment on lines +62 to +65
const id = `mermaid-${Math.random().toString(36).slice(2, 9)}`
mermaid.render(id, chart)
  .then(({ svg }) => setSvg(svg))
  .catch((e) => setError(e.message || 'Mermaid 渲染失败')) // fallback text: "Mermaid rendering failed"
}
const results = await Promise.all(replacements)
for (const { search, replace } of results) {
  content = content.replace(search, replace)
Comment on lines 139 to 148
@@ -144,8 +143,8 @@ const NoteForm = () => {
       quality: 'medium',
       model_name: modelList[0]?.model_name || '',
       style: 'minimal',
-      video_interval: 6,
-      grid_size: [2, 2],
+      video_interval: 3,
+      grid_size: [3, 3],
       format: [],
Comment thread docker-compose.gpu.yml
Comment on lines 16 to 19
volumes:
  - ./backend:/app
  - ./cookies.txt:/app/cookies.txt
expose:
Comment on lines +41 to +98
ydl_opts = {
    'format': 'bestaudio[ext=m4a]/bestaudio/best',
    'outtmpl': output_path,
    'postprocessors': [
        {
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '64',
        }
    ],
    'noplaylist': True,
    'quiet': False,
    # Headers to get past Bilibili's anti-scraping checks
    'http_headers': {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://www.bilibili.com/',
        'Origin': 'https://www.bilibili.com',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-site',
    },
    'extractor_retries': 5,
}

# Add cookies support: try several locations
found = False
# 1. Path specified by the environment variable
cookies_path = Path(BILIBILI_COOKIES_FILE)
if cookies_path.is_absolute() and cookies_path.exists():
    ydl_opts['cookiefile'] = str(cookies_path)
    logger.info(f"使用 cookies 文件: {cookies_path}")  # "using cookies file: ..."
    found = True
# 2. Path relative to this file (backend root directory)
if not found:
    cookies_path = Path(__file__).parent.parent.parent / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
# 3. Current working directory
if not found:
    cookies_path = Path(os.getcwd()) / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
# 4. Docker root directory
if not found:
    cookies_path = Path('/app') / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
if not found:
    # "Bilibili cookies file not found; the download may fail"
    logger.warning(f"B站 cookies 文件不存在,下载可能失败")
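The four lookups in this hunk follow the same pattern and could be collapsed into a loop over candidate directories. The sketch below is one possible shape, not the PR's code: `find_cookies_file` and the directory names in the demo are hypothetical. It also uses `is_file()` rather than `exists()`, on the assumption that a directory at the cookies path should not count as a match; that can happen with a bind mount such as `./cookies.txt:/app/cookies.txt`, since Docker typically creates a directory when the host-side file is missing.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_cookies_file(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first regular file matching `name`, or None if none is found."""
    configured = Path(name)
    if configured.is_absolute():
        # An absolute path from the environment is checked as-is.
        candidates = [configured]
    else:
        candidates = [Path(d) / configured for d in search_dirs]
    for candidate in candidates:
        # is_file() rejects directories as well as missing paths.
        if candidate.is_file():
            return candidate
    return None


# Demo in a temporary tree (directory names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "first" / "cookies.txt").mkdir(parents=True)  # a directory, not a file
    (root / "second").mkdir()
    (root / "second" / "cookies.txt").write_text("# Netscape HTTP Cookie File\n")

    found = find_cookies_file("cookies.txt", [root / "first", root / "second"])
    found_name = found.parent.name if found else None
    missing = find_cookies_file("absent.txt", [root / "first"])

print(found_name)  # second
print(missing)     # None
```

In the downloader, the search list would correspond to the locations in the hunk: the backend root derived from `__file__`, the current working directory, and `/app`.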
Comment on lines +187 to +197
else:
status_content = json.loads(content)
except json.JSONDecodeError:
logger.warning(f"状态文件 JSON 解析失败: {status_path}")  # "failed to parse status file JSON: ..."
if os.path.exists(result_path):
os.remove(status_path)
return R.success({
"status": TaskStatus.PENDING.value,
"message": "任务排队中",  # "task is queued"
"task_id": task_id
})
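The read-side defense described in the PR (treat an empty or half-written status file as "not ready yet" instead of raising) can be sketched in isolation. This is an illustrative helper under stated assumptions, not the router's code: `read_status` and the file name are hypothetical, and the caller is assumed to map a `None` result to the PENDING response shown in the hunk.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_status(status_path: Path) -> Optional[dict]:
    """Return the parsed status, or None if the file is missing, empty, or mid-write."""
    try:
        content = status_path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    if not content:
        # A writer that truncates before writing can briefly expose an empty file.
        return None
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Partially written JSON: report "no status yet" rather than crash.
        return None
    return data if isinstance(data, dict) else None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "task.status.json"  # hypothetical file name
    missing = read_status(path)
    path.write_text("", encoding="utf-8")
    empty = read_status(path)
    path.write_text('{"status": "RUN', encoding="utf-8")
    truncated = read_status(path)
    path.write_text('{"status": "RUNNING"}', encoding="utf-8")
    complete = read_status(path)

print(missing, empty, truncated, complete)  # None None None {'status': 'RUNNING'}
```

Returning `None` keeps the decision about what to tell the client (pending, retry, or fall back to the result file) in the route handler rather than in the file-reading code.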
Comment on lines 41 to 66
ydl_opts = {
    'format': 'bestaudio[ext=m4a]/bestaudio/best',
    'outtmpl': output_path,
    'postprocessors': [
        {
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '64',
        }
    ],
    'noplaylist': True,
    'quiet': False,
    # Headers to get past Bilibili's anti-scraping checks
    'http_headers': {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://www.bilibili.com/',
        'Origin': 'https://www.bilibili.com',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-site',
    },
    'extractor_retries': 5,
}
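The PR summary notes that the headers were strengthened in three places, and the same options dict is quoted in more than one review hunk. One way to keep those copies in sync is a small builder, sketched below. The names `BILIBILI_HEADERS` and `build_ydl_opts` are hypothetical; the option keys and values are copied from the hunk, and nothing here calls yt-dlp itself.

```python
from typing import Any, Dict, Optional

# Header values copied from the hunk above.
BILIBILI_HEADERS: Dict[str, str] = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.bilibili.com/',
    'Origin': 'https://www.bilibili.com',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
}


def build_ydl_opts(output_path: str, cookiefile: Optional[str] = None) -> Dict[str, Any]:
    """Build the audio-download options once so every call site shares them."""
    opts: Dict[str, Any] = {
        'format': 'bestaudio[ext=m4a]/bestaudio/best',
        'outtmpl': output_path,
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '64',
        }],
        'noplaylist': True,
        'quiet': False,
        'http_headers': dict(BILIBILI_HEADERS),  # copy so callers cannot mutate the shared dict
        'extractor_retries': 5,
    }
    if cookiefile:
        opts['cookiefile'] = cookiefile
    return opts


opts = build_ydl_opts('/tmp/audio.%(ext)s', cookiefile='/app/cookies.txt')
print(opts['extractor_retries'], opts['http_headers']['Origin'])
```

The returned dict is meant to be handed to `yt_dlp.YoutubeDL(opts)` in the same way the existing code uses `ydl_opts`.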
Comment on lines +71 to +83
@staticmethod
def _scene_change_score(img_path_a: str, img_path_b: str) -> float:
    """Compute the scene-change score (0-1) between two frames.
    Uses thumbnail differences: sensitive to changes in picture content, ignores fine noise."""
    try:
        a = Image.open(img_path_a).convert("L").resize((64, 36), Image.Resampling.NEAREST)
        b = Image.open(img_path_b).convert("L").resize((64, 36), Image.Resampling.NEAREST)
        arr_a = np.array(a, dtype=np.float32)
        arr_b = np.array(b, dtype=np.float32)
        diff = np.abs(arr_a - arr_b).mean() / 255.0
        return float(diff)
    except Exception:
        return 0.0
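The score in this hunk is the mean absolute grayscale difference divided by 255. A dependency-free sketch on tiny pixel grids makes the 0-1 range concrete; it swaps PIL and NumPy for plain lists, and `mean_abs_diff_score` is a hypothetical name used only for illustration.

```python
def mean_abs_diff_score(gray_a, gray_b):
    """Mean absolute difference of two equal-sized grayscale grids, scaled to 0-1."""
    flat_a = [v for row in gray_a for v in row]
    flat_b = [v for row in gray_b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("grids must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(flat_a, flat_b))
    return total / len(flat_a) / 255.0


black = [[0, 0], [0, 0]]
white = [[255, 255], [255, 255]]
half = [[255, 255], [0, 0]]

print(mean_abs_diff_score(black, black))  # 0.0
print(mean_abs_diff_score(black, white))  # 1.0
print(mean_abs_diff_score(black, half))   # 0.5
```

Two details of the original are visible in the hunk: the broad `except Exception` makes an unreadable frame score 0.0, the same as "no change", and the images are opened without a `with` block, so the file handles are not closed explicitly.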
Author

iorireal commented May 6, 2026

Superseded by #340. The replacement PR is rebuilt from origin/master and keeps only the focused Bilibili download / logger / task status fixes plus regression tests, without the unrelated agent docs, AI workflow files, frontend lint cleanup, Mermaid work, or lockfile churn.

Development

Successfully merging this pull request may close these issues.

fix: Bilibili download 412 error + lost logs + status-file race

2 participants