fix: Fix Bilibili download HTTP 412 errors and log/status-file race conditions #332
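- Fix the downloader logger: bilibili_downloader and youtube_downloader now use get_logger, so download logs are written to app.log
- Fill in the cookies lookup paths for download_subtitles (expanded from 2 locations to 4)
- Strengthen Bilibili anti-scraping request headers: add Origin, Sec-Fetch-*, and Accept-Encoding; raise extractor_retries from 3 to 5
- Fix the _update_status race: drop temp file + rename and write directly, avoiding empty status files on Windows
- Harden get_task_status against empty files: handle empty content and catch JSON parse exceptions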
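- Remove the hardcoded absolute Windows path from the cookies file lookup
- Switch to a lookup strategy based on an environment variable plus relative paths
- Add **/cookies.txt to .gitignore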
Remove unused imports, variables, and types, and replace any with concrete types, eliminating all 44 lint errors while keeping 15 non-blocking warnings.
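- note.py: trigger a full (video) download when "原片截图" (original-video screenshots) is checked in format, fixing screenshots not being displayed
- video_reader.py: add a MAX_GRIDS=25 limit so requests stay under the Doubao (豆包) model's token limit
- prompt_builder.py: improve the prompt construction logic
- Other backend files: code cleanup and optimization

chore: initialize the AI Literacy development-standards framework
- CLAUDE.md, AGENTS.md, MODEL_ROUTING.md
- .claude/agents/ (5 standards agents)
- .github/workflows/ai-literacy.yml
- scripts/ai-literacy-check.sh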
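The commit message gives only the cap (MAX_GRIDS=25), not how video_reader.py enforces it. A minimal sketch, assuming frames are packed in order into rows × cols grids and that the cap is applied by evenly subsampling frames across the whole video rather than cutting off the tail; limit_frames and to_grids are hypothetical names, not the PR's functions. With a 3 × 3 grid, 25 grids hold at most 225 frames.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_GRIDS = 25  # cap taken from the commit message


def limit_frames(frames: Sequence[T], rows: int, cols: int, max_grids: int = MAX_GRIDS) -> List[T]:
    """Evenly subsample frames so they fit in at most max_grids grids of rows x cols."""
    budget = max_grids * rows * cols
    if len(frames) <= budget:
        return list(frames)
    # Spread `budget` picks across the whole sequence so late scenes are not dropped.
    step = len(frames) / budget
    return [frames[int(i * step)] for i in range(budget)]


def to_grids(frames: Sequence[T], rows: int, cols: int) -> List[List[T]]:
    """Chunk frames into consecutive groups of rows * cols (the last group may be partial)."""
    per_grid = rows * cols
    return [list(frames[i:i + per_grid]) for i in range(0, len(frames), per_grid)]
```

Subsampling keeps temporal coverage at the cost of density; simply slicing the first 25 grids would be simpler but would silently ignore the end of long videos.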
Pull request overview
This PR primarily targets reliability improvements across the download and note-generation pipeline: it strengthens Bilibili download requests and cookie discovery, makes task-status JSON reads robust, and replaces several frontend uses of any with stricter types, while also adding Mermaid diagram rendering and richer export behavior.
Changes:
- Backend: improve Bilibili downloader headers and cookie lookup, harden the task status read path against empty/invalid JSON, and adjust media download logic to correctly force a video download when the screenshot format is selected.
- Frontend: add Mermaid rendering in the Markdown preview and embed images as base64 on Markdown export; tighten types (any → unknown / Record<string, unknown>) and remove unused code/imports.
- Repo tooling/docs: add a security lint script plus harness/agent docs, ignore cookies.txt, and mount cookies.txt into the backend containers.
Reviewed changes
Copilot reviewed 64 out of 67 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| scripts/lint.sh | Adds a pre-commit “security lint” script to catch sensitive files, absolute paths, and credential-like strings. |
| scripts/ai-literacy-check.sh | Adds a CI fallback script placeholder for harness constraint enforcement. |
| REFLECTION_LOG.md | Adds a reflection log template for pipeline learnings. |
| MODEL_ROUTING.md | Documents agent-to-model routing guidance for orchestrated workflows. |
| docker-compose.yml | Mounts cookies.txt into the backend container. |
| docker-compose.gpu.yml | Same cookies bind mount for GPU compose stack. |
| CLAUDE.md | Expands workflow/constraints documentation and local dev/test commands. |
| BillNote_frontend/src/utils/request.ts | Tightens response typing and updates comments. |
| BillNote_frontend/src/store/taskStore/index.ts | Replaces several any types with safer types. |
| BillNote_frontend/src/store/chatStore/index.ts | Avoids unused-var lint issue via rename + eslint disable. |
| BillNote_frontend/src/services/note.ts | Removes any from catch param. |
| BillNote_frontend/src/services/model.ts | Tightens request payload types. |
| BillNote_frontend/src/services/downloader.ts | Tightens platform type from any to string. |
| BillNote_frontend/src/pages/SettingPage/transcriber.tsx | Removes unused icon/state fields. |
| BillNote_frontend/src/pages/SettingPage/Monitor.tsx | Simplifies catch and localizes UI comments. |
| BillNote_frontend/src/pages/SettingPage/Downloader.tsx | Removes unused import. |
| BillNote_frontend/src/pages/SettingPage/components/menuBar.tsx | Fixes prop typing (any → IMenuProps) and removes unused imports. |
| BillNote_frontend/src/pages/SettingPage/about.tsx | Removes unused icons and localizes UI comments. |
| BillNote_frontend/src/pages/NotFoundPage/index.tsx | Updates file header comment. |
| BillNote_frontend/src/pages/HomePage/Home.tsx | Removes unused local variable. |
| BillNote_frontend/src/pages/HomePage/components/transcriptViewer.tsx | Removes unused imports and localizes comments. |
| BillNote_frontend/src/pages/HomePage/components/StepBar.tsx | Removes unused variable. |
| BillNote_frontend/src/pages/HomePage/components/NoteHistory.tsx | Removes unused imports/state/effect and cleans search logic. |
| BillNote_frontend/src/pages/HomePage/components/NoteForm.tsx | Adjusts defaults for video understanding controls and simplifies store usage. |
| BillNote_frontend/src/pages/HomePage/components/MarkmapComponent.tsx | Adds typings (unknown), improves XMind conversion typing. |
| BillNote_frontend/src/pages/HomePage/components/MarkdownViewer.tsx | Adds Mermaid rendering and base64-embedding images on Markdown download; improves component typing. |
| BillNote_frontend/src/pages/HomePage/components/MarkdownHeader.tsx | Simplifies version list rendering logic. |
| BillNote_frontend/src/pages/HomePage/components/History.tsx | Removes unused icons. |
| BillNote_frontend/src/pages/HomePage/components/ChatPanel.tsx | Tightens typing and removes any casts. |
| BillNote_frontend/src/layouts/SettingLayout.tsx | Localizes UI comments. |
| BillNote_frontend/src/hooks/useTaskPolling.ts | Removes unused store selectors. |
| BillNote_frontend/src/constant/note.ts | Adds mermaid as a selectable note format. |
| BillNote_frontend/src/components/LazyImage.tsx | Updates file header comment. |
| BillNote_frontend/src/components/Icons/iconMap.ts | Updates file header comment. |
| BillNote_frontend/src/components/Form/modelForm/ModelSelector.tsx | Simplifies catch param typing. |
| BillNote_frontend/src/components/Form/modelForm/Icons/iconMap.ts | Updates file header comment. |
| BillNote_frontend/src/components/Form/modelForm/Form.tsx | Removes unused logic/imports and simplifies provider form flow. |
| BillNote_frontend/src/components/Form/modelForm/components/providerCard.tsx | Removes debug logs and replaces ts-ignore with ts-expect-error. |
| BillNote_frontend/src/components/Form/DownloaderForm/providerCard.tsx | Tightens icon prop typing and removes debug logs/unused imports. |
| BillNote_frontend/src/components/Form/DownloaderForm/Options.tsx | Removes unused imports and dead navigation code. |
| BillNote_frontend/src/components/Form/DownloaderForm/Form.tsx | Simplifies catch param typing. |
| BillNote_frontend/package.json | Adds mermaid dependency. |
| BillNote_frontend/package-lock.json | Updates lockfile for new deps (including mermaid / ant-design x). |
| backend/app/utils/video_reader.py | Adds dense sampling + scene-change scoring to select keyframes; introduces numpy usage. |
| backend/app/transcriber/whisper.py | Adjusts model path handling after snapshot download. |
| backend/app/transcriber/transcriber_provider.py | Clarifies fallback comment wording. |
| backend/app/services/note.py | Fixes screenshot/video download decision logic, adjusts defaults for interval/grid, and changes status file write behavior. |
| backend/app/routers/note.py | Hardens task status read against empty/invalid JSON and avoids JSONDecodeError crashes. |
| backend/app/routers/config.py | Localizes comment for whisper download tracking dict. |
| backend/app/gpt/qwen_gpt.py | Updates comment wording for time format. |
| backend/app/gpt/prompt_builder.py | Adds mermaid format support and rewrites style instructions to more structured prompts. |
| backend/app/gpt/openai_gpt.py | Updates comment wording for time format. |
| backend/app/gpt/deepseek_gpt.py | Updates comment wording for time format. |
| backend/app/downloaders/youtube_downloader.py | Switches to project logger (get_logger). |
| backend/app/downloaders/douyin_helper/abogus.py | Localizes docstring types and removes English duplicates. |
| backend/app/downloaders/douyin_downloader.py | Localizes docstring by removing duplicated English text. |
| backend/app/downloaders/bilibili_downloader.py | Switches to project logger, strengthens headers, and expands cookies lookup paths. |
| AGENTS.md | Adds a project “persistent memory” doc for agent workflows/gotchas. |
| .gitignore | Ignores **/cookies.txt. |
| .github/workflows/ai-literacy.yml | Adds workflow to enforce harness constraints (currently mostly a placeholder). |
| .claude/HARNESS.md | Adds harness definition and constraints inventory. |
| .claude/agents/tdd-agent.md | Adds agent role description for test generation. |
| .claude/agents/spec-writer.md | Adds agent role description for spec updates. |
| .claude/agents/orchestrator.md | Adds orchestrator agent role description. |
| .claude/agents/integration-agent.md | Adds integration agent role description. |
| .claude/agents/code-reviewer.md | Adds code reviewer agent role description. |
Files not reviewed (1)
- BillNote_frontend/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)
BillNote_frontend/src/pages/HomePage/components/NoteForm.tsx:466
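- Here the Controller's field is discarded and the Checkbox change is written manually via form.setValue('video_understanding', v), but in a Radix/shadcn Checkbox v is usually typed as boolean | 'indeterminate'. If 'indeterminate' occurs, a non-boolean value is written into the form state, which can break later validation/serialization. Suggest keeping field.value/field.onChange, or at least converting v explicitly with v === true and passing shouldDirty/shouldValidate to setValue.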
```python
if self.dedupe_enabled:
    frame_hash = self._calculate_file_md5(output_path)
    if frame_hash == last_hash:
        os.remove(output_path)
# Delete redundant frames that were not selected
if ts not in selected_timestamps:
    os.remove(output_path)
    continue
```
```python
# Also ensure coverage of the whole video: keep at least one frame per frame_interval
for ts in range(0, int(duration), self.frame_interval):
    # Find the highest-scoring frame in this interval
    candidates = [(t, p) for t, p in valid_frames if ts <= t < ts + self.frame_interval]
    if candidates:
        best = max(candidates, key=lambda x: scores.get(x[0], 0))
        selected_timestamps.add(best[0])
```
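The loop above guarantees one frame per window on top of whatever scene-change selection already happened. A self-contained sketch of the combined rule, assuming valid_frames is a list of (timestamp_seconds, path) pairs and scores maps timestamps to scene-change scores as in the excerpt; select_keyframes and scene_threshold are hypothetical, and the threshold pass stands in for the PR's own scene-change selection, which is not shown here.

```python
from typing import Dict, List, Optional, Set, Tuple


def select_keyframes(
    valid_frames: List[Tuple[int, str]],
    scores: Dict[int, float],
    duration: float,
    frame_interval: int,
    scene_threshold: Optional[float] = None,
) -> Set[int]:
    """Return timestamps: every frame at or above scene_threshold (if given),
    plus the best-scoring frame in each [start, start + frame_interval) window."""
    selected: Set[int] = set()
    if scene_threshold is not None:
        selected.update(t for t, _ in valid_frames if scores.get(t, 0.0) >= scene_threshold)
    for start in range(0, int(duration), frame_interval):
        candidates = [(t, p) for t, p in valid_frames if start <= t < start + frame_interval]
        if candidates:
            # Ties keep the earliest frame, since max() returns the first maximum.
            best = max(candidates, key=lambda x: scores.get(x[0], 0.0))
            selected.add(best[0])
    return selected
```

Each window rescans valid_frames, which is fine for short clips; bucketing frames by t // frame_interval in a single pass would avoid the repeated scans on long videos.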
```ts
const id = `mermaid-${Math.random().toString(36).slice(2, 9)}`
mermaid.render(id, chart)
  .then(({ svg }) => setSvg(svg))
  // Fallback message reads "Mermaid rendering failed"
  .catch((e) => setError(e.message || 'Mermaid 渲染失败'))
```
```ts
}
const results = await Promise.all(replacements)
for (const { search, replace } of results) {
  content = content.replace(search, replace)
```
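This hunk awaits all image conversions and then swaps each original reference for its base64 version. The same embedding idea, sketched in Python for illustration only: the PR does this in TypeScript in the browser and fetches images over HTTP, whereas this sketch assumes images are local files resolved against a hypothetical base_dir. Only ![alt](src) syntax is handled, and remote URLs or existing data: URIs are left untouched; embed_local_images is a hypothetical name.

```python
import base64
import mimetypes
import re
from pathlib import Path
from typing import Dict

# Matches ![alt](src) where src has no spaces or closing parenthesis.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def embed_local_images(markdown: str, base_dir: Path) -> str:
    """Replace local image references with data: URIs; leave everything else as-is."""
    cache: Dict[str, str] = {}

    def to_data_uri(match: re.Match) -> str:
        alt, src = match.group(1), match.group(2)
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)
        if src not in cache:
            path = base_dir / src
            if not path.is_file():
                return match.group(0)  # keep the original reference if the file is missing
            mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
            encoded = base64.b64encode(path.read_bytes()).decode("ascii")
            cache[src] = f"data:{mime};base64,{encoded}"
        return f"![{alt}]({cache[src]})"

    return IMAGE_PATTERN.sub(to_data_uri, markdown)
```

re.sub with a callback rewrites every match in one pass, and the cache encodes each distinct file once even when the same image appears several times in a note.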
```diff
@@ -144,8 +143,8 @@ const NoteForm = () => {
   quality: 'medium',
   model_name: modelList[0]?.model_name || '',
   style: 'minimal',
-  video_interval: 6,
-  grid_size: [2, 2],
+  video_interval: 3,
+  grid_size: [3, 3],
   format: [],
```
```yaml
volumes:
  - ./backend:/app
  - ./cookies.txt:/app/cookies.txt
expose:
```
```python
ydl_opts = {
    'format': 'bestaudio[ext=m4a]/bestaudio/best',
    'outtmpl': output_path,
    'postprocessors': [
        {
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '64',
        }
    ],
    'noplaylist': True,
    'quiet': False,
    # Headers to get past Bilibili's anti-scraping checks
    'http_headers': {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://www.bilibili.com/',
        'Origin': 'https://www.bilibili.com',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-site',
    },
    'extractor_retries': 5,
}
```
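The PR description notes that these headers were strengthened in three places in bilibili_downloader.py. One way to keep such copies from drifting apart is a single module-level constant plus a small options builder. A sketch under that assumption: BILIBILI_HEADERS and build_audio_opts are hypothetical names, and the option keys and values are exactly the ones in the ydl_opts excerpt above.

```python
from typing import Any, Dict, Optional

# Hypothetical shared constant holding the headers from the ydl_opts excerpt.
BILIBILI_HEADERS: Dict[str, str] = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.bilibili.com/',
    'Origin': 'https://www.bilibili.com',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
}


def build_audio_opts(output_path: str, cookiefile: Optional[str] = None) -> Dict[str, Any]:
    """Build yt-dlp options for audio extraction using the shared Bilibili headers."""
    opts: Dict[str, Any] = {
        'format': 'bestaudio[ext=m4a]/bestaudio/best',
        'outtmpl': output_path,
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '64',
        }],
        'noplaylist': True,
        'quiet': False,
        # Copy so a caller tweaking headers never mutates the shared constant.
        'http_headers': dict(BILIBILI_HEADERS),
        'extractor_retries': 5,
    }
    if cookiefile:
        opts['cookiefile'] = cookiefile
    return opts
```

The result can be passed straight to yt_dlp.YoutubeDL(...) as before; other call sites would start from the same builder and override only the keys that differ (for example 'format').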
```python
# Cookies support: try several locations
found = False
# 1. Path specified by the environment variable
cookies_path = Path(BILIBILI_COOKIES_FILE)
if cookies_path.is_absolute() and cookies_path.exists():
    ydl_opts['cookiefile'] = str(cookies_path)
    logger.info(f"使用 cookies 文件: {cookies_path}")  # "Using cookies file: ..."
    found = True
# 2. Path relative to this file (backend root directory)
if not found:
    cookies_path = Path(__file__).parent.parent.parent / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
# 3. Current working directory
if not found:
    cookies_path = Path(os.getcwd()) / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
# 4. Docker root directory
if not found:
    cookies_path = Path('/app') / BILIBILI_COOKIES_FILE
    if cookies_path.exists():
        ydl_opts['cookiefile'] = str(cookies_path)
        logger.info(f"使用 cookies 文件: {cookies_path}")
        found = True
if not found:
    # "Bilibili cookies file not found; download may fail" (no placeholders, so no f-string)
    logger.warning("B站 cookies 文件不存在,下载可能失败")
```
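The four if-not-found blocks above all follow one pattern: build a candidate path, check it, stop at the first hit. A sketch that expresses the same search order as a list plus a loop; cookie_candidates and find_cookies are hypothetical helpers, and the order (configured absolute path, backend root three levels above the module, current working directory, /app) is taken from the excerpt.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def cookie_candidates(cookies_file: str, module_file: str) -> List[Path]:
    """Candidate cookie paths, in the same priority order as the excerpt."""
    configured = Path(cookies_file)
    candidates: List[Path] = []
    if configured.is_absolute():
        candidates.append(configured)  # 1. absolute path from the environment variable
    candidates.append(Path(module_file).parent.parent.parent / cookies_file)  # 2. backend root
    candidates.append(Path(os.getcwd()) / cookies_file)  # 3. current working directory
    candidates.append(Path("/app") / cookies_file)  # 4. Docker root
    return candidates


def find_cookies(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first existing candidate, or None if none exist."""
    for path in candidates:
        if path.exists():
            return path
    return None
```

The caller would then set ydl_opts['cookiefile'] and log once, or log the warning when find_cookies returns None, so adding a fifth location becomes a one-line change.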
```python
    else:
        status_content = json.loads(content)
except json.JSONDecodeError:
    logger.warning(f"状态文件 JSON 解析失败: {status_path}")  # "Failed to parse status file JSON"
    if os.path.exists(result_path):
        os.remove(status_path)
    return R.success({
        "status": TaskStatus.PENDING.value,
        "message": "任务排队中",  # "Task is queued"
        "task_id": task_id
    })
```
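The guards above protect only the read side. Opening a file in "w" mode truncates it before the new content is written, so with direct writes a poller can still observe an empty or half-written status file. A sketch of both sides under that assumption: write_status is a hypothetical stand-in for _update_status, and read_status (also hypothetical) retries briefly before returning None, at which point the caller would fall back to the pending response as the excerpt does.

```python
import json
import time
from typing import Any, Dict, Optional


def write_status(status_path: str, payload: Dict[str, Any]) -> None:
    """Overwrite the status file in place (no temp file + rename)."""
    # Serialise first so the truncate-then-write window is as short as possible.
    data = json.dumps(payload, ensure_ascii=False)
    with open(status_path, "w", encoding="utf-8") as f:
        f.write(data)


def read_status(status_path: str, retries: int = 3, delay: float = 0.05) -> Optional[Dict[str, Any]]:
    """Return the parsed status, or None if the file is missing, empty, or unparseable."""
    for attempt in range(retries):
        try:
            with open(status_path, "r", encoding="utf-8") as f:
                content = f.read()
        except FileNotFoundError:
            return None
        if content.strip():
            try:
                return json.loads(content)
            except json.JSONDecodeError:
                pass  # probably read mid-write; fall through and retry
        if attempt < retries - 1:
            time.sleep(delay)
    return None
```

Returning None instead of raising keeps the endpoint from surfacing a JSONDecodeError as a server error; the short retry only narrows the window and does not remove it.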
```python
@staticmethod
def _scene_change_score(img_path_a: str, img_path_b: str) -> float:
    """Compute the scene-change score (0-1) between two frames.
    Uses a thumbnail difference, which is sensitive to changes in frame content
    while ignoring minor noise."""
    try:
        a = Image.open(img_path_a).convert("L").resize((64, 36), Image.Resampling.NEAREST)
        b = Image.open(img_path_b).convert("L").resize((64, 36), Image.Resampling.NEAREST)
        arr_a = np.array(a, dtype=np.float32)
        arr_b = np.array(b, dtype=np.float32)
        diff = np.abs(arr_a - arr_b).mean() / 255.0
        return float(diff)
    except Exception:
        return 0.0
```
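The score is the mean absolute pixel difference between two 8-bit grayscale 64 × 36 thumbnails, divided by 255 so it lands in [0, 1]. A dependency-free sketch of the same metric, assuming the thumbnails have already been produced (in the excerpt, Pillow's convert("L") and resize do that) and are passed as nested lists; scene_change_score is a hypothetical name.

```python
from typing import Sequence


def scene_change_score(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Mean absolute difference of two equal-size 8-bit grayscale thumbnails, scaled to 0-1."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("thumbnails must have the same shape")
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count / 255.0 if count else 0.0
```

Identical frames score 0.0 and a black-to-white cut scores 1.0. Because the original catches every Exception and returns 0.0, an unreadable frame is indistinguishable from "no change"; raising on bad input, as this sketch does for mismatched shapes, makes such failures visible.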
Superseded by #340. The replacement PR is rebuilt from |
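Summary
- … get_logger)
- … cookies lookup paths for download_subtitles

Closes #330

Files changed
| File | Change |
|---|---|
| backend/app/downloaders/bilibili_downloader.py | Switch to get_logger; cookies lookup expanded to 4 locations (hardcoded path removed); headers strengthened in three places |
| backend/app/downloaders/youtube_downloader.py | Switch to get_logger |
| backend/app/services/note.py | _update_status drops temp file + rename and writes directly |
| backend/app/routers/note.py | get_task_status adds empty-file detection and catches JSON parse exceptions |
| .gitignore | Add **/cookies.txt |

Test plan
- … backend/logs/app.log, and confirm the downloader logs are written normally
- … task_status, and confirm JSONDecodeError no longer occurs

🤖 Generated with Claude Code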