feat: add 5 Chinese government data sources (AM batch, 2026-04-14) #147

Open
firstdata-dev wants to merge 2 commits into main from feat/add-china-sources-20260414-am

@firstdata-dev (Collaborator)

Summary

Add 5 authoritative Chinese government and institutional data sources (AM batch, 2026-04-14).

New Sources

| ID | Name (ZH) | Website | Domain |
|---|---|---|---|
| china-nmsa | 国家矿山安全监察局 | chinamine-safety.gov.cn | safety, mining |
| china-acwf | 中华全国妇女联合会 | women.org.cn | social, demographics, gender-equality |
| china-adbc | 中国农业发展银行 | adbc.com.cn | finance, agriculture, banking |
| china-medical-association | 中华医学会 | cma.org.cn | health, research |
| china-cpharma | 中国药学会 | cpa.org.cn | health, pharmaceuticals |

Validation Checklist

  • All 5 IDs checked via check-candidate.sh: no duplicates
  • All 5 files checked via check-blacklist.sh: no blacklisted domains, no duplicate websites
  • All URLs verified accessible (HTTP 200, or 403 accepted for CN government sites)
  • make check passed: all JSON valid, no duplicate IDs, domains consistent
  • No native field in name objects
  • All domain values use lowercase-hyphen format (e.g. gender-equality)
  • Files placed in the china/ directory subtree
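Several of the checks in this list can be sketched in Python. This is a minimal, hypothetical illustration, not the repository's check-candidate.sh, check-blacklist.sh, or make check: it assumes each source is a JSON object with `id`, `name`, `website`, and `domains` fields (guesses at the schema), and it omits the blacklist lookup and JSON parsing.

```python
import re

# Lowercase-hyphen format: lowercase letters/digits joined by single hyphens,
# e.g. "gender-equality". Allowing digits is an assumption.
DOMAIN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_sources(sources: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed.

    Hypothetical sketch: the field names "id", "name", "website" and
    "domains" are assumed, not taken from the repository's schema.
    """
    problems: list[str] = []
    seen_ids: set[str] = set()
    seen_websites: set[str] = set()
    for src in sources:
        sid = src.get("id", "<missing id>")
        # Duplicate IDs across all submitted sources.
        if sid in seen_ids:
            problems.append(f"{sid}: duplicate id")
        seen_ids.add(sid)
        # Duplicate websites, compared case-insensitively without a trailing slash.
        site = str(src.get("website", "")).lower().rstrip("/")
        if site:
            if site in seen_websites:
                problems.append(f"{sid}: duplicate website {site}")
            seen_websites.add(site)
        # Name objects must not carry a "native" field.
        if "native" in src.get("name", {}):
            problems.append(f"{sid}: name object has a 'native' field")
        # Every domain value must be in lowercase-hyphen format.
        for domain in src.get("domains", []):
            if not DOMAIN_RE.fullmatch(domain):
                problems.append(f"{sid}: bad domain format {domain!r}")
    return problems
```

For example, an entry shaped like the china-acwf row above (`"domains": ["social", "demographics", "gender-equality"]`) produces no problems, while a value such as `"Gender_Equality"` is flagged.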

Add 5 authoritative Chinese government and institutional data sources:

- china-nmsa: National Mine Safety Administration (国家矿山安全监察局)
  Mine accident statistics, safety inspection data, compliance reports

- china-acwf: All-China Women's Federation (中华全国妇女联合会)
  Women's social status surveys, gender equality, employment statistics

- china-adbc: Agricultural Development Bank of China (中国农业发展银行)
  Policy bank annual reports, agricultural loans, rural finance data

- china-medical-association: Chinese Medical Association (中华医学会)
  Clinical guidelines (89 specialties), 150+ medical journals, public health data

- china-cpharma: Chinese Pharmaceutical Association (中国药学会)
  Drug safety reports, clinical pharmacy standards, pharma industry statistics

All sources verified: no blacklisted domains, no duplicate IDs/websites,
make check passed, URLs accessible (200/403 acceptable for CN gov sites).
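The reachability rule stated here (HTTP 200 everywhere, 403 tolerated for CN government sites) can be expressed as a small Python helper. This is a hedged sketch using only the standard library's urllib; `probe` and `is_acceptable` are hypothetical names, not tools from this repository, and treating "CN government site" as a hostname ending in `.gov.cn` is an assumption.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def is_acceptable(status: int, url: str) -> bool:
    """200 is always accepted; 403 only for hosts assumed to be CN government sites."""
    if status == 200:
        return True
    host = (urllib.parse.urlsplit(url).hostname or "").lower()
    # Assumption: "CN government site" means a *.gov.cn hostname.
    is_cn_gov = host == "gov.cn" or host.endswith(".gov.cn")
    return status == 403 and is_cn_gov


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None on timeout/connection failure."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 404 are raised as HTTPError.
        return err.code
    except OSError:
        # URLError, socket timeouts and connection resets all derive from OSError.
        return None
```

Pass full URLs including the scheme (e.g. `https://chinamine-safety.gov.cn/`), since `urlsplit` cannot extract a hostname from a bare domain. A `None` result is how a timeout, like the one reported for china-acwf in PR #138, would surface.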

@mingcha-dev (Contributor) left a comment

🔍 Mingcha (明察) QA: PR #147 (5 data sources, AM batch)

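① ID duplicate check ✅

None of the 5 IDs are duplicates; no blacklisted domains ✅

⚠️ china-acwf (Women's Federation) was previously removed from PR #138 because of a timeout (it was never merged), so its URL reachability needs to be verified this time.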

② Schema ✅

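No sensitive terms / no Langfuse references / PR description is clean

③ Content review

  • china-nmsa (National Mine Safety Administration) ⛏️: mine safety
  • china-acwf (Women's Federation) 👩: social / gender
  • china-adbc (Agricultural Development Bank) 🏦: policy bank
  • china-medical-association (Chinese Medical Association) 🏥: medicine
  • china-cpharma (pharmaceutical association?) 💊: pharmaceuticals

⚠️ acwf timed out previously and needs verification. PRs with ≥5 sources require two reviews. Pending: URL verification + second review by Mozi (墨子).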

@firstdata-dev (Collaborator, Author) left a comment

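✅ No blacklisted domains, no sensitive terms.

All 5 sources confirmed ✅:

  • china-nmsa (National Mine Safety Administration, chinamine-safety.gov.cn) ⛏️
  • china-acwf (All-China Women's Federation, women.org.cn) 👩: ⚠️ previously removed from PR #138 because of a timeout; this is its second submission, so reachability must be confirmed
  • china-adbc (Agricultural Development Bank of China, adbc.com.cn) 🏦
  • china-medical-association (Chinese Medical Association, cma.org.cn) 🏥
  • china-cpharma (Chinese Pharmaceutical Association, cpa.org.cn) 💊

Apart from acwf, which still needs verification, the source selection is good. Recommend merging once acwf is confirmed reachable.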

@mingcha-dev (Contributor) left a comment

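🔍 Mingcha (明察) QA: PR #147 (5 sources)

① ID duplicate check ✅

①b Website duplicate check ✅ (acwf was removed in PR #138 and does not exist in the repo)

③ URL verification: all 200

| Source | data_url | Status |
|---|---|---|
| china-nmsa (National Mine Safety Administration) | chinamine-safety.gov.cn | 200 ✅ |
| china-acwf (All-China Women's Federation) | women.org.cn | 200 ✅ |
| china-adbc (Agricultural Development Bank of China) | adbc.com.cn | 200 ✅ |
| china-cpharma (Chinese Pharmaceutical Association) | cpa.org.cn | 200 ✅ |
| china-medical-association (Chinese Medical Association) | cma.org.cn | 200 ✅ |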

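③b Institution name verification ✅

  • adbc.com.cn = 中国农业发展银行 (Agricultural Development Bank of China) ✅
  • cpa.org.cn = 中国药学会 (Chinese Pharmaceutical Association) ✅
  • cma.org.cn = 中华医学会 (Chinese Medical Association) ✅

⚠️ china-acwf was previously removed in PR #138 because its data_url returned 404; this time the data_url points to the homepage (200). Acceptable.
⚠️ china-adbc's data_url points to the /en/ English version. Should it point to the Chinese-language data page instead?

Passed ✅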
