feat: add experimental plant hosts #75

Open
kkyungseo wants to merge 1 commit into eijex:main from kkyungseo:feat/issue-24-plant-hosts

@kkyungseo

| File | Change |
| --- | --- |
| data/arabidopsis_thaliana_codons.json | New: 64 codons, based on TAIR10 |
| data/solanum_lycopersicum_codons.json | New: 64 codons, based on ITAG4.0 |
| data/lemna_minor_codons.json | New: 64 codons |
| cli/main.py | Added 4 hosts to HOST_MAP and click.Choice |
| current_parameter_registry.yaml | Registered 4 host_profiles (status: experimental) |
| design_package.schema.json | Extended hostProfile enum/oneOf |
| tests/engines/profile/test_host_plants.py | New: 12 parametrized tests |
| CHANGELOG.md / docs/cli.md | Documentation updates |

The JSON file quality checks (64 codons, per-amino-acid frequency sum ≈ 1.0, preferred codon = highest-frequency codon) passed locally, and ruff lint/format is clean.
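
The three checks named above could be scripted roughly as follows. This is a minimal sketch, not the validation actually run for this PR: the table layout (each codon mapped to a `frequency` and a `preferred` flag), the function name `validate_codon_table`, and the tolerance are assumptions for illustration, and the real `*_codons.json` schema may differ.

```python
import itertools
import math

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def validate_codon_table(table, tol=1e-3):
    """Return a list of problems found in a codon table.

    `table` is assumed (hypothetically) to map each codon to a dict with
    a float "frequency" and a bool "preferred".
    """
    problems = []
    if set(table) != set(GENETIC_CODE):
        problems.append("codon set does not match the 64 standard codons")
        return problems

    # Group codons into synonymous families (stop codons form one family, "*").
    families = {}
    for codon, aa in GENETIC_CODE.items():
        families.setdefault(aa, []).append(codon)

    for aa, codons in families.items():
        # Frequencies are expected to be normalized within each family.
        total = sum(table[c]["frequency"] for c in codons)
        if not math.isclose(total, 1.0, abs_tol=tol):
            problems.append(f"{aa}: frequencies sum to {total:.4f}, not 1.0")
        # Exactly one codon may be flagged, and it must be the most frequent one.
        best = max(codons, key=lambda c: table[c]["frequency"])
        flagged = [c for c in codons if table[c]["preferred"]]
        if flagged != [best]:
            problems.append(f"{aa}: preferred {flagged} but highest frequency is {best}")
    return problems
```

An empty return value means the table passed all three checks; ties for the highest frequency within a family are not handled specially in this sketch.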

…fia (eijex#24)

Adds four new expression hosts selectable via --host:
- arabidopsis (A. thaliana, NCBITaxon:3702)
- tomato (S. lycopersicum, NCBITaxon:4081)
- lemna (L. minor, NCBITaxon:4188)
- wolffia (W. globosa, NCBITaxon:113308, pre-existing codon table wired up)

Each host has status: experimental. Codon tables are derived from the
Kazusa CodonUsage Database and NCBI RefSeq CDS annotations. Each table
covers all 64 codons, with frequencies normalized to sum to 1.0 within
each amino acid family.

Changes:
- data/: three new *_codons.json files (wolffia_globosa pre-existed)
- cli/main.py: HOST_MAP + click.Choice extended with four new aliases
- registry/current_parameter_registry.yaml: four host_profiles entries
- schemas/design_package.schema.json: enum/oneOf extended for new IDs
- tests/engines/profile/test_host_plants.py: parametrized coverage
- CHANGELOG.md, docs/cli.md: updated accordingly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 13, 2026


@KyungSeo116 is attempting to deploy a commit to the munkyukim86's projects Team on Vercel.

A member of the Team first needs to authorize it.

@munkyukim86

Contributor

Code review

Found 3 issues:

  1. New hosts added to CLI but not to the REST API: api/optimize.py still has VALID_HOSTS = ["nbenthamiana", "by2"] and its own HOST_MAP, neither of which was updated. Any POST /optimize request using arabidopsis, tomato, lemna, or wolffia will be rejected with HTTP 400. The BY-2 addition (commit 593a04e) set the precedent of updating both files together.

```python
HOST_MAP = {
    "nbenthamiana": "nbenthamiana",
    "by2": "ntabacum",
    "arabidopsis": "arabidopsis_thaliana",
    "tomato": "solanum_lycopersicum",
    "lemna": "lemna_minor",
    "wolffia": "wolffia_globosa",
}
```
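
One way to keep the CLI and the REST API from drifting apart is to define the alias map once and derive the list of valid hosts from it. The sketch below assumes a shared module imported by both entry points; that module and the helper `resolve_host` are hypothetical and not taken from this repository. Only the `HOST_MAP` entries come from the snippet above.

```python
# Hypothetical shared module, imported by both cli/main.py and api/optimize.py.
HOST_MAP = {
    "nbenthamiana": "nbenthamiana",
    "by2": "ntabacum",
    "arabidopsis": "arabidopsis_thaliana",
    "tomato": "solanum_lycopersicum",
    "lemna": "lemna_minor",
    "wolffia": "wolffia_globosa",
}

# Derived from HOST_MAP, so adding a host updates every consumer at once.
VALID_HOSTS = sorted(HOST_MAP)


def resolve_host(alias: str) -> str:
    """Map a user-facing alias to the internal host profile ID."""
    try:
        return HOST_MAP[alias]
    except KeyError:
        raise ValueError(
            f"unknown host {alias!r}; expected one of {VALID_HOSTS}"
        ) from None
```

With this arrangement the API handler could translate the `ValueError` into an HTTP 400, and the CLI could pass `VALID_HOSTS` to `click.Choice` so that new aliases are picked up in both places.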

  2. load_golden_set() is hardcoded to nbenthamiana_golden_set.json: the function in utils.py has no host parameter and always loads the N. benthamiana golden set. ReverseTranslator uses this to compute golden_ref_weights (the CAI reference weights), so any optimization run against arabidopsis, tomato, lemna, or wolffia will silently score and select codons using N. benthamiana CAI weights, not the target host's. The reported CAI values for these hosts will be misleading.

```python
# Load golden set for CAI reference weights
if golden_set_path is not None:
    self.golden_set_table: dict[str, Any] = self._load_codon_table(golden_set_path)
else:
    try:
        self.golden_set_table = load_golden_set()
    except (FileNotFoundError, json.JSONDecodeError):
        self.golden_set_table = self.codon_table
# Pre-compute relative adaptiveness weights from golden set (Sharp & Li 1987)
self.golden_ref_weights: dict[str, float] = self._build_ref_weights(self.golden_set_table)
```
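
A host-aware loader is one possible way to avoid scoring every host against N. benthamiana weights. The following is a sketch under stated assumptions: the `host` and `data_dir` parameters, the `<host>_golden_set.json` naming pattern (extrapolated from `nbenthamiana_golden_set.json`), and the helper `select_reference_table` are all hypothetical, and the real `load_golden_set()` in utils.py may be structured differently.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_golden_set(host: str, data_dir: Path) -> Optional[dict[str, Any]]:
    """Load the golden set for `host`, or return None if no such file exists.

    The file name pattern is an assumption made for this sketch.
    """
    path = data_dir / f"{host}_golden_set.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def select_reference_table(
    host: str, codon_table: dict[str, Any], data_dir: Path
) -> dict[str, Any]:
    """Pick the table used to build CAI reference weights.

    Falls back to the host's own codon table, rather than to another
    species' golden set, when no host-specific golden set is available.
    """
    golden = load_golden_set(host, data_dir)
    return golden if golden is not None else codon_table
```

The design choice here is that a missing golden set degrades to the target host's own codon usage, so reported CAI values stay tied to the selected host instead of silently reflecting a different species.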

  3. Wolffia codon table provenance mismatch: the registry registers wolffia with ncbi_taxonomy_id: 113308 (Wolffia globosa), but the source field states "proxy from NCBI GCF_029677425.1 (W. australiana)", a different species with a different taxonomy ID. The codon table data and the declared organism do not match. Either the taxonomy ID should reflect W. australiana, or a note should make clear this is a cross-species proxy.

```yaml
    owner_job: "024"
    source: "NCBI Taxonomy Browser NCBITaxon:3702; TAIR10 CDS annotation (NCBI GCF_000001735.4)"
    rationale: "Model dicot plant A. thaliana; experimental host profile based on Kazusa/TAIR10 codon usage."
    provenance: "Issue #24 plant host addition; codon table from Kazusa CodonUsage Database + TAIR10."
    visibility: public
    permission: publish_allowed
  tomato:
    display_name: "S. lycopersicum"
    scientific_name: "Solanum lycopersicum"
    ncbi_taxonomy_id: 4081
    ncbi_taxonomy_curie: "NCBITaxon:4081"
    status: experimental
    claim_level: experimental_setting
    evidence_status: experimental
    release_status: experimental
```
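
A small registry lint could surface this kind of mismatch before review. The sketch below is illustrative only: the `proxy_for` field and the function `check_host_profile` are hypothetical and do not exist in the registry shown here, while the other field names are taken from the YAML above.

```python
def check_host_profile(alias, entry):
    """Return a list of provenance problems for one host_profiles entry."""
    problems = []

    # The CURIE should be derivable from the numeric taxonomy ID.
    expected_curie = f"NCBITaxon:{entry['ncbi_taxonomy_id']}"
    if entry.get("ncbi_taxonomy_curie") != expected_curie:
        problems.append(f"{alias}: ncbi_taxonomy_curie should be {expected_curie}")

    # A source that describes itself as a proxy must name the proxy species
    # in a dedicated field ("proxy_for" is a hypothetical field name).
    source = entry.get("source", "")
    if "proxy" in source.lower() and not entry.get("proxy_for"):
        problems.append(f"{alias}: source mentions a proxy but proxy_for is not set")

    return problems
```

Run against an entry whose source reads "proxy from NCBI GCF_029677425.1 (W. australiana)" and which has no explicit proxy declaration, this check would report one problem, making the cross-species substitution visible rather than implicit.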

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@munkyukim86

munkyukim86 commented Jun 15, 2026

Contributor

Thanks for contributing this! Really appreciate you taking the time to add these plant hosts. I left a few notes above; once those are addressed, I'm happy to get this merged.

@kkyungseo

Author

Thanks for the review! I'll work through the notes above and aim to have everything addressed by June 18 (KST).
I'll ping you here once it's ready for another look.

@munkyukim86

Contributor

Can't wait, thanks!
