feat: add experimental insect hosts #76

Open
kkyungseo wants to merge 1 commit into eijex:main from kkyungseo:feat/issue-23-insect-hosts

@kkyungseo

| File | Role |
| --- | --- |
| src/factorforge/data/spodoptera_frugiperda_codons.json | Sf9 placeholder codon table (GC≈41%, AGA→Arg) |
| src/factorforge/data/trichoplusia_ni_codons.json | Tni placeholder codon table (GC≈39%, stronger AT bias) |
| src/factorforge/cli/main.py (modified) | Adds sf9 / tni to HOST_MAP, click.Choice, and the help text |
| src/factorforge/registry/current_parameter_registry.yaml (modified) | Adds sf9 and tni to host_profiles (status=experimental, owner_job=023) |
| tests/engines/profile/test_host_insects.py | Adds 18 tests, all passing; confirms no regressions in the 494 existing tests |

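Placeholder design principles

  • total_cds=0 marks the table explicitly as a placeholder

  • The source field contains "PLACEHOLDER", and the tests enforce this

  • Lepidoptera characteristics are preserved

    • AGA is the preferred codon for Arg

    • Strong AT bias is reflected

    • The relationship GC(Tni) < GC(Sf9) is maintained

  • Once real data is available, replacing the JSON files alone completes the pipeline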
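The placeholder markers and the AGA invariant listed above can be checked mechanically. The following is a minimal sketch, assuming the table layout visible in the JSON excerpt in this PR (`codons` mapping each codon to an `aa` and a `frequency`). The helper names `is_placeholder` and `preferred_codon` are hypothetical, and the demo frequencies are illustrative values, not data from the PR.

```python
from typing import Any, Mapping


def is_placeholder(table: Mapping[str, Any]) -> bool:
    """Return True when both placeholder markers are present."""
    return table.get("total_cds") == 0 and "PLACEHOLDER" in str(table.get("source", ""))


def preferred_codon(table: Mapping[str, Any], aa: str) -> str:
    """Return the highest-frequency codon for the one-letter amino acid `aa`."""
    freqs = {
        codon: entry["frequency"]
        for codon, entry in table["codons"].items()
        if entry["aa"] == aa
    }
    if not freqs:
        raise KeyError(f"no codons found for amino acid {aa!r}")
    return max(freqs, key=freqs.get)


# Demo table with made-up numbers (hypothetical, for illustration only).
DEMO_TABLE = {
    "source": "PLACEHOLDER (demo values only)",
    "total_cds": 0,
    "codons": {
        "AGA": {"aa": "R", "frequency": 0.30},
        "CGT": {"aa": "R", "frequency": 0.20},
        "TTT": {"aa": "F", "frequency": 0.46},
    },
}

if __name__ == "__main__":
    print(is_placeholder(DEMO_TABLE))        # both markers present
    print(preferred_codon(DEMO_TABLE, "R"))  # codon with the highest Arg frequency
```

Requiring both markers keeps the check strict: a table that gains a real total_cds but still says "PLACEHOLDER" in its source would no longer count as a clean placeholder and should be reviewed.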

… ni) (eijex#23)

Wires up two BEVS insect cell-line hosts following the same pattern as the
plant host additions in eijex#24. Codon tables are clearly marked PLACEHOLDER
(total_cds/total_codons=0, PLACEHOLDER in source field) so the pipeline
remains functional while real CDS datasets are pending.

- data/spodoptera_frugiperda_codons.json: Sf9 placeholder table, GC~41%,
  AT-biased Lepidoptera frequencies, AGA preferred for Arg
- data/trichoplusia_ni_codons.json: Tni/High Five placeholder table,
  GC~39%, stronger AT-bias than Sf9, AGA preferred for Arg
- cli/main.py: HOST_MAP += sf9/tni; click.Choice and help text updated
- registry/current_parameter_registry.yaml: sf9 and tni added to
  host_profiles as status=experimental, owner_job=023
- tests/engines/profile/test_host_insects.py: 18 tests covering file
  existence, 64-codon completeness, frequency normalization, optimizer
  round-trip, registry wiring, placeholder guards, and biological invariants

All 18 new tests pass; 494 pre-existing tests unaffected (0 regressions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
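For readers who want to see what the 64-codon completeness and frequency normalization checks in the commit message might look like, here is a small sketch. It assumes frequencies are per-amino-acid fractions that sum to 1 (suggested by TTT=0.46 for Phe in the excerpt, but not confirmed by the PR). The function names and the demo values are hypothetical and are not taken from test_host_insects.py.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Any, Dict, Mapping, Set

# All 64 DNA codons.
ALL_CODONS: Set[str] = {"".join(p) for p in product("ACGT", repeat=3)}


def missing_codons(table: Mapping[str, Any]) -> Set[str]:
    """Codons absent from the table; empty for a complete table."""
    return ALL_CODONS - set(table["codons"])


def frequency_sums(table: Mapping[str, Any]) -> Dict[str, float]:
    """Sum the codon frequencies per amino acid."""
    sums: Dict[str, float] = defaultdict(float)
    for entry in table["codons"].values():
        sums[entry["aa"]] += entry["frequency"]
    return dict(sums)


def is_normalized(table: Mapping[str, Any], tol: float = 0.01) -> bool:
    """True when every amino acid's frequencies sum to 1 within `tol`."""
    return all(
        math.isclose(total, 1.0, abs_tol=tol)
        for total in frequency_sums(table).values()
    )


# Tiny demo table with illustrative values (assumption, not PR data).
DEMO = {
    "codons": {
        "TTT": {"aa": "F", "frequency": 0.46},
        "TTC": {"aa": "F", "frequency": 0.54},
    }
}

if __name__ == "__main__":
    print(len(missing_codons(DEMO)))  # number of codons the demo table lacks
    print(is_normalized(DEMO))
```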
@vercel

vercel Bot commented Jun 13, 2026


@KyungSeo116 is attempting to deploy a commit to the munkyukim86's projects Team on Vercel.

A member of the Team first needs to authorize it.

@munkyukim86

Contributor

Code review

Found 3 issues:

  1. New hosts added to CLI but not to the REST API: api/optimize.py still has VALID_HOSTS = ["nbenthamiana", "by2"] and its own HOST_MAP, neither of which was updated. Any POST /optimize request using sf9 or tni will be rejected (line 274 raises ValueError: Unsupported host). Note: PR #75 (feat: add experimental plant hosts) has the same omission; both PRs need api/optimize.py updated before the new hosts are reachable via the web API.

HOST_MAP = {
    "nbenthamiana": "nbenthamiana",
    "by2": "ntabacum",
    "sf9": "spodoptera_frugiperda",
    "tni": "trichoplusia_ni",
}
def _configure_stdio() -> None:
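One way to keep the CLI and the REST API from drifting apart is to define the host mapping once and derive the valid-host list from it. The sketch below assumes a shared module that both cli/main.py and api/optimize.py could import. That module and the `resolve_host` helper are hypothetical and do not exist in the PR; only the HOST_MAP entries are taken from the excerpt above.

```python
from typing import Dict, Tuple

# Single source of truth for host aliases (entries copied from the PR excerpt).
HOST_MAP: Dict[str, str] = {
    "nbenthamiana": "nbenthamiana",
    "by2": "ntabacum",
    "sf9": "spodoptera_frugiperda",
    "tni": "trichoplusia_ni",
}

# Derived, so a new host only has to be added in one place.
VALID_HOSTS: Tuple[str, ...] = tuple(HOST_MAP)


def resolve_host(host: str) -> str:
    """Map a user-facing host alias to its codon-table name."""
    try:
        return HOST_MAP[host]
    except KeyError:
        valid = ", ".join(VALID_HOSTS)
        raise ValueError(f"Unsupported host: {host!r}. Valid hosts: {valid}") from None


if __name__ == "__main__":
    print(resolve_host("sf9"))
```

With this layout, the CLI could pass `VALID_HOSTS` to `click.Choice` and the API could validate requests through `resolve_host`, so both entry points accept the same set of hosts.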

  2. No runtime warning when using placeholder codon tables: The help text mentions "placeholder codon tables", but the engine loads and uses them silently at runtime. load_codon_table() in utils.py does not inspect total_cds or the source field, so a user running --host sf9 receives optimization output with no indication that the underlying codon data is unvalidated. A click.echo() warning before optimization runs (or a WARNING: in the FASTA header) would be the minimum mitigation.

{
"organism": "Spodoptera frugiperda",
"source": "PLACEHOLDER — awaiting verified CDS dataset. Frequencies are estimates derived from Lepidoptera genome surveys (Kazusa CodonUsage Database, Spodoptera frugiperda ISE-6 assembly). Do not use for production optimization until data is validated and source is updated.",
"description": "PLACEHOLDER codon usage table for Spodoptera frugiperda (fall armyworm). Sf9 cells are the primary insect cell line for baculovirus expression vector systems (BEVS). Shows characteristic Lepidoptera AT-bias at synonymous positions and strong AGA preference for Arg.",
"total_cds": 0,
"total_codons": 0,
"codons": {
"TTT": {
"aa": "F",
"frequency": 0.46,

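A runtime guard for issue 2 could be as small as the sketch below. It uses the standard-library `warnings` module so that it stays self-contained; in the CLI, `click.echo(message, err=True)` would be the equivalent. The function name `warn_if_placeholder` and the warning class are hypothetical, and where the check would be called from (for example, right after load_codon_table() returns) is an assumption.

```python
import warnings
from typing import Any, Mapping


class PlaceholderCodonTableWarning(UserWarning):
    """Raised as a warning when optimization would use unvalidated codon data."""


def warn_if_placeholder(table: Mapping[str, Any], host: str) -> bool:
    """Warn and return True if `table` carries either placeholder marker."""
    is_placeholder = (
        table.get("total_cds") == 0
        or "PLACEHOLDER" in str(table.get("source", ""))
    )
    if is_placeholder:
        warnings.warn(
            f"Host {host!r} uses a PLACEHOLDER codon table; "
            "results are not validated for production use.",
            PlaceholderCodonTableWarning,
            stacklevel=2,
        )
    return is_placeholder


if __name__ == "__main__":
    demo = {"source": "PLACEHOLDER (demo)", "total_cds": 0, "codons": {}}
    warn_if_placeholder(demo, "sf9")
```

Here either marker triggers the warning, which errs on the side of telling the user too often rather than too rarely.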
  3. GC scoring band (55–65%) silently applied to insect hosts: The default GC band is calibrated for N. benthamiana (benchmark analysis 004). Sf9 CDS GC is ~41% and T. ni is ~39%. With no host-specific ScoringConfig override, every sequence optimized for sf9 or tni that lands in its natural GC range will receive a penalized GC score and potentially a composite-score warning. The band mismatch is ~15 percentage points. A host-specific gc_min/gc_max entry (or at minimum a documented known limitation) is needed before users interpret the scores.

owner_job: "023"
source: "NCBI Taxonomy Browser NCBITaxon:7108; codon table is PLACEHOLDER — awaiting verified CDS dataset"
rationale: "Sf9 insect cell line (Lepidoptera); primary BEVS host. Codon table uses placeholder frequencies derived from Lepidoptera genome surveys."
provenance: "Issue #23 insect host addition; placeholder codon table until authoritative S. frugiperda CDS build is available."
visibility: public
permission: publish_allowed
tni:
display_name: "Tni / T. ni (High Five)"
scientific_name: "Trichoplusia ni"
ncbi_taxonomy_id: 7111
ncbi_taxonomy_curie: "NCBITaxon:7111"
status: experimental
claim_level: experimental_setting
evidence_status: experimental
release_status: experimental
owner_job: "023"
source: "NCBI Taxonomy Browser NCBITaxon:7111; codon table is PLACEHOLDER — awaiting verified CDS dataset (ref: Tnms42 genome GCF_003590095.1)"
rationale: "Tni High Five insect cell line (BTI-Tn5B1-4, Lepidoptera); widely used for secreted protein in BEVS. Codon table uses placeholder frequencies."
provenance: "Issue #23 insect host addition; placeholder codon table until authoritative T. ni CDS build is available."
visibility: public
permission: publish_allowed
codon_reference:
id: nbenthamiana_legacy_kazusa_sgn_v101
species: Nicotiana benthamiana
ncbi_taxid: 4100
source_label: "Kazusa CodonUsage Database + SGN genome v1.0.1"
source_status: legacy_metadata_only
build_path_status: incomplete
authoritative_build_script_available: false
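To illustrate issue 3, the sketch below shows a host-specific GC band lookup with a fallback to the default band. The default of 55–65% comes from the review above. The sf9 and tni bands are illustrative only (±5 points around the GC values quoted in the review) and are not calibrated. `GCBand`, `HOST_BANDS`, and `band_for` are hypothetical names and do not reflect the project's actual ScoringConfig.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GCBand:
    gc_min: float  # lower bound, percent
    gc_max: float  # upper bound, percent

    def contains(self, gc_percent: float) -> bool:
        return self.gc_min <= gc_percent <= self.gc_max


# Default band quoted in the review (calibrated for N. benthamiana).
DEFAULT_BAND = GCBand(55.0, 65.0)

# Illustrative overrides only: +/-5 points around ~41% (Sf9) and ~39% (T. ni).
HOST_BANDS: Dict[str, GCBand] = {
    "sf9": GCBand(36.0, 46.0),
    "tni": GCBand(34.0, 44.0),
}


def gc_percent(seq: str) -> float:
    """GC content of a DNA sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def band_for(host: str) -> GCBand:
    """Host-specific band if one is registered, else the default."""
    return HOST_BANDS.get(host, DEFAULT_BAND)


if __name__ == "__main__":
    gc = gc_percent("GCGCATATAT")
    print(gc, band_for("sf9").contains(gc), band_for("nbenthamiana").contains(gc))
```

A 40% GC sequence falls inside the illustrative sf9 band but outside the default one, which is the mismatch the review describes.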

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@munkyukim86

munkyukim86 commented Jun 15, 2026

Contributor

Thanks for contributing this! Really appreciate you taking the time to add the insect hosts. I left a few notes above; once those are addressed, I'm happy to get this merged.

@kkyungseo

Author

Thanks for the review! I'll work through the notes above and aim to have everything addressed by June 18 (KST).
I'll ping you here once it's ready for another look.

@munkyukim86

Contributor

Can't wait, thanks!

@munkyukim86

Contributor

Thanks for the contribution! The pandas import failure here is from a stale CI snapshot (2026-06-15) — main has since fixed the benchmark module's dependency setup. Could you rebase this branch onto current main to pick that up? Should resolve the CI failures.
