Rename FASTA lookup helper: sequence_lookup_with_ens_fallback → lookup_sequence_with_version_fallback #352
Merged
…on_fallback

The old name was misleading: the fallback isn't to "ENS" (both Ensembl and GENCODE IDs start with ENS); the fallback is to a version-stripped form. The ENS prefix is just a safety gate so we don't strip non-Ensembl .N suffixes (e.g. TAIR isoform suffixes like AT1G01010.1).

Docstring rewritten to reflect that both Ensembl and GENCODE produce versioned IDs; the formats just split that information differently (Ensembl in a separate *_version attribute, GENCODE embedded in the ID itself). The helper exists for the GENCODE case, where the GTF stores the full versioned ID and a literal FASTA lookup would miss.

No back-compat shim: this helper was added today in v2.9.6 and isn't exposed in __init__.py, so no external callers can depend on the old name.

Bump version to 2.9.7.
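To make the two ID layouts concrete, here is a small sketch of how the same versioned ID can be recovered from either attribute style. The function name `versioned_transcript_id` and the attribute dictionaries are hypothetical illustrations and are not taken from pyensembl; the only assumption carried over from the text above is that Ensembl keeps the version in a separate `transcript_version` attribute while GENCODE embeds it in `transcript_id`.

```python
from typing import Mapping


def versioned_transcript_id(attrs: Mapping[str, str]) -> str:
    """Return an 'ID.version' string from either attribute layout.

    Hypothetical helper for illustration only; not part of pyensembl.
    """
    transcript_id = attrs["transcript_id"]
    version = attrs.get("transcript_version")
    if version is None:
        # GENCODE-style: the version is already embedded in the ID.
        return transcript_id
    # Ensembl-style: the version lives in a separate attribute.
    return f"{transcript_id}.{version}"


# Illustrative attribute dictionaries (values chosen for the example).
ensembl_attrs = {"transcript_id": "ENST00000456328", "transcript_version": "2"}
gencode_attrs = {"transcript_id": "ENST00000456328.2"}

ensembl_form = versioned_transcript_id(ensembl_attrs)
gencode_form = versioned_transcript_id(gencode_attrs)
```

Both layouts carry the same information, which is why a GENCODE ID can miss in a FASTA keyed by the unversioned form.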
Summary
The helper I added in #350 / v2.9.6 was misnamed. ENS isn't the fallback: both Ensembl and GENCODE protein/transcript IDs start with ENS. The actual fallback is to a version-stripped form of the same ID; the ENS-prefix check is just a guard that says "this ID has a version we know how to strip safely" (in contrast to e.g. TAIR AT1G01010.1, where .1 is an isoform suffix, not a version).

Renamed to lookup_sequence_with_version_fallback, and the docstring is rewritten to correctly describe both Ensembl and GENCODE as having versioned IDs (they just split that information differently across the GTF / FASTA pair).

Internal helper only: it is not exported from pyensembl/__init__.py and was added today in v2.9.6, so no external callers can depend on the old name. No back-compat shim.

Three call sites updated:
- Transcript.sequence
- Transcript.protein_sequence
- Genome.transcript_sequence(id) / Genome.protein_sequence(id)

Bumps version to 2.9.7.
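For readers unfamiliar with the behaviour being renamed, the fallback described in this summary can be sketched roughly as follows. This is a minimal illustration, not the actual pyensembl implementation: the function name `lookup_with_version_fallback_sketch`, its signature, the plain-dict FASTA stand-in, and the exact regular expression are all assumptions made for the example.

```python
import re
from typing import Mapping, Optional

# Assumed guard: an ENS-prefixed ID followed by a numeric ".N" version.
# The real helper's check may differ.
_VERSIONED_ENS_ID = re.compile(r"^(ENS[A-Z0-9]+)\.\d+$")


def lookup_with_version_fallback_sketch(
    sequences: Mapping[str, str], identifier: str
) -> Optional[str]:
    """Try the literal ID first, then the version-stripped form."""
    # Step 1: literal lookup, exactly as the ID appears in the GTF.
    hit = sequences.get(identifier)
    if hit is not None:
        return hit
    # Step 2: strip ".N" only when the ID is ENS-prefixed, so that
    # isoform suffixes such as TAIR's AT1G01010.1 are left untouched.
    match = _VERSIONED_ENS_ID.match(identifier)
    if match is None:
        return None
    return sequences.get(match.group(1))


# Hypothetical FASTA contents keyed by unversioned IDs.
fasta = {"ENST00000456328": "ATGC", "AT1G01010": "GGGG"}
```

With this stand-in, a versioned ENS ID resolves through the stripped form, while the TAIR ID is never stripped and therefore returns no hit.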
Test plan
- pytest tests/test_versioned_protein_fasta.py tests/test_mouse.py tests/test_tair10_complete.py tests/test_versions.py tests/test_sequence_data.py tests/test_transcript_sequences.py (20 passed locally)
- ./lint.sh