Optional incompleteness flag for required features (start_codon / stop_codon / CDS pairs) #66

## Background

GTF files don't enforce that a protein-coding transcript has both a `start_codon` and a `stop_codon` row — partial assemblies (e.g. TAIR `chr_patch_hapl_scaff` fragments, openvax/pyensembl#252) drop one or both even though `CDS` rows are present. Downstream consumers (pyensembl in particular) discover this at query time — `Transcript.coding_sequence` had to learn to return `None` rather than raise.

If gtfparse optionally surfaced this incompleteness at parse time, downstream tooling could decide its policy up front instead of catching `KeyError` / `ValueError` per-transcript.

## Proposal

Add a `transcript_completeness` (or `required_features_complete`) DataFrame column, opt-in via a kwarg:

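```python
from gtfparse import read_gtf

# flag_incomplete_transcripts is the proposed kwarg; it does not exist yet.
df = read_gtf(path, flag_incomplete_transcripts=True)
# Adds 'has_start_codon' and 'has_stop_codon' columns, aggregated over rows
# sharing a transcript_id, or a single 'transcript_complete' boolean.
```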

Alternatively, the flags could be returned as a separate companion DataFrame indexed by `transcript_id`, so they don't bloat the main GTF DataFrame for callers that don't ask.
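
As a rough illustration of the companion-table idea, here is a minimal sketch that aggregates the flags per transcript in a single pass. It uses plain dicts rather than a DataFrame to stay dependency-free. `TranscriptFlags` and `summarize_transcripts` are hypothetical names, not gtfparse API, and the `feature` / `transcript_id` keys are assumed to match the parsed column names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional


@dataclass
class TranscriptFlags:
    """Hypothetical per-transcript summary; not part of gtfparse."""
    has_cds: bool = False
    has_start_codon: bool = False
    has_stop_codon: bool = False

    @property
    def transcript_complete(self) -> bool:
        # Transcripts without CDS rows are not expected to carry either codon.
        return (not self.has_cds) or (self.has_start_codon and self.has_stop_codon)


def summarize_transcripts(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[str, TranscriptFlags]:
    """Build a companion table keyed by transcript_id in one pass over the rows."""
    flags: Dict[str, TranscriptFlags] = {}
    for row in rows:
        transcript_id = row.get("transcript_id")
        if not transcript_id:
            continue  # e.g. gene-level rows with no transcript_id
        entry = flags.setdefault(transcript_id, TranscriptFlags())
        feature = row.get("feature")
        if feature == "CDS":
            entry.has_cds = True
        elif feature == "start_codon":
            entry.has_start_codon = True
        elif feature == "stop_codon":
            entry.has_stop_codon = True
    return flags


# Toy rows standing in for parsed GTF records (made-up IDs).
rows = [
    {"feature": "gene", "transcript_id": None},
    {"feature": "CDS", "transcript_id": "T1"},
    {"feature": "start_codon", "transcript_id": "T1"},
    {"feature": "stop_codon", "transcript_id": "T1"},
    {"feature": "CDS", "transcript_id": "T2"},
    {"feature": "stop_codon", "transcript_id": "T2"},
    {"feature": "exon", "transcript_id": "T3"},
]
summary = summarize_transcripts(rows)
incomplete = sorted(t for t, f in summary.items() if not f.transcript_complete)
```

Here `T2` has `CDS` rows but no `start_codon`, so it is the only transcript reported as incomplete. `T3` has no `CDS` rows and is left alone.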

## Scope guard

- Arguably out of scope for a pure parser — "what's complete" is downstream policy. But the parse already scans every row to expand attributes; adding an aggregated boolean is cheap.
- The exact set of "required" features is opinionated. The proposal here is the common case (`CDS` rows imply you need `start_codon` and `stop_codon` for that transcript to be translatable).
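
One way to keep the opinionated part out of the parser would be to make the rule set a parameter. The sketch below assumes a mapping from a "trigger" feature to the features it implies; `DEFAULT_REQUIREMENTS` and `missing_features` are hypothetical names used only for illustration, and the default encodes just the common case described above.

```python
from typing import AbstractSet, Dict, FrozenSet, Mapping

# Hypothetical default rule: CDS rows imply start_codon and stop_codon.
DEFAULT_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "CDS": frozenset({"start_codon", "stop_codon"}),
}


def missing_features(
    present: AbstractSet[str],
    requirements: Mapping[str, AbstractSet[str]] = DEFAULT_REQUIREMENTS,
) -> FrozenSet[str]:
    """Return the required features absent from one transcript's feature set."""
    missing = set()
    for trigger, required in requirements.items():
        if trigger in present:
            # Only enforce a rule when its trigger feature is actually present.
            missing |= set(required) - set(present)
    return frozenset(missing)


# Default policy: a transcript with CDS but no stop_codon is flagged.
partial = missing_features({"exon", "CDS", "start_codon"})
# Non-coding transcript: no CDS rows, so nothing is required.
noncoding = missing_features({"exon"})
# A caller with a stricter (made-up) policy supplies its own rule set.
strict = missing_features(
    {"exon", "CDS", "start_codon", "stop_codon"},
    {"CDS": {"start_codon", "stop_codon", "five_prime_utr"}},
)
```

With this shape the parser would only report which features are present, and the definition of "complete" stays with the caller.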

## Related

- openvax/pyensembl#252 — TAIR fragments
- openvax/pyensembl#176 — `coding_sequence_position_ranges` initially didn't include stop codon
