Hi Andrey,
Thank you for developing and maintaining this excellent tool.
I have a specific question regarding the read assignment strategy when two isoforms originating from the same gene are highly similar.
Example: I am investigating the PARVB gene, which has the following two highly similar isoforms:
Reference/Known Isoform: PARVB-201
Novel Isoform identified by IsoQuant: 86796 (a novel isoform ID generated during the run)
These two isoforms appear to differ only slightly, specifically in their first and last exons.
I extracted the reads assigned to both 86796 and PARVB-201 from the Embryo.transcript_model_reads.tsv.gz file and visualized them in IGV.
I found that a significant portion of the reads assigned to the novel isoform 86796 do not contain the unique first exon specific to that isoform.
In many cases, these reads seem structurally more compatible with the known PARVB-201 isoform (which might only differ in the terminal exon or lack the unique start/end features of the novel isoform 86796).
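For reference, the extraction step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive procedure: it assumes the file is a tab-separated table with the read ID in the first column and the transcript ID in the second, and that header lines start with `#`. The function name `reads_by_transcript` is purely illustrative, and the IDs `86796` and `PARVB-201` are written as they appear in this post, so they may need to be replaced with the exact transcript IDs used in the file.

```python
import csv
import gzip
from collections import defaultdict


def reads_by_transcript(path, transcript_ids):
    """Collect read IDs per transcript from a two-column TSV.

    Assumed layout (not verified against a specific IsoQuant version):
    column 1 = read ID, column 2 = transcript ID, '#' marks header lines.
    """
    wanted = set(transcript_ids)
    reads = defaultdict(list)
    # Support both gzip-compressed and plain-text tables.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip header lines and malformed rows.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            read_id, transcript_id = row[0], row[1]
            if transcript_id in wanted:
                reads[transcript_id].append(read_id)
    return dict(reads)


if __name__ == "__main__":
    # Placeholder IDs taken from this post; adjust to the IDs in the file.
    groups = reads_by_transcript(
        "Embryo.transcript_model_reads.tsv.gz", ["86796", "PARVB-201"]
    )
    for transcript_id, read_ids in groups.items():
        print(transcript_id, len(read_ids))
```

The per-transcript read ID lists can then be used to subset the alignment file, so that each group is loaded as a separate track in IGV.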
Given this observation, I am confused about how reads are distributed between the two isoforms:
Could you please clarify the strategy IsoQuant employs for assigning full-length reads when multiple competing isoforms from the same gene are nearly identical or when the reads only cover the shared internal regions?
Specifically:
- Does IsoQuant prioritize known (reference) isoforms over novel isoforms in cases of ambiguity?
- How does the model account for reads that support the bulk of a novel isoform's structure but fail to include a single defining feature (like a unique starting exon)?
- What is the recommended threshold or metric within IsoQuant's output to assess the confidence of reads assigned to a novel isoform, especially when the unique distinguishing features are missing in the assigned reads?
Understanding this assignment strategy is crucial for the biological validation of these novel isoforms.
Thank you very much for your time and assistance!
Best regards,
yijia