ClinVar ClinSig updates and resolving submissions ties (4/N)#285

Merged
pj-sullivan merged 4 commits into main from pj-sullivan/submissions-ties
Jun 12, 2026

@pj-sullivan pj-sullivan commented Jun 10, 2026

Collaborator

Purpose/implementation Section

What feature is being added or bug is being addressed?

Separate PR since there are a lot of edits to scripts/resolve-clinvar-submissions.R
Updated the README since it was out of date.

What was your approach?

  • Expand recognized ClinVar significance categories: values now include low-penetrance P/LP, "Established/Likely risk allele" (reported as "Risk allele"), "Uncertain risk allele" (reported as "Uncertain significance"), and VUS-high/mid/low.
  • Refactor the consensus/tie-resolution logic (both with and without a concept-ID list). "Latest" and "Most severe" conflict resolution is now applied only to the categories involved in the tie (e.g., with 3 P/LP, 3 VUS, and 1 B/LB submissions, only P/LP and VUS are considered).
  • Output filename is now resolved-clinvar-{date}{-concept-conflict_res}.tsv.
  • Validate input file dates: --variant_summary and --submission_summary filenames must reference the same date, or the script errors out.
  • Final output filtering: drop rows with NA or non-actionable ClinicalSignificance (-, not provided, association, risk factor, drug response); dedupe by VariationID (keeping the latest evaluated) instead of vcf_id.
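
The tie-restricted "latest" resolution from the second bullet can be sketched roughly as below. This is a minimal shell/awk illustration, not the logic in the R script: the three-column input (variant ID, significance, date last evaluated), the function name `resolve_latest`, and the toy data are all assumptions made for the example, and each significance label is treated as its own category rather than being grouped into P/LP, VUS, and B/LB.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep only the most frequent (tied) categories for a
# variant, then take the most recently evaluated call among them.
# Assumed input: tab-separated variant_id, significance, ISO date (YYYY-MM-DD).

resolve_latest() {
  awk -F'\t' '
    {
      count[$1 SUBSEP $2]++            # submissions per variant and category
      variants[$1] = 1
      n++
      id[n] = $1; sig[n] = $2; date[n] = $3
    }
    END {
      # highest per-category submission count for each variant
      for (key in count) {
        split(key, parts, SUBSEP)
        if (count[key] > max[parts[1]]) max[parts[1]] = count[key]
      }
      # among submissions in the top-count categories, keep the latest date
      # (ISO dates compare correctly as plain strings)
      for (i = 1; i <= n; i++) {
        v = id[i]
        if (count[v SUBSEP sig[i]] == max[v] && date[i] > best_date[v]) {
          best_date[v] = date[i]
          best_sig[v] = sig[i]
        }
      }
      for (v in variants) printf "%s\t%s\n", v, best_sig[v]
    }
  '
}

# Toy data: 3 Pathogenic, 3 Uncertain significance, 1 Benign (the newest call).
printf '%s\t%s\t%s\n' \
  V1 "Pathogenic" 2020-01-01 \
  V1 "Pathogenic" 2021-03-01 \
  V1 "Pathogenic" 2019-05-01 \
  V1 "Uncertain significance" 2022-06-01 \
  V1 "Uncertain significance" 2018-01-01 \
  V1 "Uncertain significance" 2017-01-01 \
  V1 "Benign" 2024-01-01 \
  | resolve_latest
# prints: V1<TAB>Uncertain significance
```

The single Benign call is the newest submission overall, but it is ignored because its category is not part of the tie; the latest call among the tied categories wins.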

What GitHub issue does your pull request address?

Closes #283
Closes #282

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Download ClinVar files (I don't feel like this needs to be a data release?)

curl https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/archive/variant_summary_2026-06.txt.gz -o data/variant_summary_2026-06.txt.gz
curl https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/archive/submission_summary_2026-06.txt.gz -o data/submission_summary_2026-06.txt.gz

If you want to verify files:

md5sum data/*2026-06*
dcf6a458aa8fc850542a79ec9565271a  data/submission_summary_2026-06.txt.gz
53c1f539c36b0e7b5314e56788d963ab  data/variant_summary_2026-06.txt.gz
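
The checksums listed here can also be checked non-interactively rather than compared by eye. A minimal sketch, assuming GNU coreutils `md5sum` and that the files sit in data/ as in the curl commands; the checksum file name `clinvar_2026-06.md5` is an arbitrary choice for this example.

```shell
# Write the expected checksums (copied from the md5sum listing) to a file.
# The name clinvar_2026-06.md5 is made up for this sketch.
cat > clinvar_2026-06.md5 <<'EOF'
dcf6a458aa8fc850542a79ec9565271a  data/submission_summary_2026-06.txt.gz
53c1f539c36b0e7b5314e56788d963ab  data/variant_summary_2026-06.txt.gz
EOF

# md5sum -c exits non-zero if any listed file is missing or does not match.
if md5sum -c clinvar_2026-06.md5; then
  echo "ClinVar downloads verified"
else
  echo "Checksum mismatch or missing file; re-download before running the script" >&2
fi
```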

Run resolve clinvar using the cancer concept IDs (file from January is fine, concept IDs don't update often).

Rscript scripts/resolve-clinvar-intepretations.R --variant_summary data/variant_summary_2026-06.txt.gz --submission_summary data/submission_summary_2026-06.txt.gz --outdir refs --conceptID_list refs/clinvar_cancer_concept_ids_20260130.txt --conflict_res "latest"
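
Since the script now errors out when the two input filenames reference different dates, a quick pre-check can save a failed run. The sketch below is not the script's own validation; it assumes the date appears in the filename as `YYYY-MM` (optionally `YYYY-MM-DD`), as in the archive files used here, and `extract_date` is a hypothetical helper. It relies on `grep -o`, which GNU and BSD grep both provide.

```shell
# Hypothetical helper: print the first YYYY-MM or YYYY-MM-DD found in a filename.
extract_date() {
  basename "$1" | grep -oE '[0-9]{4}-[0-9]{2}(-[0-9]{2})?' | head -n 1
}

variant_summary=data/variant_summary_2026-06.txt.gz
submission_summary=data/submission_summary_2026-06.txt.gz

variant_date=$(extract_date "$variant_summary")
submission_date=$(extract_date "$submission_summary")

# Require a date in both names and require the two dates to be identical.
if [ -n "$variant_date" ] && [ "$variant_date" = "$submission_date" ]; then
  echo "Input dates match: $variant_date"
else
  echo "Input dates differ or are missing: '$variant_date' vs '$submission_date'" >&2
fi
```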

Which areas should receive a particularly close look?

The output of scripts/resolve-clinvar-intepretations.R and the md5sum of the resulting file:

                           Benign              Benign/Likely benign 
                           209038                             93732 
                    Likely benign                 Likely pathogenic 
                          1098533                            112704 
Likely pathogenic, low penetrance                        Pathogenic 
                               13                            165698 
       Pathogenic, low penetrance      Pathogenic/Likely pathogenic 
                                1                             45791 
                      Risk allele            Uncertain significance 
                               24                           2353012 
                         VUS-high                           VUS-low 
                                1                                 1 

5c4d9e9a04ff05062ab661151369dce8  refs/resolved-clinvar-2026-06-cancer-latest.tsv
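
For reviewers who want a comparable tabulation without opening R, per-category counts can be pulled from the output TSV with awk. This is only a sketch: it assumes the file has a header row with a column named `ClinicalSignificance` (the name used in the filtering bullet), and `count_by_column` is a hypothetical helper. The demo runs on a toy file; the real input would be refs/resolved-clinvar-2026-06-cancer-latest.tsv.

```shell
# Hypothetical helper: count rows per value of a named column in a TSV file.
# Usage: count_by_column FILE COLUMN_NAME
count_by_column() {
  awk -F'\t' -v col="$2" '
    NR == 1 {
      # locate the requested column in the header row
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$idx]++ }
    END { for (value in counts) printf "%s\t%d\n", value, counts[value] }
  ' "$1" | sort
}

# Toy input standing in for the resolved ClinVar output.
demo=$(mktemp)
printf 'VariationID\tClinicalSignificance\n1\tBenign\n2\tPathogenic\n3\tBenign\n' > "$demo"
count_by_column "$demo" ClinicalSignificance
rm -f "$demo"
```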

Is there anything that you want to discuss further?

Documentation Checklist

  • The function has examples to showcase the usage
  • Added a vignette

@pj-sullivan pj-sullivan requested review from jharenza and rjcorb June 10, 2026 14:33
@pj-sullivan pj-sullivan self-assigned this Jun 10, 2026
Base automatically changed from pj-sullivan/filter-clinvar to main June 10, 2026 19:44

@rjcorb rjcorb left a comment

Collaborator

I ran resolve-clinvar-interpretations.R and got the same results. Updated code logic also looks good to me.

I've also updated the unadjusted InterVar call substring extraction to close #286.

@pj-sullivan pj-sullivan merged commit ca64343 into main Jun 12, 2026
1 check passed
@pj-sullivan pj-sullivan deleted the pj-sullivan/submissions-ties branch June 12, 2026 19:32

Development

Successfully merging this pull request may close these issues.

Feature request: Account for new VUS and P/LP categories
Update final resolution of ClinVar submissions to only consider most recurrent calls

3 participants