Skip to content

CBIIT/ctdc-claude-skill

Repository files navigation

CTDC Claude Skill

CI License: Apache 2.0

A Claude AI skill for finding, exploring, citing, and downloading data from the National Cancer Institute's Clinical and Translational Data Commons (CTDC), part of the Cancer Research Data Commons (CRDC).

What this skill does

When loaded into Claude, this skill lets researchers ask natural-language questions like:

  • "Find Stage IV breast cancer participants in CTDC with available molecular characterization data."
  • "How do I export Cancer Moonshot Biobank specimens to Seven Bridges?"
  • "What's the citation for CMB?"
  • "Show me the CTDC GraphQL query to count participants by diagnosis."
  • "Which CTDC studies require dbGaP access?"
  • "What CDE is used for vital_status in CTDC?"

Claude responds with portal links, runnable GraphQL queries, curl examples, proper citations, and access-tier guidance — all grounded in CTDC's documented endpoints and data model rather than guesses.

How it works (90 seconds)

A Claude skill is a folder containing a SKILL.md file with YAML frontmatter (name + description). Claude loads only the frontmatter at startup so it knows the skill exists. When a user's prompt matches the description, Claude pulls in the full SKILL.md and the referenced files on demand. This is called progressive disclosure — minimal context cost until the skill is actually needed.

ctdc-claude-skill/
├── SKILL.md              ← Main instructions Claude follows
├── references/           ← Loaded by Claude on demand
│   ├── data_model.md
│   ├── graphql_patterns.md
│   ├── graphql_endpoints.md
│   ├── portal_workflows.md
│   ├── access_tiers.md
│   ├── citation.md
│   └── glossary.md
├── tests/                ← Question → expected behavior fixtures
├── USAGE.md              ← How to load this skill in Claude
├── CHANGELOG.md
├── LICENSE
└── README.md

Quickstart

Get the skill running and verify it works in about five minutes.

1. Install (Claude Desktop)

Claude Desktop is the recommended path because it works for both researchers and PMs without requiring a terminal. The workflow uses a Claude Project plus the GitHub MCP connector — Desktop doesn't yet auto-discover skills from a local directory, so we point Claude at this repository directly.

  1. Open Claude DesktopSettingsConnectors and connect the GitHub connector if it isn't already.

  2. Create a new Project named CTDC (sidebar → ProjectsNew project).

  3. In the project's Instructions field, paste:

    At the start of every conversation in this project, use the GitHub
    connector to read SKILL.md from the CBIIT/ctdc-claude-skill repository
    (main branch). Follow the instructions in that file. When a question
    refers to a reference file (e.g. references/graphql_patterns.md), fetch
    that file from the same repository on demand.
    
  4. Open a new chat inside the CTDC project and continue to Verify it loaded below.

Prefer the terminal? If you use Claude Code, the install is one command and you can skip the project setup entirely:

mkdir -p ~/.claude/skills && cd ~/.claude/skills && \
  git clone https://github.com/CBIIT/ctdc-claude-skill.git

The next Claude Code session has the skill available — no restart, no registration. The skill triggers automatically when your prompt matches its description.

Using the Anthropic API? See USAGE.md Option B for how to load the skill content into your message context, including a prompt-caching pattern for production workflows.

2. Verify it loaded

In a fresh chat inside the CTDC project (or a fresh Claude Code session), ask:

What is CTDC, and what data is currently available there?

You should get a response that:

  • Names Cancer Moonshot Biobank (CMB) specifically as a current study.
  • Links to https://clinical.datacommons.cancer.gov/.
  • Mentions multiomics data types (clinical, molecular characterization, biospecimens, imaging).

If you get a generic "I don't have specific information about CTDC" answer, the skill didn't load. See Troubleshooting below.

3. Try three real questions

Each test exercises a different capability. Copy the prompt verbatim and compare against what the skill should do.

Test 1 — Nested GraphQL query (saves real time vs. asking Claude raw)

Write a CTDC GraphQL query that returns Stage IV breast cancer participants who have at least one biospecimen with associated molecular characterization files. I want participant ID, primary diagnosis, stage, and a count of files per participant.

Expect: A runnable GraphQL query against https://clinical.datacommons.cancer.gov/v1/graphql/, using CTDC's actual schema field names (not invented ones), with the nested filter expressed correctly and a note about how to execute it (curl example or portal link).

Test 2 — Submission knowledge (marquee YAML-fetch test)

I have RNA-Seq FASTQs and digital pathology slides from a 50-patient rare-cancer cohort. What's the path to getting this data into CTDC, and what quality-of-life CDEs should I plan to capture?

Expect: A concrete walkthrough referencing the CRDC Submission Portal (hub.datacommons.cancer.gov), the CRDC Submission Review Committee (SRC), the typical four-to-six-week request-review window, the dbGaP-first requirement for controlled data, and specific CDEs from the data model for capturing quality-of-life patient-reported outcomes. The answer should ground "what CTDC accepts" in the CTDC-specific submit page rather than guessing.

Test 3 — SRC routing question (knows what it isn't)

I have a 200-patient pediatric cancer cohort with WGS, RNA-Seq, and PET/CT imaging. Should this data go into CTDC or IDC?

Expect: The skill should describe what CTDC currently hosts (clinical, molecular, biomarker, pharmacological, patient-reported outcomes, non-interventional study data, and imaging including radiological and pathology data), note that CTDC accepts links to TCIA/IDC as an alternative to re-uploading imaging, and explicitly defer the placement decision to the CRDC Submission Review Committee (SRC) via the CRDC Submission Portal. The skill should also offer NCICRDC@mail.nih.gov as the right point of contact for placement questions. The wrong behavior is confidently claiming the data "belongs in CTDC" or "belongs in IDC" — that's the SRC's call, not the skill's.

If all three responses land roughly as described, you're done. If any goes sideways, open an issue with the prompt and the response — those reports are how the skill improves.

Troubleshooting

Symptom Likely cause Fix
Generic "I don't have CTDC information" answer. Skill didn't load. Most likely the project instruction wasn't applied. Confirm you're chatting inside the CTDC project, not a regular chat. Re-check the GitHub connector is authorized.
Claude says it can't access the repo. GitHub connector not authorized, or rate-limited. Disconnect and reconnect the GitHub connector. The repo is public, so no special org permissions are required.
Answer cites a field name that doesn't exist in the schema. Schema drift — references trail a recent backend release. See USAGE.md → Forcing a reference re-fetch. Then open an issue with the field name so the references can be updated.
Claude Code: skill present but never triggers. Prompt doesn't match the description in SKILL.md frontmatter. Mention "CTDC" or "Clinical and Translational Data Commons" explicitly. See USAGE.md for trigger patterns.

Reporting issues

If Claude gives a wrong, stale, or incomplete answer about CTDC, please open an issue with the question you asked, what Claude said, and what you expected. We use those reports to update the reference files and tests.

Contributing

Pull requests welcome. Run npm run lint (Markdown lint + link check) and npm test (skill smoke tests) before opening a PR. See CONTRIBUTING.md for details.

Versioning

This skill follows Semantic Versioning. The version is declared in SKILL.md's YAML frontmatter and tagged in Releases. See CHANGELOG.md for history.

License

Apache License 2.0. See LICENSE and NOTICE. CTDC data has individual study-level licenses and access tiers — see references/access_tiers.md.

Acknowledgments

  • The Imaging Data Commons team, whose idc-claude-skill is the architectural pattern this skill follows.
  • The CTDC team at the NCI Center for Biomedical Informatics and Information Technology (CBIIT).
  • The Bento framework team at Frederick National Laboratory (FNL) Bioinformatics and Computational Science Directorate (BACS).

Funding and provenance

Developed by the Bioinformatics and Computational Science Directorate (BACS) at Frederick National Laboratory for Cancer Research in support of the NCI Center for Biomedical Informatics and Information Technology (CBIIT) and the Cancer Research Data Commons (CRDC). This skill is a research tool; if you use it in work that leads to a publication, see references/citation.md for the standard NCI/CRDC acknowledgment language to include there.

Citation

If you use this skill in research that leads to a publication, please cite the CTDC as described in references/citation.md.

About

A Claude AI skill that helps researchers find, explore, query, cite, and download data from NCI's Clinical and Translational Data Commons (CTDC). Provides natural-language access to CTDC studies, the GraphQL API, the data model, portal workflows, and Seven Bridges CGC export. Inspired by ImagingDataCommons/idc-claude-skill.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages