A Claude AI skill for finding, exploring, citing, and downloading data from the National Cancer Institute's Clinical and Translational Data Commons (CTDC), part of the Cancer Research Data Commons (CRDC).
When loaded into Claude, this skill lets researchers ask natural-language questions like:
- "Find Stage IV breast cancer participants in CTDC with available molecular characterization data."
- "How do I export Cancer Moonshot Biobank specimens to Seven Bridges?"
- "What's the citation for CMB?"
- "Show me the CTDC GraphQL query to count participants by diagnosis."
- "Which CTDC studies require dbGaP access?"
- "What CDE is used for `vital_status` in CTDC?"
Claude responds with portal links, runnable GraphQL queries, curl examples,
proper citations, and access-tier guidance — all grounded in CTDC's documented
endpoints and data model rather than guesses.
A Claude skill is a folder containing a SKILL.md file with YAML frontmatter
(name + description). Claude loads only the frontmatter at startup so it
knows the skill exists. When a user's prompt matches the description, Claude
pulls in the full SKILL.md and the referenced files on demand. This is called
progressive disclosure — minimal context cost until the skill is actually
needed.
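Because the frontmatter is all Claude reads at startup, it can help to see exactly what that block contains. The bash sketch below prints the lines between the opening and closing `---` delimiters of a SKILL.md file and extracts a single key such as `name` or `description`. The function names `skill_frontmatter` and `skill_field` are hypothetical helpers, not part of this repository. The line-based parsing is an approximation, not how Claude itself reads the file, and it does not handle multi-line YAML values.

```shell
# Print the YAML frontmatter of a SKILL.md file: every line between the
# opening "---" on line 1 and the next "---". Returns non-zero if the file
# does not start with frontmatter or the block is never closed.
skill_frontmatter() {
  awk '
    NR == 1 && $0 != "---" { exit 1 }  # no opening delimiter
    NR == 1 { next }                   # skip the opening delimiter
    $0 == "---" { closed = 1; exit }   # closing delimiter ends the block
    { print }
    END { if (!closed) exit 1 }
  ' "${1:-SKILL.md}"
}

# Print the value of one top-level key (e.g. name, description).
# Usage: skill_field KEY [FILE]
skill_field() {
  skill_frontmatter "${2:-SKILL.md}" | sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}
```

For example, `skill_field description ~/.claude/skills/ctdc-claude-skill/SKILL.md` shows the description text that user prompts are matched against.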
```text
ctdc-claude-skill/
├── SKILL.md              ← Main instructions Claude follows
├── references/           ← Loaded by Claude on demand
│   ├── data_model.md
│   ├── graphql_patterns.md
│   ├── graphql_endpoints.md
│   ├── portal_workflows.md
│   ├── access_tiers.md
│   ├── citation.md
│   └── glossary.md
├── tests/                ← Question → expected behavior fixtures
├── USAGE.md              ← How to load this skill in Claude
├── CHANGELOG.md
├── LICENSE
└── README.md
```
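To confirm that a local checkout (for example, a Claude Code install) matches the layout above, a small bash check like the following can help. `check_skill_layout` is a hypothetical helper; its file list is copied from the tree above and would need updating if the layout changes.

```shell
# Hypothetical helper: report entries from the documented layout that are
# missing under ROOT (default: current directory). The exit status is the
# number of missing entries, so 0 means the layout is complete.
check_skill_layout() {
  local root="${1:-.}" missing=0 path
  local expected=(
    SKILL.md USAGE.md CHANGELOG.md LICENSE README.md tests/
    references/data_model.md references/graphql_patterns.md
    references/graphql_endpoints.md references/portal_workflows.md
    references/access_tiers.md references/citation.md references/glossary.md
  )
  for path in "${expected[@]}"; do
    # Entries ending in "/" must be directories; everything else a file.
    if [[ "$path" == */ ]]; then
      [ -d "$root/$path" ] && continue
    else
      [ -f "$root/$path" ] && continue
    fi
    echo "missing: $path" >&2
    missing=$((missing + 1))
  done
  return "$missing"
}
```

For example, `check_skill_layout ~/.claude/skills/ctdc-claude-skill && echo "layout OK"` prints `layout OK` only when every listed file and directory is present.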
Get the skill running and verify it works in about five minutes.
Claude Desktop is the recommended path because it works for both researchers and PMs without requiring a terminal. The workflow uses a Claude Project plus the GitHub MCP connector — Desktop doesn't yet auto-discover skills from a local directory, so we point Claude at this repository directly.
1. Open Claude Desktop → Settings → Connectors and connect the GitHub connector if it isn't already.
2. Create a new Project named CTDC (sidebar → Projects → New project).
3. In the project's Instructions field, paste:

   ```text
   At the start of every conversation in this project, use the GitHub connector to read SKILL.md from the CBIIT/ctdc-claude-skill repository (main branch). Follow the instructions in that file. When a question refers to a reference file (e.g. references/graphql_patterns.md), fetch that file from the same repository on demand.
   ```

4. Open a new chat inside the CTDC project and continue to Verify it loaded below.
Prefer the terminal? If you use Claude Code, the install is one command and you can skip the project setup entirely:
```shell
mkdir -p ~/.claude/skills && cd ~/.claude/skills && \
  git clone https://github.com/CBIIT/ctdc-claude-skill.git
```

The next Claude Code session has the skill available, with no restart and no registration. The skill triggers automatically when your prompt matches its description.
Using the Anthropic API? See USAGE.md Option B for how to
load the skill content into your message context, including a prompt-caching
pattern for production workflows.
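As a rough illustration of the API path (USAGE.md Option B remains the reference), the bash sketch below places SKILL.md in the system prompt as a text block marked with `cache_control: {type: "ephemeral"}`, which is how the Messages API flags a prompt prefix for caching. It assumes `jq` 1.6 or later (for `--rawfile`), `curl`, and an `ANTHROPIC_API_KEY` environment variable. The function names and the `MODEL_ID` placeholder are hypothetical; substitute a current model ID. Reference files are not loaded here; one option is to append them as further system blocks when a question needs them. Caching only takes effect once the cached prefix exceeds a model-specific minimum length.

```shell
# Build a Messages API request body with SKILL.md as a cacheable system block.
# Usage: build_skill_request SKILL_FILE QUESTION [MODEL_ID]
build_skill_request() {
  local skill_file="$1" question="$2" model="${3:-MODEL_ID}"
  jq -n \
    --arg model "$model" \
    --rawfile skill "$skill_file" \
    --arg question "$question" \
    '{
      model: $model,
      max_tokens: 1024,
      system: [
        {type: "text", text: $skill, cache_control: {type: "ephemeral"}}
      ],
      messages: [{role: "user", content: $question}]
    }'
}

# Send the request; print the reply text, or the raw JSON if the API
# returned an error object instead of content blocks.
ask_with_skill() {
  build_skill_request "$@" |
    curl -sS https://api.anthropic.com/v1/messages \
      -H "x-api-key: ${ANTHROPIC_API_KEY:?set ANTHROPIC_API_KEY}" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      --data-binary @- |
    jq -r 'if .content then .content[] | select(.type == "text") | .text else . end'
}
```

For example: `ask_with_skill SKILL.md "What is CTDC, and what data is currently available there?" YOUR_MODEL_ID`.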
In a fresh chat inside the CTDC project (or a fresh Claude Code session), ask:
What is CTDC, and what data is currently available there?
You should get a response that:
- Names Cancer Moonshot Biobank (CMB) specifically as a current study.
- Links to https://clinical.datacommons.cancer.gov/.
- Mentions multiomics data types (clinical, molecular characterization, biospecimens, imaging).
If you get a generic "I don't have specific information about CTDC" answer, the skill didn't load. See Troubleshooting below.
Each test exercises a different capability. Copy the prompt verbatim and compare against what the skill should do.
Write a CTDC GraphQL query that returns Stage IV breast cancer participants who have at least one biospecimen with associated molecular characterization files. I want participant ID, primary diagnosis, stage, and a count of files per participant.
Expect: A runnable GraphQL query against
https://clinical.datacommons.cancer.gov/v1/graphql/, using CTDC's actual
schema field names (not invented ones), with the nested filter expressed
correctly and a note about how to execute it (curl example or portal
link).
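To run a generated query outside the portal, a GraphQL endpoint normally accepts an HTTP POST whose JSON body carries the query under a `query` key. The bash sketch below wraps a query that way and sends it with `curl`. It assumes `jq` is installed and that the CTDC endpoint permits schema introspection; the helper and variable names are hypothetical. Instead of guessing CTDC field names, the example asks the schema for its top-level query fields, which is one way to confirm that the names in a generated query actually exist.

```shell
CTDC_GRAPHQL_URL="https://clinical.datacommons.cancer.gov/v1/graphql/"

# Wrap a GraphQL query string in the standard {"query": "..."} JSON envelope.
graphql_body() {
  jq -n --arg query "$1" '{query: $query}'
}

# POST a query to the CTDC endpoint and pretty-print the JSON response.
ctdc_query() {
  graphql_body "$1" |
    curl -sS -X POST "$CTDC_GRAPHQL_URL" \
      -H "Content-Type: application/json" \
      --data-binary @- |
    jq .
}

# Standard GraphQL introspection query listing the top-level query fields.
CTDC_LIST_QUERY_FIELDS='{ __schema { queryType { fields { name } } } }'
```

Running `ctdc_query "$CTDC_LIST_QUERY_FIELDS"` should list the available entry points if introspection is enabled; once the field names are confirmed, the skill's generated query can be passed to `ctdc_query` the same way.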
I have RNA-Seq FASTQs and digital pathology slides from a 50-patient rare-cancer cohort. What's the path to getting this data into CTDC, and what quality-of-life CDEs should I plan to capture?
Expect: A concrete walkthrough referencing the CRDC Submission Portal
(hub.datacommons.cancer.gov), the
CRDC Submission Review Committee (SRC), the typical four-to-six-week
request-review window, the dbGaP-first requirement for controlled data,
and specific CDEs from the data model for capturing quality-of-life
patient-reported outcomes. The answer should ground "what CTDC accepts"
in the CTDC-specific submit page rather than guessing.
I have a 200-patient pediatric cancer cohort with WGS, RNA-Seq, and PET/CT imaging. Should this data go into CTDC or IDC?
Expect: The skill should describe what CTDC currently hosts (clinical,
molecular, biomarker, pharmacological, patient-reported outcomes,
non-interventional study data, and imaging including radiological and
pathology data), note that CTDC accepts links to TCIA/IDC as an
alternative to re-uploading imaging, and explicitly defer the placement
decision to the CRDC Submission Review Committee (SRC) via the CRDC
Submission Portal. The skill should also offer NCICRDC@mail.nih.gov as
the right point of contact for placement questions. The wrong behavior is
confidently claiming the data "belongs in CTDC" or "belongs in IDC" —
that's the SRC's call, not the skill's.
If all three responses land roughly as described, you're done. If any goes sideways, open an issue with the prompt and the response — those reports are how the skill improves.
| Symptom | Likely cause | Fix |
|---|---|---|
| Generic "I don't have CTDC information" answer. | Skill didn't load. Most likely the project instruction wasn't applied. | Confirm you're chatting inside the CTDC project, not a regular chat, and re-check that the GitHub connector is authorized. |
| Claude says it can't access the repo. | GitHub connector not authorized, or rate-limited. | Disconnect and reconnect the GitHub connector. The repo is public, so no special org permissions are required. |
| Answer cites a field name that doesn't exist in the schema. | Schema drift — references trail a recent backend release. | See USAGE.md → Forcing a reference re-fetch. Then open an issue with the field name so the references can be updated. |
| Claude Code: skill present but never triggers. | Prompt doesn't match the description in SKILL.md frontmatter. | Mention "CTDC" or "Clinical and Translational Data Commons" explicitly. See USAGE.md for trigger patterns. |
If Claude gives a wrong, stale, or incomplete answer about CTDC, please open an issue with the question you asked, what Claude said, and what you expected. We use those reports to update the reference files and tests.
Pull requests welcome. Run npm run lint (Markdown lint + link check) and
npm test (skill smoke tests) before opening a PR. See CONTRIBUTING.md for
details.
This skill follows Semantic Versioning. The version is
declared in SKILL.md's YAML frontmatter and tagged in
Releases. See
CHANGELOG.md for history.
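Since the version lives in the SKILL.md frontmatter, a release check can read it straight from the file. This bash sketch assumes the key is literally `version:` (this README does not name it) and uses a simplified Semantic Versioning pattern that accepts `MAJOR.MINOR.PATCH` with optional pre-release and build suffixes but does not enforce every rule in the specification. Both function names are hypothetical.

```shell
# Print the value of the (assumed) "version:" key from SKILL.md frontmatter,
# with any double quotes removed. Fails if the file has no frontmatter.
skill_version() {
  awk '
    NR == 1 && $0 != "---" { exit 1 }  # file must open with frontmatter
    NR == 1 { next }                   # skip the opening delimiter
    $0 == "---" { exit }               # stop at the closing delimiter
    sub(/^version:[[:space:]]*/, "") { gsub(/"/, ""); print; exit }
  ' "${1:-SKILL.md}"
}

# Simplified SemVer check: MAJOR.MINOR.PATCH without leading zeros, plus
# optional -prerelease and +build suffixes.
is_semver() {
  local re='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'
  [[ "$1" =~ $re ]]
}
```

For example, `v=$(skill_version SKILL.md) && is_semver "$v"` could run before tagging a release. Comparing `"v$v"` with the output of `git describe --tags --abbrev=0` would also catch a frontmatter version that drifted from the latest tag, assuming tags use a `v` prefix.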
Apache License 2.0. See LICENSE and NOTICE.
CTDC data has individual study-level licenses and
access tiers — see references/access_tiers.md.
- The Imaging Data Commons team, whose `idc-claude-skill` is the architectural pattern this skill follows.
- The CTDC team at the NCI Center for Biomedical Informatics and Information Technology (CBIIT).
- The Bento framework team at Frederick National Laboratory (FNL) Bioinformatics and Computational Science Directorate (BACS).
Developed by the Bioinformatics and Computational Science Directorate (BACS)
at Frederick National Laboratory for Cancer Research in support of the NCI
Center for Biomedical Informatics and Information Technology (CBIIT) and the
Cancer Research Data Commons (CRDC). This skill is a research tool.

If you use this skill in research that leads to a publication, please cite
CTDC and include the standard NCI/CRDC acknowledgment language, both
described in references/citation.md.