A unified metadata database and documentation system for the Canadian Community Health Survey (CCHS). Merges variable metadata from 8 data sources into a queryable DuckDB database exposed through an MCP server and command-line interface with 10 query tools.
| If you need... | Go to... | Coverage |
|---|---|---|
| Query variable metadata | MCP server | 16,899 variables across 251 datasets |
| Complete CCHS files | cchs-osf-docs/ | 2001-2023 (1,262 files) |
| Curated download | GitHub Releases | Core Master Collection ZIP |
| File catalog | data/catalog/ | YAML catalog with file metadata |
git clone https://github.com/Big-Life-Lab/cchsflow-docs.git
cd cchsflow-docs
./scripts/setup.sh
This installs dependencies, downloads a pre-built database from GitHub Releases, and configures the MCP server. Open the folder in Claude Code or another MCP-compatible client and start asking questions.
Requires Python 3.8+. No R installation needed.
Or query directly from the terminal without MCP:
python3 mcp-server/cli.py search smoking
python3 mcp-server/cli.py detail SMKDSTY
python3 mcp-server/cli.py summary
- Setup guide and tutorials: setup options, CLI reference, walkthrough, and task-oriented recipes
- Tool reference: complete specification for all 10 tools
Database: 16,963 variables, 253 datasets, 8 data sources, cycles 2001-2023.
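The terminal commands shown earlier (`search`, `detail`, `summary`) can also be driven from a Python script. The following is a minimal sketch, assuming it is run from the repository root and that the CLI writes its results to standard output; the helper names `build_cli_command` and `run_cli` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Path taken from the terminal examples; assumes the current working
# directory is the repository root.
CLI_PATH = Path("mcp-server") / "cli.py"


def build_cli_command(subcommand: str, *args: str) -> List[str]:
    """Return the argument list for one CLI call (hypothetical helper)."""
    return [sys.executable, str(CLI_PATH), subcommand, *args]


def run_cli(subcommand: str, *args: str) -> str:
    """Run the CLI and return whatever it printed to standard output.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        build_cli_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this in place, `run_cli("search", "smoking")` and `run_cli("detail", "SMKDSTY")` would issue the same queries as the terminal examples. The output is returned as plain text because its format is not specified here.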
For developers contributing to the metadata database:
# Setup R packages (first time)
Rscript --vanilla -e "renv::restore()"
# Build the unified database (~2 min)
Rscript --vanilla database/build_db.R
Requires R 4.2+ and the cchsflow-data repository cloned as a sibling directory. See architecture.md for the full data flow.
Download curated collections from GitHub Releases:
Core Master Collection (v1.1.0) - Essential English master documentation
- 129 files: Questionnaires, data dictionaries, user guides, derived variables
- English only, Master files only
- Years 2001-2023 (complete coverage)
- Canonical filenames for easy sharing
- Also available in: NotebookLM for AI-assisted exploration
Statistics Canada health survey documentation is scattered across multiple sources with inconsistent naming, incomplete coverage, and formats that aren't machine-readable. This repository consolidates that documentation and metadata into a unified, queryable system.
What this repo does:
- Unified metadata database — Merges variable definitions from DDI XML, PUMF RData files, Master SAS labels, ICES Data Dictionary, cchsflow worksheets, and extracted YAML into a single DuckDB database
- MCP and CLI query interface — 10 tools for searching variables, tracing them across cycles, comparing file types, and generating harmonisation rows. Available via MCP (for AI assistants) or the command-line interface (for direct use)
- Documentation catalog — 1,262 CCHS files with UIDs, provenance tracking, and curated collections via GitHub Releases
- Stable identifiers — The UID system gives every file a predictable, canonical name regardless of original source
- Full provenance — Every record traces to a specific data source with authority level
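To illustrate what merging with provenance and authority levels could look like, here is a small sketch. It is not the repository's build logic: the source keys, the numeric ranking in `AUTHORITY`, the `SourceRecord` type, and the label strings are all hypothetical placeholders chosen for the example.

```python
from typing import Dict, List, NamedTuple


class SourceRecord(NamedTuple):
    variable: str
    label: str
    source: str  # which data source supplied this record (provenance)


# Hypothetical ranking: a lower number means a more authoritative source.
# The real authority levels are defined by the project and may differ.
AUTHORITY = {"ddi_xml": 1, "master_sas": 2, "pumf_rdata": 3}


def merge_by_authority(records: List[SourceRecord]) -> Dict[str, SourceRecord]:
    """Keep, for each variable, the record from the most authoritative source."""
    best: Dict[str, SourceRecord] = {}
    for rec in records:
        current = best.get(rec.variable)
        if current is None or AUTHORITY[rec.source] < AUTHORITY[current.source]:
            best[rec.variable] = rec
    return best


# Placeholder labels; only the variable name comes from the examples above.
records = [
    SourceRecord("SMKDSTY", "placeholder label from PUMF", "pumf_rdata"),
    SourceRecord("SMKDSTY", "placeholder label from DDI", "ddi_xml"),
]
merged = merge_by_authority(records)
print(merged["SMKDSTY"].source)  # prints "ddi_xml"
```

Because each merged record keeps its `source` field, the winning value can always be traced back to where it came from.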
| Resource | Description |
|---|---|
| CCHS NotebookLM | AI assistant for exploring CCHS documentation |
| cchsflow | R package for harmonising CCHS variables across cycles |
| cchsflow-data | CCHS PUMF data files and DDI metadata from ODESSI |
| 613apps.ca | Population health applications using CCHS data |
cchsflow-docs/
├── database/
│ ├── cchs_metadata.duckdb # Unified database (gitignored build artefact)
│ ├── schema.sql # DuckDB schema (13 tables, 6 views)
│ └── build_db.R # Build script (Phase 0 → 1 → 2)
├── ingestion/
│ ├── ingest_pumf_rdata.R # Phase 1: PUMF RData → variable_datasets, value_codes
│ └── ingest_ddi_xml.R # Phase 2: DDI XML → question text, stats, groups
├── mcp-server/
│ ├── server.py # FastMCP v2 server (10 tools)
│ ├── cli.py # Standalone CLI (same 10 queries, no MCP needed)
│ └── requirements.txt
├── data/
│ ├── sources.csv # Data source registry (6 sources)
│ ├── datasets.csv # Dataset definitions (251 datasets)
│ ├── variables.csv # Variable registry (16,899 variables)
│ ├── catalog/
│ │ └── cchs_catalog.yaml # Document-level metadata (1,262 entries)
│ └── manifests/ # Collection manifests for GitHub Releases
├── development/
│ ├── architecture/ # Design rationale and proposals
│ └── ontology/ # Variable relationship modelling (in progress)
├── docs/
│ ├── mcp-guide.md # MCP tool tutorials and workflow examples
│ ├── mcp-reference.md # MCP tool specifications (all 10 tools)
│ ├── architecture.md # System architecture and data flow
│ ├── uid-system.md # UID specification
│ └── glossary.md # CCHS terminology
└── cchs-osf-docs/ # CCHS documentation mirror (gitignored)
Master vs Share files
- Master files: Full survey documentation for Research Data Centres (RDCs). Complete questionnaires, full data dictionaries, unrestricted variables.
- Share files: Public-use subsets with privacy protection. Subset of variables, some aggregated or suppressed.
Temporal types
- Single-year (s): Standard annual surveys (most common after 2007)
- Dual-year (d): Two-year combined data collections (2007-2008, 2009-2010, etc.)
- Multi-year (m): Multi-year pooled surveys (less common)
Document categories
- Questionnaires (qu): Survey instruments with all questions asked
- Data dictionaries (dd): Variable definitions, codes, and frequencies
- User guides (ug): Methodology, sampling, weighting instructions
- Derived variables (dv): Documentation of calculated/constructed variables
- Record layouts (rl): File structure and variable positions
- Syntax files: SAS/SPSS/Stata code for data processing
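The short codes listed above can be kept in plain lookup tables when processing file metadata. This is a minimal sketch; the dictionary and function names are hypothetical, and syntax files are left out because no code is given for them.

```python
# Codes and meanings copied from the lists above.
TEMPORAL_TYPES = {
    "s": "single-year",
    "d": "dual-year",
    "m": "multi-year",
}

DOCUMENT_CATEGORIES = {
    "qu": "questionnaire",
    "dd": "data dictionary",
    "ug": "user guide",
    "dv": "derived variables",
    "rl": "record layout",
}


def describe(temporal: str, category: str) -> str:
    """Return a readable description; raises KeyError for unknown codes."""
    return f"{TEMPORAL_TYPES[temporal]} {DOCUMENT_CATEGORIES[category]}"


print(describe("d", "qu"))  # prints "dual-year questionnaire"
```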
The CCHS UID system provides unique identifiers for documentation files:
cchs-{year}{temporal}-{doc_type}-{category}-[{subcategory}-]{language}-{extension}-{sequence:02d}
Examples:
cchs-2009d-m-questionnaire-e-pdf-01 # 2009 dual-year, master questionnaire, English PDF
cchs-2015s-s-data-dictionary-f-docx-01 # 2015 single-year, share data dictionary, French Word
See docs/uid-system.md for the full specification.
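The UID template can be checked mechanically. The sketch below parses and rebuilds UIDs with a regular expression inferred only from the template and the two examples, so it is an assumption rather than the official specification (which is in docs/uid-system.md). In particular, it assumes temporal codes `s`/`d`/`m`, document types `m`/`s`, and languages `e`/`f`, and it keeps the category and any optional subcategory together in one field, because a hyphenated category such as `data-dictionary` cannot be told apart from a category plus subcategory by pattern alone. The names `CchsUid`, `parse_uid`, and `format_uid` are hypothetical.

```python
import re
from typing import NamedTuple


class CchsUid(NamedTuple):
    year: int
    temporal: str   # "s", "d", or "m"
    doc_type: str   # "m" (master) or "s" (share)
    category: str   # category, plus subcategory if present
    language: str   # "e" or "f"
    extension: str
    sequence: int


# Pattern inferred from the template and examples; not the official grammar.
UID_PATTERN = re.compile(
    r"^cchs-(?P<year>\d{4})(?P<temporal>[sdm])"
    r"-(?P<doc_type>[ms])"
    r"-(?P<category>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-(?P<language>[ef])"
    r"-(?P<extension>[a-z0-9]+)"
    r"-(?P<sequence>\d{2})$"
)


def parse_uid(uid: str) -> CchsUid:
    """Split a UID into its components, or raise ValueError."""
    match = UID_PATTERN.match(uid)
    if match is None:
        raise ValueError(f"not a recognised CCHS UID: {uid!r}")
    parts = match.groupdict()
    return CchsUid(
        year=int(parts["year"]),
        temporal=parts["temporal"],
        doc_type=parts["doc_type"],
        category=parts["category"],
        language=parts["language"],
        extension=parts["extension"],
        sequence=int(parts["sequence"]),
    )


def format_uid(uid: CchsUid) -> str:
    """Rebuild the canonical string, zero-padding the sequence to two digits."""
    return (
        f"cchs-{uid.year}{uid.temporal}-{uid.doc_type}-{uid.category}"
        f"-{uid.language}-{uid.extension}-{uid.sequence:02d}"
    )


parsed = parse_uid("cchs-2015s-s-data-dictionary-f-docx-01")
print(parsed.category)     # prints "data-dictionary"
print(format_uid(parsed))  # prints the original UID unchanged
```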
The Canadian Community Health Survey (CCHS) is conducted by Statistics Canada. Survey data and documentation are accessed and adapted in accordance with the Statistics Canada Open Licence.
Source: Statistics Canada, Canadian Community Health Survey (CCHS). Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.
Adapted from: Statistics Canada survey documentation. This does not constitute an endorsement by Statistics Canada of this product.
For information about accessing CCHS data, visit: