One command. Three methods. Automatic clinical diagnosis from item-level response data.
You provide the scale. The agent runs cut-off, IRT, and DCM — and writes the report.
Given item-level response data from any psychological scale, the agent applies three complementary diagnostic methods and generates a structured markdown report with prevalence estimates, method comparisons, and plain-language clinical interpretation.
For details of the data analysis, see the preprint on OSF, "AutoPsychDx: An LLM Agent Framework for Automated Psychometric Diagnosis Using Multi-Method Classification" (Zhang, 2026).
Psychometric diagnosis requires expertise across multiple frameworks. Researchers must manually choose between sum-score cut-offs, IRT, and DCM — each with different software, assumptions, and outputs. Reconciling disagreements between methods takes additional effort, and generating readable reports requires even more.
What if an LLM agent could do all of this automatically?
- 🚀 Runs all three methods — cut-off, IRT, and DCM in a single command
- 📋 Validates inputs automatically — checks item IDs, response ranges, and missing data
- 💬 Flags method disagreements — highlights ambiguous cases where methods diverge
- 📊 Generates a full report — prevalence tables, item content, and clinical interpretation
- 🔄 Works with any scale — instrument-agnostic, configured via a simple `items.csv`
You set up the project folder. The agent diagnoses, interprets, and reports.
A person receives a consensus diagnosis if at least 2 of the 3 methods flag them. A person flagged by only 1 method is marked ambiguous in the report.
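The consensus rule can be sketched in a few lines of Python. This is an illustration only, not the code the agent generates: the function name `consensus_label` and the label strings are hypothetical, and the inputs are assumed to be the three 0/1 method flags.

```python
def consensus_label(dx_cutoff: int, dx_irt: int, dx_dcm: int) -> str:
    """Combine three binary method flags (1 = flagged, 0 = not flagged)."""
    votes = dx_cutoff + dx_irt + dx_dcm
    if votes >= 2:
        return "diagnosed"      # at least 2 of 3 methods flag the person
    if votes == 1:
        return "ambiguous"      # exactly 1 method flags the person
    return "not diagnosed"      # no method flags the person


# Hypothetical persons with (cut-off, IRT, DCM) flags
people = {
    "P001": (1, 1, 0),
    "P002": (0, 1, 0),
    "P003": (0, 0, 0),
}
labels = {pid: consensus_label(*flags) for pid, flags in people.items()}
print(labels)
```

Counting votes keeps the rule symmetric: no single method outranks the others, and a lone positive is surfaced for review rather than silently dropped.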
Requirements: Claude Code, tmux, R ≥ 4.0, Python ≥ 3.10.
Step 1 — Install pipx
| Platform | Command |
|---|---|
| macOS | brew install pipx && pipx ensurepath |
| Linux | python3 -m pip install --user pipx && python3 -m pipx ensurepath |
| Windows | python -m pip install --user pipx && python -m pipx ensurepath |
Step 2 — Install R
Download from https://cran.r-project.org or:
| Platform | Command |
|---|---|
| macOS | brew install r |
| Linux (Debian/Ubuntu) | sudo apt install r-base |
| Windows | Installer from CRAN |
Then install R packages:
install.packages(c("mirt", "CDM"))
Step 3 — Install the diagnosis command
git clone https://github.com/JihongZ/AutoPsychDx
cd AutoPsychDx
pipx install -e . # use pipx, not pip
Verify:
diagnosis --help
| Command | Description |
|---|---|
| `diagnosis compile <folder>` | Generate items.csv from responses.csv |
| `diagnosis run <folder>` | Run the full diagnosis pipeline (spinner, blocks until done) |
| `diagnosis run <folder> --clear` | Delete Output/ then re-run |
| `diagnosis clean <folder>` | Remove generated Output/ directory |
| `diagnosis clean <folder> --all` | Also remove items.csv (responses.csv is never deleted) |
| `diagnosis attach <name>` | Attach to a running tmux session to watch live output |
| `diagnosis ls` | List all active diagnosis sessions |
| `diagnosis kill <name>` | Stop a running session |
| `diagnosis version` | Show installed version |
<folder> is the path to your project folder (e.g. Projects/PTSD_Forbes2018).
<name> is the folder name only (e.g. PTSD_Forbes2018).
Both `compile` and `run` run the agent inside a tmux session in the background and show a spinner until finished. Use `diagnosis attach <name>` at any time to watch the agent output live.
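The relationship between `<folder>`, `<name>`, and the tmux session name can be sketched as follows. The `<name>` rule (folder name only) is stated above. The session-name pattern is an assumption inferred from the sample output later in this README (`diagnosis-anxiety-gad-forbes2018` and its `-compile` variant), not read from the CLI source, and both helper names are hypothetical.

```python
from pathlib import Path


def project_name(folder: str) -> str:
    """Return the <name> argument: the last component of the project path."""
    return Path(folder).name


def guessed_session_name(folder: str, compile_step: bool = False) -> str:
    """Guess the tmux session name (assumed pattern, inferred from sample output)."""
    slug = project_name(folder).lower().replace("_", "-")
    suffix = "-compile" if compile_step else ""
    return f"diagnosis-{slug}{suffix}"


print(project_name("Projects/PTSD_Forbes2018"))
print(guessed_session_name("Projects/Anxiety_GAD_Forbes2018"))
print(guessed_session_name("Projects/Anxiety_GAD_Forbes2018", compile_step=True))
```

If the guessed name does not match, `diagnosis ls` lists the sessions that are actually running.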
The minimum setup is a single file: responses.csv (cleaned item response data). Run diagnosis compile to generate items.csv automatically — this adds item metadata that produces a richer, more interpretable report.
A CSV where each row is a person and each column is an item (column names = item IDs):
GAD1,GAD2,GAD3,GAD4,GAD5,GAD6,GAD7
0,1,0,2,1,0,1
1,2,1,3,2,1,2
...
Place this file directly in your project folder. If you need to extract items from a larger dataset, write a prepare_responses.R script (see below).
items.csv adds item metadata (full wording, scale name, validated cut-off) that makes the report more meaningful. There are three ways to create it — choose whichever fits your workflow:
diagnosis compile Projects/your_study
The agent reads responses.csv, infers scale name, response range, and cut-off from the data, and writes items.csv automatically. Review the output — inferred values (especially scale name and cut-off) may need manual correction.
Write a script that builds both responses.csv and items.csv from your raw dataset. Run it once before diagnosis run. Best when raw data is messy (mixed demographic columns, wide format, or downloaded from an external source).
R — local file
# prepare_responses.R — extract GAD-7 items from a local wide-format dataset
# 1. Load raw data
raw <- read.csv("raw_data.csv") # e.g. 500 rows × 80 columns (demographics + items)
# 2. Define items
item_ids <- c("GAD1", "GAD2", "GAD3", "GAD4", "GAD5", "GAD6", "GAD7")
item_texts <- c(
  "Feeling nervous, anxious, or on edge",
  "Not being able to stop or control worrying",
  "Worrying too much about different things",
  "Trouble relaxing",
  "Being so restless that it's hard to sit still",
  "Becoming easily annoyed or irritable",
  "Feeling afraid as if something awful might happen"
)
# 3. Extract and write responses.csv
responses <- raw[, item_ids]
write.csv(responses, "responses.csv", row.names = FALSE)
message("Saved responses.csv: ", nrow(responses), " persons x ", ncol(responses), " items")
# 4. Write items.csv
items <- data.frame(
  item_id = item_ids,
  item_text = item_texts,
  scale = "GAD-7",
  cutoff = 10,
  response_min = 0,
  response_max = 3
)
write.csv(items, "items.csv", row.names = FALSE)
message("Saved items.csv: ", nrow(items), " items")
Rscript Projects/your_study/prepare_responses.R
R — download from the internet (e.g. OSF)
# prepare_responses.R — download PHQ-9 data from OSF and extract items
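library(osfr) # install.packages("osfr") if needed
# 1. Download raw data
# Written without the native pipe (|>), which needs R >= 4.1; the requirements list R >= 4.0
osf_file <- osf_retrieve_file("https://osf.io/abc123")
osf_download(osf_file, path = ".", conflicts = "overwrite")
raw <- read.csv("raw_data.csv") # assumes the downloaded OSF file is named raw_data.csv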
# 2. Define items
item_ids <- paste0("PHQ", 1:9)
item_texts <- c(
  "Little interest or pleasure in doing things",
  "Feeling down, depressed, or hopeless",
  "Trouble falling or staying asleep, or sleeping too much",
  "Feeling tired or having little energy",
  "Poor appetite or overeating",
  "Feeling bad about yourself",
  "Trouble concentrating on things",
  "Moving or speaking slowly / being fidgety or restless",
  "Thoughts that you would be better off dead"
)
# 3. Extract and write responses.csv
responses <- raw[, item_ids]
write.csv(responses, "responses.csv", row.names = FALSE)
message("Saved responses.csv: ", nrow(responses), " persons x ", ncol(responses), " items")
# 4. Write items.csv
items <- data.frame(
  item_id = item_ids,
  item_text = item_texts,
  scale = "PHQ-9",
  cutoff = 10,
  response_min = 0,
  response_max = 3
)
write.csv(items, "items.csv", row.names = FALSE)
message("Saved items.csv: ", nrow(items), " items")
Rscript Projects/your_study/prepare_responses.R
Create items.csv directly in Excel, Google Sheets, or any spreadsheet editor. Save as CSV and place it in the project folder.
Required columns:
| Column | Description |
|---|---|
| `item_id` | Unique ID matching column names in responses.csv (e.g. PCL1) |
| `item_text` | Full item wording as shown to respondents |
| `scale` | Scale name used to group items and label outputs (e.g. PCL-5) |
| `cutoff` | Validated sum-score cut-off for this scale (repeat for all rows in the scale) |
| `response_min` | Minimum response value (e.g. 0) |
| `response_max` | Maximum response value (e.g. 4) |
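The column contract in this table can be checked before running the agent. The sketch below is a hypothetical pre-flight check in Python, not the validation the agent itself performs. It assumes integer-coded responses and treats blank cells and `NA` as missing. `check_inputs` and its messages are illustrative names.

```python
import csv
import io

REQUIRED_COLUMNS = ["item_id", "item_text", "scale", "cutoff", "response_min", "response_max"]


def check_inputs(items_csv: str, responses_csv: str) -> list[str]:
    """Return a list of human-readable problems (empty list = no problems found)."""
    item_reader = csv.DictReader(io.StringIO(items_csv))
    missing = [c for c in REQUIRED_COLUMNS if c not in (item_reader.fieldnames or [])]
    if missing:
        return [f"items.csv is missing columns: {missing}"]
    items = {row["item_id"]: row for row in item_reader}

    resp_reader = csv.DictReader(io.StringIO(responses_csv))
    resp_cols = resp_reader.fieldnames or []
    problems = []
    # Item IDs must match the response column names in both directions
    problems += [f"{i}: in items.csv but not in responses.csv" for i in items if i not in resp_cols]
    problems += [f"{c}: in responses.csv but not in items.csv" for c in resp_cols if c not in items]

    for person, row in enumerate(resp_reader, start=1):
        for item_id, meta in items.items():
            if item_id not in resp_cols:
                continue
            value = (row[item_id] or "").strip()
            if value in ("", "NA"):
                problems.append(f"person {person}, {item_id}: missing response")
                continue
            low, high = int(meta["response_min"]), int(meta["response_max"])
            if not value.lstrip("-").isdigit() or not low <= int(value) <= high:
                problems.append(f"person {person}, {item_id}: value {value} outside {low}-{high}")

    # The cut-off should be repeated identically for every row of a scale
    cutoffs: dict[str, set[str]] = {}
    for meta in items.values():
        cutoffs.setdefault(meta["scale"], set()).add(meta["cutoff"])
    problems += [f"{s}: inconsistent cutoff values {sorted(v)}" for s, v in cutoffs.items() if len(v) > 1]
    return problems


ITEMS = """item_id,item_text,scale,cutoff,response_min,response_max
PCL1,Repeated disturbing and unwanted memories of the stressful experience,PCL-5,33,0,4
PCL2,Repeated disturbing dreams of the stressful experience,PCL-5,33,0,4
"""
RESPONSES = "PCL1,PCL2\n0,4\n5,\n"  # second person has an out-of-range value and a blank

found = check_inputs(ITEMS, RESPONSES)
for problem in found:
    print(problem)
```

Returning a list of problems, rather than stopping at the first one, lets all issues in a file be fixed in one pass.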
item_id,item_text,scale,cutoff,response_min,response_max
PCL1,Repeated disturbing and unwanted memories of the stressful experience,PCL-5,33,0,4
PCL2,Repeated disturbing dreams of the stressful experience,PCL-5,33,0,4
diagnosis run Projects/your_study
responses.csv
│
▼
diagnosis compile <folder> ← infers metadata, writes items.csv
│
▼
items.csv + responses.csv
│
▼
diagnosis run <folder>
│
├─── Method A: Sum Score Cut-off ──► dx_cutoff (0/1)
│
├─── Method B: IRT (Graded Response Model) ──► dx_irt (0/1) + θ ± SE
│
└─── Method C: DCM (CDM::gdm) ──► dx_dcm (0/1) + P(diagnosed)
│
▼
Consensus: diagnosed if ≥ 2/3 methods agree
│
▼
Output/
[scale]_diagnosis.R ← generated R script
[scale]_diagnosis_results.csv ← person-level results
[scale]_diagnosis_output.txt ← raw report text
diagnosis_report.md ← full report with interpretation
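Method A in the diagram reduces to comparing each person's sum score with the scale's cut-off. A minimal Python sketch, assuming that a sum score at or above the cut-off counts as flagged (the inclusive comparison is an assumption, and `dx_cutoff_flag` is a hypothetical helper, not the generated R script). The two response rows and the cut-off of 10 are the GAD-7 examples shown earlier in this README.

```python
def dx_cutoff_flag(responses: list[int], cutoff: int) -> int:
    """Return 1 if the sum score reaches the cut-off, else 0."""
    return int(sum(responses) >= cutoff)


gad7_cutoff = 10
person_a = [0, 1, 0, 2, 1, 0, 1]   # sum score 5
person_b = [1, 2, 1, 3, 2, 1, 2]   # sum score 12

flags = [dx_cutoff_flag(p, gad7_cutoff) for p in (person_a, person_b)]
print(flags)  # [0, 1]
```

Methods B and C need fitted models (`mirt`, `CDM::gdm`) and are not reproduced here.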
Both commands run inside a tmux session in the background and block with a spinner until done. Use diagnosis attach <name> to watch live output at any time. Skill files in diagnosis/skills/ define the agent workflow — edit them to change behaviour for all projects.
Projects/PTSD_Forbes2018/ demonstrates the workflow using publicly available PHQ-9 data from Forbes et al. (2018).
diagnosis run Projects/PTSD_Forbes2018
prepare_responses.R downloads the data from OSF automatically on first run. All output is written to Projects/PTSD_Forbes2018/Output/.
Screenshots: agent running in tmux; generated diagnosis_report.md.
Projects/Anxiety_GAD_Forbes2018/ demonstrates the two-command workflow using publicly available GAD-7 data from Forbes et al. (2018). responses.csv is placed directly in the project folder — no prepare_responses.R needed.
Step 1 — Generate items.csv
❯ diagnosis compile Projects/Anxiety_GAD_Forbes2018
Found responses.csv — using it as raw data.
╭─────────────────────────── Generate items.csv ───────────────────────────╮
│ Project: .../Projects/Anxiety_GAD_Forbes2018 │
│ Raw data: responses.csv │
╰───────────────────────────────────────────────────────────────────────────╯
Session: diagnosis-anxiety-gad-forbes2018-compile
Attach: tmux attach -t diagnosis-anxiety-gad-forbes2018-compile
Kill: diagnosis kill Anxiety_GAD_Forbes2018
✓ Agent finished.
items.csv created.
Step 2 — Run diagnosis
❯ diagnosis run Projects/Anxiety_GAD_Forbes2018
╭──────────────────────── Psychometric Diagnosis Agent ────────────────────╮
│ Project: .../Projects/Anxiety_GAD_Forbes2018 │
│ Version: 0.1.1 │
╰───────────────────────────────────────────────────────────────────────────╯
Session: diagnosis-anxiety-gad-forbes2018
Attach: tmux attach -t diagnosis-anxiety-gad-forbes2018
Kill: diagnosis kill Anxiety_GAD_Forbes2018
✓ Agent finished.
All output is written to Projects/Anxiety_GAD_Forbes2018/Output/.
Note
Current Limitations
- Unidimensional only: All three methods currently assume a single latent construct. Multidimensional scales (e.g., instruments with subscales measuring distinct attributes) are not yet supported.
- DCM: Only the general diagnostic model (`CDM::gdm`) is supported. LCDM, GDINA, DINA, and DINO are not yet implemented.
- IRT: Limited to the Graded Response Model. Requires complete responses — handle missing data in `prepare_responses.R` before running.
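One simple way to meet the complete-responses requirement is listwise deletion: drop every person with at least one missing item. The sketch below illustrates the idea in Python. It is not part of the package, and the same filter could equally be written in `prepare_responses.R`. The missing-value tokens and the helper name `drop_incomplete_rows` are assumptions. Listwise deletion discards data and can bias estimates, so it is shown here as one option, not a recommendation.

```python
import csv
import io

MISSING_TOKENS = {"", "NA", "NaN"}  # assumed encodings of a missing response


def drop_incomplete_rows(csv_text: str) -> tuple[str, int]:
    """Return (CSV text with complete rows only, number of rows dropped)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    kept, dropped = [], 0
    for row in reader:
        complete = len(row) == len(header) and all(
            cell.strip() not in MISSING_TOKENS for cell in row
        )
        if complete:
            kept.append(row)
        else:
            dropped += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue(), dropped


raw = "GAD1,GAD2,GAD3\n0,1,2\n1,NA,3\n2,,1\n3,3,3\n"
cleaned, n_dropped = drop_incomplete_rows(raw)
print(cleaned)
print("dropped:", n_dropped)
```

Reporting the number of dropped rows makes it easy to note the loss of sample size alongside the prevalence estimates.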
- `diagnosis compile` — new command that generates `items.csv` automatically from `responses.csv`. The agent infers item IDs, scale name, response range, and cut-off from the data without manual setup.
- `diagnosis clean` — new command to remove generated outputs. Deletes `Output/` by default; `--all` also removes `items.csv`. `responses.csv` is never touched.
- `responses.csv` as direct input — drop a response matrix into the project folder and run `diagnosis compile` directly. `prepare_responses.R` is now optional (only needed when extracting columns from a larger dataset).
- Auto-exit tmux sessions — agent sessions now close automatically when the agent finishes. No keypress required.
- Consistent CLI — both `compile` and `run` run in the background with a spinner and block until done. Use `diagnosis attach <name>` to watch live output at any time.
- Initial release: `diagnosis run` with cut-off, IRT (GRM), and DCM (GDM) methods
- Consensus diagnosis (≥ 2/3 methods agree)
- Structured markdown report with prevalence, method comparison, and clinical interpretation
- tmux-based agent sessions with attach/kill/list commands
| Phase | Feature | Details |
|---|---|---|
| v0.2 | Multidimensional DCMs | |
| v0.3 | Multidimensional IRT | |
| v0.4 | Additional IRT models | |
| v0.5 | Missing data & robustness | |
| Future | Multi-backend & validation | |
- Forbes et al. (2018) — PHQ-9 and GAD-7 community sample dataset used in the example project
- `mirt` — R package for Item Response Theory (IRT) models
- `CDM` — R package for Cognitive Diagnosis Models (DCM)
- ClawTeam — architectural inspiration for the tmux-based agent CLI
- Claude Code — LLM agent runtime
MIT License — free to use, modify, and distribute. See LICENSE.
.
├── diagnosis/ # Python package (pipx install -e .)
│ ├── cli.py # Typer CLI: compile / run / clean / attach / kill / ls / version
│ ├── tmux.py # tmux session management
│ ├── skill_loader.py # loads bundled skill files
│ ├── __init__.py
│ ├── __main__.py
│ └── skills/
│ ├── diagnosis.md # agent workflow definition
│ ├── generate-items.md # items.csv generation workflow
│ └── psychometric-diagnosis.md # R function definitions
├── pyproject.toml # package metadata and entry point
├── .claude/
│ └── commands/ # same skills as Claude Code slash commands
├── Projects/
│ └── PTSD_Forbes2018/ # example project
│ ├── items.csv # item metadata
│ ├── prepare_responses.R # downloads OSF data → responses.csv
│ └── Output/ # auto-generated (git-ignored)
├── Screenshots/
│ ├── Diagnosis_PTSD.png
│ └── Diagnosis_Report.png
└── README.md
AutoPsychDx
Cut-off · IRT · DCM · One Command
If you find this project useful, please consider giving it a ⭐

