🧠 AutoPsychDx

One command. Three methods. Automatic clinical diagnosis from item-level response data.

You provide the scale. The agent runs cut-off, IRT, and DCM — and writes the report.

Given item-level response data from any psychological scale, the agent applies three complementary diagnostic methods and generates a structured markdown report with prevalence estimates, method comparisons, and plain-language clinical interpretation.

For details of the data analysis, see the preprint on OSF: "AutoPsychDx: An LLM Agent Framework for Automated Psychometric Diagnosis Using Multi-Method Classification" (Zhang, 2026).

🤔 Why This Tool?

Psychometric diagnosis requires expertise across multiple frameworks. Researchers must manually choose between sum-score cut-offs, IRT, and DCM — each with different software, assumptions, and outputs. Reconciling disagreements between methods takes additional effort, and generating readable reports requires even more.

What if an LLM agent could do all of this automatically?

🚀 Runs all three methods — cut-off, IRT, and DCM in a single command
📋 Validates inputs automatically — checks item IDs, response ranges, and missing data
💬 Flags method disagreements — highlights ambiguous cases where methods diverge
📊 Generates a full report — prevalence tables, item content, and clinical interpretation
🔄 Works with any scale — instrument-agnostic, configured via a simple items.csv

✨ The Result?

You set up the project folder. The agent diagnoses, interprets, and reports.

📐 Diagnostic Methods

📏 Sum Score Cut-off

Sums item responses per person and compares to a validated clinical threshold.

Key parameter: cutoff (from items.csv)

Best for: Quick screening when a published cut-off exists (e.g. PHQ-9 ≥ 10, PCL-5 ≥ 33)
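As a rough illustration of the cut-off rule described above, the sketch below sums each person's item responses and compares the total to a threshold. The function name `sum_score_flags` is hypothetical and not part of the package; the "at or above" comparison follows the `≥` used in the PHQ-9 and PCL-5 examples.

```python
from typing import Sequence


def sum_score_flags(responses: Sequence[Sequence[int]], cutoff: int) -> list[int]:
    """Return 1 for each person whose item total is at or above the cutoff, else 0."""
    return [1 if sum(person) >= cutoff else 0 for person in responses]


# Two hypothetical GAD-7 respondents (items scored 0-3), screened at a cut-off of 10.
people = [
    [0, 1, 0, 2, 1, 0, 1],  # total 5
    [1, 2, 1, 3, 2, 1, 2],  # total 12
]
flags = sum_score_flags(people, cutoff=10)
print(flags)  # [0, 1]
```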

📈 Item Response Theory

Fits a Graded Response Model using mirt. Estimates latent trait θ with standard errors per person.

Key parameter: theta_cutoff (default: 0)

Best for: Scales with items of varying quality; when measurement uncertainty matters
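To make the Graded Response Model more concrete, here is a small sketch of how it turns a latent trait value into response-category probabilities for one item. It uses the classical discrimination/threshold form a(θ − b); mirt reports a slope-intercept parameterization by default, so the numbers here are illustrative only. The function names, parameter values, and the "at or above" comparison against theta_cutoff are assumptions, not the package's code.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under the Graded Response Model.

    `thresholds` holds the ordered boundaries b_1 < ... < b_{K-1};
    the result has K entries, one per response category.
    """
    def p_at_or_above(b: float) -> float:
        # Probability of responding in the category just above boundary b, or higher.
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    # P(X >= lowest category) is 1; P(X >= one past the highest category) is 0.
    cumulative = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def irt_flag(theta_hat: float, theta_cutoff: float = 0.0) -> int:
    """Assumed rule: flag when the estimated theta is at or above the cutoff."""
    return 1 if theta_hat >= theta_cutoff else 0


# Hypothetical 4-category item (scored 0-3) for a person at theta = 0.5.
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
print(irt_flag(0.5))
```

Because each category probability is a difference of adjacent cumulative probabilities, the entries always sum to 1.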

🔬 Diagnostic Classification

Fits a general diagnostic model (GDM) using CDM::gdm. Returns the posterior probability of class membership for each person.

Key parameter: prob_cutoff (default: 0.5)

Best for: When the construct is naturally categorical (present/absent)
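The posterior probability mentioned above comes from Bayes' rule applied to the fitted model. The sketch below shows only that final step for a two-class (present/absent) case, with made-up likelihood values standing in for what a fitted model would supply. The function names and the "at or above" comparison against prob_cutoff are assumptions for illustration.

```python
def posterior_present(prior_present: float, lik_present: float, lik_absent: float) -> float:
    """Posterior P(class = present | responses) from a class prior and two likelihoods."""
    numerator = prior_present * lik_present
    denominator = numerator + (1.0 - prior_present) * lik_absent
    return numerator / denominator


def dcm_flag(p_present: float, prob_cutoff: float = 0.5) -> int:
    """Assumed rule: flag when the posterior probability is at or above the cutoff."""
    return 1 if p_present >= prob_cutoff else 0


# Hypothetical inputs: 20% of the population in the "present" class, and a
# response pattern that is 8 times more likely under "present" than "absent".
p = posterior_present(prior_present=0.2, lik_present=0.08, lik_absent=0.01)
print(round(p, 3), dcm_flag(p))
```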

A person receives a consensus diagnosis if at least 2 of the 3 methods flag them. Persons flagged by only 1 method are marked ambiguous in the report.
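The consensus rule can be sketched in a few lines. The argument names mirror the `dx_cutoff`, `dx_irt`, and `dx_dcm` indicators shown in the pipeline diagram; the function itself is a hypothetical helper, not the package's implementation.

```python
def consensus(dx_cutoff: int, dx_irt: int, dx_dcm: int) -> dict:
    """Combine three 0/1 method flags into a consensus diagnosis and an ambiguity flag."""
    votes = dx_cutoff + dx_irt + dx_dcm
    return {
        "votes": votes,
        "diagnosed": votes >= 2,   # at least 2 of 3 methods flag the person
        "ambiguous": votes == 1,   # exactly 1 method flags the person
    }


print(consensus(1, 1, 0))  # {'votes': 2, 'diagnosed': True, 'ambiguous': False}
print(consensus(0, 0, 1))  # {'votes': 1, 'diagnosed': False, 'ambiguous': True}
```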

⚡ Quick Start

Requirements: Claude Code, tmux, R ≥ 4.0, Python ≥ 3.10.

Step 1 — Install pipx

Platform	Command
macOS	`brew install pipx && pipx ensurepath`
Linux	`python3 -m pip install --user pipx && python3 -m pipx ensurepath`
Windows	`python -m pip install --user pipx && python -m pipx ensurepath`

Step 2 — Install R

Download from https://cran.r-project.org or:

Platform	Command
macOS	`brew install r`
Linux (Debian/Ubuntu)	`sudo apt install r-base`
Windows	Installer from CRAN

Then install R packages:

install.packages(c("mirt", "CDM"))

Step 3 — Install the diagnosis command

git clone https://github.com/JihongZ/AutoPsychDx
cd AutoPsychDx
pipx install -e .   # use pipx, not pip

Verify:

diagnosis --help

🖥️ CLI Reference

Command	Description
`diagnosis compile <folder>`	Generate `items.csv` from `responses.csv`
`diagnosis run <folder>`	Run the full diagnosis pipeline (spinner, blocks until done)
`diagnosis run <folder> --clear`	Delete `Output/` then re-run
`diagnosis clean <folder>`	Remove generated `Output/` directory
`diagnosis clean <folder> --all`	Also remove `items.csv` (responses.csv is never deleted)
`diagnosis attach <name>`	Attach to a running tmux session to watch live output
`diagnosis ls`	List all active diagnosis sessions
`diagnosis kill <name>`	Stop a running session
`diagnosis version`	Show installed version

<folder> is the path to your project folder (e.g. Projects/PTSD_Forbes2018). <name> is the folder name only (e.g. PTSD_Forbes2018).

Both compile and run launch the agent in a background tmux session and show a spinner until it finishes. Use diagnosis attach <name> at any time to watch the agent output live.

🛠️ How to Use

The minimum setup is a single file: responses.csv (cleaned item response data). Run diagnosis compile to generate items.csv automatically — this adds item metadata that produces a richer, more interpretable report.

Step 1 — Prepare `responses.csv`

A CSV where each row is a person and each column is an item (column names = item IDs):

GAD1,GAD2,GAD3,GAD4,GAD5,GAD6,GAD7
0,1,0,2,1,0,1
1,2,1,3,2,1,2
...

Place this file directly in your project folder. If you need to extract items from a larger dataset, write a prepare_responses.R script (see below).

Step 2 — Generate `items.csv` (three options)

items.csv adds item metadata (full wording, scale name, validated cut-off) that makes the report more meaningful. There are three ways to create it — choose whichever fits your workflow:

Option 1 — `diagnosis compile` (quickest)

diagnosis compile Projects/your_study

The agent reads responses.csv, infers scale name, response range, and cut-off from the data, and writes items.csv automatically. Review the output — inferred values (especially scale name and cut-off) may need manual correction.

Option 2 — Script (`prepare_responses.R` or Python)

Write a script that builds both responses.csv and items.csv from your raw dataset. Run it once before diagnosis run. Best when raw data is messy (mixed demographic columns, wide format, or downloaded from an external source).

R — local file

# prepare_responses.R — extract GAD-7 items from a local wide-format dataset

# 1. Load raw data
raw <- read.csv("raw_data.csv")   # e.g. 500 rows × 80 columns (demographics + items)

# 2. Define items
item_ids   <- c("GAD1", "GAD2", "GAD3", "GAD4", "GAD5", "GAD6", "GAD7")
item_texts <- c(
  "Feeling nervous, anxious, or on edge",
  "Not being able to stop or control worrying",
  "Worrying too much about different things",
  "Trouble relaxing",
  "Being so restless that it's hard to sit still",
  "Becoming easily annoyed or irritable",
  "Feeling afraid as if something awful might happen"
)

# 3. Extract and write responses.csv
responses <- raw[, item_ids]
write.csv(responses, "responses.csv", row.names = FALSE)
message("Saved responses.csv: ", nrow(responses), " persons x ", ncol(responses), " items")

# 4. Write items.csv
items <- data.frame(
  item_id      = item_ids,
  item_text    = item_texts,
  scale        = "GAD-7",
  cutoff       = 10,
  response_min = 0,
  response_max = 3
)
write.csv(items, "items.csv", row.names = FALSE)
message("Saved items.csv: ", nrow(items), " items")

Rscript Projects/your_study/prepare_responses.R

R — download from the internet (e.g. OSF)

# prepare_responses.R — download PHQ-9 data from OSF and extract items

library(osfr)   # install.packages("osfr") if needed

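# 1. Download raw data (no native pipe, so the script also runs on R 4.0)
osf_file <- osf_retrieve_file("https://osf.io/abc123")   # example OSF URL; replace with your file's URL
osf_download(osf_file, path = ".", conflicts = "overwrite")
raw <- read.csv("raw_data.csv")   # assumes the downloaded file is named raw_data.csv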

# 2. Define items
item_ids   <- paste0("PHQ", 1:9)
item_texts <- c(
  "Little interest or pleasure in doing things",
  "Feeling down, depressed, or hopeless",
  "Trouble falling or staying asleep, or sleeping too much",
  "Feeling tired or having little energy",
  "Poor appetite or overeating",
  "Feeling bad about yourself",
  "Trouble concentrating on things",
  "Moving or speaking slowly / being fidgety or restless",
  "Thoughts that you would be better off dead"
)

# 3. Extract and write responses.csv
responses <- raw[, item_ids]
write.csv(responses, "responses.csv", row.names = FALSE)
message("Saved responses.csv: ", nrow(responses), " persons x ", ncol(responses), " items")

# 4. Write items.csv
items <- data.frame(
  item_id      = item_ids,
  item_text    = item_texts,
  scale        = "PHQ-9",
  cutoff       = 10,
  response_min = 0,
  response_max = 3
)
write.csv(items, "items.csv", row.names = FALSE)
message("Saved items.csv: ", nrow(items), " items")

Rscript Projects/your_study/prepare_responses.R
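Python — local file

Option 2 also allows a Python script, so here is a minimal counterpart to the R examples using only the standard library. The function name `prepare`, the file name `raw_data.csv`, and the demo data are assumptions for illustration; the demo writes into a temporary folder so the sketch runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def prepare(raw_path, out_dir, item_ids, item_texts, scale, cutoff, response_min, response_max):
    """Write responses.csv (item columns only) and items.csv (item metadata) into out_dir."""
    out_dir = Path(out_dir)
    with open(raw_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [i for i in item_ids if i not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in raw data: {missing}")
        rows = list(reader)

    # responses.csv: one row per person, one column per item; other columns are dropped.
    with open(out_dir / "responses.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=item_ids, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

    # items.csv: one row per item, with the scale-level values repeated on every row.
    with open(out_dir / "items.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["item_id", "item_text", "scale", "cutoff", "response_min", "response_max"])
        for item_id, text in zip(item_ids, item_texts):
            writer.writerow([item_id, text, scale, cutoff, response_min, response_max])
    return len(rows)


# Demo with a tiny made-up raw file (one demographic column plus seven items).
item_ids = [f"GAD{i}" for i in range(1, 8)]
item_texts = [
    "Feeling nervous, anxious, or on edge",
    "Not being able to stop or control worrying",
    "Worrying too much about different things",
    "Trouble relaxing",
    "Being so restless that it's hard to sit still",
    "Becoming easily annoyed or irritable",
    "Feeling afraid as if something awful might happen",
]
demo_dir = Path(tempfile.mkdtemp())
raw_file = demo_dir / "raw_data.csv"
raw_file.write_text(
    "age," + ",".join(item_ids) + "\n"
    "34,0,1,0,2,1,0,1\n"
    "51,1,2,1,3,2,1,2\n",
    encoding="utf-8",
)
n_persons = prepare(raw_file, demo_dir, item_ids, item_texts, "GAD-7", 10, 0, 3)
print(f"Saved responses.csv and items.csv for {n_persons} persons in {demo_dir}")
```

In a real project, point `raw_path` at your dataset and `out_dir` at the project folder, then run the script once before diagnosis run.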

Option 3 — Manual spreadsheet (most control)

Create items.csv directly in Excel, Google Sheets, or any spreadsheet editor. Save as CSV and place it in the project folder.

Required columns:

Column	Description
`item_id`	Unique ID matching column names in `responses.csv` (e.g. `PCL1`)
`item_text`	Full item wording as shown to respondents
`scale`	Scale name used to group items and label outputs (e.g. `PCL-5`)
`cutoff`	Validated sum-score cut-off for this scale (repeat for all rows in the scale)
`response_min`	Minimum response value (e.g. `0`)
`response_max`	Maximum response value (e.g. `4`)

item_id,item_text,scale,cutoff,response_min,response_max
PCL1,Repeated disturbing and unwanted memories of the stressful experience,PCL-5,33,0,4
PCL2,Repeated disturbing dreams of the stressful experience,PCL-5,33,0,4
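Before running the pipeline, it can help to confirm that a hand-written items.csv lines up with responses.csv. The sketch below checks the three things listed under input validation (item IDs, response ranges, missing data). It is an independent illustration, not the agent's own validation code, and the function name `validate` and the demo data are hypothetical.

```python
import csv
import io


def parse_int(text: str):
    """Return int(text), or None when text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


def validate(items_csv: str, responses_csv: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    items = list(csv.DictReader(io.StringIO(items_csv)))
    reader = csv.DictReader(io.StringIO(responses_csv))
    columns = reader.fieldnames or []
    item_ids = [row["item_id"] for row in items]
    problems = []

    # Item IDs must match in both directions.
    for item_id in item_ids:
        if item_id not in columns:
            problems.append(f"{item_id}: listed in items.csv but missing from responses.csv")
    for col in columns:
        if col not in item_ids:
            problems.append(f"{col}: column in responses.csv has no row in items.csv")

    # Every response must be present and inside the declared range.
    bounds = {
        row["item_id"]: (int(row["response_min"]), int(row["response_max"]))
        for row in items
        if row["item_id"] in columns
    }
    for n, person in enumerate(reader, start=1):
        for item_id, (low, high) in bounds.items():
            text = (person.get(item_id) or "").strip()
            if text == "":
                problems.append(f"row {n}, {item_id}: missing response")
                continue
            value = parse_int(text)
            if value is None or not low <= value <= high:
                problems.append(f"row {n}, {item_id}: {text!r} is not an integer in {low}-{high}")
    return problems


items_csv = (
    "item_id,item_text,scale,cutoff,response_min,response_max\n"
    "PCL1,Repeated disturbing and unwanted memories of the stressful experience,PCL-5,33,0,4\n"
    "PCL2,Repeated disturbing dreams of the stressful experience,PCL-5,33,0,4\n"
)
responses_csv = "PCL1,PCL2\n1,4\n5,\n"  # second person: out-of-range value and a blank
problems = validate(items_csv, responses_csv)
for line in problems:
    print(line)
```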

Step 3 — Run diagnosis

diagnosis run Projects/your_study

🏗️ How It Works

  responses.csv
       │
       ▼
  diagnosis compile <folder>          ← infers metadata, writes items.csv
       │
       ▼
  items.csv  +  responses.csv
       │
       ▼
  diagnosis run <folder>
       │
       ├─── Method A: Sum Score Cut-off ──► dx_cutoff (0/1)
       │
       ├─── Method B: IRT (Graded Response Model) ──► dx_irt (0/1) + θ ± SE
       │
       └─── Method C: DCM (CDM::gdm) ──► dx_dcm (0/1) + P(diagnosed)
                      │
                      ▼
          Consensus: diagnosed if ≥ 2/3 methods agree
                      │
                      ▼
       Output/
         [scale]_diagnosis.R           ← generated R script
         [scale]_diagnosis_results.csv ← person-level results
         [scale]_diagnosis_output.txt  ← raw report text
         diagnosis_report.md           ← full report with interpretation

Both commands run inside a tmux session in the background and block with a spinner until done. Use diagnosis attach <name> to watch live output at any time. Skill files in diagnosis/skills/ define the agent workflow — edit them to change behaviour for all projects.
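Once a run finishes, the person-level results can be summarized outside the agent as well. The sketch below assumes the results CSV contains 0/1 columns named `dx_cutoff`, `dx_irt`, and `dx_dcm`, as the diagram suggests; check the actual header of your `[scale]_diagnosis_results.csv` first. The demo uses made-up inline data rather than a real output file.

```python
import csv
import io

METHODS = ["dx_cutoff", "dx_irt", "dx_dcm"]  # assumed column names


def summarize(results_csv: str) -> dict:
    """Per-method prevalence, consensus prevalence, and count of ambiguous persons."""
    rows = list(csv.DictReader(io.StringIO(results_csv)))
    n = len(rows)
    votes = [sum(int(row[m]) for m in METHODS) for row in rows]
    return {
        "n": n,
        "prevalence": {m: sum(int(row[m]) for row in rows) / n for m in METHODS},
        "consensus_prevalence": sum(v >= 2 for v in votes) / n,
        "ambiguous": sum(v == 1 for v in votes),
    }


demo = (
    "dx_cutoff,dx_irt,dx_dcm\n"
    "1,1,1\n"
    "1,0,0\n"
    "0,0,0\n"
    "0,1,1\n"
)
summary = summarize(demo)
print(summary)
```

To use it on real output, read the file's text with `Path(...).read_text()` and pass it to `summarize`.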

📊 Example: Depression Screening (Forbes 2018)

Projects/PTSD_Forbes2018/ demonstrates the workflow using publicly available PHQ-9 data from Forbes et al. (2018).

diagnosis run Projects/PTSD_Forbes2018

prepare_responses.R downloads the data from OSF automatically on first run. All output is written to Projects/PTSD_Forbes2018/Output/.

[Screenshot: agent running in tmux]

[Screenshot: generated diagnosis_report.md]

📊 Example: Anxiety Screening (Forbes 2018)

Projects/Anxiety_GAD_Forbes2018/ demonstrates the two-command workflow using publicly available GAD-7 data from Forbes et al. (2018). responses.csv is placed directly in the project folder — no prepare_responses.R needed.

Step 1 — Generate items.csv

❯ diagnosis compile Projects/Anxiety_GAD_Forbes2018
Found responses.csv — using it as raw data.
╭─────────────────────────── Generate items.csv ───────────────────────────╮
│ Project: .../Projects/Anxiety_GAD_Forbes2018                              │
│ Raw data: responses.csv                                                   │
╰───────────────────────────────────────────────────────────────────────────╯
  Session: diagnosis-anxiety-gad-forbes2018-compile
  Attach:  tmux attach -t diagnosis-anxiety-gad-forbes2018-compile
  Kill:    diagnosis kill Anxiety_GAD_Forbes2018

  ✓  Agent finished.
items.csv created.

Step 2 — Run diagnosis

❯ diagnosis run Projects/Anxiety_GAD_Forbes2018
╭──────────────────────── Psychometric Diagnosis Agent ────────────────────╮
│ Project: .../Projects/Anxiety_GAD_Forbes2018                              │
│ Version: 0.1.1                                                            │
╰───────────────────────────────────────────────────────────────────────────╯
  Session: diagnosis-anxiety-gad-forbes2018
  Attach:  tmux attach -t diagnosis-anxiety-gad-forbes2018
  Kill:    diagnosis kill Anxiety_GAD_Forbes2018

  ✓  Agent finished.

All output is written to Projects/Anxiety_GAD_Forbes2018/Output/.

Note: Current Limitations

Unidimensional only: All three methods currently assume a single latent construct. Multidimensional scales (e.g., instruments with subscales measuring distinct attributes) are not yet supported.
DCM: Only the general diagnostic model (CDM::gdm) is supported. LCDM, GDINA, DINA, and DINO are not yet implemented.
IRT: Limited to the Graded Response Model. Requires complete responses — handle missing data in prepare_responses.R before running.

📝 Changelog

v0.1.1

diagnosis compile — new command that generates items.csv automatically from responses.csv. The agent infers item IDs, scale name, response range, and cut-off from the data without manual setup.
diagnosis clean — new command to remove generated outputs. Deletes Output/ by default; --all also removes items.csv. responses.csv is never touched.
responses.csv as direct input — drop a response matrix into the project folder and run diagnosis compile directly. prepare_responses.R is now optional (only needed when extracting columns from a larger dataset).
Auto-exit tmux sessions — agent sessions now close automatically when the agent finishes. No keypress required.
Consistent CLI — both compile and run run in the background with a spinner and block until done. Use diagnosis attach <name> to watch live output at any time.

v0.1.0

Initial release: diagnosis run with cut-off, IRT (GRM), and DCM (GDM) methods
Consensus diagnosis (≥ 2/3 methods agree)
Structured markdown report with prevalence, method comparison, and clinical interpretation
tmux-based agent sessions with attach/kill/list commands

🗺️ Roadmap

Phase	Feature	Details
v0.2	Multidimensional DCMs	User-specified Q-matrix in `items.csv` (item × attribute mapping); LCDM via `CDM::gdm` with multi-attribute Q-matrix; GDINA via `GDINA::GDINA` for flexible item–attribute interactions; DINA / DINO as constrained special cases; attribute-level diagnostic profiles per person
v0.3	Multidimensional IRT	MIRT (multidimensional GRM) via `mirt` with exploratory or confirmatory specification; subscale-level θ estimates with SEs; per-subscale cut-off and consensus diagnosis
v0.4	Additional IRT models	2PL for binary items; GPCM / PCM for alternative polytomous models; automatic model selection based on item format
v0.5	Missing data & robustness	Full-information maximum likelihood (FIML) for IRT; multiple imputation support; longitudinal multi-timepoint comparison
Future	Multi-backend & validation	Support for additional LLM backends (OpenAI, Gemini); external validation against structured clinical interviews; automated sensitivity analysis across cut-off thresholds

📖 Acknowledgements

Forbes et al. (2018) — PHQ-9 and GAD-7 community sample dataset used in the example project
mirt — R package for Item Response Theory (IRT) models
CDM — R package for Cognitive Diagnosis Models (DCM)
ClawTeam — architectural inspiration for the tmux-based agent CLI
Claude Code — LLM agent runtime

📄 License

MIT License — free to use, modify, and distribute. See LICENSE.

🗂️ Project Structure

.
├── diagnosis/                           # Python package (pipx install -e .)
│   ├── cli.py                           # Typer CLI: compile / run / clean / attach / kill / ls / version
│   ├── tmux.py                          # tmux session management
│   ├── skill_loader.py                  # loads bundled skill files
│   ├── __init__.py
│   ├── __main__.py
│   └── skills/
│       ├── diagnosis.md                 # agent workflow definition
│       ├── generate-items.md            # items.csv generation workflow
│       └── psychometric-diagnosis.md   # R function definitions
├── pyproject.toml                       # package metadata and entry point
├── .claude/
│   └── commands/                        # same skills as Claude Code slash commands
├── Projects/
│   └── PTSD_Forbes2018/                 # example project
│       ├── items.csv                    # item metadata
│       ├── prepare_responses.R          # downloads OSF data → responses.csv
│       └── Output/                      # auto-generated (git-ignored)
├── Screenshots/
│   ├── Diagnosis_PTSD.png
│   └── Diagnosis_Report.png
└── README.md

AutoPsychDx

Cut-off · IRT · DCM · One Command

If you find this project useful, please consider giving it a ⭐
