Skip to content

DeclanMichaels/Judgment-Stability-Probing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Judgment Stability Probing (JSP): Instrument, Data, and Analysis

This repository contains the complete instrument, raw data, analysis pipeline, and interactive report for the paper:

Judgment Stability Under Cultural Perturbation: Probing Eight Large Language Models for Framing Compliance

Declan Michaels (2026). Pre-registered on the Open Science Framework: osf.io/xnv5f

What This Measures

Judgment Stability Probing (JSP) presents a language model with pairs of concepts and asks it to rate their similarity on a 1 to 7 scale, then explain the relationship. The same pairs are rated under different framing conditions: no framing (baseline), four cultural framings ("In a collectivist society," etc.), and two nonsense framings ("In a geometric society," "In a glorbic society"). If the model has stable similarity judgments, they should not change in response to meaningless framing. JSP measures whether they do, how much, and what form the instability takes.

The instrument operates entirely through the API, requires no access to model internals, and produces quantitative measures (drift, Spearman rho, Procrustes distance, compliance rate) that can be compared across models and tracked over time.

Key Findings (8 models, 390,654 valid ratings)

Every model tested produces geometric framing keywords in its explanations at rates between 19.4% and 80.6%. For one model (Grok 4.20), nonsense framing produces deeper similarity reordering than any cultural framing. The one model with always-on reasoning shows 80.6% geometric compliance; its non-reasoning counterpart shows 30.8%. In the main task, no model refuses nonsense framing or flags it as meaningless.

For full results, see the paper in experiments/rcp-v2/papers/ or the interactive report.

Repository Contents

experiments/rcp-v2/
  papers/
    rcp-v2-full-paper-draft.md    The paper (markdown)
    rcp-v2-paper.pdf              The paper (PDF with figures)
    rcp-v2-preregistration.md     Pre-registration document
    rcp-v2-preregistration.pdf    Pre-registration (PDF)
  stimuli/
    concepts.json                 54 concepts across 3 domains
    probes-v2.json                1,431 concept pairs
    templates/                    7 framing preamble templates
  config.json                    Experiment configuration
  parse.py                       Response parser (raw -> structured)
  build_report.py                Analysis pipeline (produces report.json)
  split_report.py                Splits report into lite + explanations
  test_build_report.py           Tests for the analysis pipeline
  report.html                    Interactive report viewer (entry point)
  report-lite.json               Pre-built report data (no explanation text)
  report.json                    Full report data (includes explanations)
  reports/                       Report viewer JS/CSS assets
  analysis/
    permutation_tests.py          Domain-level permutation tests
    factor_analysis.py            Factor analysis on response data
    embedding_validation.py       Concept inventory embedding validation

results/rcp-v2/                  Raw response data (16 runs: 8 models x 2 temperatures)
  anthropic_claude-sonnet-4-6/   Sonnet 4.6 (pre-registered)
  openai_gpt-5.4-mini/           GPT-5.4 Mini (pre-registered)
  google_gemini-2.5-flash/       Gemini 2.5 Flash (pre-registered)
  together_meta-llama_.../       Llama 3.3 70B (pre-registered)
  xai_grok-4-1-fast-.../         Grok 4.1 Fast (pre-registered)
  anthropic_claude-opus-4-6/     Opus 4.6 (exploratory)
  openai_gpt-5.4/                GPT-5.4 (exploratory)
  xai_grok-4.20/                 Grok 4.20 (exploratory, always-on reasoning)

runner/                          Shared execution engine (multi-vendor API adapters)
models/                          Model registry (vendor configs)
ui/                              Browser-based experiment runner UI

Each model directory under results/rcp-v2/ contains two timestamped run directories: one for temperature 0 (1 iteration, 10,017 calls) and one for temperature 0.7 (5 iterations for pre-registered models, 2 for exploratory). Each run contains responses.jsonl (one JSON object per API call) and run_meta.json (run parameters and counts).

Quick Start: View the Report

No dependencies beyond Python 3 (which you already have). The interactive report is pre-built from the raw data.

chmod +x view-report.sh
./view-report.sh

This opens the report in your browser. Use the temperature toggle to switch between temperature 0 and temperature 0.7 results. Sections include data quality, cluster validation, drift analysis, FSI heatmap, permutation tests, PCA analysis, compliance gradient, Procrustes alignment, variance comparison, explanation viewer, and temperature comparison.

Rebuilding the Report from Raw Data

To verify the analysis by rebuilding the report from the raw response data in results/:

pip install numpy scipy
cd experiments/rcp-v2
python3 build_report.py
python3 split_report.py          # produces report-lite.json + explanations.json
./../../view-report.sh           # view the rebuilt report

Replicating the Experiment

To run the experiment against your own models, start.sh bootstraps a virtual environment with all dependencies and starts the experiment platform:

chmod +x start.sh
./start.sh

Open http://localhost:8002 in your browser. The UI walks you through API key configuration, model selection, and experiment execution. The start script creates a virtual environment and installs dependencies automatically. No manual pip installs required.

Requirements: Python 3.10+, API keys for the vendors you want to test.

The experiment sends 10,017 API calls per model at temperature 0 (1,431 pairs x 7 framings x 1 iteration). A temperature 0.7 pass with 5 iterations adds 50,085 calls. Budget accordingly.

Concept Inventory

54 concepts across three domains, 18 per domain:

Physical: acceleration, amplitude, buoyancy, conduction, convection, crystallization, density, diffusion, elasticity, erosion, evaporation, friction, magnetism, oscillation, refraction, sublimation, turbulence, viscosity.

Institutional: arbitration, bureaucracy, census, citizenship, constitution, federation, jurisdiction, legislation, naturalization, parliament, prosecution, ratification, referendum, regulation, republic, sovereignty, tariff, taxation.

Moral: altruism, compassion, conscience, courage, devotion, dignity, forgiveness, generosity, gratitude, honesty, honor, humility, integrity, loyalty, obedience, sacrifice, tolerance, wisdom.

The inventory was validated using dual-model embedding analysis and pilot cluster accuracy data. In the main experiment, hierarchical clustering recovers the three-domain structure with 92.6% to 100.0% accuracy across all eight models.

Framing Conditions

Each condition prepends a single sentence to the probe. No instruction to adopt, inhabit, or role-play any perspective.

  1. Unframed (no preamble, baseline)
  2. "In an individualist society."
  3. "In a collectivist society."
  4. "In a hierarchical society."
  5. "In an egalitarian society."
  6. "In a geometric society." (interpretable nonsense)
  7. "In a glorbic society." (uninterpretable nonsense)

Models Tested

Model Vendor Status Reasoning Temp 0.7 Iterations
Sonnet 4.6 Anthropic pre-registered off 5
GPT-5.4 Mini OpenAI pre-registered off 5
Gemini 2.5 Flash Google pre-registered off 5
Llama 3.3 70B Together (Meta) pre-registered off 5
Grok 4.1 Fast xAI pre-registered off 5
Opus 4.6 Anthropic exploratory off 2
GPT-5.4 OpenAI exploratory off 2
Grok 4.20 xAI exploratory always on 2

Data Format

Each line in responses.jsonl is a JSON object:

{
  "experiment": "rcp-v2",
  "model": "claude-sonnet-4-6",
  "vendor": "anthropic",
  "timestamp": "2026-04-14T03:27:20Z",
  "stimulus_id": "phys_01",
  "stimulus_text": "...",
  "prompt_template": "collectivist",
  "iteration": 1,
  "raw_response": "Rating: 3\nExplanation: ...",
  "parsed": {
    "rating": 3,
    "explanation": "..."
  },
  "meta": {
    "tokens_in": 245,
    "tokens_out": 180,
    "latency_ms": 1200,
    "temperature": 0.7,
    "max_tokens": 500
  }
}

Related

Methodology Note

This research was conducted with AI assistance (Claude, Anthropic). The methodology, analysis decisions, and writing reflect the author's judgment. AI tools were used for implementation, literature review, cross-model review, and drafting. This disclosure is made in accordance with the author's position that explicit AI-assisted methodology acknowledgment is more honest than omission.

License

Apache License 2.0. See LICENSE.

Contact

Declan Michaels declan@moral-os.com linkedin.com/in/declanmichaels

About

Results data, analysis code, and results viewer for the RCP V2 experiment

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors