Crispen is a Python code refactoring CLI tool. It reads a unified diff from stdin, identifies which lines changed, and applies a set of automated refactors only to the changed regions of affected files, writing the modified files back in place.
It uses libcst for format-preserving AST transformations and an LLM for intelligent, context-aware changes. Supported providers: Anthropic Claude, OpenAI, DeepSeek, Moonshot/Kimi, and LM Studio (local).
git diff | crispen
Crispen operates only on the lines you actually changed — not the whole file. Each refactor receives the diff's line ranges and skips code outside those ranges. This makes it safe to run on any in-progress change without disturbing surrounding code.
Crispen requires Python 3.12+.
git clone https://github.com/Voidious/crispen
cd crispen
uv sync
Pipe any unified diff to crispen:
# Refactor uncommitted changes
git diff | crispen
# Refactor staged changes
git diff --cached | crispen
# Refactor changes since a specific commit
git diff HEAD~1 | crispen
# Refactor a specific file as if it were entirely new (triggers whole-file subdir split)
git diff /dev/null crispen/file_limiter/advisor.py | crispen
Crispen prints a summary of every change it applies, then writes the modified files back in place.
Crispen reads configuration from [tool.crispen] in pyproject.toml, with optional overrides in .crispen.toml in the project root.
[tool.crispen]
# LLM provider: "anthropic" (default), "openai", "deepseek", "moonshot", or "lmstudio"
provider = "anthropic"
# LLM model to use (default: "claude-sonnet-4-6")
model = "claude-sonnet-4-6"
# Optional base URL override for OpenAI-compatible providers.
# Useful for LM Studio on a non-default port, or other self-hosted endpoints.
# base_url = "http://localhost:1234/v1"
# Optional tool_choice override for OpenAI-compatible providers (default: unset).
# Use "required" for local models (e.g. LM Studio with qwen3) that do not
# support the default named-function form.
# tool_choice = "required"
# HTTP timeout in seconds for each LLM API call (default: 60.0).
# Raise this when using slow local models.
# api_timeout = 60.0
# FunctionSplitter: max function body lines before splitting (default: 75)
max_function_length = 75
# FileLimiter: max file lines before splitting into sibling files (default: 1000).
# Set to 0 to disable FileLimiter entirely.
max_file_lines = 1000
# FileLimiter: when the diff covers every line of the file (whole-file add or
# replacement), place new files in a subdirectory named after the module
# (e.g. service/*.py for service.py). Default: true.
file_limiter_subdir_split = true
# FileLimiter: route pytest fixtures split out of test files to conftest.py
# instead of a regular sibling module, so pytest auto-discovers them without
# any import in the original file (avoids F401/F811 flake8 warnings).
# Set to false if not using pytest or using custom fixture decorators. Default: true.
file_limiter_pytest_conftest = true
# FileLimiter: when to keep re-export stubs in the original file for public
# names that were moved to new files. Re-exports preserve the module's public
# API so existing callers need no changes. Default: "imported".
#
# "always" — Always add re-exports for every public name. Best for
# library packages whose public API may not be imported within
# this codebase.
# "application" — Add re-exports in non-test files (same as "always"), but
# omit them in test files. Pragmatic middle ground.
# "imported" — Only add a re-export when the name is actually imported from
# the original module somewhere else in the project (the same
# rule already used for private names). Best for most
# application codebases. Note: only detects
# "from module import name" style imports, not qualified access
# via "module.name".
file_limiter_reexports = "imported"
# Tuple Return to Dataclass: min tuple element count to trigger replacement (default: 4)
min_tuple_size = 4
# Tuple Return to Dataclass: update callers in diff files even outside changed ranges (default: true)
# When false and unreachable callers exist, the transformation is skipped instead.
update_diff_file_callers = true
# DuplicateExtractor: min statement weight for a duplicate group (default: 3)
min_duplicate_weight = 3
# DuplicateExtractor: max sequence length for duplicate search (default: 8)
max_duplicate_seq_len = 8
# Whether to generate docstrings in extracted helper functions (default: false)
helper_docstrings = false
# Retry counts for extraction and LLM verification failures
extraction_retries = 2
llm_verify_retries = 2
# FileLimiter: additional retry attempts after an LLM-related failure (default: 2)
file_limiter_retries = 2
# FileLimiter: recursively split newly-created files that are still over the
# limit (default: true). The recursion terminates when each new file is either
# under the limit, aborted (cannot be split), or produces no further oversized files.
file_limiter_recursive = true
# Timing output printed at the end of each run (default: "detailed").
# "off" — no timing output.
# "basic" — total run time, total LLM time, and total token counts.
# "detailed" — adds per-call timing lines during the run (in verbose mode)
# plus per-call-type, per-refactor, and per-file breakdowns.
# timing = "detailed"
# Run only specific refactors (default: run all).
# Valid names: "if_not_else", "duplicate_extractor", "function_splitter",
# "tuple_dataclass", "file_limiter", "match_function"
# ("match_function" controls the sub-pass inside duplicate_extractor that
# replaces inline code with calls to existing functions; only takes effect
# when duplicate_extractor is also enabled.)
# enabled_refactors = ["function_splitter", "file_limiter"]
# Always skip specific refactors (ignored when enabled_refactors is set).
# disabled_refactors = ["file_limiter"]
Set the appropriate environment variable for your chosen provider:
export ANTHROPIC_API_KEY=sk-ant-... # for provider = "anthropic"
export OPENAI_API_KEY=sk-... # for provider = "openai"
export DEEPSEEK_API_KEY=sk-... # for provider = "deepseek"
export MOONSHOT_API_KEY=sk-... # for provider = "moonshot"
# LM Studio (provider = "lmstudio") runs locally and requires no API key.
Point crispen at a running LM Studio local server. LM Studio exposes an OpenAI-compatible API, which crispen uses directly:
[tool.crispen]
provider = "lmstudio"
model = "your-loaded-model-name"
# base_url defaults to "http://localhost:1234/v1"; override if needed:
# base_url = "http://localhost:8080/v1"
No API key is required — LM Studio does not authenticate requests.
Flips negated if/else conditions to eliminate the not.
When an if not condition: has an else clause, crispen rewrites it to if condition: and swaps the two branches. This eliminates a layer of logical indirection and makes intent clearer.
Before:
if not is_valid(data):
    handle_error(data)
else:
    process(data)
After:
if is_valid(data):
    process(data)
else:
    handle_error(data)
Skipped when there is no else clause, or when the else is an elif chain.
Replaces large tuple return values with @dataclass instances and updates call sites.
Functions that return large tuples (4+ elements by default) are difficult to read at call sites — callers must remember which index means what. Crispen replaces the tuple literal with a named @dataclass constructor call, automatically generates the dataclass definition, and rewrites every tuple-unpacking call site to use the dataclass's named attributes.
Only fires when:
- The tuple is inside a return statement (not a function argument).
- Every in-file caller of the function uses tuple-unpacking assignment (a, b = func()).
- The tuple has at least min_tuple_size elements (default 4).
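The first and third conditions can be checked mechanically. Here is a minimal sketch using the stdlib ast module (function and variable names are illustrative, not Crispen's internals, which use libcst):

```python
import ast

def find_large_tuple_returns(source: str, min_tuple_size: int = 4) -> list[int]:
    """Return line numbers of `return` statements that return a tuple
    of at least `min_tuple_size` elements."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Return)
            and isinstance(node.value, ast.Tuple)
            and len(node.value.elts) >= min_tuple_size
        ):
            hits.append(node.lineno)
    return hits

src = """
def get_metrics(data):
    return len(data), sum(data), max(data), min(data)
"""
print(find_large_tuple_returns(src))  # [3]
```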
Before:
def get_metrics(data):
    count = len(data)
    total = sum(data)
    average = total / count
    peak = max(data)
    return count, total, average, peak

# elsewhere:
count, total, average, peak = get_metrics(data)
After:
from dataclasses import dataclass
from typing import Any
@dataclass
class GetMetricsResult:
    count: Any
    total: Any
    average: Any
    peak: Any

def get_metrics(data):
    count = len(data)
    total = sum(data)
    average = total / count
    peak = max(data)
    return GetMetricsResult(count=count, total=total, average=average, peak=peak)

# elsewhere:
_ = get_metrics(data)
count = _.count
total = _.total
average = _.average
peak = _.peak
Field names are inferred from unpacking assignments at call sites (e.g., count, total, average, peak = get_metrics(data)), from the variable names in the tuple itself, or defaulted to field_0, field_1, etc. The intermediate variable name (_, _result, or the snake_case dataclass name) is chosen to avoid collisions with existing names in the file.
Configuration:
- min_tuple_size — minimum tuple element count to trigger replacement (default: 4).
Extracts duplicate code blocks into shared helper functions using an LLM.
Crispen scans the changed functions for repeated sequences of statements. When a duplicate group is found, it calls the LLM to produce a single extracted helper function with an appropriate name, and replaces each occurrence with a call to that helper.
The algorithm:
- Hashes each statement in a function by its AST structure (ignoring whitespace and comments).
- Finds repeated subsequences above a minimum weight threshold.
- Asks the LLM if the matching sections of code are a semantic match ("veto check"), and requests any pitfalls to note for the extraction step. These notes are passed to every subsequent extraction attempt.
- If accepted, asks the LLM to write the helper, determine its parameters, and update the call sites.
- Runs a series of algorithmic checks to verify the code change. A failure triggers a retry (step 4, up to extraction_retries), including a note to the LLM about the verification failure.
- After passing the algorithmic checks, asks the LLM to verify the output. A failure triggers a retry (step 4, up to llm_verify_retries), including both the previous code change and detailed feedback from the LLM verification step.
- If accepted, validates the output syntactically and with pyflakes before applying it.
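Step 1's structural hashing can be sketched with the stdlib ast module. Parsing already discards whitespace and comments; blanking out identifiers makes renamed copies hash alike. This is a simplified illustration, not Crispen's libcst-based implementation:

```python
import ast
import hashlib

def structural_hash(stmt: ast.stmt) -> str:
    """Hash a statement by its AST shape, ignoring variable names."""
    class Anonymize(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            node.id = "_"  # erase the identifier, keep the structure
            return node

    tree = ast.parse(ast.unparse(stmt))  # round-trip to get a private copy
    Anonymize().visit(tree)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

a = ast.parse("name = user['name'].strip().lower()").body[0]
b = ast.parse("full_name = person['name'].strip().lower()").body[0]
c = ast.parse("total = sum(values)").body[0]
print(structural_hash(a) == structural_hash(b))  # True: same shape, different names
print(structural_hash(a) == structural_hash(c))  # False: different shape
```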
Before:
def process_users(users, archived_users):
    for user in users:
        name = user["name"].strip().lower()
        email = user["email"].strip().lower()
        db.save(name, email)
    for person in archived_users:
        full_name = person["name"].strip().lower()
        contact = person["email"].strip().lower()
        archive.save(full_name, contact)
Note that the two blocks use different variable names (name/email vs full_name/contact, user vs person). The algorithm detects the structural match regardless, and the LLM generalises the variable names into appropriate parameter names.
After:
def _normalize_contact(record):
    name = record["name"].strip().lower()
    email = record["email"].strip().lower()
    return name, email

def process_users(users, archived_users):
    for user in users:
        name, email = _normalize_contact(user)
        db.save(name, email)
    for person in archived_users:
        full_name, contact = _normalize_contact(person)
        archive.save(full_name, contact)
Configuration:
- min_duplicate_weight — minimum "weight" (sum of statement sizes) a repeated group must have to be extracted (default: 3).
- max_duplicate_seq_len — maximum number of statements in a duplicate sequence (default: 8).
- extraction_retries — how many times to retry after an algorithmic check fails (default: 2).
- llm_verify_retries — how many times to retry after the LLM verification step rejects the output (default: 2).
Replaces a code block with a call to an existing function that performs the same operation.
When a block of code in the diff is semantically equivalent to the body of an existing function in the same file, crispen replaces the inline block with a call to that function. This is the complement of DuplicateExtractor: instead of creating a new helper, it recognises that one already exists.
The algorithm:
- Fingerprints every function body in the file by its normalised AST structure (ignoring variable names, whitespace, and comments).
- For each statement sequence in the diff, checks whether its fingerprint matches any function body.
- Asks the LLM to verify the match is semantically valid and not a coincidental structural similarity.
- If confirmed, asks the LLM to generate the correct call expression (mapping arguments as needed) and replaces the block.
Before:
def _fetch_json(url, headers):
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()

def sync_orders():
    # Inline copy of _fetch_json's body with different variable names:
    resp = requests.get(orders_url, headers=api_headers)
    resp.raise_for_status()
    orders = resp.json()
    process(orders)
After:
def _fetch_json(url, headers):
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()

def sync_orders():
    orders = _fetch_json(orders_url, api_headers)
    process(orders)
The LLM veto step ensures the replacement is only applied when the semantics genuinely match — structural similarity alone is not enough.
Splits functions that exceed the line-count limit into smaller helpers.
When a function in the changed region is too long (more than max_function_length body lines, default 75), crispen splits it. It identifies the best split point — the one that minimises the number of free variables passed to the helper — and asks the LLM to name the extracted helper function.
The extracted tail becomes a private helper (_helper_name) placed immediately after the original function. If the tail references self, the helper is extracted as a regular instance method; otherwise it is extracted as a @staticmethod (for class methods) or a module-level function.
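The split-point search described above can be approximated as a free-variable count over each candidate split of the function body. A simplified stdlib-only sketch (names are illustrative; Crispen's real implementation works on libcst trees and handles scoping more carefully):

```python
import ast

def best_split_point(func: ast.FunctionDef) -> tuple[int, list[str]]:
    """Find the body index whose tail needs the fewest variables
    passed in from the head."""
    def names(stmts, ctx):
        return {n.id for s in stmts for n in ast.walk(s)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ctx)}

    params = {a.arg for a in func.args.args}
    best_i, best_free = 1, None
    for i in range(1, len(func.body)):
        head, tail = func.body[:i], func.body[i:]
        # Free variables of the tail: names it reads that the head
        # (or a parameter) defines.
        free = names(tail, ast.Load) & (names(head, ast.Store) | params)
        if best_free is None or len(free) < len(best_free):
            best_i, best_free = i, free
    return best_i, sorted(best_free)

src = """
def build_report(config, data):
    headers = compute_headers(config)
    rows = make_rows(data, headers)
    totals = compute_totals(rows)
    return assemble(headers, rows, totals)
"""
func = ast.parse(src).body[0]
print(best_split_point(func))  # (1, ['data', 'headers'])
```

Splitting after the first statement wins here because only headers and data must cross into the helper.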
Before:
def build_report(config, data):
    # ... 40 lines of setup ...
    headers = compute_headers(config)
    rows = []
    for item in data:
        row = format_row(item, headers)
        validate_row(row)
        rows.append(row)
    totals = compute_totals(rows)
    footer = format_footer(totals)
    return assemble(headers, rows, footer)
After:
def build_report(config, data):
    # ... 40 lines of setup ...
    headers = compute_headers(config)
    return _format_rows_and_assemble(headers, data)

def _format_rows_and_assemble(headers, data):
    rows = []
    for item in data:
        row = format_row(item, headers)
        validate_row(row)
        rows.append(row)
    totals = compute_totals(rows)
    footer = format_footer(totals)
    return assemble(headers, rows, footer)
Functions are skipped if they are:
- async functions
- Generator functions (contain yield)
- Functions with nested def statements (closures)
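The skip rules above amount to a straightforward AST walk. A sketch under those stated rules (Crispen itself operates on libcst trees):

```python
import ast

def is_splittable(func) -> bool:
    """Apply the documented skip rules: no async functions,
    no generators, no nested defs (closures)."""
    if isinstance(func, ast.AsyncFunctionDef):
        return False
    for node in ast.walk(func):
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            return False  # generator function
        if node is not func and isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef)
        ):
            return False  # nested def (closure)
    return True

gen = ast.parse("def g():\n    yield 1").body[0]
plain = ast.parse("def f():\n    return 1").body[0]
print(is_splittable(gen), is_splittable(plain))  # False True
```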
Safety checks applied before writing:
- The rewritten source must compile without SyntaxError.
- pyflakes must not report any new UndefinedName warnings after the split.
Configuration:
- max_function_length — maximum allowed body lines (default: 75).
- helper_docstrings — whether to include a docstring in extracted helper functions (default: false).
Splits files that exceed the line-count limit into smaller sibling modules.
When a file in the diff exceeds max_file_lines (default: 1000) after all other refactors have run, crispen splits it. It classifies every top-level entity by whether it was added, modified, or left unchanged by the diff, builds a dependency graph to find groups that can safely move together, and asks the LLM to assign each group to a new file. The original file is updated with from .module import name re-exports so existing callers are unaffected.
The algorithm:
- Parse all top-level entities (functions, classes, and statement blocks) from the post-refactor source.
- Classify each entity as NEW (added by the diff), MODIFIED (existed before and changed), or UNMODIFIED (existed before and unchanged).
- Build a name-reference dependency graph and compute strongly connected components (SCCs). Entities in the same SCC must be moved together.
- SCCs containing any UNMODIFIED entity must stay (Set 1). SCCs of NEW entities can be freely moved (Set 2). SCCs of purely MODIFIED entities may be migrated (Set 3) — the LLM decides which.
- Ask the LLM to assign each movable SCC group to a target filename. Groups with related purposes may share a file.
- Generate the new files, add from .module import name re-exports to the original, and verify every entity's source is preserved before writing.
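Steps 3 and 4 hinge on grouping mutually referring entities so they move together. A compact sketch of the name-reference graph plus Tarjan's SCC algorithm over top-level definitions (illustrative only, not Crispen's dep_graph module):

```python
import ast
from collections import defaultdict

def entity_sccs(source: str) -> list[set[str]]:
    """Group top-level functions/classes into strongly connected
    components of the name-reference graph."""
    tree = ast.parse(source)
    entities = {n.name: n for n in tree.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef))}
    edges = defaultdict(set)
    for name, node in entities.items():
        for ref in ast.walk(node):
            if isinstance(ref, ast.Name) and ref.id in entities and ref.id != name:
                edges[name].add(ref.id)

    # Tarjan's algorithm: one pass, SCCs pop off the stack in order.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def strongconnect(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in entities:
        if v not in index:
            strongconnect(v)
    return sccs

src = """
def a(): return b()
def b(): return a()
def c(): return a()
"""
print(entity_sccs(src))  # a and b are mutually recursive; c stands alone
```

Here a and b form one SCC and must land in the same file, while c can be placed independently.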
Whole-file mode (subdir split): When the diff covers every line of the file — e.g. a brand-new file — crispen places the new sibling files in a subdirectory named after the module (e.g. service/*.py for service.py). For non-test files, a service/__init__.py is generated that re-exports the public API so callers need no updates. For test files, the original test_service.py keeps re-export stubs so pytest can still find the tests. Test classes that contain test_ methods are not re-exported (re-importing them would cause pytest to discover and run every test twice).
Script entry points: When a non-test file contains if __name__ == '__main__':, the original file is kept on disk as the runnable entry point (with re-export stubs) rather than being replaced by subdir/__init__.py. The subdirectory is named with a _lib, _helpers, _impl, _internals, or _support suffix to avoid shadowing the original module name.
Directories with dashes: Crispen skips files located under directories whose names contain dashes (e.g. my-project/service.py), because dashes are illegal in Python package names and would produce a SyntaxError in generated imports. Rename the directory to use underscores first.
Before (service.py, 1200 lines):
import json

class Config:
    # ... 80 lines ...

class DataStore:
    # ... 200 lines ...

def fetch_records(store, query):
    # ... 60 lines ...

def export_csv(records, path):
    # ... 40 lines ...
After:
# service.py (updated, with re-exports)
import json
from .models import Config, DataStore  # fmt: skip  # noqa: F401, E501
from .utils import export_csv, fetch_records  # fmt: skip  # noqa: F401, E501

# service/models.py (new file)
import json

class Config:
    # ... 80 lines ...

class DataStore:
    # ... 200 lines ...

# service/utils.py (new file)
from .models import DataStore

def fetch_records(store, query):
    # ... 60 lines ...

def export_csv(records, path):
    # ... 40 lines ...
Safety checks applied before writing:
- Every entity's source (minus import lines) must appear verbatim in the combined output.
- The proposed split must not introduce circular file imports.
Configuration:
- max_file_lines — file line count threshold (default: 1000). Set to 0 to disable.
- file_limiter_subdir_split — use a subdirectory for whole-file diffs (default: true).
- file_limiter_retries — additional LLM retry attempts on failure (default: 2).
- file_limiter_recursive — recursively split newly-created files that are still over the limit (default: true).
- file_limiter_pytest_conftest — route fixtures split out of test files to conftest.py so pytest auto-discovers them without imports (avoids F401/F811 warnings). Set to false if not using pytest or using custom fixture decorators (default: true).
- file_limiter_reexports — controls when re-export stubs are added to the original file for public names moved to new files: "always" (every public name), "application" (all non-test files, no test files), or "imported" (only when the name is imported from this module elsewhere in the project — the same rule used for private names). Default: "imported".
stdin (unified diff)
│
▼
crispen/cli.py # Entry point: reads stdin, calls parse_diff then run_engine
│
├── crispen/diff_parser.py # Parses diff → Dict[filepath, List[(start, end)]]
│
└── crispen/engine.py # Loads files, runs all refactors, writes back
│
├── crispen/refactors/
│ ├── base.py # Refactor base class (libcst.CSTTransformer)
│ ├── if_not_else.py # if not x: A else B → if x: B else A
│ ├── tuple_dataclass.py # Large tuple returns → @dataclass
│ ├── caller_updater.py # Update tuple-unpacking call sites
│ ├── duplicate_extractor.py # Extract duplicate blocks
│ └── function_splitter.py # Split oversized functions
│
└── crispen/file_limiter/
├── runner.py # Orchestrates FileLimiter phases
├── classifier.py # Classify entities (Set 1/2/3)
├── advisor.py # LLM placement planner
├── code_gen.py # Generate split file contents
├── entity_parser.py # Parse top-level entities
└── dep_graph.py # Dependency graph + SCC finder
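For orientation, the diff_parser stage shown above reduces to extracting the + side of each hunk header. A simplified sketch (Crispen's real parser also tracks the individual added/removed lines inside each hunk):

```python
import re
from collections import defaultdict

# "@@ -10,3 +10,4 @@" -> new-file start line 10, span 4 lines.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def parse_diff(diff_text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each file in a unified diff to its changed (start, end)
    line ranges in the new version of the file."""
    ranges = defaultdict(list)
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            current = None if path == "/dev/null" else path.removeprefix("b/")
        elif current and (m := HUNK.match(line)):
            start = int(m.group(1))
            count = int(m.group(2) or 1)
            if count:  # a count of 0 means pure deletion on the new side
                ranges[current].append((start, start + count - 1))
    return dict(ranges)

diff = """\
--- a/example.py
+++ b/example.py
@@ -10,3 +10,4 @@ def f():
 context
+added line
 context
"""
print(parse_diff(diff))  # {'example.py': [(10, 13)]}
```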
- Create crispen/refactors/my_refactor.py subclassing Refactor from base.py.
- Override leave_* methods; guard with self._in_changed_range(original_node).
- Append change descriptions to self.changes_made.
- Register the class in engine.py's _REFACTORS list.
- Add tests/test_my_refactor.py — 100% branch coverage is enforced.