This package provides optional host-layer preprocessing utilities for Context Compiler integrations.
It is experimental and separate from the deterministic core engine in src/.
Recommended install for integrations using this package:
pip install "context-compiler[experimental]"
Integrations should import this package from the installed environment rather than using repo-relative preprocessor paths.
Compatibility note:
- Use heuristic_preprocessor.py and parse_preprocessor_output(...).
- heuristic_preprocessor.py: conservative structural preprocessing pass.
- output_validation.py: shared normalization/validation boundary.
- prompt_utils.py: state-aware prompt rendering helper.
- constants.py: shared protocol literals and directive validation patterns.
- prompts/default.txt: default runtime prompt.
- prompts/llama.txt: stricter prompt for Llama-family models in LLM-only mode.
Public validator entry points:
- parse_preprocessor_output(raw_output: object, *, source_input: str | None = None) -> str | None
- validate_preprocessor_output(raw_output: object, *, source_input: str | None = None) -> dict
All preprocessor outputs (heuristic or LLM) must be validated with
parse_preprocessor_output(...) before being applied.
Classification contract:
- directive: safe, validated canonical directive (output is a directive string)
- no_directive: confident ordinary content (output is null)
- unknown: unsafe to rewrite (output is null)
unknown is reject/abstain behavior. Malformed, ambiguous, mixed-intent,
quoted/reported, unsupported, or unsafe outputs must not be rewritten.
Only validated directive output may be used as rewritten compiler input.
no_directive and unknown must fall back to original user input.
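
The fallback rule above can be sketched as a small helper. This is an illustration only, not code from this package: select_compiler_input is a hypothetical name, and the result shape (a mapping with "classification" and "output" keys) is an assumption made for the example. It is not the documented return value of validate_preprocessor_output.

```python
from typing import Mapping, Optional


def select_compiler_input(original: str, result: Mapping[str, Optional[str]]) -> str:
    """Pick the text to hand to the compiler for one user message.

    `result` is an ASSUMED shape for illustration:
    {"classification": "directive" | "no_directive" | "unknown", "output": str | None}
    """
    classification = result.get("classification")
    output = result.get("output")

    # Only a validated directive with a non-empty string output may replace the input.
    if classification == "directive" and isinstance(output, str) and output:
        return output

    # no_directive, unknown, and any unexpected value fall back to the original text.
    return original
```

Treating every unexpected classification as a fallback keeps the helper on the reject/abstain side when the result is malformed.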
source_input is optional at the API level for backward compatibility.
For integration behavior, it is REQUIRED for LLM fallback validation calls:
pass source_input=<original user text> so source-aware reject rules can
block unsafe rewrites.
Engine-owned near-misses are reject cases (for example set premise to X,
change premise X) and must remain unknown (not rewritten).
Raw preprocessor/LLM outputs must not be passed directly to the compiler.
The preprocessor does not expand directive grammar. It may emit only validated canonical directives accepted by the compiler.
- Run preprocess_heuristic(message).
- If a heuristic candidate directive exists, validate it with parse_preprocessor_output(...).
- If no valid directive was produced, run LLM fallback preprocess.
- Validate fallback output with parse_preprocessor_output(..., source_input=message).
- If a valid directive is produced, pass it through a normal compiler input path. For session-owned integrations, use engine.step(...). For transcript-based integrations that receive full chat history each turn:
  - use context_compiler.compile_transcript(...) for stateless evaluation
  - use engine.apply_transcript(...) to update an existing engine
- Otherwise pass the original user input unchanged.
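
The heuristic-first flow can be sketched as follows. To stay self-contained, the sketch takes the three steps as injected callables instead of importing them, since the import path is not stated here. preprocess_message is a hypothetical wrapper name, and the sketch assumes the heuristic step returns None when it has no candidate. In a real integration, heuristic and validate would presumably be preprocess_heuristic and parse_preprocessor_output.

```python
from typing import Callable, Optional

# ASSUMED callable shapes, for illustration only.
Heuristic = Callable[[str], Optional[str]]   # candidate directive, or None
LlmFallback = Callable[[str], object]        # raw, untrusted model output
Validator = Callable[..., Optional[str]]     # validated directive, or None


def preprocess_message(
    message: str,
    heuristic: Heuristic,
    llm_fallback: LlmFallback,
    validate: Validator,
) -> str:
    """Return a validated directive, or the original message unchanged."""
    # Steps 1-2: heuristic pass, validated before use.
    candidate = heuristic(message)
    if candidate is not None:
        directive = validate(candidate)
        if directive is not None:
            return directive

    # Steps 3-4: LLM fallback, validated with the original text as
    # source_input so source-aware reject rules can apply.
    raw = llm_fallback(message)
    directive = validate(raw, source_input=message)
    if directive is not None:
        return directive

    # No valid directive: fall back to the original user input.
    return message
```

The returned string is what the integration would then pass to engine.step(...) or one of the transcript paths. Raw heuristic or LLM output never leaves the function unvalidated.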
- Use prompts/default.txt as the recommended default prompt.
- Use prompts/llama.txt only for LLM-only preprocessing with Llama-family models.
- Heuristic-first integrations should still keep default.txt as the normal fallback prompt unless there is a model-specific reason not to.
prompt_utils.py exposes:
render_prompt(path: Path, state: State) -> str | None
Behavior:
- reads prompt text from path
- strips leading # header lines and leading blank lines
- replaces <NULL_OR_VALUE> and <SET OF CURRENT POLICY ITEMS> using state
- returns None if prompt loading fails
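
The listed behavior can be approximated with a stand-in like the one below. It is not the package's render_prompt: render_prompt_sketch is a hypothetical name, and because the State type and its mapping to placeholder values are not described here, the sketch takes the two replacement strings directly.

```python
from pathlib import Path
from typing import Optional


def render_prompt_sketch(path: Path, null_or_value: str, policy_items: str) -> Optional[str]:
    """Illustrative stand-in; the real helper derives both values from State."""
    try:
        text = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # Prompt loading failed: signal it with None instead of raising.
        return None

    lines = text.splitlines()
    start = 0
    # Skip the leading run of '#' header lines and blank lines.
    while start < len(lines) and (lines[start].startswith("#") or not lines[start].strip()):
        start += 1

    # Re-joining the lines drops any trailing newline from the file.
    body = "\n".join(lines[start:])
    return (
        body.replace("<NULL_OR_VALUE>", null_or_value)
        .replace("<SET OF CURRENT POLICY ITEMS>", policy_items)
    )
```

Because the function returns None when the file cannot be read, callers have to handle a missing prompt explicitly.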
- This package does not mutate compiler state directly.
- State changes still occur only through compiler parsing/replay paths.