Production AI engineer · Microsoft 365 / Cloud consulting background · Dresden, Germany
I build deterministic tooling that makes AI agents measurable — quality you can defend with numbers, not vibes. Available for freelance.
Most AI tooling calls the agent and hopes. I build the layer that makes the agent's output measurable — one coherent stack with three jobs: you score the instructions before you trust the agent, you review its output with more than one mind, and you give it the context to be right in the first place. The throughline is measure first, then fix.
| Layer | Tool | Its job |
|---|---|---|
| 1 · Score | schliff | Measure the instructions before you run the agent. |
| 2 · Review | hydra | Review the diff with more than one model's read. |
| 3 · Context | vault-sync (private) | Keep the agent working from ground truth. |
Two shipped, public tools do the heavy lifting; the rest are private systems that show how I work end to end.
| Repo | What it does | Signal |
|---|---|---|
| schliff | Deterministic, stdlib-only quality scorer for AI agent instruction files: SKILL.md, CLAUDE.md, .cursorrules, AGENTS.md, system prompts | 8 scorers (7 in the composite) · anti-gaming detection · deterministic patches · 1,288 tests · MIT · on PyPI |
| hydra | Multi-perspective review council for Claude Code: advisors analyze, reviewers cross-examine, a chairman synthesizes | 4 advisors (6 in deep mode) · 3 cross-examining reviewers · up to 10 agents · Claude Opus + OpenAI Codex · MIT |
| vault-sync (private) | Syncs GitHub repos + PyPI metadata into an Obsidian vault as a Context Mirror | CLI + macOS menubar widget (Molty mascot) + Claude Code MCP plugin (4 read-only MCP tools, 4 skills, 2 hooks) · Python ≥3.10 · MIT |
| project-beat (private) | FastAPI + Next.js 16 freelance-job radar across German boards | Scrapes 4 active boards five times daily · 6-component hybrid matching · Supabase dashboard |
| mission-control (private) | Next.js 16 command center for an OpenClaw VPS | 23 server-side API endpoints · Kanban board · JSON persistence · Tailscale-only access |
| fpaul.dev | Personal developer site — Next.js 16, MDX, Writing section on AI security and agent tooling | Live on Vercel |
Deterministic, well-tested tooling for the agentic-coding ecosystem: things that score and review AI agents instead of just calling them. The throughline is measure first, then fix — anti-gaming detection so a score can't be juiced, deterministic patches that apply ~32% of schliff's fixes mechanically, and spec-first discipline where every claim is checked against the real artifact. Stdlib-first Python, TypeScript where the runtime demands it.
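To make "deterministic" and "anti-gaming" concrete, here is a minimal stdlib-only sketch of the idea. It is not schliff's actual code: the two scorers, the vague-word list, the weights, and the duplicate-line penalty are all assumptions invented for this example.

```python
import re

VAGUE_WORDS = {"maybe", "somehow", "stuff", "etc"}  # hypothetical filler list


def score_structure(text: str) -> float:
    """Hypothetical scorer: reaches 1.0 once the file has three markdown headings."""
    headings = len(re.findall(r"^#{1,6} ", text, flags=re.MULTILINE))
    return min(headings / 3, 1.0)


def score_specificity(text: str) -> float:
    """Hypothetical scorer: loses credit as the share of vague filler words grows."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    vague = sum(word in VAGUE_WORDS for word in words)
    return max(0.0, 1.0 - 10 * vague / len(words))


def gaming_penalty(text: str) -> float:
    """Illustrative anti-gaming check: the fraction of non-blank lines that are repeats."""
    lines = [line.strip().lower() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return (len(lines) - len(set(lines))) / len(lines)


# Assumed weights; they sum to 1.0 so the composite stays in [0, 1].
SCORERS = ((score_structure, 0.5), (score_specificity, 0.5))


def composite(text: str) -> float:
    """Weighted sum of the scorers, scaled down by the duplicate-line penalty."""
    raw = sum(weight * scorer(text) for scorer, weight in SCORERS)
    return round(raw * (1.0 - gaming_penalty(text)), 4)


honest = "# Setup\nRun the tests.\n# Style\nUse type hints.\n# Review\nKeep diffs small.\n"
padded = honest + "# Setup\n" * 20  # padding the file with repeated headings

print(composite(honest), composite(padded))
```

The same input always yields the same score because nothing here calls a model or a clock, and padding the file with repeated lines lowers the composite instead of raising it.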
A clean merged PR is the receipt I trust most — third-party-validated proof a maintainer accepted the work.
| Project | Contribution | Status |
|---|---|---|
| modelcontextprotocol/servers | Added a root CLAUDE.md covering the full reference-servers monorepo: 7 servers (4 TypeScript, 3 Python) | PR #3733, merged by a maintainer, April 2026 |
Same thesis, applied upstream: better context, fewer guesses.
Dev environment & stack
- Languages: Python (stdlib-first, ≥3.10), TypeScript, SQL
- AI / Agents: Claude Code, OpenAI Codex, MCP servers, agent instruction-file quality scoring, multi-agent review councils
- Web: Next.js 16, React 19, Tailwind CSS, MDX
- Backend / Data: FastAPI, Supabase / Postgres, Playwright, multilingual embeddings
- Infra: Docker, Tailscale zero-trust networking, Vercel, Hetzner VPS
- Tooling discipline: deterministic scorers, anti-gaming detection, heavy test coverage, single-sourced versioning, spec-driven workflows
- Knowledge base: Obsidian (PARA), synced to repos via vault-sync
More private systems
- OpenClaw / Vega stack: self-hosted OpenClaw Gateway on a Hetzner VPS, running under Docker Compose with a security-hardening overlay and access locked behind a Tailscale zero-trust network. It drives an always-on OpenClaw agent workspace.
- Mission Control: private Next.js command center for the OpenClaw VPS, with 23 server-side API endpoints, a Kanban board (Open / In Progress / Review / Done), and JSON-file persistence (no database). Reachable only over Tailscale.
- project-beat: private Python / FastAPI + Next.js system that scrapes 4 active German freelance job boards (freelance.de, GULP, Freelancermap, Hays; 13 sources configured) five times daily, ranks postings against profiles with a 6-component hybrid matching pipeline, and shows the results on a Supabase dashboard.
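For readers unfamiliar with hybrid matching, the ranking step in project-beat can be pictured roughly as a weighted sum of independent component scores. The sketch below uses three made-up components rather than the real six, which are not described here; every class, function name, and weight is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    title: str
    skills: frozenset
    remote: bool


@dataclass(frozen=True)
class Profile:
    skills: frozenset
    title_keywords: frozenset
    wants_remote: bool


def skill_overlap(posting: Posting, profile: Profile) -> float:
    """Jaccard similarity between posting skills and profile skills."""
    union = posting.skills | profile.skills
    return len(posting.skills & profile.skills) / len(union) if union else 0.0


def title_match(posting: Posting, profile: Profile) -> float:
    """Share of the profile's title keywords that appear in the posting title."""
    if not profile.title_keywords:
        return 0.0
    title_words = set(posting.title.lower().split())
    hits = sum(keyword.lower() in title_words for keyword in profile.title_keywords)
    return hits / len(profile.title_keywords)


def remote_fit(posting: Posting, profile: Profile) -> float:
    """1.0 unless the profile wants remote work and the posting is on-site."""
    return 1.0 if posting.remote or not profile.wants_remote else 0.0


# Assumed components and weights, for illustration only.
COMPONENTS = ((skill_overlap, 0.6), (title_match, 0.3), (remote_fit, 0.1))


def match_score(posting: Posting, profile: Profile) -> float:
    """Weighted sum of all component scores for one posting."""
    return sum(weight * component(posting, profile) for component, weight in COMPONENTS)


def rank(postings, profile: Profile):
    """Return postings sorted from best match to worst."""
    return sorted(postings, key=lambda p: match_score(p, profile), reverse=True)


profile = Profile(frozenset({"python", "fastapi"}), frozenset({"ai", "engineer"}), True)
good = Posting("AI Engineer", frozenset({"python", "fastapi"}), remote=True)
poor = Posting("SAP Consultant", frozenset({"sap"}), remote=False)

for posting in rank([poor, good], profile):
    print(posting.title, round(match_score(posting, profile), 2))
```

Keeping each component a pure function of posting and profile means any single score can be unit-tested and reweighted without touching the others.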
Production AI engineer with a consulting background in Microsoft 365 and Microsoft Cloud, based in Dresden, Germany. The enterprise work taught me the thing the AI-hype market keeps forgetting: tooling that can't be measured can't be trusted. So I work spec-first — a spec is the single source of truth, the code follows, and claims get verified against the real artifact — and I build to the same standard I'd ship to a client. That's the whole reason the stack starts with score.
Available for freelance engagements — AI tooling, agent quality / eval systems, Microsoft 365 / Microsoft Cloud, and full-stack web.



