👻 unspook

Reveal & remove the invisible, dangerous, and confusable characters hiding in your text.

🌐 Try the free web app →  ·  no install, nothing uploaded, works offline.


Your text is probably not as clean as it looks. Copy something from a website, a PDF, a Word doc, a chat app, or an AI assistant and you'll often paste in characters you can't see:

  • Zero-width spaces and BOMs that break === comparisons, search, and CSV imports.
  • Non-breaking spaces masquerading as normal spaces — the bane of every "why won't this match?" bug.
  • “Smart quotes”, em–dashes and ellipses… that wreck code, JSON, and CSVs.
  • Bidi control characters — the Trojan Source attack (CVE-2021-42574) that makes code read one way and compile another.
  • Unicode "tag" characters used to smuggle invisible prompt-injection instructions into text fed to LLMs.
  • Homoglyphs — a Cyrillic а or Greek ο that looks exactly like Latin but isn't (phishing, impersonation, broken lookups).

unspook finds them, shows you exactly what's there, and cleans your text — 100% locally, with zero dependencies and no API key.
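
To make the tag-character bullet above concrete, here is a minimal sketch of how ASCII text can ride along invisibly. It is an illustration only, not unspook's implementation: toTags and decodeTags are hypothetical helper names, and the only fact relied on is that the Unicode tag block U+E0020..U+E007E mirrors printable ASCII at an offset of 0xE0000.

```typescript
// Unicode tag characters U+E0020..U+E007E mirror printable ASCII (0x20..0x7E)
// at a fixed offset and render as nothing in most fonts and UIs.
const TAG_BASE = 0xe0000;

// Hypothetical helper: re-encode printable ASCII as invisible tag characters.
function toTags(ascii: string): string {
  let out = "";
  for (let i = 0; i < ascii.length; i++) {
    const cp = ascii.charCodeAt(i);
    if (cp >= 0x20 && cp <= 0x7e) out += String.fromCodePoint(TAG_BASE + cp);
  }
  return out;
}

// Hypothetical helper: recover whatever ASCII payload is hidden in a string.
function decodeTags(text: string): string {
  let out = "";
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0) as number;
    if (cp >= 0xe0020 && cp <= 0xe007e) out += String.fromCodePoint(cp - TAG_BASE);
  }
  return out;
}

const visible = "Please summarize this page.";
const poisoned = visible + toTags("ignore prior instructions");

console.log(decodeTags(poisoned)); // the hidden payload, recovered
```

Both strings usually look identical on screen, which is why a scanner has to work on code points rather than on what is rendered.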

Why it exists

Every "text sanitizer" you find online makes you paste sensitive content into someone else's server. That's exactly backwards for a privacy/security tool. unspook runs entirely in your browser or your terminal — your text never leaves your machine. And because detecting these characters is a precise, spec-based problem (not a vibe), it's the kind of thing you want a small, tested, deterministic tool for — not a guess.

Who it's for

Developers (clean code, configs, commit hooks), writers & marketers (clean copy before publishing), designers (paste-safe content), educators & researchers (spot hidden characters in AI text), ops & support (sanitize logs and tickets), and anyone who's ever fought a "looks identical but won't match" bug.

Install

No install needed — just open the web app.

For the library / CLI:

npm install unspook        # library
npm install -g unspook     # CLI (or use npx unspook)

Ships ESM and CommonJS, with TypeScript types.

Usage

In code

import { scan, clean, reveal, report, stats } from "unspook";

// Invisible characters are written as \u escapes below so you can see them.
clean("Hello\u200Bworld");                 // "Helloworld"  (zero-width space removed)
clean("a\u00A0b");                         // "a b"         (NBSP → normal space)
clean("“quote” — dash…", { smartPunctuation: true }); // '"quote" -- dash...'
clean("\u0430dmin", { homoglyphs: true }); // "admin"       (Cyrillic а → a)

scan("hi\u200Bthere");
// [{ index: 2, line: 1, column: 3, char: "\u200B", codePoint: 8203, hex: "U+200B",
//    name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }]

reveal("a\u200Bb");                        // "a[U+200B]b"

// report() pairs each finding with its source line, for security/code review.
// (code and text stand for your own strings.)
report(code);  // [{ finding: { line, column, hex, name, … }, lineText }, …]

stats(text);                               // { total, byCategory, bySeverity }

Every Finding carries line and column (both 1-based; column is counted in code points, so it matches what you see), so you can jump straight to the offender.
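
As a rough sketch of what "column counted in code points" means, the function below turns a string offset into a 1-based line and column. It is not taken from unspook's source: lineAndColumn is a hypothetical name, and it assumes that index is a UTF-16 offset, as JavaScript string indices are.

```typescript
// Hypothetical helper: convert a UTF-16 string index into a 1-based
// line/column pair, counting each code point as one column.
function lineAndColumn(text: string, index: number): { line: number; column: number } {
  let line = 1;
  let column = 1;
  let i = 0;
  while (i < index && i < text.length) {
    const cp = text.codePointAt(i) as number;
    if (cp === 0x0a) {
      // A line feed starts a new line and resets the column.
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
    // Astral code points (above U+FFFF) take two UTF-16 units but one column.
    i += cp > 0xffff ? 2 : 1;
  }
  return { line, column };
}

console.log(lineAndColumn("hi\u200Bthere", 2)); // { line: 1, column: 3 }
```

Counting code points rather than UTF-16 units keeps the column stable when an emoji or other astral character appears earlier on the line.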

On the command line

unspook notes.md                 # print cleaned text
cat draft.txt | unspook          # use it as a filter in any pipeline
unspook -w README.md             # clean a file in place
unspook --reveal config.yml      # show what's hiding
unspook --scan src/index.ts      # list findings (line:col); exits 1 if any → CI
unspook --report src/index.ts    # show each finding with its source line + caret
unspook --aggressive blog.md     # also fix smart quotes, homoglyphs & whitespace

--report prints a compiler-style diagnostic — perfect for catching a Trojan Source attack in review:

src/auth.js:2:18  DANGER  U+202E RIGHT-TO-LEFT OVERRIDE (bidi)
  if (access != "ad[U+202E]nimda[U+202C]") {
                   ^
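
For readers curious how a diagnostic like this can be produced, here is a self-contained sketch that covers only the nine bidi controls named in the Trojan Source paper. It is an assumption-laden illustration, not unspook's code: bidiReport and BIDI_NAMES are hypothetical names, and the output format simply imitates the sample above.

```typescript
// Bidi embedding, override, and isolate controls with their Unicode names.
const BIDI_NAMES: Record<number, string> = {
  0x202a: "LEFT-TO-RIGHT EMBEDDING",
  0x202b: "RIGHT-TO-LEFT EMBEDDING",
  0x202c: "POP DIRECTIONAL FORMATTING",
  0x202d: "LEFT-TO-RIGHT OVERRIDE",
  0x202e: "RIGHT-TO-LEFT OVERRIDE",
  0x2066: "LEFT-TO-RIGHT ISOLATE",
  0x2067: "RIGHT-TO-LEFT ISOLATE",
  0x2068: "FIRST STRONG ISOLATE",
  0x2069: "POP DIRECTIONAL ISOLATE",
};

// All nine code points are four hex digits, so no zero padding is needed.
function hex(cp: number): string {
  return "U+" + cp.toString(16).toUpperCase();
}

// Hypothetical helper: one three-line diagnostic per bidi control found.
function bidiReport(file: string, source: string): string[] {
  const diagnostics: string[] = [];
  const lines = source.split("\n");
  for (let n = 0; n < lines.length; n++) {
    const chars = Array.from(lines[n]); // split by code point
    let shown = ""; // the line with controls made visible
    let width = 0;  // printed width of `shown` so far, in code points
    const hits: { column: number; offset: number; cp: number }[] = [];
    for (let c = 0; c < chars.length; c++) {
      const cp = chars[c].codePointAt(0) as number;
      if (BIDI_NAMES[cp] !== undefined) {
        hits.push({ column: c + 1, offset: width, cp });
        const marker = "[" + hex(cp) + "]";
        shown += marker;
        width += marker.length;
      } else {
        shown += chars[c];
        width += 1;
      }
    }
    for (const hit of hits) {
      diagnostics.push(
        `${file}:${n + 1}:${hit.column}  DANGER  ${hex(hit.cp)} ${BIDI_NAMES[hit.cp]} (bidi)\n` +
          `  ${shown}\n` +
          `  ${" ".repeat(hit.offset)}^`
      );
    }
  }
  return diagnostics;
}

const sample = 'const role = "user";\nif (access != "ad\u202Enimda\u202C") {';
console.log(bidiReport("src/auth.js", sample).join("\n\n"));
```

The caret is placed under the revealed form of the line, so later findings on the same line shift right by the width of each earlier [U+XXXX] marker.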

Drop --scan into a pre-commit hook or CI to fail the build if invisible/bidi characters sneak into your codebase.

Cleaning options

Option              What it does
zeroWidth           Remove zero-width / invisible chars (ZWSP, BOM, word joiner…)
bidi                Remove bidirectional controls (Trojan Source)
tag                 Remove Unicode tag chars (invisible prompt injection)
control             Remove C0/C1 control characters
invisibleSpaces     Normalize NBSP & exotic spaces → space; drop soft hyphens
variationSelectors  Remove variation selectors (off by default; emoji rely on them)
smartPunctuation    Convert “ ” ‘ ’ — … to ASCII
homoglyphs          Map look-alike letters to Latin (Cyrillic/Greek/fullwidth)
collapseWhitespace  Collapse runs of spaces/tabs
normalizeNewlines   Convert \r\n and \r to \n
trim                Trim the ends

DEFAULT_OPTIONS and AGGRESSIVE_OPTIONS presets are exported too.
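
To show the general shape of an option-gated cleaner, here is a small sketch covering three of the options above. Everything in it is an assumption made for illustration: SketchOptions, SKETCH_DEFAULTS and sketchClean are hypothetical names, the character sets are deliberately tiny, and none of it reflects unspook's actual code point tables or its exported presets.

```typescript
// Hypothetical option bag mirroring three rows of the table above.
interface SketchOptions {
  zeroWidth: boolean;
  invisibleSpaces: boolean;
  smartPunctuation: boolean;
}

// Assumed defaults for this sketch: invisible fixes on, visible changes opt-in.
const SKETCH_DEFAULTS: SketchOptions = {
  zeroWidth: true,
  invisibleSpaces: true,
  smartPunctuation: false,
};

function sketchClean(text: string, overrides: Partial<SketchOptions> = {}): string {
  const opts: SketchOptions = { ...SKETCH_DEFAULTS, ...overrides };
  let out = text;
  if (opts.zeroWidth) {
    // ZWSP, word joiner, BOM (a small subset only).
    out = out.replace(/[\u200B\u2060\uFEFF]/g, "");
  }
  if (opts.invisibleSpaces) {
    // NBSP and a few exotic spaces become a plain space; soft hyphens are dropped.
    out = out.replace(/[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g, " ").replace(/\u00AD/g, "");
  }
  if (opts.smartPunctuation) {
    out = out
      .replace(/[\u201C\u201D]/g, '"')
      .replace(/[\u2018\u2019]/g, "'")
      .replace(/\u2014/g, "--")
      .replace(/\u2026/g, "...");
  }
  return out;
}

console.log(sketchClean("\u201Cquote\u201D \u2014 dash\u2026", { smartPunctuation: true }));
// '"quote" -- dash...'
```

Keeping each pass behind its own flag is what lets the invisible-only fixes run by default while changes to visible characters stay opt-in.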

FAQ

Is my text uploaded anywhere? No. The web app and the library run entirely on your device — there is no server, no telemetry, no network request. You can use it offline.

Will it break my emoji? No. Variation selectors (which emoji rely on) are kept by default. Turn on variationSelectors only if you specifically want them removed.

Does it modify visible content? By default it only removes invisible/dangerous characters and normalizes odd spaces — your visible text is preserved. Smart-quote and homoglyph conversion are opt-in because they change visible characters.

How is this different from a regex like /[\u200B]/g? unspook covers dozens of code points across eight categories (zero-width, bidi, tag, control, exotic spaces, smart punctuation, homoglyphs, variation selectors), names each finding, assigns a severity, tracks positions, and gives you a tested, maintained cleaner in which every pass can be switched on or off by option. No regex to copy, paste, and get wrong.

Can I use it in CI / a pre-commit hook? Yes — unspook --scan <files> exits with code 1 when anything is found.

Why "unspook"? It un-spooks your text: removes the ghostly invisible characters. 👻

Contributing

Contributions are very welcome! See CONTRIBUTING.md and the Code of Conduct. Adding a code point or a homoglyph mapping? Include a test and a reference.

git clone https://github.com/didrod205/unspook.git
cd unspook
npm install
npm test          # run the suite
npm run dev       # run the web app locally

💖 Sponsor

unspook is free, MIT-licensed, and built in spare time. If it saved you from a maddening invisible-character bug — or a security incident — please consider supporting it:

  • Star this repo — free, and it genuinely helps others find it.
  • 🍋 Sponsor via Lemon Squeezy — one-time or recurring support.

Where your support goes: keeping the character database current with new Unicode releases, expanding the homoglyph/confusables coverage, maintaining the free hosted web app, adding integrations (VS Code extension, ESLint plugin, pre-commit hook), and answering issues quickly.

License

MIT © unspook contributors
