SkynetPunycoderGen

Lists Unicode characters that are easy to confuse with a chosen English letter, the kind of lookalikes used in Homograph Attacks.

About

A Homograph Attack is a type of phishing or spoofing attack where an attacker creates a fake domain name that looks visually identical (or very similar) to a legitimate website by using characters from different alphabets or similar-looking letters.

For example:

-- Real site: paypal.com
-- Fake site: paypaӏ.com

The second domain may use a Cyrillic character instead of the normal Latin l, but to humans it looks almost the same. This technique is also called an IDN Homograph Attack because it often abuses Internationalized Domain Names (IDNs).

IDN Homograph Attacks exploit characters from different scripts that look nearly identical, such as the Latin “a” (U+0061) and the Cyrillic “а” (U+0430). A domain or email address can look normal on screen while one or more of its letters have been swapped for visually similar but distinct Unicode code points.
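To make the difference concrete, the following minimal sketch prints the code point and Unicode name of the Latin and Cyrillic "a". The helper name `describe` is made up for this illustration and is not part of the repository.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, renders like Latin "a" in many fonts


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(latin_a))       # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))    # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)   # False: different code points
```

The two glyphs may be indistinguishable on screen, but any string comparison treats them as different characters.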

This repository includes a small Punycode / IDN toolkit: a line-wise CLI encoder/decoder (punycodegen.py) and a homoglyph explorer CLI (punycode_gen.py) documented below.

`punycode_gen.py` — Punycode variants for letters `a`–`z`

punycode_gen.py lists Unicode characters that are easy to confuse with a chosen English letter, together with each character’s IDNA Punycode (ACE) label (xn--…). That is useful when studying IDN homograph risk, comparing lookalike scripts (Latin vs Cyrillic vs Greek, mathematical alphabets, etc.), or auditing domain-like strings.

Requirements

-- Python 3.9+ (stdlib only; no pip install needed).
-- confusables.txt in the same directory as punycode_gen.py.

The script resolves the data file with:

Path(__file__).resolve().with_name("confusables.txt")

Obtaining `confusables.txt`

The file is part of the Unicode Security Mechanisms data used by UTS #39.

Download the current release, for example:
https://www.unicode.org/Public/security/latest/confusables.txt
Save it next to punycode_gen.py as confusables.txt.

If the file is missing, the script exits with an error and prints the download hint.
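The lookup and the missing-file behaviour described above could be sketched roughly as follows. The function names `locate_data_file` and `require_data_file` and the exact error wording are assumptions for illustration; the real script may be structured differently.

```python
import sys
from pathlib import Path

DOWNLOAD_URL = "https://www.unicode.org/Public/security/latest/confusables.txt"


def locate_data_file(script_path: str) -> Path:
    """Return the confusables.txt path that sits beside script_path."""
    return Path(script_path).resolve().with_name("confusables.txt")


def require_data_file(path: Path) -> Path:
    """Return path if it exists; otherwise exit with a download hint."""
    if not path.is_file():
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: {path} not found; download it from {DOWNLOAD_URL}")
    return path
```

Inside a script, this would typically be called as `require_data_file(locate_data_file(__file__))`.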

Usage

Interactive (prompts for one letter when stdin is a TTY):

python3 punycode_gen.py
# Enter a letter (a-z): a

Non-interactive (recommended for scripts):

python3 punycode_gen.py --letter a
python3 punycode_gen.py -l z

Help:

python3 punycode_gen.py -h

Stdout is reconfigured to UTF-8 when the stream supports reconfigure() (sys.stdout.reconfigure, available since Python 3.7), so non-Latin scripts print reliably in most terminals.
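A command line like the one above can be built with argparse from the standard library. The sketch below is an assumption about how such a CLI might look, not the script's actual code: `parse_letter` and `use_utf8_stdout` are hypothetical names, and rejecting a missing --letter when stdin is not a TTY is a choice made for this example.

```python
import argparse
import sys


def parse_letter(argv=None, *, interactive=None):
    """Return a lowercase letter a-z from --letter/-l or an interactive prompt."""
    parser = argparse.ArgumentParser(description="List confusable variants of a letter.")
    parser.add_argument("-l", "--letter", help="target letter (a-z)")
    args = parser.parse_args(argv)
    letter = args.letter
    if letter is None:
        if interactive is None:
            interactive = sys.stdin.isatty()
        if not interactive:
            # Assumption: without a TTY there is nobody to prompt, so fail early.
            parser.error("--letter is required when stdin is not a TTY")
        letter = input("Enter a letter (a-z): ")
    letter = letter.strip().lower()
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        parser.error("expected a single letter a-z")
    return letter


def use_utf8_stdout():
    """Switch stdout to UTF-8 if the stream offers reconfigure()."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding="utf-8")
```

Checking for reconfigure with getattr keeps the sketch working when stdout has been replaced by an object that lacks the method, such as an in-memory buffer during tests.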

Output format

For the chosen letter, the tool prints a header and one line per variant:

🔎 Punycode variants for letter: 'a'

à -> xn--0ca
a -> a
А -> xn--80a
…

Each line is: glyph -> ACE string (what you would see in an ASCII-only IDN label for that single code point). Some mathematical / styled letters normalize under IDNA to plain ASCII (for example a), so the right-hand side may be a single ASCII letter instead of xn--….
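One way to obtain the ACE string for a single character is Python's built-in "idna" codec, sketched below. Whether punycode_gen.py uses exactly this call is an assumption; `ace_label` is a name chosen for this example.

```python
def ace_label(ch: str) -> str:
    """Return the IDNA ASCII-compatible encoding of a single character.

    Uses Python's built-in "idna" codec; other IDNA implementations
    may disagree on edge cases.
    """
    return ch.encode("idna").decode("ascii")


# à, Cyrillic capital А, and MATHEMATICAL BOLD SMALL A (U+1D41A)
for glyph in ("\u00e0", "\u0410", "\U0001d41a"):
    print(f"{glyph} -> {ace_label(glyph)}")
```

With CPython's codec the first two lines end in xn--0ca and xn--80a, while the mathematical letter is normalized and comes out as plain a.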

How variants are collected

Confusables graph — Pairs from confusables.txt are treated as undirected edges. Starting from the code points for the lowercase and uppercase letter, all characters in the same connected component are candidates.
Latin “same base letter” expansion — Any assigned Unicode character whose NFD form, with combining marks (Mn) removed, casefolds to the target letter (e.g. accented Latin letters for a) is included.
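The two collection steps can be sketched as below. This is an illustration under stated assumptions rather than the script's actual code: the function names are hypothetical, and this version only turns single-code-point pairs into edges, skipping entries whose target is a multi-character sequence.

```python
import unicodedata
from collections import defaultdict, deque


def parse_confusables(lines):
    """Build an undirected adjacency map from confusables.txt-style lines."""
    graph = defaultdict(set)
    for raw in lines:
        # Drop a leading BOM and anything after '#', which starts a comment.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        # Assumption: only single-code-point pairs become edges.
        if len(source) == 1 and len(target) == 1:
            graph[source].add(target)
            graph[target].add(source)
    return graph


def component(graph, seeds):
    """Return every node reachable from any seed (breadth-first search)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def has_base_letter(ch, letter):
    """True if ch, decomposed and stripped of Mn marks, casefolds to letter."""
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return stripped.casefold() == letter
```

For the letter a, the seeds would be "a" and "A"; the candidate set is then the union of `component(graph, ["a", "A"])` and every assigned character for which `has_base_letter(ch, "a")` is true.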

Results are filtered (printable, no lone control characters), then sorted by ACE string and code point for stable output.
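A rough sketch of the filter-and-sort step follows. It assumes that "printable" corresponds to str.isprintable() and that characters the idna codec rejects are dropped; both are guesses for illustration, and the function names are hypothetical.

```python
def ace_or_none(ch):
    """Return the ACE label for ch, or None if the idna codec rejects it."""
    try:
        return ch.encode("idna").decode("ascii")
    except UnicodeError:
        return None


def filter_and_sort(candidates):
    """Keep printable, encodable characters; order by (ACE string, code point)."""
    rows = []
    for ch in set(candidates):
        if not ch.isprintable():
            continue  # control characters, unassigned code points, separators
        ace = ace_or_none(ch)
        if ace:
            rows.append((ace, ord(ch), ch))
    rows.sort()  # tuples compare by ACE string first, then by code point
    return [(ch, ace) for ace, _, ch in rows]
```

Sorting on the code point as a secondary key matters because several characters can share one ACE string. For example, Cyrillic А and а both encode to xn--80a with Python's codec, so the tie-break keeps repeated runs identical.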

Performance and size

Loading and parsing confusables.txt is usually on the order of tens to low hundreds of milliseconds on a modern machine.
The Latin scan walks the full Unicode code space once per invocation; expect roughly 0.5 seconds per letter, depending on hardware.

Limitations and ethics

Output depends on the version of confusables.txt and on Python’s IDNA implementation; ACE strings may differ from other tools or registrars for edge cases.
This tool is intended for security research, education, and defensive review. Do not use it to impersonate brands, phish, or register deceptive domains.

Data license

confusables.txt is distributed by Unicode, Inc. under their terms of use. Keep their copyright and license notices in the file when redistributing it.

License

Add a LICENSE file for your preferred terms if you publish this repository publicly. Code you wrote is yours to license; confusables.txt remains subject to Unicode’s terms.
