diff --git a/blog/2026/03-14-why-identical-strings-still-fail/index.md b/blog/2026/03-14-why-identical-strings-still-fail/index.md
new file mode 100644
index 00000000000..c608b26f8dd
--- /dev/null
+++ b/blog/2026/03-14-why-identical-strings-still-fail/index.md
@@ -0,0 +1,373 @@
+---
+slug: why-identical-strings-still-fail
+title: 看起來一樣，為什麼字串還是比對失敗？
+authors: Z. Yuan
+image: /img/2026/0314-string-compare-unicode.svg
+tags: [unicode, python, text-processing]
+description: 字串看起來一樣，不代表它們真的一樣。問題通常出在 Unicode、不可見字元，以及你對電腦的過度信任。
+---
+
+你一定遇過這種事：
+
+兩段字看起來一模一樣，結果程式就是比對失敗。
+
+然後你盯著螢幕五分鐘，開始懷疑自己是不是瞎了。
+
+通常不是你瞎了。
+
+是電腦太誠實。
+
+<!-- truncate -->
+
+對人類來說，兩個字「看起來一樣」，大多就會自動被腦補成同一件事。
+
+對程式不是。
+
+程式不看感覺，它看的是：
+
+- code point
+- byte sequence
+- 正規化形式
+- 有沒有混進不可見字元
+
+只要其中一項不同，它就很有可能判定：
+
+> **不一樣就是不一樣。**
+
+很冷酷，但也沒毛病。
+
+## 一個最經典的例子：`é`
+
+先看這兩段字：
+
+```python
+s1 = "café"
+s2 = "cafe\u0301"
+
+print(s1 == s2)
+```
+
+很多人直覺會以為輸出是 `True`。
+
+實際上通常是：
+
+```python
+False
+```
+
+因為這兩個 `é`，雖然長得一樣，但底層不是同一種表示法：
+
+- `é`：單一 code point
+- `e` + `◌́`：字母 `e` 加上 combining acute accent
+
+畫面看起來差不多。
+
+但對字串比較來說，它們不是同一串東西。
+
+## 為什麼會這樣？
+
+因為 Unicode 並不是「一個字長怎樣」那麼簡單。
+
+它更像是一套規則，告訴你：
+
+- 字元怎麼編號
+- 字元怎麼組合
+- 不同平台怎麼表示它們
+
+這裡有三個層次要分清楚。
+
+### 1. Code point
+
+Unicode 會替每個字元指派一個編號，例如：
+
+- `A` → `U+0041`
+- `é` → `U+00E9`
+
+這是最基本的身分證號。
+
+### 2. Grapheme
+
+使用者眼中看到的一個「字」，不一定只由一個 code point 組成。
+
+像剛才的 `e` + 重音符號，就是一個很典型的例子。
+
+人類看到的是一個字。
+
+程式看到的可能是兩個成分。
+
+### 3. Encoding
+
+等到字串真的要存成 bytes 時，又會有 UTF-8、UTF-16 之類的編碼問題。
+
+所以「看起來一樣」這件事，在不同層次上都可能失手。
+
+## 常見地雷，不只重音符號
+
+這類問題不只發生在法文或特殊字元，很多平常資料都會踩到。
+
+### 一、全形與半形
+
+```python
+s1 = "ABC123"
+s2 = "ＡＢＣ１２３"
+
+print(s1 == s2)  # False
+```
+
+對人類來說，這只是字比較胖。
+
+對程式來說，是完全不同的字元。
+
+### 二、不可見字元
+
+最討厭的通常不是長得不一樣的字，而是你看不到的字。
+
+例如：
+
+- zero-width space
+- non-breaking space
+- directional marks
+- 文字從網頁複製時帶進來的控制字元
+
+這些東西混進資料後，畫面還是很乾淨，只有你的比對結果開始發瘋。
+
+### 三、大小寫不是你想的那麼簡單
+
+很多人以為 case-insensitive compare 只要 `lower()` 就好。
+
+不一定。
+
+某些語言的大小寫轉換規則沒那麼樸素，Unicode 也不是全世界都只講英文。
+
+如果你真的要做 Unicode 層級的大小寫無關比較，通常會更偏向使用：
+
+```python
+text.casefold()
+```
+
+而不是只靠 `lower()`。
+
+## 解法：先正規化，再談比較
+
+這種問題的標準處理方式叫做 **Unicode normalization**。
+
+Python 內建的 `unicodedata` 就能做：
+
+```python
+import unicodedata
+
+s1 = "café"
+s2 = "cafe\u0301"
+
+n1 = unicodedata.normalize("NFC", s1)
+n2 = unicodedata.normalize("NFC", s2)
+
+print(n1 == n2)  # True
+```
+
+這時候兩邊就會先被整理成相同的表示形式，再做比較。
+
+終於肯講人話了。
+
+## NFC、NFD、NFKC、NFKD 到底差在哪？
+
+這四個名字第一次看很像亂碼，實際上只是在回答兩個問題：
+
+1. 要不要拆開？
+2. 要不要做相容性轉換？
+
+### 1. NFC
+
+**Canonical Composition**
+
+傾向把可合併的字元組合回去。
+
+例如：
+
+- `e` + accent → `é`
+
+這通常是**最保守也最常用**的選擇。
+
+如果你的需求是：
+
+- 儲存一般文字
+- 做穩定比對
+- 保留原始語意
+
+那大多數情況下，先試 `NFC` 就對了。
+
+### 2. NFD
+
+**Canonical Decomposition**
+
+把可組合字元拆開。
+
+比較常見於某些文字分析流程，或你真的需要逐個組件處理字元時。
+
+一般業務系統不太會把它當預設格式。
+
+### 3. NFKC
+
+**Compatibility Decomposition, followed by Canonical Composition**
+
+除了標準正規化之外，還會做「相容性」層級的轉換。
+
+例如某些：
+
+- 全形字
+- 相容字元
+- 視覺上接近但語意被 Unicode 視為可折疊的形式
+
+都可能被收斂成更統一的結果。
+
+這很有用。
+
+也很危險。
+
+因為它做得比較多，所以適合：
+
+- 搜尋索引
+- 使用者輸入清理
+- 帳號、識別碼這種你想盡量收斂格式的欄位
+
+但如果你處理的是：
+
+- 法律文本
+- 排版敏感內容
+- 必須保留原貌的資料
+
+那就不要隨手上 `NFKC`。
+
+### 4. NFKD
+
+拆開版的 compatibility normalization。
+
+除非你真的知道自己在做什麼，不然大部分時候不會先選它。
+
+## 一個比較像樣的清理流程
+
+實務上，比對文字通常不只做 normalization。
+
+還會一起處理：
+
+- Unicode normalization
+- case folding
+- 空白整理
+- 不可見控制字元移除
+
+例如：
+
+```python
+import re
+import unicodedata
+
+
+def normalize_text(text: str) -> str:
+    text = unicodedata.normalize("NFKC", text)
+    text = text.casefold()
+    text = re.sub(r"[\u200b-\u200f\u202a-\u202e\u2060-\u206f]", "", text)
+    text = re.sub(r"\s+", " ", text).strip()
+    return text
+
+
+s1 = " Docsaid\u00A0Lab "
+s2 = "docsaid lab"
+
+print(normalize_text(s1) == normalize_text(s2))  # True
+```
+
+這個版本已經比單純 `strip().lower()` 靠譜很多。
+
+至少不會一邊自信，一邊出錯。
+
+## 但不要什麼都正規化
+
+這裡有個很常見的過度工程：
+
+> 「反正 normalization 很好用，那我全部欄位都先做一遍。」
+
+別。
+
+有些資料不能亂動。
+
+例如：
+
+- 密碼
+- token
+- 簽章資料
+- 雜湊前原文
+- 需要逐 byte 保真的欄位
+
+這些東西只要你先正規化，後面就可能整串對不起來。
+
+有些系統甚至不是壞在比較，而是壞在你「好心幫它整理過」。
+
+工程界有很多 bug，就是這樣被做出來的。
+
+## 什麼時候該用哪一種？
+
+如果你懶得記規格，可以先記這個粗暴版本：
+
+- **一般文字儲存 / 顯示**：先考慮 `NFC`
+- **搜尋、帳號、使用者輸入比對**：考慮 `NFKC` + `casefold()`
+- **安全敏感資料**：不要亂正規化
+- **看到明明一樣卻比對失敗**：先懷疑 Unicode，再懷疑人生
+
+這個順序比較省時間。
+
+## 怎麼快速排查？
+
+當你懷疑字串有鬼，不要只 `print(text)`。
+
+那通常沒有用。
+
+請直接看它的表示方式：
+
+```python
+text = "cafe\u0301"
+
+print(repr(text))
+print([hex(ord(ch)) for ch in text])
+```
+
+輸出會像這樣：
+
+```python
+'cafe\u0301'
+['0x63', '0x61', '0x66', '0x65', '0x301']
+```
+
+這時你就知道，不是資料庫在針對你，也不是 Python 今天心情不好。
+
+是字串裡真的多了一個 combining mark。
+
+## 最後
+
+字串比對失敗，很多時候不是邏輯太複雜。
+
+而是你以為「看起來一樣」就等於「底層一樣」。
+
+這個假設對人類合理，對電腦不合理。
+
+電腦不會幫你腦補。
+
+它只會安靜地回你一個 `False`，然後看你自己崩潰。
+
+所以，如果你有以下症狀：
+
+- 從網頁貼過來的字一直對不起來
+- 使用者名稱明明一樣卻查不到
+- 多語系文本在搜尋和去重時怪怪的
+- 比對前你只做了 `lower().strip()` 然後很有信心
+
+那你現在該做的事情大概不是再加一個 `if`。
+
+而是先去把 Unicode 正規化補上。
+
+這比較像在修 bug，不像在祈禱。
+
+## 參考資料
+
+- [Unicode Standard Annex #15: Unicode Normalization Forms](https://unicode.org/reports/tr15/)
+- [Python `unicodedata` Documentation](https://docs.python.org/3/library/unicodedata.html)
+- [Python `str.casefold`](https://docs.python.org/3/library/stdtypes.html#str.casefold)
diff --git a/i18n/en/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md b/i18n/en/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md
new file mode 100644
index 00000000000..f568788b2cd
--- /dev/null
+++ b/i18n/en/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md
@@ -0,0 +1,378 @@
+---
+slug: why-identical-strings-still-fail
+title: They Look the Same. Why Do Strings Still Fail to Match?
+authors: Z. Yuan
+image: /en/img/2026/0314-string-compare-unicode.svg
+tags: [unicode, python, text-processing]
+description: Two strings can look identical and still fail to match. The usual suspects are Unicode, invisible characters, and misplaced trust in computers.
+---
+
+You have probably seen this before:
+
+Two strings look exactly the same, and the comparison still fails.
+
+Then you stare at the screen for five minutes and begin to wonder whether your eyes have stopped working.
+
+Usually, your eyes are fine.
+
+The computer is just being painfully honest.
+
+<!-- truncate -->
+
+For humans, “looks the same” is often good enough.
+
+For code, it is not.
+
+Code does not care about vibes. It cares about:
+
+- code points
+- byte sequences
+- normalization forms
+- whether invisible characters are hiding inside the string
+
+If any of those differ, the answer may simply be:
+
+> **Different is different.**
+
+Cold, yes.
+
+Wrong? Not really.
+
+## The classic example: `é`
+
+Take these two strings:
+
+```python
+s1 = "caf\u00e9"  # é as one precomposed code point (U+00E9)
+s2 = "cafe\u0301"
+
+print(s1 == s2)
+```
+
+A lot of people expect `True`.
+
+With `s1` in precomposed form, which is what most keyboards and editors produce, you get:
+
+```python
+False
+```
+
+Why? Because these two versions of `é` are not represented the same way:
+
+- `é`: a single code point
+- `e` + `◌́`: the letter `e` followed by a combining acute accent
+
+They look the same on screen.
+
+They are not the same string underneath.
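+
+A quick way to confirm the difference is to list the code points in each string. The sketch below uses only the standard `unicodedata` module; the helper name `code_points` is made up for this illustration.
+
+```python
+import unicodedata
+
+
+def code_points(text: str) -> list[str]:
+    """Return one 'U+XXXX NAME' entry per code point in text."""
+    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]
+
+
+composed = "caf\u00e9"     # é as a single precomposed code point
+decomposed = "cafe\u0301"  # e followed by a combining acute accent
+
+print(len(composed), code_points(composed)[-1])
+# 4 U+00E9 LATIN SMALL LETTER E WITH ACUTE
+print(len(decomposed), code_points(decomposed)[-1])
+# 5 U+0301 COMBINING ACUTE ACCENT
+```
+
+Same glyphs on screen, but one string has four code points and the other has five.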
+
+## Why does this happen?
+
+Because Unicode is not just a catalog of what each character looks like.
+
+It is a system that defines:
+
+- how characters are assigned numbers
+- how they can be combined
+- how they are encoded as bytes for storage and exchange
+
+There are three layers worth separating.
+
+### 1. Code point
+
+Unicode assigns an identifier to each character, for example:
+
+- `A` → `U+0041`
+- `é` → `U+00E9`
+
+Think of this as the character’s ID card.
+
+### 2. Grapheme
+
+What a user sees as one visible character, which Unicode calls a grapheme cluster, is not always one code point.
+
+The `e` plus accent example is a classic case.
+
+Humans see one character.
+
+Your program may see two pieces.
+
+### 3. Encoding
+
+Once strings become bytes, encoding adds another layer: the same code points produce different byte sequences under UTF-8, UTF-16, and so on.
+
+So “looks the same” can fail at multiple layers.
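+
+To make the encoding layer concrete, here is a small sketch that encodes both forms of `é` in two encodings. The variable names are illustrative only.
+
+```python
+composed = "\u00e9"     # é as one code point
+decomposed = "e\u0301"  # e plus combining acute accent
+
+for label, text in (("composed", composed), ("decomposed", decomposed)):
+    print(label, text.encode("utf-8"), text.encode("utf-16-le"))
+
+# composed b'\xc3\xa9' b'\xe9\x00'
+# decomposed b'e\xcc\x81' b'e\x00\x01\x03'
+```
+
+Two representations times two encodings gives four different byte sequences for what a reader sees as one letter.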
+
+## The usual traps are not limited to accents
+
+This problem is not only about French text or unusual symbols. Plenty of ordinary data can trigger it.
+
+### 1. Full-width vs half-width characters
+
+```python
+s1 = "ABC123"
+s2 = "ＡＢＣ１２３"
+
+print(s1 == s2)  # False
+```
+
+To humans, this is the same text wearing a wider coat.
+
+To a program, these are different characters.
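+
+Compatibility normalization, covered later in this post, is what folds these together. A minimal sketch, with the full-width text written as escapes so the code points are unambiguous:
+
+```python
+import unicodedata
+
+half = "ABC123"
+full = "\uff21\uff22\uff23\uff11\uff12\uff13"  # full-width ABC123
+
+# Canonical normalization leaves full-width forms untouched.
+print(unicodedata.normalize("NFC", full) == half)   # False
+
+# Compatibility normalization maps them to their ASCII counterparts.
+print(unicodedata.normalize("NFKC", full) == half)  # True
+```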
+
+### 2. Invisible characters
+
+The worst characters are often the ones you cannot see.
+
+For example:
+
+- zero-width spaces
+- non-breaking spaces
+- directional marks
+- control characters copied from web pages or office documents
+
+Once these get into your data, the text still looks clean.
+Your comparison logic, however, starts having opinions.
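+
+One way to surface these characters is to scan by Unicode general category. This is a sketch under stated assumptions: the helper `find_invisible` is hypothetical, and its choice of categories is a starting point, not a complete list of everything that can hide in text.
+
+```python
+import unicodedata
+
+
+def find_invisible(text: str) -> list[tuple[int, str, str]]:
+    """List (index, code point, name) for characters that are easy to miss on screen."""
+    hits = []
+    for index, ch in enumerate(text):
+        category = unicodedata.category(ch)
+        # Cf = format characters (zero-width space, directional marks),
+        # Cc = control characters (this also flags tab and newline),
+        # Zs = space separators; the ordinary ASCII space is allowed through.
+        if category in ("Cf", "Cc") or (category == "Zs" and ch != " "):
+            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
+    return hits
+
+
+sample = "doc\u200bsaid\u00a0lab\u200e"
+for hit in find_invisible(sample):
+    print(hit)
+
+# (3, 'U+200B', 'ZERO WIDTH SPACE')
+# (8, 'U+00A0', 'NO-BREAK SPACE')
+# (12, 'U+200E', 'LEFT-TO-RIGHT MARK')
+```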
+
+### 3. Case conversion is not always as simple as you think
+
+A lot of people assume case-insensitive comparison means `lower()` and move on.
+
+Not always.
+
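+Unicode covers far more than English, and some case mappings are not one-to-one. German `ß`, for example, is left untouched by `lower()` but folded to `ss` by `casefold()`, so only the latter matches `STRASSE`.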
+
+If you actually want Unicode-aware case-insensitive comparison, this is usually closer to what you want:
+
+```python
+text.casefold()
+```
+
+Not just `lower()`.
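+
+A short sketch of the difference, using the German sharp s as the test case:
+
+```python
+a = "Stra\u00dfe"  # Straße
+b = "STRASSE"
+
+print(a.lower() == b.lower())        # False: "straße" vs "strasse"
+print(a.casefold() == b.casefold())  # True: both become "strasse"
+```
+
+Note that `casefold()` is locale-independent, so it does not apply language-specific rules such as the Turkish dotted and dotless i.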
+
+## The fix: normalize first, compare second
+
+The standard solution here is **Unicode normalization**.
+
+Python already gives you the tool:
+
+```python
+import unicodedata
+
+s1 = "café"
+s2 = "cafe\u0301"
+
+n1 = unicodedata.normalize("NFC", s1)
+n2 = unicodedata.normalize("NFC", s2)
+
+print(n1 == n2)  # True
+```
+
+Now both strings are converted to the same normalized form before comparison.
+
+At that point, the computer finally starts behaving like a reasonable colleague.
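+
+The form itself matters less than using the same one on both sides. The sketch below repeats the comparison with NFD and also shows `unicodedata.is_normalized`, which is available from Python 3.8 and checks a string without building a new one.
+
+```python
+import unicodedata
+
+s1 = "caf\u00e9"
+s2 = "cafe\u0301"
+
+# Either canonical form works, as long as both sides use it.
+for form in ("NFC", "NFD"):
+    same = unicodedata.normalize(form, s1) == unicodedata.normalize(form, s2)
+    print(form, same)  # NFC True, then NFD True
+
+# Which of the inputs was already in NFC?
+print(unicodedata.is_normalized("NFC", s1))  # True
+print(unicodedata.is_normalized("NFC", s2))  # False
+```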
+
+## What do NFC, NFD, NFKC, and NFKD actually mean?
+
+The names look unpleasant at first, but they answer only two questions:
+
+1. should the result be left decomposed or recomposed?
+2. should compatibility transformations be applied?
+
+### 1. NFC
+
+**Canonical Composition**
+
+It decomposes first, then recombines sequences into precomposed characters wherever one exists.
+
+For example:
+
+- `e` + accent → `é`
+
+This is usually the safest and most common choice.
+
+If your goal is:
+
+- storing text
+- doing stable comparisons
+- preserving meaning
+
+then `NFC` is a very reasonable default.
+
+### 2. NFD
+
+**Canonical Decomposition**
+
+It breaks combined characters into components.
+
+This is more useful in specialized text-processing workflows where you actually want to operate on the pieces.
+
+Most business systems do not use it as the default storage form.
+
+### 3. NFKC
+
+**Compatibility Decomposition, followed by Canonical Composition**
+
+This goes beyond canonical normalization and also applies compatibility-level transformations.
+
+That means some things such as:
+
+- full-width characters
+- ligatures such as `ﬁ` (folded to `fi`)
+- superscripts and circled digits such as `²` and `①` (folded to `2` and `1`)
+
+may be collapsed into a more unified representation.
+
+This is powerful.
+
+It is also lossy: once a ligature such as `ﬁ` has been folded to `fi`, nothing in the string records what it used to be.
+
+It is useful for:
+
+- search indexing
+- cleaning user input
+- usernames or identifiers where you want more aggressive normalization
+
+But if you are dealing with:
+
+- legally sensitive text
+- layout-sensitive content
+- data that must preserve original form exactly
+
+then do not reach for `NFKC` casually.
+
+### 4. NFKD
+
+This is the decomposed version of compatibility normalization.
+
+Unless you know exactly why you need it, it is usually not your first choice.
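+
+To see all four forms side by side, the sketch below runs one sample through each of them. The sample string is an assumption chosen for illustration: a `ﬁ` ligature (which only compatibility forms touch) followed by a precomposed `é` (which only decomposed forms split).
+
+```python
+import unicodedata
+
+sample = "\ufb01\u00e9"  # ﬁ ligature + precomposed é
+
+for form in ("NFC", "NFD", "NFKC", "NFKD"):
+    result = unicodedata.normalize(form, sample)
+    print(form, len(result), [f"U+{ord(ch):04X}" for ch in result])
+
+# NFC 2 ['U+FB01', 'U+00E9']
+# NFD 3 ['U+FB01', 'U+0065', 'U+0301']
+# NFKC 3 ['U+0066', 'U+0069', 'U+00E9']
+# NFKD 4 ['U+0066', 'U+0069', 'U+0065', 'U+0301']
+```
+
+The "K" forms expand the ligature; the "D" forms split the accent off the `e`.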
+
+## A more realistic text-cleaning pipeline
+
+In practice, string comparison often needs more than normalization alone.
+
+You may also want:
+
+- Unicode normalization
+- case folding
+- whitespace cleanup
+- removal of invisible formatting characters
+
+For example:
+
+```python
+import re
+import unicodedata
+
+
+def normalize_text(text: str) -> str:
+    text = unicodedata.normalize("NFKC", text)
+    text = text.casefold()
+    text = re.sub(r"[\u200b-\u200f\u202a-\u202e\u2060-\u206f]", "", text)
+    text = re.sub(r"\s+", " ", text).strip()
+    return text
+
+
+s1 = " Docsaid\u00A0Lab "
+s2 = "docsaid lab"
+
+print(normalize_text(s1) == normalize_text(s2))  # True
+```
+
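+That is already far more reliable than `strip().lower()`.
+
+One caveat: the stripped range includes U+200C and U+200D, the zero-width non-joiner and joiner, which shape text in scripts such as Persian and hold emoji sequences together, so narrow the range if your data contains either.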
+
+At least now the confidence is somewhat deserved.
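+
+There is also an ordering caveat: for a handful of characters, `casefold()` can return a string that is no longer normalized. The Python Unicode HOWTO handles this by normalizing both before and after folding. The sketch below follows that pattern with NFD; the helper names are illustrative, and it performs canonical matching only, so it does not fold full-width forms the way `normalize_text` does.
+
+```python
+import unicodedata
+
+
+def nfd(text: str) -> str:
+    return unicodedata.normalize("NFD", text)
+
+
+def compare_caseless(a: str, b: str) -> bool:
+    """Caseless match: normalize, casefold, then normalize again."""
+    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())
+
+
+print(compare_caseless("Stra\u00dfe", "STRASSE"))   # True
+print(compare_caseless("caf\u00e9", "CAFE\u0301"))  # True
+print(compare_caseless("cafe", "caf\u00e9"))        # False: the accent still counts
+```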
+
+## But do not normalize everything
+
+A common overcorrection looks like this:
+
+> “Normalization works well. I will apply it to every field.”
+
+No.
+
+Some data should not be touched.
+
+For example:
+
+- passwords
+- tokens
+- signed payloads
+- pre-hash source text
+- any field that must preserve byte-level fidelity
+
+If you normalize those after a hash or signature has been computed over the original bytes, verification quietly stops matching and the whole pipeline breaks.
+
+Some systems do not fail because comparison is hard.
+They fail because someone helpfully “cleaned” the data first.
+
+Engineering has produced many bugs this way.
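+
+A minimal sketch of how this goes wrong, assuming a hypothetical layer that normalizes a value after its digest was already computed from the original bytes:
+
+```python
+import hashlib
+import unicodedata
+
+original = "cafe\u0301"                          # what was originally hashed
+cleaned = unicodedata.normalize("NFC", original)  # what a "helpful" layer produced
+
+digest_original = hashlib.sha256(original.encode("utf-8")).hexdigest()
+digest_cleaned = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
+
+print(original.encode("utf-8") == cleaned.encode("utf-8"))  # False
+print(digest_original == digest_cleaned)                    # False
+```
+
+The text still looks identical, but the bytes changed, so the digest no longer matches.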
+
+## So which one should you use?
+
+If you do not want to memorize the spec, remember this rough rule:
+
+- **general text storage / display**: start with `NFC`
+- **search, usernames, user-input comparison**: consider `NFKC` + `casefold()`
+- **security-sensitive data**: do not normalize casually
+- **if matching fails even though text looks identical**: suspect Unicode before suspecting your sanity
+
+That ordering saves time.
+
+## How do you debug this quickly?
+
+When you suspect a string is hiding something, do not just `print(text)`.
+
+That is often useless, because printing renders the glyphs and the two forms look identical again.
+
+Inspect the representation directly:
+
+```python
+text = "cafe\u0301"
+
+print(repr(text))
+print([hex(ord(ch)) for ch in text])
+```
+
+You will get something like:
+
+```python
+'cafe\u0301'
+['0x63', '0x61', '0x66', '0x65', '0x301']
+```
+
+At that point, you know the database is not targeting you personally and Python has not developed attitude.
+
+There really is a combining mark in the string.
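+
+When you have two strings that should match, it can be quicker to ask where they first diverge. The helper below is a hypothetical sketch, not part of any library:
+
+```python
+from itertools import zip_longest
+
+
+def _label(ch):
+    """Format a character as U+XXXX, or mark the end of the shorter string."""
+    return "<end>" if ch is None else f"U+{ord(ch):04X}"
+
+
+def first_difference(a: str, b: str):
+    """Return (index, code point in a, code point in b) at the first mismatch, else None."""
+    for index, (ch_a, ch_b) in enumerate(zip_longest(a, b)):
+        if ch_a != ch_b:
+            return index, _label(ch_a), _label(ch_b)
+    return None
+
+
+print(first_difference("caf\u00e9", "cafe\u0301"))  # (3, 'U+00E9', 'U+0065')
+print(first_difference("abc", "abc\u200b"))          # (3, '<end>', 'U+200B')
+```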
+
+## Final words
+
+String matching failures are often not a sign that your logic is complicated.
+
+They are a sign that you assumed “visually identical” meant “structurally identical.”
+
+That is a reasonable assumption for humans.
+It is not a reasonable assumption for computers.
+
+Computers do not fill in the blanks for you.
+
+They quietly return `False` and let you experience character development.
+
+So if you are dealing with any of these:
+
+- copied text from web pages that never matches
+- usernames that look identical but fail lookup
+- multilingual text behaving strangely in search or deduplication
+- a comparison pipeline built on `lower().strip()` and optimism
+
+then the next step is probably not another `if` statement.
+
+It is Unicode normalization.
+
+That looks more like debugging and less like prayer.
+
+## References
+
+- [Unicode Standard Annex #15: Unicode Normalization Forms](https://unicode.org/reports/tr15/)
+- [Python `unicodedata` Documentation](https://docs.python.org/3/library/unicodedata.html)
+- [Python `str.casefold`](https://docs.python.org/3/library/stdtypes.html#str.casefold)
diff --git a/i18n/ja/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md b/i18n/ja/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md
new file mode 100644
index 00000000000..63af0cf036d
--- /dev/null
+++ b/i18n/ja/docusaurus-plugin-content-blog/2026/03-14-why-identical-strings-still-fail/index.md
@@ -0,0 +1,363 @@
+---
+slug: why-identical-strings-still-fail
+title: 同じに見えるのに、なぜ文字列比較は失敗するのか？
+authors: Z. Yuan
+image: /ja/img/2026/0314-string-compare-unicode.svg
+tags: [unicode, python, text-processing]
+description: 見た目が同じでも、文字列が同じとは限りません。原因はたいてい Unicode、不可視文字、そしてコンピュータへの雑な期待です。
+---
+
+こういう経験はたぶん一度はあるはずです。
+
+文字列はまったく同じに見えるのに、比較すると失敗する。
+
+そして画面を五分くらい見つめたあと、自分の目がおかしくなったのかと思い始める。
+
+たいてい、目は悪くありません。
+
+コンピュータが妙に正直なだけです。
+
+<!-- truncate -->
+
+人間にとって「同じに見える」は、だいたい十分です。
+
+コードにとっては違います。
+
+コードが見ているのは、感覚ではなく次のようなものです。
+
+- code point
+- byte sequence
+- 正規化形式
+- 文字列の中に不可視文字が混じっていないか
+
+そのどれかが違えば、答えはこうなります。
+
+> **違うものは違う。**
+
+冷たいですが、別に間違ってはいません。
+
+## 定番の例：`é`
+
+まずはこの二つを見てください。
+
+```python
+s1 = "café"
+s2 = "cafe\u0301"
+
+print(s1 == s2)
+```
+
+多くの人は `True` を期待します。
+
+でも実際には、たいていこうなります。
+
+```python
+False
+```
+
+理由は単純で、この二つの `é` は内部表現が同じではないからです。
+
+- `é`：単一の code point
+- `e` + `◌́`：`e` のあとに combining acute accent
+
+画面では同じに見えても、文字列としては別物です。
+
+## どうしてこうなるのか？
+
+Unicode は、単なる「この字はこう見える」という話ではありません。
+
+むしろ次のことを定める仕組みです。
+
+- 文字にどう番号を振るか
+- 文字をどう組み合わせるか
+- それを各環境でどう表現するか
+
+ここでは三つの層を分けて考えると分かりやすいです。
+
+### 1. Code point
+
+Unicode は各文字に識別子を割り当てます。たとえば：
+
+- `A` → `U+0041`
+- `é` → `U+00E9`
+
+文字の身分証みたいなものです。
+
+### 2. Grapheme
+
+ユーザーが「一文字」と認識するものが、必ずしも一つの code point とは限りません。
+
+`e` とアクセントの組み合わせは、その典型です。
+
+人間は一文字と見ます。
+
+プログラムは二つの部品と見ているかもしれません。
+
+### 3. Encoding
+
+さらに文字列が bytes になる段階では、UTF-8 や UTF-16 のような encoding も関わってきます。
+
+つまり「同じに見える」は、いくつもの層で簡単に裏切られます。
+
+## よくある地雷は、アクセントだけではない
+
+この問題はフランス語や特殊文字だけの話ではありません。普段のデータでも普通に起きます。
+
+### 1. 全角と半角
+
+```python
+s1 = "ABC123"
+s2 = "ＡＢＣ１２３"
+
+print(s1 == s2)  # False
+```
+
+人間から見ると、少し幅が広いだけです。
+
+プログラムから見ると、別の文字です。
+
+### 2. 不可視文字
+
+面倒なのは、違って見える文字より、見えない文字です。
+
+たとえば：
+
+- zero-width space
+- non-breaking space
+- directional marks
+- Web ページや Office 文書から混入した制御文字
+
+これらが入っても画面はきれいなままです。
+
+壊れるのは、だいたい比較処理の方です。
+
+### 3. 大文字・小文字も思ったほど単純ではない
+
+case-insensitive compare は `lower()` で十分、と思っている人は多いです。
+
+残念ながら、必ずしもそうではありません。
+
+Unicode は英語だけの世界ではありませんし、言語によっては大小文字変換がもっと癖のある動きをします。
+
+Unicode を意識した大小文字無視の比較なら、たいていはこちらの方がましです。
+
+```python
+text.casefold()
+```
+
+`lower()` だけで済ませない方が安全です。
+
+## 解決策：まず正規化してから比較する
+
+こういう問題の標準的な対処は **Unicode normalization** です。
+
+Python なら `unicodedata` が最初から使えます。
+
+```python
+import unicodedata
+
+s1 = "café"
+s2 = "cafe\u0301"
+
+n1 = unicodedata.normalize("NFC", s1)
+n2 = unicodedata.normalize("NFC", s2)
+
+print(n1 == n2)  # True
+```
+
+両方を同じ正規化形式にそろえてから比較すれば、ようやく話が通じます。
+
+## NFC、NFD、NFKC、NFKD は何が違うのか？
+
+最初は暗号みたいに見えますが、実際には二つの問いに答えているだけです。
+
+1. 分解するか？
+2. compatibility 変換までやるか？
+
+### 1. NFC
+
+**Canonical Composition**
+
+可能なものは合成した形に寄せます。
+
+たとえば：
+
+- `e` + accent → `é`
+
+これは一番無難で、よく使われる選択です。
+
+用途が次のようなものなら、まず `NFC` を考えれば大きく外しません。
+
+- 普通の文字列保存
+- 安定した比較
+- 意味を保ったまま整形したい場合
+
+### 2. NFD
+
+**Canonical Decomposition**
+
+合成文字を分解します。
+
+文字の構成要素を個別に扱いたい処理では役立ちますが、一般的な業務システムの保存形式としてはあまり選ばれません。
+
+### 3. NFKC
+
+**Compatibility Decomposition, followed by Canonical Composition**
+
+標準的な正規化に加えて、compatibility レベルの変換も行います。
+
+つまり、たとえば次のようなものが、より統一された形に寄せられる可能性があります。
+
+- 全角文字
+- compatibility 文字
+- 見た目が似ていて Unicode 上は折りたためる形式
+
+便利です。
+
+同時に、雑に使うと危険です。
+
+向いているのは：
+
+- 検索インデックス
+- ユーザー入力の整理
+- ユーザー名や識別子の比較
+
+逆に、次のようなものには慎重になるべきです。
+
+- 法的に厳密な文面
+- レイアウト依存の内容
+- 元の見た目を正確に残す必要があるデータ
+
+### 4. NFKD
+
+compatibility normalization の分解版です。
+
+明確な理由がないなら、最初に選ぶことはあまりありません。
+
+## 実務では、もう少しまとめて処理する
+
+実際の文字列比較は、normalization だけで終わらないことが多いです。
+
+たとえば次のような処理も一緒に入ります。
+
+- Unicode normalization
+- case folding
+- 空白整理
+- 不可視の整形文字の除去
+
+例を挙げると、こうなります。
+
+```python
+import re
+import unicodedata
+
+
+def normalize_text(text: str) -> str:
+    text = unicodedata.normalize("NFKC", text)
+    text = text.casefold()
+    text = re.sub(r"[\u200b-\u200f\u202a-\u202e\u2060-\u206f]", "", text)
+    text = re.sub(r"\s+", " ", text).strip()
+    return text
+
+
+s1 = " Docsaid\u00A0Lab "
+s2 = "docsaid lab"
+
+print(normalize_text(s1) == normalize_text(s2))  # True
+```
+
+これなら `strip().lower()` よりはずっとまともです。
+
+少なくとも、自信満々に間違える確率は下がります。
+
+## ただし、何でも正規化すればよいわけではない
+
+ここでありがちな過剰対応があります。
+
+> 「正規化は便利だ。全部のフィールドにかけよう。」
+
+やめた方がいいです。
+
+触ってはいけないデータがあります。
+
+たとえば：
+
+- パスワード
+- token
+- 署名対象データ
+- hash 前の原文
+- byte 単位で厳密性が必要なフィールド
+
+こういうものを勝手に正規化すると、後で全部つじつまが合わなくなります。
+
+比較が難しいのではなく、誰かが親切のつもりで壊しているだけ、という事故は珍しくありません。
+
+## では、どれを使うべきか？
+
+仕様を全部覚えたくないなら、この雑だが実用的なルールで十分です。
+
+- **一般的な文字列保存 / 表示**：まず `NFC`
+- **検索、ユーザー名、入力比較**：`NFKC` + `casefold()` を検討
+- **セキュリティ敏感なデータ**：むやみに正規化しない
+- **見た目は同じなのに比較が失敗する**：まず Unicode を疑う
+
+この順番の方が時間を無駄にしません。
+
+## どうやって素早く調べるか？
+
+文字列に何か潜んでいそうなら、`print(text)` だけでは足りません。
+
+たいてい、それでは何も分かりません。
+
+表現を直接見ます。
+
+```python
+text = "cafe\u0301"
+
+print(repr(text))
+print([hex(ord(ch)) for ch in text])
+```
+
+こんな出力になります。
+
+```python
+'cafe\u0301'
+['0x63', '0x61', '0x66', '0x65', '0x301']
+```
+
+これで、データベースがあなたを嫌っているわけでも、Python が急に気難しくなったわけでもないと分かります。
+
+文字列の中に、本当に combining mark が入っているだけです。
+
+## 最後に
+
+文字列比較の失敗は、必ずしもロジックが難しいせいではありません。
+
+多くの場合は、「見た目が同じなら中身も同じだろう」という前提が崩れているだけです。
+
+その前提は人間には自然です。
+
+コンピュータには自然ではありません。
+
+コンピュータは補完してくれません。
+
+静かに `False` を返して、こちらに学習を要求してくるだけです。
+
+もし今あなたが、こんな症状を見ているなら：
+
+- Web から貼った文字列がどうしても一致しない
+- 同じに見えるユーザー名が検索で出てこない
+- 多言語テキストの検索や重複排除が妙に怪しい
+- 比較前に `lower().strip()` だけやって安心していた
+
+次にやるべきことは、`if` を足すことではたぶんありません。
+
+Unicode 正規化です。
+
+その方が、祈るよりはずっと工学的です。
+
+## 参考資料
+
+- [Unicode Standard Annex #15: Unicode Normalization Forms](https://unicode.org/reports/tr15/)
+- [Python `unicodedata` Documentation](https://docs.python.org/3/library/unicodedata.html)
+- [Python `str.casefold`](https://docs.python.org/3/library/stdtypes.html#str.casefold)
diff --git a/static/img/2026/0314-string-compare-unicode.svg b/static/img/2026/0314-string-compare-unicode.svg
new file mode 100644
index 00000000000..29975987cf9
--- /dev/null
+++ b/static/img/2026/0314-string-compare-unicode.svg
@@ -0,0 +1,21 @@
+<svg width="1200" height="630" viewBox="0 0 1200 630" fill="none" xmlns="http://www.w3.org/2000/svg">
+  <defs>
+    <linearGradient id="bg" x1="0" y1="0" x2="1200" y2="630" gradientUnits="userSpaceOnUse">
+      <stop stop-color="#0B1020"/>
+      <stop offset="1" stop-color="#1D428A"/>
+    </linearGradient>
+  </defs>
+  <rect width="1200" height="630" rx="28" fill="url(#bg)"/>
+  <rect x="72" y="76" width="1056" height="478" rx="26" fill="#0F172A" fill-opacity="0.55" stroke="#93C5FD" stroke-opacity="0.25"/>
+  <text x="112" y="176" fill="#F8FAFC" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif" font-size="54" font-weight="700">They look the same.</text>
+  <text x="112" y="238" fill="#F8FAFC" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif" font-size="54" font-weight="700">Why do strings still fail?</text>
+  <text x="112" y="306" fill="#BFDBFE" font-family="SFMono-Regular, Menlo, Consolas, monospace" font-size="30">"café" != "cafe◌́"</text>
+  <text x="112" y="360" fill="#CBD5E1" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif" font-size="28">Unicode normalization, invisible characters, and engineer sadness.</text>
+  <rect x="746" y="146" width="288" height="96" rx="18" fill="#111827" stroke="#60A5FA" stroke-opacity="0.6"/>
+  <text x="780" y="204" fill="#E2E8F0" font-family="SFMono-Regular, Menlo, Consolas, monospace" font-size="34">NFC</text>
+  <rect x="746" y="268" width="288" height="96" rx="18" fill="#111827" stroke="#60A5FA" stroke-opacity="0.35"/>
+  <text x="780" y="326" fill="#E2E8F0" font-family="SFMono-Regular, Menlo, Consolas, monospace" font-size="34">NFKC</text>
+  <rect x="746" y="390" width="288" height="96" rx="18" fill="#111827" stroke="#60A5FA" stroke-opacity="0.2"/>
+  <text x="780" y="448" fill="#E2E8F0" font-family="SFMono-Regular, Menlo, Consolas, monospace" font-size="34">casefold()</text>
+  <text x="112" y="500" fill="#E2E8F0" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif" font-size="22">DOCSAID</text>
+</svg>