Skip to content

[Romanizer] 羅馬字顯示變調 ruby#226

Open
Aiuanyu wants to merge 1 commit intomainfrom
romanizer-sandhi-ruby-5811855358397986759
Open

[Romanizer] 羅馬字顯示變調 ruby#226
Aiuanyu wants to merge 1 commit intomainfrom
romanizer-sandhi-ruby-5811855358397986759

Conversation

@Aiuanyu
Copy link
Copy Markdown
Owner

@Aiuanyu Aiuanyu commented Apr 7, 2026

This change introduces a modular sandhi engine (HakkaSandhi) and integrates it into the Romanizer tool to display tone sandhi results using <ruby> tags, matching the visual style of dictionary search results.

Key changes:

  • New js/sandhi.js engine supporting Dapu dialect rules with a framework for future expansion.
  • Romanizer tool now applies sandhi based on selected dialect and renders ruby markup.
  • Refactored Romanizer's UI interaction logic (uppercase toggle, deletion, hyphenation) to correctly traverse nested ruby elements.
  • Synchronized dictionary sandhi logic in main.js with the new centralized engine.
  • Consistent CSS coloring for sandhi changes in both learning and Romanizer views.

Fixes #225


PR created automatically by Jules for task 5811855358397986759 started by @Aiuanyu

- Introduced `js/sandhi.js` as a centralized tone sandhi engine for Hakka dialects.
- Updated the Romanizer to display tone sandhi results using `<ruby>` tags.
- Refactored `main.js` to use the new sandhi engine for dictionary search results.
- Enhanced Romanizer logic to correctly handle manual capitalization and joining rules for syllables wrapped in `<ruby>` tags.
- Added styling for sandhi-applied ruby elements in the Romanizer tool.
- Normalized dialect name handling within the sandhi engine.

Co-authored-by: Aiuanyu <93833368+Aiuanyu@users.noreply.github.com>
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages bot commented Apr 7, 2026

Deploying hakspring with  Cloudflare Pages  Cloudflare Pages

Latest commit: b63c00a
Status: ✅  Deploy successful!
Preview URL: https://aaabbed9.hakspring.pages.dev
Branch Preview URL: https://romanizer-sandhi-ruby-581185.hakspring.pages.dev

View logs

@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Code Review — PR #226: [Romanizer] 羅馬字顯示變調 ruby

Good architectural decision to extract sandhi logic into a dedicated HakkaSandhi module. The centralization removes a lot of duplication and the IIFE pattern with a clean public API (applyToTokens, applyToHtml, hasRulesFor) is well-structured. Below are observations worth addressing before merge.


Bugs / Correctness

1. Sandhi rule divergence from old getDapuSandhiHtml (potentially intentional)

In js/sandhi.js Rule 1 (line ~32), the current-token pattern now includes circumflex vowels:

// NEW (sandhi.js)
current.match(/[àèìòùâêîôû](?![bdg])/)
// OLD (main.js getDapuSandhiHtml)
variant.match(/[àèìòù](?![bdg])/)

The comment acknowledges this ("Dapu 31 is circumflex"), so it may be intentional — but it is a silent behaviour change. If circumflex âêîôû represents tone 53 (not 31), and tone 53 has the same sandhi outcome as tone 31, adding it here is correct. Worth a short comment confirming this is deliberate.

2. Rule 2 next-syllable expansion

Old Rule 2 in main.js:

next.match(/[ǎěǐǒǔâêîôû]/)

New Rule 2 in sandhi.js:

next.match(/[ǎěǐǒǔâêîôûàèìòùáéíóú]/)

This adds grave (àèìòù) and acute (áéíóú) accented vowels as valid next-syllable triggers for the 33→35 sandhi rule. If these characters represent additional tones that correctly trigger this sandhi, the expansion is right — but it could also cause false positives. Please confirm these extra characters are intended triggers.

3. ruby.textContent setter clears children before rt is appended — order is correct but fragile

In js/romanizer.js inside populateSegmentGroup:

ruby.textContent = v.text;   // creates text node, removes any prior children
const rt = document.createElement('rt');
rt.textContent = v.newTone;
ruby.appendChild(rt);

This works because the ruby element is freshly created. However, using textContent as a setter on a parent element is a footgun — it silently removes all children. Prefer ruby.appendChild(document.createTextNode(v.text)) to make the intent explicit and safe if construction order ever changes. This pattern appears in two places in the new romanizer code.


Code Quality

4. Repeated ruby-building pattern (3x)

The same ruby construction block appears in three places: the res.variants forEach, the else if (res.newTone) branch in populateSegmentGroup, and the string template in applyToHtml. Consider extracting a small DOM helper:

function makeSandhiRuby(text, newTone, rubyClass) {
    const ruby = document.createElement('ruby');
    ruby.className = rubyClass;
    ruby.appendChild(document.createTextNode(text));
    const rt = document.createElement('rt');
    rt.textContent = newTone;
    ruby.appendChild(rt);
    return ruby;
}

5. Indentation inconsistency in main.js

The new block in applyDapuSandhiToGenerated uses 4-space indentation inside the if while surrounding code uses 2 spaces. Minor, but inconsistent with the rest of the file.

6. Dead-code guard in applyToTokens

The check currentToken.startsWith('<ruby') inside applyToTokens (sandhi.js ~line 79) is only reachable when the function is called from applyToHtml. When called from populateSegmentGroup, input tokens come from tokenizeRomanization which never produces raw HTML strings. The guard is harmless but a short comment explaining why it exists would help future readers.


Minor Notes

  • The typeof HakkaSandhi !== 'undefined' guard is checked at every call site. Since sandhi.js is loaded with defer before main.js and romanizer.js, the only realistic failure mode is a network load error. The guards are fine defensively, but an alternative is a no-op stub exported from sandhi.js itself so callers do not need the guard.
  • The window.HakkaSandhi = HakkaSandhi export at the bottom of sandhi.js works correctly for the current script-tag setup. If the project ever migrates to ES modules, swap this for a standard export.
  • No tests are included. The rule logic is pure (string in → object out), so unit tests for applyToTokens with known input/output pairs would be straightforward and would help catch regressions when adding new dialect rules in the future.

Overall this is a clean, well-motivated refactor. The main things to verify before merging are items 1 and 2 (rule correctness) and item 3 (the textContent setter footgun).

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a generalized Hakka tone sandhi engine in a new file js/sandhi.js and integrates it across the application. The changes refactor the romanizer to support rendering tone changes using <ruby> tags and replace the previous hardcoded Dapu sandhi logic in main.js with the new engine. Feedback focuses on reducing code duplication in js/romanizer.js by extracting helper functions for retrieving text nodes from syllable wrappers and for creating ruby elements.

Comment on lines 111 to 122
const textNode = Array.from(wrapper.childNodes).find(n => n.nodeType === Node.TEXT_NODE);
const rubyElement = wrapper.querySelector('ruby');

if (textNode) {
const currentText = textNode.textContent;
if (currentText[0] === currentText[0].toUpperCase()) {
textNode.textContent = currentText.toLowerCase();
} else {
textNode.textContent = currentText.charAt(0).toUpperCase() + currentText.slice(1);
handleTextNode(textNode);
} else if (rubyElement) {
// If it's a ruby, the first child node should be the text node of the base text
const rubyTextNode = Array.from(rubyElement.childNodes).find(n => n.nodeType === Node.TEXT_NODE);
if (rubyTextNode) {
handleTextNode(rubyTextNode);
}
// 手動調整後,毋使呼叫 applyCapitalizationRules,直接儲存狀態
saveState();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

這段用來尋找 syllable wrapper (<span><ruby>) 內文字節點的邏輯,在 applyCapitalizationRules 函式中 (分別是 handleLoweringhandleUpper 的部分) 也有重複。

為了提高程式碼的可維護性並減少重複,建議將這段邏輯提取到一個獨立的輔助函式中。例如,可以在 romanizer.js 的頂層範圍定義一個函式:

function findSyllableTextNode(wrapper) {
    const textNode = Array.from(wrapper.childNodes).find(n => n.nodeType === Node.TEXT_NODE);
    if (textNode) return textNode;

    const rubyElement = wrapper.querySelector('ruby');
    if (rubyElement) {
        // The base text of a <ruby> tag is its first child text node.
        return Array.from(rubyElement.childNodes).find(n => n.nodeType === Node.TEXT_NODE) || null;
    }
    return null;
}

然後,這裡的程式碼就可以簡化成:

        const nodeToModify = findSyllableTextNode(wrapper);
        if (nodeToModify) {
            handleTextNode(nodeToModify);
        }

同樣的簡化也可以應用在 applyCapitalizationRules 函式中,讓程式碼更簡潔且易於維護。

Comment on lines +541 to +568
if (res.variants) {
res.variants.forEach((v, idx) => {
if (v.newTone) {
const ruby = document.createElement('ruby');
ruby.className = v.rubyClass;
ruby.textContent = v.text;
const rt = document.createElement('rt');
rt.textContent = v.newTone;
ruby.appendChild(rt);
wrapper.appendChild(ruby);
} else {
wrapper.appendChild(document.createTextNode(v.text));
}
if (idx < res.variants.length - 1) {
wrapper.appendChild(document.createTextNode('/'));
}
});
} else if (res.newTone) {
const ruby = document.createElement('ruby');
ruby.className = res.rubyClass;
ruby.textContent = res.text;
const rt = document.createElement('rt');
rt.textContent = res.newTone;
ruby.appendChild(rt);
wrapper.appendChild(ruby);
} else {
wrapper.appendChild(document.createTextNode(token));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

if (res.variants) 區塊和 else if (res.newTone) 區塊中,創建 <ruby> 元素的程式碼是重複的。

為了讓程式碼更簡潔並易於維護,建議將這段邏輯提取到一個輔助函式中,以有效減少重複的程式碼。

                const createRubyElement = (text, newTone, className) => {
                    const ruby = document.createElement('ruby');
                    ruby.className = className;
                    ruby.textContent = text;
                    const rt = document.createElement('rt');
                    rt.textContent = newTone;
                    ruby.appendChild(rt);
                    return ruby;
                };

                if (res.variants) {
                    res.variants.forEach((v, idx) => {
                        if (v.newTone) {
                            wrapper.appendChild(createRubyElement(v.text, v.newTone, v.rubyClass));
                        } else {
                            wrapper.appendChild(document.createTextNode(v.text));
                        }
                        if (idx < res.variants.length - 1) {
                            wrapper.appendChild(document.createTextNode('/'));
                        }
                    });
                } else if (res.newTone) {
                    wrapper.appendChild(createRubyElement(res.text, res.newTone, res.rubyClass));
                } else {
                    wrapper.appendChild(document.createTextNode(token));
                }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Romanizer] 羅馬字顯示變調 ruby

1 participant