-
Notifications
You must be signed in to change notification settings - Fork 1
Potential fix for code scanning alert no. 122: Incomplete multi-character sanitization #53
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -316,9 +316,16 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| private extractTextContent(html: string): string { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return html | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<script[^>]*>.*?<\/script>/gi, '') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<style[^>]*>.*?<\/style>/gi, '') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Repeatedly remove <script> and <style> tags and their content | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| let sanitized = html; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| let previous; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| do { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| previous = sanitized; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| sanitized = sanitized | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<script[^>]*>.*?<\/script>/gis, '') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Check failure on line 325 in src/lib/search-service.ts
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+324
to
+325
Check failureCode scanning / CodeQL Incomplete multi-character sanitization High
This string may still contain
<script Error loading related location Loading |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<style[^>]*>.*?<\/style>/gis, ''); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } while (sanitized !== previous); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return sanitized | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+319
to
+328
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion ❓ Verification inconclusivePrefer DOM parsing over regex for robust tag stripping (addresses CodeQL alert #122). Regex-based HTML stripping is brittle (nested tags, malformed markup, edge whitespace). Since this runs in a browser context, use DOMParser to remove Proposed replacement within this method: private extractTextContent(html: string): string {
- // Repeatedly remove <script> and <style> tags and their content
- let sanitized = html;
- let previous;
- do {
- previous = sanitized;
- sanitized = sanitized
- .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
- .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '');
- } while (sanitized !== previous);
- return sanitized
- .replace(/<[^>]*>/g, ' ')
- .replace(/\s+/g, ' ')
- .trim();
+ // Prefer DOM parsing for robust removal of script/style content
+ if (typeof DOMParser !== 'undefined') {
+ const parser = new DOMParser();
+ const doc = parser.parseFromString(html, 'text/html');
+ doc.querySelectorAll('script, style').forEach((n) => n.remove());
+ const text = doc.body?.textContent ?? '';
+ return text.replace(/\s+/g, ' ').trim();
+ }
+ // Fallback to regex if DOMParser is unavailable (e.g., non-browser env)
+ let sanitized = html
+ .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
+ .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '')
+ .replace(/<[^>]*>/g, ' ');
+ return sanitized.replace(/\s+/g, ' ').trim();
}Verification: after applying, please re-run CodeQL on this PR to confirm alerts 185 and 186 are resolved. 🏁 Script executed: #!/bin/bash
echo "Checking for DOMParser references and runtime environment indicators..."
grep -R "DOMParser" -n src || echo "No existing DOMParser references in src/"
echo
echo "Checking for jsdom in dependencies..."
grep -R "\"jsdom\"" -n package.json || echo "No jsdom dependency in package.json"
echo
echo "Searching for browser globals usage..."
grep -R "document\." -n src || echo "No document references found in src/"
grep -R "window\." -n src || echo "No window references found in src/"
echo
echo "Locating extractTextContent usages..."
grep -R "extractTextContent" -n src || echo "No calls to extractTextContent found in src/"Length of output: 6218 Use DOMParser for robust Relying on iterative regex is brittle and can miss nested or malformed tags. Since this code runs in a browser (and under jsdom in tests), we can safely parse the HTML, remove unwanted nodes, and extract text. In non-browser environments, fall back to the original iterative regex logic. In private extractTextContent(html: string): string {
- // Repeatedly remove <script> and <style> tags and their content
- let sanitized = html;
- let previous;
- do {
- previous = sanitized;
- sanitized = sanitized
- .replace(/<script[^>]*>.*?<\/script>/gis, '')
- .replace(/<style[^>]*>.*?<\/style>/gis, '');
- } while (sanitized !== previous);
- return sanitized
- .replace(/<[^>]*>/g, ' ')
- .replace(/\s+/g, ' ')
- .trim();
+ // Prefer DOM parsing for robust removal of script/style content
+ if (typeof DOMParser !== 'undefined') {
+ const doc = new DOMParser().parseFromString(html, 'text/html');
+ doc.querySelectorAll('script, style').forEach((n) => n.remove());
+ const text = doc.body?.textContent ?? '';
+ return text.replace(/\s+/g, ' ').trim();
+ }
+ // Fallback to iterative regex removal in non-browser environments
+ let sanitized = html;
+ let previous: string;
+ do {
+ previous = sanitized;
+ sanitized = sanitized
+ .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
+ .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '');
+ } while (sanitized !== previous);
+ sanitized = sanitized.replace(/<[^>]*>/g, ' ');
+ return sanitized.replace(/\s+/g, ' ').trim();
}
After merging, please re-run CodeQL to confirm alerts 185 & 186 are resolved. 📝 Committable suggestion
Suggested change
🧰 Tools🪛 GitHub Check: CodeQL[failure] 324-325: Incomplete multi-character sanitization [failure] 325-325: Bad HTML filtering regexp 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/<[^>]*>/g, ' ') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .replace(/\s+/g, ' ') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .trim(); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+319
to
331
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Add targeted tests for tricky cases (nested, spaced closing tags, multiline). To prevent regressions and to validate the CodeQL concerns, add unit tests covering:
I can draft a small test suite (Jest/Vitest) for 🧰 Tools🪛 GitHub Check: CodeQL[failure] 324-325: Incomplete multi-character sanitization [failure] 325-325: Bad HTML filtering regexp 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit any: type
previousto satisfy strict TS (noImplicitAny).let previous;infersany, violating strict TS and our guidelines. Type it explicitly.Apply this diff:
📝 Committable suggestion
🤖 Prompt for AI Agents