Skip to content

fix(plugin-chart-ag-grid-table): use display text for filter and sort on HTML cells#39885

Open
fitzee wants to merge 3 commits intoapache:masterfrom
fitzee:fitzee/aggrid-html-text-filter
Open

fix(plugin-chart-ag-grid-table): use display text for filter and sort on HTML cells#39885
fitzee wants to merge 3 commits intoapache:masterfrom
fitzee:fitzee/aggrid-html-text-filter

Conversation

@fitzee
Copy link
Copy Markdown

@fitzee fitzee commented May 5, 2026

SUMMARY

The Table V2 chart's TextCellRenderer renders cells containing HTML markup (e.g. anchor tags) via dangerouslySetInnerHTML, but the column definition's filterValueGetter and comparator are not configured for string columns. Both filter and sort therefore operate on the raw HTML string instead of the visible text:

  • Sort: Rows are ordered by raw HTML lexically. The URL inside the markup dictates ordering instead of the visible label — e.g. all rows whose anchor href starts with https://aaa/... sort before rows with https://bbb/... regardless of what label is shown to the user.
  • "Contains" filter: Matches against the raw HTML string. Typing a fragment of the visible label may miss; typing a URL substring (e.g. https) matches every HTML row.

This change adds a small htmlTextFilterValueGetter utility (with a matching htmlTextComparator) that extracts the visible text from HTML values via DOMParser + sanitizeHtml, and wires it into the colDef for string columns. Pass-through for non-HTML values, so plain string columns are unaffected — the comparator deliberately matches AG Grid's default codepoint ordering ('Z' before 'a') so plain string columns behave identically to before.

Scope: Client-side sort and filter only. When serverPagination is enabled, the existing comparator: () => 0 override delegates sorting to the backend; the database does not know to extract visible text from HTML, so server-paginated tables with HTML columns would need server-side handling and are out of scope for this PR.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Not applicable — the bug is observable via sort order rather than a visually distinct rendering.

TESTING INSTRUCTIONS

  1. Enable the AG_GRID_TABLE_ENABLED feature flag.
  2. Create a Table V2 chart against a dataset where one string column contains HTML, e.g. via a calculated column: '<a href="https://example.com/' || id || '">label_' || id || '</a>'
  3. Before this fix: clicking the column header sorts by URL prefix, not the visible label. Typing the label text into the column's "Contains" filter may not match.
  4. After this fix: sort matches the visible label order; the "Contains" filter matches the visible label.

Unit tests added in htmlTextFilterValueGetter.test.ts cover HTML extraction (anchor and nested markup), pass-through for plain strings, nulls, and numbers, plus the comparator's ordering, null handling, and preservation of AG Grid's default codepoint ordering for plain strings.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags: AG_GRID_TABLE_ENABLED
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

@richardfogaca richardfogaca self-requested a review May 5, 2026 18:19
Copy link
Copy Markdown
Contributor

@richardfogaca richardfogaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting on Richard's behalf — this is his PR reviewer agent. Forward any pushback to him and he'll loop me back in.

Left two functional notes below. The plain-string comparator change looks like the main thing to check; the server-pagination note may just need an explicit scope/test decision. All line numbers verified against HEAD df794f79aa.

Functional — worth investigating before merge

  • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.ts:49

    localeCompare now runs for every GenericDataType.String column, not just the HTML cases. That does not preserve the previous AG Grid default string ordering for plain values; for example, simple JS comparison orders Z < a, while Z.localeCompare('a') returns the opposite order in the default runtime locale.

    WDYT — could we normalize HTML values to visible text but keep AG Grid/default comparator semantics for the final comparison, so plain string columns stay unaffected as the PR description says?

  • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts:384

    In server-pagination mode this later spread replaces the string-column htmlTextComparator from line 328 with () => 0, so server-paginated Table V2 charts will still delegate sorting to the server/raw column value path. That seems like it can leave the HTML visible-text sort gap open for the same kind of calculated HTML string column when server pagination is enabled.

    WDYT — should this mode be supported too, or should the PR make the client-side-only scope explicit and cover that expectation in tests/docs?

Praise

  • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.test.ts:30

    Nice focused coverage around visible-text extraction, nested markup, pass-through values, null handling, and comparator ordering. It makes the intended behavior of the new utility easy to reason about.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 517dd90
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/69fa8c9087ce6600083f45dc
😎 Deploy Preview https://deploy-preview-39885--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@fitzee
Copy link
Copy Markdown
Author

fitzee commented May 5, 2026

Thanks for the review (and please pass thanks to the bot wrangler). Both points addressed in 8eab7e9:

1. Comparator preserving default semantics — Replaced localeCompare() with codepoint comparison (< / >), matching AG Grid's defaultComparator behavior. Plain (non-HTML) string columns now sort identically to before this PR — e.g. 'Z' still sorts before 'a'. Added htmlTextComparator preserves default codepoint ordering for plain strings to the test file to lock that in.

2. Server-pagination scope — Agreed: server-paginated tables delegate sort to the backend (via the existing comparator: () => 0 override), and the database has no way to extract visible text from an HTML column. Rather than expand scope into server-side handling, I've documented the limitation with a comment in useColDefs.ts next to the new branch and made the client-side scope explicit in the PR description. Server-side normalization for HTML columns would be a separate, larger change.

@fitzee fitzee marked this pull request as ready for review May 6, 2026 00:13
@dosubot dosubot Bot added the viz:charts:table Related to the Table chart label May 6, 2026
Copy link
Copy Markdown
Contributor

@richardfogaca richardfogaca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, congrats on your first superset PR @fitzee 💯

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 6, 2026

Code Review Agent Run #31e869

Actionable Suggestions - 0
Review Details
  • Files reviewed - 3 · Commit Range: df794f7..8eab7e9
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.test.ts
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.ts
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Eslint (Linter) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.37%. Comparing base (d618837) to head (8eab7e9).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #39885   +/-   ##
=======================================
  Coverage   64.37%   64.37%           
=======================================
  Files        2569     2570    +1     
  Lines      134745   134765   +20     
  Branches    31278    31288   +10     
=======================================
+ Hits        86737    86757   +20     
  Misses      46510    46510           
  Partials     1498     1498           
Flag Coverage Δ
javascript 66.63% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

fitzee added 2 commits May 6, 2026 10:34
… on HTML cells

The Table V2 chart's TextCellRenderer renders cells containing HTML
markup (e.g. anchor tags) via dangerouslySetInnerHTML, but the column's
filterValueGetter and comparator were not configured for string columns.
As a result both filter and sort operated on the raw HTML string instead
of the visible text:

- Column sort ordered rows by raw HTML lexically, so the URL inside the
  markup dictated ordering instead of the visible label (e.g. all rows
  with a `https://aaa/...` URL sorted before rows with `https://bbb/...`
  regardless of the label shown to the user).
- The text "Contains" filter matched against the raw HTML, causing
  false positives on URL fragments and false negatives when typing the
  visible label.

Add a small `htmlTextFilterValueGetter` utility (with a matching
`htmlTextComparator`) that extracts visible text from HTML values via
DOMParser + sanitizeHtml, and wire it into the colDef for string
columns. Pass-through for non-HTML values.
… document client-side scope

Address review feedback:

- `htmlTextComparator` now uses codepoint comparison (`<` / `>`) instead
  of `localeCompare`, so it matches AG Grid's default string comparator
  semantics for plain (non-HTML) values. Locale-aware comparison flipped
  the order of e.g. `'Z'` vs `'a'`, which would have changed sort
  behavior for plain string columns the PR description claims are
  unaffected. Added a test asserting the default ordering is preserved.

- Added a comment in `useColDefs.ts` documenting the client-side scope:
  when `serverPagination` is enabled, sorting is delegated to the server
  via the existing `comparator: () => 0` override, so HTML normalization
  does not apply. Server-paginated tables with HTML columns are out of
  scope for this fix and would require server-side handling.
@fitzee fitzee force-pushed the fitzee/aggrid-html-text-filter branch from 8eab7e9 to 517dd90 Compare May 6, 2026 00:34
Comment on lines +323 to +335
...(dataType === GenericDataType.String && {
// HTML cells (e.g. anchor markup) are rendered by TextCellRenderer
// via dangerouslySetInnerHTML; without these the filter and sort
// operate on raw HTML so the URL inside the markup dictates order
// and the "Contains" filter matches against the raw HTML string.
//
// Scope: client-side only. When `serverPagination` is enabled, a
// later spread overrides `comparator` with `() => 0` so sorting is
// delegated to the server; the database does not know to extract
// visible text from HTML, so server-paginated tables with HTML
// columns are out of scope for this fix.
filterValueGetter: htmlTextFilterValueGetter,
comparator: htmlTextComparator,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Architect Review — HIGH

When serverPagination is enabled, string columns still use htmlTextFilterValueGetter on the client while server-side filtering continues to operate on raw HTML values, so client-side filter evaluation is based on visible text but the backend query is based on the unstripped HTML, leading to inconsistent or empty results for the same filter (e.g. HTML with nested tags where the visible text is not a contiguous substring of the raw HTML).

Suggestion: Gate htmlTextFilterValueGetter behind client-side mode for string columns when serverPagination is true, or introduce equivalent HTML-to-text normalization in the server-side filter conversion so both client and server evaluate filters against the same text value.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts
**Line:** 323:335
**Comment:**
	*HIGH: When serverPagination is enabled, string columns still use htmlTextFilterValueGetter on the client while server-side filtering continues to operate on raw HTML values, so client-side filter evaluation is based on visible text but the backend query is based on the unstripped HTML, leading to inconsistent or empty results for the same filter (e.g. HTML with nested tags where the visible text is not a contiguous substring of the raw HTML).

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

Comment on lines +22 to +25
const stripHtmlToText = (html: string): string => {
const doc = new DOMParser().parseFromString(sanitizeHtml(html), 'text/html');
return (doc.body.textContent || '').trim();
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The comparator sanitizes and parses HTML during every comparison call, and sorting invokes the comparator many times (O(n log n)), which can cause major UI slowdowns on larger tables. Precompute/cache the extracted text once per row value (for example via value getter or memoization) and compare cached text instead of reparsing inside comparator hot path. [performance]

Severity Level: Major ⚠️
- ⚠️ Sorting large HTML string columns can lag noticeably.
- ⚠️ Comparator does DOMParser work on every comparison call.
Steps of Reproduction ✅
1. Create a Table V2 (AG Grid) chart via `TableChart` in
`superset-frontend/plugins/plugin-chart-ag-grid-table/src/AgGridTableChart.tsx:55-89`,
with a string dimension whose values contain HTML (e.g., anchor tags rendered by
`TextCellRenderer`), and a large row count (thousands of records).

2. `TableChart` builds column definitions using `useColDefs`
(`superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts:215-311`);
for string columns (`dataType === GenericDataType.String`) it sets `filterValueGetter:
htmlTextFilterValueGetter` and `comparator: htmlTextComparator` on the colDef at lines
44-57.

3. When the user clicks that column's header to sort, AG Grid (via `ThemedAgGridReact`,
imported in `AgGridTableChart.tsx:29-32`) repeatedly invokes `htmlTextComparator` from
`htmlTextFilterValueGetter.ts:47-57` as part of its `O(n log n)` sort, passing the raw
cell values for many pairwise comparisons.

4. For each comparison where the value looks like HTML, `htmlTextComparator` calls
`stripHtmlToText` (`htmlTextFilterValueGetter.ts:22-25`), which runs `sanitizeHtml(html)`
and `new DOMParser().parseFromString(..., 'text/html')`. With thousands of HTML cells,
this DOM parsing and sanitization happens thousands of times in a hot sort loop, which can
realistically cause noticeable UI sluggishness while sorting large HTML-heavy tables.
Precomputing or caching the stripped text per row would avoid this repeated work.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.ts
**Line:** 22:25
**Comment:**
	*Performance: The comparator sanitizes and parses HTML during every comparison call, and sorting invokes the comparator many times (`O(n log n)`), which can cause major UI slowdowns on larger tables. Precompute/cache the extracted text once per row value (for example via value getter or memoization) and compare cached text instead of reparsing inside comparator hot path.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged issue is correct: when serverPagination is enabled, client-side filtering uses htmlTextFilterValueGetter to strip HTML for visible text, but server-side filtering operates on raw HTML, causing inconsistent results. To resolve, gate the htmlTextFilterValueGetter and htmlTextComparator behind client-side mode by adding !serverPagination to the condition.

superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts

...(dataType === GenericDataType.String && !serverPagination && {
          // HTML cells (e.g. anchor markup) are rendered by TextCellRenderer
          // via dangerouslySetInnerHTML; without these the filter and sort
          // operate on raw HTML so the URL inside the markup dictates order
          // and the "Contains" filter matches against the raw HTML string.
          //
          // Scope: client-side only. When `serverPagination` is enabled,
          // filtering and sorting are delegated to the server; the database
          // does not know to extract visible text from HTML, so this is gated
          // to avoid inconsistency between client and server evaluation.
          filterValueGetter: htmlTextFilterValueGetter,
          comparator: htmlTextComparator,
        }),

…lient-side mode

Address two bot review comments:

- **Performance:** `stripHtmlToText` ran `DOMParser` + `sanitizeHtml` on
  every comparator call. Sorts invoke the comparator O(n log n) times,
  which adds up on large tables. Cache extracted text by raw HTML string
  in a module-level Map; bounded by the number of unique HTML values in
  the dataset.

- **Server-pagination consistency:** Gate the string-column branch on
  `!serverPagination`. In server-pagination mode, the typed filter value
  is converted into a backend query that operates on the raw HTML stored
  in the database; stripping HTML client-side via `filterValueGetter`
  caused a mismatch where the typed text was searched against visible
  labels client-side but raw HTML server-side, leading to false
  positives or empty results. Server-paginated tables with HTML columns
  remain out of scope (they revert to pre-PR behavior — same backend
  query path as before, no inconsistency).

Sort in server-pagination mode is already delegated to the server via
the existing `comparator: () => 0` override, so removing the
`htmlTextComparator` here loses nothing in that mode.
@fitzee
Copy link
Copy Markdown
Author

fitzee commented May 6, 2026

Thanks for the bot review (and human reviewers — your patience). Both points addressed in the latest push (d395ea6109 → rebased to ec56d0038b):

1. Performance — htmlTextFilterValueGetter.ts — Memoized stripHtmlToText with a module-level Map<string, string> cache. Sort comparators run O(n log n) times against the same set of cell values, so reparsing on every call was a measurable cost on large tables. Cache is bounded by the number of unique HTML strings in the dataset.

2. Server-pagination consistency — useColDefs.ts — Gated the string-column branch on !serverPagination. In server-pagination mode AG Grid converts the typed filter value into a backend query that operates on the raw HTML stored in the database; stripping HTML client-side via filterValueGetter was creating exactly the inconsistency you flagged (typed text searched against visible labels client-side but raw HTML server-side). With the gate, server-paginated tables now behave as they did pre-PR — raw HTML semantics, consistent between client and server. Server-side HTML normalization for paginated tables would be a separate, larger change.

The existing tests still pass; the change is a strict-superset of the previous behavior in client-side mode.

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 6, 2026

Code Review Agent Run #718574

Actionable Suggestions - 0
Review Details
  • Files reviewed - 3 · Commit Range: 58a40e8..ec56d00
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.test.ts
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/htmlTextFilterValueGetter.ts
    • superset-frontend/plugins/plugin-chart-ag-grid-table/src/utils/useColDefs.ts
  • Files skipped - 0
  • Tools
    • Eslint (Linter) - ✔︎ Successful
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plugins size/L viz:charts:table Related to the Table chart

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants