fix(mcp): get_chart_sql drops x_axis on echarts_timeseries_* and only renders one query for mixed_timeseries#39865
Conversation
… renders one query for mixed_timeseries - Add `_extract_x_axis_col` helper to normalise the x_axis field (plain string or adhoc column dict) into a bare column name. - In `_build_query_context_from_form_data`, detect echarts_timeseries_* and mixed_timeseries viz types and prepend the x_axis column to the query's columns list so the temporal dimension appears in the generated SQL (was previously dropped because x_axis is stored separately from groupby in form_data). - For mixed_timeseries, build a second query dict from metrics_b / groupby_b and pass queries=[primary, secondary] to QueryContextFactory. Previously only the primary query was built, so the secondary series SQL was silently lost. - Add TestExtractXAxisCol and TestBuildQueryContextTimeseriesAndMixed test classes covering both bug fixes.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39865 +/- ##
=======================================
Coverage 64.35% 64.36%
=======================================
Files 2569 2569
Lines 134680 134713 +33
Branches 31254 31262 +8
=======================================
+ Hits 86679 86710 +31
Misses 46505 46505
- Partials 1496 1498 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes regressions in the MCP get_chart_sql tool’s query-context construction so that SQL rendering matches how Superset Explore builds queries for ECharts timeseries and mixed-timeseries charts.
Changes:
- Add
x_axisextraction and prepend it to query columns forecharts_timeseries_*andmixed_timeseriesso the temporal axis is not dropped from rendered SQL. - Build and pass two query dicts for
mixed_timeseries(primary + secondary) so both series layers’ SQL is returned. - Improve chart schema validation ergonomics (relax column-name regex constraints, add richer validation error formatting) and expand unit test coverage around these behaviors.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
superset/mcp_service/chart/tool/get_chart_sql.py |
Adds helpers to include x_axis in query columns and emits 2 queries for mixed_timeseries when building QueryContextFactory input. |
superset/mcp_service/chart/validation/schema_validator.py |
Formats Pydantic validation errors into more actionable per-field details and deduped suggestions. |
superset/mcp_service/chart/schemas.py |
Relaxes strict regex constraints on column-like fields (e.g., digit-prefixed / locale chars). |
superset/mcp_service/chart/tool/generate_chart.py |
Expands tool docstring with additional request examples for several chart types. |
tests/unit_tests/mcp_service/chart/tool/test_get_chart_sql.py |
Adds unit tests for _extract_x_axis_col and for correct query building for ECharts timeseries + mixed timeseries. |
tests/unit_tests/mcp_service/chart/test_chart_schemas.py |
Adds tests verifying relaxed column-name acceptance and continued blocking of obvious XSS/SQL-injection attempts (where applicable). |
Comments suppressed due to low confidence (1)
superset/mcp_service/chart/schemas.py:785
- FilterConfig.column removed the regex constraint, but sanitize_column still calls sanitize_user_input() without check_sql_keywords=True. As a result, values containing SQL metacharacters/comments (e.g. ';', '--', '/*') and dangerous SQL keywords are no longer rejected at validation time, which weakens the injection protections compared to the previous regex. Consider enabling check_sql_keywords=True (similar to ColumnRef.sanitize_name) so relaxed column naming stays compatible with locale chars while still blocking SQL injection patterns.
column: str = Field(
...,
min_length=1,
max_length=255,
# No regex pattern: sanitize_name() already blocks XSS/SQL injection;
# many valid column names (digit-prefixed, locale chars, etc.) would
# be rejected by a strict pattern while posing no security risk.
# Use get_dataset_info to find exact column names.
validation_alias=AliasChoices("column", "col"),
)
op: Literal[
"=",
">",
"<",
">=",
"<=",
"!=",
"LIKE",
"ILIKE",
"NOT LIKE",
"IN",
"NOT IN",
] = Field(
...,
description="LIKE/ILIKE use % wildcards. IN/NOT IN take a list.",
validation_alias=AliasChoices("op", "operator", "opr"),
)
value: str | int | float | bool | list[str | int | float | bool] = Field(
...,
description="For IN/NOT IN, provide a list.",
validation_alias=AliasChoices("value", "val"),
)
@field_validator("column")
@classmethod
def sanitize_column(cls, v: str) -> str:
"""Sanitize filter column name to prevent injection attacks."""
# sanitize_user_input raises ValueError when allow_empty=False (default)
# so the return value is guaranteed to be a non-None str
return sanitize_user_input(v, "Filter column", max_length=255) # type: ignore[return-value]
…mplexity - _extract_x_axis_col: also handle adhoc-SQL x_axis dicts (expressionType: "SQL") by returning the full dict so the SQL expression is preserved in the query context; column_name dicts (expressionType: "SIMPLE") still return the bare column name string - extract _build_single_query_dict and _build_mixed_timeseries_secondary as module-level helpers to reduce McCabe complexity of _build_query_context_from_form_data below the C901 limit - add tests for SQL-expression x_axis: _extract_x_axis_col unit tests and an integration test through _build_query_context_from_form_data - fix test docstring: regressions introduced after apache#38700, not apache#39636
afddf60 to
213fffd
Compare
Summary
Fixes two bugs in
get_chart_sqlintroduced after #38700:Bug 1 — x_axis dropped for
echarts_timeseries_*:For XY/timeseries charts the temporal column is stored in
form_data["x_axis"], not ingroupby. When_build_query_context_from_form_databuilt the query dict it only usedgroupby, so the time axis column was silently missing from the generated SQL.Bug 2 —
mixed_timeseriesrenders only one query:mixed_timeserieshas two independent series layers (primary:metrics/groupby, secondary:metrics_b/groupby_b). Previously only the primary query was built and passed toQueryContextFactory, soget_chart_sqlreturned SQL for only half the chart.Changes
_extract_x_axis_col: new helper that normalisesx_axis(plain string or adhoc column dict) to a bare column name._build_single_query_dict: extracted from an inner function so complexity stays within the C901 limit._build_mixed_timeseries_secondary: new helper that builds the secondary query dict frommetrics_b/groupby_b._build_query_context_from_form_data: detectsecharts_timeseries_*andmixed_timeseriesviz types; prependsx_axisto columns; passesqueries=[primary, secondary]formixed_timeseries.TestExtractXAxisColandTestBuildQueryContextTimeseriesAndMixedtest classes.Testing