fix(recommandation): Fix chart recommandation#39886
fix(recommandation): Fix chart recommandation#39886alexandrusoare wants to merge 2 commits intomasterfrom
Conversation
Code Review Agent Run #04a47bActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39886 +/- ##
==========================================
- Coverage 64.35% 64.35% -0.01%
==========================================
Files 2569 2569
Lines 134680 134750 +70
Branches 31254 31272 +18
==========================================
+ Hits 86679 86718 +39
- Misses 46505 46534 +29
- Partials 1496 1498 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code Review Agent Run #b70632Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| temporal = [c for c in columns if c.data_type == "temporal"] | ||
| numeric = [c for c in columns if c.data_type == "numeric"] | ||
| categorical = [c for c in columns if c.data_type in ("string", "boolean")] |
There was a problem hiding this comment.
This can probably be one loop right? Would that be better for the performance overhead?
| if temporal and numeric: | ||
| return _candidates_temporal_numeric(numeric, row_count) | ||
| if categorical and numeric: | ||
| return _candidates_categorical_numeric(numeric, categorical) | ||
| if len(numeric) >= 2: | ||
| return _candidates_multi_numeric(numeric, categorical) | ||
| if len(numeric) == 1 and not temporal and not categorical: | ||
| return _candidates_single_numeric(numeric[0], row_count) | ||
| return [] |
There was a problem hiding this comment.
Should this be a single function call with multiple arguments instead?
| def _candidates_temporal_numeric( | ||
| numeric: list[DataColumn], row_count: int | ||
| ) -> list[str]: | ||
| # Few data points are better as a bar chart than a line | ||
| if row_count < 5: | ||
| candidates = ["bar chart", "table"] | ||
| else: | ||
| candidates = ["line chart", "area chart", "bar chart"] | ||
| if len(numeric) > 1: | ||
| candidates.append("multi-line chart") | ||
| return candidates | ||
|
|
||
|
|
||
| def _candidates_categorical_numeric( | ||
| numeric: list[DataColumn], | ||
| categorical: list[DataColumn], | ||
| ) -> list[str]: | ||
| candidates = ["bar chart"] | ||
| if len(numeric) == 1 and categorical[0].unique_count <= 10: | ||
| candidates.append("pie chart") | ||
| if len(numeric) >= 2: | ||
| candidates.append("scatter plot") | ||
| candidates.append("heatmap") | ||
| if any(c.unique_count > 5 for c in categorical): | ||
| candidates.append("treemap") | ||
| return candidates | ||
|
|
||
|
|
||
| def _candidates_single_numeric(col: DataColumn, row_count: int) -> list[str]: | ||
| candidates = ["big number / KPI", "gauge chart"] | ||
| if row_count > 20 and col.unique_count > 10: | ||
| candidates.insert(0, "histogram") | ||
| return candidates | ||
|
|
||
|
|
||
| def _candidates_multi_numeric( | ||
| numeric: list[DataColumn], | ||
| categorical: list[DataColumn], | ||
| ) -> list[str]: | ||
| candidates = ["scatter plot"] | ||
| if len(numeric) >= 3: | ||
| candidates.append("bubble chart") | ||
| if categorical: | ||
| candidates.append("heatmap") | ||
| return candidates |
There was a problem hiding this comment.
It looks a bit clearer with multiple functions though, not sure
| if all(isinstance(v, (int, float)) for v in sample_values): | ||
| data_type = "numeric" | ||
| elif all(isinstance(v, bool) for v in sample_values): | ||
| if idx < len(coltypes): |
| from superset.mcp_service.chart.tool.get_chart_data import _GENERIC_TYPE_MAP | ||
| from superset.utils.core import GenericDataType |
SUMMARY
recommended_visualizationsinget_chart_dataresponses are based purely on column count and column name string matching. A table chart with 2 columns gets "scatter plot" recommended; any column with "time" in the name triggers "line chart" even if it's not actually temporal.Root Cause
Two problems:
DataColumn.data_typeinference uses a Pythonisinstanceheuristic that checks forint/float/bool. Datetime values from SQL arrive as strings, so temporal columns are always classified as"string". Meanwhile, the query result already containscoltypes— a list ofGenericDataTypeenum values (NUMERIC, STRING, TEMPORAL, BOOLEAN) derived from actual SQL types viaextract_dataframe_dtypes()— but this field was never read.chart.viz_type(so it recommends the same type you already have), actual data types, or column cardinality (so high-cardinality ID columns trigger "scatter plot").Fix
coltypesfrom the query result and use it to populateDataColumn.data_typewith accurate SQL-derived types (falling back to the isinstance heuristic whencoltypesis unavailable)_recommend_visualizations(viz_type, columns, row_count)which:BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION