Conversation
Reviewer's Guide

Adds a plotting pipeline under `plots/` to generate all paper figures from HuggingFace datasets using shared styling utilities, and wires in dev-only dependencies for running the Snakefile workflow with uv.

Class diagram for shared plotting utilities and script relationships

```mermaid
classDiagram
    class Commons {
        <<module>>
        +OUTPUT_DIR : Path
        +REPRESENTATION_MAPPING : dict
        +PROPERTY_DISPLAY_MAPPING : dict
        +PROPERTY_KEY_IN_MODEL_JSON : dict
        +DATA_SIZE_MAPPING : dict
        +GROUP_STYLES : dict
        +REP_COLOR_OVERRIDES : dict
        +MODEL_STYLES : dict
        +NON_MATTEXT_REPS : set
        +PROPERTY_COLORS : dict
        +DEFAULT_COLOR : str
        +TASK_REVERSE_MAP : dict
        +range_frame(ax, x, y, pad)
        +get_representation_group_and_color(representation)
        +get_output_path(script_name, extension)
        +save_figure(fig, script_name, dpi)
    }
    class Figure2 {
        <<script>>
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +compute_per_dataset_values(df)
        +compute_averaged_values(per_dataset)
        +plot_panel_c(averaged, output_file)
        +plot_appendix(per_dataset, output_file, single_row)
        +main()
    }
    class Figure3 {
        <<script>>
        +compute_gcmg(alpha_loss_data_dict)
        +load_data_from_hf()
        +create_gcmg_plot()
    }
    class Figure5 {
        <<script>>
        +hf_to_nested_dict(repo_id, size_filter)
        +plot_30k_bar_results(data_30k, properties, output_filepath)
    }
    class Figure6 {
        <<script>>
        +hf_to_data_scaling_dict(repo_id)
        +hf_to_model_scaling_dict(repo_id)
        +format_log_ticks_for_data_axis(value, pos)
        +plot_dataset_scaling_relative_change(ax, data, property_name, show_xlabel)
        +plot_model_scaling_relative_change(ax, data, property_name, show_xlabel)
        +create_combined_relative_scaling_plot(data_size_data, model_size_data, output_filepath)
    }
    class Figure7 {
        <<script>>
        +SquishScale
        +range_frame(ax, x, y_data_range, pad)
        +normalize_property(series)
    }
    class AppendixRTTokenizer {
        <<script>>
        +range_frame(ax, x, y, pad)
        +load_data_from_hf()
        +calculate_percentage_improvement(rt_data, normal_data, data_sizes, representations, properties)
        +create_percentage_improvement_plot()
    }
    class AppendixArchitecture {
        <<script>>
        +range_frame(ax, x, y, pad)
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +compute_per_dataset_values(df)
        +compute_averaged_values(per_dataset)
        +plot_panel_c(averaged, output_file)
        +plot_appendix(per_dataset, output_file, single_row)
        +main()
    }
    class AppendixPanelComparison {
        <<script>>
        +range_frame(ax, x, y, pad)
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +prepare_comparison_data(df)
        +plot_comparison_dumbbells(comparison_data, output_file)
        +main()
    }
    Commons <.. Figure2 : uses
    Commons <.. Figure3 : uses
    Commons <.. Figure5 : uses
    Commons <.. Figure6 : uses
    Commons <.. AppendixRTTokenizer : uses
    Commons <.. AppendixArchitecture : uses
    Commons <.. AppendixPanelComparison : uses
```
Flow diagram for Snakefile-driven figure generation

```mermaid
flowchart TD
    A_all["rule all"]
    subgraph rules["Snakefile rules"]
        R2["rule figure_2"]
        R3["rule figure_3"]
        R5["rule figure_5"]
        R6["rule figure_6"]
        R7["rule figure_7"]
        Rrt["rule appendix_rt_tokenizer"]
        Rarch["rule appendix_architecture"]
        Rpanel["rule appendix_panel_comparison"]
        Rclean["rule clean"]
    end
    subgraph scripts["Plot scripts"]
        S2["figure_2.py"]
        S3["figure_3.py"]
        S5["figure_5.py"]
        S6["figure_6.py"]
        S7["figure_7.py"]
        Srt["appendix_rt_tokenizer.py"]
        Sarch["appendix_architecture.py"]
        Spanel["appendix_panel_comparison.py"]
    end
    subgraph outputs["Generated artifacts"]
        O2a["figure_2_panel_c.(png,pdf)"]
        O2b["figure_2_appendix_2row.(png,pdf)"]
        O2c["figure_2_appendix_1row.(png,pdf)"]
        O3["figure_3.(png,pdf)"]
        O5["figure_5.(png,pdf)"]
        O6["figure_6.(png,pdf)"]
        O7["figure_7.(png,pdf)"]
        Ort["appendix_rt_tokenizer.(png,pdf)"]
        OarchC["appendix_architecture_panel_c.(png,pdf)"]
        Oarch2["appendix_architecture_2row.(png,pdf)"]
        Oarch1["appendix_architecture_1row.(png,pdf)"]
        Opanel["appendix_panel_comparison.(png,pdf)"]
    end
    A_all --> R2
    A_all --> R3
    A_all --> R5
    A_all --> R6
    A_all --> R7
    A_all --> Rrt
    A_all --> Rarch
    A_all --> Rpanel
    R2 -->|"uv run python"| S2
    R3 -->|"uv run python"| S3
    R5 -->|"uv run python"| S5
    R6 -->|"uv run python"| S6
    R7 -->|"uv run python"| S7
    Rrt -->|"uv run python"| Srt
    Rarch -->|"uv run python"| Sarch
    Rpanel -->|"uv run python"| Spanel
    S2 --> O2a
    S2 --> O2b
    S2 --> O2c
    S3 --> O3
    S5 --> O5
    S6 --> O6
    S7 --> O7
    Srt --> Ort
    Sarch --> OarchC
    Sarch --> Oarch2
    Sarch --> Oarch1
    Spanel --> Opanel
    Rclean -->|"rm outputs/*.png, *.pdf, logs/*.log"| outputs
```
📝 Walkthrough

Adds a Snakemake workflow, a shared plotting utilities module, multiple new plotting scripts (figures and appendices) that download HF datasets and render publication figures, and minor tooling updates (.gitignore, dev dependencies). Outputs are written to a centralized outputs directory.
Sequence Diagram

```mermaid
sequenceDiagram
    participant SM as Snakemake
    participant FS as Figure Script
    participant HF as HuggingFace Hub
    participant CM as plots.commons
    participant FSYS as File System
    SM->>FS: Trigger figure rule (script + inputs)
    FS->>HF: Download dataset (hf_hub_download / repo)
    HF-->>FS: Return Parquet / dataset files
    FS->>FS: Compute metrics (calculate_contributions, GCMG, scaling, normalization)
    FS->>CM: Request styles & helpers (MODEL_STYLES, GROUP_STYLES, save_figure)
    CM-->>FS: Return constants & save_figure
    FS->>FSYS: save_figure writes PNG/PDF to outputs (via CM)
    FSYS-->>FS: Confirm file paths
    FS->>SM: Rule completes (outputs available)
    SM->>FSYS: Log success / onerror points to logs/
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Hey - I've found 4 issues, and left some high level feedback:
- There is substantial duplication between `figure_2.py` and `appendix_architecture.py` (e.g. `download_data`, `calculate_contributions`, `compute_per_dataset_values`, `compute_averaged_values`, `plot_panel_c`, `plot_appendix`); consider extracting these into shared utilities to avoid divergence and ease maintenance.
- `GROUP_STYLES`, `REP_COLOR_OVERRIDES`, and some mapping constants are redefined in multiple scripts (e.g. `figure_6.py` vs `commons.py`), which can lead to inconsistencies; it would be cleaner to centralize these definitions in `commons.py` and import them everywhere.
- The plotting scripts mix `print` and `loguru` logging styles and hard-code HuggingFace repo IDs/paths inline; using a shared configuration/constant module for HF dataset IDs and standardizing on one logging approach would make the scripts easier to run in different environments and debug.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There is substantial duplication between `figure_2.py` and `appendix_architecture.py` (e.g. `download_data`, `calculate_contributions`, `compute_per_dataset_values`, `compute_averaged_values`, `plot_panel_c`, `plot_appendix`); consider extracting these into shared utilities to avoid divergence and ease maintenance.
- `GROUP_STYLES`, `REP_COLOR_OVERRIDES`, and some mapping constants are redefined in multiple scripts (e.g. `figure_6.py` vs `commons.py`), which can lead to inconsistencies; it would be cleaner to centralize these definitions in `commons.py` and import them everywhere.
- The plotting scripts mix `print` and `loguru` logging styles and hard-code HuggingFace repo IDs/paths inline; using a shared configuration/constant module for HF dataset IDs and standardizing on one logging approach would make the scripts easier to run in different environments and debug.
## Individual Comments
### Comment 1
<location path="plots/appendix_rt_tokenizer.py" line_range="58-63" />
<code_context>
+ return rt_data, normal_data
+
+
+def calculate_percentage_improvement(
+ rt_data, normal_data, data_sizes, representations, properties
+):
+ """
+ Calculate percentage improvement: ((Normal_RMSE - RT_RMSE) / Normal_RMSE) * 100
+ Positive values indicate improvement (RT is better).
+ """
+ results = {}
</code_context>
<issue_to_address>
**issue:** The function/docstring and variable names reference RMSE but the implementation actually uses MAE.
This function uses `...['mae']['mean']` for both RT and normal tokenizers, while local names (`rt_rmse`, `normal_rmse`) and the docstring refer to RMSE. Please either rename the docstring/variables to MAE or parameterize the metric so the chosen metric (MAE vs RMSE) is explicit and consistent with the naming.
</issue_to_address>
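One way to resolve the mismatch is to make the metric an explicit parameter. The following is a minimal sketch, not the code in `appendix_rt_tokenizer.py`; the nested `{metric: {"mean": ...}}` shape is assumed from the comment's description of the HF schema:

```python
def percentage_improvement(rt_stats, normal_stats, metric="mae"):
    """Percentage improvement: ((normal - rt) / normal) * 100 for the given metric.

    Positive values mean the RT tokenizer is better. The metric name is explicit,
    so callers cannot silently mix up MAE and RMSE.
    """
    rt_value = rt_stats[metric]["mean"]
    normal_value = normal_stats[metric]["mean"]
    if normal_value == 0:
        return None  # avoid division by zero on degenerate data
    return (normal_value - rt_value) / normal_value * 100


# Hypothetical results mirroring the described schema
rt = {"mae": {"mean": 0.08}}
normal = {"mae": {"mean": 0.10}}
print(percentage_improvement(rt, normal, metric="mae"))  # ≈ 20.0
```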
### Comment 2
<location path="plots/Snakefile" line_range="83-84" />
<code_context>
+ commons="commons.py"
+ output:
+ FIGURE_OUTPUTS["figure_2"]
+ log:
+ "logs/figure_2.log"
+ shell:
+ "uv run python {input.script} 2>&1 | tee {log}"
</code_context>
<issue_to_address>
**issue (bug_risk):** Log files are written into a `logs/` directory that is never created in the workflow.
The `tee` command writes to `logs/...`, but nothing creates the `logs/` directory. On a clean checkout this will fail with “No such file or directory” before any figures are produced. Please ensure the directory is created, e.g. via a small `rule init_dirs` (`mkdir -p logs {OUTPUT_DIR}`) that `all`/figure rules depend on, or by prefixing each relevant shell command with `mkdir -p logs`.
</issue_to_address>
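The simplest variant is to prefix each shell command with `mkdir -p`. A sketch of what one rule could look like, assuming the rule layout quoted above (the `FIGURE_OUTPUTS` dict and file names come from the comment, not verified against the repo):

```
rule figure_2:
    input:
        script="figure_2.py",
        commons="commons.py"
    output:
        FIGURE_OUTPUTS["figure_2"]
    log:
        "logs/figure_2.log"
    shell:
        "mkdir -p logs && uv run python {input.script} 2>&1 | tee {log}"
```

A dedicated `rule init_dirs` that the figure rules depend on avoids repeating the `mkdir -p` in every shell command.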
### Comment 3
<location path="plots/figure_6.py" line_range="433-434" />
<code_context>
+ final_handles = list(final_unique.values())
+ final_labels = list(final_unique.keys())
+
+ num_legend_cols = min(
+ len(final_labels), 3 if len(final_labels) > 4 else len(final_labels)
+ )
+
</code_context>
<issue_to_address>
**nitpick:** Legend column count logic is slightly counterintuitive and can over-allocate columns.
`num_legend_cols` is later used as `ncol=num_legend_cols + 1` when calling `fig.legend`, so the actual column count doesn’t match the computed value. For small label counts this can make the legend overly sparse and wide. Consider either using `ncol=num_legend_cols` directly or renaming the variable (e.g. `base_cols` vs `total_cols`) to make the extra column explicit.
</issue_to_address>
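The mismatch is easy to see with concrete label counts. This is a standalone sketch of the quoted logic, with the `+ 1` mirroring the later `ncol=num_legend_cols + 1` call:

```python
def computed_legend_cols(n_labels):
    # The value assigned to num_legend_cols in the snippet above
    return min(n_labels, 3 if n_labels > 4 else n_labels)

for n in (2, 4, 6):
    base = computed_legend_cols(n)
    # The actual column count passed to fig.legend is base + 1
    print(f"{n} labels -> computed {base}, rendered ncol {base + 1}")
```

With 2 labels the legend is rendered with 3 columns, one more than there are entries, which is the sparse, over-wide layout the comment describes.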
### Comment 4
<location path="plots/figure_7.py" line_range="136-146" />
<code_context>
+
+
+# Function to normalize data per property
+def normalize_property(series):
+ valid = series.dropna()
+ if len(valid) <= 1:
+ return series
+ min_val = valid.min()
+ max_val = valid.max()
+ range_val = max_val - min_val
+ if range_val == 0:
+ return series
+ return (series - min_val) / range_val
</code_context>
<issue_to_address>
**suggestion:** Properties with zero variance fall back to raw values instead of staying in [0, 1].
When `range_val == 0`, the function returns the original series, so those values are no longer in the [0, 1] space assumed elsewhere in the plot. On a normalized scale this can push flat-property bands outside the intended visual range. For consistent scaling, consider returning a constant (e.g., 0.5) or another explicit normalized value instead of the raw series in this degenerate case.
```suggestion
# Function to normalize data per property
def normalize_property(series):
valid = series.dropna()
# If there is 0 or 1 valid value, return a constant normalized value (e.g., 0.5)
if len(valid) <= 1:
return pd.Series(
0.5, index=series.index, dtype=float
).where(~series.isna(), other=pd.NA)
min_val = valid.min()
max_val = valid.max()
range_val = max_val - min_val
# If all valid values are identical, map them to a constant normalized value
if range_val == 0:
return pd.Series(
0.5, index=series.index, dtype=float
).where(~series.isna(), other=pd.NA)
return (series - min_val) / range_val
```
</issue_to_address>
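A standalone check of the degenerate cases, on hypothetical data. This sketch uses `np.nan` instead of `pd.NA` to stay in a plain float dtype, a slight variation on the suggestion above:

```python
import numpy as np
import pandas as pd


def normalize_property(series):
    # Min-max normalize; zero-variance or near-empty data maps to a constant 0.5
    valid = series.dropna()
    if len(valid) <= 1 or valid.max() == valid.min():
        out = pd.Series(np.nan, index=series.index, dtype=float)
        out[series.notna()] = 0.5
        return out
    return (series - valid.min()) / (valid.max() - valid.min())


flat = pd.Series([2.0, 2.0, np.nan, 2.0])
print(normalize_property(flat).tolist())    # [0.5, 0.5, nan, 0.5]
spread = pd.Series([1.0, 3.0, 2.0])
print(normalize_property(spread).tolist())  # [0.0, 1.0, 0.5]
```

The flat-property band now stays inside the [0, 1] space the rest of the plot assumes, while NaN entries remain NaN.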
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Actionable comments posted: 12
🧹 Nitpick comments (3)
plots/Snakefile (1)
11-12: Anchor the workflow to the Snakefile directory.

These relative script/log/output paths only work reliably after `cd plots`; `snakemake -s plots/Snakefile ...` from the repo root will look for `figure_2.py` and `commons.py` in the wrong place. Using `workdir: workflow.basedir` or prefixing paths with `workflow.basedir` removes that cwd dependency.

Also applies to: 77-177
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/Snakefile` around lines 11 - 12, The Snakefile currently assumes the CWD is plots (e.g., OUTPUT_DIR = "outputs" and rules that call figure_2.py and commons.py), which breaks when invoking snakemake from the repo root; to fix, anchor the workflow by adding workdir: workflow.basedir at the top of the Snakefile or by prefixing relative paths with workflow.basedir (update OUTPUT_DIR and all usages of figure_2.py, commons.py and any rule input/output patterns in the sections around OUTPUT_DIR and lines ~77-177 to use workflow.basedir + path so the scripts and output paths resolve regardless of the current working directory).

plots/figure_3.py (1)
11-30: Reuse the shared group colors here.

This local `GROUP_STYLES` flips the compositional/geometric palette from `plots/commons.py:59-88`, so category colors no longer mean the same thing across figures. Keeping the n-gram labels/styles local is fine, but the colors should still come from the shared mapping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/figure_3.py` around lines 11 - 30, Replace the hard-coded color values in the local GROUP_STYLES with the shared palette by importing the shared group color mapping from the project's commons module and using it for the "color" entries (keep the existing member labels/styles unchanged); specifically, update GROUP_STYLES to pull colors for the "compositional", "geometric", and "local" keys from the shared mapping (e.g., the exported GROUP_COLORS or COMMON_GROUP_COLORS in commons) so figure colors match other plots while preserving the local n-gram label/style overrides.

plots/figure_7.py (1)
128-133: Move the execution path under `main()` and a module guard.

Importing this module currently downloads data, builds the figure, and writes files as a side effect. The other plot scripts in this PR use explicit entrypoints; this one should too.
Also applies to: 149-265
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/figure_7.py` around lines 128 - 133, The top-level execution that calls load_dataset and builds/writes the figure (starting with the df = (...) assignment that invokes load_dataset("n0w0f/gnn_llm_wall", split="train") and the subsequent plotting code spanning the block around lines 149-265) must be moved into a dedicated main() function and guarded by a module check; refactor so that data loading, figure construction, and file writes are performed inside a function named main(), keep imports at module scope, and add if __name__ == "__main__": main() to prevent side effects on import.
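The guard the prompt refers to is the standard Python entrypoint pattern. A generic skeleton, not the actual `figure_7.py` body:

```python
def main():
    # Load data, build the figure, and write outputs here.
    # Nothing should execute at import time.
    print("building figure...")


if __name__ == "__main__":
    main()
```

With this shape, `import figure_7` becomes side-effect free, which also makes the module's helpers testable.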
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plots/appendix_architecture.py`:
- Around line 127-167: The code adds horizontal offsets to the y-values (off is
added to comp/geom) which mutates the plotted contribution values; instead apply
the offset to the x-coordinates so y-values remain unchanged. In the loop over
averaged in plot_panel_c()/plot_appendix() use x coordinates like x0 = 0 + off
and x1 = 1 + off for ax.plot and for each ax.errorbar call pass x_pos + off (or
x0/x1) while leaving y_val and _y_std unmodified; keep offsets = {"MatText":
offset_step, "CoGN": 0.0, "MODNet": -offset_step}, style, labels, and zorder
logic the same. Also update the equivalent code region around the other
occurrence (lines ~215-251) to match this change.
- Around line 136-167: Enable the error bars in the ax.errorbar call by passing
yerr=_y_std (or the appropriate std var) instead of leaving it commented out,
and after computing comp, geom, comp_std, geom_std compute and extend all_y with
the error bar extents (e.g., comp±comp_std and geom±geom_std added to all_y) so
that range_frame() will account for the full y range; update the ax.errorbar
invocation (in the loop that currently binds x_pos, y_val, _y_std) to include
yerr and leave the other marker/label arguments as-is.
In `@plots/appendix_panel_comparison.py`:
- Around line 162-285: The code incorrectly shifts the contribution values by
adding/subtracting offset to the y coordinates (see variable offset and its use
in plotting mean_all, cogn_rep, and modnet); instead apply horizontal jitter to
the x coordinates. Change each ax.plot and ax.scatter for mean_all, cogn_rep,
and modnet to use x coordinates [0+dx, 1+dx] and x positions 0+dx and 1+dx
(where dx = +offset for mean_all, dx = -offset for cogn_rep, dx = -2*offset for
modnet) and remove the +/- offset adjustments from the y arguments (use the raw
mean_comp/mean_geom, cogn_comp/cogn_geom, modnet_comp/modnet_geom). Keep the
existing colors, linestyles, sizes, and zorders.
In `@plots/appendix_rt_tokenizer.py`:
- Around line 106-108: The current broad "except Exception as e" masks
unexpected HF schema/type bugs by turning them into None; change it to catch
only the expected missing-data errors (e.g., except (KeyError, IndexError,
ValueError) as e:) and set results[size][prop][rep] = None in that branch, and
for any other exception re-raise with context (e.g., raise RuntimeError(f"Error
calculating for {size}-{prop}-{rep}") from e) so unexpected issues aren't
silently swallowed.
- Around line 323-329: The print block at the end using total_comparisons can
divide by zero when total_comparisons == 0; update the code around the print
statements (variables total_comparisons, improvement_count, degradation_count in
appendix_rt_tokenizer.py) to check if total_comparisons is zero and, if so,
print a safe message like "No comparisons performed" or show 0% for
improvements/degradations instead of computing 100 * ... / total_comparisons;
otherwise compute and print the percentages as before.
In `@plots/commons.py`:
- Around line 90-93: REP_COLOR_OVERRIDES is currently unused so
per-representation colors like "robocrys" are ignored; update
get_representation_group_and_color to check REP_COLOR_OVERRIDES first and return
the overridden color (and appropriate group if needed) before falling back to
the existing group/color lookup logic, ensuring any callers that rely on
get_representation_group_and_color (and similar helpers referenced around the
same area) receive the override color; keep REP_COLOR_OVERRIDES as the
authoritative first check and only use the previous mapping when no override
exists.
In `@plots/figure_2.py`:
- Around line 116-157: The code is adding the horizontal offset to the y-values
(comp/geom) instead of shifting the x positions and is not passing the computed
std devs into yerr; update the plotting calls so ax.plot and ax.errorbar use x
coordinates shifted by the offset (e.g., use x_positions[x_pos] + off or add off
to x_pos) while leaving comp/geom unchanged for the y-axis, and pass the
corresponding std (_y_std) into ax.errorbar as yerr=_y_std; keep the existing
label logic (label=style["label"] if x_pos == 0 else "_nolegend_") and the same
style references (MODEL_STYLES, averaged, comp/geom, comp_std/geom_std, ax.plot,
ax.errorbar).
- Around line 17-27: The download_data function currently forces re-downloading
and omits a pinned revision; update download_data to default
force_download=False and pass an explicit immutable revision to the
hf_hub_download call (e.g., revision="<snapshot-sha-or-tag>") instead of always
forcing, and adjust callers (including main()) to stop passing True; apply the
same pattern to other dataset loaders mentioned (the load_dataset call in
figure_3.py, the load_dataset in figure_5.py, and the load_dataset in
appendix_rt_tokenizer.py) by adding a revision argument and defaulting
force_download to False so runs use cached, pinned snapshots.
In `@plots/figure_3.py`:
- Around line 116-121: The subplot titles are hardcoded to ["Dataset 1",
"Dataset 2", "Dataset 3"] causing all panels to be mislabeled; replace that
default with a call to the existing mapping function so titles reflect the
actual properties. Specifically, stop initializing property_titles with the
static list and instead set property_titles =
map_properties_to_titles(properties) (or use it when len(properties) ==
len(map_properties_to_titles(properties))); keep the fallback logic to generate
readable titles (e.g., p.replace("_", " ").title() or "Dataset N") only when
map_properties_to_titles is not available or its length doesn't match,
referencing the property_titles variable and
map_properties_to_titles(properties) function to locate the change.
In `@plots/figure_6.py`:
- Around line 127-135: The tick formatter format_log_ticks_for_data_axis is
matching wrong numeric literals (3e4, 3e5, 2e6) so dataset-size ticks are
labeled 2–3× too small; update the comparisons to check for the exact magnitudes
you want (1e4, 1e5, 1e6) or use a tolerant comparison (e.g., log10 or
numpy.isclose) to return "$10^4$", "$10^5$", "$10^6$" for those exact powers of
ten and otherwise return "" so labels reflect the true dataset sizes.
In `@plots/figure_7.py`:
- Around line 137-146: normalize_property currently returns the original series
when there is <=1 valid point or zero range which leaves raw MAE values
unnormalized; instead, for those fallback cases set all valid (non-NaN) entries
to a neutral normalized value (e.g., 0.5) and preserve NaNs so the rest of the
pipeline sees values within [0,1]; update the function normalize_property to
detect the <=1 or range_val==0 cases and return a series with the same index
where valid positions are 0.5 (as float) and NaNs unchanged.
- Around line 61-76: Implement the missing inverted() on the
InvertedSquishTransform class: add a method inverted(self) that returns a
Transform implementing the mathematical inverse of transform_non_affine (for
inputs <= self.cutoff return unchanged; for inputs > self.cutoff return
self.cutoff + (value - self.cutoff) * self.compression). Create and return a
small transform instance (e.g., a nested class or a new SquishTransform) whose
transform_non_affine implements that mapping and whose inverted() returns self
(so round-trip inversion works), and ensure input_dims/output_dims/is_separable
match the original.
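Stripped of matplotlib's `Transform` machinery, the forward/inverse pair the prompt describes reduces to simple algebra. This sketch assumes the forward transform divides the excess over the cutoff by the compression factor, so the stated inverse multiplies it back:

```python
def squish(value, cutoff, compression):
    # Forward transform: values above the cutoff are squeezed toward it
    if value <= cutoff:
        return value
    return cutoff + (value - cutoff) / compression


def unsquish(value, cutoff, compression):
    # Inverse transform, matching the formula in the prompt above:
    # cutoff + (value - cutoff) * compression for inputs above the cutoff
    if value <= cutoff:
        return value
    return cutoff + (value - cutoff) * compression


# Round trip holds above and below the cutoff
for x in (3.0, 5.0, 10.0):
    assert unsquish(squish(x, 5.0, 4.0), 5.0, 4.0) == x
```

In the real `SquishScale`, these two mappings would live in `transform_non_affine` of the transform and its `inverted()` counterpart, with `inverted()` on each returning the other so round-trip inversion works.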
---
Nitpick comments:
In `@plots/figure_3.py`:
- Around line 11-30: Replace the hard-coded color values in the local
GROUP_STYLES with the shared palette by importing the shared group color mapping
from the project's commons module and using it for the "color" entries (keep the
existing member labels/styles unchanged); specifically, update GROUP_STYLES to
pull colors for the "compositional", "geometric", and "local" keys from the
shared mapping (e.g., the exported GROUP_COLORS or COMMON_GROUP_COLORS in
commons) so figure colors match other plots while preserving the local n-gram
label/style overrides.
In `@plots/figure_7.py`:
- Around line 128-133: The top-level execution that calls load_dataset and
builds/writes the figure (starting with the df = (...) assignment that invokes
load_dataset("n0w0f/gnn_llm_wall", split="train") and the subsequent plotting
code spanning the block around lines 149-265) must be moved into a dedicated
main() function and guarded by a module check; refactor so that data loading,
figure construction, and file writes are performed inside a function named
main(), keep imports at module scope, and add if __name__ == "__main__": main()
to prevent side effects on import.
In `@plots/Snakefile`:
- Around line 11-12: The Snakefile currently assumes the CWD is plots (e.g.,
OUTPUT_DIR = "outputs" and rules that call figure_2.py and commons.py), which
breaks when invoking snakemake from the repo root; to fix, anchor the workflow
by adding workdir: workflow.basedir at the top of the Snakefile or by prefixing
relative paths with workflow.basedir (update OUTPUT_DIR and all usages of
figure_2.py, commons.py and any rule input/output patterns in the sections
around OUTPUT_DIR and lines ~77-177 to use workflow.basedir + path so the
scripts and output paths resolve regardless of the current working directory).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: afef853e-0633-43e7-81ab-a6665c379596
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- .gitignore
- plots/Snakefile
- plots/appendix_architecture.py
- plots/appendix_panel_comparison.py
- plots/appendix_rt_tokenizer.py
- plots/commons.py
- plots/figure_2.py
- plots/figure_3.py
- plots/figure_5.py
- plots/figure_6.py
- plots/figure_7.py
- pyproject.toml
🧹 Nitpick comments (6)
plots/Snakefile (2)
200-207: The `clean` rule doesn't remove the sentinel file.

If `.snakemake_dirs_created` is kept, the `clean` rule should remove it to ensure a fresh state.

🧹 Proposed fix
```diff
 rule clean:
     shell:
         """
         rm -rf {OUTPUT_DIR}/*.png {OUTPUT_DIR}/*.pdf
         rm -rf logs/*.log
+        rm -f .snakemake_dirs_created
         echo "Cleaned all outputs and logs"
         """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/Snakefile` around lines 200 - 207, The clean rule currently removes generated images and logs but omits the sentinel file; update the rule named "clean" to also remove the .snakemake_dirs_created sentinel (alongside {OUTPUT_DIR} and logs) so a full fresh state is produced; modify the shell block in rule clean to include removal of .snakemake_dirs_created (ensure path is correct relative to the Snakefile) and keep the existing echo message.
77-85: Consider making `init_dirs` an implicit dependency rather than tracking a sentinel file.

The current approach with `.snakemake_dirs_created` works but leaves an artifact. Since `commons.py` already creates `OUTPUT_DIR` on import (line 9), this rule may be partially redundant. At minimum, add the sentinel to `.gitignore`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/Snakefile` around lines 77 - 85, The init_dirs rule currently writes a sentinel file (.snakemake_dirs_created) which leaves an artifact; instead either (A) convert the rule to declare directory outputs using Snakemake's directory() wrapper (e.g., output: directory(OUTPUT_DIR), directory("logs")) and remove creation of .snakemake_dirs_created, then make rules that require those dirs list rules.init_dirs.output as an implicit/input dependency so Snakemake ensures directories exist, or (B) if you keep the sentinel, add .snakemake_dirs_created to .gitignore; also note commons.py already creates OUTPUT_DIR on import so you can alternatively remove the rule entirely and rely on commons.py for directory creation.

plots/figure_7.py (2)
84-112: Local `range_frame` has intentionally different semantics.

This implementation takes a conceptual `y_data_range` tuple (e.g., `[0, 1]`) rather than actual y-data values, which differs from `commons.py`'s version. Given the normalized data context, this design is appropriate here. Consider adding a brief docstring clarifying this difference to avoid future confusion.
📝 Suggested docstring improvement
```diff
 # Define range_frame function
-def range_frame(ax, x, y_data_range, pad=0.1):  # y_data_range is conceptual [0,1]
+def range_frame(ax, x, y_data_range, pad=0.1):
+    """Apply range frame styling for normalized data.
+
+    Unlike commons.range_frame which takes raw y values, this version accepts
+    a conceptual y_data_range tuple (e.g., [0, 1]) for normalized MAE data.
+    """
     # For ylim, consider the requested data range (typically 0 to 1 for normalized MAE)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/figure_7.py` around lines 84 - 112, Add a brief docstring to the range_frame function explaining that y_data_range is a conceptual normalized range (e.g., [0, 1]) rather than actual y-data values, describing parameters (ax, x, y_data_range, pad), what the function sets (ylim, xlim, spine positions/bounds) and explicitly noting that its semantics intentionally differ from the commons.py version to avoid confusion for future readers; reference the function name range_frame and mention y_data_range and the behavior for filtered_x/empty x so maintainers can locate the logic to be documented.
220-225: Remove unused `text_label` variable.

`text_label` is assigned on line 220 but never used; line 225 uses the inline expression instead.

🧹 Proposed fix
```diff
 # Add model name label with outline for contrast and GRAY color
-text_label = model.replace("-Best", "")
-
 text = ax.text(
     x_pos + 0.1,
     y_pos,
```
Verify each finding against the current code and only fix it if needed. In `@plots/figure_7.py` around lines 220 - 225, The variable text_label is assigned but never used; update the ax.text call in the block that creates the label (where model and ax.text are used) to either use text_label instead of the inline model.replace("-Best", "") expression or remove the text_label assignment entirely to avoid the unused variable; locate the assignment to text_label and the ax.text invocation and make them consistent (use text_label in the ax.text call or delete the unused assignment).

plots/appendix_rt_tokenizer.py (2)
22-30: Import `range_frame` from `commons.py` instead of redefining it locally.

This local implementation diverges from the shared `range_frame` in `commons.py` (lines 143-189):

- Uses asymmetric x-padding (`x_max + 2 * pad` vs `x_max + pad`)
- Missing edge case handling when `x_min == x_max`
- Missing log scale support
- Missing bottom spine bounds

Using the shared version ensures consistency and avoids maintenance burden.

♻️ Proposed fix
```diff
 from commons import (
     DATA_SIZE_MAPPING,
     GROUP_STYLES,
     PROPERTY_DISPLAY_MAPPING,
     REPRESENTATION_MAPPING,
     get_representation_group_and_color,
+    range_frame,
     save_figure,
 )
```

Then remove lines 22-30 entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plots/appendix_rt_tokenizer.py` around lines 22 - 30, Remove the local range_frame definition in plots/appendix_rt_tokenizer.py and import the shared range_frame from commons.py instead; specifically delete the def range_frame(...) block and add an import for range_frame (from commons import range_frame) so the module uses the canonical implementation (which handles symmetric x-padding, x_min==x_max edge case, log-scale support, and bottom spine bounds) and update any local usages to call the imported range_frame.
167-168: Remove unused variable `_subplot_height`.

`_subplot_height` is computed but never used. The figure height is derived from `TWO_COL_GOLDEN_RATIO_HEIGHT_INCH` instead.

🧹 Proposed fix
```diff
 # Calculate figure size
 subplot_width = TWO_COL_WIDTH / num_cols
-_subplot_height = subplot_width / GOLDEN_RATIO
 TWO_COL_GOLDEN_RATIO_HEIGHT_INCH = TWO_COL_WIDTH / golden
```
Verify each finding against the current code and only fix it if needed. In `@plots/appendix_rt_tokenizer.py` around lines 167 - 168, Remove the unused computed variable _subplot_height (assigned from subplot_width / GOLDEN_RATIO) since the code uses TWO_COL_GOLDEN_RATIO_HEIGHT_INCH (computed from TWO_COL_WIDTH / golden) for figure height; delete the _subplot_height assignment and ensure no remaining references to _subplot_height exist (also verify the intended constant name consistency between GOLDEN_RATIO and golden elsewhere if needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plots/appendix_rt_tokenizer.py`:
- Around line 22-30: Remove the local range_frame definition in
plots/appendix_rt_tokenizer.py and import the shared range_frame from commons.py
instead; specifically delete the def range_frame(...) block and add an import
for range_frame (from commons import range_frame) so the module uses the
canonical implementation (which handles symmetric x-padding, x_min==x_max edge
case, log-scale support, and bottom spine bounds) and update any local usages to
call the imported range_frame.
- Around line 167-168: Remove the unused computed variable _subplot_height
(assigned from subplot_width / GOLDEN_RATIO) since the code uses
TWO_COL_GOLDEN_RATIO_HEIGHT_INCH (computed from TWO_COL_WIDTH / golden) for
figure height; delete the _subplot_height assignment and ensure no remaining
references to _subplot_height exist (also verify the intended constant name
consistency between GOLDEN_RATIO and golden elsewhere if needed).
In `@plots/figure_7.py`:
- Around line 84-112: Add a brief docstring to the range_frame function
explaining that y_data_range is a conceptual normalized range (e.g., [0, 1])
rather than actual y-data values, describing parameters (ax, x, y_data_range,
pad), what the function sets (ylim, xlim, spine positions/bounds) and explicitly
noting that its semantics intentionally differ from the commons.py version to
avoid confusion for future readers; reference the function name range_frame and
mention y_data_range and the behavior for filtered_x/empty x so maintainers can
locate the logic to be documented.
- Around line 220-225: The variable text_label is assigned but never used;
update the ax.text call in the block that creates the label (where model and
ax.text are used) to either use text_label instead of the inline
model.replace("-Best", "") expression or remove the text_label assignment
entirely to avoid the unused variable; locate the assignment to text_label and
the ax.text invocation and make them consistent (use text_label in the ax.text
call or delete the unused assignment).
In `@plots/Snakefile`:
- Around line 200-207: The clean rule currently removes generated images and
logs but omits the sentinel file; update the rule named "clean" to also remove
the .snakemake_dirs_created sentinel (alongside {OUTPUT_DIR} and logs) so a full
fresh state is produced; modify the shell block in rule clean to include removal
of .snakemake_dirs_created (ensure path is correct relative to the Snakefile)
and keep the existing echo message.
- Around line 77-85: The init_dirs rule currently writes a sentinel file
(.snakemake_dirs_created) which leaves an artifact; instead either (A) convert
the rule to declare directory outputs using Snakemake's directory() wrapper
(e.g., output: directory(OUTPUT_DIR), directory("logs")) and remove creation of
.snakemake_dirs_created, then make rules that require those dirs list
rules.init_dirs.output as an implicit/input dependency so Snakemake ensures
directories exist, or (B) if you keep the sentinel, add .snakemake_dirs_created
to .gitignore; also note commons.py already creates OUTPUT_DIR on import so you
can alternatively remove the rule entirely and rely on commons.py for directory
creation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d7792ff5-3eee-4ece-b8cf-7bdce431ba77
📒 Files selected for processing (4)
- .gitignore
- plots/Snakefile
- plots/appendix_rt_tokenizer.py
- plots/figure_7.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
Summary by Sourcery
Add plotting utilities and workflows for generating MatText publication figures from HuggingFace datasets.
New Features:
Build:
Summary by CodeRabbit
New Features
Chores