feat: add plots scripts#109

Merged
n0w0f merged 4 commits into main from plots
Mar 12, 2026

Conversation

Collaborator

@n0w0f n0w0f commented Mar 11, 2026

Summary by Sourcery

Add plotting utilities and workflows for generating MatText publication figures from HuggingFace datasets.

New Features:

  • Introduce multiple standalone plotting scripts to generate main and appendix figures for the MatText paper using HuggingFace-hosted results.
  • Add a shared plotting commons module with styling, mappings, and helper utilities for saving figures.
  • Provide a Snakemake workflow to orchestrate generation and cleanup of all figures and associated logs.

Build:

  • Define a dev dependency group including plotting and workflow-related packages such as lama-aesthetics, snakemake, and pulp.

Summary by CodeRabbit

  • New Features

    • Added an automated workflow to generate all publication figures and a suite of new, styled visualization scripts that compute statistics and save publication-ready PNG/PDF outputs.
  • Chores

    • Expanded .gitignore to exclude workflow-generated files/directories.
    • Added development dependencies for workflow automation and advanced plotting.

Contributor

sourcery-ai bot commented Mar 11, 2026

Reviewer's Guide

Adds a plotting pipeline under plots/ to generate all paper figures from HuggingFace datasets using shared styling utilities, and wires in dev-only dependencies for running the Snakefile workflow with uv.

Class diagram for shared plotting utilities and script relationships

classDiagram
    class Commons {
        <<module>>
        +OUTPUT_DIR : Path
        +REPRESENTATION_MAPPING : dict
        +PROPERTY_DISPLAY_MAPPING : dict
        +PROPERTY_KEY_IN_MODEL_JSON : dict
        +DATA_SIZE_MAPPING : dict
        +GROUP_STYLES : dict
        +REP_COLOR_OVERRIDES : dict
        +MODEL_STYLES : dict
        +NON_MATTEXT_REPS : set
        +PROPERTY_COLORS : dict
        +DEFAULT_COLOR : str
        +TASK_REVERSE_MAP : dict
        +range_frame(ax, x, y, pad)
        +get_representation_group_and_color(representation)
        +get_output_path(script_name, extension)
        +save_figure(fig, script_name, dpi)
    }

    class Figure2 {
        <<script>>
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +compute_per_dataset_values(df)
        +compute_averaged_values(per_dataset)
        +plot_panel_c(averaged, output_file)
        +plot_appendix(per_dataset, output_file, single_row)
        +main()
    }

    class Figure3 {
        <<script>>
        +compute_gcmg(alpha_loss_data_dict)
        +load_data_from_hf()
        +create_gcmg_plot()
    }

    class Figure5 {
        <<script>>
        +hf_to_nested_dict(repo_id, size_filter)
        +plot_30k_bar_results(data_30k, properties, output_filepath)
    }

    class Figure6 {
        <<script>>
        +hf_to_data_scaling_dict(repo_id)
        +hf_to_model_scaling_dict(repo_id)
        +format_log_ticks_for_data_axis(value, pos)
        +plot_dataset_scaling_relative_change(ax, data, property_name, show_xlabel)
        +plot_model_scaling_relative_change(ax, data, property_name, show_xlabel)
        +create_combined_relative_scaling_plot(data_size_data, model_size_data, output_filepath)
    }

    class Figure7 {
        <<script>>
        +SquishScale
        +range_frame(ax, x, y_data_range, pad)
        +normalize_property(series)
    }

    class AppendixRTTokenizer {
        <<script>>
        +range_frame(ax, x, y, pad)
        +load_data_from_hf()
        +calculate_percentage_improvement(rt_data, normal_data, data_sizes, representations, properties)
        +create_percentage_improvement_plot()
    }

    class AppendixArchitecture {
        <<script>>
        +range_frame(ax, x, y, pad)
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +compute_per_dataset_values(df)
        +compute_averaged_values(per_dataset)
        +plot_panel_c(averaged, output_file)
        +plot_appendix(per_dataset, output_file, single_row)
        +main()
    }

    class AppendixPanelComparison {
        <<script>>
        +range_frame(ax, x, y, pad)
        +download_data(repo_id, force_download)
        +calculate_contributions(df)
        +prepare_comparison_data(df)
        +plot_comparison_dumbbells(comparison_data, output_file)
        +main()
    }

    Commons <.. Figure2 : uses
    Commons <.. Figure3 : uses
    Commons <.. Figure5 : uses
    Commons <.. Figure6 : uses
    Commons <.. AppendixRTTokenizer : uses
    Commons <.. AppendixArchitecture : uses
    Commons <.. AppendixPanelComparison : uses

Flow diagram for Snakefile-driven figure generation

flowchart TD
    A_all["rule all"]

    subgraph rules["Snakefile rules"]
        R2["rule figure_2"]
        R3["rule figure_3"]
        R5["rule figure_5"]
        R6["rule figure_6"]
        R7["rule figure_7"]
        Rrt["rule appendix_rt_tokenizer"]
        Rarch["rule appendix_architecture"]
        Rpanel["rule appendix_panel_comparison"]
        Rclean["rule clean"]
    end

    subgraph scripts["Plot scripts"]
        S2["figure_2.py"]
        S3["figure_3.py"]
        S5["figure_5.py"]
        S6["figure_6.py"]
        S7["figure_7.py"]
        Srt["appendix_rt_tokenizer.py"]
        Sarch["appendix_architecture.py"]
        Spanel["appendix_panel_comparison.py"]
    end

    subgraph outputs["Generated artifacts"]
        O2a["figure_2_panel_c.(png,pdf)"]
        O2b["figure_2_appendix_2row.(png,pdf)"]
        O2c["figure_2_appendix_1row.(png,pdf)"]

        O3["figure_3.(png,pdf)"]
        O5["figure_5.(png,pdf)"]
        O6["figure_6.(png,pdf)"]
        O7["figure_7.(png,pdf)"]
        Ort["appendix_rt_tokenizer.(png,pdf)"]
        OarchC["appendix_architecture_panel_c.(png,pdf)"]
        Oarch2["appendix_architecture_2row.(png,pdf)"]
        Oarch1["appendix_architecture_1row.(png,pdf)"]
        Opanel["appendix_panel_comparison.(png,pdf)"]
    end

    A_all --> R2
    A_all --> R3
    A_all --> R5
    A_all --> R6
    A_all --> R7
    A_all --> Rrt
    A_all --> Rarch
    A_all --> Rpanel

    R2 -->|"uv run python"| S2
    R3 -->|"uv run python"| S3
    R5 -->|"uv run python"| S5
    R6 -->|"uv run python"| S6
    R7 -->|"uv run python"| S7
    Rrt -->|"uv run python"| Srt
    Rarch -->|"uv run python"| Sarch
    Rpanel -->|"uv run python"| Spanel

    S2 --> O2a
    S2 --> O2b
    S2 --> O2c

    S3 --> O3
    S5 --> O5
    S6 --> O6
    S7 --> O7
    Srt --> Ort
    Sarch --> OarchC
    Sarch --> Oarch2
    Sarch --> Oarch1
    Spanel --> Opanel

    Rclean -->|"rm outputs/*.png, *.pdf, logs/*.log"| outputs

File-Level Changes

Change | Details | Files
Introduce shared plotting utilities and style/mapping configuration used across all figure scripts.
  • Create commons.py with shared constants for representations, properties, task mappings, and group/model styles.
  • Implement reusable utilities for range-framed axes, representation-to-color/group resolution, and standardized figure saving into an outputs/ directory.
  • Define TASK_REVERSE_MAP and PROPERTY_KEY_IN_MODEL_JSON to normalize HuggingFace dataset schema across scripts.
plots/commons.py
Add standalone scripts to generate main paper figures (2,3,5,6,7) from HuggingFace datasets.
  • figure_2/appendix_architecture implement contribution calculation (geom_comp, comp_contrib), per-dataset aggregation, and MatText vs CoGN vs MODNet dumbbell plots with averaging and appendix layouts.
  • figure_3 builds GCMG curves over n-gram bin counts from an HF dataset with grouped representation styles and SSA-based metric computation.
  • figure_5 loads 30k-scale HF results, aggregates RMSE/STD per representation and property, and renders grouped bar charts colored by representation family.
  • figure_6 loads BERT and LLaMA scaling results, converts them into nested dicts keyed by data/model size and representation, and produces a 2x3 grid of relative RMSE change curves with optional global error band.
  • figure_7 normalizes MAE scores per property across models, defines a custom ‘squish’ y-scale to compress upper errors, and plots an annotated performance wall separating GNN/LLM/Other models.
plots/figure_2.py
plots/appendix_architecture.py
plots/figure_3.py
plots/figure_5.py
plots/figure_6.py
plots/figure_7.py
Add scripts for appendix figures analyzing tokenizer effects and panel-wise architecture comparisons.
  • appendix_rt_tokenizer loads RT vs normal tokenizer metrics from HF, builds nested metric dicts, computes percentage MAE improvements per size/representation/property, and renders grouped bar plots plus summary stats.
  • appendix_panel_comparison downloads contribution analysis data, computes geometry/composition contributions, aggregates MatText mean vs CoGN and MODNet per property, and plots multi-panel dumbbell comparisons with shared legend.
plots/appendix_rt_tokenizer.py
plots/appendix_panel_comparison.py
Introduce a Snakemake workflow to orchestrate generation of all figures and manage logs/outputs.
  • Define FIGURE_OUTPUTS mapping and rules for each figure/appendix script that run via uv run python ... and tee logs.
  • Provide aggregate all target depending on all PNG/PDF artifacts in outputs/, plus a clean rule and success/error hooks printing brief status messages.
plots/Snakefile
Wire in development-time dependencies and lockfile needed to run plotting workflow.
  • Add a dev dependency group in pyproject.toml including lama-aesthetics via Git, Snakemake, and a constrained pulp version.
  • Check in uv.lock to pin the environment used for running the plotting scripts and Snakefile.
pyproject.toml
uv.lock



coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

Adds a Snakemake workflow, a shared plotting utilities module, multiple new plotting scripts (figures and appendices) that download HF datasets and render publication figures, and minor tooling updates (.gitignore, dev dependencies). Outputs are written to a centralized outputs directory.

Changes

Cohort / File(s) | Summary

  • Configuration & Build (.gitignore, pyproject.toml): Added .snakemake/ and .snakemake_dirs_created to gitignore; added a dev dependency group with snakemake, pulp, and lama-aesthetics.
  • Workflow Orchestration (plots/Snakefile): New Snakemake workflow defining rules for multiple figures and appendices, centralized OUTPUT_DIR/FIGURE_OUTPUTS mapping, per-rule logging, a clean rule, and success/error hooks.
  • Shared Plotting Infrastructure (plots/commons.py): New centralized constants, mappings, color/group styles, output directory handling, and utilities (range_frame, get_representation_group_and_color, get_output_path, save_figure).
  • Main Figure Scripts (plots/figure_2.py, plots/figure_3.py, plots/figure_5.py, plots/figure_6.py, plots/figure_7.py): New scripts that download HuggingFace datasets, compute metrics (contributions, GCMG, scaling, normalization), and produce multi-panel publication plots; figure_7 adds a custom SquishScale transform.
  • Appendix Scripts (plots/appendix_architecture.py, plots/appendix_panel_comparison.py, plots/appendix_rt_tokenizer.py): New appendix modules that download data, compute per-dataset and averaged contributions, prepare comparison datasets, compute tokenizer improvement percentages, and render multi-panel/dumbbell/bar figures.
  • Outputs & Logging (plots/...logs/*, outputs/* (implicit)): New plotting scripts write PNG/PDF outputs to the centralized outputs directory and per-rule logs under logs/ (configured by the Snakefile).

Sequence Diagram

sequenceDiagram
    participant SM as Snakemake
    participant FS as Figure Script
    participant HF as HuggingFace Hub
    participant CM as plots.commons
    participant FSYS as File System

    SM->>FS: Trigger figure rule (script + inputs)
    FS->>HF: Download dataset (hf_hub_download / repo)
    HF-->>FS: Return Parquet / dataset files
    FS->>FS: Compute metrics (calculate_contributions, GCMG, scaling, normalization)
    FS->>CM: Request styles & helpers (MODEL_STYLES, GROUP_STYLES, save_figure)
    CM-->>FS: Return constants & save_figure
    FS->>FSYS: save_figure writes PNG/PDF to outputs (via CM)
    FSYS-->>FS: Confirm file paths
    FS->>SM: Rule completes (outputs available)
    SM->>FSYS: Log success / onerror points to logs/

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 54.72%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title 'feat: add plots scripts' accurately describes the main change, the addition of multiple plotting scripts and utilities for generating MatText publication figures.



@sourcery-ai sourcery-ai bot left a comment


Hey, I've found 4 issues and left some high-level feedback:

  • There is substantial duplication between figure_2.py and appendix_architecture.py (e.g. download_data, calculate_contributions, compute_per_dataset_values, compute_averaged_values, plot_panel_c, plot_appendix); consider extracting these into shared utilities to avoid divergence and ease maintenance.
  • GROUP_STYLES, REP_COLOR_OVERRIDES, and some mapping constants are redefined in multiple scripts (e.g. figure_6.py vs commons.py), which can lead to inconsistencies; it would be cleaner to centralize these definitions in commons.py and import them everywhere.
  • The plotting scripts mix print and loguru logging styles and hard-code HuggingFace repo IDs/paths inline; using a shared configuration/constant module for HF dataset IDs and standardizing on one logging approach would make the scripts easier to run in different environments and debug.
## Individual Comments

### Comment 1
<location path="plots/appendix_rt_tokenizer.py" line_range="58-63" />
<code_context>
+    return rt_data, normal_data
+
+
+def calculate_percentage_improvement(
+    rt_data, normal_data, data_sizes, representations, properties
+):
+    """
+    Calculate percentage improvement: ((Normal_RMSE - RT_RMSE) / Normal_RMSE) * 100
+    Positive values indicate improvement (RT is better).
+    """
+    results = {}
</code_context>
<issue_to_address>
**issue:** The function/docstring and variable names reference RMSE but the implementation actually uses MAE.

This function uses `...['mae']['mean']` for both RT and normal tokenizers, while local names (`rt_rmse`, `normal_rmse`) and the docstring refer to RMSE. Please either rename the docstring/variables to MAE or parameterize the metric so the chosen metric (MAE vs RMSE) is explicit and consistent with the naming.
</issue_to_address>
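A minimal sketch of the parameterized version the reviewer suggests, with the metric made explicit. The nested-dict shape and all names here are illustrative, not the script's actual API:

```python
def percentage_improvement(rt_metrics, normal_metrics, metric="mae"):
    """Percentage improvement of the RT tokenizer over the normal one.

    Positive values mean RT is better (lower error). Passing the metric
    name explicitly keeps naming and implementation consistent.
    """
    rt_err = rt_metrics[metric]["mean"]
    normal_err = normal_metrics[metric]["mean"]
    return (normal_err - rt_err) / normal_err * 100
```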

### Comment 2
<location path="plots/Snakefile" line_range="83-84" />
<code_context>
+        commons="commons.py"
+    output:
+        FIGURE_OUTPUTS["figure_2"]
+    log:
+        "logs/figure_2.log"
+    shell:
+        "uv run python {input.script} 2>&1 | tee {log}"
</code_context>
<issue_to_address>
**issue (bug_risk):** Log files are written into a `logs/` directory that is never created in the workflow.

The `tee` command writes to `logs/...`, but nothing creates the `logs/` directory. On a clean checkout this will fail with “No such file or directory” before any figures are produced. Please ensure the directory is created, e.g. via a small `rule init_dirs` (`mkdir -p logs {OUTPUT_DIR}`) that `all`/figure rules depend on, or by prefixing each relevant shell command with `mkdir -p logs`.
</issue_to_address>
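One way to apply the suggested fix, sketched as plain Python that could sit at the top of the Snakefile (Snakefiles execute top-level Python at parse time; the directory names are assumed from the review):

```python
from pathlib import Path

# Hypothetical names matching the workflow layout described in the review.
OUTPUT_DIR = "outputs"
LOG_DIR = "logs"

# Creating the directories at parse time guarantees they exist before
# any rule's `tee {log}` runs, even on a clean checkout.
for d in (OUTPUT_DIR, LOG_DIR):
    Path(d).mkdir(parents=True, exist_ok=True)
```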

### Comment 3
<location path="plots/figure_6.py" line_range="433-434" />
<code_context>
+    final_handles = list(final_unique.values())
+    final_labels = list(final_unique.keys())
+
+    num_legend_cols = min(
+        len(final_labels), 3 if len(final_labels) > 4 else len(final_labels)
+    )
+
</code_context>
<issue_to_address>
**nitpick:** Legend column count logic is slightly counterintuitive and can over-allocate columns.

`num_legend_cols` is later used as `ncol=num_legend_cols + 1` when calling `fig.legend`, so the actual column count doesn’t match the computed value. For small label counts this can make the legend overly sparse and wide. Consider either using `ncol=num_legend_cols` directly or renaming the variable (e.g. `base_cols` vs `total_cols`) to make the extra column explicit.
</issue_to_address>
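A sketch of the renaming the reviewer proposes, so the computed value is the one actually passed as ncol (the helper name is hypothetical):

```python
def legend_ncols(n_labels, max_cols=3):
    """Final legend column count, capped at max_cols.

    Returning the value that is passed directly to fig.legend(ncol=...)
    avoids the off-by-one confusion of computing a base count here and
    adding 1 at the call site.
    """
    return min(n_labels, max_cols)
```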

### Comment 4
<location path="plots/figure_7.py" line_range="136-146" />
<code_context>
+
+
+# Function to normalize data per property
+def normalize_property(series):
+    valid = series.dropna()
+    if len(valid) <= 1:
+        return series
+    min_val = valid.min()
+    max_val = valid.max()
+    range_val = max_val - min_val
+    if range_val == 0:
+        return series
+    return (series - min_val) / range_val
</code_context>
<issue_to_address>
**suggestion:** Properties with zero variance fall back to raw values instead of staying in [0, 1].

When `range_val == 0`, the function returns the original series, so those values are no longer in the [0, 1] space assumed elsewhere in the plot. On a normalized scale this can push flat-property bands outside the intended visual range. For consistent scaling, consider returning a constant (e.g., 0.5) or another explicit normalized value instead of the raw series in this degenerate case.

```suggestion
# Function to normalize data per property
def normalize_property(series):
    valid = series.dropna()
    # If there is 0 or 1 valid value, return a constant normalized value (e.g., 0.5)
    if len(valid) <= 1:
        return pd.Series(
            0.5, index=series.index, dtype=float
        ).where(~series.isna(), other=pd.NA)
    min_val = valid.min()
    max_val = valid.max()
    range_val = max_val - min_val
    # If all valid values are identical, map them to a constant normalized value
    if range_val == 0:
        return pd.Series(
            0.5, index=series.index, dtype=float
        ).where(~series.isna(), other=pd.NA)
    return (series - min_val) / range_val
```
</issue_to_address>


n0w0f and others added 3 commits March 10, 2026 21:42
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@n0w0f n0w0f requested a review from kjappelbaum March 11, 2026 01:49

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (3)
plots/Snakefile (1)

11-12: Anchor the workflow to the Snakefile directory.

These relative script/log/output paths only work reliably after cd plots; running snakemake -s plots/Snakefile ... from the repo root will look for figure_2.py and commons.py in the wrong place. Using workdir: workflow.basedir, or prefixing paths with workflow.basedir, removes that cwd dependency.

Also applies to: 77-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/Snakefile` around lines 11 - 12, The Snakefile currently assumes the
CWD is plots (e.g., OUTPUT_DIR = "outputs" and rules that call figure_2.py and
commons.py), which breaks when invoking snakemake from the repo root; to fix,
anchor the workflow by adding workdir: workflow.basedir at the top of the
Snakefile or by prefixing relative paths with workflow.basedir (update
OUTPUT_DIR and all usages of figure_2.py, commons.py and any rule input/output
patterns in the sections around OUTPUT_DIR and lines ~77-177 to use
workflow.basedir + path so the scripts and output paths resolve regardless of
the current working directory).
plots/figure_3.py (1)

11-30: Reuse the shared group colors here.

This local GROUP_STYLES flips the compositional/geometric palette from plots/commons.py:59-88, so category colors no longer mean the same thing across figures. Keeping the n-gram labels/styles local is fine, but the colors should still come from the shared mapping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/figure_3.py` around lines 11 - 30, Replace the hard-coded color values
in the local GROUP_STYLES with the shared palette by importing the shared group
color mapping from the project's commons module and using it for the "color"
entries (keep the existing member labels/styles unchanged); specifically, update
GROUP_STYLES to pull colors for the "compositional", "geometric", and "local"
keys from the shared mapping (e.g., the exported GROUP_COLORS or
COMMON_GROUP_COLORS in commons) so figure colors match other plots while
preserving the local n-gram label/style overrides.
plots/figure_7.py (1)

128-133: Move the execution path under main() and a module guard.

Importing this module currently downloads data, builds the figure, and writes files as a side effect. The other plot scripts in this PR use explicit entrypoints; this one should too.

Also applies to: 149-265

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/figure_7.py` around lines 128 - 133, The top-level execution that calls
load_dataset and builds/writes the figure (starting with the df = (...)
assignment that invokes load_dataset("n0w0f/gnn_llm_wall", split="train") and
the subsequent plotting code spanning the block around lines 149-265) must be
moved into a dedicated main() function and guarded by a module check; refactor
so that data loading, figure construction, and file writes are performed inside
a function named main(), keep imports at module scope, and add if __name__ ==
"__main__": main() to prevent side effects on import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plots/appendix_architecture.py`:
- Around line 127-167: The code adds horizontal offsets to the y-values (off is
added to comp/geom) which mutates the plotted contribution values; instead apply
the offset to the x-coordinates so y-values remain unchanged. In the loop over
averaged in plot_panel_c()/plot_appendix() use x coordinates like x0 = 0 + off
and x1 = 1 + off for ax.plot and for each ax.errorbar call pass x_pos + off (or
x0/x1) while leaving y_val and _y_std unmodified; keep offsets = {"MatText":
offset_step, "CoGN": 0.0, "MODNet": -offset_step}, style, labels, and zorder
logic the same. Also update the equivalent code region around the other
occurrence (lines ~215-251) to match this change.
- Around line 136-167: Enable the error bars in the ax.errorbar call by passing
yerr=_y_std (or the appropriate std var) instead of leaving it commented out,
and after computing comp, geom, comp_std, geom_std compute and extend all_y with
the error bar extents (e.g., comp±comp_std and geom±geom_std added to all_y) so
that range_frame() will account for the full y range; update the ax.errorbar
invocation (in the loop that currently binds x_pos, y_val, _y_std) to include
yerr and leave the other marker/label arguments as-is.
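The two fixes above can be sketched as pure helpers (all names are illustrative; the real code calls ax.plot and ax.errorbar directly):

```python
def dumbbell_coords(comp, geom, off):
    """X positions jittered by the per-model offset; y-values untouched,
    so plotted contribution values are not mutated."""
    return [(0 + off, comp), (1 + off, geom)]


def y_extents(comp, geom, comp_std, geom_std):
    """Error-bar extents that range_frame() should see, so enabled
    yerr bars are not clipped by the framed axis range."""
    return [comp - comp_std, comp + comp_std, geom - geom_std, geom + geom_std]
```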

In `@plots/appendix_panel_comparison.py`:
- Around line 162-285: The code incorrectly shifts the contribution values by
adding/subtracting offset to the y coordinates (see variable offset and its use
in plotting mean_all, cogn_rep, and modnet); instead apply horizontal jitter to
the x coordinates. Change each ax.plot and ax.scatter for mean_all, cogn_rep,
and modnet to use x coordinates [0+dx, 1+dx] and x positions 0+dx and 1+dx
(where dx = +offset for mean_all, dx = -offset for cogn_rep, dx = -2*offset for
modnet) and remove the +/- offset adjustments from the y arguments (use the raw
mean_comp/mean_geom, cogn_comp/cogn_geom, modnet_comp/modnet_geom). Keep the
existing colors, linestyles, sizes, and zorders.

In `@plots/appendix_rt_tokenizer.py`:
- Around line 106-108: The current broad "except Exception as e" masks
unexpected HF schema/type bugs by turning them into None; change it to catch
only the expected missing-data errors (e.g., except (KeyError, IndexError,
ValueError) as e:) and set results[size][prop][rep] = None in that branch, and
for any other exception re-raise with context (e.g., raise RuntimeError(f"Error
calculating for {size}-{prop}-{rep}") from e) so unexpected issues aren't
silently swallowed.
- Around line 323-329: The print block at the end using total_comparisons can
divide by zero when total_comparisons == 0; update the code around the print
statements (variables total_comparisons, improvement_count, degradation_count in
appendix_rt_tokenizer.py) to check if total_comparisons is zero and, if so,
print a safe message like "No comparisons performed" or show 0% for
improvements/degradations instead of computing 100 * ... / total_comparisons;
otherwise compute and print the percentages as before.
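Both appendix_rt_tokenizer fixes can be sketched with illustrative names (the real script prints rather than returns):

```python
def safe_mae(data, size, prop, rep):
    """Narrow exception handling: only expected missing-data shapes
    become None; anything unexpected is re-raised with context."""
    try:
        return data[size][prop][rep]["mae"]["mean"]
    except (KeyError, IndexError, ValueError):
        return None
    except Exception as e:
        raise RuntimeError(f"Error calculating for {size}-{prop}-{rep}") from e


def summarize_comparisons(improved, degraded, total):
    """Summary line with the division guarded when total == 0."""
    if total == 0:
        return "No comparisons performed"
    return (
        f"Improved: {100 * improved / total:.1f}%, "
        f"degraded: {100 * degraded / total:.1f}%"
    )
```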

In `@plots/commons.py`:
- Around line 90-93: REP_COLOR_OVERRIDES is currently unused so
per-representation colors like "robocrys" are ignored; update
get_representation_group_and_color to check REP_COLOR_OVERRIDES first and return
the overridden color (and appropriate group if needed) before falling back to
the existing group/color lookup logic, ensuring any callers that rely on
get_representation_group_and_color (and similar helpers referenced around the
same area) receive the override color; keep REP_COLOR_OVERRIDES as the
authoritative first check and only use the previous mapping when no override
exists.
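A sketch of the override-first lookup described above; every constant here is a stand-in for the real commons.py values:

```python
# Illustrative stand-ins for the commons.py mappings.
REP_COLOR_OVERRIDES = {"robocrys": "#8c564b"}
REP_GROUPS = {"robocrys": "textual", "cif": "geometric"}
GROUP_COLORS = {"textual": "#1f77b4", "geometric": "#2ca02c"}
DEFAULT_COLOR = "#777777"


def get_representation_group_and_color(representation):
    """Per-representation overrides win first, then the group palette,
    then the default color."""
    group = REP_GROUPS.get(representation, "other")
    color = REP_COLOR_OVERRIDES.get(
        representation, GROUP_COLORS.get(group, DEFAULT_COLOR)
    )
    return group, color
```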

In `@plots/figure_2.py`:
- Around line 116-157: The code is adding the horizontal offset to the y-values
(comp/geom) instead of shifting the x positions and is not passing the computed
std devs into yerr; update the plotting calls so ax.plot and ax.errorbar use x
coordinates shifted by the offset (e.g., use x_positions[x_pos] + off or add off
to x_pos) while leaving comp/geom unchanged for the y-axis, and pass the
corresponding std (_y_std) into ax.errorbar as yerr=_y_std; keep the existing
label logic (label=style["label"] if x_pos == 0 else "_nolegend_") and the same
style references (MODEL_STYLES, averaged, comp/geom, comp_std/geom_std, ax.plot,
ax.errorbar).
- Around line 17-27: The download_data function currently forces re-downloading
and omits a pinned revision; update download_data to default
force_download=False and pass an explicit immutable revision to the
hf_hub_download call (e.g., revision="<snapshot-sha-or-tag>") instead of always
forcing, and adjust callers (including main()) to stop passing True; apply the
same pattern to other dataset loaders mentioned (the load_dataset call in
figure_3.py, the load_dataset in figure_5.py, and the load_dataset in
appendix_rt_tokenizer.py) by adding a revision argument and defaulting
force_download to False so runs use cached, pinned snapshots.

In `@plots/figure_3.py`:
- Around line 116-121: The subplot titles are hardcoded to ["Dataset 1",
"Dataset 2", "Dataset 3"] causing all panels to be mislabeled; replace that
default with a call to the existing mapping function so titles reflect the
actual properties. Specifically, stop initializing property_titles with the
static list and instead set property_titles =
map_properties_to_titles(properties) (or use it when len(properties) ==
len(map_properties_to_titles(properties))); keep the fallback logic to generate
readable titles (e.g., p.replace("_", " ").title() or "Dataset N") only when
map_properties_to_titles is not available or its length doesn't match,
referencing the property_titles variable and
map_properties_to_titles(properties) function to locate the change.
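The suggested fallback logic, sketched with a hypothetical helper (the real script would call map_properties_to_titles for the mapped titles):

```python
def panel_titles(properties, mapped_titles=None):
    """Use the property-to-title mapping when it lines up with the
    property list; otherwise derive readable titles from the keys."""
    if mapped_titles is not None and len(mapped_titles) == len(properties):
        return list(mapped_titles)
    return [p.replace("_", " ").title() for p in properties]
```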

In `@plots/figure_6.py`:
- Around line 127-135: The tick formatter format_log_ticks_for_data_axis is
matching wrong numeric literals (3e4, 3e5, 2e6) so dataset-size ticks are
labeled 2–3× too small; update the comparisons to check for the exact magnitudes
you want (1e4, 1e5, 1e6) or use a tolerant comparison (e.g., log10 or
numpy.isclose) to return "$10^4$", "$10^5$", "$10^6$" for those exact powers of
ten and otherwise return "" so labels reflect the true dataset sizes.
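A tolerant formatter along the lines suggested, comparing in log space with `numpy.isclose` so only exact powers of ten get labels:

```python
import numpy as np

def format_log_ticks(value, _pos=None):
    """Label only the exact powers of ten 10^4, 10^5, 10^6; blank otherwise."""
    if value <= 0:
        return ""
    exponent = np.log10(value)
    nearest = round(exponent)
    if nearest in (4, 5, 6) and np.isclose(exponent, nearest):
        return f"$10^{nearest}$"
    return ""
```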

In `@plots/figure_7.py`:
- Around line 137-146: normalize_property currently returns the original series
when there is <=1 valid point or zero range which leaves raw MAE values
unnormalized; instead, for those fallback cases set all valid (non-NaN) entries
to a neutral normalized value (e.g., 0.5) and preserve NaNs so the rest of the
pipeline sees values within [0,1]; update the function normalize_property to
detect the <=1 or range_val==0 cases and return a series with the same index
where valid positions are 0.5 (as float) and NaNs unchanged.
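The neutral-fallback normalization can be sketched as follows (degenerate columns map all valid entries to 0.5, NaNs are preserved, and everything else is min-max scaled into [0, 1]):

```python
import numpy as np
import pandas as pd

def normalize_property(series):
    """Min-max normalize to [0, 1]; degenerate columns map to a neutral 0.5."""
    valid = series.dropna()
    if len(valid) <= 1 or valid.max() == valid.min():
        out = pd.Series(np.nan, index=series.index, dtype=float)
        out[series.notna()] = 0.5  # neutral value keeps the plot in [0, 1]
        return out
    return (series - valid.min()) / (valid.max() - valid.min())
```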
- Around line 61-76: Implement the missing inverted() on the
InvertedSquishTransform class: add a method inverted(self) that returns a
Transform implementing the mathematical inverse of transform_non_affine (for
inputs <= self.cutoff return unchanged; for inputs > self.cutoff return
self.cutoff + (value - self.cutoff) * self.compression). Create and return a
small transform instance (e.g., a nested class or a new SquishTransform) whose
transform_non_affine implements that mapping and whose inverted() returns self
(so round-trip inversion works), and ensure input_dims/output_dims/is_separable
match the original.
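A compact sketch of an invertible squish-style transform. Rather than the two-class design the prompt describes, this uses one class whose inverse is the same class with the reciprocal scale, which makes the round-trip property explicit; the `cutoff`/`scale` names and values are illustrative:

```python
import numpy as np
from matplotlib.transforms import Transform

class Squish(Transform):
    """Identity below `cutoff`, linear scaling above it.

    With scale < 1 the region above the cutoff is compressed; the
    instance with the reciprocal scale is its exact inverse.
    """
    input_dims = output_dims = 1
    is_separable = True

    def __init__(self, cutoff, scale):
        super().__init__()
        self.cutoff = cutoff
        self.scale = scale

    def transform_non_affine(self, values):
        v = np.asarray(values, dtype=float)
        out = v.copy()
        mask = v > self.cutoff
        out[mask] = self.cutoff + (v[mask] - self.cutoff) * self.scale
        return out

    def inverted(self):
        return Squish(self.cutoff, 1.0 / self.scale)
```

Inversion round-trips by construction: applying the transform and then its inverse returns the original values, which is what matplotlib needs for interactive coordinate lookups.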

---

Nitpick comments:
In `@plots/figure_3.py`:
- Around line 11-30: Replace the hard-coded color values in the local
GROUP_STYLES with the shared palette by importing the shared group color mapping
from the project's commons module and using it for the "color" entries (keep the
existing member labels/styles unchanged); specifically, update GROUP_STYLES to
pull colors for the "compositional", "geometric", and "local" keys from the
shared mapping (e.g., the exported GROUP_COLORS or COMMON_GROUP_COLORS in
commons) so figure colors match other plots while preserving the local n-gram
label/style overrides.
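One way to merge a shared palette with local overrides; `GROUP_COLORS` and the override labels here are hypothetical stand-ins for whatever commons.py actually exports:

```python
# Hypothetical shared palette as commons.py might export it.
GROUP_COLORS = {"compositional": "#1b9e77", "geometric": "#d95f02", "local": "#7570b3"}

# Local label/style overrides stay in figure_3; only colors come from commons.
LOCAL_OVERRIDES = {
    "compositional": {"label": "Composition (n-gram)", "linestyle": "-"},
    "geometric": {"label": "Geometry", "linestyle": "--"},
    "local": {"label": "Local env.", "linestyle": ":"},
}

GROUP_STYLES = {
    group: {**overrides, "color": GROUP_COLORS[group]}
    for group, overrides in LOCAL_OVERRIDES.items()
}
```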

In `@plots/figure_7.py`:
- Around line 128-133: The top-level execution that calls load_dataset and
builds/writes the figure (starting with the df = (...) assignment that invokes
load_dataset("n0w0f/gnn_llm_wall", split="train") and the subsequent plotting
code spanning the block around lines 149-265) must be moved into a dedicated
main() function and guarded by a module check; refactor so that data loading,
figure construction, and file writes are performed inside a function named
main(), keep imports at module scope, and add if __name__ == "__main__": main()
to prevent side effects on import.
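The guard pattern in skeleton form; `load_data` is a placeholder for the real `load_dataset("n0w0f/gnn_llm_wall", split="train")` call and the plotting code:

```python
def load_data():
    # Placeholder for load_dataset("n0w0f/gnn_llm_wall", split="train")
    return [{"model": "ALIGNN", "mae": 0.05}]

def main():
    """Entry point: load data, build the figure, write outputs."""
    rows = load_data()
    # figure construction and save_figure(...) calls would follow here
    return rows

if __name__ == "__main__":  # prevents side effects when the module is imported
    main()
```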

In `@plots/Snakefile`:
- Around line 11-12: The Snakefile currently assumes the CWD is plots (e.g.,
OUTPUT_DIR = "outputs" and rules that call figure_2.py and commons.py), which
breaks when invoking snakemake from the repo root; to fix, anchor the workflow
by adding workdir: workflow.basedir at the top of the Snakefile or by prefixing
relative paths with workflow.basedir (update OUTPUT_DIR and all usages of
figure_2.py, commons.py and any rule input/output patterns in the sections
around OUTPUT_DIR and lines ~77-177 to use workflow.basedir + path so the
scripts and output paths resolve regardless of the current working directory).
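A minimal Snakefile fragment showing the anchoring; `workdir:` and `workflow.basedir` are standard Snakemake constructs, while `OUTPUT_DIR` and the rule body mirror this PR's layout:

```
# Top of plots/Snakefile: resolve every relative path against the
# directory containing this file, regardless of where snakemake runs.
workdir: workflow.basedir

OUTPUT_DIR = "outputs"  # now resolves to plots/outputs from any CWD

rule figure_2:
    output: f"{OUTPUT_DIR}/figure_2.png"
    shell: "python figure_2.py > logs/figure_2.log 2>&1"
```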

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: afef853e-0633-43e7-81ab-a6665c379596

📥 Commits

Reviewing files that changed from the base of the PR and between eaefa11 and 25c6db0.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .gitignore
  • plots/Snakefile
  • plots/appendix_architecture.py
  • plots/appendix_panel_comparison.py
  • plots/appendix_rt_tokenizer.py
  • plots/commons.py
  • plots/figure_2.py
  • plots/figure_3.py
  • plots/figure_5.py
  • plots/figure_6.py
  • plots/figure_7.py
  • pyproject.toml

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (6)
plots/Snakefile (2)

200-207: The clean rule doesn't remove the sentinel file.

If .snakemake_dirs_created is kept, the clean rule should remove it to ensure a fresh state.

🧹 Proposed fix
 rule clean:
     shell:
         """
         rm -rf {OUTPUT_DIR}/*.png {OUTPUT_DIR}/*.pdf
         rm -rf logs/*.log
+        rm -f .snakemake_dirs_created
         echo "Cleaned all outputs and logs"
         """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/Snakefile` around lines 200 - 207, The clean rule currently removes
generated images and logs but omits the sentinel file; update the rule named
"clean" to also remove the .snakemake_dirs_created sentinel (alongside
{OUTPUT_DIR} and logs) so a full fresh state is produced; modify the shell block
in rule clean to include removal of .snakemake_dirs_created (ensure path is
correct relative to the Snakefile) and keep the existing echo message.

77-85: Consider making init_dirs an implicit dependency rather than tracking a sentinel file.

The current approach with .snakemake_dirs_created works but leaves an artifact. Since commons.py already creates OUTPUT_DIR on import (line 9), this rule may be partially redundant. At minimum, add the sentinel to .gitignore.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/Snakefile` around lines 77 - 85, The init_dirs rule currently writes a
sentinel file (.snakemake_dirs_created) which leaves an artifact; instead either
(A) convert the rule to declare directory outputs using Snakemake's directory()
wrapper (e.g., output: directory(OUTPUT_DIR), directory("logs")) and remove
creation of .snakemake_dirs_created, then make rules that require those dirs
list rules.init_dirs.output as an implicit/input dependency so Snakemake ensures
directories exist, or (B) if you keep the sentinel, add .snakemake_dirs_created
to .gitignore; also note commons.py already creates OUTPUT_DIR on import so you
can alternatively remove the rule entirely and rely on commons.py for directory
creation.
plots/figure_7.py (2)

84-112: Local range_frame has intentionally different semantics.

This implementation takes a conceptual y_data_range tuple (e.g., [0, 1]) rather than actual y-data values, which differs from commons.py's version. Given the normalized data context, this design is appropriate here.

Consider adding a brief docstring clarifying this difference to avoid future confusion.

📝 Suggested docstring improvement
 # Define range_frame function
-def range_frame(ax, x, y_data_range, pad=0.1):  # y_data_range is conceptual [0,1]
+def range_frame(ax, x, y_data_range, pad=0.1):
+    """Apply range frame styling for normalized data.
+
+    Unlike commons.range_frame which takes raw y values, this version accepts
+    a conceptual y_data_range tuple (e.g., [0, 1]) for normalized MAE data.
+    """
     # For ylim, consider the requested data range (typically 0 to 1 for normalized MAE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/figure_7.py` around lines 84 - 112, Add a brief docstring to the
range_frame function explaining that y_data_range is a conceptual normalized
range (e.g., [0, 1]) rather than actual y-data values, describing parameters
(ax, x, y_data_range, pad), what the function sets (ylim, xlim, spine
positions/bounds) and explicitly noting that its semantics intentionally differ
from the commons.py version to avoid confusion for future readers; reference the
function name range_frame and mention y_data_range and the behavior for
filtered_x/empty x so maintainers can locate the logic to be documented.

220-225: Remove unused text_label variable.

text_label is assigned on line 220 but never used; line 225 uses the inline expression instead.

🧹 Proposed fix
                 # Add model name label with outline for contrast and GRAY color
-                text_label = model.replace("-Best", "")
-
                 text = ax.text(
                     x_pos + 0.1,
                     y_pos,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/figure_7.py` around lines 220 - 225, The variable text_label is
assigned but never used; update the ax.text call in the block that creates the
label (where model and ax.text are used) to either use text_label instead of the
inline model.replace("-Best", "") expression or remove the text_label assignment
entirely to avoid the unused variable; locate the assignment to text_label and
the ax.text invocation and make them consistent (use text_label in the ax.text
call or delete the unused assignment).
plots/appendix_rt_tokenizer.py (2)

22-30: Import range_frame from commons.py instead of redefining it locally.

This local implementation diverges from the shared range_frame in commons.py (lines 143-189):

  • Uses asymmetric x-padding (x_max + 2 * pad vs x_max + pad)
  • Missing edge case handling when x_min == x_max
  • Missing log scale support
  • Missing bottom spine bounds

Using the shared version ensures consistency and avoids maintenance burden.

♻️ Proposed fix
 from commons import (
     DATA_SIZE_MAPPING,
     GROUP_STYLES,
     PROPERTY_DISPLAY_MAPPING,
     REPRESENTATION_MAPPING,
     get_representation_group_and_color,
+    range_frame,
     save_figure,
 )

Then remove lines 22-30 entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/appendix_rt_tokenizer.py` around lines 22 - 30, Remove the local
range_frame definition in plots/appendix_rt_tokenizer.py and import the shared
range_frame from commons.py instead; specifically delete the def
range_frame(...) block and add an import for range_frame (from commons import
range_frame) so the module uses the canonical implementation (which handles
symmetric x-padding, x_min==x_max edge case, log-scale support, and bottom spine
bounds) and update any local usages to call the imported range_frame.

167-168: Remove unused variable _subplot_height.

_subplot_height is computed but never used. The figure height is derived from TWO_COL_GOLDEN_RATIO_HEIGHT_INCH instead.

🧹 Proposed fix
     # Calculate figure size
     subplot_width = TWO_COL_WIDTH / num_cols
-    _subplot_height = subplot_width / GOLDEN_RATIO
     TWO_COL_GOLDEN_RATIO_HEIGHT_INCH = TWO_COL_WIDTH / golden
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plots/appendix_rt_tokenizer.py` around lines 167 - 168, Remove the unused
computed variable _subplot_height (assigned from subplot_width / GOLDEN_RATIO)
since the code uses TWO_COL_GOLDEN_RATIO_HEIGHT_INCH (computed from
TWO_COL_WIDTH / golden) for figure height; delete the _subplot_height assignment
and ensure no remaining references to _subplot_height exist (also verify the
intended constant name consistency between GOLDEN_RATIO and golden elsewhere if
needed).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7792ff5-3eee-4ece-b8cf-7bdce431ba77

📥 Commits

Reviewing files that changed from the base of the PR and between 25c6db0 and 62e7ebd.

📒 Files selected for processing (4)
  • .gitignore
  • plots/Snakefile
  • plots/appendix_rt_tokenizer.py
  • plots/figure_7.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitignore

@n0w0f n0w0f merged commit ee3be44 into main Mar 12, 2026
6 checks passed
@n0w0f n0w0f deleted the plots branch March 12, 2026 12:44