diff --git a/CHANGELOG.md b/CHANGELOG.md
index b99276e2..395f9d26 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,15 +1,341 @@
# CHANGELOG
-## v0.9.0 (2025-05-23)
+## v0.9.1 (2026-02-27)
-### Step
+### Bug Fixes
-- Bumping minor version
- ([`e333641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e3336417a09b4ef26e71bde1b54da840f0980ab9))
+- Add missing 'import os' in performance tests (test_boxplot_performance.py,
+ test_histogram_performance.py)
+ ([`7a5ec6d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7a5ec6d57ca7934d7d1003d417b3907f6d692308))
+
+- Add missing __init__.py in tests/templates
+ ([`7b7e6cb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7b7e6cbbe3bf9ea591de2075cb40861c2136a879))
+
+- Remove 6 deprecated templates (sync with tools_refactor)
+ ([`d0bbc5e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d0bbc5ea8a7151bbf5f389549612b01862d6f382))
+
+- Spac_boxplot outputs in JSON and validated h5ad
+ ([`c87e782`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c87e782ff03c3ec52a2ae4c4353f5b426a6fc9d0))
+
+- **boxplot**: Replace deprecated append call with concat
+ ([`4906439`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4906439dfdbec132ba675e4d13b9c48cd33d8c38))
+
+- **boxplot_template**: Address minor comments from copilot
+ ([`cd5abd0`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cd5abd005d35678351867ae14020ca9d57317e02))
+
+- **check_layer**: Use check_table spac function to evaluate if adata.layer is present
+ ([`0cf530b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0cf530bb603c0d74beeaa797df5f8ad222512921))
+
+- **combine_annotations_template**: Address comments from copilot CR for
+ combine_annotations_template function
+ ([`9d8582a`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9d8582a88b1d61cd1912aa146153178d8287d82a))
+
+- **histogram_performance**: Add clarifying comment for old hist implementation
+ ([`179482e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/179482eb16bbfaa7566bfcc8aae0adb29c0d2429))
+
+- **histogram_template**: Fix odd number of cells in test
+ ([`51ba1c4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/51ba1c4609e5d4435e612faf6b632bd8f1f76927))
+
+- **interactive_spatial_plot_template**: Remove nidap comments
+ ([`64ff302`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/64ff3023155345b86ad9b72838bae0b53db21930))
+
+- **nearest_neighbor_template**: Break the title into two lines
+ ([`4e083fb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4e083fbe77a1aaa1dac1d5b3d7841ed172721132))
+
+- **normalize_batch_template**: Fix typo and unused import
+ ([`9edf8ce`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9edf8cee2972ecfd34244d7f3482a7ca5be94b2e))
+
+- **performance_test**: Fix the speedup calculation logic
+ ([`c31c3ff`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c31c3ffec708bf2d2cf0a9ce487f54ccd04fe874))
+
+- **posit_it_python_template**: Fixed typo
+ ([`e70f547`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e70f547aae17a011ce52162c0bf6fd42a74902ed))
+
+- **quantile_scaling_template**: Fix typo in both function and unit tests
+ ([`de6ee91`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/de6ee910c5a36eb7fa2c6429f36695317ceebb03))
+
+- **relational_heatmap_template**: Fix insecure temporary file usage and address comments from
+ copilot
+ ([`5662bdb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5662bdbf77980af891bd55e45c02a20ed7af7546))
+
+- **ripley_template**: Address review comments - merge dev into the branch and fix unit test
+ ([`415df89`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/415df89b0d3d368fa126b6167d2fb68028a8b512))
+
+- **ripley_template**: Address review comments - replace debug prints with logging
+ ([`9914716`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9914716db356d8ca217963a0d95c85bb37e2ff19))
+
+- **sankey_plot_template**: Address the comments from copilot
+ ([`07baeb9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/07baeb9037fd6bc3cefee2b71abb1b2d777223d9))
+
+- **scripts**: Remove old performance testing script
+ ([`c0762c3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c0762c34c453e35bbe07e9a57180686cde1de7e8))
+
+- **select_values_template**: Fix pandas/numpy version compatibility issue
+ ([`cedb6d1`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cedb6d163d1ffc7383961c0291a20a474f35843f))
+
+- **setup_analysis_template**: Fix setup_analysis_template function
+ ([`dddc33a`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/dddc33a78a0d6af0262eefd4b9250c0cfc19b77e))
+
+- **spatial_interaction_template**: Fix typo
+ ([`1a3d03d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1a3d03daa6e0630132c2596e149a74cdba0520a0))
+
+- **spatial_plot_temp**: Address copilot comments in spatial_plot_template.py
+ ([`028a049`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/028a049bf5857cdb51eff4a76a4940dc50100872))
+
+- **subset_analysis_template**: Fix typo and enhance function
+ ([`aefcb29`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/aefcb294f3d3ab5059e76232c667b5c3a37963a1))
+
+- **summarize_dataframe_template**: Address comments from copilot
+ ([`30118f9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/30118f9813cc8768401b2206ca1ba867a79175aa))
+
+- **template_utils**: Address review comments
+ ([`e9f0883`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e9f088335c76c093aa75edc81b3bab33b53a30c8))
+
+- **template_utils**: Address review comments again
+ ([`c94b219`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c94b219f30b551caeffa5074dda6173cf3a9ab8f))
+
+- **template_utils**: Use applymap instead of map for pandas compatibility
+ ([`bbfa2f6`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bbfa2f6cc82aadaf67b904642ec3f0ff61b3a816))
+
+- **test_arcsinh_normalization_template**: Handle odd numbers with better list slicing
+ ([`afca7ff`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/afca7ff1eb153e3545a805a04be2df4acd283d15))
+
+- **test_manual_phenotyping_temp**: Address copilot review comments on unit tests
+ ([`e4d61cf`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e4d61cf3748e6eb9f3aca7cad9faf2b41b6ea652))
+
+- **test_performance**: Set the path to include spac
+ ([`9c4d606`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9c4d606b9baf125dd5084125eb29e439b1930ab4))
+
+- **tsne_analysis_template**: Fixed typo
+ ([`7ae8e57`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7ae8e5725326e68e9f2ce440be62089c06ad5f36))
+
+- **umap_transformation_template**: Return adata in place and address copilot comments
+ ([`9d24638`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9d24638e5a235f3e128e72fc1e11f28f52bb1822))
+
+- **umap_tsne_pca_template**: Address the comments from copilot
+ ([`7fb9b2c`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7fb9b2cd291e3aa3651e5b3d01240772330609ff))
+
+- **visualize_nearest_neighbor_template**: Fix typo
+ ([`52a4ee6`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/52a4ee6ef66f9ffaed59391bbe6d0fd4f003a816))
+
+- **visualize_ripley**: Add missing __init__.py for templates module
+ ([`ff9238c`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ff9238c126ab5e524608b24c28a47bebc3ed487d))
+
+- **visualize_ripley**: Make plt.show() conditional based on show_plot parameter
+ ([`0e2747e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0e2747e8d05547c2e2dca4ad9e2b8ec730e24260))
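The **boxplot** fix listed above replaces a deprecated `append` call with `concat`. Assuming it refers to `pandas.DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0, the migration looks roughly like this minimal sketch. The `summary` frame and its columns are hypothetical, not taken from the boxplot code.

```python
import pandas as pd

# Hypothetical per-group summary; not the actual boxplot data.
summary = pd.DataFrame({"group": ["A"], "median": [1.5]})
new_row = {"group": "B", "median": 2.0}

# Removed in pandas 2.0:
# summary = summary.append(new_row, ignore_index=True)

# Replacement: wrap the row in a one-row DataFrame and concatenate.
summary = pd.concat([summary, pd.DataFrame([new_row])], ignore_index=True)
print(summary)
```

When rows are accumulated inside a loop, collecting them in a list and calling `pd.concat` once avoids copying the growing frame on every iteration.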
+
+### Code Style
+
+- **qc-metrics**: Fix spelling typo in nFeature metric
+ ([`59675ca`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/59675cad6540787be3a8a8a300dcc6c7e398dec9))
+
+### Features
+
+- Add refactored galaxy tools
+ ([`4d2e3d7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4d2e3d722472e4a1808e936bd6267cc59e709a55))
+
+- Add spac arcsinh_norm and interactive_spatial_plot galaxy tools
+ ([`67e3ec4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/67e3ec45822572f147f785999dfa0c4121d01635))
+
+- Add SPAC boxplot Galaxy tool for Docker deployment
+ ([`9e3bea0`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9e3bea0de6400b3a1031f831f8ced81f410b9007))
+
+- Add spac_load_csv_files galaxy tools
+ ([`d2526a7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d2526a79965acb886c0100a04385f5325ec1923b))
+
+- Add spac_setup_analysis galaxy tools
+ ([`bb6834b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bb6834bc2d292604f7499e09e9245384a3b0f694))
+
+- Add spac_zscore_normalization galaxy tools
+ ([`cbbcd9e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cbbcd9e47930cf9b258a8668af6ec81238ad09d9))
+
+- Refactor all templates and unit tests
+ ([`8005111`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/800511118ab5970754b4e09bf7017b423328da92))
+
+  - Refactored all template run_from_json() functions to use the centralized save_results from
+    template_utils
+  - Added a show_static_image toggle (default False) to relational_heatmap_template and
+    sankey_plot_template to prevent a Plotly-to-PNG hang on Galaxy
+  - Refactored all unit tests in tests/templates/ using a snowball approach: real data, real
+    filesystem, no mocking
+  - One test file per template, validating output file existence, naming conventions, and
+    non-empty artifacts
+  - Updated posit_it_python_template to use the centralized save_results
+
+  Templates changed: 43 files in src/spac/templates/. Tests changed: 37 files in tests/templates/.
+
+- **add_pin_color_rule_template**: Add add_pin_color_rule_template function and unit tests
+ ([`2477266`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/2477266b7de03e603b1dc9d48b9a53bed0af61ad))
+
+- **analysis_to_csv_template**: Add analysis_to_csv_template function and unit tests
+ ([`448a980`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/448a980c496dfa5295a33e4432649947da6e6af7))
+
+- **append_annotation_template**: Add append_annotation_template function and unit tests
+ ([`5e68e02`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5e68e02faaa423f381d3d7321c1378f51e7f3c7f))
+
+- **arcsinh_normalization_template**: Add arcsinh_normalization_template function and unit tests
+ ([`ff6cce4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ff6cce42b602642a4bd2f211d1dbd1fe6c6fd65e))
+
+- **binary_to_categorical_annotation_template**: Add binary_to_categorical_annotation_template
+ function and unit tests
+ ([`8e500ec`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/8e500ecfe264b6125b6ace828003684e4b1b5cad))
+
+- **boxplot_template**: Add boxplot_template function and unit tests
+ ([`eb810ab`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/eb810ab4a532901817a5652ad59733af38b083fd))
-## v0.8.11 (2025-05-23)
+- **calculate_centroid_template**: Add calculate_centroid_template function and unit tests
+ ([`4fea9c3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4fea9c336dabc207d6fa66de8553d3fe89c9dd62))
+
+- **combine_annotations_template**: Add combine_annotations_template function and unit tests
+ ([`829a4bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/829a4bdbc95575079d9e24492f0e6bc2ea57475e))
+
+- **combine_dataframes_template**: Add combine_dataframes_template function and unit tests
+ ([`3e24237`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3e24237c5a4e3f57e09c0396532200dcdf471990))
+
+- **downsample_cells_template**: Add downsample_cells_template function and unit tests
+ ([`47adf3e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/47adf3e881f28d05b4627b1ec3d0954b403f634e))
+
+- **hierarchical_heatmap_template**: Add hierarchical_heatmap_template and unit tests
+ ([`67e5a80`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/67e5a802500459e01a918c7fe96d20ab45c373c2))
+
+- **hierarchical_heatmap_template**: Add hierarchical_heatmap_template function and unit tests
+ ([`6466e2f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6466e2f8daa4c95c49e0b6d0b0e5ec1460064d09))
+
+- **histogram_template**: Add histogram_template and unit tests
+ ([`3380427`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3380427ecbd6d743aaedfb954d612905153079e4))
+
+- **interactive_spatial_plot_template**: Add interactive_spatial_plot_template function and unit
+ tests
+ ([`3f4336b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3f4336bc75f04ef1f4f8ca740a8b9b678a288da2))
+
+- **load_csv**: Add load_csv template function with configuration support
+ ([`5456658`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5456658e8d7b0ab0475ead65bbe982ccefbacf47))
+
+  - Add load_csv_files() to template_utils.py for loading and combining CSV files
+  - Add spell_out_special_characters() to handle biological marker names
+  - Add a load_csv_files_with_config.py template wrapper for NIDAP compatibility
+  - Add comprehensive unit tests for both functions
+  - Support column name cleaning, metadata mapping, and string column enforcement
+
+- **manual_phenotyping_template**: Add manual_phenotyping_template function and unit tests
+ ([`941d641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/941d641352c6eaee1c293b5ef98f1a9d92646c0c))
+
+- **nearest_neighbor_calculation_template**: Add nearest_neighbor_calculation_template function and
+ unit tests
+ ([`19cd477`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/19cd477fbb3af575ae2b90615b34e801fb4dc66c))
+
+- **neighborhood_profile_template**: Add neighborhood_profile_template function and unit tests
+ ([`824d131`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/824d131fed2216a5a291f61fc53d53e5e2c98c11))
+
+- **normalize_batch_template**: Add normalize_batch_template function and unit tests
+ ([`a71e865`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a71e8656033f79736330a1f902200a6c81c4b37e))
+
+- **phenograph_clustering_template**: Add phenograph_clustering_template function and unit tests
+ ([`ca29330`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ca2933068c63c0a877970a496d4d9e2afe34d447))
+
+- **posit_it_python_template**: Add posit_it_python_template function and unit tests
+ ([`bbb53f7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bbb53f712285a19f738aa117205e907c1aa0404d))
+
+- **qc-metrics**: Add common single cell quality control metrics
+ ([`994bac4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/994bac4896650ecaeee7c81e2084872509a4b815))
+
+- **qc_summary_statistics**: Add summary statistics table for sc/spatial transcriptomics quality
+ control metrics
+ ([`a228e5e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a228e5eb33777d03bec96af826568545e44157fd))
+
+- **quantile_scaling_template**: Refactor nidap code, add quantile_scaling_template function and
+ unit tests
+ ([`542f985`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/542f985f4d1a2811f010b68da8dadffe8ac64220))
+
+- **relational_heatmap_template**: Add relational_heatmap_template function and unit tests
+ ([`c57075d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c57075d87995b7f35b0c9065a354363f7cceb623))
+
+- **rename_labels_template**: Add rename_labels_template function and unit tests
+ ([`96446d7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/96446d70e392aa00e1ba77b2abddaa040d6aea7a))
+
+- **ripley_l_template**: Add ripley_l_template and unit tests
+ ([`c889259`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c88925913faf4c99c906e9b55a073759395faa78))
+
+- **sankey_plot_template**: Add sankey_plot_template function and unit tests
+ ([`34b4eee`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/34b4eee45ae98fd0ce7c5f184611f4993125949f))
+
+- **select_values_template**: Add select_values_template function and unit tests
+ ([`e59c994`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e59c994352f09ee1398f9cea6cc02041daa0cd03))
+
+- **setup_analysis_template**: Add setup_analysis_template function and unit tests
+ ([`1cfb39e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1cfb39ed56e30e805bf7bcf29399e8ce0f20333c))
+
+- **spatial_interaction_template**: Add spatial_interaction_template and unit tests
+ ([`a7b1349`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a7b13494e86b14555aba6a6fbcc0c7c12c1441f1))
+
+- **spatial_plot_temp**: Add spatial_plot_template.py and unit tests
+ ([`0f26c08`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0f26c08744f5611f0aa92a9a60b0ad20815e8d6e))
+
+- **subset_analysis_template**: Add subset_analysis_template function and unit tests
+ ([`1db00a8`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1db00a88dfdfbbc68862b3aac99aea5332e2255c))
+
+- **summarize_annotation_statistics**: Add summarize_annotation_statistics template function and
+ unit tests
+ ([`34961bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/34961bde03bfc47211047ec58cbacc0814712e3c))
+
+- **summarize_dataframe_template**: Add summarize_dataframe_template function and unit tests
+ ([`06d8feb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/06d8feb4d72d132efb6c4db04f6254a8bd69ca04))
+
+- **template_utils**: Add string_list_to_dictionary to template utils
+ ([`6ab7a9d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6ab7a9d91111c6fd1aa4a771bf85f8d07bd01b28))
+
+- **template_utils**: Add template_utils and unit tests
+ ([`b960684`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b960684f3e1887330f91ad307cc36b467d68bea3))
+
+- **test_performance**: Add performance tests for boxplot/histogram
+ ([`862e523`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/862e523d08f40bb1e0aee53437fa06bdb533ac45))
+
+- **tsne_analysis_template**: Add tsne_analysis_template function and unit tests
+ ([`abda610`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/abda61091b2a94251eccbe2535efc300d79a7e73))
+
+- **umap_transformation_template**: Add umap_transformation_template function and unit tests
+ ([`e79fd78`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e79fd7814a8728c9f9a529e90256ac44726d1571))
+
+- **umap_tsne_pca_template**: Add umap_tsne_pca_template function and unit tests
+ ([`d67f6c7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d67f6c7278e7c8622ca43832acf8b74ae4a4363e))
+
+- **utag_clustering_template**: Add utag_clustering_template and unit tests
+ ([`6da3985`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6da39852b1d2ee8b3a1b4fda5c040055d6ae3cd4))
+
+- **utag_clustering_template**: Add utag_clustering_template and unit tests
+ ([`743fb10`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/743fb10f71dc015b8892a4b291d3a2a9069e89a2))
+
+- **visualize_nearest_neighbor_template**: Add visualize_nearest_neighbor_template function and unit
+ tests
+ ([`07ecdfa`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/07ecdfa15c3b5c4c7f65288811306bf68cef4962))
+
+- **visualize_ripley_template**: Add visualize_ripley_template and unit tests
+ ([`48608e2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/48608e26ca16c1e89a573286553b370f2a3f508b))
+
+- **zscore_normalization_template**: Add zscore_normalization_template and unit tests
+ ([`b2d68c5`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b2d68c5fb6cd1dfba38684d855f38aea98d56296))
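The **qc-metrics** feature adds common single-cell quality-control metrics, including nFeature. The SPAC function names and signatures are not shown in this changelog, so the following is only a NumPy sketch of the two most common per-cell definitions. It assumes a dense cells × genes count matrix, computing nCount (total counts per cell) and nFeature (number of genes with a nonzero count). `basic_qc_metrics` is a hypothetical helper, not the SPAC API.

```python
import numpy as np


def basic_qc_metrics(counts: np.ndarray) -> dict:
    """Per-cell QC metrics for a dense cells x genes count matrix.

    Hypothetical helper for illustration; not the SPAC API.
    """
    counts = np.asarray(counts)
    n_count = counts.sum(axis=1)          # total counts per cell
    n_feature = (counts > 0).sum(axis=1)  # genes detected per cell
    return {"nCount": n_count, "nFeature": n_feature}


counts = np.array([
    [0, 2, 1],  # cell 1: 3 counts spread over 2 genes
    [3, 0, 0],  # cell 2: 3 counts in a single gene
])
metrics = basic_qc_metrics(counts)
print(metrics["nCount"].tolist(), metrics["nFeature"].tolist())  # [3, 3] [2, 1]
```

Sparse matrices (common in AnnData) return column matrices from these reductions, so flatten the result before storing it per cell.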
+
+### Refactoring
+
+- Merge paper.bib and paper.md updates from address-reviewer-comments branch
+ ([`9ae3ef3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9ae3ef331290197d3fcc305fb88f9b2baac3bccc))
+
+- Streamline galaxy tools implementation
+ ([`009d010`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/009d01000c6a80dad9f6dc4a508b334a19796b3b))
+
+- **get_qc_summary_table**: Adjust code style to adhere more closely to spac guidelines
+ ([`5e03dc2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5e03dc23b5deaadb853fd26f327f643d7e19ad12))
+
+- **get_qc_summary_table**: Refactor quality control summary statistics function and tests based on
+ the PR review
+ ([`d5061c4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d5061c43d31c20338576c58d207619f9ae789143))
+
+### Testing
+
+- **performance**: Skip performance tests by default
+ ([`fc664ad`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fc664ad96b3375a3255f9bb36e51d0e4a505daba))
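Among this release's bug fixes, relational_heatmap_template addresses an insecure temporary file. The changelog does not name the flagged call. Assuming it was `tempfile.mktemp()`, which returns a name without creating the file and so leaves a window for another process to claim it, the usual remedy is to let `tempfile` create the file atomically. A minimal sketch with a hypothetical CSV payload:

```python
import os
import tempfile

# Insecure: mktemp() only returns a path; another process could create it first.
# path = tempfile.mktemp(suffix=".csv")

# Safer: NamedTemporaryFile creates and opens the file atomically.
# delete=False keeps it on disk after closing so other code can read it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("source,target,count\nA,B,3\n")  # hypothetical payload
    path = handle.name

try:
    with open(path) as f:
        contents = f.read()
finally:
    os.remove(path)  # clean up explicitly because delete=False
```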
+
+
+## v0.9.0 (2025-05-23)
### Bug Fixes
@@ -82,6 +408,9 @@
### Continuous Integration
+- **version**: Automatic development release
+ ([`3e126e9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3e126e9711be5d485010ced7460f99a180c8089e))
+
- **version**: Automatic development release
([`195761d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/195761de5563e80a60a7ea43ecb73e6105dc7d1d))
@@ -173,6 +502,11 @@
- **interactive_spatial_plot**: Used partial for better readability
([`60283bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/60283bd7671d2f2a65b52d77f4792b7461a8e407))
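The **interactive_spatial_plot** refactor above switches to `functools.partial` for readability. The changelog does not show the code, so this is only a generic sketch of the pattern. `make_trace` is a hypothetical stand-in for the real plotting call, and `partial` binds the options shared by every call so each call site states only what differs.

```python
from functools import partial


def make_trace(x, y, *, color, marker_size, opacity):
    """Hypothetical stand-in for a plotting call with many shared options."""
    return {"x": x, "y": y, "color": color,
            "marker_size": marker_size, "opacity": opacity}


# Bind the options shared by every trace once...
spatial_trace = partial(make_trace, marker_size=4, opacity=0.8)

# ...so each call site only states what differs.
traces = [
    spatial_trace([0, 1], [2, 3], color="red"),
    spatial_trace([4, 5], [6, 7], color="blue"),
]
```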
+### Step
+
+- Bumping minor version
+ ([`e333641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e3336417a09b4ef26e71bde1b54da840f0980ab9))
+
### Testing
- **comments**: Add extensive comments for complex data set generation in utag tests
diff --git a/galaxy_tools/README.md b/galaxy_tools/README.md
new file mode 100644
index 00000000..c615436a
--- /dev/null
+++ b/galaxy_tools/README.md
@@ -0,0 +1,12 @@
+# SPAC Galaxy Tools
+
+## Requirements
+
+- Galaxy instance with Docker enabled
+- Docker image: nciccbr/spac:v1
+
+## Installation
+
+1. Pull the Docker image: `docker pull nciccbr/spac:v1`
+2. Copy the tool directory to Galaxy's tools folder
+3. Add the tool to tool_conf.xml:
+```xml
+
\ No newline at end of file
diff --git a/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py b/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py
new file mode 100644
index 00000000..d695dc7d
--- /dev/null
+++ b/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py
@@ -0,0 +1,569 @@
+#!/usr/bin/env python3
+"""
+Generalized NIDAP to Galaxy synthesizer - Production Version v11
+- No hardcoded tool-specific logic
+- Blueprint-driven for all tools
+- Handles multiple files/columns via blueprint flags
+- FIXED: Use 'binary' instead of 'pickle' for Galaxy compatibility
+- FIXED: Use 'set -eu' instead of 'set -euo pipefail' for broader shell compatibility
+- FIXED: Pass outputs spec as environment variable to avoid encoding issues
+- FIXED: Method signature for build_command_section
+"""
+
+import argparse
+import json
+import re
+import shutil
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+class GeneralizedNIDAPToGalaxySynthesizer:
+
+ def __init__(self, docker_image: str = "nciccbr/spac:v1"):
+ self.docker_image = docker_image
+ self.galaxy_profile = "24.2"
+ self.wrapper_script = Path('run_spac_template.sh')
+ self.runner_script = Path('spac_galaxy_runner.py')
+
+ def slugify(self, name: str) -> str:
+ """Convert name to valid Galaxy tool ID component"""
+ s = re.sub(r'\[.*?\]', '', name).strip()
+ s = s.lower()
+ s = re.sub(r'\s+', '_', s)
+ s = re.sub(r'[^a-z0-9_]+', '', s)
+ s = re.sub(r'_+', '_', s)
+ return s.strip('_')
+
+ def escape_xml(self, text: str, is_attribute: bool = True) -> str:
+ """Escape XML special characters"""
+ if text is None:
+ return ""
+ text = str(text)
+ text = text.replace('&', '&amp;')
+ text = text.replace('<', '&lt;')
+ text = text.replace('>', '&gt;')
+ if is_attribute:
+ text = text.replace('"', '&quot;')
+ text = text.replace("'", '&apos;')
+ return text
+
+ def clean_description(self, description: str) -> str:
+ """Clean NIDAP-specific content from descriptions"""
+ if not description:
+ return ""
+
+ desc = str(description).replace('\r\n', '\n').replace('\r', '\n')
+ desc = re.sub(r'\[DUET\s*Documentation\]\([^)]+\)', '', desc, flags=re.IGNORECASE)
+ desc = re.sub(r'Please refer to\s+(?:,?\s*and\s*)+', '', desc, flags=re.IGNORECASE)
+ desc = re.sub(r'\\(?=\s*(?:\n|$))', '', desc)
+ desc = re.sub(r'[ \t]{2,}', ' ', desc)
+ desc = re.sub(r'\n{3,}', '\n\n', desc)
+
+ return desc.strip()
+
+ def determine_input_format(self, dataset: Dict, tool_name: str) -> str:
+ """
+ Determine the correct format for an input dataset.
+ Simple mapping based on dataType field.
+ Uses 'binary' instead of 'pickle' for Galaxy compatibility.
+ """
+ data_type = dataset.get('dataType', '').upper()
+
+ # Handle comma-separated types (e.g., "CSV, Tabular")
+ data_types = [dt.strip() for dt in data_type.split(',')]
+
+ # Check for CSV/Tabular types
+ if any(dt in ['CSV', 'TABULAR', 'TSV', 'TXT'] for dt in data_types):
+ return 'csv,tabular,tsv,txt'
+
+ # DataFrame types
+ if any('DATAFRAME' in dt for dt in data_types):
+ return 'csv,tabular,tsv,txt'
+
+ # AnnData/H5AD types
+ if any(dt in ['ANNDATA', 'H5AD', 'HDF5'] for dt in data_types):
+ return 'h5ad,h5,hdf5'
+
+ # Pickle - use 'binary' for Galaxy compatibility
+ if any('PICKLE' in dt for dt in data_types):
+ return 'binary'
+
+ # PYTHON_TRANSFORM_INPUT - default to binary (analysis objects)
+ if 'PYTHON_TRANSFORM_INPUT' in data_type:
+ return 'h5ad,binary' # Use binary instead of pickle
+
+ # Default fallback
+ return 'h5ad,binary' # Use binary instead of pickle
+
+ def build_inputs_section(self, blueprint: Dict, tool_name: str) -> Tuple[List[str], List[str]]:
+ """Build inputs from blueprint - generalized for all tools"""
+ lines = []
+ multiple_file_inputs = [] # Track which inputs accept multiple files
+
+ # Handle input datasets
+ for dataset in blueprint.get('inputDatasets', []):
+ name = dataset.get('key', 'input_data')
+ label = self.escape_xml(dataset.get('displayName', 'Input Data'))
+ desc = self.escape_xml(self.clean_description(dataset.get('description', '')))
+
+ # Determine format - now simpler with direct dataType mapping
+ formats = self.determine_input_format(dataset, tool_name)
+
+ # Check if multiple files allowed (from blueprint)
+ is_multiple = dataset.get('isMultiple', False)
+
+ if is_multiple:
+ multiple_file_inputs.append(name)
+ lines.append(
+ f' '
+ )
+ else:
+ lines.append(
+ f' '
+ )
+
+ # Handle explicit column definitions from 'columns' schema
+ for col in blueprint.get('columns', []):
+ key = col.get('key')
+ if not key:
+ continue
+
+ label = self.escape_xml(col.get('displayName', key))
+ desc = self.escape_xml(col.get('description', ''))
+ # isMulti can be True, False, or None (None means False)
+ is_multi = col.get('isMulti') is True
+
+ # Use text inputs for column names
+ if is_multi:
+ lines.append(
+ f' '
+ )
+ else:
+ lines.append(
+ f' '
+ )
+
+ # Handle regular parameters
+ for param in blueprint.get('parameters', []):
+ key = param.get('key')
+ if not key:
+ continue
+
+ label = self.escape_xml(param.get('displayName', key))
+ desc = self.escape_xml(self.clean_description(param.get('description', '')))
+ param_type = param.get('paramType', 'STRING').upper()
+ default = param.get('defaultValue', '')
+ is_optional = param.get('isOptional', False)
+
+ # Add optional attribute if needed
+ optional_attr = ' optional="true"' if is_optional else ''
+
+ if param_type == 'BOOLEAN':
+ checked = 'true' if str(default).strip().lower() == 'true' else 'false'
+ lines.append(
+ f' '
+ )
+
+ elif param_type == 'INTEGER':
+ lines.append(
+ f' '
+ )
+
+ elif param_type in ['NUMBER', 'FLOAT']:
+ lines.append(
+ f' '
+ )
+
+ elif param_type == 'SELECT':
+ options = param.get('paramValues', [])
+ lines.append(f' ')
+ for opt in options:
+ selected = ' selected="true"' if str(opt) == str(default) else ''
+ opt_escaped = self.escape_xml(str(opt))
+ lines.append(f' ')
+ lines.append(' ')
+
+ elif param_type == 'LIST':
+ # Handle LIST type parameters - convert list to simple string
+ if isinstance(default, list):
+ # Filter out empty strings and join
+ filtered = [str(x) for x in default if x and str(x).strip()]
+ default = ', '.join(filtered) if filtered else ''
+ elif default == '[""]' or default == "['']" or default == '[]':
+ # Handle common empty list representations
+ default = ''
+ lines.append(
+ f' '
+ )
+
+ else: # STRING
+ lines.append(
+ f' '
+ )
+
+ return lines, multiple_file_inputs
+
+ def build_outputs_section(self, outputs: Dict) -> List[str]:
+ """Build outputs section based on blueprint specification"""
+ lines = []
+
+ for output_type, output_path in outputs.items():
+
+ # Determine if single file or collection
+ is_collection = (output_path.endswith('_folder') or
+ output_path.endswith('_dir'))
+
+ if not is_collection:
+ # Single file output
+ if output_type == 'analysis':
+ if '.h5ad' in output_path:
+ fmt = 'h5ad'
+ elif '.pickle' in output_path or '.pkl' in output_path:
+ fmt = 'binary' # Use binary instead of pickle
+ else:
+ fmt = 'binary'
+
+ lines.append(
+ f' '
+ )
+
+ elif output_type == 'DataFrames' and (output_path.endswith('.csv') or output_path.endswith('.tsv')):
+ # Single DataFrame file output
+ fmt = 'csv' if output_path.endswith('.csv') else 'tabular'
+ lines.append(
+ f' '
+ )
+
+ elif output_type == 'figure':
+ ext = output_path.split('.')[-1] if '.' in output_path else 'png'
+ lines.append(
+ f' '
+ )
+
+ elif output_type == 'html':
+ lines.append(
+ f' '
+ )
+
+ else:
+ # Collection outputs
+ if output_type == 'DataFrames':
+ lines.append(
+ ' '
+ )
+ lines.append(f' ')
+ lines.append(f' ')
+ lines.append(' ')
+
+ elif output_type == 'figures':
+ lines.append(
+ ' '
+ )
+ lines.append(f' ')
+ lines.append(f' ')
+ lines.append(f' ')
+ lines.append(' ')
+
+ elif output_type == 'html':
+ lines.append(
+ ' '
+ )
+ lines.append(f' ')
+ lines.append(' ')
+
+ # Debug outputs
+ lines.append(' ')
+ lines.append(' ')
+ lines.append(' ')
+
+ return lines
+
+ def build_command_section(self, tool_name: str, blueprint: Dict, multiple_file_inputs: List[str], outputs_spec: Dict) -> str:
+ """Build command section - generalized for all tools
+ FIXED: Use 'set -eu' instead of 'set -euo pipefail' for broader shell compatibility
+ FIXED: Pass outputs spec as environment variable to avoid encoding issues
+ """
+
+ # Convert outputs spec to JSON string
+ outputs_json = json.dumps(outputs_spec)
+
+ # Check if any inputs accept multiple files
+ has_multiple_files = len(multiple_file_inputs) > 0
+
+ if has_multiple_files:
+ # Generate file copying logic for each multiple input
+ copy_sections = []
+ for input_name in multiple_file_inputs:
+ # Use double curly braces to escape them in f-strings
+ copy_sections.append(f'''
+ ## Create directory for {input_name}
+ mkdir -p {input_name}_dir &&
+
+ ## Copy files to directory with original names
+ #for $i, $file in enumerate(${input_name})
+ cp '${{file}}' '{input_name}_dir/${{file.name}}' &&
+ #end for''')
+
+ copy_logic = ''.join(copy_sections)
+
+ command_section = f''' &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "{tool_name}"
+ ]]>'''
+ else:
+ # Standard command for single-file inputs
+ command_section = f''' &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "{tool_name}"
+ ]]>'''
+
+ return command_section
+
+ def get_template_filename(self, title: str, tool_name: str) -> str:
+ """Get the correct template filename"""
+ # Check if there's a custom mapping in the blueprint
+ # Otherwise use standard naming convention
+ if title == 'Load CSV Files' or tool_name == 'load_csv_files':
+ return 'load_csv_files_with_config.py'
+ else:
+ return f'{tool_name}_template.py'
+
+ def generate_tool(self, json_path: Path, output_dir: Path) -> Dict:
+ """Generate Galaxy tool from NIDAP JSON blueprint"""
+
+ with open(json_path, 'r') as f:
+ blueprint = json.load(f)
+
+ title = blueprint.get('title', 'Unknown Tool')
+ clean_title = re.sub(r'\[.*?\]', '', title).strip()
+
+ tool_name = self.slugify(clean_title)
+ tool_id = f'spac_{tool_name}'
+
+ # Get outputs from blueprint
+ outputs_spec = blueprint.get('outputs', {})
+ if not outputs_spec:
+ outputs_spec = {'analysis': 'transform_output.pickle'}
+
+ # Get template filename (could be in blueprint too)
+ template_filename = blueprint.get('templateFilename',
+ self.get_template_filename(clean_title, tool_name))
+
+ # Build sections - pass tool_name and outputs_spec for context
+ inputs_lines, multiple_file_inputs = self.build_inputs_section(blueprint, tool_name)
+ outputs_lines = self.build_outputs_section(outputs_spec)
+ command_section = self.build_command_section(tool_name, blueprint, multiple_file_inputs, outputs_spec)
+
+ # Generate description
+ full_desc = self.clean_description(blueprint.get('description', ''))
+ short_desc = full_desc.split('\n')[0] if full_desc else ''
+ if len(short_desc) > 100:
+ short_desc = short_desc[:97] + '...'
+
+ # Build help section
+ help_sections = []
+ help_sections.append(f'**{title}**\n')
+ help_sections.append(f'{full_desc}\n')
+ help_sections.append('This tool is part of the SPAC (SPAtial single-Cell analysis) toolkit.\n')
+
+ # Add usage notes based on input types
+ if blueprint.get('columns'):
+ help_sections.append('**Column Parameters:** Enter column names as text. Use comma-separation or one per line for multiple columns.')
+
+ if any(p.get('paramType') == 'LIST' for p in blueprint.get('parameters', [])):
+ help_sections.append('**List Parameters:** Use comma-separated values or one per line.')
+ help_sections.append('**Special Values:** Enter "All" to select all items.')
+
+ if multiple_file_inputs:
+ help_sections.append(f'**Multiple File Inputs:** This tool accepts multiple files for: {", ".join(multiple_file_inputs)}')
+
+ help_text = '\n'.join(help_sections)
+
+ # Generate complete XML
+ xml_content = f'''
+ {self.escape_xml(short_desc, False)}
+
+
+ {self.docker_image}
+
+
+
+ python3
+
+
+{command_section}
+
+
+
+
+
+
+{chr(10).join(inputs_lines)}
+
+
+
+{chr(10).join(outputs_lines)}
+
+
+
+
+
+
+@misc{{spac_toolkit,
+ author = {{FNLCR DMAP Team}},
+ title = {{SPAC: SPAtial single-Cell analysis}},
+ year = {{2024}},
+ url = {{https://github.com/FNLCR-DMAP/SCSAWorkflow}}
+}}
+
+
+'''
+
+ # Write files
+ tool_dir = output_dir / tool_id
+ tool_dir.mkdir(parents=True, exist_ok=True)
+
+ xml_path = tool_dir / f'{tool_id}.xml'
+ with open(xml_path, 'w') as f:
+ f.write(xml_content)
+
+ # Copy wrapper script
+ if self.wrapper_script.exists():
+ shutil.copy2(self.wrapper_script, tool_dir / 'run_spac_template.sh')
+
+ # Copy runner script
+ if self.runner_script.exists():
+ shutil.copy2(self.runner_script, tool_dir / 'spac_galaxy_runner.py')
+ else:
+ print(f" Warning: runner script not found at {self.runner_script}")
+
+ return {
+ 'tool_id': tool_id,
+ 'tool_name': title,
+ 'xml_path': xml_path,
+ 'tool_dir': tool_dir,
+ 'template': template_filename,
+ 'outputs': outputs_spec
+ }
+
+def main():
+ parser = argparse.ArgumentParser(
+ description='Convert NIDAP templates to Galaxy tools - Generalized Version'
+ )
+ parser.add_argument('json_input', help='JSON file or directory')
+ parser.add_argument('-o', '--output-dir', default='galaxy_tools')
+ parser.add_argument('--docker-image', default='nciccbr/spac:v1')
+
+ args = parser.parse_args()
+
+ synthesizer = GeneralizedNIDAPToGalaxySynthesizer(
+ docker_image=args.docker_image
+ )
+
+ json_input = Path(args.json_input)
+ if json_input.is_file():
+ json_files = [json_input]
+ elif json_input.is_dir():
+ json_files = sorted(json_input.glob('*.json'))
+ else:
+ print(f"Error: {json_input} not found")
+ return 1
+
+ print(f"Processing {len(json_files)} files")
+ print(f"Docker image: {args.docker_image}")
+
+ output_dir = Path(args.output_dir)
+ output_dir.mkdir(parents=True, exist_ok=True)
+
+ successful = []
+ failed = []
+
+ for json_file in json_files:
+ print(f"\nProcessing: {json_file.name}")
+ try:
+ result = synthesizer.generate_tool(json_file, output_dir)
+ successful.append(result)
+ print(f" ✔ Created: {result['tool_id']}")
+ print(f" Template: {result['template']}")
+ print(f" Outputs: {list(result['outputs'].keys())}")
+ except Exception as e:
+ failed.append(json_file.name)
+ print(f" ✗ Failed: {e}")
+ import traceback
+ traceback.print_exc()
+
+ print(f"\n{'='*60}")
+ print(f"Summary: {len(successful)} successful, {len(failed)} failed")
+
+ if successful:
+ snippet_path = output_dir / 'tool_conf_snippet.xml'
+ with open(snippet_path, 'w') as f:
+ f.write('\n')
+
+ print(f"\nGenerated tool configuration snippet: {snippet_path}")
+
+ return 0 if not failed else 1
+
+if __name__ == '__main__':
+ exit(main())
\ No newline at end of file
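The naming logic in `generate_tool` and `get_template_filename` above can be exercised on its own. This is a minimal sketch, not the generator's code: `slugify` is defined outside this excerpt, so the version here is an assumption (lowercase, non-alphanumeric runs collapsed to `_`), chosen because it reproduces the `spac_arcsinh_normalization`, `spac_boxplot` and `spac_load_csv_files` IDs seen in this diff. `derive_tool_metadata` is a hypothetical helper name.

```python
import re


def slugify(text: str) -> str:
    # Assumed behaviour of the generator's slugify (not shown in this diff):
    # lowercase, replace runs of non-alphanumerics with "_", trim underscores
    return re.sub(r'[^a-z0-9]+', '_', text.lower()).strip('_')


def derive_tool_metadata(title: str, description: str) -> dict:
    # Strip bracketed tags such as "[beta]" before slugifying
    clean_title = re.sub(r'\[.*?\]', '', title).strip()
    tool_name = slugify(clean_title)
    # Same special case as get_template_filename()
    if clean_title == 'Load CSV Files' or tool_name == 'load_csv_files':
        template = 'load_csv_files_with_config.py'
    else:
        template = f'{tool_name}_template.py'
    # First line of the description, capped at 100 characters
    short_desc = description.split('\n')[0] if description else ''
    if len(short_desc) > 100:
        short_desc = short_desc[:97] + '...'
    return {
        'tool_id': f'spac_{tool_name}',
        'template': template,
        'short_desc': short_desc,
    }


meta = derive_tool_metadata('Load CSV Files [beta]', 'Load CSV files.\nMore detail.')
print(meta)
```

The 97-character cut plus `...` is what produces the truncated descriptions visible in the generated XML files below.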
diff --git a/galaxy_tools/refactor_tools/run_spac_template.sh b/galaxy_tools/refactor_tools/run_spac_template.sh
new file mode 100644
index 00000000..3f2a7a3e
--- /dev/null
+++ b/galaxy_tools/refactor_tools/run_spac_template.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - Universal wrapper for SPAC Galaxy tools
+set -eu
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_NAME="${2:?Missing template name}"
+
+# Get the directory where this script is located (the tool directory)
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+# Look for spac_galaxy_runner.py in multiple locations
+if [ -f "$SCRIPT_DIR/spac_galaxy_runner.py" ]; then
+ # If it's in the same directory as this script
+ RUNNER_PATH="$SCRIPT_DIR/spac_galaxy_runner.py"
+elif [ -n "${__tool_directory__:-}" ] && [ -f "${__tool_directory__}/spac_galaxy_runner.py" ]; then
+ # If Galaxy exported its tool directory (guarded so set -u does not abort when it is unset)
+ RUNNER_PATH="$__tool_directory__/spac_galaxy_runner.py"
+else
+ # Fallback to trying the module approach
+ echo "Warning: spac_galaxy_runner.py not found locally, trying as module" >&2
+ python3 -m spac_galaxy_runner "$PARAMS_JSON" "$TEMPLATE_NAME"
+ exit $?
+fi
+
+# Run the runner script directly
+echo "Running: python3 $RUNNER_PATH $PARAMS_JSON $TEMPLATE_NAME" >&2
+python3 "$RUNNER_PATH" "$PARAMS_JSON" "$TEMPLATE_NAME"
\ No newline at end of file
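The lookup order in `run_spac_template.sh` (the script's own directory, then `$__tool_directory__`, then `python3 -m spac_galaxy_runner`) can be mirrored in Python, for example to check where a deployed tool would pick up its runner. A minimal sketch: `resolve_runner` is a hypothetical helper, the environment is passed in as a plain mapping, and `None` stands for the module fallback.

```python
import os
import tempfile
from typing import Mapping, Optional

RUNNER = 'spac_galaxy_runner.py'


def resolve_runner(script_dir: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the runner path the wrapper would use, or None when it would
    fall back to `python3 -m spac_galaxy_runner` (hypothetical helper)."""
    # 1. A runner copied next to run_spac_template.sh
    candidate = os.path.join(script_dir, RUNNER)
    if os.path.isfile(candidate):
        return candidate
    # 2. The Galaxy tool directory, only if the variable is set and non-empty
    tool_dir = env.get('__tool_directory__', '')
    if tool_dir:
        candidate = os.path.join(tool_dir, RUNNER)
        if os.path.isfile(candidate):
            return candidate
    # 3. Neither location has the file: the wrapper tries the module instead
    return None


# Example: a runner placed in the tool directory is found first
with tempfile.TemporaryDirectory() as tool_dir:
    open(os.path.join(tool_dir, RUNNER), 'w').close()
    print(resolve_runner(tool_dir, {}))
```

Since `generate_tool` copies `spac_galaxy_runner.py` into every tool directory, the first branch is the one expected to match in a normal deployment.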
diff --git a/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml b/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml
new file mode 100644
index 00000000..69a183b7
--- /dev/null
+++ b/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml
@@ -0,0 +1,69 @@
+
+ Normalize features either by a user-defined co-factor or a determined percentile, allowing for ef...
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "arcsinh_normalization"
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+@misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+}
+
+
+
\ No newline at end of file
diff --git a/galaxy_tools/refactor_tools/spac_boxplot.xml b/galaxy_tools/refactor_tools/spac_boxplot.xml
new file mode 100644
index 00000000..97c9ef88
--- /dev/null
+++ b/galaxy_tools/refactor_tools/spac_boxplot.xml
@@ -0,0 +1,85 @@
+
+ Create a boxplot visualization of the features in the analysis dataset.
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "boxplot"
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+@misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+}
+
+
+
\ No newline at end of file
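For a plotting tool such as the boxplot above, `determine_default_outputs` and the output-path block in the runner's `main()` decide which path keys the template receives. The sketch below restates those two steps in isolation so the keys can be inspected without running a template; `inject_output_paths` is a hypothetical name, and the `analysis` and `html` branches are omitted for brevity.

```python
import os


def determine_default_outputs(template_name: str) -> dict:
    # Same rules as the runner ('boxplot' is already covered by 'plot')
    if 'plot' in template_name or 'histogram' in template_name:
        return {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
    if 'load_csv' in template_name:
        return {'DataFrames': 'combined_data.csv'}
    if 'interactive' in template_name:
        return {'html': 'html_folder'}
    return {'analysis': 'transform_output.pickle'}


def inject_output_paths(params: dict, outputs: dict, template_name: str) -> dict:
    # Copy params and flag that results must be saved
    params = dict(params, save_results=True)
    if 'DataFrames' in outputs:
        df_path = outputs['DataFrames']
        if df_path.endswith(('.csv', '.tsv')):
            # Single-file table output (Load CSV Files)
            params['output_file'] = df_path
            params['Output_File'] = df_path
        else:
            # Folder for several tables (plotting tools)
            params['output_dir'] = df_path
            params['Export_Dir'] = df_path
            params['Output_File'] = os.path.join(df_path, f'{template_name}_output.csv')
    if 'figures' in outputs:
        fig_dir = outputs['figures']
        params['figure_dir'] = fig_dir
        params['Figure_Dir'] = fig_dir
        params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
    return params


outputs = determine_default_outputs('boxplot')
params = inject_output_paths({}, outputs, 'boxplot')
print(params['Output_File'], params['Figure_File'])
```

Note that `Output_File` is set by several branches in `main()`, so when a tool declares both `DataFrames` and `html` outputs, the later `html` branch wins.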
diff --git a/galaxy_tools/refactor_tools/spac_galaxy_runner.py b/galaxy_tools/refactor_tools/spac_galaxy_runner.py
new file mode 100644
index 00000000..d8535936
--- /dev/null
+++ b/galaxy_tools/refactor_tools/spac_galaxy_runner.py
@@ -0,0 +1,515 @@
+#!/usr/bin/env python3
+"""
+spac_galaxy_runner.py - Hybrid version combining refactored structure with robust parameter handling
+Incorporates critical fixes from original wrapper for parameter processing
+"""
+
+import json
+import os
+import sys
+import subprocess
+import shutil
+from pathlib import Path
+import re
+
+def main():
+ """Main entry point for SPAC Galaxy runner"""
+ if len(sys.argv) != 3:
+ print("Usage: spac_galaxy_runner.py <params.json> <template_name>", file=sys.stderr)
+ sys.exit(1)
+
+ params_path = sys.argv[1]
+ template_name = sys.argv[2]
+
+ print("=== SPAC Galaxy Runner v2.0 (Hybrid) ===")
+ print(f"Template: {template_name}")
+ print(f"Parameters: {params_path}")
+
+ # Load parameters
+ with open(params_path) as f:
+ params = json.load(f)
+
+ # Extract outputs specification from environment variable
+ outputs_spec_env = os.environ.get('GALAXY_OUTPUTS_SPEC', '')
+ if outputs_spec_env:
+ try:
+ outputs = json.loads(outputs_spec_env)
+ except json.JSONDecodeError:
+ print(f"WARNING: Could not parse GALAXY_OUTPUTS_SPEC: {outputs_spec_env}")
+ outputs = determine_default_outputs(template_name)
+ else:
+ # Fallback: read outputs from params, else use the template defaults
+ outputs = params.pop('outputs', None) or determine_default_outputs(template_name)
+ if isinstance(outputs, str):
+ try:
+ outputs = json.loads(unsanitize_galaxy_params(outputs))
+ except json.JSONDecodeError:
+ print(f"WARNING: Could not parse outputs: {outputs}")
+ outputs = determine_default_outputs(template_name)
+
+ print(f"Outputs specification: {outputs}")
+
+ # CRITICAL: Unsanitize and normalize parameters (from original)
+ params = process_galaxy_parameters(params, template_name)
+
+ # Handle multiple file inputs that were copied to directories by Galaxy
+ handle_multiple_file_inputs(params)
+
+ # Create output directories
+ create_output_directories(outputs)
+
+ # Add output paths to params - critical for templates that save results
+ params['save_results'] = True
+
+ if 'analysis' in outputs:
+ params['output_path'] = outputs['analysis']
+ params['Output_Path'] = outputs['analysis']
+ params['Output_File'] = outputs['analysis']
+
+ if 'DataFrames' in outputs:
+ df_path = outputs['DataFrames']
+ # Check if it's a single file or a directory
+ if df_path.endswith('.csv') or df_path.endswith('.tsv'):
+ # Single file output (like Load CSV Files)
+ params['output_file'] = df_path
+ params['Output_File'] = df_path
+ print(f" Set output_file to: {df_path}")
+ else:
+ # Directory for multiple files (like boxplot)
+ params['output_dir'] = df_path
+ params['Export_Dir'] = df_path
+ params['Output_File'] = os.path.join(df_path, f'{template_name}_output.csv')
+ print(f" Set output_dir to: {df_path}")
+
+ if 'figures' in outputs:
+ fig_dir = outputs['figures']
+ params['figure_dir'] = fig_dir
+ params['Figure_Dir'] = fig_dir
+ params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+ print(f" Set figure_dir to: {fig_dir}")
+
+ if 'html' in outputs:
+ html_dir = outputs['html']
+ params['html_dir'] = html_dir
+ params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+ print(f" Set html_dir to: {html_dir}")
+
+ # Save config for debugging (without outputs key)
+ with open('config_used.json', 'w') as f:
+ config_data = {k: v for k, v in params.items() if k not in ['outputs']}
+ json.dump(config_data, f, indent=2)
+
+ # Save params for template execution
+ with open('params_exec.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+ # Find and execute template
+ template_path = find_template(template_name)
+ if not template_path:
+ print(f"ERROR: Template for {template_name} not found")
+ sys.exit(1)
+
+ # Run template
+ exit_code = execute_template(template_path, 'params_exec.json')
+ if exit_code != 0:
+ print(f"ERROR: Template failed with exit code {exit_code}")
+ sys.exit(exit_code)
+
+ # Move template outputs to the locations Galaxy expects
+ handle_output_mapping(template_name, outputs)
+
+ # Verify outputs
+ verify_outputs(outputs)
+
+ # Overwrite the raw snapshot (copied by the Galaxy command) with the processed params
+ with open('params_snapshot.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+ print("=== Execution Complete ===")
+ sys.exit(0)
+
+def unsanitize_galaxy_params(s: str) -> str:
+ """Remove Galaxy's parameter sanitization tokens"""
+ if not isinstance(s, str):
+ return s
+ replacements = {
+ '__ob__': '[', '__cb__': ']',
+ '__oc__': '{', '__cc__': '}',
+ '__dq__': '"', '__sq__': "'",
+ '__gt__': '>', '__lt__': '<',
+ '__cn__': '\n', '__cr__': '\r',
+ '__tc__': '\t', '__pd__': '#',
+ '__at__': '@', '__cm__': ','
+ }
+ for token, char in replacements.items():
+ s = s.replace(token, char)
+ return s
+
+def process_galaxy_parameters(params: dict, template_name: str) -> dict:
+ """Process Galaxy parameters - unsanitize and normalize (from original wrapper)"""
+ print("\n=== Processing Galaxy Parameters ===")
+
+ # Step 1: Recursively unsanitize all parameters
+ def recursive_unsanitize(obj):
+ if isinstance(obj, str):
+ unsanitized = unsanitize_galaxy_params(obj).strip()
+ # Try to parse JSON strings
+ if (unsanitized.startswith('[') and unsanitized.endswith(']')) or \
+ (unsanitized.startswith('{') and unsanitized.endswith('}')):
+ try:
+ return json.loads(unsanitized)
+ except json.JSONDecodeError:
+ return unsanitized
+ return unsanitized
+ elif isinstance(obj, dict):
+ return {k: recursive_unsanitize(v) for k, v in obj.items()}
+ elif isinstance(obj, list):
+ return [recursive_unsanitize(item) for item in obj]
+ return obj
+
+ params = recursive_unsanitize(params)
+
+ # Step 2: Handle specific parameter normalizations
+
+ # Special handling for String_Columns in load_csv templates
+ if 'load_csv' in template_name and 'String_Columns' in params:
+ value = params['String_Columns']
+ if not isinstance(value, list):
+ if value in [None, "", "[]", "__ob____cb__", []]:
+ params['String_Columns'] = []
+ elif isinstance(value, str):
+ s = value.strip()
+ if s and s != '[]':
+ if ',' in s:
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+ else:
+ params['String_Columns'] = [s] if s else []
+ else:
+ params['String_Columns'] = []
+ else:
+ params['String_Columns'] = []
+ print(f" Normalized String_Columns: {params['String_Columns']}")
+
+ # Handle Feature_Regex: clear empty values, join a list of patterns with '|'
+ if 'Feature_Regex' in params:
+ value = params['Feature_Regex']
+ if value in [[], [""], "__ob____cb__", "[]", "", None]:
+ params['Feature_Regex'] = []
+ print(" Cleared empty Feature_Regex parameter")
+ elif isinstance(value, list) and value:
+ # Join regex patterns with |
+ params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+ print(f" Joined Feature_Regex list: {params['Feature_Regex']}")
+
+ # Handle Features_to_Analyze - split if it's a single string with spaces or commas
+ if 'Features_to_Analyze' in params:
+ value = params['Features_to_Analyze']
+ if isinstance(value, str):
+ # Check for comma-separated or space-separated features
+ if ',' in value:
+ params['Features_to_Analyze'] = [item.strip() for item in value.split(',') if item.strip()]
+ print(f" Split Features_to_Analyze on comma: {value} -> {params['Features_to_Analyze']}")
+ elif ' ' in value:
+ # This is likely multiple features in a single string
+ params['Features_to_Analyze'] = [item.strip() for item in value.split() if item.strip()]
+ print(f" Split Features_to_Analyze on space: {value} -> {params['Features_to_Analyze']}")
+ elif value:
+ params['Features_to_Analyze'] = [value]
+ print(f" Wrapped Features_to_Analyze in list: {params['Features_to_Analyze']}")
+
+ # Handle Feature_s_to_Plot for boxplot
+ if 'Feature_s_to_Plot' in params:
+ value = params['Feature_s_to_Plot']
+ # Check if it's "All"
+ if value == "All" or value == ["All"]:
+ params['Feature_s_to_Plot'] = ["All"]
+ print(" Set Feature_s_to_Plot to ['All']")
+ elif isinstance(value, str) and value not in ["", "[]"]:
+ params['Feature_s_to_Plot'] = [value]
+ print(f" Wrapped Feature_s_to_Plot in list: {params['Feature_s_to_Plot']}")
+
+ # Normalize list parameters
+ list_params = ['Annotation_s_', 'Features', 'Markers', 'Markers_to_Plot',
+ 'Phenotypes', 'Binary_Phenotypes', 'Features_to_Analyze']
+
+ for key in list_params:
+ if key in params:
+ value = params[key]
+ if not isinstance(value, list):
+ if value in [None, ""]:
+ continue
+ elif isinstance(value, str):
+ if ',' in value:
+ params[key] = [item.strip() for item in value.split(',') if item.strip()]
+ print(f" Split {key} on comma: {params[key]}")
+ else:
+ params[key] = [value]
+ print(f" Wrapped {key} in list: {params[key]}")
+
+ # Unwrap single-element lists for parameters that expect one column name
+ coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column',
+ 'X_centroid', 'Y_centroid', 'Primary_Annotation',
+ 'Secondary_Annotation', 'Annotation']
+
+ for key in coordinate_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and len(value) == 1:
+ params[key] = value[0]
+ print(f" Extracted single value from {key}: {params[key]}")
+
+ return params
+
+def determine_default_outputs(template_name: str) -> dict:
+ """Determine default outputs based on template name"""
+ if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name:
+ return {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+ elif 'load_csv' in template_name:
+ # Load CSV Files produces a single CSV file, not a folder
+ return {'DataFrames': 'combined_data.csv'}
+ elif 'interactive' in template_name:
+ return {'html': 'html_folder'}
+ else:
+ return {'analysis': 'transform_output.pickle'}
+
+def handle_multiple_file_inputs(params):
+ """
+ Handle multiple file inputs that Galaxy copies to directories.
+ Galaxy copies multiple files to xxx_dir directories.
+ """
+ print("\n=== Handling Multiple File Inputs ===")
+
+ # Check for directory inputs that indicate multiple files
+ for key in list(params.keys()):
+ # Check if Galaxy created a _dir directory for this input
+ dir_name = f"{key}_dir"
+ if os.path.isdir(dir_name):
+ params[key] = dir_name
+ print(f" Updated {key} -> {dir_name}")
+ # List files in the directory
+ files = os.listdir(dir_name)
+ print(f" Contains {len(files)} files")
+ for f in files[:3]:
+ print(f" - {f}")
+ if len(files) > 3:
+ print(f" ... and {len(files)-3} more")
+
+ # Special case for CSV_Files (Load CSV Files tool)
+ if 'CSV_Files' in params:
+ # Check for csv_input_dir created by Galaxy command
+ if os.path.isdir('csv_input_dir'):
+ params['CSV_Files'] = 'csv_input_dir'
+ print(" Using csv_input_dir for CSV_Files")
+ elif os.path.isdir('CSV_Files_dir'):
+ params['CSV_Files'] = 'CSV_Files_dir'
+ print(" Updated CSV_Files -> CSV_Files_dir")
+ elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+ # Single file - get its directory
+ params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+ print(f" Using directory of CSV file: {params['CSV_Files']}")
+
+def create_output_directories(outputs):
+ """Create directories for collection outputs"""
+ print("\n=== Creating Output Directories ===")
+
+ for output_type, path in outputs.items():
+ if path.endswith('_folder') or path.endswith('_dir'):
+ # This is a directory for multiple files
+ os.makedirs(path, exist_ok=True)
+ print(f" Created directory: {path}")
+ else:
+ # For single files, ensure parent directory exists if there is one
+ parent = os.path.dirname(path)
+ if parent and not os.path.exists(parent):
+ os.makedirs(parent, exist_ok=True)
+ print(f" Created parent directory: {parent}")
+ else:
+ print(f" Single file output: {path} (no directory needed)")
+
+ # Output paths are injected into params by main(); the spec is returned
+ # unchanged for convenience
+ return outputs
+
+def find_template(template_name):
+ """Find the template Python file"""
+ print("\n=== Finding Template ===")
+
+ # Determine template filename
+ if template_name == 'load_csv_files':
+ template_py = 'load_csv_files_with_config.py'
+ else:
+ template_py = f'{template_name}_template.py'
+
+ # Search paths (adjust based on your container/environment)
+ search_paths = [
+ f'/opt/spac/templates/{template_py}',
+ f'/app/spac/templates/{template_py}',
+ f'/opt/SCSAWorkflow/src/spac/templates/{template_py}',
+ f'/usr/local/lib/python3.9/site-packages/spac/templates/{template_py}',
+ f'./templates/{template_py}',
+ f'./{template_py}'
+ ]
+
+ for path in search_paths:
+ if os.path.exists(path):
+ print(f" Found: {path}")
+ return path
+
+ print(f" ERROR: {template_py} not found in:")
+ for path in search_paths:
+ print(f" - {path}")
+ return None
+
+def execute_template(template_path, params_file):
+ """Execute the SPAC template"""
+ print("\n=== Executing Template ===")
+ print(f" Command: python3 {template_path} {params_file}")
+
+ # Run template and capture output
+ result = subprocess.run(
+ ['python3', template_path, params_file],
+ capture_output=True,
+ text=True
+ )
+
+ # Save stdout and stderr
+ with open('tool_stdout.txt', 'w') as f:
+ f.write("=== STDOUT ===\n")
+ f.write(result.stdout)
+ if result.stderr:
+ f.write("\n=== STDERR ===\n")
+ f.write(result.stderr)
+
+ # Display output
+ if result.stdout:
+ print(" Output:")
+ lines = result.stdout.split('\n')
+ for line in lines[:20]: # First 20 lines
+ print(f" {line}")
+ if len(lines) > 20:
+ print(f" ... ({len(lines)-20} more lines)")
+
+ if result.stderr:
+ print(" Errors:", file=sys.stderr)
+ for line in result.stderr.split('\n'):
+ if line.strip():
+ print(f" {line}", file=sys.stderr)
+
+ return result.returncode
+
+def handle_output_mapping(template_name, outputs):
+ """
+ Map template outputs to expected locations.
+ Generic approach: find outputs based on pattern matching.
+ """
+ print("\n=== Output Mapping ===")
+
+ for output_type, expected_path in outputs.items():
+ # Skip if already exists at expected location
+ if os.path.exists(expected_path):
+ print(f" {output_type}: Already at {expected_path}")
+ continue
+
+ # Handle single file outputs
+ if expected_path.endswith('.csv') or expected_path.endswith('.tsv') or \
+ expected_path.endswith('.pickle') or expected_path.endswith('.h5ad'):
+ find_and_move_output(output_type, expected_path)
+
+ # Handle folder outputs - check if a default folder exists
+ elif expected_path.endswith('_folder') or expected_path.endswith('_dir'):
+ default_folder = output_type.lower() + '_folder'
+ if default_folder != expected_path and os.path.isdir(default_folder):
+ print(f" Moving {default_folder} to {expected_path}")
+ shutil.move(default_folder, expected_path)
+
+def find_and_move_output(output_type, expected_path):
+ """
+ Find output file based on extension and move to expected location.
+ More generic approach without hardcoded paths.
+ """
+ ext = os.path.splitext(expected_path)[1] # e.g., '.csv'
+ basename = os.path.basename(expected_path)
+
+ print(f" Looking for {output_type} output ({ext} file)...")
+
+ # Search in common output locations
+ search_dirs = ['.', 'dataframe_folder', 'output', 'results']
+
+ for search_dir in search_dirs:
+ if not os.path.exists(search_dir):
+ continue
+
+ if os.path.isdir(search_dir):
+ # Find files with matching extension
+ matches = [f for f in os.listdir(search_dir)
+ if f.endswith(ext)]
+
+ if len(matches) == 1:
+ source = os.path.join(search_dir, matches[0])
+ print(f" Found: {source}")
+ print(f" Moving to: {expected_path}")
+ shutil.move(source, expected_path)
+ return
+ elif len(matches) > 1:
+ # Multiple matches - use the largest file
+ matches_with_size = [(f, os.path.getsize(os.path.join(search_dir, f)))
+ for f in matches]
+ matches_with_size.sort(key=lambda x: x[1], reverse=True)
+ source = os.path.join(search_dir, matches_with_size[0][0])
+ print(f" Found multiple {ext} files, using largest: {source}")
+ shutil.move(source, expected_path)
+ return
+
+ # Safety net: any same-extension file in the current directory (the '.' entry in search_dirs already covers this)
+ current_dir_matches = [f for f in os.listdir('.')
+ if f.endswith(ext) and f != basename]
+ if current_dir_matches:
+ source = current_dir_matches[0]
+ print(f" Found: {source}")
+ print(f" Moving to: {expected_path}")
+ shutil.move(source, expected_path)
+ return
+
+ print(f" WARNING: No {ext} file found for {output_type}")
+
+def verify_outputs(outputs):
+ """Verify that expected outputs were created"""
+ print("\n=== Output Verification ===")
+
+ all_found = True
+ for output_type, path in outputs.items():
+ if os.path.exists(path):
+ if os.path.isdir(path):
+ files = os.listdir(path)
+ total_size = sum(os.path.getsize(os.path.join(path, f))
+ for f in files)
+ print(f" ✔ {output_type}: {len(files)} files in {path} "
+ f"({format_size(total_size)})")
+ # Show first few files
+ for f in files[:3]:
+ size = os.path.getsize(os.path.join(path, f))
+ print(f" - {f} ({format_size(size)})")
+ if len(files) > 3:
+ print(f" ... and {len(files)-3} more")
+ else:
+ size = os.path.getsize(path)
+ print(f" ✔ {output_type}: {path} ({format_size(size)})")
+ else:
+ print(f" ✗ {output_type}: NOT FOUND at {path}")
+ all_found = False
+
+ if not all_found:
+ print("\n WARNING: Some outputs not found!")
+ print(" Check tool_stdout.txt for errors")
+ # Don't exit with error - let Galaxy handle missing outputs
+
+def format_size(num_bytes):
+ """Format byte size in human-readable format"""
+ for unit in ['B', 'KB', 'MB', 'GB']:
+ if num_bytes < 1024.0:
+ return f"{num_bytes:.1f} {unit}"
+ num_bytes /= 1024.0
+ return f"{num_bytes:.1f} TB"
+
+if __name__ == '__main__':
+ main()
\ No newline at end of file
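Galaxy replaces characters such as `[`, `"` and `,` with tokens like `__ob__`, `__dq__` and `__cm__` before they reach `params.json`; `unsanitize_galaxy_params` and `recursive_unsanitize` above reverse this and then JSON-parse list- or object-shaped strings. A standalone sketch of that round trip (the function names here are illustrative; the token table is copied from the runner):

```python
import json

# Token table copied from unsanitize_galaxy_params()
GALAXY_TOKENS = {
    '__ob__': '[', '__cb__': ']',
    '__oc__': '{', '__cc__': '}',
    '__dq__': '"', '__sq__': "'",
    '__gt__': '>', '__lt__': '<',
    '__cn__': '\n', '__cr__': '\r',
    '__tc__': '\t', '__pd__': '#',
    '__at__': '@', '__cm__': ',',
}


def unsanitize(s):
    # Non-strings pass through untouched
    if not isinstance(s, str):
        return s
    for token, char in GALAXY_TOKENS.items():
        s = s.replace(token, char)
    return s


def decode_param(value):
    """Unsanitize recursively and parse strings that look like JSON lists/objects."""
    if isinstance(value, str):
        text = unsanitize(value).strip()
        if (text.startswith('[') and text.endswith(']')) or \
           (text.startswith('{') and text.endswith('}')):
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                return text  # bracketed but not valid JSON: keep the string
        return text
    if isinstance(value, dict):
        return {k: decode_param(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_param(v) for v in value]
    return value


raw = {'Features_to_Analyze': '__ob____dq__CD4__dq____cm____dq__CD8__dq____cb__',
       'Feature_Regex': '__ob____cb__'}
print(decode_param(raw))
```

Because `__ob____cb__` decodes to the JSON list `[]`, an empty Galaxy list reaches the later normalization steps as an empty Python list rather than as the literal token string.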
diff --git a/galaxy_tools/refactor_tools/spac_load_csv_files.xml b/galaxy_tools/refactor_tools/spac_load_csv_files.xml
new file mode 100644
index 00000000..5d71d104
--- /dev/null
+++ b/galaxy_tools/refactor_tools/spac_load_csv_files.xml
@@ -0,0 +1,75 @@
+
+ Load CSV files from NIDAP dataset and combine them into a single pandas dataframe for downstream ...
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "load_csv_files"
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+@misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+}
+
+
+
\ No newline at end of file
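The Load CSV Files tool above is the one template for which the runner turns `String_Columns` into a list of column names. The sketch below restates that branch of `process_galaxy_parameters` as a standalone function (`normalize_string_columns` is a hypothetical name) so its edge cases are easy to check.

```python
def normalize_string_columns(value):
    """Restatement of the String_Columns branch in process_galaxy_parameters()."""
    # Already a list (e.g. decoded from a sanitized JSON string): keep it
    if isinstance(value, list):
        return value
    # Empty or "empty list" markers become an empty list
    if value in (None, '', '[]', '__ob____cb__'):
        return []
    if isinstance(value, str):
        s = value.strip()
        if not s or s == '[]':
            return []
        # Comma-separated names, dropping blanks
        if ',' in s:
            return [item.strip() for item in s.split(',') if item.strip()]
        return [s]
    # Any other type is discarded
    return []


print(normalize_string_columns('Slide, Region ,'))
```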
diff --git a/galaxy_tools/refactor_tools/spac_setup_analysis.xml b/galaxy_tools/refactor_tools/spac_setup_analysis.xml
new file mode 100644
index 00000000..f762f78d
--- /dev/null
+++ b/galaxy_tools/refactor_tools/spac_setup_analysis.xml
@@ -0,0 +1,71 @@
+
+ Convert the pre-processed dataset to the analysis object for downstream analysis.
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ &2 &&
+ cat "$params_json" >&2 &&
+ echo "==================" >&2 &&
+
+ ## Save snapshot
+ cp "$params_json" params_snapshot.json &&
+
+ ## Run wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" "setup_analysis"
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+@misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+}
+
+
+
\ No newline at end of file
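Setup Analysis is the template that the column-index conversion in the next wrapper is written for (see its "CRITICAL FOR SETUP ANALYSIS" banner): Galaxy column selectors pass 1-based indices, which `convert_single_index` maps back to header names, with a 0-based fallback that in practice only applies to index 0. The sketch below restates that function without its log messages; the header list is made-up example data.

```python
def convert_single_index(item, columns):
    """Map a column index (1-based, with a 0 fallback) to a header name."""
    # Non-numeric strings are assumed to already be column names
    if isinstance(item, str) and not item.strip().isdigit():
        return item
    try:
        if isinstance(item, str):
            item = int(item.strip())
        elif isinstance(item, float):
            item = int(item)
    except (ValueError, AttributeError):
        return item
    if isinstance(item, int):
        if 0 <= item - 1 < len(columns):
            return columns[item - 1]  # 1-based Galaxy index
        if 0 <= item < len(columns):
            return columns[item]      # only reachable for item == 0
    # Out of range: returned unchanged (the wrapper also logs a warning)
    return item


headers = ['CellID', 'X_centroid', 'Y_centroid', 'CD4']
print([convert_single_index(i, headers) for i in ['2', 3, 'CD4', 0, 9]])
```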
diff --git a/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh b/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh
new file mode 100644
index 00000000..a93b2d6e
--- /dev/null
+++ b/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh
@@ -0,0 +1,710 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - SPAC wrapper with column index conversion
+# Version: 5.4.1 - Integrated column conversion
+set -euo pipefail
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_BASE="${2:?Missing template base name}"
+
+# Handle both base names and full .py filenames
+if [[ "$TEMPLATE_BASE" == *.py ]]; then
+ TEMPLATE_PY="$TEMPLATE_BASE"
+elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then
+ TEMPLATE_PY="load_csv_files_with_config.py"
+else
+ TEMPLATE_PY="${TEMPLATE_BASE}_template.py"
+fi
+
+# Use SPAC Python environment
+SPAC_PYTHON="${SPAC_PYTHON:-python3}"
+
+echo "=== SPAC Template Wrapper v5.4.1 ==="
+echo "Parameters: $PARAMS_JSON"
+echo "Template base: $TEMPLATE_BASE"
+echo "Template file: $TEMPLATE_PY"
+echo "Python: $SPAC_PYTHON"
+
+# Run template through Python
+"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt
+import json
+import os
+import sys
+import copy
+import traceback
+import inspect
+import shutil
+import re
+import csv
+
+# Get arguments
+params_path = sys.argv[1]
+template_filename = sys.argv[2]
+
+print(f"[Runner] Loading parameters from: {params_path}")
+print(f"[Runner] Template: {template_filename}")
+
+# Load parameters
+with open(params_path, 'r') as f:
+ params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+ """Remove Galaxy's parameter sanitization tokens"""
+ if not isinstance(s, str):
+ return s
+ replacements = {
+ '__ob__': '[', '__cb__': ']',
+ '__oc__': '{', '__cc__': '}',
+ '__dq__': '"', '__sq__': "'",
+ '__gt__': '>', '__lt__': '<',
+ '__cn__': '\n', '__cr__': '\r',
+ '__tc__': '\t', '__pd__': '#',
+ '__at__': '@', '__cm__': ','
+ }
+ for token, char in replacements.items():
+ s = s.replace(token, char)
+ return s
+
+def _maybe_parse(v):
+ """Recursively de-sanitize and JSON-parse strings where possible."""
+ if isinstance(v, str):
+ u = _unsanitize(v).strip()
+ if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+ try:
+ return json.loads(u)
+ except Exception:
+ return u
+ return u
+ elif isinstance(v, dict):
+ return {k: _maybe_parse(val) for k, val in v.items()}
+ elif isinstance(v, list):
+ return [_maybe_parse(item) for item in v]
+ return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+ """Some templates don't need column index conversion"""
+ return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+ """Read column headers from various file formats"""
+ try:
+ import pandas as pd
+
+ # Try pandas auto-detect
+ try:
+ df = pd.read_csv(filepath, nrows=1)
+ if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+ columns = df.columns.tolist()
+ print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+ return columns
+ except Exception:
+ pass
+
+ # Try common delimiters
+ for sep in ['\t', ',', ';', '|', ' ']:
+ try:
+ df = pd.read_csv(filepath, sep=sep, nrows=1)
+ if len(df.columns) > 1:
+ columns = df.columns.tolist()
+ sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+ '|': 'pipe', ' ': 'space'}.get(sep, sep)
+ print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+ return columns
+ except Exception:
+ continue
+ except ImportError:
+ print("[Runner] pandas not available, using csv fallback")
+
+ # CSV module fallback
+ try:
+ with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+ sample = f.read(8192)
+ f.seek(0)
+
+ try:
+ dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+ reader = csv.reader(f, dialect)
+ header = next(reader)
+ columns = [h.strip().strip('"') for h in header if h.strip()]
+ if columns:
+ print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+ return columns
+ except (csv.Error, StopIteration):
+ f.seek(0)
+ first_line = f.readline().strip()
+ for sep in ['\t', ',', ';', '|']:
+ if sep in first_line:
+ columns = [h.strip().strip('"') for h in first_line.split(sep)]
+ if len(columns) > 1:
+ print(f"[Runner] Manual parsing found {len(columns)} columns")
+ return columns
+ except Exception as e:
+ print(f"[Runner] Failed to read headers: {e}")
+
+ return None
+
+def should_convert_param(key, value):
+ """Check if parameter contains column indices"""
+ if value is None or value == "" or value == [] or value == {}:
+ return False
+
+ key_lower = key.lower()
+
+ # Skip String_Columns - it's names not indices
+ if key == 'String_Columns':
+ return False
+
+ # Skip output/path parameters
+ if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+ return False
+
+ # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Parameters with 'column' likely have indices
+ if 'column' in key_lower or '_col' in key_lower:
+ return True
+
+ # Known index parameters
+ if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+ return True
+
+ # Check if values look like indices
+ if isinstance(value, list):
+ return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+ elif isinstance(value, (int, str)):
+ return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+ return False
+
+def convert_single_index(item, columns):
+ """Convert a single column index to name"""
+ if isinstance(item, str) and not item.strip().isdigit():
+ return item
+
+ try:
+ if isinstance(item, str):
+ item = int(item.strip())
+ elif isinstance(item, float):
+ item = int(item)
+ except (ValueError, AttributeError):
+ return item
+
+ if isinstance(item, int):
+ idx = item - 1 # Galaxy uses 1-based indexing
+ if 0 <= idx < len(columns):
+ return columns[idx]
+ elif 0 <= item < len(columns): # Only true for item == 0, which cannot be a 1-based index
+ print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+ return columns[item]
+ else:
+ print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+ return item
+
+def convert_column_indices_to_names(params, template_name):
+ """Convert column indices to names for templates that need it"""
+
+ if should_skip_column_conversion(template_name):
+ print(f"[Runner] Skipping column conversion for {template_name}")
+ return params
+
+ print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+ # Find input file
+ input_file = None
+ input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+ 'Input_File', 'Input_Dataset', 'Data_File']
+
+ for key in input_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and value:
+ value = value[0]
+ if value and os.path.exists(str(value)):
+ input_file = str(value)
+ print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+ break
+
+ if not input_file:
+ print("[Runner] No input file found for column conversion")
+ return params
+
+ # Read headers
+ columns = read_file_headers(input_file)
+ if not columns:
+ print("[Runner] Could not read column headers, skipping conversion")
+ return params
+
+ print(f"[Runner] Successfully read {len(columns)} columns")
+ if len(columns) <= 10:
+ print(f"[Runner] Columns: {columns}")
+ else:
+ print(f"[Runner] First 10 columns: {columns[:10]}")
+
+ # Convert indices to names
+ converted_count = 0
+ for key, value in params.items():
+ # Skip non-column parameters
+ if not should_convert_param(key, value):
+ continue
+
+ # Convert indices
+ if isinstance(value, list):
+ converted_items = []
+ for item in value:
+ converted = convert_single_index(item, columns)
+ if converted is not None:
+ converted_items.append(converted)
+ converted_value = converted_items
+ else:
+ converted_value = convert_single_index(value, columns)
+
+ if value != converted_value:
+ params[key] = converted_value
+ converted_count += 1
+ print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+ if converted_count > 0:
+ print(f"[Runner] Total conversions: {converted_count} parameters")
+
+ # CRITICAL: Handle Feature_Regex specially
+ if 'Feature_Regex' in params:
+ value = params['Feature_Regex']
+ if value in [[], [""], "__ob____cb__", "[]", "", None]:
+ params['Feature_Regex'] = ""
+ print("[Runner] Cleared empty Feature_Regex parameter")
+ elif isinstance(value, list) and value:
+ params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+ print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+ return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+ """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+ if 'load_csv' not in template_name:
+ return params
+ for key in ('CSV_Files', 'CSV_Files_Configuration'):
+ val = params.get(key)
+ if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+ params[key] = val[0]
+ print(f"[Runner] Coerced {key} from list -> string")
+ return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+ value = params['String_Columns']
+ if not isinstance(value, list):
+ if value in [None, "", "[]", "__ob____cb__"]:
+ params['String_Columns'] = []
+ elif isinstance(value, str):
+ s = value.strip()
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ params['String_Columns'] = json.loads(s)
+ except json.JSONDecodeError:
+ params['String_Columns'] = [s] if s else []
+ elif ',' in s:
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+ else:
+ params['String_Columns'] = [s] if s else []
+ else:
+ params['String_Columns'] = []
+ print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+ # Check if csv_input_dir was created by Galaxy command
+ if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+ params['CSV_Files'] = 'csv_input_dir'
+ print("[Runner] Using csv_input_dir created by Galaxy")
+ elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+ # We have a single file path, need to get its directory
+ params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+ print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+ """Determine if a parameter should be normalized as a list"""
+ if isinstance(value, list):
+ return True
+
+ if value is None or value == "":
+ return False
+
+ key_lower = key.lower()
+
+ # Skip regex parameters
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Skip known single-value parameters
+ if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+ return False
+
+ # Plural forms suggest lists
+ if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations',
+ 'columns', 'types', 'labels', 'regions', 'radii']):
+ return True
+
+ # Check for list separators
+ if isinstance(value, str):
+ if ',' in value or '\n' in value:
+ return True
+ if value.strip().startswith('[') and value.strip().endswith(']'):
+ return True
+
+ return False
+
+def normalize_to_list(value):
+ """Convert various input formats to a proper Python list"""
+ if value in (None, "", "All", ["All"], "all", ["all"]):
+ return ["All"]
+
+ if isinstance(value, list):
+ return value
+
+ if isinstance(value, str):
+ s = value.strip()
+
+ # Try JSON parsing
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ parsed = json.loads(s)
+ return parsed if isinstance(parsed, list) else [str(parsed)]
+ except json.JSONDecodeError:
+ pass
+
+ # Split by comma
+ if ',' in s:
+ return [item.strip() for item in s.split(',') if item.strip()]
+
+ # Split by newline
+ if '\n' in s:
+ return [item.strip() for item in s.split('\n') if item.strip()]
+
+ # Single value
+ return [s] if s else []
+
+ return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+ if should_normalize_as_list(key, value):
+ original = value
+ normalized = normalize_to_list(value)
+ if original != normalized:
+ params[key] = normalized
+ list_count += 1
+ if len(str(normalized)) > 100:
+ print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+ else:
+ print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+ print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and len(value) == 1:
+ params[key] = value[0]
+ print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element list
+for key in list(params.keys()):
+ if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1:
+ original = params[key]
+ params[key] = params[key][0]
+ print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+ outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+ try:
+ maybe = json.loads(_unsanitize(raw_outputs))
+ if isinstance(maybe, dict):
+ outputs = maybe
+ except Exception:
+ pass
+
+if not isinstance(outputs, dict) or not outputs:
+ print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+ if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+ elif 'load_csv' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder'}
+ elif 'interactive' in template_name:
+ outputs = {'html': 'html_folder'}
+ else:
+ outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+ if output_type != 'analysis' and path:
+ os.makedirs(path, exist_ok=True)
+ print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+ params['output_path'] = outputs['analysis']
+ params['Output_Path'] = outputs['analysis']
+ params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+ df_dir = outputs['DataFrames']
+ params['output_dir'] = df_dir
+ params['Export_Dir'] = df_dir
+ params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+ fig_dir = outputs['figures']
+ params['figure_dir'] = fig_dir
+ params['Figure_Dir'] = fig_dir
+ params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+ html_dir = outputs['html']
+ params['html_dir'] = html_dir
+ params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+ if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+ json.dump(params_display, f, indent=2)
+
+print("[Runner] Saved runtime parameters to params.runtime.json and config_used.json")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+ import importlib
+ mod = importlib.import_module(f'spac.templates.{template_module_name}')
+ print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except ImportError: # Also catches ModuleNotFoundError, which is a subclass
+ # Fallback to loading from file
+ print("[Runner] Package import failed, trying file load")
+ import importlib.util
+
+ # Standard locations
+ template_paths = [
+ f'/app/spac/templates/{template_filename}',
+ f'/opt/spac/templates/{template_filename}',
+ f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+ template_filename # Current directory
+ ]
+
+ spec = None
+ for path in template_paths:
+ if os.path.exists(path):
+ spec = importlib.util.spec_from_file_location("template_mod", path)
+ if spec:
+ print(f"[Runner] Found template at: {path}")
+ break
+
+ if not spec or not spec.loader:
+ print(f"[Runner] ERROR: Could not find template: {template_filename}")
+ sys.exit(1)
+
+ mod = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+ print('[Runner] ERROR: Template missing run_from_json function')
+ sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+ kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+ kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+ result = mod.run_from_json('params.runtime.json', **kwargs)
+ print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+ # Handle different return types
+ if result is not None:
+ if isinstance(result, dict):
+ print(f"[Runner] Template saved files: {list(result.keys())}")
+ elif isinstance(result, tuple):
+ # Handle tuple returns
+ saved_count = 0
+ for i, item in enumerate(result):
+ if hasattr(item, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+ item.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(item)
+ saved_count += 1
+ print(f"[Runner] Saved figure to {fig_path}")
+ elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+ item.to_csv(df_path, index=True)
+ saved_count += 1
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ if saved_count > 0:
+ print(f"[Runner] Saved {saved_count} in-memory results")
+
+ elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+ result.to_csv(df_path, index=True)
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ elif hasattr(result, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], 'figure.png')
+ result.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(result)
+ print(f"[Runner] Saved figure to {fig_path}")
+
+ elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+ result.write_h5ad(outputs['analysis'])
+ print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+ print(f"[Runner] ERROR in template execution: {e}")
+ print(f"[Runner] Error type: {type(e).__name__}")
+ traceback.print_exc()
+
+ # Debug help for common issues
+ if "String Columns must be a *list*" in str(e):
+ print("\n[Runner] DEBUG: String_Columns validation failed")
+ print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+ print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+ elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+ print("\n[Runner] DEBUG: This appears to be a column index issue")
+ print("[Runner] Check that column indices were properly converted to names")
+ print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+ print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+ sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+ if output_type == 'analysis':
+ if os.path.exists(path):
+ size = os.path.getsize(path)
+ print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)")
+ found_outputs = True
+ else:
+ print(f"[Runner] ✗ {output_type}: NOT FOUND")
+ else:
+ if os.path.exists(path) and os.path.isdir(path):
+ files = os.listdir(path)
+ if files:
+ print(f"[Runner] ✔ {output_type}: {len(files)} files")
+ for f in files[:3]:
+ print(f"[Runner] - {f}")
+ if len(files) > 3:
+ print(f"[Runner] ... and {len(files)-3} more")
+ found_outputs = True
+ else:
+ print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+ if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+ 'tool_stdout.txt', 'outputs_returned.json']:
+ continue
+
+ if file.endswith('.csv') and 'DataFrames' in outputs:
+ if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+ target = os.path.join(outputs['DataFrames'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+ elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+ if not os.path.exists(os.path.join(outputs['figures'], file)):
+ target = os.path.join(outputs['figures'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+
+if found_outputs:
+ print("[Runner] === SUCCESS ===")
+else:
+ print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+ echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+ exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
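The runner above rewrites Galaxy's 1-based column indices as header names before any template sees them. The standalone sketch below mirrors two pieces of that logic: the `csv.Sniffer` fallback used by `read_file_headers`, and the index rules of `convert_single_index`, including the special case where index 0 is read as 0-based. `sniff_header` and `index_to_name` are hypothetical names introduced for illustration; they are not functions in the SPAC runner.

```python
import csv
import io


def sniff_header(text):
    """Return the header fields of a delimited text sample (hypothetical helper).

    Restricts csv.Sniffer to tab, comma, semicolon and pipe, and falls back
    to plain comma-separated parsing if sniffing fails.
    """
    try:
        dialect = csv.Sniffer().sniff(text, delimiters="\t,;|")
    except csv.Error:
        dialect = csv.excel  # comma-separated default
    reader = csv.reader(io.StringIO(text), dialect)
    return [h.strip() for h in next(reader)]


def index_to_name(item, columns):
    """Map a Galaxy 1-based column index (int or digit string) to a header name."""
    if isinstance(item, str):
        if not item.strip().isdigit():
            return item  # already a column name, leave it alone
        item = int(item.strip())
    idx = item - 1  # Galaxy column parameters are 1-based
    if 0 <= idx < len(columns):
        return columns[idx]
    if item == 0 and columns:  # 0 cannot be 1-based, so treat it as 0-based
        return columns[0]
    return item  # out of range: returned unchanged


sample = "cell_id\tCD4\tCD8\tx\ty\n1\t0.5\t0.1\t10\t20\n"
cols = sniff_header(sample)
print(cols)  # ['cell_id', 'CD4', 'CD8', 'x', 'y']
print([index_to_name(i, cols) for i in ["2", 3, "CD8", 0, 9]])
# ['CD4', 'CD8', 'CD8', 'cell_id', 9]
```

Out-of-range indices come back unchanged, matching the runner, which only logs a warning for them.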
diff --git a/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml b/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml
new file mode 100644
index 00000000..ad0f4baf
--- /dev/null
+++ b/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml
@@ -0,0 +1,70 @@
+
+ Normalize features either by a user-defined co-factor or a determined percentile, allowing for ef...
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ tool_stdout.txt &&
+
+ ## Run the universal wrapper (template name without .py extension)
+ bash $__tool_directory__/run_spac_template.sh "$params_json" arcsinh_normalization
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ @misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+ }
+
+
+
\ No newline at end of file
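The tool description only names the two modes, a user-defined co-factor or a determined percentile. As a hedged sketch: arcsinh normalization commonly divides each value by a co-factor c and applies the inverse hyperbolic sine, x' = asinh(x / c). Below, `arcsinh_normalize` is a hypothetical helper, not SPAC's API, and the percentile mode is assumed to set c per feature to the chosen percentile of that feature's values; SPAC's actual rule may differ.

```python
import numpy as np


def arcsinh_normalize(values, co_factor=None, percentile=None):
    """Apply asinh(x / c) column-wise to a 2-D array (cells x features).

    Exactly one of co_factor or percentile must be given. With percentile,
    c is computed per feature as that percentile of the feature's values
    (an assumption about how the percentile mode works).
    """
    x = np.asarray(values, dtype=float)
    if (co_factor is None) == (percentile is None):
        raise ValueError("give exactly one of co_factor or percentile")
    if percentile is not None:
        c = np.percentile(x, percentile, axis=0)  # one co-factor per feature
    else:
        c = np.full(x.shape[1], float(co_factor))  # same co-factor everywhere
    if np.any(c == 0):
        raise ValueError("co-factor must be non-zero")
    return np.arcsinh(x / c)  # c broadcasts across rows


data = np.array([[0.0, 10.0],
                 [5.0, 20.0],
                 [50.0, 40.0]])
fixed = arcsinh_normalize(data, co_factor=5)
by_pct = arcsinh_normalize(data, percentile=50)  # medians: 5 and 20
```

Since asinh is roughly linear for |x| much smaller than c and roughly logarithmic for |x| much larger than c, a fixed co-factor places that transition at the same intensity for every marker, while a percentile-derived one places it relative to each marker's own range.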
diff --git a/galaxy_tools/spac_boxplot/run_spac_template.sh b/galaxy_tools/spac_boxplot/run_spac_template.sh
new file mode 100644
index 00000000..5e08ae50
--- /dev/null
+++ b/galaxy_tools/spac_boxplot/run_spac_template.sh
@@ -0,0 +1,782 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - SPAC wrapper with column index conversion
+# Version: 5.4.2 - Integrated column conversion
+set -euo pipefail
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_BASE="${2:?Missing template base name}"
+
+# Handle both base names and full .py filenames
+if [[ "$TEMPLATE_BASE" == *.py ]]; then
+ TEMPLATE_PY="$TEMPLATE_BASE"
+elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then
+ TEMPLATE_PY="load_csv_files_with_config.py"
+else
+ TEMPLATE_PY="${TEMPLATE_BASE}_template.py"
+fi
+
+# Use SPAC Python environment
+SPAC_PYTHON="${SPAC_PYTHON:-python3}"
+
+echo "=== SPAC Template Wrapper v5.4 ==="
+echo "Parameters: $PARAMS_JSON"
+echo "Template base: $TEMPLATE_BASE"
+echo "Template file: $TEMPLATE_PY"
+echo "Python: $SPAC_PYTHON"
+
+# Run template through Python
+"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt
+import json
+import os
+import sys
+import copy
+import traceback
+import inspect
+import shutil
+import re
+import csv
+
+# Get arguments
+params_path = sys.argv[1]
+template_filename = sys.argv[2]
+
+print(f"[Runner] Loading parameters from: {params_path}")
+print(f"[Runner] Template: {template_filename}")
+
+# Load parameters
+with open(params_path, 'r') as f:
+ params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+ """Remove Galaxy's parameter sanitization tokens"""
+ if not isinstance(s, str):
+ return s
+ replacements = {
+ '__ob__': '[', '__cb__': ']',
+ '__oc__': '{', '__cc__': '}',
+ '__dq__': '"', '__sq__': "'",
+ '__gt__': '>', '__lt__': '<',
+ '__cn__': '\n', '__cr__': '\r',
+ '__tc__': '\t', '__pd__': '#',
+ '__at__': '@', '__cm__': ','
+ }
+ for token, char in replacements.items():
+ s = s.replace(token, char)
+ return s
+
+def _maybe_parse(v):
+ """Recursively de-sanitize and JSON-parse strings where possible."""
+ if isinstance(v, str):
+ u = _unsanitize(v).strip()
+ if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+ try:
+ return json.loads(u)
+ except Exception:
+ return u
+ return u
+ elif isinstance(v, dict):
+ return {k: _maybe_parse(val) for k, val in v.items()}
+ elif isinstance(v, list):
+ return [_maybe_parse(item) for item in v]
+ return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+ """Some templates don't need column index conversion"""
+ return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+ """Read column headers from various file formats"""
+ try:
+ import pandas as pd
+
+ # Try pandas auto-detect
+ try:
+ df = pd.read_csv(filepath, sep=None, engine='python', nrows=1)
+ if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+ columns = df.columns.tolist()
+ print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+ return columns
+ except Exception:
+ pass
+
+ # Try common delimiters
+ for sep in ['\t', ',', ';', '|', ' ']:
+ try:
+ df = pd.read_csv(filepath, sep=sep, nrows=1)
+ if len(df.columns) > 1:
+ columns = df.columns.tolist()
+ sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+ '|': 'pipe', ' ': 'space'}.get(sep, sep)
+ print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+ return columns
+ except Exception:
+ continue
+ except ImportError:
+ print("[Runner] pandas not available, using csv fallback")
+
+ # CSV module fallback
+ try:
+ with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+ sample = f.read(8192)
+ f.seek(0)
+
+ try:
+ dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+ reader = csv.reader(f, dialect)
+ header = next(reader)
+ columns = [h.strip().strip('"') for h in header if h.strip()]
+ if columns:
+ print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+ return columns
+ except Exception:
+ f.seek(0)
+ first_line = f.readline().strip()
+ for sep in ['\t', ',', ';', '|']:
+ if sep in first_line:
+ columns = [h.strip().strip('"') for h in first_line.split(sep)]
+ if len(columns) > 1:
+ print(f"[Runner] Manual parsing found {len(columns)} columns")
+ return columns
+ except Exception as e:
+ print(f"[Runner] Failed to read headers: {e}")
+
+ return None
+
+def should_convert_param(key, value):
+ """Check if parameter contains column indices"""
+ if value is None or value == "" or value == [] or value == {}:
+ return False
+
+ key_lower = key.lower()
+
+ # Skip String_Columns: it holds column names, not indices
+ if key == 'String_Columns':
+ return False
+
+ # Skip output/path parameters
+ if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+ return False
+
+ # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Parameters with 'column' likely have indices
+ if 'column' in key_lower or '_col' in key_lower:
+ return True
+
+ # Known index parameters
+ if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+ return True
+
+ # Check if values look like indices
+ if isinstance(value, list):
+ return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+ elif isinstance(value, (int, str)):
+ return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+ return False
+
+def convert_single_index(item, columns):
+ """Convert a single column index to name"""
+ if isinstance(item, str) and not item.strip().isdigit():
+ return item
+
+ try:
+ if isinstance(item, str):
+ item = int(item.strip())
+ elif isinstance(item, float):
+ item = int(item)
+ except (ValueError, AttributeError):
+ return item
+
+ if isinstance(item, int):
+ idx = item - 1 # Galaxy uses 1-based indexing
+ if 0 <= idx < len(columns):
+ return columns[idx]
+ elif 0 <= item < len(columns): # Only true for item == 0, which cannot be a 1-based index
+ print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+ return columns[item]
+ else:
+ print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+ return item
+
+def convert_column_indices_to_names(params, template_name):
+ """Convert column indices to names for templates that need it"""
+
+ if should_skip_column_conversion(template_name):
+ print(f"[Runner] Skipping column conversion for {template_name}")
+ return params
+
+ print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+ # Find input file
+ input_file = None
+ input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+ 'Input_File', 'Input_Dataset', 'Data_File']
+
+ for key in input_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and value:
+ value = value[0]
+ if value and os.path.exists(str(value)):
+ input_file = str(value)
+ print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+ break
+
+ if not input_file:
+ print("[Runner] No input file found for column conversion")
+ return params
+
+ # Read headers
+ columns = read_file_headers(input_file)
+ if not columns:
+ print("[Runner] Could not read column headers, skipping conversion")
+ return params
+
+ print(f"[Runner] Successfully read {len(columns)} columns")
+ if len(columns) <= 10:
+ print(f"[Runner] Columns: {columns}")
+ else:
+ print(f"[Runner] First 10 columns: {columns[:10]}")
+
+ # Convert indices to names
+ converted_count = 0
+ for key, value in params.items():
+ # Skip non-column parameters
+ if not should_convert_param(key, value):
+ continue
+
+ # Convert indices
+ if isinstance(value, list):
+ converted_items = []
+ for item in value:
+ converted = convert_single_index(item, columns)
+ if converted is not None:
+ converted_items.append(converted)
+ converted_value = converted_items
+ else:
+ converted_value = convert_single_index(value, columns)
+
+ if value != converted_value:
+ params[key] = converted_value
+ converted_count += 1
+ print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+ if converted_count > 0:
+ print(f"[Runner] Total conversions: {converted_count} parameters")
+
+ # CRITICAL: Handle Feature_Regex specially
+ if 'Feature_Regex' in params:
+ value = params['Feature_Regex']
+ if value in [[], [""], "__ob____cb__", "[]", "", None]:
+ params['Feature_Regex'] = ""
+ print("[Runner] Cleared empty Feature_Regex parameter")
+ elif isinstance(value, list) and value:
+ params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+ print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+ return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+ """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+ if 'load_csv' not in template_name:
+ return params
+ for key in ('CSV_Files', 'CSV_Files_Configuration'):
+ val = params.get(key)
+ if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+ params[key] = val[0]
+ print(f"[Runner] Coerced {key} from list -> string")
+ return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+ value = params['String_Columns']
+ if not isinstance(value, list):
+ if value in [None, "", "[]", "__ob____cb__"]:
+ params['String_Columns'] = []
+ elif isinstance(value, str):
+ s = value.strip()
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ params['String_Columns'] = json.loads(s)
+ except json.JSONDecodeError:
+ params['String_Columns'] = [s] if s else []
+ elif ',' in s:
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+ else:
+ params['String_Columns'] = [s] if s else []
+ else:
+ params['String_Columns'] = []
+ print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+ # Check if csv_input_dir was created by Galaxy command
+ if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+ params['CSV_Files'] = 'csv_input_dir'
+ print("[Runner] Using csv_input_dir created by Galaxy")
+ elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+ # We have a single file path, need to get its directory
+ params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+ print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+ """Determine if a parameter should be normalized as a list"""
+ # CRITICAL: Skip outputs and other non-list parameters
+ key_lower = key.lower()
+ if key_lower in {'outputs', 'output', 'upstream_analysis', 'upstream_dataset',
+ 'table_to_visualize', 'figure_title', 'figure_width',
+ 'figure_height', 'figure_dpi', 'font_size'}:
+ return False
+
+ # Already a proper list?
+ if isinstance(value, list):
+ # Only re-process if it's a single JSON string that needs parsing
+ if len(value) == 1 and isinstance(value[0], str):
+ s = value[0].strip()
+ return s.startswith('[') and s.endswith(']')
+ return False
+
+ # Nothing to normalize
+ if value is None or value == "":
+ return False
+
+ # CRITICAL: Explicitly mark Feature_s_to_Plot as a list parameter
+ if key_lower == 'feature_s_to_plot':
+ return True
+
+ # Other explicit list parameters
+ explicit_list_keys = {
+ 'features_to_analyze', 'features', 'markers', 'markers_to_plot',
+ 'phenotypes', 'labels', 'annotation_s_', 'string_columns'
+ }
+ if key_lower in explicit_list_keys:
+ return True
+
+ # Skip regex parameters
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Skip known single-value parameters
+ if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+ return False
+
+ # Plural forms suggest lists
+ if any(x in key_lower for x in [
+ 'features', 'markers', 'phenotypes', 'annotations',
+ 'columns', 'types', 'labels', 'regions', 'radii'
+ ]):
+ return True
+
+ # List-like syntax in string values
+ if isinstance(value, str):
+ s = value.strip()
+ if s.startswith('[') and s.endswith(']'):
+ return True
+ # Only treat comma/newline as list separator if not in outputs-like params
+ if 'output' not in key_lower and 'path' not in key_lower:
+ if ',' in s or '\n' in s:
+ return True
+
+ return False
+
+def normalize_to_list(value):
+ """Convert various input formats to a proper Python list"""
+ # Handle special "All" cases first
+ if value in (None, "", "All", "all"):
+ return ["All"]
+
+ # If it's already a list
+ if isinstance(value, list):
+ # Check for already-correct lists
+ if value == ["All"] or value == ["all"]:
+ return ["All"]
+
+ # Check if it's a single-element list with a JSON string
+ if len(value) == 1 and isinstance(value[0], str):
+ s = value[0].strip()
+ # If the single element looks like JSON
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ parsed = json.loads(s)
+ if isinstance(parsed, list):
+ return parsed
+ except ValueError:
+ pass
+ # If single element is "All" or "all"
+ elif s.lower() == "all":
+ return ["All"]
+
+ # Already a proper list, return as-is
+ return value
+
+ if isinstance(value, str):
+ s = value.strip()
+
+ # Check for "All" string
+ if s.lower() == "all":
+ return ["All"]
+
+ # Try JSON parsing
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ parsed = json.loads(s)
+ return parsed if isinstance(parsed, list) else [str(parsed)]
+ except ValueError:
+ pass
+
+ # Split by comma
+ if ',' in s:
+ return [item.strip() for item in s.split(',') if item.strip()]
+
+ # Split by newline
+ if '\n' in s:
+ return [item.strip() for item in s.split('\n') if item.strip()]
+
+ # Single value
+ return [s] if s else []
+
+ return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+ if should_normalize_as_list(key, value):
+ original = value
+ normalized = normalize_to_list(value)
+ if original != normalized:
+ params[key] = normalized
+ list_count += 1
+ if len(str(normalized)) > 100:
+ print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+ else:
+ print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+ print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and len(value) == 1:
+ params[key] = value[0]
+ print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element list
+for key in list(params.keys()):
+ if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1:
+ original = params[key]
+ params[key] = params[key][0]
+ print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+ outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+ try:
+ maybe = json.loads(_unsanitize(raw_outputs))
+ if isinstance(maybe, dict):
+ outputs = maybe
+ except Exception:
+ pass
+
+# CRITICAL FIX: Handle outputs if it was mistakenly normalized as a list
+if isinstance(raw_outputs, list) and raw_outputs:
+ # Try to reconstruct the dict from the list
+ if len(raw_outputs) >= 2:
+ # Assume format like ["{'DataFrames': 'dataframe_folder'", "'figures': 'figure_folder'}"]
+ combined = ', '.join(str(item) for item in raw_outputs)  # items were split on commas
+ # Convert Python-style single quotes to JSON double quotes
+ combined = combined.replace("'", '"')
+ try:
+ outputs = json.loads(combined)
+ except ValueError:
+ # Try another approach - extract the first {...} span
+ try:
+ dict_str = '{' + combined.split('{')[1].split('}')[0] + '}'
+ outputs = json.loads(dict_str)
+ except (IndexError, ValueError):
+ pass
+
+if not isinstance(outputs, dict) or not outputs:
+ print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+ if 'plot' in template_name or 'histogram' in template_name:  # 'plot' also matches 'boxplot'
+ outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+ elif 'load_csv' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder'}
+ elif 'interactive' in template_name:
+ outputs = {'html': 'html_folder'}
+ else:
+ outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+ if output_type != 'analysis' and path:
+ os.makedirs(path, exist_ok=True)
+ print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+ params['output_path'] = outputs['analysis']
+ params['Output_Path'] = outputs['analysis']
+ params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+ df_dir = outputs['DataFrames']
+ params['output_dir'] = df_dir
+ params['Export_Dir'] = df_dir
+ params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+ fig_dir = outputs['figures']
+ params['figure_dir'] = fig_dir
+ params['Figure_Dir'] = fig_dir
+ params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+ html_dir = outputs['html']
+ params['html_dir'] = html_dir
+ params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+ if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+ json.dump(params_display, f, indent=2)
+
+print("[Runner] Saved runtime parameters")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+ import importlib
+ mod = importlib.import_module(f'spac.templates.{template_module_name}')
+ print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except ImportError:  # ModuleNotFoundError is a subclass of ImportError
+ # Fallback to loading from file
+ print("[Runner] Package import failed, trying file load")
+ import importlib.util
+
+ # Standard locations
+ template_paths = [
+ f'/app/spac/templates/{template_filename}',
+ f'/opt/spac/templates/{template_filename}',
+ f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+ template_filename # Current directory
+ ]
+
+ spec = None
+ for path in template_paths:
+ if os.path.exists(path):
+ spec = importlib.util.spec_from_file_location("template_mod", path)
+ if spec:
+ print(f"[Runner] Found template at: {path}")
+ break
+
+ if not spec or not spec.loader:
+ print(f"[Runner] ERROR: Could not find template: {template_filename}")
+ sys.exit(1)
+
+ mod = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+ print('[Runner] ERROR: Template missing run_from_json function')
+ sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+ kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+ kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+ result = mod.run_from_json('params.runtime.json', **kwargs)
+ print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+ # Handle different return types
+ if result is not None:
+ if isinstance(result, dict):
+ print(f"[Runner] Template saved files: {list(result.keys())}")
+ elif isinstance(result, tuple):
+ # Handle tuple returns
+ saved_count = 0
+ for i, item in enumerate(result):
+ if hasattr(item, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+ item.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(item)
+ saved_count += 1
+ print(f"[Runner] Saved figure to {fig_path}")
+ elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+ item.to_csv(df_path, index=True)
+ saved_count += 1
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ if saved_count > 0:
+ print(f"[Runner] Saved {saved_count} in-memory results")
+
+ elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+ result.to_csv(df_path, index=True)
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ elif hasattr(result, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], 'figure.png')
+ result.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(result)
+ print(f"[Runner] Saved figure to {fig_path}")
+
+ elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+ result.write_h5ad(outputs['analysis'])
+ print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+ print(f"[Runner] ERROR in template execution: {e}")
+ print(f"[Runner] Error type: {type(e).__name__}")
+ traceback.print_exc()
+
+ # Debug help for common issues
+ if "String Columns must be a *list*" in str(e):
+ print("\n[Runner] DEBUG: String_Columns validation failed")
+ print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+ print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+ elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+ print("\n[Runner] DEBUG: This appears to be a column index issue")
+ print("[Runner] Check that column indices were properly converted to names")
+ print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+ print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+ sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+ if output_type == 'analysis':
+ if os.path.exists(path):
+ size = os.path.getsize(path)
+ print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)")
+ found_outputs = True
+ else:
+ print(f"[Runner] ✗ {output_type}: NOT FOUND")
+ else:
+ if os.path.exists(path) and os.path.isdir(path):
+ files = os.listdir(path)
+ if files:
+ print(f"[Runner] ✔ {output_type}: {len(files)} files")
+ for f in files[:3]:
+ print(f"[Runner] - {f}")
+ if len(files) > 3:
+ print(f"[Runner] ... and {len(files)-3} more")
+ found_outputs = True
+ else:
+ print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+ if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+ 'tool_stdout.txt', 'outputs_returned.json']:
+ continue
+
+ if file.endswith('.csv') and 'DataFrames' in outputs:
+ if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+ target = os.path.join(outputs['DataFrames'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+ elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+ if not os.path.exists(os.path.join(outputs['figures'], file)):
+ target = os.path.join(outputs['figures'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+
+if found_outputs:
+ print("[Runner] === SUCCESS ===")
+else:
+ print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+ echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+ exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
diff --git a/galaxy_tools/spac_boxplot/spac_boxplot.xml b/galaxy_tools/spac_boxplot/spac_boxplot.xml
new file mode 100644
index 00000000..18d80004
--- /dev/null
+++ b/galaxy_tools/spac_boxplot/spac_boxplot.xml
@@ -0,0 +1,92 @@
+
+ Create a boxplot visualization of the features in the analysis dataset.
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ tool_stdout.txt &&
+
+ ## Run the universal wrapper (template name without .py extension)
+ bash $__tool_directory__/run_spac_template.sh "$params_json" boxplot
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ @misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+ }
+
+
+
\ No newline at end of file
diff --git a/galaxy_tools/spac_load_csv_files/run_spac_template.sh b/galaxy_tools/spac_load_csv_files/run_spac_template.sh
new file mode 100644
index 00000000..4ec7c784
--- /dev/null
+++ b/galaxy_tools/spac_load_csv_files/run_spac_template.sh
@@ -0,0 +1,786 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - SPAC wrapper with column index conversion
+# Version: 5.5.0 - Fixed load_csv_files to output single CSV
+set -euo pipefail
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_BASE="${2:?Missing template base name}"
+
+# Handle both base names and full .py filenames
+if [[ "$TEMPLATE_BASE" == *.py ]]; then
+ TEMPLATE_PY="$TEMPLATE_BASE"
+elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then
+ TEMPLATE_PY="load_csv_files_with_config.py"
+else
+ TEMPLATE_PY="${TEMPLATE_BASE}_template.py"
+fi
+
+# Use SPAC Python environment
+SPAC_PYTHON="${SPAC_PYTHON:-python3}"
+
+echo "=== SPAC Template Wrapper v5.5 ==="
+echo "Parameters: $PARAMS_JSON"
+echo "Template base: $TEMPLATE_BASE"
+echo "Template file: $TEMPLATE_PY"
+echo "Python: $SPAC_PYTHON"
+
+# Run template through Python
+"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt
+import json
+import os
+import sys
+import copy
+import traceback
+import inspect
+import shutil
+import re
+import csv
+
+# Get arguments
+params_path = sys.argv[1]
+template_filename = sys.argv[2]
+
+print(f"[Runner] Loading parameters from: {params_path}")
+print(f"[Runner] Template: {template_filename}")
+
+# Load parameters
+with open(params_path, 'r') as f:
+ params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+ """Remove Galaxy's parameter sanitization tokens"""
+ if not isinstance(s, str):
+ return s
+ replacements = {
+ '__ob__': '[', '__cb__': ']',
+ '__oc__': '{', '__cc__': '}',
+ '__dq__': '"', '__sq__': "'",
+ '__gt__': '>', '__lt__': '<',
+ '__cn__': '\n', '__cr__': '\r',
+ '__tc__': '\t', '__pd__': '#',
+ '__at__': '@', '__cm__': ','
+ }
+ for token, char in replacements.items():
+ s = s.replace(token, char)
+ return s
+
+def _maybe_parse(v):
+ """Recursively de-sanitize and JSON-parse strings where possible."""
+ if isinstance(v, str):
+ u = _unsanitize(v).strip()
+ if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+ try:
+ return json.loads(u)
+ except Exception:
+ return u
+ return u
+ elif isinstance(v, dict):
+ return {k: _maybe_parse(val) for k, val in v.items()}
+ elif isinstance(v, list):
+ return [_maybe_parse(item) for item in v]
+ return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+ """Some templates don't need column index conversion"""
+ return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+ """Read column headers from various file formats"""
+ try:
+ import pandas as pd
+
+ # Try pandas auto-detect
+ try:
+ df = pd.read_csv(filepath, nrows=1)
+ if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+ columns = df.columns.tolist()
+ print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+ return columns
+ except Exception:
+ pass
+
+ # Try common delimiters
+ for sep in ['\t', ',', ';', '|', ' ']:
+ try:
+ df = pd.read_csv(filepath, sep=sep, nrows=1)
+ if len(df.columns) > 1:
+ columns = df.columns.tolist()
+ sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+ '|': 'pipe', ' ': 'space'}.get(sep, sep)
+ print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+ return columns
+ except Exception:
+ continue
+ except ImportError:
+ print("[Runner] pandas not available, using csv fallback")
+
+ # CSV module fallback
+ try:
+ with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+ sample = f.read(8192)
+ f.seek(0)
+
+ try:
+ dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+ reader = csv.reader(f, dialect)
+ header = next(reader)
+ columns = [h.strip().strip('"') for h in header if h.strip()]
+ if columns:
+ print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+ return columns
+ except Exception:
+ f.seek(0)
+ first_line = f.readline().strip()
+ for sep in ['\t', ',', ';', '|']:
+ if sep in first_line:
+ columns = [h.strip().strip('"') for h in first_line.split(sep)]
+ if len(columns) > 1:
+ print(f"[Runner] Manual parsing found {len(columns)} columns")
+ return columns
+ except Exception as e:
+ print(f"[Runner] Failed to read headers: {e}")
+
+ return None
+
+def should_convert_param(key, value):
+ """Check if parameter contains column indices"""
+ if value is None or value == "" or value == [] or value == {}:
+ return False
+
+ key_lower = key.lower()
+
+ # Skip String_Columns - it's names not indices
+ if key == 'String_Columns':
+ return False
+
+ # Skip output/path parameters
+ if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+ return False
+
+ # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Parameters with 'column' likely have indices
+ if 'column' in key_lower or '_col' in key_lower:
+ return True
+
+ # Known index parameters
+ if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+ return True
+
+ # Check if values look like indices
+ if isinstance(value, list):
+ return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+ elif isinstance(value, (int, str)):
+ return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+ return False
+
+def convert_single_index(item, columns):
+ """Convert a single column index to name"""
+ if isinstance(item, str) and not item.strip().isdigit():
+ return item
+
+ try:
+ if isinstance(item, str):
+ item = int(item.strip())
+ elif isinstance(item, float):
+ item = int(item)
+ except (ValueError, AttributeError):
+ return item
+
+ if isinstance(item, int):
+ idx = item - 1 # Galaxy uses 1-based indexing
+ if 0 <= idx < len(columns):
+ return columns[idx]
+ elif 0 <= item < len(columns): # Fallback for 0-based
+ print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+ return columns[item]
+ else:
+ print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+ return item
+
+def convert_column_indices_to_names(params, template_name):
+ """Convert column indices to names for templates that need it"""
+
+ if should_skip_column_conversion(template_name):
+ print(f"[Runner] Skipping column conversion for {template_name}")
+ return params
+
+ print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+ # Find input file
+ input_file = None
+ input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+ 'Input_File', 'Input_Dataset', 'Data_File']
+
+ for key in input_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and value:
+ value = value[0]
+ if value and os.path.exists(str(value)):
+ input_file = str(value)
+ print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+ break
+
+ if not input_file:
+ print("[Runner] No input file found for column conversion")
+ return params
+
+ # Read headers
+ columns = read_file_headers(input_file)
+ if not columns:
+ print("[Runner] Could not read column headers, skipping conversion")
+ return params
+
+ print(f"[Runner] Successfully read {len(columns)} columns")
+ if len(columns) <= 10:
+ print(f"[Runner] Columns: {columns}")
+ else:
+ print(f"[Runner] First 10 columns: {columns[:10]}")
+
+ # Convert indices to names
+ converted_count = 0
+ for key, value in params.items():
+ # Skip non-column parameters
+ if not should_convert_param(key, value):
+ continue
+
+ # Convert indices
+ if isinstance(value, list):
+ converted_items = []
+ for item in value:
+ converted = convert_single_index(item, columns)
+ if converted is not None:
+ converted_items.append(converted)
+ converted_value = converted_items
+ else:
+ converted_value = convert_single_index(value, columns)
+
+ if value != converted_value:
+ params[key] = converted_value
+ converted_count += 1
+ print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+ if converted_count > 0:
+ print(f"[Runner] Total conversions: {converted_count} parameters")
+
+ # CRITICAL: Handle Feature_Regex specially
+ if 'Feature_Regex' in params:
+ value = params['Feature_Regex']
+ if value in [[], [""], "__ob____cb__", "[]", "", None]:
+ params['Feature_Regex'] = ""
+ print("[Runner] Cleared empty Feature_Regex parameter")
+ elif isinstance(value, list) and value:
+ params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+ print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+ return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+ """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+ if 'load_csv' not in template_name:
+ return params
+ for key in ('CSV_Files', 'CSV_Files_Configuration'):
+ val = params.get(key)
+ if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+ params[key] = val[0]
+ print(f"[Runner] Coerced {key} from list -> string")
+ return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+ value = params['String_Columns']
+ if not isinstance(value, list):
+ if value in [None, "", "[]", "__ob____cb__"]:
+ params['String_Columns'] = []
+ elif isinstance(value, str):
+ s = value.strip()
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ params['String_Columns'] = json.loads(s)
+ except ValueError:
+ params['String_Columns'] = [s] if s else []
+ elif ',' in s:
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+ else:
+ params['String_Columns'] = [s] if s else []
+ else:
+ params['String_Columns'] = []
+ print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+ # Check if csv_input_dir was created by Galaxy command
+ if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+ params['CSV_Files'] = 'csv_input_dir'
+ print("[Runner] Using csv_input_dir created by Galaxy")
+ elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+ # We have a single file path, need to get its directory
+ params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+ print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+ """Determine if a parameter should be normalized as a list"""
+ if isinstance(value, list):
+ return True
+
+ if value is None or value == "":
+ return False
+
+ key_lower = key.lower()
+
+ # Skip regex parameters
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Skip known single-value parameters
+ if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+ return False
+
+ # Plural forms suggest lists
+ if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations',
+ 'columns', 'types', 'labels', 'regions', 'radii']):
+ return True
+
+ # Check for list separators
+ if isinstance(value, str):
+ if ',' in value or '\n' in value:
+ return True
+ if value.strip().startswith('[') and value.strip().endswith(']'):
+ return True
+
+ return False
+
+def normalize_to_list(value):
+ """Convert various input formats to a proper Python list"""
+ if value in (None, "", "All", ["All"], "all", ["all"]):
+ return ["All"]
+
+ if isinstance(value, list):
+ return value
+
+ if isinstance(value, str):
+ s = value.strip()
+
+ # Try JSON parsing
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ parsed = json.loads(s)
+ return parsed if isinstance(parsed, list) else [str(parsed)]
+ except ValueError:
+ pass
+
+ # Split by comma
+ if ',' in s:
+ return [item.strip() for item in s.split(',') if item.strip()]
+
+ # Split by newline
+ if '\n' in s:
+ return [item.strip() for item in s.split('\n') if item.strip()]
+
+ # Single value
+ return [s] if s else []
+
+ return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+ if should_normalize_as_list(key, value):
+ original = value
+ normalized = normalize_to_list(value)
+ if original != normalized:
+ params[key] = normalized
+ list_count += 1
+ if len(str(normalized)) > 100:
+ print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+ else:
+ print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+ print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and len(value) == 1:
+ params[key] = value[0]
+ print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element list
+for key in list(params.keys()):
+ if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1:
+ original = params[key]
+ params[key] = params[key][0]
+ print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+ outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+ try:
+ maybe = json.loads(_unsanitize(raw_outputs))
+ if isinstance(maybe, dict):
+ outputs = maybe
+ except Exception:
+ pass
+
+if not isinstance(outputs, dict) or not outputs:
+ print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+ if 'plot' in template_name or 'histogram' in template_name:  # 'plot' also matches 'boxplot'
+ outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+ elif 'load_csv' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder'}
+ elif 'interactive' in template_name:
+ outputs = {'html': 'html_folder'}
+ else:
+ outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+ if output_type != 'analysis' and path:
+ os.makedirs(path, exist_ok=True)
+ print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+ params['output_path'] = outputs['analysis']
+ params['Output_Path'] = outputs['analysis']
+ params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+ df_dir = outputs['DataFrames']
+ params['output_dir'] = df_dir
+ params['Export_Dir'] = df_dir
+ # For load_csv, use a specific filename for the combined dataframe
+ if 'load_csv' in template_name:
+ params['Output_File'] = os.path.join(df_dir, 'combined_dataframe.csv')
+ else:
+ params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+ fig_dir = outputs['figures']
+ params['figure_dir'] = fig_dir
+ params['Figure_Dir'] = fig_dir
+ params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+ html_dir = outputs['html']
+ params['html_dir'] = html_dir
+ params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+ if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+ json.dump(params_display, f, indent=2)
+
+print("[Runner] Saved runtime parameters")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+ import importlib
+ mod = importlib.import_module(f'spac.templates.{template_module_name}')
+ print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except (ImportError, ModuleNotFoundError):
+ # Fallback to loading from file
+ print("[Runner] Package import failed, trying file load")
+ import importlib.util
+
+ # Standard locations
+ template_paths = [
+ f'/app/spac/templates/{template_filename}',
+ f'/opt/spac/templates/{template_filename}',
+ f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+ template_filename # Current directory
+ ]
+
+ spec = None
+ for path in template_paths:
+ if os.path.exists(path):
+ spec = importlib.util.spec_from_file_location("template_mod", path)
+ if spec:
+ print(f"[Runner] Found template at: {path}")
+ break
+
+ if not spec or not spec.loader:
+ print(f"[Runner] ERROR: Could not find template: {template_filename}")
+ sys.exit(1)
+
+ mod = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+ print('[Runner] ERROR: Template missing run_from_json function')
+ sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+ kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+ kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+ result = mod.run_from_json('params.runtime.json', **kwargs)
+ print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+ # ===========================================================================
+ # SPECIAL HANDLING FOR LOAD_CSV_FILES TEMPLATE
+ # ===========================================================================
+ if 'load_csv' in template_name:
+ print("[Runner] Special handling for load_csv_files template")
+
+ # The template should return a DataFrame or save CSV files
+ if result is not None:
+ try:
+ import pandas as pd
+
+ # If result is a DataFrame, save it directly
+ if hasattr(result, 'to_csv'):
+ output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv')
+ result.to_csv(output_path, index=False, header=True)
+ print(f"[Runner] Saved combined DataFrame to {output_path}")
+
+ # If result is a dict of DataFrames, combine them
+ elif isinstance(result, dict):
+ dfs = []
+ for name, df in result.items():
+ if hasattr(df, 'to_csv'):
+ # Add a source column to track origin
+ df['_source_file'] = name
+ dfs.append(df)
+
+ if dfs:
+ combined = pd.concat(dfs, ignore_index=True)
+ output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv')
+ combined.to_csv(output_path, index=False, header=True)
+ print(f"[Runner] Combined {len(dfs)} DataFrames into {output_path}")
+ except Exception as e:
+ print(f"[Runner] Could not combine DataFrames: {e}")
+
+ # Check if CSV files were saved in the dataframe folder
+ df_dir = outputs.get('DataFrames', 'dataframe_folder')
+ if os.path.exists(df_dir):
+ csv_files = [f for f in os.listdir(df_dir) if f.endswith('.csv')]
+
+ # If we have multiple CSV files but no combined_dataframe.csv, create it
+ if len(csv_files) > 1 and 'combined_dataframe.csv' not in csv_files:
+ try:
+ import pandas as pd
+ dfs = []
+ for csv_file in csv_files:
+ filepath = os.path.join(df_dir, csv_file)
+ df = pd.read_csv(filepath)
+ df['_source_file'] = csv_file.replace('.csv', '')
+ dfs.append(df)
+
+ combined = pd.concat(dfs, ignore_index=True)
+ output_path = os.path.join(df_dir, 'combined_dataframe.csv')
+ combined.to_csv(output_path, index=False, header=True)
+ print(f"[Runner] Combined {len(csv_files)} CSV files into {output_path}")
+ except Exception as e:
+ print(f"[Runner] Could not combine CSV files: {e}")
+ # If combining fails, fall back to copying the first CSV
+ if csv_files:
+ src = os.path.join(df_dir, csv_files[0])
+ dst = os.path.join(df_dir, 'combined_dataframe.csv')
+ shutil.copy2(src, dst)
+ print(f"[Runner] Copied {csv_files[0]} to combined_dataframe.csv")
+
+ # If we have exactly one CSV file and it's not named combined_dataframe.csv, rename it
+ elif len(csv_files) == 1 and csv_files[0] != 'combined_dataframe.csv':
+ src = os.path.join(df_dir, csv_files[0])
+ dst = os.path.join(df_dir, 'combined_dataframe.csv')
+ shutil.move(src, dst)
+ print(f"[Runner] Renamed {csv_files[0]} to combined_dataframe.csv")
+
+ # ===========================================================================
+ # HANDLE OTHER RETURN TYPES
+ # ===========================================================================
+ elif result is not None:
+ if isinstance(result, dict):
+ print(f"[Runner] Template saved files: {list(result.keys())}")
+ elif isinstance(result, tuple):
+ # Handle tuple returns
+ saved_count = 0
+ for i, item in enumerate(result):
+ if hasattr(item, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+ item.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(item)
+ saved_count += 1
+ print(f"[Runner] Saved figure to {fig_path}")
+ elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+ item.to_csv(df_path, index=True)
+ saved_count += 1
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ if saved_count > 0:
+ print(f"[Runner] Saved {saved_count} in-memory results")
+
+ elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+ df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+ result.to_csv(df_path, index=False, header=True)
+ print(f"[Runner] Saved DataFrame to {df_path}")
+
+ elif hasattr(result, 'savefig') and 'figures' in outputs:
+ import matplotlib
+ matplotlib.use('Agg')
+ import matplotlib.pyplot as plt
+ fig_path = os.path.join(outputs['figures'], 'figure.png')
+ result.savefig(fig_path, dpi=300, bbox_inches='tight')
+ plt.close(result)
+ print(f"[Runner] Saved figure to {fig_path}")
+
+ elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+ result.write_h5ad(outputs['analysis'])
+ print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+ print(f"[Runner] ERROR in template execution: {e}")
+ print(f"[Runner] Error type: {type(e).__name__}")
+ traceback.print_exc()
+
+ # Debug help for common issues
+ if "String Columns must be a *list*" in str(e):
+ print("\n[Runner] DEBUG: String_Columns validation failed")
+ print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+ print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+ elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+ print("\n[Runner] DEBUG: This appears to be a column index issue")
+ print("[Runner] Check that column indices were properly converted to names")
+ print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+ print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+ sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+ if output_type == 'analysis':
+ if os.path.exists(path):
+ size = os.path.getsize(path)
+ print(f"[Runner] ✓ {output_type}: {path} ({size:,} bytes)")
+ found_outputs = True
+ else:
+ print(f"[Runner] ✗ {output_type}: NOT FOUND")
+ else:
+ if os.path.exists(path) and os.path.isdir(path):
+ files = os.listdir(path)
+ if files:
+ print(f"[Runner] ✓ {output_type}: {len(files)} files")
+ for f in files[:3]:
+ print(f"[Runner] - {f}")
+ if len(files) > 3:
+ print(f"[Runner] ... and {len(files)-3} more")
+ found_outputs = True
+ else:
+ print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+ if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+ 'tool_stdout.txt', 'outputs_returned.json']:
+ continue
+
+ if file.endswith('.csv') and 'DataFrames' in outputs:
+ if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+ target = os.path.join(outputs['DataFrames'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+ elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+ if not os.path.exists(os.path.join(outputs['figures'], file)):
+ target = os.path.join(outputs['figures'], file)
+ shutil.move(file, target)
+ print(f"[Runner] Moved {file} to {target}")
+ found_outputs = True
+
+if found_outputs:
+ print("[Runner] === SUCCESS ===")
+else:
+ print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+ echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+ exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
diff --git a/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml b/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml
new file mode 100644
index 00000000..ec185659
--- /dev/null
+++ b/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml
@@ -0,0 +1,89 @@
+
+ Load CSV files from NIDAP dataset and combine them into a single pandas dataframe for downstream ...
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ tool_stdout.txt &&
+
+ ## Run the universal wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" load_csv_files_with_config
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ @misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+ }
+
+
+
\ No newline at end of file
diff --git a/galaxy_tools/spac_setup_analysis/run_spac_template.sh b/galaxy_tools/spac_setup_analysis/run_spac_template.sh
new file mode 100644
index 00000000..15d7afee
--- /dev/null
+++ b/galaxy_tools/spac_setup_analysis/run_spac_template.sh
@@ -0,0 +1,849 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - SPAC wrapper with column index conversion
+# Version: 5.5.0 - Enhanced text input handling for setup_analysis
+set -euo pipefail
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_BASE="${2:?Missing template base name}"
+
+# Handle both base names and full .py filenames
+if [[ "$TEMPLATE_BASE" == *.py ]]; then
+ TEMPLATE_PY="$TEMPLATE_BASE"
+elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then
+ TEMPLATE_PY="load_csv_files_with_config.py"
+else
+ TEMPLATE_PY="${TEMPLATE_BASE}_template.py"
+fi
+
+# Use SPAC Python environment
+SPAC_PYTHON="${SPAC_PYTHON:-python3}"
+
+echo "=== SPAC Template Wrapper v5.5 ==="
+echo "Parameters: $PARAMS_JSON"
+echo "Template base: $TEMPLATE_BASE"
+echo "Template file: $TEMPLATE_PY"
+echo "Python: $SPAC_PYTHON"
+
+# Run template through Python
+"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt
+import json
+import os
+import sys
+import copy
+import traceback
+import inspect
+import shutil
+import re
+import csv
+
+# Get arguments
+params_path = sys.argv[1]
+template_filename = sys.argv[2]
+
+print(f"[Runner] Loading parameters from: {params_path}")
+print(f"[Runner] Template: {template_filename}")
+
+# Load parameters
+with open(params_path, 'r') as f:
+ params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+ """Remove Galaxy's parameter sanitization tokens"""
+ if not isinstance(s, str):
+ return s
+ replacements = {
+ '__ob__': '[', '__cb__': ']',
+ '__oc__': '{', '__cc__': '}',
+ '__dq__': '"', '__sq__': "'",
+ '__gt__': '>', '__lt__': '<',
+ '__cn__': '\n', '__cr__': '\r',
+ '__tc__': '\t', '__pd__': '#',
+ '__at__': '@', '__cm__': ','
+ }
+ for token, char in replacements.items():
+ s = s.replace(token, char)
+ return s
+
+def _maybe_parse(v):
+ """Recursively de-sanitize and JSON-parse strings where possible."""
+ if isinstance(v, str):
+ u = _unsanitize(v).strip()
+ if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+ try:
+ return json.loads(u)
+ except Exception:
+ return u
+ return u
+ elif isinstance(v, dict):
+ return {k: _maybe_parse(val) for k, val in v.items()}
+ elif isinstance(v, list):
+ return [_maybe_parse(item) for item in v]
+ return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# SETUP ANALYSIS SPECIAL HANDLING - Process text inputs before column conversion
+# ===========================================================================
+def process_setup_analysis_text_inputs(params, template_name):
+ """Process text-based column inputs for setup_analysis template"""
+ if 'setup_analysis' not in template_name:
+ return params
+
+ print("[Runner] Processing setup_analysis text inputs")
+
+ # Handle X_centroid and Y_centroid (single text values)
+ for coord_key in ['X_centroid', 'Y_centroid']:
+ if coord_key in params:
+ value = params[coord_key]
+ if isinstance(value, list) and len(value) == 1:
+ value = params[coord_key] = value[0] # unwrap so str() below sees the item, not the list
+ # Ensure it's a string
+ if value:
+ params[coord_key] = str(value).strip()
+ print(f"[Runner] {coord_key} = '{params[coord_key]}'")
+
+ # Handle Annotation_s_ (text area, can be comma-separated or newline-separated)
+ if 'Annotation_s_' in params:
+ value = params['Annotation_s_']
+ if value:
+ # Convert to list if it's a string
+ if isinstance(value, str):
+ # Check for comma separation first, then newline
+ if ',' in value:
+ items = [item.strip() for item in value.split(',') if item.strip()]
+ elif '\n' in value:
+ items = [item.strip() for item in value.split('\n') if item.strip()]
+ else:
+ # Single value
+ items = [value.strip()] if value.strip() else []
+ params['Annotation_s_'] = items
+ print(f"[Runner] Parsed Annotation_s_: {len(items)} items -> {items}")
+ elif not isinstance(value, list):
+ params['Annotation_s_'] = []
+ else:
+ params['Annotation_s_'] = []
+
+ # Handle Feature_s_ (text area, can be comma-separated or newline-separated)
+ if 'Feature_s_' in params:
+ value = params['Feature_s_']
+ if value:
+ # Convert to list if it's a string
+ if isinstance(value, str):
+ # Check for comma separation first, then newline
+ if ',' in value:
+ items = [item.strip() for item in value.split(',') if item.strip()]
+ elif '\n' in value:
+ items = [item.strip() for item in value.split('\n') if item.strip()]
+ else:
+ # Single value
+ items = [value.strip()] if value.strip() else []
+ params['Feature_s_'] = items
+ print(f"[Runner] Parsed Feature_s_: {len(items)} items")
+ if len(items) <= 10:
+ print(f"[Runner] Features: {items}")
+ elif not isinstance(value, list):
+ params['Feature_s_'] = []
+ else:
+ params['Feature_s_'] = []
+
+ # Handle Feature_Regex (optional text field)
+ if 'Feature_Regex' in params:
+ value = params['Feature_Regex']
+ if value in [[], [""], "__ob____cb__", "[]", "", None]:
+ params['Feature_Regex'] = ""
+ elif isinstance(value, str):
+ params['Feature_Regex'] = value.strip()
+ print(f"[Runner] Feature_Regex = '{params.get('Feature_Regex', '')}'")
+
+ return params
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - For tools using column indices
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+ """Some templates don't need column index conversion"""
+ # setup_analysis uses text inputs now, not indices
+ return 'load_csv' in template_name or 'setup_analysis' in template_name
+
+def read_file_headers(filepath):
+ """Read column headers from various file formats"""
+ try:
+ import pandas as pd
+
+ # Let pandas sniff the delimiter (sep=None requires the python engine)
+ try:
+ df = pd.read_csv(filepath, sep=None, engine='python', nrows=1)
+ if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+ columns = df.columns.tolist()
+ print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+ return columns
+ except Exception:
+ pass
+
+ # Try common delimiters
+ for sep in ['\t', ',', ';', '|', ' ']:
+ try:
+ df = pd.read_csv(filepath, sep=sep, nrows=1)
+ if len(df.columns) > 1:
+ columns = df.columns.tolist()
+ sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+ '|': 'pipe', ' ': 'space'}.get(sep, sep)
+ print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+ return columns
+ except Exception:
+ continue
+ except ImportError:
+ print("[Runner] pandas not available, using csv fallback")
+
+ # CSV module fallback
+ try:
+ with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+ sample = f.read(8192)
+ f.seek(0)
+
+ try:
+ dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+ reader = csv.reader(f, dialect)
+ header = next(reader)
+ columns = [h.strip().strip('"') for h in header if h.strip()]
+ if columns:
+ print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+ return columns
+ except Exception:
+ f.seek(0)
+ first_line = f.readline().strip()
+ for sep in ['\t', ',', ';', '|']:
+ if sep in first_line:
+ columns = [h.strip().strip('"') for h in first_line.split(sep)]
+ if len(columns) > 1:
+ print(f"[Runner] Manual parsing found {len(columns)} columns")
+ return columns
+ except Exception as e:
+ print(f"[Runner] Failed to read headers: {e}")
+
+ return None
+
+def should_convert_param(key, value):
+ """Check if parameter contains column indices"""
+ if value is None or value == "" or value == [] or value == {}:
+ return False
+
+ key_lower = key.lower()
+
+ # Skip String_Columns - it's names not indices
+ if key == 'String_Columns':
+ return False
+
+ # Skip the text-based parameters from setup_analysis
+ if key in ['X_centroid', 'Y_centroid', 'Annotation_s_', 'Feature_s_', 'Feature_Regex']:
+ return False
+
+ # Skip output/path parameters
+ if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+ return False
+
+ # Skip regex/pattern parameters
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Parameters with 'column' likely have indices
+ if 'column' in key_lower or '_col' in key_lower:
+ return True
+
+ # Known index parameters (but not the text-based ones)
+ if key in {'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+ return True
+
+ # Check if values look like indices
+ if isinstance(value, list):
+ return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+ elif isinstance(value, (int, str)):
+ return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+ return False
+
+def convert_single_index(item, columns):
+ """Convert a single column index to name"""
+ if isinstance(item, str) and not item.strip().isdigit():
+ return item
+
+ try:
+ if isinstance(item, str):
+ item = int(item.strip())
+ elif isinstance(item, float):
+ item = int(item)
+ except (ValueError, AttributeError):
+ return item
+
+ if isinstance(item, int):
+ idx = item - 1 # Galaxy uses 1-based indexing
+ if 0 <= idx < len(columns):
+ return columns[idx]
+ elif 0 <= item < len(columns): # Only index 0 reaches here; treat it as 0-based
+ print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+ return columns[item]
+ else:
+ print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+ return item
+
+def convert_column_indices_to_names(params, template_name):
+ """Convert column indices to names for templates that need it"""
+
+ if should_skip_column_conversion(template_name):
+ print(f"[Runner] Skipping column conversion for {template_name}")
+ return params
+
+ print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+ # Find input file
+ input_file = None
+ input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+ 'Input_File', 'Input_Dataset', 'Data_File']
+
+ for key in input_keys:
+ if key in params:
+ value = params[key]
+ if isinstance(value, list) and value:
+ value = value[0]
+ if value and os.path.exists(str(value)):
+ input_file = str(value)
+ print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+ break
+
+ if not input_file:
+ print("[Runner] No input file found for column conversion")
+ return params
+
+ # Read headers
+ columns = read_file_headers(input_file)
+ if not columns:
+ print("[Runner] Could not read column headers, skipping conversion")
+ return params
+
+ print(f"[Runner] Successfully read {len(columns)} columns")
+ if len(columns) <= 10:
+ print(f"[Runner] Columns: {columns}")
+ else:
+ print(f"[Runner] First 10 columns: {columns[:10]}")
+
+ # Convert indices to names
+ converted_count = 0
+ for key, value in params.items():
+ # Skip non-column parameters
+ if not should_convert_param(key, value):
+ continue
+
+ # Convert indices
+ if isinstance(value, list):
+ converted_items = []
+ for item in value:
+ converted = convert_single_index(item, columns)
+ if converted is not None:
+ converted_items.append(converted)
+ converted_value = converted_items
+ else:
+ converted_value = convert_single_index(value, columns)
+
+ if value != converted_value:
+ params[key] = converted_value
+ converted_count += 1
+ print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+ if converted_count > 0:
+ print(f"[Runner] Total conversions: {converted_count} parameters")
+
+ return params
+
+# ===========================================================================
+# APPLY TEXT PROCESSING AND COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Processing text inputs for setup_analysis")
+params = process_setup_analysis_text_inputs(params, template_name)
+
+print("[Runner] Step 2: Converting column indices to names (if needed)")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+ """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+ if 'load_csv' not in template_name:
+ return params
+ for key in ('CSV_Files', 'CSV_Files_Configuration'):
+ val = params.get(key)
+ if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+ params[key] = val[0]
+ print(f"[Runner] Coerced {key} from list -> string")
+ return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+ value = params['String_Columns']
+ if not isinstance(value, list):
+ if value in [None, "", "[]", "__ob____cb__"]:
+ params['String_Columns'] = []
+ elif isinstance(value, str):
+ s = value.strip()
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ params['String_Columns'] = json.loads(s)
+ except ValueError:
+ params['String_Columns'] = [s] if s else []
+ elif ',' in s:
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+ elif '\n' in s:
+ params['String_Columns'] = [item.strip() for item in s.split('\n') if item.strip()]
+ else:
+ params['String_Columns'] = [s] if s else []
+ else:
+ params['String_Columns'] = []
+ print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+ # Check if csv_input_dir was created by Galaxy command
+ if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+ params['CSV_Files'] = 'csv_input_dir'
+ print("[Runner] Using csv_input_dir created by Galaxy")
+ elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+ # We have a single file path, need to get its directory
+ params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+ print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION (for other tools)
+# ===========================================================================
+def should_normalize_as_list(key, value):
+ """Determine if a parameter should be normalized as a list"""
+ # Skip if already handled by text processing
+ if key in ['Annotation_s_', 'Feature_s_'] and 'setup_analysis' in template_name:
+ return False
+
+ if isinstance(value, list):
+ return True
+
+ if value is None or value == "":
+ return False
+
+ key_lower = key.lower()
+
+ # Skip regex parameters
+ if 'regex' in key_lower or 'pattern' in key_lower:
+ return False
+
+ # Skip known single-value parameters
+ if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary', 'centroid']):
+ return False
+
+ # Plural forms suggest lists
+ if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations',
+ 'columns', 'types', 'labels', 'regions', 'radii']):
+ return True
+
+ # Check for list separators
+ if isinstance(value, str):
+ if ',' in value or '\n' in value:
+ return True
+ if value.strip().startswith('[') and value.strip().endswith(']'):
+ return True
+
+ return False
+
+def normalize_to_list(value):
+ """Convert various input formats to a proper Python list"""
+ if value in (None, "", "All", ["All"], "all", ["all"]):
+ return ["All"]
+
+ if isinstance(value, list):
+ return value
+
+ if isinstance(value, str):
+ s = value.strip()
+
+ # Try JSON parsing
+ if s.startswith('[') and s.endswith(']'):
+ try:
+ parsed = json.loads(s)
+ return parsed if isinstance(parsed, list) else [str(parsed)]
+ except ValueError:
+ pass
+
+ # Split by comma
+ if ',' in s:
+ return [item.strip() for item in s.split(',') if item.strip()]
+
+ # Split by newline
+ if '\n' in s:
+ return [item.strip() for item in s.split('\n') if item.strip()]
+
+ # Single value
+ return [s] if s else []
+
+ return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 3: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+ if should_normalize_as_list(key, value):
+ original = value
+ normalized = normalize_to_list(value)
+ if original != normalized:
+ params[key] = normalized
+ list_count += 1
+ if len(str(normalized)) > 100:
+ print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+ else:
+ print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+ print(f"[Runner] Normalized {list_count} list parameters")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+ outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+ try:
+ maybe = json.loads(_unsanitize(raw_outputs))
+ if isinstance(maybe, dict):
+ outputs = maybe
+ except Exception:
+ pass
+
+if not isinstance(outputs, dict) or not outputs:
+ print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+ if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+ elif 'load_csv' in template_name:
+ outputs = {'DataFrames': 'dataframe_folder'}
+ elif 'interactive' in template_name:
+ outputs = {'html': 'html_folder'}
+ else:
+ outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+ if output_type != 'analysis' and path:
+ os.makedirs(path, exist_ok=True)
+ print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+ params['output_path'] = outputs['analysis']
+ params['Output_Path'] = outputs['analysis']
+ params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+ df_dir = outputs['DataFrames']
+ params['output_dir'] = df_dir
+ params['Export_Dir'] = df_dir
+ # For load_csv, use a specific filename for the combined dataframe
+ if 'load_csv' in template_name:
+ params['Output_File'] = os.path.join(df_dir, 'combined_dataframe.csv')
+ else:
+ params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+ fig_dir = outputs['figures']
+ params['figure_dir'] = fig_dir
+ params['Figure_Dir'] = fig_dir
+ params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+ html_dir = outputs['html']
+ params['html_dir'] = html_dir
+ params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+ json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+ if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+ json.dump(params_display, f, indent=2)
+
+print("[Runner] Saved runtime parameters")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+ import importlib
+ mod = importlib.import_module(f'spac.templates.{template_module_name}')
+ print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except (ImportError, ModuleNotFoundError):
+ # Fallback to loading from file
+ print("[Runner] Package import failed, trying file load")
+ import importlib.util
+
+ # Standard locations
+ template_paths = [
+ f'/app/spac/templates/{template_filename}',
+ f'/opt/spac/templates/{template_filename}',
+ f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+ template_filename # Current directory
+ ]
+
+ spec = None
+ for path in template_paths:
+ if os.path.exists(path):
+ spec = importlib.util.spec_from_file_location("template_mod", path)
+ if spec:
+ print(f"[Runner] Found template at: {path}")
+ break
+
+ if not spec or not spec.loader:
+ print(f"[Runner] ERROR: Could not find template: {template_filename}")
+ sys.exit(1)
+
+ mod = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+ print('[Runner] ERROR: Template missing run_from_json function')
+ sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+ kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+ kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+    result = mod.run_from_json('params.runtime.json', **kwargs)
+    print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+    # ===========================================================================
+    # SPECIAL HANDLING FOR LOAD_CSV_FILES TEMPLATE
+    # ===========================================================================
+    if 'load_csv' in template_name:
+        print("[Runner] Special handling for load_csv_files template")
+
+        # The template should return a DataFrame or save CSV files
+        if result is not None:
+            try:
+                import pandas as pd
+
+                # If result is a DataFrame, save it directly
+                if hasattr(result, 'to_csv'):
+                    output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv')
+                    result.to_csv(output_path, index=False, header=True)
+                    print(f"[Runner] Saved combined DataFrame to {output_path}")
+
+                # If result is a dict of DataFrames, combine them
+                elif isinstance(result, dict):
+                    dfs = []
+                    for name, df in result.items():
+                        if hasattr(df, 'to_csv'):
+                            # Add a source column to track origin
+                            df['_source_file'] = name
+                            dfs.append(df)
+
+                    if dfs:
+                        combined = pd.concat(dfs, ignore_index=True)
+                        output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv')
+                        combined.to_csv(output_path, index=False, header=True)
+                        print(f"[Runner] Combined {len(dfs)} DataFrames into {output_path}")
+            except Exception as e:
+                print(f"[Runner] Could not combine DataFrames: {e}")
+
+        # Check if CSV files were saved in the dataframe folder
+        df_dir = outputs.get('DataFrames', 'dataframe_folder')
+        if os.path.exists(df_dir):
+            csv_files = [f for f in os.listdir(df_dir) if f.endswith('.csv')]
+
+            # If we have multiple CSV files but no combined_dataframe.csv, create it
+            if len(csv_files) > 1 and 'combined_dataframe.csv' not in csv_files:
+                try:
+                    import pandas as pd
+                    dfs = []
+                    for csv_file in csv_files:
+                        filepath = os.path.join(df_dir, csv_file)
+                        df = pd.read_csv(filepath)
+                        df['_source_file'] = csv_file.replace('.csv', '')
+                        dfs.append(df)
+
+                    combined = pd.concat(dfs, ignore_index=True)
+                    output_path = os.path.join(df_dir, 'combined_dataframe.csv')
+                    combined.to_csv(output_path, index=False, header=True)
+                    print(f"[Runner] Combined {len(csv_files)} CSV files into {output_path}")
+                except Exception as e:
+                    print(f"[Runner] Could not combine CSV files: {e}")
+                    # If combining fails, fall back to copying the first CSV under the expected name
+                    if csv_files:
+                        src = os.path.join(df_dir, csv_files[0])
+                        dst = os.path.join(df_dir, 'combined_dataframe.csv')
+                        shutil.copy2(src, dst)
+                        print(f"[Runner] Copied {csv_files[0]} to combined_dataframe.csv")
+
+            # If we have exactly one CSV file and it's not named combined_dataframe.csv, rename it
+            elif len(csv_files) == 1 and csv_files[0] != 'combined_dataframe.csv':
+                src = os.path.join(df_dir, csv_files[0])
+                dst = os.path.join(df_dir, 'combined_dataframe.csv')
+                shutil.move(src, dst)
+                print(f"[Runner] Renamed {csv_files[0]} to combined_dataframe.csv")
+
+    # ===========================================================================
+    # HANDLE OTHER RETURN TYPES
+    # ===========================================================================
+    elif result is not None:
+        if isinstance(result, dict):
+            print(f"[Runner] Template saved files: {list(result.keys())}")
+        elif isinstance(result, tuple):
+            # Handle tuple returns
+            saved_count = 0
+            for i, item in enumerate(result):
+                if hasattr(item, 'savefig') and 'figures' in outputs:
+                    import matplotlib
+                    matplotlib.use('Agg')
+                    import matplotlib.pyplot as plt
+                    fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+                    item.savefig(fig_path, dpi=300, bbox_inches='tight')
+                    plt.close(item)
+                    saved_count += 1
+                    print(f"[Runner] Saved figure to {fig_path}")
+                elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+                    df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+                    item.to_csv(df_path, index=True)
+                    saved_count += 1
+                    print(f"[Runner] Saved DataFrame to {df_path}")
+
+            if saved_count > 0:
+                print(f"[Runner] Saved {saved_count} in-memory results")
+
+        elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+            df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+            result.to_csv(df_path, index=False, header=True)
+            print(f"[Runner] Saved DataFrame to {df_path}")
+
+        elif hasattr(result, 'savefig') and 'figures' in outputs:
+            import matplotlib
+            matplotlib.use('Agg')
+            import matplotlib.pyplot as plt
+            fig_path = os.path.join(outputs['figures'], 'figure.png')
+            result.savefig(fig_path, dpi=300, bbox_inches='tight')
+            plt.close(result)
+            print(f"[Runner] Saved figure to {fig_path}")
+
+        elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+            result.write_h5ad(outputs['analysis'])
+            print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+    print(f"[Runner] ERROR in template execution: {e}")
+    print(f"[Runner] Error type: {type(e).__name__}")
+    traceback.print_exc()
+
+    # Debug help for common issues
+    if "String Columns must be a *list*" in str(e):
+        print("\n[Runner] DEBUG: String_Columns validation failed")
+        print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+        print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+    elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+        print("\n[Runner] DEBUG: This appears to be a column index issue")
+        print("[Runner] Check that column indices were properly converted to names")
+        print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+        print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+    sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+    if output_type == 'analysis':
+        if os.path.exists(path):
+            size = os.path.getsize(path)
+            print(f"[Runner] ✓ {output_type}: {path} ({size:,} bytes)")
+            found_outputs = True
+        else:
+            print(f"[Runner] ✗ {output_type}: NOT FOUND")
+    else:
+        if os.path.exists(path) and os.path.isdir(path):
+            files = os.listdir(path)
+            if files:
+                print(f"[Runner] ✓ {output_type}: {len(files)} files")
+                for f in files[:3]:
+                    print(f"[Runner] - {f}")
+                if len(files) > 3:
+                    print(f"[Runner] ... and {len(files)-3} more")
+                found_outputs = True
+            else:
+                print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+    if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+                                       'tool_stdout.txt', 'outputs_returned.json']:
+        continue
+
+    if file.endswith('.csv') and 'DataFrames' in outputs:
+        if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+            target = os.path.join(outputs['DataFrames'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+    elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+        if not os.path.exists(os.path.join(outputs['figures'], file)):
+            target = os.path.join(outputs['figures'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+
+if found_outputs:
+    print("[Runner] === SUCCESS ===")
+else:
+    print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+  echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+  exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
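The `load_csv` branch of the wrapper above stacks several per-file tables into `combined_dataframe.csv` and tags every row with a `_source_file` column. A minimal standalone sketch of that concatenation step follows. `combine_tables` is a hypothetical helper for illustration, not part of SPAC, and it assumes the per-file tables arrive as a dict of pandas DataFrames keyed by file stem.

```python
import pandas as pd


def combine_tables(tables: dict) -> pd.DataFrame:
    """Concatenate per-file DataFrames, tagging each row with its source name.

    Hypothetical helper. It works on copies, so the caller's frames are not
    mutated (the wrapper above assigns `_source_file` on the frames in place).
    """
    frames = []
    for name, df in tables.items():
        tagged = df.copy()
        tagged["_source_file"] = name  # remember which file each row came from
        frames.append(tagged)
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers rows 0..n-1 instead of repeating per-file indices;
    # columns missing from some files are filled with NaN.
    return pd.concat(frames, ignore_index=True)


slide_a = pd.DataFrame({"CD3": [1.0, 2.0]})
slide_b = pd.DataFrame({"CD3": [3.0], "CD8": [0.5]})
combined = combine_tables({"slide_a": slide_a, "slide_b": slide_b})
print(combined)
```

Copying before tagging is a design choice of this sketch. It keeps the caller's frames unchanged, at the cost of one extra copy per table.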
diff --git a/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml b/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml
new file mode 100644
index 00000000..fefc6f95
--- /dev/null
+++ b/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml
@@ -0,0 +1,121 @@
+
+ Set up an analysis data object for downstream processing.
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ tool_stdout.txt &&
+
+ ## Run the universal wrapper
+ bash $__tool_directory__/run_spac_template.sh "$params_json" setup_analysis
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+@misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+}
+
+
+
\ No newline at end of file
diff --git a/galaxy_tools/spac_zscore_normalization/run_spac_template.sh b/galaxy_tools/spac_zscore_normalization/run_spac_template.sh
new file mode 100644
index 00000000..a93b2d6e
--- /dev/null
+++ b/galaxy_tools/spac_zscore_normalization/run_spac_template.sh
@@ -0,0 +1,710 @@
+#!/usr/bin/env bash
+# run_spac_template.sh - SPAC wrapper with column index conversion
+# Version: 5.4.1 - Integrated column conversion
+set -uo pipefail  # no -e, so the runner's exit status reaches the explicit EXIT_CODE check below
+
+PARAMS_JSON="${1:?Missing params.json path}"
+TEMPLATE_BASE="${2:?Missing template base name}"
+
+# Handle both base names and full .py filenames
+if [[ "$TEMPLATE_BASE" == *.py ]]; then
+  TEMPLATE_PY="$TEMPLATE_BASE"
+elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then
+  TEMPLATE_PY="load_csv_files_with_config.py"
+else
+  TEMPLATE_PY="${TEMPLATE_BASE}_template.py"
+fi
+
+# Use SPAC Python environment
+SPAC_PYTHON="${SPAC_PYTHON:-python3}"
+
+echo "=== SPAC Template Wrapper v5.4.1 ==="
+echo "Parameters: $PARAMS_JSON"
+echo "Template base: $TEMPLATE_BASE"
+echo "Template file: $TEMPLATE_PY"
+echo "Python: $SPAC_PYTHON"
+
+# Run template through Python
+"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt
+import json
+import os
+import sys
+import copy
+import traceback
+import inspect
+import shutil
+import re
+import csv
+
+# Get arguments
+params_path = sys.argv[1]
+template_filename = sys.argv[2]
+
+print(f"[Runner] Loading parameters from: {params_path}")
+print(f"[Runner] Template: {template_filename}")
+
+# Load parameters
+with open(params_path, 'r') as f:
+    params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+    """Remove Galaxy's parameter sanitization tokens"""
+    if not isinstance(s, str):
+        return s
+    replacements = {
+        '__ob__': '[', '__cb__': ']',
+        '__oc__': '{', '__cc__': '}',
+        '__dq__': '"', '__sq__': "'",
+        '__gt__': '>', '__lt__': '<',
+        '__cn__': '\n', '__cr__': '\r',
+        '__tc__': '\t', '__pd__': '#',
+        '__at__': '@', '__cm__': ','
+    }
+    for token, char in replacements.items():
+        s = s.replace(token, char)
+    return s
+
+def _maybe_parse(v):
+    """Recursively de-sanitize and JSON-parse strings where possible."""
+    if isinstance(v, str):
+        u = _unsanitize(v).strip()
+        if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+            try:
+                return json.loads(u)
+            except Exception:
+                return u
+        return u
+    elif isinstance(v, dict):
+        return {k: _maybe_parse(val) for k, val in v.items()}
+    elif isinstance(v, list):
+        return [_maybe_parse(item) for item in v]
+    return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+    """Some templates don't need column index conversion"""
+    return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+    """Read column headers from various file formats"""
+    try:
+        import pandas as pd
+
+        # Try pandas auto-detect (sep=None makes the python engine sniff the delimiter)
+        try:
+            df = pd.read_csv(filepath, sep=None, engine='python', nrows=1)
+            if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+                columns = df.columns.tolist()
+                print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+                return columns
+        except Exception:
+            pass
+
+        # Try common delimiters
+        for sep in ['\t', ',', ';', '|', ' ']:
+            try:
+                df = pd.read_csv(filepath, sep=sep, nrows=1)
+                if len(df.columns) > 1:
+                    columns = df.columns.tolist()
+                    sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+                                '|': 'pipe', ' ': 'space'}.get(sep, sep)
+                    print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+                    return columns
+            except Exception:
+                continue
+    except ImportError:
+        print("[Runner] pandas not available, using csv fallback")
+
+    # CSV module fallback
+    try:
+        with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+            sample = f.read(8192)
+            f.seek(0)
+
+            try:
+                dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+                reader = csv.reader(f, dialect)
+                header = next(reader)
+                columns = [h.strip().strip('"') for h in header if h.strip()]
+                if columns:
+                    print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+                    return columns
+            except Exception:
+                f.seek(0)
+                first_line = f.readline().strip()
+                for sep in ['\t', ',', ';', '|']:
+                    if sep in first_line:
+                        columns = [h.strip().strip('"') for h in first_line.split(sep)]
+                        if len(columns) > 1:
+                            print(f"[Runner] Manual parsing found {len(columns)} columns")
+                            return columns
+    except Exception as e:
+        print(f"[Runner] Failed to read headers: {e}")
+
+    return None
+
+def should_convert_param(key, value):
+    """Check if parameter contains column indices"""
+    if value is None or value == "" or value == [] or value == {}:
+        return False
+
+    key_lower = key.lower()
+
+    # Skip String_Columns - it's names not indices
+    if key == 'String_Columns':
+        return False
+
+    # Skip output/path parameters
+    if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+        return False
+
+    # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Parameters with 'column' likely have indices
+    if 'column' in key_lower or '_col' in key_lower:
+        return True
+
+    # Known index parameters
+    if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+        return True
+
+    # Check if values look like indices
+    if isinstance(value, list):
+        return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+    elif isinstance(value, (int, str)):
+        return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+    return False
+
+def convert_single_index(item, columns):
+    """Convert a single column index to name"""
+    if isinstance(item, str) and not item.strip().isdigit():
+        return item
+
+    try:
+        if isinstance(item, str):
+            item = int(item.strip())
+        elif isinstance(item, float):
+            item = int(item)
+    except (ValueError, AttributeError):
+        return item
+
+    if isinstance(item, int):
+        idx = item - 1  # Galaxy uses 1-based indexing
+        if 0 <= idx < len(columns):
+            return columns[idx]
+        elif 0 <= item < len(columns):  # Fallback: only index 0 reaches here, so treat it as 0-based
+            print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+            return columns[item]
+        else:
+            print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+    return item
+
+def convert_column_indices_to_names(params, template_name):
+    """Convert column indices to names for templates that need it"""
+
+    if should_skip_column_conversion(template_name):
+        print(f"[Runner] Skipping column conversion for {template_name}")
+        return params
+
+    print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+    # Find input file
+    input_file = None
+    input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+                  'Input_File', 'Input_Dataset', 'Data_File']
+
+    for key in input_keys:
+        if key in params:
+            value = params[key]
+            if isinstance(value, list) and value:
+                value = value[0]
+            if value and os.path.exists(str(value)):
+                input_file = str(value)
+                print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+                break
+
+    if not input_file:
+        print("[Runner] No input file found for column conversion")
+        return params
+
+    # Read headers
+    columns = read_file_headers(input_file)
+    if not columns:
+        print("[Runner] Could not read column headers, skipping conversion")
+        return params
+
+    print(f"[Runner] Successfully read {len(columns)} columns")
+    if len(columns) <= 10:
+        print(f"[Runner] Columns: {columns}")
+    else:
+        print(f"[Runner] First 10 columns: {columns[:10]}")
+
+    # Convert indices to names
+    converted_count = 0
+    for key, value in params.items():
+        # Skip non-column parameters
+        if not should_convert_param(key, value):
+            continue
+
+        # Convert indices
+        if isinstance(value, list):
+            converted_items = []
+            for item in value:
+                converted = convert_single_index(item, columns)
+                if converted is not None:
+                    converted_items.append(converted)
+            converted_value = converted_items
+        else:
+            converted_value = convert_single_index(value, columns)
+
+        if value != converted_value:
+            params[key] = converted_value
+            converted_count += 1
+            print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+    if converted_count > 0:
+        print(f"[Runner] Total conversions: {converted_count} parameters")
+
+    # CRITICAL: Handle Feature_Regex specially
+    if 'Feature_Regex' in params:
+        value = params['Feature_Regex']
+        if value in [[], [""], "__ob____cb__", "[]", "", None]:
+            params['Feature_Regex'] = ""
+            print("[Runner] Cleared empty Feature_Regex parameter")
+        elif isinstance(value, list) and value:
+            params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+            print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+    return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+    """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+    if 'load_csv' not in template_name:
+        return params
+    for key in ('CSV_Files', 'CSV_Files_Configuration'):
+        val = params.get(key)
+        if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+            params[key] = val[0]
+            print(f"[Runner] Coerced {key} from list -> string")
+    return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+    value = params['String_Columns']
+    if not isinstance(value, list):
+        if value in [None, "", "[]", "__ob____cb__"]:
+            params['String_Columns'] = []
+        elif isinstance(value, str):
+            s = value.strip()
+            if s.startswith('[') and s.endswith(']'):
+                try:
+                    params['String_Columns'] = json.loads(s)
+                except Exception:
+                    params['String_Columns'] = [s] if s else []
+            elif ',' in s:
+                params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+            else:
+                params['String_Columns'] = [s] if s else []
+        else:
+            params['String_Columns'] = []
+    print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+    # Check if csv_input_dir was created by Galaxy command
+    if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+        params['CSV_Files'] = 'csv_input_dir'
+        print("[Runner] Using csv_input_dir created by Galaxy")
+    elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+        # We have a single file path, need to get its directory
+        params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+        print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+    """Determine if a parameter should be normalized as a list"""
+    if isinstance(value, list):
+        return True
+
+    if value is None or value == "":
+        return False
+
+    key_lower = key.lower()
+
+    # Skip regex parameters
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Skip known single-value parameters
+    if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+        return False
+
+    # Plural forms suggest lists
+    if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations',
+                                    'columns', 'types', 'labels', 'regions', 'radii']):
+        return True
+
+    # Check for list separators
+    if isinstance(value, str):
+        if ',' in value or '\n' in value:
+            return True
+        if value.strip().startswith('[') and value.strip().endswith(']'):
+            return True
+
+    return False
+
+def normalize_to_list(value):
+    """Convert various input formats to a proper Python list"""
+    if value in (None, "", "All", ["All"], "all", ["all"]):
+        return ["All"]
+
+    if isinstance(value, list):
+        return value
+
+    if isinstance(value, str):
+        s = value.strip()
+
+        # Try JSON parsing
+        if s.startswith('[') and s.endswith(']'):
+            try:
+                parsed = json.loads(s)
+                return parsed if isinstance(parsed, list) else [str(parsed)]
+            except Exception:
+                pass
+
+        # Split by comma
+        if ',' in s:
+            return [item.strip() for item in s.split(',') if item.strip()]
+
+        # Split by newline
+        if '\n' in s:
+            return [item.strip() for item in s.split('\n') if item.strip()]
+
+        # Single value
+        return [s] if s else []
+
+    return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+    if should_normalize_as_list(key, value):
+        original = value
+        normalized = normalize_to_list(value)
+        if original != normalized:
+            params[key] = normalized
+            list_count += 1
+            if len(str(normalized)) > 100:
+                print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+            else:
+                print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+    print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+    if key in params:
+        value = params[key]
+        if isinstance(value, list) and len(value) == 1:
+            params[key] = value[0]
+            print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element list
+for key in list(params.keys()):
+    if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1:
+        original = params[key]
+        params[key] = params[key][0]
+        print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+    outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+    try:
+        maybe = json.loads(_unsanitize(raw_outputs))
+        if isinstance(maybe, dict):
+            outputs = maybe
+    except Exception:
+        pass
+
+if not isinstance(outputs, dict) or not outputs:
+    print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+    if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name:
+        outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+    elif 'load_csv' in template_name:
+        outputs = {'DataFrames': 'dataframe_folder'}
+    elif 'interactive' in template_name:
+        outputs = {'html': 'html_folder'}
+    else:
+        outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+    if output_type != 'analysis' and path:
+        os.makedirs(path, exist_ok=True)
+        print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+    params['output_path'] = outputs['analysis']
+    params['Output_Path'] = outputs['analysis']
+    params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+    df_dir = outputs['DataFrames']
+    params['output_dir'] = df_dir
+    params['Export_Dir'] = df_dir
+    params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+    fig_dir = outputs['figures']
+    params['figure_dir'] = fig_dir
+    params['Figure_Dir'] = fig_dir
+    params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+    html_dir = outputs['html']
+    params['html_dir'] = html_dir
+    params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+    json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+                  if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+    json.dump(params_display, f, indent=2)
+
+print(f"[Runner] Saved runtime parameters")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+    import importlib
+    mod = importlib.import_module(f'spac.templates.{template_module_name}')
+    print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except (ImportError, ModuleNotFoundError):
+    # Fallback to loading from file
+    print(f"[Runner] Package import failed, trying file load")
+    import importlib.util
+
+    # Standard locations
+    template_paths = [
+        f'/app/spac/templates/{template_filename}',
+        f'/opt/spac/templates/{template_filename}',
+        f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+        template_filename  # Current directory
+    ]
+
+    spec = None
+    for path in template_paths:
+        if os.path.exists(path):
+            spec = importlib.util.spec_from_file_location("template_mod", path)
+            if spec:
+                print(f"[Runner] Found template at: {path}")
+                break
+
+    if not spec or not spec.loader:
+        print(f"[Runner] ERROR: Could not find template: {template_filename}")
+        sys.exit(1)
+
+    mod = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+    print('[Runner] ERROR: Template missing run_from_json function')
+    sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+    kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+    kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+    result = mod.run_from_json('params.runtime.json', **kwargs)
+    print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+    # Handle different return types
+    if result is not None:
+        if isinstance(result, dict):
+            print(f"[Runner] Template saved files: {list(result.keys())}")
+        elif isinstance(result, tuple):
+            # Handle tuple returns
+            saved_count = 0
+            for i, item in enumerate(result):
+                if hasattr(item, 'savefig') and 'figures' in outputs:
+                    import matplotlib
+                    matplotlib.use('Agg')
+                    import matplotlib.pyplot as plt
+                    fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+                    item.savefig(fig_path, dpi=300, bbox_inches='tight')
+                    plt.close(item)
+                    saved_count += 1
+                    print(f"[Runner] Saved figure to {fig_path}")
+                elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+                    df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+                    item.to_csv(df_path, index=True)
+                    saved_count += 1
+                    print(f"[Runner] Saved DataFrame to {df_path}")
+
+            if saved_count > 0:
+                print(f"[Runner] Saved {saved_count} in-memory results")
+
+        elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+            df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+            result.to_csv(df_path, index=True)
+            print(f"[Runner] Saved DataFrame to {df_path}")
+
+        elif hasattr(result, 'savefig') and 'figures' in outputs:
+            import matplotlib
+            matplotlib.use('Agg')
+            import matplotlib.pyplot as plt
+            fig_path = os.path.join(outputs['figures'], 'figure.png')
+            result.savefig(fig_path, dpi=300, bbox_inches='tight')
+            plt.close(result)
+            print(f"[Runner] Saved figure to {fig_path}")
+
+        elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+            result.write_h5ad(outputs['analysis'])
+            print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+    print(f"[Runner] ERROR in template execution: {e}")
+    print(f"[Runner] Error type: {type(e).__name__}")
+    traceback.print_exc()
+
+    # Debug help for common issues
+    if "String Columns must be a *list*" in str(e):
+        print("\n[Runner] DEBUG: String_Columns validation failed")
+        print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+        print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+    elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+        print("\n[Runner] DEBUG: This appears to be a column index issue")
+        print("[Runner] Check that column indices were properly converted to names")
+        print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+        print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+    sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+    if output_type == 'analysis':
+        if os.path.exists(path):
+            size = os.path.getsize(path)
+            print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)")
+            found_outputs = True
+        else:
+            print(f"[Runner] ✗ {output_type}: NOT FOUND")
+    else:
+        if os.path.exists(path) and os.path.isdir(path):
+            files = os.listdir(path)
+            if files:
+                print(f"[Runner] ✔ {output_type}: {len(files)} files")
+                for f in files[:3]:
+                    print(f"[Runner] - {f}")
+                if len(files) > 3:
+                    print(f"[Runner] ... and {len(files)-3} more")
+                found_outputs = True
+            else:
+                print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+    if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+                                       'tool_stdout.txt', 'outputs_returned.json']:
+        continue
+
+    if file.endswith('.csv') and 'DataFrames' in outputs:
+        if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+            target = os.path.join(outputs['DataFrames'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+    elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+        if not os.path.exists(os.path.join(outputs['figures'], file)):
+            target = os.path.join(outputs['figures'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+
+if found_outputs:
+    print("[Runner] === SUCCESS ===")
+else:
+    print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+  echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+  exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
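The wrapper above reverses Galaxy's parameter sanitization and then maps 1-based column-picker indices to header names before calling the template. The sketch below isolates those two steps. `unsanitize`, `indices_to_names`, and the sample headers are hypothetical names for illustration only. The sketch assumes the decoded value is valid JSON, covers only a subset of the tokens that `_unsanitize` handles, and omits the wrapper's 0-based fallback for index 0.

```python
import json

# Subset of the sanitization tokens that _unsanitize() in the wrapper maps back.
GALAXY_TOKENS = {
    "__ob__": "[", "__cb__": "]",
    "__dq__": '"', "__sq__": "'",
    "__cm__": ",",
}


def unsanitize(text: str) -> str:
    """Replace sanitization tokens with the characters they stand for."""
    for token, char in GALAXY_TOKENS.items():
        text = text.replace(token, char)
    return text


def indices_to_names(raw: str, columns: list) -> list:
    """Decode a sanitized JSON value of 1-based column indices into header names.

    Entries that are not digit strings or non-negative ints are treated as
    names and kept. Out-of-range indices are returned unchanged, as
    convert_single_index does.
    """
    values = json.loads(unsanitize(raw))
    if not isinstance(values, list):
        values = [values]
    names = []
    for value in values:
        text = str(value).strip()
        if text.isdigit():
            idx = int(text) - 1  # column pickers count from 1
            names.append(columns[idx] if 0 <= idx < len(columns) else value)
        else:
            names.append(value)
    return names


headers = ["CellID", "X_centroid", "Y_centroid", "CD3", "CD8"]
sanitized = "__ob____dq__4__dq__,__dq__5__dq____cb__"  # ["4","5"] with brackets/quotes tokenized
print(indices_to_names(sanitized, headers))  # ['CD3', 'CD8']
```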
diff --git a/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml b/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml
new file mode 100644
index 00000000..52be678c
--- /dev/null
+++ b/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml
@@ -0,0 +1,67 @@
+
+ Perform z-scores normalization for the selected data table in the analysis. Normalized data table...
+
+
+ nciccbr/spac:v1
+
+
+
+ python3
+
+
+ tool_stdout.txt &&
+
+ ## Run the universal wrapper (template name without .py extension)
+ bash $__tool_directory__/run_spac_template.sh "$params_json" zscore_normalization
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ @misc{spac_toolkit,
+ author = {FNLCR DMAP Team},
+ title = {SPAC: SPAtial single-Cell analysis},
+ year = {2024},
+ url = {https://github.com/FNLCR-DMAP/SCSAWorkflow}
+ }
+
+
+
\ No newline at end of file
diff --git a/galaxy_tools/test-data/setup_analysis.h5ad b/galaxy_tools/test-data/setup_analysis.h5ad
new file mode 100644
index 00000000..11cc7eec
Binary files /dev/null and b/galaxy_tools/test-data/setup_analysis.h5ad differ
diff --git a/galaxy_tools/test-data/setup_analysis.pickle b/galaxy_tools/test-data/setup_analysis.pickle
new file mode 100644
index 00000000..2aa845ab
Binary files /dev/null and b/galaxy_tools/test-data/setup_analysis.pickle differ
diff --git a/setup.py b/setup.py
index 945475ad..79b1bbff 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@
setup(
name='spac',
- version="0.9.0",
+ version="0.9.1",
description=(
'SPatial Analysis for single-Cell analysis (SPAC)'
'is a Scalable Python package for single-cell spatial protein data '
diff --git a/src/spac/__init__.py b/src/spac/__init__.py
index f8b63dd6..c7a7ff09 100644
--- a/src/spac/__init__.py
+++ b/src/spac/__init__.py
@@ -22,7 +22,7 @@
functions.extend(module_functions)
# Define the package version before using it in __all__
-__version__ = "0.9.0"
+__version__ = "0.9.1"
# Define a __all__ list to specify which functions should be considered public
__all__ = functions
diff --git a/src/spac/templates/__init__.py b/src/spac/templates/__init__.py
new file mode 100644
index 00000000..89c61771
--- /dev/null
+++ b/src/spac/templates/__init__.py
@@ -0,0 +1,13 @@
+"""
+Canonical SPAC template sub‑package.
+
+Each template is a self‑contained module that
+ • reads parameters from JSON/dict
+ • runs a SPAC analysis function
+ • returns / saves results
+
+Available templates
+-------------------
+- ripley_l_template.run_from_json
+"""
+
diff --git a/src/spac/templates/analysis_to_csv_template.py b/src/spac/templates/analysis_to_csv_template.py
new file mode 100644
index 00000000..b079439e
--- /dev/null
+++ b/src/spac/templates/analysis_to_csv_template.py
@@ -0,0 +1,199 @@
+"""
+Platform-agnostic Analysis to CSV template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.analysis_to_csv_template import run_from_json
+>>> run_from_json("examples/analysis_to_csv_params.json")
+"""
+import json
+import sys
+import logging
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.utils import check_table
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Analysis to CSV analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Export": "Original",
+ "Save_as_CSV_File": false,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the dataframe
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"dataframe": "path/to/dataframe.csv"}
+ If save_to_disk=False: The processed DataFrame
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a CSV file when save_to_disk is True
+ - Otherwise, the DataFrame is returned for programmatic use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ input_layer = params.get("Table_to_Export", "Original")
+
+ if input_layer == "Original":
+ input_layer = None
+
+ def export_layer_to_csv(adata, layer=None):
+ """
+ Exports the specified layer or the default .X data matrix of an
+ AnnData object to a CSV file.
+ """
+ # Check if the provided layer exists in the AnnData object
+ if layer:
+ check_table(adata, tables=layer)
+ data_to_export = pd.DataFrame(
+ adata.layers[layer],
+ index=adata.obs.index,
+ columns=adata.var.index
+ )
+ else:
+ data_to_export = pd.DataFrame(
+ adata.X,
+ index=adata.obs.index,
+ columns=adata.var.index
+ )
+
+ # Join with the observation metadata
+ full_data_df = data_to_export.join(adata.obs)
+
+ # Join the spatial coordinates
+ # Extract the spatial coordinates
+ spatial_df = pd.DataFrame(
+ adata.obsm['spatial'],
+ index=adata.obs.index,
+ columns=['spatial_x', 'spatial_y']
+ )
+
+ # Join spatial_df with full_data_df
+ full_data_df = full_data_df.join(spatial_df)
+
+ return full_data_df
+
+ csv_data = export_layer_to_csv(
+ adata=adata,
+ layer=input_layer
+ )
+
+ logger.info(f"Exported DataFrame shape: {csv_data.shape}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = csv_data
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Analysis to CSV completed successfully.")
+ return saved_files
+ else:
+ # Return the dataframe directly for in-memory workflows
+ logger.info("Returning DataFrame for in-memory use")
+ return csv_data
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+        print(
+            "Usage: python analysis_to_csv_template.py "
+            "<params_json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
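Every template's `run_from_json` accepts a JSON file path, a JSON string, or a parameter dict. It then falls back to a default `outputs` block. The real `parse_params` in `spac.templates.template_utils` is not shown in this diff, so the following is a hypothetical stdlib sketch of that dispatch. Treating any string that starts with `{` as inline JSON is an assumption of the sketch.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def parse_params_sketch(src: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical stand-in for template_utils.parse_params."""
    if isinstance(src, dict):
        # Shallow copy so adding defaults does not mutate the caller's dict
        return dict(src)
    if isinstance(src, str) and src.lstrip().startswith("{"):
        # Heuristic (assumption): a leading brace means inline JSON text
        return json.loads(src)
    # Otherwise treat the argument as a path to a JSON file
    return json.loads(Path(src).read_text())


def with_default_outputs(params: Dict[str, Any]) -> Dict[str, Any]:
    # Mirrors the Analysis to CSV template's fallback when "outputs" is absent
    params.setdefault(
        "outputs", {"dataframe": {"type": "file", "name": "dataframe.csv"}}
    )
    return params
```

Copying the input dict is a design choice of this sketch. The templates themselves assign `params["outputs"]` directly on whatever `parse_params` returns.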
diff --git a/src/spac/templates/append_annotation_template.py b/src/spac/templates/append_annotation_template.py
new file mode 100644
index 00000000..1d51a83c
--- /dev/null
+++ b/src/spac/templates/append_annotation_template.py
@@ -0,0 +1,208 @@
+"""
+Platform-agnostic Append Annotation template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.append_annotation_template import run_from_json
+>>> run_from_json("examples/append_annotation_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import append_annotation
+from spac.utils import check_column_name
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Append Annotation analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Annotation_Pair_List": ["column1:value1", "column2:value2"],
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the DataFrame with
+ appended annotations to a CSV file. If False, returns the DataFrame
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The processed DataFrame with appended annotations
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> annotated_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data - DataFrame or CSV file
+ upstream_dataset = params["Upstream_Dataset"]
+ if isinstance(upstream_dataset, pd.DataFrame):
+ input_dataframe = upstream_dataset # Direct DataFrame from previous step
+ elif isinstance(upstream_dataset, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(upstream_dataset)
+ try:
+ input_dataframe = pd.read_csv(path)
+ logging.info(f"Successfully loaded CSV data from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ )
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream_dataset)}"
+ )
+
+ # Extract parameters
+ dataset_mapping_rules = params.get(
+ "Annotation_Pair_List", ["Example:Example"]
+ )
+
+ # Initialize an empty dictionary
+ parsed_dict = {}
+
+ # Loop through each string pair in the list
+ for pair in dataset_mapping_rules:
+ # Split the string on the colon
+ key, value = pair.split(":")
+ check_column_name(key, pair)
+ # Add the key-value pair to the dictionary
+ parsed_dict[key] = value
+
+ logging.info(f"The pairs to add are:\n{parsed_dict}")
+
+ output_dataframe = append_annotation(
+ input_dataframe,
+ parsed_dict
+ )
+
+    output_dataframe.info()  # prints the summary to stdout; returns None
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = output_dataframe
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Append Annotation analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the DataFrame directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return output_dataframe
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+        print(
+            "Usage: python append_annotation_template.py <params_json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
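The template turns each `"column:value"` entry of `Annotation_Pair_List` into a dict entry before it calls `append_annotation`. Below is a minimal sketch of that parsing step, and it omits the `check_column_name` validation. The template's own loop uses `pair.split(":")`, which raises on a value that contains a second colon. This sketch splits only on the first colon, which is an assumption about what callers would want. `parse_annotation_pairs` is a hypothetical name.

```python
from typing import Dict, List


def parse_annotation_pairs(pairs: List[str]) -> Dict[str, str]:
    """Turn ["column:value", ...] into {"column": "value", ...}.

    Splits on the first colon only, so a value may itself contain colons.
    """
    parsed: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"Expected 'column:value', got {pair!r}")
        parsed[key] = value
    return parsed
```

For example, `parse_annotation_pairs(["slide:S1", "time:12:30"])` returns `{"slide": "S1", "time": "12:30"}`.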
diff --git a/src/spac/templates/append_pin_color_rule_template.py b/src/spac/templates/append_pin_color_rule_template.py
new file mode 100644
index 00000000..eb9f06d8
--- /dev/null
+++ b/src/spac/templates/append_pin_color_rule_template.py
@@ -0,0 +1,168 @@
+"""
+Platform-agnostic Append Pin Color Rule template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.append_pin_color_rule_template import run_from_json
+>>> run_from_json("examples/add_pin_color_rule_params.json")
+"""
+import json
+import sys
+import logging
+from pathlib import Path
+from typing import Any, Dict, Union
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import add_pin_color_rules
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ string_list_to_dictionary,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Append Pin Color Rule analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Label_Color_Map": ["label1:red", "label2:blue"],
+ "Color_Map_Name": "_spac_colors",
+ "Overwrite_Previous_Color_Map": true,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ color_dict_string_list = params.get("Label_Color_Map", [])
+ color_map_name = params.get("Color_Map_Name", "_spac_colors")
+ overwrite = params.get("Overwrite_Previous_Color_Map", True)
+
+ color_dict = string_list_to_dictionary(
+ color_dict_string_list,
+ key_name="label",
+ value_name="color"
+ )
+
+ add_pin_color_rules(
+ adata,
+ label_color_dict=color_dict,
+ color_map_name=color_map_name,
+ overwrite=overwrite
+ )
+ logger.info(f"{adata.uns[f'{color_map_name}_summary']}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Append Pin Color Rule analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logger.info("Returning AnnData object for in-memory use")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+        print(
+            "Usage: python append_pin_color_rule_template.py "
+            "<params_json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned AnnData object")
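The template passes `Color_Map_Name` and `Overwrite_Previous_Color_Map` to `add_pin_color_rules`, whose internals are not part of this diff. As a hypothetical illustration only, here is one plausible reading of overwrite-versus-merge semantics. It uses a plain dict in place of `adata.uns`. The helper name and the summary string format are invented for the sketch, and the real function may behave differently.

```python
from typing import Dict, MutableMapping


def add_color_rules_sketch(uns: MutableMapping[str, object],
                           label_colors: Dict[str, str],
                           name: str = "_spac_colors",
                           overwrite: bool = True) -> Dict[str, str]:
    """Hypothetical: store a label->color map under `name` in `uns`.

    overwrite=True replaces any existing map; otherwise existing labels
    keep their colors and only new labels are added.
    """
    existing = uns.get(name)
    if overwrite or not isinstance(existing, dict):
        merged = dict(label_colors)
    else:
        merged = {**label_colors, **existing}  # existing colors take precedence
    uns[name] = merged
    # Invented summary format, standing in for the "<name>_summary" entry
    uns[f"{name}_summary"] = f"{len(merged)} label(s) in {name}"
    return merged
```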
diff --git a/src/spac/templates/arcsinh_normalization_template.py b/src/spac/templates/arcsinh_normalization_template.py
new file mode 100644
index 00000000..fcdf62da
--- /dev/null
+++ b/src/spac/templates/arcsinh_normalization_template.py
@@ -0,0 +1,218 @@
+"""
+Platform-agnostic Arcsinh Normalization template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where analysis is saved as a file.
+
+Usage
+-----
+>>> from spac.templates.arcsinh_normalization_template import run_from_json
+>>> run_from_json("examples/arcsinh_normalization_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import arcsinh_transformation
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Arcsinh Normalization analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "Co_Factor": "5.0",
+ "Percentile": "None",
+ "Output_Table_Name": "arcsinh",
+ "Per_Batch": "False",
+ "Annotation": "None",
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the AnnData object
+ to a pickle file. If False, returns the AnnData object directly
+ for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object for in-memory use
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+    >>> # e.g. './output.pickle'
+
+ >>> # Get results in memory for further processing
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+ >>> # Can now work with adata object directly
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ input_layer = params.get("Table_to_Process", "Original")
+ co_factor = params.get("Co_Factor", "5.0")
+ percentile = params.get("Percentile", "None")
+ output_layer = params.get("Output_Table_Name", "arcsinh")
+ per_batch = params.get("Per_Batch", "False")
+ annotation = params.get("Annotation", "None")
+
+ input_layer = text_to_value(
+ input_layer,
+ default_none_text="Original"
+ )
+
+ co_factor = text_to_value(
+ co_factor,
+ default_none_text="None",
+ to_float=True,
+ param_name="co_factor"
+ )
+
+ percentile = text_to_value(
+ percentile,
+ default_none_text="None",
+ to_float=True,
+ param_name="percentile"
+ )
+
+    if isinstance(per_batch, str):
+        per_batch = per_batch.strip().lower() == "true"
+    else:
+        per_batch = bool(per_batch)
+
+ annotation = text_to_value(
+ annotation,
+ default_none_text="None"
+ )
+
+ transformed_data = arcsinh_transformation(
+ adata,
+ input_layer=input_layer,
+ co_factor=co_factor,
+ percentile=percentile,
+ output_layer=output_layer,
+ per_batch=per_batch,
+ annotation=annotation
+ )
+
+ logging.info(f"Transformed data stored in layer: {output_layer}")
+ dataframe = pd.DataFrame(transformed_data.layers[output_layer])
+ logging.info(f"Arcsinh transformation summary:\n{dataframe.describe()}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = transformed_data
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info(
+ f"Arcsinh Normalization completed → {saved_files['analysis']}"
+ )
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logging.info("Returning AnnData object (not saving to file)")
+ return transformed_data
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+        print(
+            "Usage: python arcsinh_normalization_template.py <params_json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, path in result.items():
+ print(f" {key}: {path}")
+ else:
+ print("\nReturned AnnData object for in-memory use")
+ print(f"AnnData: {result}")
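The template converts its text parameters with `text_to_value` and then calls `arcsinh_transformation`. As a rough illustration of the underlying math, here is a stdlib sketch. It assumes the common cytometry convention y = asinh(x / c) for a co-factor c. How SPAC derives c from `Percentile` is not shown here, so the nearest-rank percentile branch below is an assumption. `to_optional_float` and `arcsinh_column` are hypothetical names.

```python
import math
from typing import List, Optional


def to_optional_float(text, none_text: str = "None") -> Optional[float]:
    """Hypothetical stand-in for text_to_value(..., to_float=True)."""
    if text is None or str(text).strip() == none_text:
        return None
    return float(text)


def arcsinh_column(values: List[float], co_factor: Optional[float] = 5.0,
                   percentile: Optional[float] = None) -> List[float]:
    """Apply y = asinh(x / c) to one feature column.

    Assumption: when `percentile` is given, c is the nearest-rank
    percentile of the column instead of the fixed co_factor.
    """
    if not values:
        return []
    if percentile is not None:
        ordered = sorted(values)
        rank = math.ceil(percentile / 100 * len(ordered))
        rank = min(len(ordered), max(1, rank))  # clamp to a valid 1-based rank
        c = ordered[rank - 1]
    else:
        c = co_factor
    if not c:
        raise ValueError("co-factor must be a non-zero number")
    return [math.asinh(x / c) for x in values]
```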
diff --git a/src/spac/templates/binary_to_categorical_annotation_template.py b/src/spac/templates/binary_to_categorical_annotation_template.py
new file mode 100644
index 00000000..127a8e4a
--- /dev/null
+++ b/src/spac/templates/binary_to_categorical_annotation_template.py
@@ -0,0 +1,203 @@
+"""
+Platform-agnostic Binary to Categorical Annotation template converted from
+NIDAP. Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.binary_to_categorical_annotation_template import (
+...     run_from_json)
+>>> run_from_json("examples/binary_to_categorical_annotation_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import bin2cat
+from spac.utils import check_column_name
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Binary to Categorical Annotation analysis with parameters from
+ JSON. Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Binary_Annotation_Columns": ["Col1", "Col2", "Col3"],
+ "New_Annotation_Name": "cell_labels",
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the DataFrame with
+ converted annotations to a CSV file. If False, returns the DataFrame
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The processed DataFrame with categorical annotation
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> converted_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data - DataFrame or CSV file
+ upstream_dataset = params["Upstream_Dataset"]
+ if isinstance(upstream_dataset, pd.DataFrame):
+ input_dataset = upstream_dataset # Direct DataFrame from previous step
+ elif isinstance(upstream_dataset, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(upstream_dataset)
+ try:
+ input_dataset = pd.read_csv(path)
+ logging.info(f"Successfully loaded CSV data from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ )
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream_dataset)}"
+ )
+
+ # Extract parameters
+ one_hot_annotations = params.get(
+ "Binary_Annotation_Columns",
+ ["Normal_Cells", "Cancer_Cells", "Immuno_Cells"]
+ )
+ new_annotation = params.get("New_Annotation_Name", "cell_labels")
+
+ check_column_name(new_annotation, "New Annotation Name")
+
+ converted_df = bin2cat(
+ data=input_dataset,
+ one_hot_annotations=one_hot_annotations,
+ new_annotation=new_annotation
+ )
+
+    converted_df.info()  # prints the summary to stdout; returns None
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = converted_df
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Binary to Categorical Annotation completed successfully.")
+ return saved_files
+ else:
+ # Return the DataFrame directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return converted_df
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+        print(
+            "Usage: python binary_to_categorical_annotation_template.py "
+            "<params_json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
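`bin2cat` collapses several one-hot (0/1) columns into a single categorical column named by `New_Annotation_Name`. Below is a stdlib sketch of that idea over a list of row dicts. This diff does not show what the real function does with rows that have zero or several hot columns. The sketch assumes `None` when no column is hot and raises an error when more than one is. `bin2cat_sketch` is a hypothetical name.

```python
from typing import Dict, List, Optional


def bin2cat_sketch(rows: List[Dict[str, object]], one_hot: List[str],
                   new_annotation: str) -> List[Dict[str, object]]:
    """Hypothetical: add `new_annotation` holding the name of the single
    one-hot column set to 1 in each row (input rows are not modified)."""
    out = []
    for row in rows:
        hot = [col for col in one_hot if row.get(col) == 1]
        if len(hot) > 1:
            raise ValueError(f"Row has multiple hot columns: {hot}")
        label: Optional[str] = hot[0] if hot else None
        out.append({**row, new_annotation: label})
    return out
```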
diff --git a/src/spac/templates/boxplot_template.py b/src/spac/templates/boxplot_template.py
new file mode 100644
index 00000000..791e5e5e
--- /dev/null
+++ b/src/spac/templates/boxplot_template.py
@@ -0,0 +1,269 @@
+"""
+Platform-agnostic Boxplot template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where figures are saved as directories.
+
+Usage
+-----
+>>> from spac.templates.boxplot_template import run_from_json
+>>> run_from_json("examples/boxplot_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional, Tuple
+import logging
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import boxplot
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]:
+ """
+ Execute Boxplot analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Primary_Annotation": "cell_type",
+ "Feature_s_to_Plot": ["CD4", "CD8"],
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframe": {"type": "file", "name": "output.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves figures to a directory
+ and summary statistics to a CSV file. If False, returns the figure and
+ summary dataframe directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "figures": ["path/to/figures/boxplot.png"], # List of figure paths
+ "dataframe": "path/to/output.csv" # Single file path
+ }
+ If save_to_disk=False: Tuple of (matplotlib.figure.Figure, pd.DataFrame)
+ containing the figure object and summary statistics dataframe
+
+ Notes
+ -----
+ Output Structure:
+ - Figures are saved in a directory (standardized for all figure outputs)
+ - Summary statistics are saved as a single CSV file
+ - When save_to_disk=False, objects are returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["figures"]) # List of paths to saved plots
+ >>> # ['./figures/boxplot.png']
+
+ >>> # Get results in memory
+ >>> fig, summary_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Figures use directory type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframe": {"type": "file", "name": "output.csv"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params.get("Primary_Annotation", "None")
+ second_annotation = params.get("Secondary_Annotation", "None")
+ layer_to_plot = params.get("Table_to_Visualize", "Original")
+ feature_to_plot = params.get("Feature_s_to_Plot", ["All"])
+ log_scale = params.get("Value_Axis_Log_Scale", False)
+
+ # Extract figure parameters with defaults
+ figure_title = params.get("Figure_Title", "BoxPlot")
+ figure_horizontal = params.get("Horizontal_Plot", False)
+ fig_width = params.get("Figure_Width", 12)
+ fig_height = params.get("Figure_Height", 8)
+ fig_dpi = params.get("Figure_DPI", 300)
+ font_size = params.get("Font_Size", 10)
+ showfliers = params.get("Keep_Outliers", True)
+
+ # Process parameters to match expected format
+ # Convert "None" strings to actual None values
+ layer_to_plot = None if layer_to_plot == "Original" else layer_to_plot
+ second_annotation = None if second_annotation == "None" else second_annotation
+ annotation = None if annotation == "None" else annotation
+
+ # Convert horizontal flag to orientation string
+ figure_orientation = "h" if figure_horizontal else "v"
+
+ # Handle feature selection
+ if isinstance(feature_to_plot, str):
+ # Convert single string to list
+ feature_to_plot = [feature_to_plot]
+
+ # Check for "All" features selection
+ if any(item == "All" for item in feature_to_plot):
+ logging.info("Plotting All Features")
+ feature_to_plot = adata.var_names.tolist()
+ else:
+ feature_str = "\n".join(feature_to_plot)
+ logging.info(f"Plotting Feature:\n{feature_str}")
+
+ # Create the plot exactly as in NIDAP template
+ fig, ax = plt.subplots()
+ plt.rcParams.update({'font.size': font_size})
+ fig.set_size_inches(fig_width, fig_height)
+ fig.set_dpi(fig_dpi)
+
+ fig, ax, df = boxplot(
+ adata=adata,
+ ax=ax,
+ layer=layer_to_plot,
+ annotation=annotation,
+ second_annotation=second_annotation,
+ features=feature_to_plot,
+ log_scale=log_scale,
+ orient=figure_orientation,
+ showfliers=showfliers
+ )
+
+ # Set the figure title
+ ax.set_title(figure_title)
+
+ # Get summary statistics of the dataset
+ logging.info("Summary statistics of the dataset:")
+ summary = df.describe()
+
+ # Convert the summary to a DataFrame that includes the index as a column
+ summary_df = summary.reset_index()
+ logging.info(f"\n{summary_df.to_string()}")
+
+ # Move the legend outside the plotting area
+ # Check if a legend exists
+ try:
+ sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
+ except Exception as e:
+ logging.debug("Legend does not exist.")
+
+ # Apply tight layout to prevent label cutoff
+ plt.tight_layout()
+
+ if show_plot:
+ plt.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Package figure in a dictionary for directory saving
+ # This ensures it's saved in a directory per standardized schema
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = {"boxplot": fig} # Dict triggers directory save
+
+ # Check for DataFrames output (case-insensitive)
+ if any(k.lower() == "dataframe" for k in params["outputs"].keys()):
+ results_dict["dataframe"] = summary_df
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Boxplot analysis completed successfully.")
+ return saved_files
+ else:
+ # Return objects directly for in-memory workflows
+ logging.info(
+ "Returning figure and summary dataframe for in-memory use"
+ )
+ return fig, summary_df
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python boxplot_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ fig, summary_df = result
+ print("\nReturned figure and summary dataframe for in-memory use")
+ print(f"Figure size: {fig.get_size_inches()}")
+ print(f"Summary shape: {summary_df.shape}")
+ print("\nSummary statistics preview:")
+ print(summary_df.head())
\ No newline at end of file
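The boxplot template converts several string sentinels into Python values before it calls `boxplot`: `"None"` annotations and the `"Original"` table become `None`, a single feature becomes a one-item list, `"All"` expands to every feature, and `Horizontal_Plot` becomes an orientation flag. The sketch below isolates that normalization as a pure function so it can be tested without an AnnData object. `normalize_boxplot_params` is a hypothetical name, and `var_names` stands in for `adata.var_names.tolist()`.

```python
from typing import Any, Dict, List, Optional


def normalize_boxplot_params(params: Dict[str, Any],
                             var_names: List[str]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the sentinel handling in boxplot_template."""
    def none_if(value: Any, sentinel: str) -> Optional[Any]:
        # The JSON blueprint encodes "no value" as a string sentinel
        return None if value == sentinel else value

    features = params.get("Feature_s_to_Plot", ["All"])
    if isinstance(features, str):
        features = [features]  # a single feature name becomes a one-item list
    if "All" in features:
        features = list(var_names)  # "All" expands to every feature

    return {
        "annotation": none_if(params.get("Primary_Annotation", "None"), "None"),
        "second_annotation": none_if(
            params.get("Secondary_Annotation", "None"), "None"),
        "layer": none_if(params.get("Table_to_Visualize", "Original"), "Original"),
        "features": features,
        "orient": "h" if params.get("Horizontal_Plot", False) else "v",
    }


print(normalize_boxplot_params({"Feature_s_to_Plot": "CD4"}, ["CD4", "CD8"]))
```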
diff --git a/src/spac/templates/calculate_centroid_template.py b/src/spac/templates/calculate_centroid_template.py
new file mode 100644
index 00000000..b59add49
--- /dev/null
+++ b/src/spac/templates/calculate_centroid_template.py
@@ -0,0 +1,211 @@
+"""
+Platform-agnostic Calculate Centroid template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.calculate_centroid_template import run_from_json
+>>> run_from_json("examples/calculate_centroid_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Tuple
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import calculate_centroid
+from spac.utils import check_column_name
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Calculate Centroid analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Min_X_Coordinate_Column_Name": "XMin",
+ "Max_X_Coordinate_Column_Name": "XMax",
+ "Min_Y_Coordinate_Column_Name": "YMin",
+ "Max_Y_Coordinate_Column_Name": "YMax",
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the DataFrame with
+ calculated centroids to a CSV file. If False, returns the DataFrame
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The processed DataFrame with centroids
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> centroid_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # DataFrames typically use file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data - DataFrame or CSV file
+ upstream_dataset = params["Upstream_Dataset"]
+ if isinstance(upstream_dataset, pd.DataFrame):
+ input_dataset = upstream_dataset # Direct DataFrame from previous step
+ elif isinstance(upstream_dataset, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(upstream_dataset)
+ try:
+ input_dataset = pd.read_csv(path)
+ logging.info(f"Successfully loaded CSV data from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ )
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream_dataset)}"
+ )
+
+ # Extract parameters using .get() with defaults from JSON template
+ x_min = params.get("Min_X_Coordinate_Column_Name", "XMin")
+ x_max = params.get("Max_X_Coordinate_Column_Name", "XMax")
+ y_min = params.get("Min_Y_Coordinate_Column_Name", "YMin")
+ y_max = params.get("Max_Y_Coordinate_Column_Name", "YMax")
+ new_x = params.get("X_Centroid_Name", "XCentroid")
+ new_y = params.get("Y_Centroid_Name", "YCentroid")
+
+ check_column_name(new_x, "X Centroid Name")
+ check_column_name(new_y, "Y Centroid Name")
+
+ centroid_calculated = calculate_centroid(
+ input_dataset,
+ x_min=x_min,
+ x_max=x_max,
+ y_min=y_min,
+ y_max=y_max,
+ new_x=new_x,
+ new_y=new_y
+ )
+
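+ # DataFrame.info() writes its report to a buffer (stdout by default)
+ # and returns None, so capture the report before logging it.
+ from io import StringIO
+ info_buffer = StringIO()
+ centroid_calculated.info(buf=info_buffer)
+ logging.info(info_buffer.getvalue())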
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = centroid_calculated
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Calculate Centroid analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the DataFrame directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return centroid_calculated
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python calculate_centroid_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
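The Calculate Centroid template passes four bounding-box column names to `calculate_centroid`. Assuming the centroid is the midpoint of each cell's bounding box (the usual definition; confirm against `spac.data_utils.calculate_centroid`), the equivalent pandas operation is sketched below. `bbox_centroids` is a hypothetical name; its keyword defaults match the template's parameter defaults.

```python
import pandas as pd


def bbox_centroids(df: pd.DataFrame,
                   x_min: str = "XMin", x_max: str = "XMax",
                   y_min: str = "YMin", y_max: str = "YMax",
                   new_x: str = "XCentroid",
                   new_y: str = "YCentroid") -> pd.DataFrame:
    """Sketch: add midpoint centroid columns to a copy of df."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out[new_x] = (out[x_min] + out[x_max]) / 2
    out[new_y] = (out[y_min] + out[y_max]) / 2
    return out


cells = pd.DataFrame({"XMin": [0, 10], "XMax": [4, 20],
                      "YMin": [2, 5], "YMax": [6, 7]})
print(bbox_centroids(cells)[["XCentroid", "YCentroid"]])
```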
diff --git a/src/spac/templates/combine_annotations_template.py b/src/spac/templates/combine_annotations_template.py
new file mode 100644
index 00000000..b152978b
--- /dev/null
+++ b/src/spac/templates/combine_annotations_template.py
@@ -0,0 +1,181 @@
+"""
+Platform-agnostic Combine Annotations template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.combine_annotations_template import run_from_json
+>>> run_from_json("examples/combine_annotations_params.json")
+"""
+import json
+import sys
+import logging
+from pathlib import Path
+from typing import Any, Dict, Union, List
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import combine_annotations
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Combine Annotations analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Annotations_Names": ["annotation1", "annotation2"],
+ "New_Annotation_Name": "combined_annotation",
+ "Separator": "_",
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv",
+ "analysis": "path/to/output.pickle"
+ }
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a pickle file
+ - DataFrame (label counts) is saved as a CSV file
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotations_list = params["Annotations_Names"]
+ new_annotation = params.get("New_Annotation_Name", "combined_annotation")
+ separator = params.get("Separator", "_")
+
+ combine_annotations(
+ adata,
+ annotations=annotations_list,
+ separator=separator,
+ new_annotation_name=new_annotation
+ )
+
+ logger.info(f"After combining annotations: \n{adata}")
+ value_counts = adata.obs[new_annotation].value_counts(dropna=False)
+ logger.info(f"Unique labels in {new_annotation}")
+ logger.info(f"{value_counts}")
+
+ # Create the frequency CSV for download
+ df_counts = (
+ value_counts
+ .rename_axis(new_annotation) # name the index after the annotation
+ .reset_index(name='count') # index -> label column, counts -> 'count'
+ )
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = df_counts
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Combine Annotations analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logger.info("Returning AnnData object for in-memory use")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python combine_annotations_template.py "
+ "<params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned AnnData object")
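The template delegates the merge to `combine_annotations` and then builds a label-frequency table for download. A sketch of the same flow on a plain DataFrame follows, assuming the combined label is the row-wise join of the selected annotation columns with `separator`. That is an assumption about `spac.data_utils.combine_annotations`, not its verified behavior, and `combine_and_count` is a hypothetical name. The frequency-table chain is the one the template uses.

```python
import pandas as pd


def combine_and_count(obs: pd.DataFrame, annotations, separator: str = "_",
                      new_name: str = "combined_annotation"):
    """Sketch: join annotation columns row-wise, then tabulate the labels."""
    obs = obs.copy()
    # Row-wise join of the selected columns, e.g. "T" + "_" + "tumor"
    obs[new_name] = (
        obs[list(annotations)].astype(str).apply(separator.join, axis=1)
    )
    # Same recipe as the template: two columns, label | count
    counts = (
        obs[new_name]
        .value_counts(dropna=False)
        .rename_axis(new_name)
        .reset_index(name="count")
    )
    return obs, counts


obs = pd.DataFrame({"cell_type": ["T", "T", "B"],
                    "region": ["tumor", "tumor", "stroma"]})
combined, counts = combine_and_count(obs, ["cell_type", "region"])
print(counts)
```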
diff --git a/src/spac/templates/combine_dataframes_template.py b/src/spac/templates/combine_dataframes_template.py
new file mode 100644
index 00000000..6d23bbc4
--- /dev/null
+++ b/src/spac/templates/combine_dataframes_template.py
@@ -0,0 +1,217 @@
+"""
+Platform-agnostic Combine DataFrames template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.combine_dataframes_template import run_from_json
+>>> run_from_json("examples/combine_dataframes_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import combine_dfs
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Combine DataFrames analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "First_Dataframe": "path/to/first.csv",
+ "Second_Dataframe": "path/to/second.csv",
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the combined DataFrame
+ to a CSV file. If False, returns the DataFrame directly for in-memory
+ workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The combined DataFrame
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> combined_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load the first dataframe
+ dataset_A = params["First_Dataframe"]
+ if isinstance(dataset_A, pd.DataFrame):
+ dataset_A = dataset_A # Direct DataFrame from previous step
+ elif isinstance(dataset_A, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(dataset_A)
+ try:
+ dataset_A = pd.read_csv(path)
+ logging.info(f"Successfully loaded first DataFrame from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ )
+ else:
+ raise TypeError(
+ f"First_Dataframe must be DataFrame or file path. "
+ f"Got {type(dataset_A)}"
+ )
+
+ # Load the second dataframe
+ dataset_B = params["Second_Dataframe"]
+ if isinstance(dataset_B, pd.DataFrame):
+ dataset_B = dataset_B # Direct DataFrame from previous step
+ elif isinstance(dataset_B, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(dataset_B)
+ try:
+ dataset_B = pd.read_csv(path)
+ logging.info(f"Successfully loaded second DataFrame from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ )
+ else:
+ raise TypeError(
+ f"Second_Dataframe must be DataFrame or file path. "
+ f"Got {type(dataset_B)}"
+ )
+
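+ # Collect both inputs for combine_dfs
+ input_df_lists = [dataset_A, dataset_B]
+
+ # DataFrame.info() writes its report to a buffer (stdout by default) and
+ # returns None, so render each report into a StringIO before logging it.
+ from io import StringIO
+ for title, frame in (
+ ("Information about the first dataset:", dataset_A),
+ ("\n\nInformation about the second dataset:", dataset_B),
+ ):
+ info_buffer = StringIO()
+ frame.info(buf=info_buffer)
+ logging.info(title)
+ logging.info(info_buffer.getvalue())
+
+ combined_dfs = combine_dfs(input_df_lists)
+ info_buffer = StringIO()
+ combined_dfs.info(buf=info_buffer)
+ logging.info("\n\nInformation about the combined dataset:")
+ logging.info(info_buffer.getvalue())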
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = combined_dfs
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Combine DataFrames completed successfully.")
+ return saved_files
+ else:
+ # Return the DataFrame directly for in-memory workflows
+ logging.info("Returning combined DataFrame for in-memory use")
+ return combined_dfs
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python combine_dataframes_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned combined DataFrame")
+ print(f"DataFrame shape: {result.shape}")
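Several templates log `df.info()` directly. `DataFrame.info()` prints its report (to `sys.stdout` by default) and returns `None`, so `logging.info(df.info())` records the string `None`. A minimal sketch of routing the report through the `buf` parameter instead follows. `log_frame_info` is a hypothetical helper, and `pd.concat` only stands in for `spac.data_utils.combine_dfs`, whose exact behavior is not shown here.

```python
import io
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def log_frame_info(df: pd.DataFrame, title: str) -> str:
    """Render df.info() into a string, log it under `title`, and return it."""
    buffer = io.StringIO()
    df.info(buf=buffer)  # report goes to the buffer instead of stdout
    report = buffer.getvalue()
    logger.info("%s\n%s", title, report)
    return report


logging.basicConfig(level=logging.INFO)
first = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
second = pd.DataFrame({"a": [3], "b": ["z"]})
# pd.concat is a stand-in for combine_dfs in this sketch
combined_report = log_frame_info(
    pd.concat([first, second], ignore_index=True),
    "Information about the combined dataset:",
)
```

Returning the report as well as logging it makes the helper easy to assert on in tests.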
diff --git a/src/spac/templates/downsample_cells_template.py b/src/spac/templates/downsample_cells_template.py
new file mode 100644
index 00000000..761135e0
--- /dev/null
+++ b/src/spac/templates/downsample_cells_template.py
@@ -0,0 +1,208 @@
+"""
+Platform-agnostic Downsample Cells template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.downsample_cells_template import run_from_json
+>>> run_from_json("examples/downsample_cells_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Tuple
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import downsample_cells
+from spac.utils import check_column_name
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Downsample Cells analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Annotations_List": ["cell_type", "tissue"],
+ "Number_of_Samples": 1000,
+ "Stratify_Option": true,
+ "Random_Selection": true,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the downsampled DataFrame
+ to a CSV file. If False, returns the DataFrame directly for in-memory
+ workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The downsampled DataFrame
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> downsampled_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # DataFrames typically use file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data: either a DataFrame or a path to a CSV file
+ upstream_dataset = params["Upstream_Dataset"]
+ if isinstance(upstream_dataset, pd.DataFrame):
+ input_dataset = upstream_dataset # Direct DF from previous step
+ elif isinstance(upstream_dataset, (str, Path)):
+ try:
+ input_dataset = pd.read_csv(upstream_dataset)
+ except Exception as e:
+ raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}")
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream_dataset)}"
+ )
+
+ # Extract parameters
+ annotations = params["Annotations_List"]
+ n_samples = params["Number_of_Samples"]
+ stratify = params["Stratify_Option"]
+ rand = params["Random_Selection"]
+ combined_col_name = params.get(
+ "New_Combined_Annotation_Name", "_combined_"
+ )
+ min_threshold = params.get("Minimum_Threshold", 5)
+
+ check_column_name(
+ combined_col_name, "New Combined Annotation Name"
+ )
+
+ down_sampled_dataset = downsample_cells(
+ input_data=input_dataset,
+ annotations=annotations,
+ n_samples=n_samples,
+ stratify=stratify,
+ rand=rand,
+ combined_col_name=combined_col_name,
+ min_threshold=min_threshold
+ )
+
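+ logging.info("Downsampled! Processed dataset info:")
+ # DataFrame.info() returns None; capture its report in a buffer instead
+ from io import StringIO
+ info_buffer = StringIO()
+ down_sampled_dataset.info(buf=info_buffer)
+ logging.info(info_buffer.getvalue())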
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = down_sampled_dataset
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Downsample Cells analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the dataframe directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return down_sampled_dataset
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python downsample_cells_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
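The Downsample Cells template forwards `Stratify_Option`, `Random_Selection`, and `Minimum_Threshold` to `downsample_cells`, whose exact semantics live in `spac.data_utils`. As a sketch only, the version below assumes three things: stratified mode samples the same fraction from every annotation group so proportions are preserved, groups smaller than the threshold are dropped first, and `rand=False` means taking the first rows of each group. `stratified_downsample` is a hypothetical name.

```python
import pandas as pd


def stratified_downsample(df: pd.DataFrame, annotations, n_samples: int,
                          min_threshold: int = 5, rand: bool = True,
                          random_state=None) -> pd.DataFrame:
    """Hypothetical sketch of proportional downsampling (not spac's code)."""
    # One stratum per unique combination of the annotation columns
    strata = df[list(annotations)].astype(str).apply("_".join, axis=1)
    sizes = strata.value_counts()
    # Drop strata with fewer rows than the minimum threshold
    kept = strata.isin(sizes[sizes >= min_threshold].index)
    df, strata = df[kept], strata[kept]
    if df.empty:
        return df
    # A shared sampling fraction keeps each stratum's share of the total
    frac = min(1.0, n_samples / len(df))
    parts = []
    for _, group in df.groupby(strata, sort=False):
        k = round(len(group) * frac)
        parts.append(group.sample(n=k, random_state=random_state)
                     if rand else group.head(k))
    return pd.concat(parts)


cells = pd.DataFrame({"cell_type": ["T"] * 60 + ["B"] * 30 + ["NK"] * 3})
sampled = stratified_downsample(cells, ["cell_type"], n_samples=45, rand=False)
print(sampled["cell_type"].value_counts())
```

Because each group's count is rounded separately, the total can differ slightly from `n_samples` when group sizes do not divide evenly.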
diff --git a/src/spac/templates/hierarchical_heatmap_template.py b/src/spac/templates/hierarchical_heatmap_template.py
new file mode 100644
index 00000000..92bec6cc
--- /dev/null
+++ b/src/spac/templates/hierarchical_heatmap_template.py
@@ -0,0 +1,215 @@
+"""
+Platform-agnostic Hierarchical Heatmap template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.hierarchical_heatmap_template import run_from_json
+>>> run_from_json("examples/hierarchical_heatmap_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import hierarchical_heatmap
+from spac.utils import check_feature
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_results_flag: bool = True,
+ show_plot: bool = True,
+ output_dir: Union[str, Path] = None
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Hierarchical Heatmap analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary
+ save_results_flag : bool, optional
+ Whether to save results to file. If False, returns the mean intensity
+ dataframe directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str or Path, optional
+ Directory for outputs. If None, uses params['Output_Directory'] or '.'
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_results_flag=True: Dictionary of saved file paths
+ If save_results_flag=False: The mean intensity dataframe
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params["Annotation"]
+ layer_to_plot = params.get("Table_to_Visualize", "Original")
+ features = params.get("Feature_s_", ["All"])
+ standard_scale = params.get("Standard_Scale_", "None")
+ z_score = params.get("Z_Score", "None")
+ cluster_feature = params.get("Feature_Dendrogram", True)
+ cluster_annotations = params.get("Annotation_Dendrogram", True)
+ Figure_Title = params.get("Figure_Title", "Hierarchical Heatmap")
+ fig_width = params.get("Figure_Width", 8)
+ fig_height = params.get("Figure_Height", 8)
+ fig_dpi = params.get("Figure_DPI", 300)
+ font_size = params.get("Font_Size", 10)
+ matrix_ratio = params.get("Matrix_Plot_Ratio", 0.8)
+ swap_axes = params.get("Swap_Axes", False)
+ rotate_label = params.get("Rotate_Label_", False)
+ r_h_axis_dendrogram = params.get(
+ "Horizontal_Dendrogram_Display_Ratio", 0.2
+ )
+ r_v_axis_dendrogram = params.get(
+ "Vertical_Dendrogram_Display_Ratio", 0.2
+ )
+ v_min = params.get("Value_Min", "None")
+ v_max = params.get("Value_Max", "None")
+ color_map = params.get("Color_Map", 'seismic')
+
+ # Use check_feature to validate features
+ if len(features) == 1 and features[0] == "All":
+ features = None
+ else:
+ check_feature(adata, features)
+
+ if not swap_axes:
+ features = None
+
+ # Use text_to_value for parameter conversions
+ standard_scale = text_to_value(
+ standard_scale, to_int=True, param_name='Standard Scale'
+ )
+ layer_to_plot = text_to_value(
+ layer_to_plot, default_none_text="Original"
+ )
+ z_score = text_to_value(z_score, param_name='Z Score')
+ vmin = text_to_value(
+ v_min, default_none_text="none", to_float=True,
+ param_name="Value Min"
+ )
+ vmax = text_to_value(
+ v_max, default_none_text="none", to_float=True,
+ param_name="Value Max"
+ )
+
+ fig, ax = plt.subplots()
+ plt.rcParams.update({'font.size': font_size})
+ fig.set_size_inches(fig_width, fig_height)
+ fig.set_dpi(fig_dpi)
+
+ mean_intensity, clustergrid, dendrogram_data = hierarchical_heatmap(
+ adata,
+ annotation=annotation,
+ features=features,
+ layer=layer_to_plot,
+ cluster_feature=cluster_feature,
+ cluster_annotations=cluster_annotations,
+ standard_scale=standard_scale,
+ z_score=z_score,
+ swap_axes=swap_axes,
+ rotate_label=rotate_label,
+ figsize=(fig_width, fig_height),
+ dendrogram_ratio=(r_h_axis_dendrogram, r_v_axis_dendrogram),
+ vmin=vmin,
+ vmax=vmax,
+ cmap=color_map
+ )
+ print("Printing mean intensity data.")
+ print(mean_intensity)
+ print()
+ print("Printing dendrogram data.")
+ for data in dendrogram_data:
+ print(data)
+ print(dendrogram_data[data])
+
+ # Copy the annotation labels (the mean_intensity index) into a column
+ row_clusters = adata.obs[annotation].astype(str).unique()
+ mean_intensity[annotation] = mean_intensity.index.astype(str)
+
+ # Reorder columns to move the annotation column to the first position
+ cols = mean_intensity.columns.tolist()
+ cols = [annotation] + [col for col in cols if col != annotation]
+ mean_intensity = mean_intensity[cols]
+
+ # Set the title and matrix size, then close the unused placeholder figure
+ clustergrid.ax_heatmap.set_title(Figure_Title)
+ clustergrid.height = fig_height * matrix_ratio
+ clustergrid.width = fig_width * matrix_ratio
+ plt.close(fig)
+
+ if show_plot:
+ plt.show()
+
+ # Handle results based on save_results_flag
+ if save_results_flag:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Package figure in a dictionary for directory saving
+ # This ensures it's saved in a directory per standardized schema
+ if "figures" in params.get("outputs", {}):
+ results_dict["figures"] = {"hierarchical_heatmap": clustergrid.fig}
+
+ # Check for dataframe output
+ if "dataframe" in params.get("outputs", {}):
+ results_dict["dataframe"] = mean_intensity
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print("Hierarchical Heatmap completed successfully.")
+ return saved_files
+ else:
+ # Return the dataframe directly for in-memory workflows
+ print("Returning mean intensity dataframe (not saving to file)")
+ return mean_intensity
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python hierarchical_heatmap_template.py params.json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(sys.argv[1], output_dir=output_dir)
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ if isinstance(filepath, list):
+ print(f" {filename}: {len(filepath)} files in directory")
+ else:
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned mean intensity dataframe")
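Before saving, the hierarchical heatmap template turns the mean-intensity index (one row per annotation label) into a string column and moves it to the front. The sketch below isolates that step; `annotation_first` is a hypothetical helper, and unlike the template it works on a copy so the caller's frame is left unchanged.

```python
import pandas as pd


def annotation_first(
    mean_intensity: pd.DataFrame, annotation: str
) -> pd.DataFrame:
    """Copy the index (annotation labels) into a column placed first."""
    out = mean_intensity.copy()
    # The index holds one row per annotation label; store it as strings.
    out[annotation] = out.index.astype(str)
    # Annotation column first, remaining columns in their original order.
    cols = [annotation] + [c for c in out.columns if c != annotation]
    return out[cols]


if __name__ == "__main__":
    means = pd.DataFrame({"CD3": [0.5, 0.1], "CD8": [0.2, 0.7]}, index=[0, 1])
    print(annotation_first(means, "phenograph"))
```

Writing the labels as a regular column means they survive a CSV export even if the index is dropped.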
diff --git a/src/spac/templates/histogram_template.py b/src/spac/templates/histogram_template.py
new file mode 100644
index 00000000..0a3924d4
--- /dev/null
+++ b/src/spac/templates/histogram_template.py
@@ -0,0 +1,349 @@
+"""
+Platform-agnostic Histogram template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.histogram_template import run_from_json
+>>> run_from_json("examples/histogram_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Optional, Tuple, List
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import histogram
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = False,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]:
+ """
+ Execute Histogram analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Plot_By": "Annotation",
+ "Annotation": "cell_type",
+ ...
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the figure and
+ dataframe directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is False.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: Tuple of (figure, dataframe)
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ feature = text_to_value(params.get("Feature", "None"))
+ annotation = text_to_value(params.get("Annotation", "None"))
+ layer = params.get("Table_", "Original")
+ group_by = params.get("Group_by", "None")
+ together = params.get("Together", True)
+ fig_width = params.get("Figure_Width", 8)
+ fig_height = params.get("Figure_Height", 6)
+ font_size = params.get("Font_Size", 12)
+ fig_dpi = params.get("Figure_DPI", 300)
+ legend_location = params.get("Legend_Location", "best")
+ legend_in_figure = params.get("Legend_in_Figure", False)
+ take_X_log = params.get("Take_X_Log", False)
+ take_Y_log = params.get("Take_Y_log", False)
+ multiple = params.get("Multiple", "dodge")
+ shrink = params.get("Shrink_Number", 1)
+ bins = params.get("Bins", "auto")
+ alpha = params.get("Bin_Transparency", 0.75)
+ stat = params.get("Stat", "count")
+ x_rotate = params.get("X_Axis_Label_Rotation", 0)
+ histplot_by = params.get("Plot_By", "Annotation")
+
+ # Close all existing figures to prevent extra plots
+ plt.close('all')
+ existing_fig_nums = plt.get_fignums()
+
+ plt.rcParams.update({'font.size': font_size})
+
+ # Adjust feature and annotation based on histplot_by
+ if histplot_by == "Annotation":
+ feature = None
+ else:
+ annotation = None
+
+ # If both feature and annotation are None, set default
+ if feature is None and annotation is None:
+ if histplot_by == "Annotation":
+ if adata.obs.columns.size > 0:
+ annotation = adata.obs.columns[0]
+ logger.info(
+ f'No annotation specified. Using the first annotation '
+ f'"{annotation}" as default.'
+ )
+ else:
+ raise ValueError(
+ 'No annotations available in adata.obs to plot.'
+ )
+ else:
+ if adata.var_names.size > 0:
+ feature = adata.var_names[0]
+ logger.info(
+ f'No feature specified. Using the first feature '
+ f'"{feature}" as default.'
+ )
+ else:
+ raise ValueError(
+ 'No features available in adata.var_names to plot.'
+ )
+
+ # Validate and set bins
+ if feature is not None:
+ bins = text_to_value(
+ bins,
+ default_none_text="auto",
+ to_int=True,
+ param_name="bins"
+ )
+ if bins is None:
+ num_rows = adata.X.shape[0]
+ bins = max(int(2 * (num_rows ** (1/3))), 1)
+ elif bins <= 0:
+ raise ValueError(
+ f'Bins should be a positive integer. Received "{bins}"'
+ )
+ elif annotation is not None:
+ if take_X_log:
+ take_X_log = False
+ logger.warning(
+ "Take X log should only apply to feature. "
+ "Setting Take X Log to False."
+ )
+ if bins != 'auto':
+ bins = 'auto'
+ logger.warning(
+ "Bin number should only apply to feature. "
+ "Setting bin number calculation to auto."
+ )
+
+ if (x_rotate < 0) or (x_rotate > 360):
+ raise ValueError(
+ f'The X label rotation should fall within 0 to 360 degrees. '
+ f'Received "{x_rotate}".'
+ )
+
+ # Choose the x-axis variable used in the plot titles below
+ if histplot_by == "Annotation":
+ x_var = annotation
+ else:
+ x_var = feature
+
+ result = histogram(
+ adata=adata,
+ feature=feature,
+ annotation=annotation,
+ layer=text_to_value(layer, "Original"),
+ group_by=text_to_value(group_by),
+ together=together,
+ ax=None,
+ x_log_scale=take_X_log,
+ y_log_scale=take_Y_log,
+ multiple=multiple,
+ shrink=shrink,
+ bins=bins,
+ alpha=alpha,
+ stat=stat
+ )
+
+ fig = result["fig"]
+ axs = result["axs"]
+ df_counts = result["df"]
+
+ # Set figure size and dpi
+ fig.set_size_inches(fig_width, fig_height)
+ fig.set_dpi(fig_dpi)
+
+ # Ensure axes is a list
+ if isinstance(axs, list):
+ axes = axs
+ else:
+ axes = [axs]
+
+ # Close any extra figures created during the histogram call
+ fig_nums_after = plt.get_fignums()
+ new_fig_nums = [
+ num for num in fig_nums_after if num not in existing_fig_nums
+ ]
+ histogram_fig_num = fig.number
+
+ for num in new_fig_nums:
+ if num != histogram_fig_num:
+ plt.close(plt.figure(num))
+ logger.debug(f"Closed extra figure {num}")
+
+ # Process each axis
+ for ax in axes:
+ if feature:
+ logger.info(f'Plotting Feature: "{feature}"')
+ if ax.get_legend() is not None:
+ if legend_in_figure:
+ sns.move_legend(ax, legend_location)
+ else:
+ sns.move_legend(
+ ax, legend_location, bbox_to_anchor=(1, 1)
+ )
+
+ # Rotate x labels
+ ax.tick_params(axis='x', rotation=x_rotate)
+
+ # Set titles based on group_by
+ if text_to_value(group_by):
+ if together:
+ for ax in axes:
+ ax.set_title(
+ f'Histogram of "{x_var}" grouped by "{group_by}"'
+ )
+ else:
+ # compute unique groups directly from adata.obs.
+ unique_groups = adata.obs[
+ text_to_value(group_by)
+ ].dropna().unique()
+ if len(axes) != len(unique_groups):
+ logger.warning(
+ "Number of axes does not match number of "
+ "groups. Titles may not correspond correctly."
+ )
+ for ax, grp in zip(axes, unique_groups):
+ ax.set_title(
+ f'Histogram of "{x_var}" for group: "{grp}"'
+ )
+ else:
+ for ax in axes:
+ ax.set_title(f'Count plot of "{x_var}"')
+
+ plt.tight_layout()
+
+ logger.info("Displaying top 10 rows of histogram dataframe:")
+ print(df_counts.head(10))
+
+ if show_plot:
+ plt.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = df_counts
+
+ # Check for figures output
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = {"histogram": fig}
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ plt.close('all')
+
+ logger.info("Histogram analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the figure and dataframe directly for in-memory workflows
+ logger.info("Returning figure and dataframe for in-memory use")
+ return fig, df_counts
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python histogram_template.py params.json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned figure and dataframe")
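When a feature is plotted and `Bins` resolves to "auto", the histogram template falls back to `max(int(2 * n ** (1/3)), 1)` with `n = adata.X.shape[0]`. That is close to the Rice rule (which rounds 2n^(1/3) up), but truncated, so float error can land one below the exact value. The sketch below restates that resolution as a standalone function; `resolve_bins` is hypothetical, and accepting numeric text such as "15" is an assumption standing in for spac's `text_to_value`.

```python
from typing import Union


def resolve_bins(bins: Union[str, int], num_rows: int) -> int:
    """Turn a 'Bins' value into a positive integer bin count.

    "auto" falls back to int(2 * n ** (1/3)), floored at 1, mirroring the
    fallback the histogram template computes from the row count.
    """
    if bins == "auto":
        # Cube-root rule: truncated, and never fewer than one bin.
        return max(int(2 * (num_rows ** (1 / 3))), 1)
    count = int(bins)  # assumption: numeric text such as "15" is allowed
    if count <= 0:
        raise ValueError(
            f'Bins should be a positive integer. Received "{count}"'
        )
    return count


if __name__ == "__main__":
    for n in (0, 1, 1000, 50_000):
        print(n, resolve_bins("auto", n))
```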
diff --git a/src/spac/templates/interactive_spatial_plot_template.py b/src/spac/templates/interactive_spatial_plot_template.py
new file mode 100644
index 00000000..e63e0df2
--- /dev/null
+++ b/src/spac/templates/interactive_spatial_plot_template.py
@@ -0,0 +1,241 @@
+"""
+Platform-agnostic Interactive Spatial Plot template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where HTML files are saved as a directory.
+
+Usage
+-----
+>>> from spac.templates.interactive_spatial_plot_template import run_from_json
+>>> run_from_json("examples/interactive_spatial_plot_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import pandas as pd
+import plotly.io as pio
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+# Import SPAC functions from NIDAP template
+from spac.visualization import interactive_spatial_plot
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], None]:
+ """
+ Execute Interactive Spatial Plot analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Color_By": "Annotation",
+ "Annotation_s_to_Highlight": ["renamed_phenotypes"],
+ "outputs": {
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns None as plots are
+ shown interactively. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or None
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"html": ["path/to/html_dir/plot1.html", ...]}
+ If save_to_disk=False: None (plots are shown interactively)
+
+ Notes
+ -----
+ Output Structure:
+ - HTML files are saved in a directory (standardized for HTML outputs)
+ - When save_to_disk=False, plots are shown interactively
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["html"]) # List of HTML file paths
+ >>> # ['./html_dir/plot_1.html', './html_dir/plot_2.html']
+
+ >>> # Display plots interactively without saving
+ >>> run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # HTML uses directory type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ color_by = params["Color_By"]
+ annotations = params.get("Annotation_s_to_Highlight", [""])
+ feature = params.get("Feature_to_Highlight", "None")
+ layer = params.get("Table", "Original")
+
+ dot_size = params.get("Dot_Size", 1.5)
+ dot_transparency = params.get("Dot_Transparency", 0.75)
+ color_map = params.get("Feature_Color_Scale", "balance")
+ desired_width_in = params.get("Figure_Width", 6)
+ desired_height_in = params.get("Figure_Height", 4)
+ dpi = params.get("Figure_DPI", 200)
+ Font_size = params.get("Font_Size", 12)
+ stratify_by = text_to_value(
+ params.get("Stratify_By", "None"),
+ param_name="Stratify By"
+ )
+
+ defined_color_map = text_to_value(
+ params.get("Define_Label_Color_Mapping", "None"),
+ param_name="Define Label Color Mapping"
+ )
+
+ cmin = params.get("Lower_Colorbar_Bound", 999)
+ cmax = params.get("Upper_Colorbar_Bound", -999)
+
+ flip_y = params.get("Flip_Vertical_Axis", False)
+
+ # Process parameters
+ feature = text_to_value(feature)
+ if color_by == "Annotation":
+ feature = None
+ if len(annotations) == 0:
+ raise ValueError(
+ 'Please set at least one value in the '
+ '"Annotation(s) to Highlight" parameter'
+ )
+ else:
+ annotations = None
+ if feature is None:
+ raise ValueError('Please set the "Feature to Highlight" parameter.')
+
+ layer = text_to_value(layer, "Original")
+
+ # Execute the interactive spatial plot
+ result_list = interactive_spatial_plot(
+ adata=adata,
+ annotations=annotations,
+ feature=feature,
+ layer=layer,
+ dot_size=dot_size,
+ dot_transparency=dot_transparency,
+ feature_colorscale=color_map,
+ figure_width=desired_width_in,
+ figure_height=desired_height_in,
+ figure_dpi=dpi,
+ font_size=Font_size,
+ stratify_by=stratify_by,
+ defined_color_map=defined_color_map,
+ reverse_y_axis=flip_y,
+ cmin=cmin,
+ cmax=cmax
+ )
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare HTML outputs as a dictionary for directory saving
+ html_dict = {}
+
+ for result in result_list:
+ image_name = result['image_name']
+ image_object = result['image_object']
+
+ # Show the plot (as in NIDAP template)
+ image_object.show()
+
+ # Convert to HTML
+ html_content = pio.to_html(image_object, full_html=True)
+
+ # Add to dictionary with appropriate name
+ html_dict[image_name] = html_content
+
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+ if "html" in params["outputs"]:
+ results_dict["html"] = html_dict
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(
+ f"Interactive Spatial Plot completed → "
+ f"{saved_files.get('html', [])}"
+ )
+ return saved_files
+ else:
+ # Just show the plots without saving
+ for result in result_list:
+ result['image_object'].show()
+
+ print("Displayed interactive plots without saving")
+ return None
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python interactive_spatial_plot_template.py params.json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nDisplayed interactive plots")
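The interactive spatial plot template passes `save_results` a mapping of `image_name` to an HTML string under the "html" key, and the output schema declares that key as a directory. How `save_results` names the files is not shown in this diff, so the sketch below only approximates the directory layout; `save_html_dir` and its naming rules (underscores for spaces and slashes, ".html" appended when missing) are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def save_html_dir(html_by_name: Dict[str, str], out_dir: Path) -> List[str]:
    """Write one HTML file per entry into out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[str] = []
    for name, html in html_by_name.items():
        # Assumed naming: spaces and slashes become underscores and the
        # ".html" suffix is added only when the name lacks it.
        stem = name.replace("/", "_").replace(" ", "_")
        if not stem.endswith(".html"):
            stem += ".html"
        path = out_dir / stem
        path.write_text(html, encoding="utf-8")
        paths.append(str(path))
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = {"html": save_html_dir({"plot 1": "page a"}, Path(tmp) / "html_dir")}
        print(saved)
```

Returning a list under "html" matches the `{"html": [...]}` shape the template's docstring describes.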
diff --git a/src/spac/templates/load_csv_files_template.py b/src/spac/templates/load_csv_files_template.py
new file mode 100644
index 00000000..0bb7cf87
--- /dev/null
+++ b/src/spac/templates/load_csv_files_template.py
@@ -0,0 +1,94 @@
+"""
+Platform-agnostic Load CSV Files template converted from NIDAP.
+Handles both Galaxy (list of file paths) and NIDAP (directory path) inputs.
+
+Usage
+-----
+>>> from spac.templates.load_csv_files_template import run_from_json
+>>> run_from_json("examples/load_csv_params.json")
+"""
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+ load_csv_files,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Load CSV Files analysis with parameters from JSON.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file or parameter dictionary
+ save_to_disk : bool, optional
+ Whether to save results to disk. Default is True.
+ output_dir : str, optional
+ Base directory for outputs.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: The processed DataFrame
+ """
+ params = parse_params(json_path)
+
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ if "outputs" not in params:
+ params["outputs"] = {"dataframe": {"type": "file", "name": "dataframe.csv"}}
+
+ # Load configuration
+ files_config = pd.read_csv(params["CSV_Files_Configuration"])
+
+ # Load and combine CSV files using centralized utility
+ final_df = load_csv_files(
+ csv_input=params["CSV_Files"],
+ files_config=files_config,
+ string_columns=params.get("String_Columns", [])
+ )
+
+ logger.info(f"Load CSV Files completed: {final_df.shape}")
+
+ # Save or return results
+ if save_to_disk:
+ saved_files = save_results(
+ results={"dataframe": final_df},
+ params=params,
+ output_base_dir=output_dir
+ )
+ return saved_files
+ else:
+ return final_df
+
+
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print("Usage: python load_csv_files_template.py params.json [output_dir]", file=sys.stderr)
+ sys.exit(1)
+
+ logging.basicConfig(level=logging.INFO)
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+ result = run_from_json(sys.argv[1], output_dir=output_dir)
+
+ if isinstance(result, dict):
+ for key, path in result.items():
+ print(f"{key}: {path}")
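The Load CSV Files template states that it accepts either a Galaxy-style list of file paths or a NIDAP-style directory path. The actual handling lives in `load_csv_files` in `template_utils`, which is not part of this diff. The sketch below is a hypothetical illustration of that normalization plus reading with selected columns kept as strings. `collect_csv_paths` and `combine_csvs` are invented names, and the `CSV_Files_Configuration` table is not modeled.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union

import pandas as pd

PathLike = Union[str, Path]


def collect_csv_paths(csv_input: Union[PathLike, Sequence[PathLike]]) -> List[Path]:
    """Return CSV paths from a directory or from an explicit list of files."""
    if isinstance(csv_input, (str, Path)):
        root = Path(csv_input)
        if root.is_dir():
            # Directory input: every *.csv file, sorted for a stable order.
            return sorted(root.glob("*.csv"))
        return [root]
    # List input: keep the caller's order and ignore file extensions.
    return [Path(p) for p in csv_input]


def combine_csvs(paths: List[Path], string_columns: Sequence[str] = ()) -> pd.DataFrame:
    """Read each CSV, keeping the listed columns as strings, then stack rows."""
    if not paths:
        raise ValueError("No CSV files found to load.")
    # The dtype mapping applies only to the named columns; others are inferred.
    frames = [
        pd.read_csv(path, dtype={col: str for col in string_columns})
        for path in paths
    ]
    return pd.concat(frames, ignore_index=True)
```

Forcing ID-like columns to `str` preserves leading zeros such as "007", which integer inference would otherwise drop.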
diff --git a/src/spac/templates/manual_phenotyping_template.py b/src/spac/templates/manual_phenotyping_template.py
new file mode 100644
index 00000000..85f11024
--- /dev/null
+++ b/src/spac/templates/manual_phenotyping_template.py
@@ -0,0 +1,236 @@
+#!/usr/bin/env python3
+"""
+Platform-agnostic Manual Phenotyping template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.manual_phenotyping_template import run_from_json
+>>> run_from_json("examples/manual_phenotyping_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.phenotyping import assign_manual_phenotypes
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Manual Phenotyping analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Phenotypes_Code": "path/to/phenotypes.csv",
+ "Classification_Column_Prefix": "",
+ "Classification_Column_Suffix": "",
+ "Allow_Multiple_Phenotypes": true,
+ "Manual_Annotation_Name": "manual_phenotype",
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the DataFrame with
+ phenotype annotations to a CSV file. If False, returns the DataFrame
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The processed DataFrame with phenotype annotations
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> phenotyped_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data - DataFrame or CSV file
+ upstream = params['Upstream_Dataset']
+ if isinstance(upstream, pd.DataFrame):
+ dataframe = upstream # Direct DataFrame from previous step
+ elif isinstance(upstream, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(upstream)
+ try:
+ dataframe = pd.read_csv(path)
+ logging.info(f"Successfully loaded CSV data from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ ) from e
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream)}"
+ )
+
+ # Load phenotypes code - DataFrame or CSV file
+ phenotypes_input = params['Phenotypes_Code']
+ if isinstance(phenotypes_input, pd.DataFrame):
+ phenotypes = phenotypes_input
+ elif isinstance(phenotypes_input, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(phenotypes_input)
+ try:
+ phenotypes = pd.read_csv(path)
+ logging.info(f"Successfully loaded phenotypes from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ ) from e
+ else:
+ raise TypeError(
+ f"Phenotypes_Code must be DataFrame or file path. "
+ f"Got {type(phenotypes_input)}"
+ )
+
+ # Extract parameters
+ prefix = params.get('Classification_Column_Prefix', '')
+ suffix = params.get('Classification_Column_Suffix', '')
+ multiple = params.get('Allow_Multiple_Phenotypes', True)
+ manual_annotation = params.get('Manual_Annotation_Name', 'manual_phenotype')
+
+ logging.info(f"Phenotypes configuration:\n{phenotypes}")
+
+ # returned_dic is unused; the assignment is kept to mirror the NIDAP logic
+ returned_dic = assign_manual_phenotypes(
+ dataframe,
+ phenotypes,
+ prefix=prefix,
+ suffix=suffix,
+ annotation=manual_annotation,
+ multiple=multiple
+ )
+
+ # assign_manual_phenotypes modifies dataframe in place
+
+ # Print summary statistics
+ phenotype_counts = dataframe[manual_annotation].value_counts()
+ logging.info(f"\nPhenotype distribution:\n{phenotype_counts}")
+
+ logging.info("\nManual Phenotyping completed successfully.")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = dataframe
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Manual Phenotyping analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the DataFrame directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return dataframe
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python manual_phenotyping_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
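The `Phenotypes_Code` branch above reads its input as CSV without checking the file extension, because Galaxy passes `.dat` files that contain CSV data. The sketch below reproduces that rule with the standard-library `csv` module instead of pandas. `load_phenotype_rows` is a hypothetical helper, not part of SPAC, and in-memory input is modelled as a list of row dicts rather than a DataFrame. It chains the underlying error with `from exc` so the original cause stays in the traceback.

```python
import csv
from pathlib import Path
from typing import Dict, List, Union

Rows = List[Dict[str, str]]


def load_phenotype_rows(source: Union[Rows, str, Path]) -> Rows:
    """Return phenotype rows from in-memory rows or a CSV-formatted file.

    Hypothetical helper: like the template, it reads any path as CSV
    regardless of its extension (e.g. Galaxy's .dat files).
    """
    if isinstance(source, list):
        # Already-parsed rows are passed through unchanged
        return source
    if isinstance(source, (str, Path)):
        path = Path(source)
        try:
            with path.open(newline="") as handle:
                # Materialize the rows while the file is still open
                return list(csv.DictReader(handle))
        except (OSError, csv.Error) as exc:
            raise ValueError(
                f"Failed to read CSV data from '{path}'. "
                f"This tool expects CSV/tabular format. Error: {exc}"
            ) from exc
    raise TypeError(
        f"Phenotypes_Code must be rows or a file path. Got {type(source)}"
    )
```

A `.dat` file is therefore treated exactly like a `.csv` file; only unreadable or non-tabular content raises `ValueError`.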
diff --git a/src/spac/templates/nearest_neighbor_calculation_template.py b/src/spac/templates/nearest_neighbor_calculation_template.py
new file mode 100644
index 00000000..45dabb71
--- /dev/null
+++ b/src/spac/templates/nearest_neighbor_calculation_template.py
@@ -0,0 +1,207 @@
+"""
+Platform-agnostic Nearest Neighbor Calculation template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.nearest_neighbor_calculation_template import (
+... run_from_json
+... )
+>>> run_from_json("examples/nearest_neighbor_calculation_params.json")
+"""
+import logging
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.spatial_analysis import calculate_nearest_neighbor
+from spac.templates.template_utils import (
+ load_input,
+ parse_params,
+ save_results,
+ text_to_value,
+)
+
+# Set up logging
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: Union[str, Path] = None
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Nearest Neighbor Calculation analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/input.pickle",
+ "Annotation": "cell_type",
+ "ImageID": "None",
+ "Nearest_Neighbor_Associated_Table": "spatial_distance",
+ "Verbose": true,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str or Path, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+ >>> # './output.pickle'
+
+ >>> # Get results in memory for further processing
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+ >>> # Can now work with adata object directly
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params["Annotation"]
+ spatial_associated_table = "spatial"
+ imageid = params.get("ImageID", "None")
+ label = params.get(
+ "Nearest_Neighbor_Associated_Table", "spatial_distance"
+ )
+ verbose = params.get("Verbose", True)
+
+ # Convert any string "None" to actual None for Python
+ imageid = text_to_value(imageid, default_none_text="None")
+
+ logger.info(
+ "Running `calculate_nearest_neighbor` with the following parameters:"
+ )
+ logger.info(f" annotation: {annotation}")
+ logger.info(f" spatial_associated_table: {spatial_associated_table}")
+ logger.info(f" imageid: {imageid}")
+ logger.info(f" label: {label}")
+ logger.info(f" verbose: {verbose}")
+
+ # Perform the nearest neighbor calculation
+ calculate_nearest_neighbor(
+ adata=adata,
+ annotation=annotation,
+ spatial_associated_table=spatial_associated_table,
+ imageid=imageid,
+ label=label,
+ verbose=verbose
+ )
+
+ logger.info("Nearest neighbor calculation complete.")
+ logger.info(f"adata.obsm keys: {list(adata.obsm.keys())}")
+ if label in adata.obsm:
+ logger.info(
+ f"Preview of adata.obsm['{label}']:\n{adata.obsm[label].head()}"
+ )
+
+ logger.info(f"{adata}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info(
+ f"Nearest Neighbor Calculation completed → "
+ f"{saved_files['analysis']}"
+ )
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logger.info("Returning AnnData object (not saving to file)")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python nearest_neighbor_calculation_template.py "
+ "<params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, path in result.items():
+ print(f" {key}: {path}")
+ else:
+ print("\nReturned AnnData object for in-memory use")
+ print(f"AnnData: {result}")
+ print(f"Shape: {result.shape}")
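After `calculate_nearest_neighbor` runs, the template previews `adata.obsm[label]` with `.head()`. As a rough illustration of the kind of quantity involved, the brute-force sketch below computes, for each cell, the Euclidean distance to the closest other cell of each annotation label. This is an assumption about what the table holds, not SPAC's implementation: it ignores the `imageid` grouping, and `nearest_distance_by_label` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_distance_by_label(
    coords: Sequence[Point], labels: Sequence[str]
) -> List[Dict[str, float]]:
    """For each cell, the distance to the nearest *other* cell of each label.

    Brute force, O(n^2); illustrative only. A label with no other member
    is reported as math.inf.
    """
    unique_labels = sorted(set(labels))
    results = []
    for i, origin in enumerate(coords):
        row = {lab: math.inf for lab in unique_labels}
        for j, (target, lab) in enumerate(zip(coords, labels)):
            if i == j:
                continue  # a cell is not its own neighbor
            row[lab] = min(row[lab], math.dist(origin, target))
        results.append(row)
    return results
```

For real data a spatial index (for example a k-d tree) would replace the quadratic loop; the sketch keeps the loop so the definition stays visible.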
diff --git a/src/spac/templates/neighborhood_profile_template.py b/src/spac/templates/neighborhood_profile_template.py
new file mode 100644
index 00000000..fabe5e21
--- /dev/null
+++ b/src/spac/templates/neighborhood_profile_template.py
@@ -0,0 +1,272 @@
+"""
+Platform-agnostic Neighborhood Profile template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.neighborhood_profile_template import run_from_json
+>>> run_from_json("examples/neighborhood_profile_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional, Tuple
+import pandas as pd
+import numpy as np
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.spatial_analysis import neighborhood_profile
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: Union[str, Path] = None
+) -> Union[Dict[str, str], Dict[Tuple[str, str], pd.DataFrame]]:
+ """
+ Execute Neighborhood Profile analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary
+ save_to_disk : bool, optional
+ Whether to save results to file. If False, returns the dataframes
+ directly for in-memory workflows. Default is True.
+ output_dir : str or Path, optional
+ Output directory for results. If None, uses params['Output_Directory'] or '.'
+
+ Returns
+ -------
+ dict
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: Dictionary of (anchor, neighbor) tuples
+ to DataFrames
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Neighborhood Profile dataframes use directory type per special case in template_utils
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "directory", "name": "dataframe_dir"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ cell_types_annotation = params["Annotation_of_interest"]
+ bins = params["Bins"]
+ slide_names = params.get("Stratify_By", "None")
+ normalization = None
+ output_table = "neighborhood_profile"
+
+ anchor_neighbor_list = params["Anchor_Neighbor_List"]
+ anchor_neighbor_list = [
+ tuple(map(str.strip, item.split(";")))
+ for item in anchor_neighbor_list
+ ]
+
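+ # Convert bin edges to floats and the "None" stratification text to None
+ bins = [float(radius) for radius in bins]
+ slide_names = text_to_value(slide_names)
+
+ # Compute neighborhood profiles; results are stored in adata.obsm[output_table]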
+ neighborhood_profile(
+ adata,
+ phenotypes=cell_types_annotation,
+ distances=bins,
+ regions=slide_names,
+ spatial_key="spatial",
+ normalize=normalization,
+ associated_table_name=output_table
+ )
+
+ print(adata)
+ print(adata.obsm[output_table].shape)
+ print(adata.uns[output_table])
+
+ dataframes, filenames = neighborhood_profiles_for_pairs(
+ adata,
+ cell_types_annotation,
+ slide_names,
+ bins,
+ anchor_neighbor_list,
+ output_table
+ )
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Package dataframes in a dictionary for directory saving
+ # This ensures they're saved in a directory per standardized schema
+ results_dict = {}
+
+ # Create a dictionary of dataframes with their filenames as keys
+ dataframe_dict = {}
+ for (anchor_label, neighbor_label), filename in zip(
+ dataframes.keys(), filenames
+ ):
+ df = dataframes[(anchor_label, neighbor_label)]
+ # Remove .csv extension as save_results will add it
+ key = filename.replace('.csv', '')
+ dataframe_dict[key] = df
+
+ # Store in results with "dataframe" key to match outputs config
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = dataframe_dict
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(f"Neighborhood Profile completed → {len(saved_files.get('dataframe', []))} files")
+ return saved_files
+ else:
+ # Return the dataframes directly for in-memory workflows
+ print("Returning dataframes (not saving to file)")
+ return dataframes
+
+
+ # Helper functions
+
+def neighborhood_profiles_for_pairs(
+ adata,
+ cell_types_annotation,
+ slide_names,
+ bins,
+ anchor_neighbor_list,
+ output_table
+):
+ """
+ Compute neighborhood profiles for all anchor-neighbor pairs and return
+ a tuple containing a dictionary of DataFrames and a list of filenames
+ for saving.
+
+ Parameters
+ ----------
+ adata : AnnData
+ The AnnData object containing spatial and phenotypic data.
+
+ cell_types_annotation : str
+ The column name in adata.obs containing the cell phenotype labels.
+
+ slide_names : str or None
+ The column name in adata.obs containing the slide names, or None
+ to omit the slide column.
+
+ bins : list
+ List of increasing distance bins.
+
+ anchor_neighbor_list : list of tuples
+ List of (anchor_label, neighbor_label) pairs.
+
+ output_table : str
+ The key in adata.obsm containing neighborhood profile data.
+
+ Returns
+ -------
+ tuple
+ - A dictionary of DataFrames for each (anchor, neighbor) pair.
+ - A list of filenames where each DataFrame should be saved.
+ """
+
+ dataframes = {}
+ filenames = []
+
+ # Get the array of neighbor labels
+ neighbor_labels = adata.uns[output_table]["labels"]
+
+ for anchor_label, neighbor_label in anchor_neighbor_list:
+ # Create bin labels with the neighbor type
+ bins_with_ranges = [
+ f"{neighbor_label}_{bins[i]}-{bins[i+1]}"
+ for i in range(len(bins) - 1)
+ ]
+
+ # Find the index of the requested neighbor label
+ neighbor_index = np.where(neighbor_labels == neighbor_label)[0]
+
+ if len(neighbor_index) == 0:
+ raise ValueError(
+ f"Neighbor label '{neighbor_label}' not found in "
+ f"{output_table} labels."
+ )
+
+ neighbor_index = neighbor_index[0] # Extract the first index
+
+ # Extract the neighborhood profile for the specific neighbor
+ # Shape: (n_cells, n_bins)
+ profile_data = adata.obsm[output_table][:, neighbor_index, :]
+
+ # Construct DataFrame
+ df = pd.DataFrame(profile_data, columns=bins_with_ranges)
+
+ # Add cell phenotype labels and slide names
+ df.insert(
+ 0, cell_types_annotation,
+ adata.obs[cell_types_annotation].values
+ )
+ if slide_names is not None:
+ df.insert(0, slide_names, adata.obs[slide_names].values)
+
+ # Filter for the anchor cell type
+ filtered_df = df[df[cell_types_annotation] == anchor_label]
+
+ # Generate a filename for saving
+ filename = f"anchor_{anchor_label}_neighbor_{neighbor_label}.csv"
+
+ # Store the DataFrame and filename
+ dataframes[(anchor_label, neighbor_label)] = filtered_df
+ filenames.append(filename)
+
+ return dataframes, filenames
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python neighborhood_profile_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths[:3]: # Show first 3 files
+ print(f" - {path}")
+ if len(paths) > 3:
+ print(f" ... and {len(paths) - 3} more files")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned dataframes for in-memory use")
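Two string conventions in the Neighborhood Profile template are easy to miss. Each `Anchor_Neighbor_List` entry is an `"anchor;neighbor"` string that is split on `;` and stripped, and because `bins` is converted to floats before the column labels are built, the labels come out as, for example, `T cell_0.0-50.0`. The standalone sketch below reproduces both rules; the function names are hypothetical and not part of the SPAC API.

```python
from typing import List, Sequence, Tuple


def parse_anchor_neighbor_pairs(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Split 'anchor;neighbor' strings and strip whitespace, as the template does."""
    return [tuple(map(str.strip, item.split(";"))) for item in items]


def bin_column_labels(neighbor_label: str, bins: Sequence[float]) -> List[str]:
    """One label per consecutive pair of bin edges: '<neighbor>_<lo>-<hi>'."""
    return [
        f"{neighbor_label}_{bins[i]}-{bins[i + 1]}"
        for i in range(len(bins) - 1)
    ]


# Bins arrive from JSON and are converted to floats before labeling
bins = [float(radius) for radius in ["0", "50", "100"]]
labels = bin_column_labels("T cell", bins)  # ['T cell_0.0-50.0', 'T cell_50.0-100.0']
```

An entry without a `;` parses to a one-element tuple, which then fails when the template unpacks it into `(anchor_label, neighbor_label)`.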
diff --git a/src/spac/templates/normalize_batch_template.py b/src/spac/templates/normalize_batch_template.py
new file mode 100644
index 00000000..73ef838e
--- /dev/null
+++ b/src/spac/templates/normalize_batch_template.py
@@ -0,0 +1,187 @@
+"""
+Platform-agnostic Normalize Batch template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.normalize_batch_template import run_from_json
+>>> run_from_json("examples/normalize_batch_params.json")
+"""
+import json
+import sys
+import logging
+from pathlib import Path
+from typing import Any, Dict, Union
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import batch_normalize
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Normalize Batch analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Annotation": "batch_column",
+ "Input_Table_Name": "Original",
+ "Output_Table_Name": "batch_normalized_table",
+ "Normalization_Method": "median",
+ "Take_Log": false,
+ "Need_Normalization": true,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ all_data = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params["Annotation"]
+ input_layer = params.get("Input_Table_Name", "Original")
+
+ if input_layer == 'Original':
+ input_layer = None
+
+ output_layer = params.get("Output_Table_Name", "batch_normalized_table")
+ method = params.get("Normalization_Method", "median")
+ take_log = params.get("Take_Log", False)
+
+ need_normalization = params.get("Need_Normalization", False)
+ if need_normalization:
+ batch_normalize(
+ adata=all_data,
+ annotation=annotation,
+ input_layer=input_layer,
+ output_layer=output_layer,
+ method=method,
+ log=take_log
+ )
+
+ logger.info(
+ f"Statistics of original data:\n{all_data.to_df().describe()}"
+ )
+ logger.info(
+ f"Statistics of layer data:\n"
+ f"{all_data.to_df(layer=output_layer).describe()}"
+ )
+ else:
+ logger.info(
+ f"Statistics of original data:\n{all_data.to_df().describe()}"
+ )
+
+ logger.info(f"Current Analysis contains:\n{all_data}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = all_data
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Normalize Batch analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logger.info("Returning AnnData object for in-memory use")
+ return all_data
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python normalize_batch_template.py "
+ "<params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned AnnData object")
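With `Need_Normalization` enabled and `Normalization_Method` set to `"median"`, `batch_normalize` adjusts each batch defined by the `Annotation` column. The exact transform lives in `spac.transformations` and may differ from what follows: the sketch assumes one common variant, shifting each batch so that its median matches the median of all values. `median_center_batches` is a hypothetical name, and the `Take_Log` option is not modelled.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def median_center_batches(
    values: Sequence[float], batches: Sequence[str]
) -> List[float]:
    """Shift each batch so its median equals the overall median.

    Illustrative variant of median batch normalization; not SPAC's code.
    """
    global_median = statistics.median(values)
    grouped: Dict[str, List[float]] = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    # Median of each batch, computed once per batch
    batch_medians = {b: statistics.median(v) for b, v in grouped.items()}
    return [
        value - batch_medians[batch] + global_median
        for value, batch in zip(values, batches)
    ]
```

Two batches with identical spread but different offsets, such as `[1, 2, 3]` and `[11, 12, 13]`, both map to `[6, 7, 8]` because the overall median is 7.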
diff --git a/src/spac/templates/phenograph_clustering_template.py b/src/spac/templates/phenograph_clustering_template.py
new file mode 100644
index 00000000..99d84f62
--- /dev/null
+++ b/src/spac/templates/phenograph_clustering_template.py
@@ -0,0 +1,197 @@
+"""
+Platform-agnostic Phenograph Clustering template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where analysis is saved as a file.
+
+Usage
+-----
+>>> from spac.templates.phenograph_clustering_template import run_from_json
+>>> run_from_json("examples/phenograph_clustering_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import phenograph_clustering
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Phenograph Clustering analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "K_Nearest_Neighbors": 30,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the AnnData object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+ >>> # './output.pickle'
+
+ >>> # Get results in memory
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+ >>> # Can now work with adata object directly
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ Layer_name = params.get("Table_to_Process", "Original")
+ K_cluster = params.get("K_Nearest_Neighbors", 30)
+ Seed = params.get("Seed", 42)
+ resolution_parameter = params.get("Resolution_Parameter", 1.0)
+ output_annotation_name = params.get(
+ "Output_Annotation_Name", "phenograph"
+ )
+ # Used only in HPC profiling mode (not implemented in SPAC)
+ resolution_list = params.get("Resolution_List", [])
+
+ n_iterations = params.get("Number_of_Iterations", 100)
+
+ if Layer_name == "Original":
+ Layer_name = None
+
+ intensities = adata.var.index.to_list()
+
+ print("Before Phenograph Clustering: \n", adata)
+
+ phenograph_clustering(
+ adata=adata,
+ features=intensities,
+ layer=Layer_name,
+ k=K_cluster,
+ seed=Seed,
+ resolution_parameter=resolution_parameter,
+ n_iterations=n_iterations
+ )
+ if output_annotation_name != "phenograph":
+ adata.obs = adata.obs.rename(
+ columns={'phenograph': output_annotation_name}
+ )
+
+ print("After Phenograph Clustering: \n", adata)
+
+ # Count and display occurrences of each label in the annotation
+ print(
+ f'Count of cells in the output annotation: '
+ f'"{output_annotation_name}":'
+ )
+ label_counts = adata.obs[output_annotation_name].value_counts()
+ print(label_counts)
+ print("\n")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary
+ results_dict = {}
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(
+ f"Phenograph Clustering completed → "
+ f"{saved_files['analysis']}"
+ )
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ print("Returning AnnData object (not saving to file)")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python phenograph_clustering_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned AnnData object")
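PhenoGraph, as originally published, builds a k-nearest-neighbor graph, weights each edge by the Jaccard similarity of the two cells' neighbor sets, and then partitions that graph with community detection (Louvain in the original paper). The stdlib sketch below covers only the graph-building step, with connected components standing in for community detection. It is a simplification for illustration, not how `phenograph_clustering` is implemented; `k` plays the role of `K_Nearest_Neighbors`, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_sets(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Indices of the k nearest other points for each point (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in others[:k]})
    return neighbors


def jaccard_graph(neighbors: List[Set[int]]) -> Dict[Tuple[int, int], float]:
    """Edge (i, j) with i < j for each kNN link, weighted by Jaccard similarity."""
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            a, b = min(i, j), max(i, j)
            shared = neighbors[a] & neighbors[b]
            edges[(a, b)] = len(shared) / len(neighbors[a] | neighbors[b])
    return edges


def components(n: int, edges: Dict[Tuple[int, int], float]) -> List[int]:
    """Label each node by connected component over positive-weight edges."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight > 0:
            parent[find(a)] = find(b)
    return [find(x) for x in range(n)]
```

Two well-separated groups of points end up in different components; a real community-detection step would additionally split densely connected regions that happen to touch.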
diff --git a/src/spac/templates/posit_it_python_template.py b/src/spac/templates/posit_it_python_template.py
new file mode 100644
index 00000000..2b4bf440
--- /dev/null
+++ b/src/spac/templates/posit_it_python_template.py
@@ -0,0 +1,281 @@
+"""
+Platform-agnostic Post-It-Python template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.posit_it_python_template import run_from_json
+>>> run_from_json("examples/posit_it_python_params.json")
+"""
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List
+import logging
+import matplotlib.pyplot as plt
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+# Color palette mapping color names to hex codes
+PAINTS = {
+ 'White': '#FFFFFF',
+ 'LightGrey': '#D3D3D3',
+ 'Grey': '#999999',
+ 'Black': '#000000',
+ 'Red1': '#F44E3B',
+ 'Red2': '#D33115',
+ 'Red3': '#9F0500',
+ 'Orange1': '#FE9200',
+ 'Orange2': '#E27300',
+ 'Orange3': '#C45100',
+ 'Yellow1': '#FCDC00',
+ 'Yellow2': '#FCC400',
+ 'Yellow3': '#FB9E00',
+ 'YellowGreen1': '#DBDF00',
+ 'YellowGreen2': '#B0BC00',
+ 'YellowGreen3': '#808900',
+ 'Green1': '#A4DD00',
+ 'Green2': '#68BC00',
+ 'Green3': '#194D33',
+ 'Teal1': '#68CCCA',
+ 'Teal2': '#16A5A5',
+ 'Teal3': '#0C797D',
+ 'Blue1': '#73D8FF',
+ 'Blue2': '#009CE0',
+ 'Blue3': '#0062B1',
+ 'Purple1': '#AEA1FF',
+ 'Purple2': '#7B64FF',
+ 'Purple3': '#653294',
+ 'Magenta1': '#FDA1FF',
+ 'Magenta2': '#FA28FF',
+ 'Magenta3': '#AB149E',
+}
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = False,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], plt.Figure]:
+ """
+ Execute Post-It-Python analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Label": "Post-It",
+ "Label_font_color": "Black",
+ "Label_font_size": "80",
+ ...
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the figure
+ directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is False.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or Figure
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: The matplotlib figure object
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+
+ # Extract parameters using .get() with defaults from JSON template
+ text = params.get("Label", "Post-It")
+ text_color = params.get("Label_font_color", "Black")
+ text_size = params.get("Label_font_size", "80")
+ text_fontface = params.get("Label_font_type", "normal")
+ text_fontfamily = params.get("Label_font_family", "Arial")
+ bold = params.get("Label_Bold", "False")
+
+ # background params
+ fill_color = params.get("Background_fill_color", "Yellow1")
+ fill_alpha = params.get("Background_fill_opacity", "10")
+
+ # image params
+ image_width = params.get("Page_width", "18")
+ image_height = params.get("Page_height", "6")
+ image_resolution = params.get("Page_DPI", "300")
+
+ # Convert string parameters to appropriate types
+ text_size = text_to_value(
+ text_size,
+ to_int=True,
+ param_name="Label_font_size"
+ )
+
+ # str() lets a JSON boolean true match as well as the text "True"
+ bold = str(text_to_value(bold)) == "True"
+
+ fill_alpha = text_to_value(
+ fill_alpha,
+ to_float=True,
+ param_name="Background_fill_opacity"
+ )
+
+ image_width = text_to_value(
+ image_width,
+ to_float=True,
+ param_name="Page_width"
+ )
+
+ image_height = text_to_value(
+ image_height,
+ to_float=True,
+ param_name="Page_height"
+ )
+
+ image_resolution = text_to_value(
+ image_resolution,
+ to_int=True,
+ param_name="Page_DPI"
+ )
+
+ # RUN ====
+
+ # Create figure
+ fig = plt.figure(
+ figsize=(image_width, image_height),
+ dpi=image_resolution
+ )
+ fig.patch.set_facecolor(PAINTS[fill_color])
+ fig.patch.set_alpha(fill_alpha / 100)
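+ # A freshly created figure has no axes, so this loop is a no-op;
+ # it is kept from the NIDAP template
+ for ax in fig.get_axes():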
+ for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
+ ax.get_xticklabels() + ax.get_yticklabels()):
+ item.set_fontsize(text_size)
+ item.set_fontfamily(text_fontfamily)
+ item.set_fontstyle(text_fontface)
+ if bold:
+ item.set_fontweight('bold')
+
+ fig.text(
+ 0.5, 0.5, text,
+ fontsize=text_size,
+ color=PAINTS[text_color],
+ ha='center',
+ va='center',
+ fontfamily=text_fontfamily,
+ fontstyle=text_fontface,
+ fontweight='bold' if bold else 'normal'
+ )
+
+ if show_plot:
+ plt.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = {"postit": fig}
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ # Close figure after saving
+ plt.close(fig)
+
+ logger.info("Post-It-Python completed successfully.")
+ return saved_files
+ else:
+ # Return the figure object directly for in-memory workflows
+ logger.info("Returning figure object for in-memory use")
+ return fig
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python post_it_python_template.py "
+ " [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned figure object")
+
\ No newline at end of file
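The post-it template receives every option as a string. `Background_fill_opacity` (default `"10"`) is a percentage that the code divides by 100 before `fig.patch.set_alpha`, and the bold flag is parsed with `text_to_value(bold) == "True"`, which, assuming `text_to_value` returns non-`"None"` strings unchanged, only matches that exact spelling. The sketch below shows stricter parsing under those assumptions. `parse_bool` and `percent_to_alpha` are hypothetical helpers, not part of `spac.templates.template_utils`.

```python
from typing import Union


def parse_bool(value: Union[str, bool, None]) -> bool:
    """Parse a "True"/"False" style parameter case-insensitively.

    Hypothetical helper: real booleans pass through, None and "None" map
    to False, and unrecognised text raises instead of silently becoming
    False.
    """
    if isinstance(value, bool):
        return value
    text = "" if value is None else str(value).strip().lower()
    if text == "true":
        return True
    if text in ("false", "", "none"):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def percent_to_alpha(value: Union[str, float, int]) -> float:
    """Convert an opacity percentage such as "10" into an alpha in [0, 1].

    Hypothetical helper: the result is used as a matplotlib alpha, which
    must lie in [0, 1], so percentages outside 0-100 are rejected here.
    """
    pct = float(value)
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"opacity must be between 0 and 100, got {pct}")
    return pct / 100.0


if __name__ == "__main__":
    print(parse_bool("TRUE"), percent_to_alpha("10"))
```

Rejecting bad input at parse time keeps the error message tied to the parameter name the user actually typed, rather than surfacing later as a plotting error.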
diff --git a/src/spac/templates/quantile_scaling_template.py b/src/spac/templates/quantile_scaling_template.py
new file mode 100644
index 00000000..48cc8bf8
--- /dev/null
+++ b/src/spac/templates/quantile_scaling_template.py
@@ -0,0 +1,319 @@
+"""
+Platform-agnostic Quantile Scaling template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where html outputs are saved as directories.
+
+Usage
+-----
+>>> from spac.templates.quantile_scaling_template import run_from_json
+>>> run_from_json("examples/quantile_scaling_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Tuple
+import logging
+import pandas as pd
+import plotly.graph_objects as go
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import normalize_features
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, go.Figure]]:
+ """
+ Execute Quantile Scaling analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Low_Quantile": "0.02",
+ "High_Quantile": "0.98",
+ "Interpolation": "nearest",
+ "Table_to_Process": "Original",
+ "Output_Table_Name": "normalized_feature",
+ "Per_Batch": "False",
+ "Annotation": null,
+ "outputs": {
+ "analysis": {"type": "file", "name": "quantile_scaled_data.pickle"},
+ "html": {"type": "directory", "name": "normalization_summary"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ and figure directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str, optional
+ Override output directory from params. Default uses params value.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: Tuple of (adata, figure)
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ logger.info(f"Loading upstream analysis data from {params['Upstream_Analysis']}")
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters using .get() with defaults from JSON template
+ low_quantile = params.get("Low_Quantile", "0.02")
+ high_quantile = params.get("High_Quantile", "0.98")
+ interpolation = params.get("Interpolation", "nearest")
+ input_layer = params.get("Table_to_Process", "Original")
+ output_layer = params.get("Output_Table_Name", "normalized_feature")
+ per_batch = params.get("Per_Batch", "False")
+ # Annotation may be None, '', 'None', or a real name
+ annotation = params.get("Annotation")
+
+ # Convert parameters using text_to_value
+ if input_layer == "Original":
+ input_layer = None
+
+ low_quantile = text_to_value(
+ low_quantile,
+ to_float=True,
+ param_name='Low_Quantile'
+ )
+
+ high_quantile = text_to_value(
+ high_quantile,
+ to_float=True,
+ param_name='High_Quantile'
+ )
+
+ # Convert "True"/"False" string to boolean (case-insensitive)
+ per_batch = str(per_batch).strip().lower() == "true"
+
+ # Annotation is optional - empty string or "None" becomes None
+ annotation = text_to_value(annotation)
+
+ # Validate annotation is provided when per_batch is True
+ if per_batch and annotation is None:
+ raise ValueError(
+ 'Parameter "Annotation" is required when "Per Batch" is set '
+ 'to True.'
+ )
+
+ # Check if output_layer already exists in adata
+ logger.info(f"Checking if output layer '{output_layer}' exists in adata layers...")
+ if output_layer in adata.layers.keys():
+ raise ValueError(
+ f"Output Table Name '{output_layer}' already exists, "
+ "please rename it."
+ )
+ else:
+ logger.info(f"Output layer '{output_layer}' does not exist. "
+ f"Proceeding with normalization.")
+
+ def df_as_html(
+ df,
+ columns_to_plot,
+ font_size=12,
+ column_scaler=1
+ ):
+ df = df.reset_index()
+ df = df[columns_to_plot]
+ df_str = df.astype(str)
+
+ column_widths = [
+ max(df_str[col].apply(len)) * font_size * column_scaler
+ for col in df.columns
+ ]
+ column_widths[0] = 200
+
+ fig_width = sum(column_widths) * 1.1
+ # Create a table trace with the DataFrame data
+ table_trace = go.Table(
+ header=dict(values=list(df.columns),
+ font=dict(size=font_size)),
+ cells=dict(values=df_str.values.T,
+ font=dict(size=font_size),
+ align='left'),
+ columnwidth=column_widths
+ )
+
+ layout = go.Layout(
+ autosize=True
+ )
+
+ fig = go.Figure(
+ data=[table_trace],
+ layout=layout
+ )
+
+ return fig
+
+ def create_normalization_info(
+ adata,
+ low_quantile,
+ high_quantile,
+ input_layer,
+ output_layer
+ ):
+ pre_dataframe = adata.to_df(layer=input_layer)
+ quantiles = pre_dataframe.quantile([low_quantile, high_quantile])
+ new_row_names = {
+ high_quantile: 'quantile_high',
+ low_quantile: 'quantile_low'
+ }
+ quantiles.index = quantiles.index.map(new_row_names)
+
+ pre_info = pre_dataframe.describe()
+ pre_info = pd.concat([pre_info, quantiles])
+ pre_info = pre_info.reset_index()
+ pre_info['index'] = 'Pre-Norm: ' + pre_info['index'].astype(str)
+ del pre_dataframe
+
+ post_dataframe = adata.to_df(layer=output_layer)
+ post_info = post_dataframe.describe()
+ post_info = post_info.reset_index()
+ post_info['index'] = 'Post-Norm: ' + post_info['index'].astype(str)
+ del post_dataframe
+
+ normalization_info = pd.concat([pre_info, post_info]).transpose()
+ normalization_info.columns = normalization_info.iloc[0]
+ normalization_info = normalization_info.drop(
+ normalization_info.index[0]
+ )
+ normalization_info = normalization_info.astype(float)
+ normalization_info = normalization_info.round(3)
+ normalization_info = normalization_info.astype(str)
+
+ return normalization_info
+
+ logger.info(f"High quantile used: {str(high_quantile)}")
+ logger.info(f"Low quantile used: {str(low_quantile)}")
+
+ transformed_data = normalize_features(
+ adata=adata,
+ low_quantile=low_quantile,
+ high_quantile=high_quantile,
+ interpolation=interpolation,
+ input_layer=input_layer,
+ output_layer=output_layer,
+ per_batch=per_batch,
+ annotation=annotation
+ )
+
+ logger.info(f"Transformed data stored in layer: {output_layer}")
+ dataframe = pd.DataFrame(transformed_data.layers[output_layer])
+ logger.info(f"Transform summary:\n{dataframe.describe()}")
+
+ normalization_info = create_normalization_info(
+ adata,
+ low_quantile,
+ high_quantile,
+ input_layer,
+ output_layer
+ )
+
+ columns_to_plot = [
+ 'index', 'Pre-Norm: mean', 'Pre-Norm: std',
+ 'Pre-Norm: quantile_high', 'Pre-Norm: quantile_low',
+ 'Post-Norm: mean', 'Post-Norm: std',
+ ]
+
+ html_plot = df_as_html(
+ normalization_info,
+ columns_to_plot
+ )
+
+ if show_plot:
+ html_plot.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Add analysis output (single file)
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = transformed_data
+
+ # Add HTML output (directory)
+ if "html" in params["outputs"]:
+ results_dict["html"] = {"normalization_summary": html_plot}
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Quantile Scaling analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object and figure directly for in-memory workflows
+ logger.info("Returning AnnData object and figure for in-memory use")
+ return transformed_data, html_plot
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python quantile_scaling_template.py [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ adata, html_plot = result
+ print("\nReturned AnnData object and figure for in-memory use")
+ print(f"AnnData shape: {adata.shape}")
+ print(f"Layers: {list(adata.layers.keys())}")
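The quantile scaling step above is driven by `Low_Quantile`, `High_Quantile`, `Interpolation`, `Per_Batch`, and `Annotation`. The NumPy sketch below assumes that `normalize_features` clips each feature to its low/high quantiles and then min-max rescales the result to [0, 1]. That is an assumption about spac's behaviour, not a copy of its implementation, so check `spac.transformations` before relying on it. `quantile_scale` and `quantile_scale_per_batch` are hypothetical names, and the `method=` keyword of `np.quantile` requires NumPy 1.22 or later.

```python
import numpy as np


def quantile_scale(x, low_q=0.02, high_q=0.98, method="nearest"):
    """Clip each column (feature) to its [low_q, high_q] quantiles,
    then min-max rescale the clipped values to [0, 1].

    Rows are cells, columns are features. Constant columns map to 0
    (a design choice of this sketch).
    """
    x = np.asarray(x, dtype=float)
    if not 0.0 <= low_q < high_q <= 1.0:
        raise ValueError("expected 0 <= low_q < high_q <= 1")
    # One low and one high quantile per column: result has shape (2, n_features).
    lo, hi = np.quantile(x, [low_q, high_q], axis=0, method=method)
    clipped = np.clip(x, lo, hi)
    span = hi - lo
    # Avoid division by zero for constant columns.
    span = np.where(span == 0, 1.0, span)
    return (clipped - lo) / span


def quantile_scale_per_batch(x, batches, **kwargs):
    """Apply quantile_scale separately within each batch label
    (the analogue of Per_Batch=True with an Annotation column)."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(x)
    for label in np.unique(batches):
        mask = batches == label
        out[mask] = quantile_scale(x[mask], **kwargs)
    return out


if __name__ == "__main__":
    demo = np.column_stack([np.arange(101, dtype=float), np.full(101, 5.0)])
    print(quantile_scale(demo, 0.1, 0.9)[[0, 50, 100]])
```

Scaling per batch means each batch's own extremes define its [0, 1] range, which is why the template refuses `Per_Batch` without an `Annotation` to group by.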
diff --git a/src/spac/templates/relational_heatmap_template.py b/src/spac/templates/relational_heatmap_template.py
new file mode 100644
index 00000000..2087f5ff
--- /dev/null
+++ b/src/spac/templates/relational_heatmap_template.py
@@ -0,0 +1,246 @@
+"""
+Relational Heatmap with Plotly-matplotlib color synchronization.
+Extracts actual colors from Plotly and uses them in matplotlib.
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Tuple
+import pandas as pd
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import matplotlib.colors as mcolors
+import plotly.io as pio
+import plotly.express as px
+
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import relational_heatmap
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def get_plotly_colorscale_as_matplotlib(plotly_colormap: str) -> mcolors.LinearSegmentedColormap:
+ """
+ Extract actual colors from Plotly colorscale and create matplotlib colormap.
+ This ensures exact color matching between Plotly and matplotlib.
+ """
+ # Get Plotly's colorscale
+ try:
+ # Use plotly express to get the actual color sequence
+ colorscale = getattr(px.colors.sequential, plotly_colormap, None)
+ if colorscale is None:
+ colorscale = getattr(px.colors.diverging, plotly_colormap, None)
+ if colorscale is None:
+ colorscale = getattr(px.colors.cyclical, plotly_colormap, None)
+
+ if colorscale is None:
+ # Fallback to a default
+ print(f"Warning: Could not find Plotly colorscale '{plotly_colormap}', using default")
+ colorscale = px.colors.sequential.Viridis
+
+ # Convert to matplotlib colormap
+ if isinstance(colorscale, list):
+ # Create custom colormap from color list
+ cmap = mcolors.LinearSegmentedColormap.from_list(
+ f"plotly_{plotly_colormap}",
+ colorscale
+ )
+ return cmap
+ except Exception as e:
+ print(f"Error extracting Plotly colors: {e}")
+
+ # Fallback to matplotlib's viridis
+ return plt.cm.viridis
+
+
+def create_matplotlib_heatmap_matching_plotly(
+ data: pd.DataFrame,
+ plotly_fig: Any,
+ source_annotation: str,
+ target_annotation: str,
+ colormap_name: str,
+ figsize: tuple,
+ dpi: int,
+ font_size: int
+) -> plt.Figure:
+ """
+ Create matplotlib heatmap that matches Plotly's appearance.
+ Extracts color information from the Plotly figure.
+ """
+ fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
+
+ # Get the actual colormap from Plotly
+ cmap = get_plotly_colorscale_as_matplotlib(colormap_name)
+
+ # Extract data range from Plotly figure if possible
+ try:
+ zmin = plotly_fig.data[0].zmin if hasattr(plotly_fig.data[0], 'zmin') else data.min().min()
+ zmax = plotly_fig.data[0].zmax if hasattr(plotly_fig.data[0], 'zmax') else data.max().max()
+ except (AttributeError, IndexError):
+ zmin, zmax = data.min().min(), data.max().max()
+
+ # Create heatmap matching Plotly's style
+ im = ax.imshow(
+ data.values,
+ aspect='auto',
+ cmap=cmap,
+ interpolation='nearest',
+ vmin=zmin,
+ vmax=zmax
+ )
+
+ # Match Plotly's tick placement
+ ax.set_xticks(np.arange(len(data.columns)))
+ ax.set_yticks(np.arange(len(data.index)))
+ ax.set_xticklabels(data.columns, rotation=45, ha='right', fontsize=font_size)
+ ax.set_yticklabels(data.index, fontsize=font_size)
+
+ # Add colorbar
+ cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
+ cbar.set_label('Count', fontsize=font_size)
+ cbar.ax.tick_params(labelsize=font_size)
+
+ # Title matching Plotly
+ ax.set_title(
+ f'Relational Heatmap: {source_annotation} vs {target_annotation}',
+ fontsize=font_size + 2,
+ pad=20
+ )
+ ax.set_xlabel(target_annotation, fontsize=font_size)
+ ax.set_ylabel(source_annotation, fontsize=font_size)
+
+ # Add grid for clarity (like Plotly)
+ ax.set_xticks(np.arange(len(data.columns) + 1) - 0.5, minor=True)
+ ax.set_yticks(np.arange(len(data.index) + 1) - 0.5, minor=True)
+ ax.grid(which='minor', color='gray', linestyle='-', linewidth=0.3, alpha=0.3)
+ ax.tick_params(which='both', length=0)
+
+ plt.tight_layout()
+ return fig
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+ show_static_image: bool = False
+) -> Union[Dict, Tuple]:
+ """Execute Relational Heatmap with color-matched outputs.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to parameters JSON file or dict of parameters.
+ save_to_disk : bool, default True
+ Whether to save results to disk.
+ output_dir : str, optional
+ Output directory. If None, read from params.
+ show_static_image : bool, default False
+ When True, generate a static PNG figure using matplotlib.
+ When False (default), only produce interactive HTML output.
+ Disabled by default because Plotly HTML-to-PNG conversion
+ hangs inside the Galaxy container environment.
+ """
+
+ params = parse_params(json_path)
+
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "html": {"type": "directory", "name": "html_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load data
+ adata = load_input(params["Upstream_Analysis"])
+ print(f"Data loaded: {adata.shape[0]} cells, {adata.shape[1]} genes")
+
+ # Parameters
+ source_annotation = text_to_value(params.get("Source_Annotation_Name", "None"))
+ target_annotation = text_to_value(params.get("Target_Annotation_Name", "None"))
+
+ dpi = float(params.get("Figure_DPI", 300))
+ width_in = float(params.get("Figure_Width_inch", 8))
+ height_in = float(params.get("Figure_Height_inch", 10))
+ font_size = float(params.get("Font_Size", 8))
+ colormap = params.get("Colormap", "darkmint")
+
+ print(f"Creating heatmap: {source_annotation} vs {target_annotation}")
+
+ # Run SPAC relational heatmap
+ result_dict = relational_heatmap(
+ adata=adata,
+ source_annotation=source_annotation,
+ target_annotation=target_annotation,
+ color_map=colormap,
+ font_size=font_size
+ )
+
+ rhmap_data = result_dict['data']
+ plotly_fig = result_dict['figure']
+
+ # Update Plotly figure
+ if plotly_fig:
+ plotly_fig.update_layout(
+ width=width_in * 96,
+ height=height_in * 96,
+ font=dict(size=font_size)
+ )
+
+ if save_to_disk:
+ results_dict = {
+ "html": {"relational_heatmap": pio.to_html(plotly_fig, full_html=True, include_plotlyjs='cdn')},
+ "dataframe": rhmap_data
+ }
+
+ if show_static_image:
+ # Generate static matplotlib figure matching Plotly colors.
+ # Disabled by default on Galaxy because Plotly HTML-to-PNG
+ # conversion hangs in the Galaxy container environment.
+ print("Creating color-matched matplotlib figure...")
+ static_fig = create_matplotlib_heatmap_matching_plotly(
+ rhmap_data,
+ plotly_fig,
+ source_annotation,
+ target_annotation,
+ colormap,
+ (width_in, height_in),
+ int(dpi),
+ int(font_size)
+ )
+ results_dict["figures"] = {"relational_heatmap": static_fig}
+
+ saved_files = save_results(results_dict, params, output_base_dir=output_dir)
+
+ if show_static_image:
+ plt.close(static_fig)
+
+ print("✓ Relational Heatmap completed")
+ return saved_files
+ else:
+ return plotly_fig, rhmap_data
+
+
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print("Usage: python relational_heatmap_template.py ", file=sys.stderr)
+ sys.exit(1)
+
+ try:
+ run_from_json(sys.argv[1], save_to_disk=True)
+ sys.exit(0)
+ except Exception as e:
+ print(f"ERROR: {e}", file=sys.stderr)
+ import traceback
+ traceback.print_exc()
+ sys.exit(1)
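`get_plotly_colorscale_as_matplotlib` resolves the colormap with `getattr`, which is case-sensitive. Plotly exposes its built-in palettes as attributes such as `px.colors.sequential.Darkmint`, so the template's default `"darkmint"` may not resolve and may fall back to Viridis with a warning. The sketch below shows a case-insensitive lookup. `find_colorscale` is a hypothetical helper, and the namespaces in the demo are stand-ins, not Plotly's real palettes.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Sequence


def find_colorscale(name: str, namespaces: Sequence[Any]) -> Optional[List[str]]:
    """Return the first list-valued attribute matching ``name``.

    An exact match is tried first in every namespace; only then is a
    case-insensitive match attempted, so "darkmint" can resolve to an
    attribute spelled "Darkmint". Returns None when nothing matches.
    """
    # Pass 1: exact, case-sensitive lookup (what the template does today).
    for ns in namespaces:
        value = getattr(ns, name, None)
        if isinstance(value, list):
            return value
    # Pass 2: case-insensitive comparison against every attribute name.
    target = name.strip().lower()
    for ns in namespaces:
        for attr in dir(ns):
            if attr.lower() == target:
                value = getattr(ns, attr)
                if isinstance(value, list):
                    return value
    return None


if __name__ == "__main__":
    # Stand-in palette; these are NOT Plotly's actual colors.
    sequential = SimpleNamespace(Darkmint=["#000000", "#ffffff"])
    print(find_colorscale("darkmint", [sequential]))
```

Inside the template, the call would look something like `find_colorscale(colormap, [px.colors.sequential, px.colors.diverging, px.colors.cyclical])`, assuming those modules keep exposing palettes as lists of color strings, which is what the template's `isinstance(colorscale, list)` check already expects.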
diff --git a/src/spac/templates/rename_labels_template.py b/src/spac/templates/rename_labels_template.py
new file mode 100644
index 00000000..5527e3b7
--- /dev/null
+++ b/src/spac/templates/rename_labels_template.py
@@ -0,0 +1,173 @@
+"""
+Platform-agnostic Rename Labels template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema.
+
+Usage
+-----
+>>> from spac.templates.rename_labels_template import run_from_json
+>>> run_from_json("examples/rename_labels_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import logging
+import pandas as pd
+import pickle
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import rename_annotations
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Rename Labels analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Cluster_Mapping_Dictionary": "path/to/mapping.csv",
+ "Source_Annotation": "original_column",
+ "New_Annotation": "new_column",
+ "outputs": {
+ "analysis": {"type": "file", "name": "renamed_data.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Override output directory from params. Default uses params value.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: The processed AnnData object
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ logger.info(f"Loading upstream analysis data from {params['Upstream_Analysis']}")
+ all_data = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ rename_list_path = params["Cluster_Mapping_Dictionary"]
+ original_column = params.get("Source_Annotation", "None")
+ renamed_column = params.get("New_Annotation", "None")
+
+ # Load the mapping dictionary CSV
+ logger.info(f"Loading cluster mapping dictionary from {rename_list_path}")
+ rename_list = pd.read_csv(rename_list_path)
+
+ original_column = text_to_value(original_column)
+ renamed_column = text_to_value(renamed_column)
+
+ # Create a new dictionary with the desired format
+ dict_list = rename_list.to_dict('records')
+ mappings = {d['Original']: d['New'] for d in dict_list}
+
+ logger.info(f"Cluster Name Mapping: \n{mappings}")
+
+ rename_annotations(
+ all_data,
+ src_annotation=original_column,
+ dest_annotation=renamed_column,
+ mappings=mappings)
+
+ logger.info(f"After Renaming Clusters: \n{all_data}")
+
+ # Count and display occurrences of each label in the annotation
+ logger.info(f'Count of cells in the output annotation "{renamed_column}":')
+ label_counts = all_data.obs[renamed_column].value_counts()
+ logger.info(f"{label_counts}")
+
+ object_to_output = all_data
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Add analysis output (single file)
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = object_to_output
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Rename Labels analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logger.info("Returning AnnData object for in-memory use")
+ return object_to_output
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python rename_labels_template.py [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned AnnData object")
+ print(f"AnnData shape: {result.shape}")
+ print(f"Observations columns: {list(result.obs.columns)}")
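The rename template builds `mappings` from `pd.read_csv`, which infers column types. Purely numeric cluster IDs in the `Original` column may therefore be parsed as integers, while labels in `adata.obs` are often strings. Also, a repeated `Original` value silently keeps the last `New` entry. The sketch below is a stricter reader that uses only the standard library and keeps every label as a string. `read_label_mapping` is hypothetical; whether `rename_annotations` itself reconciles integer and string labels is not shown here.

```python
import csv
import io
from typing import Dict, TextIO


def read_label_mapping(handle: TextIO) -> Dict[str, str]:
    """Read an Original -> New mapping CSV, keeping labels as strings.

    Raises ValueError if a required column is missing, an Original label
    is empty, or the same Original label is mapped to two different names.
    """
    reader = csv.DictReader(handle)
    missing = {"Original", "New"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"mapping CSV is missing column(s): {sorted(missing)}")
    mapping: Dict[str, str] = {}
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        original = (row.get("Original") or "").strip()
        new = (row.get("New") or "").strip()
        if not original:
            raise ValueError(f"line {line_no}: empty 'Original' label")
        if original in mapping and mapping[original] != new:
            raise ValueError(
                f"line {line_no}: '{original}' mapped to both "
                f"'{mapping[original]}' and '{new}'"
            )
        mapping[original] = new
    return mapping


if __name__ == "__main__":
    demo = io.StringIO("Original,New\n0,T cell\n1,B cell\n")
    print(read_label_mapping(demo))
```

Failing on conflicting rows surfaces a mistake in the mapping file instead of letting the last row win.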
diff --git a/src/spac/templates/ripley_l_calculation_template.py b/src/spac/templates/ripley_l_calculation_template.py
new file mode 100644
index 00000000..68b12812
--- /dev/null
+++ b/src/spac/templates/ripley_l_calculation_template.py
@@ -0,0 +1,151 @@
+"""
+Platform-agnostic Ripley-L template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.ripley_l_calculation_template import run_from_json
+>>> run_from_json("examples/ripley_l_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.spatial_analysis import ripley_l
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+ convert_to_floats
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: Optional[Union[str, Path]] = None
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Ripley-L analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary
+ save_to_disk : bool, optional
+ Whether to save results to file. If False, returns the adata object
+ directly for in-memory workflows. Default is True.
+ output_dir : str or Path, optional
+ Directory for outputs. If None, uses current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: The processed AnnData object
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ radii = params["Radii"]
+ annotation = params["Annotation"]
+ phenotypes = [params["Center_Phenotype"], params["Neighbor_Phenotype"]]
+ regions = params.get("Stratify_By", "None")
+ n_simulations = params.get("Number_of_Simulations", 100)
+ area = params.get("Area", "None")
+ seed = params.get("Seed", 42)
+ spatial_key = params.get("Spatial_Key", "spatial")
+ edge_correction = params.get("Edge_Correction", True)
+
+ # Process parameters
+ regions = text_to_value(
+ regions,
+ default_none_text="None"
+ )
+
+ area = text_to_value(
+ area,
+ default_none_text="None",
+ value_to_convert_to=None,
+ to_float=True,
+ param_name='Area'
+ )
+
+ # Convert radii to floats
+ radii = convert_to_floats(radii)
+
+ # Run the analysis
+ ripley_l(
+ adata,
+ annotation=annotation,
+ phenotypes=phenotypes,
+ distances=radii,
+ regions=regions,
+ n_simulations=n_simulations,
+ area=area,
+ seed=seed,
+ spatial_key=spatial_key,
+ edge_correction=edge_correction
+ )
+
+ logging.info("Ripley-L analysis completed successfully.")
+ logging.debug(f"AnnData object: {adata}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info(f"Ripley-L completed → {saved_files.get('analysis')}")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logging.info("Returning AnnData object (not saving to file)")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print("Usage: python ripley_l_calculation_template.py [output_dir]", file=sys.stderr)
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(sys.argv[1], output_dir=output_dir)
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned AnnData object")
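`ripley_l` is called with a center/neighbor phenotype pair, a list of radii, an optional `Area`, a number of simulations, and an edge-correction flag. The textbook cross-type estimator behind the L function is K(r) = A / (n_c * n_n) * (number of center/neighbor pairs within distance r), with L(r) = sqrt(K(r) / pi). Under complete spatial randomness L(r) is roughly r. The sketch below implements only that naive formula, with no edge correction and no simulation envelope. It illustrates the quantity and is not a substitute for `spac.spatial_analysis.ripley_l`. `cross_ripley_l` is a hypothetical name, and some estimators divide by n(n-1) instead of n*n when both phenotypes are the same.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross_ripley_l(
    centers: Sequence[Point],
    neighbors: Sequence[Point],
    radii: Sequence[float],
    area: float,
) -> List[float]:
    """Naive cross-type Ripley's L(r) with no edge correction.

    K(r) = area / (n_centers * n_neighbors) * #{(c, q): 0 < dist(c, q) <= r}
    L(r) = sqrt(K(r) / pi)
    Zero-distance pairs are skipped so that, when both phenotypes are the
    same, a cell is not counted as its own neighbor (an assumption of this
    sketch; distinct cells at identical coordinates are skipped too).
    """
    if not centers or not neighbors:
        raise ValueError("both phenotypes need at least one cell")
    if area <= 0:
        raise ValueError("area must be positive")
    scale = area / (len(centers) * len(neighbors))
    l_values = []
    for r in radii:
        # Count ordered center/neighbor pairs within distance r.
        pairs = sum(
            1
            for c in centers
            for q in neighbors
            if 0 < math.dist(c, q) <= r
        )
        l_values.append(math.sqrt(scale * pairs / math.pi))
    return l_values


if __name__ == "__main__":
    print(cross_ripley_l([(0.0, 0.0)], [(1.0, 0.0), (0.0, 2.0)], [0.5, 1.5, 3.0], 100.0))
```

Comparing the returned values with the radii themselves gives a rough reading: L(r) above r suggests clustering of neighbors around centers, below r suggests dispersion, though without edge correction values near the boundary are biased low.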
diff --git a/src/spac/templates/sankey_plot_template.py b/src/spac/templates/sankey_plot_template.py
new file mode 100644
index 00000000..c34a2c81
--- /dev/null
+++ b/src/spac/templates/sankey_plot_template.py
@@ -0,0 +1,236 @@
+"""
+Production version of the Sankey Plot template for Galaxy.
+Saves files only: no show() calls and no blocking operations.
+"""
+import json
+import sys
+import os
+from pathlib import Path
+from typing import Any, Dict, List, Union, Optional, Tuple
+import pandas as pd
+import matplotlib
+# Set non-interactive backend for Galaxy
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import plotly.io as pio
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import sankey_plot
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True, # Always True for Galaxy
+ output_dir: str = None,
+ show_static_image: bool = False,
+) -> Union[Dict[str, Union[str, List[str]]], None]:
+ """
+ Execute Sankey Plot analysis for Galaxy.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to parameters JSON file or dict of parameters.
+ save_to_disk : bool, default True
+ Whether to save results to disk. Always True for Galaxy.
+ output_dir : str, optional
+ Output directory. If None, read from params.
+ show_static_image : bool, default False
+ When True, generate a static PNG placeholder figure.
+ When False (default), only produce interactive HTML output.
+ Disabled by default because Plotly HTML-to-PNG conversion
+ hangs inside the Galaxy container environment.
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+ print(f"Loaded parameters for {params.get('Source_Annotation_Name')} -> {params.get('Target_Annotation_Name')}")
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+
+ # Load the upstream analysis data
+ print("Loading upstream analysis data...")
+ adata = load_input(params["Upstream_Analysis"])
+ print(f"Data loaded: {adata.shape[0]} cells, {adata.shape[1]} genes")
+
+ # Extract parameters
+ annotation_columns = [
+ params.get("Source_Annotation_Name", "None"),
+ params.get("Target_Annotation_Name", "None")
+ ]
+
+ # Parse numeric parameters with error handling
+ try:
+ dpi = float(params.get("Figure_DPI", 300))
+ except (ValueError, TypeError):
+ dpi = 300
+ print(f"Warning: Invalid DPI value, using default {dpi}")
+
+ width_num = float(params.get("Figure_Width_inch", 6))
+ height_num = float(params.get("Figure_Height_inch", 6))
+
+ source_color_map = params.get("Source_Annotation_Color_Map", "tab20")
+ target_color_map = params.get("Target_Annotation_Color_Map", "tab20b")
+
+ try:
+ sankey_font = float(params.get("Font_Size", 12))
+ except (ValueError, TypeError):
+ sankey_font = 12
+ print(f"Warning: Invalid font size, using default {sankey_font}")
+
+ target_annotation = text_to_value(annotation_columns[1])
+ source_annotation = text_to_value(annotation_columns[0])
+
+ print(f"Creating Sankey plot: {source_annotation} -> {target_annotation}")
+
+ # Execute the sankey plot
+ fig = sankey_plot(
+ adata=adata,
+ source_annotation=source_annotation,
+ target_annotation=target_annotation,
+ source_color_map=source_color_map,
+ target_color_map=target_color_map,
+ sankey_font=sankey_font
+ )
+
+ # Customize the Sankey diagram layout
+ width_in_pixels = width_num * dpi
+ height_in_pixels = height_num * dpi
+
+ fig.update_layout(
+ width=width_in_pixels,
+ height=height_in_pixels
+ )
+
+ print("Sankey plot generated")
+
+ # IMPORTANT: No show() calls — causes hang in Galaxy
+ # plt.show() - REMOVED
+ # fig.show() - REMOVED
+
+ # Handle saving — always save to disk for Galaxy
+ if save_to_disk:
+ # Prepare results dictionary
+ results_dict = {}
+
+ # Save Plotly HTML (the actual interactive Sankey diagram)
+ if "html" in params["outputs"]:
+ html_content = pio.to_html(fig, full_html=True, include_plotlyjs='cdn')
+ results_dict["html"] = {"sankey_plot": html_content}
+ print("Plotly HTML prepared for saving")
+
+ if show_static_image:
+ # Generate a static matplotlib placeholder figure.
+ # Disabled by default on Galaxy because Plotly HTML-to-PNG
+ # conversion hangs in the Galaxy container environment.
+ # The interactive HTML is the first-class output.
+ print("Creating matplotlib figure...")
+ static_fig, ax = plt.subplots(
+ figsize=(width_num, height_num), dpi=dpi
+ )
+ ax.text(
+ 0.5, 0.6, 'Sankey Diagram',
+ ha='center', va='center', transform=ax.transAxes,
+ fontsize=16, fontweight='bold'
+ )
+ ax.text(
+ 0.5, 0.5,
+ f'{source_annotation} → {target_annotation}',
+ ha='center', va='center', transform=ax.transAxes,
+ fontsize=12
+ )
+ ax.text(
+ 0.5, 0.3,
+ 'View HTML output for interactive diagram',
+ ha='center', va='center', transform=ax.transAxes,
+ fontsize=10, style='italic'
+ )
+ ax.axis('off')
+ ax.add_patch(plt.Rectangle(
+ (0.1, 0.2), 0.8, 0.5,
+ fill=False, edgecolor='gray', linewidth=1,
+ transform=ax.transAxes
+ ))
+
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = {"sankey_plot": static_fig}
+ print("Matplotlib figure prepared for saving")
+
+ # Use centralized save_results function
+ print("Saving all results...")
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ if show_static_image:
+ plt.close(static_fig)
+
+ print("✓ Sankey Plot completed successfully")
+ print(f" Outputs saved: {list(saved_files.keys())}")
+
+ return saved_files
+ else:
+ # For non-Galaxy use (testing)
+ print("Returning None (display mode not supported)")
+ return None
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python sankey_plot_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ print("\n" + "="*60)
+ print("SANKEY PLOT - GALAXY PRODUCTION VERSION")
+ print("="*60 + "\n")
+
+ try:
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir,
+ save_to_disk=True # Always save for Galaxy
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files generated:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+
+ print("\n✓ SUCCESS - Job completed without hanging")
+ sys.exit(0)
+
+ except Exception as e:
+ print(f"\n✗ ERROR: {e}", file=sys.stderr)
+ import traceback
+ traceback.print_exc()
+ sys.exit(1)
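The Sankey template parses `Figure_DPI` and `Font_Size` with a fall-back-to-default `try`/`except`, then converts the figure size from inches to Plotly pixels as `width_num * dpi`. Below is a minimal sketch of that arithmetic. `parse_float_param` and `figure_size_pixels` are hypothetical helpers, not part of `spac.templates.template_utils`. Unlike the template, they also guard the width and height parameters.

```python
def parse_float_param(params, key, default):
    """Return params[key] as a float, or float(default) if it cannot be parsed.

    Hypothetical helper mirroring the try/except blocks the Sankey
    template uses for Figure_DPI and Font_Size.
    """
    try:
        return float(params.get(key, default))
    except (ValueError, TypeError):
        print(f"Warning: Invalid {key} value, using default {default}")
        return float(default)


def figure_size_pixels(params):
    """Convert Figure_Width_inch / Figure_Height_inch to pixels at Figure_DPI."""
    dpi = parse_float_param(params, "Figure_DPI", 300)
    width_in = parse_float_param(params, "Figure_Width_inch", 6)
    height_in = parse_float_param(params, "Figure_Height_inch", 6)
    # Plotly sizes layouts in pixels, so each dimension is scaled by DPI.
    return width_in * dpi, height_in * dpi


print(figure_size_pixels({"Figure_DPI": "150", "Figure_Width_inch": 8}))
# (1200.0, 900.0)
```

At the template defaults (6 in at 300 DPI), the Plotly layout is 1800 x 1800 px. `Figure_DPI` therefore directly scales the size of the interactive HTML view.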
diff --git a/src/spac/templates/select_values_template.py b/src/spac/templates/select_values_template.py
new file mode 100644
index 00000000..e84723a4
--- /dev/null
+++ b/src/spac/templates/select_values_template.py
@@ -0,0 +1,204 @@
+"""
+Platform-agnostic Select Values template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.select_values_template import run_from_json
+>>> run_from_json("examples/select_values_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Tuple
+import pandas as pd
+import warnings
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import select_values
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Select Values analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Annotation_of_Interest": "cell_type",
+ "Label_s_of_Interest": ["T cells", "B cells"],
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the filtered DataFrame
+ to a CSV file. If False, returns the DataFrame directly for in-memory
+ workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "dataframe": "path/to/dataframe.csv"
+ }
+ If save_to_disk=False: The filtered DataFrame
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["dataframe"]) # Path to saved CSV file
+
+ >>> # Get results in memory
+ >>> filtered_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # DataFrames typically use file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load upstream data: either an in-memory DataFrame or a path to a CSV file
+ upstream_dataset = params["Upstream_Dataset"]
+
+ if isinstance(upstream_dataset, pd.DataFrame):
+ input_dataset = upstream_dataset # Direct DataFrame from previous step
+ elif isinstance(upstream_dataset, (str, Path)):
+ try:
+ input_dataset = pd.read_csv(upstream_dataset)
+ except Exception as e:
+ raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}") from e
+ else:
+ raise TypeError(
+ f"Upstream_Dataset must be DataFrame or file path. "
+ f"Got {type(upstream_dataset)}"
+ )
+
+ # Extract parameters - support both "Label_s_of_Interest" and "Labels_of_Interest"
+ # for backward compatibility with JSON template
+ observation = params.get("Annotation_of_Interest")
+ values = params.get("Label_s_of_Interest") or params.get("Labels_of_Interest")
+
+ with warnings.catch_warnings(record=True) as caught_warnings:
+ warnings.simplefilter("always")
+ filtered_dataset = select_values(
+ data=input_dataset,
+ annotation=observation,
+ values=values
+ )
+ # Treat any warning from select_values other than DeprecationWarning as an error
+ if caught_warnings:
+ for warning in caught_warnings:
+ # Skip deprecation warnings from numpy/pandas
+ if (hasattr(warning, 'category') and
+ issubclass(warning.category, DeprecationWarning)):
+ continue
+ # Raise actual operational warnings as errors
+ if hasattr(warning, 'message'):
+ raise ValueError(str(warning.message))
+
+ logging.info("Filtered dataset: %d rows x %d columns", *filtered_dataset.shape)
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = filtered_dataset
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Select Values analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the dataframe directly for in-memory workflows
+ logging.info("Returning DataFrame for in-memory use")
+ return filtered_dataset
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python select_values_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
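The Select Values template records every warning raised inside `select_values`, ignores `DeprecationWarning`, and re-raises anything else as a `ValueError`. The sketch below isolates that policy in a hypothetical wrapper, `call_with_strict_warnings`, using only the standard `warnings` module. `demo_filter` is a toy stand-in for `select_values`. This illustrates the pattern and is not SPAC code.

```python
import warnings


def call_with_strict_warnings(func, *args, **kwargs):
    """Call func and turn any non-deprecation warning into a ValueError.

    Hypothetical wrapper illustrating the policy the Select Values
    template applies around select_values().
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" records repeated warnings instead of deduplicating them.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    for warning in caught:
        if issubclass(warning.category, DeprecationWarning):
            continue  # library deprecations are not treated as user errors
        raise ValueError(str(warning.message))
    return result


def demo_filter(labels, wanted):
    """Toy stand-in for select_values that warns about missing labels."""
    missing = [w for w in wanted if w not in labels]
    if missing:
        warnings.warn(f"Labels not found: {missing}", UserWarning)
    return [label for label in labels if label in wanted]


print(call_with_strict_warnings(demo_filter, ["T", "B", "NK"], ["T", "B"]))
# ['T', 'B']
```

`FutureWarning` and `PendingDeprecationWarning` are not subclasses of `DeprecationWarning`. This policy, like the template's, therefore treats them as errors. pandas emits `FutureWarning` for many upcoming API changes, so the skip list may need widening.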
diff --git a/src/spac/templates/setup_analysis_template.py b/src/spac/templates/setup_analysis_template.py
new file mode 100644
index 00000000..8bc8100b
--- /dev/null
+++ b/src/spac/templates/setup_analysis_template.py
@@ -0,0 +1,233 @@
+"""
+Platform-agnostic Setup Analysis template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where analysis is saved as a file.
+
+Usage
+-----
+>>> from spac.templates.setup_analysis_template import run_from_json
+>>> run_from_json("examples/setup_analysis_params.json")
+"""
+
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import ast
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import ingest_cells
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Setup Analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/data.csv",
+ "Features_to_Analyze": ["CD25", "CD3D"],
+ "Feature_Regex": [],
+ "X_Coordinate_Column": "X_centroid",
+ "Y_Coordinate_Column": "Y_centroid",
+ "Annotation_s_": ["cell_type"],
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the AnnData object
+ to a pickle file. If False, returns the AnnData object directly
+ for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object for in-memory use
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+ >>> # './output.pickle'
+
+ >>> # Get results in memory for further processing
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+ >>> # Can now work with adata object directly
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ # Get output filename from params or use default
+ output_file = params.get("Output_File", "output.pickle")
+ if not output_file.endswith(('.pickle', '.pkl', '.h5ad')):
+ output_file = output_file + '.pickle'
+
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": output_file}
+ }
+
+ # Extract parameters
+ upstream_dataset = params["Upstream_Dataset"]
+ feature_names = params["Features_to_Analyze"]
+ regex_str = params.get("Feature_Regex", [])
+ x_col = params["X_Coordinate_Column"]
+ y_col = params["Y_Coordinate_Column"]
+ annotation = params["Annotation_s_"]
+
+ # Load upstream data - could be DataFrame or CSV
+ if isinstance(upstream_dataset, (str, Path)):
+ try:
+ input_dataset = pd.read_csv(upstream_dataset)
+ # Reject an empty CSV file
+ if input_dataset.empty:
+ raise ValueError("CSV file is empty")
+ except Exception as e:
+ raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}")
+ else:
+ # Already a DataFrame
+ input_dataset = upstream_dataset
+
+ # Process annotation parameter
+ if isinstance(annotation, str):
+ annotation = [annotation]
+
+ if len(annotation) == 1 and annotation[0] == "None":
+ annotation = None
+
+ if annotation and len(annotation) != 1 and "None" in annotation:
+ error_msg = 'String "None" found in the annotation list'
+ raise ValueError(error_msg)
+
+ # Process coordinate columns
+ x_col = text_to_value(x_col, default_none_text="None")
+ y_col = text_to_value(y_col, default_none_text="None")
+
+ # Process feature names and regex
+ if isinstance(feature_names, str):
+ feature_names = [feature_names]
+ if isinstance(regex_str, str):
+ try:
+ regex_str = ast.literal_eval(regex_str)
+ except (ValueError, SyntaxError):
+ regex_str = [regex_str] if regex_str else []
+
+ # Combine both search methods: each feature name becomes an exact-match regex
+ for feature in feature_names:
+ regex_str.append(f"^{feature}$")
+
+ # Deduplicate the search patterns (set() does not preserve their order)
+ regex_str_set = set(regex_str)
+ regex_str_list = list(regex_str_set)
+
+ # Run the ingestion
+ ingested_anndata = ingest_cells(
+ dataframe=input_dataset,
+ regex_str=regex_str_list,
+ x_col=x_col,
+ y_col=y_col,
+ annotation=annotation
+ )
+
+ logging.info("Analysis Setup:")
+ logging.info(f"{ingested_anndata}")
+ logging.info("Schema:")
+ logging.info(f"{ingested_anndata.var_names.tolist()}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = ingested_anndata
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info(
+ f"Setup Analysis completed → {saved_files.get('analysis')}"
+ )
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logging.info("Returning AnnData object (not saving to file)")
+ return ingested_anndata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python setup_analysis_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, path in result.items():
+ print(f" {key}: {path}")
+ else:
+ print("\nReturned AnnData object for in-memory use")
+ print(f"AnnData: {result}")
+ print(f"Shape: {result.shape}")
\ No newline at end of file
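The Setup Analysis template merges two feature-selection inputs:

- `Feature_Regex`: a list, a list literal stored in a string, or a bare pattern.
- `Features_to_Analyze`: each name becomes the exact-match pattern `^{name}$`.

It then deduplicates with `set()`, so the patterns reach `ingest_cells` in an arbitrary order. Below is a sketch of the same logic with hypothetical helper names, assuming a first-seen order is preferable. It uses `dict.fromkeys` to keep that order and builds a new list rather than appending to the list stored in `params`.

```python
import ast


def parse_regex_param(value):
    """Normalize a Feature_Regex value to a new list of pattern strings.

    Accepts a list, a string holding a Python list literal such as
    "['^CD']", or a single bare pattern string.
    """
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a Python literal: the string itself is one pattern.
            return [value] if value else []
        if isinstance(parsed, (list, tuple)):
            return [str(p) for p in parsed]
        return [str(parsed)]
    # Copy so the caller's list (e.g. inside params) is never mutated.
    return list(value) if value else []


def build_search_patterns(feature_names, regex_param):
    """Merge exact feature names with user regexes, dropping duplicates."""
    if isinstance(feature_names, str):
        feature_names = [feature_names]
    patterns = parse_regex_param(regex_param)
    # Each explicit feature name becomes an exact-match pattern.
    patterns.extend(f"^{name}$" for name in feature_names)
    # dict.fromkeys keeps first-seen order, unlike set().
    return list(dict.fromkeys(patterns))


print(build_search_patterns(["CD25", "CD3D"], "['^CD', '^CD25$']"))
# ['^CD', '^CD25$', '^CD3D$']
```

Neither this sketch nor the template escapes the feature names. A marker name containing a regex metacharacter (for example `+` or `.`) may therefore match the wrong names or miss the intended one. Using `re.escape(name)` inside the f-string would address that.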
diff --git a/src/spac/templates/spatial_interaction_template.py b/src/spac/templates/spatial_interaction_template.py
new file mode 100644
index 00000000..bee16a41
--- /dev/null
+++ b/src/spac/templates/spatial_interaction_template.py
@@ -0,0 +1,324 @@
+"""
+Platform-agnostic Spatial Interaction template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where figures are saved as directories.
+
+Usage
+-----
+>>> from spac.templates.spatial_interaction_template import run_from_json
+>>> run_from_json("examples/spatial_interaction_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional, Tuple
+import pandas as pd
+import numpy as np
+from PIL import Image
+from pprint import pprint
+import matplotlib.pyplot as plt
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.spatial_analysis import spatial_interaction
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], Tuple[List[Any], Dict[str, pd.DataFrame]]]:
+ """
+ Execute Spatial Interaction analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Annotation": "cell_type",
+ "Spatial_Analysis_Method": "Neighborhood Enrichment",
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "dataframes": {"type": "directory", "name": "matrices"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves figures to a directory
+ and matrices to CSV files using centralized save_results. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str or Path, optional
+ Base directory for outputs. If None, uses params['Output_Directory'] or '.'
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary mapping output types to saved file paths
+ If save_to_disk=False: Tuple of (figures_list, matrices_dict) for in-memory use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params["Annotation"]
+ analysis_method = params["Spatial_Analysis_Method"]
+ # Two analysis methods available:
+ # 1. "Neighborhood Enrichment": Calculates how often pairs of cell types
+ # are neighbors compared to random chance. Positive scores indicate
+ # attraction/co-location, negative scores indicate avoidance.
+ # Output: z-scores (can be positive or negative)
+ # Files: neighborhood_enrichment_{identifier}.csv
+ # 2. "Cluster Interaction Matrix": Counts the number of edges/connections
+ # between different cell types in the spatial neighborhood graph.
+ # Shows absolute interaction frequencies rather than enrichment.
+ # Output: raw counts (always positive integers)
+ # Files: cluster_interaction_matrix_{identifier}.csv
+ # Both methods return the same data structure; only the values differ
+ stratify_by = params.get("Stratify_By", ["None"])
+ seed = params.get("Seed", "None")
+ coord_type = params.get("Coordinate_Type", "None")
+ n_rings = 1
+ n_neighs = params.get("K_Nearest_Neighbors", 6)
+ radius = params.get("Radius", "None")
+ image_width = params.get("Figure_Width", 15)
+ image_height = params.get("Figure_Height", 12)
+ dpi = params.get("Figure_DPI", 200)
+ font_size = params.get("Font_Size", 12)
+ color_bar_range = params.get("Color_Bar_Range", "Automatic")
+
+ def save_matrix(matrix):
+ for file_name in matrix:
+ data_df = matrix[file_name]
+ print("\n")
+ print(file_name)
+ print(data_df)
+ # In SPAC, collect matrices for later saving instead of writing
+ # them to disk here; append a .csv extension if it is missing.
+ if not file_name.endswith('.csv'):
+ file_name = f"{file_name}.csv"
+ matrices[file_name] = data_df
+
+ def update_nidap_display(
+ axs,
+ image_width,
+ image_height,
+ dpi,
+ font_size
+ ):
+ # NIDAP display logic differs from generic Python
+ # image output. For example, a 12in x 8in image with font size 12
+ # displays all text correctly as a generic image,
+ # but NIDAP Code Workbook resizing shrinks the text.
+ # This function adjusts the figure size and font sizes
+ # to fit the NIDAP display.
+ # Get the figure associated with the axes
+ fig = axs.get_figure()
+
+ # Set figure size and DPI
+ fig.set_size_inches(image_width, image_height)
+ fig.set_dpi(dpi)
+
+ # Customize font sizes
+ axs.title.set_fontsize(font_size) # Title font size
+ axs.xaxis.label.set_fontsize(font_size) # X-axis label font size
+ axs.yaxis.label.set_fontsize(font_size) # Y-axis label font size
+ axs.tick_params(axis='both', labelsize=font_size) # Tick labels
+ # Return the updated figure and axes for chaining or further use
+ # Note: This adjustment was specific to NIDAP display resizing
+ # behavior and may not be necessary in other environments
+ return fig, axs
+
+ for i, item in enumerate(stratify_by):
+ item_is_none = text_to_value(item)
+ if item_is_none is None and i == 0:
+ stratify_by = item_is_none
+ elif item_is_none is None and i != 0:
+ raise ValueError(
+ 'Found string "None" in the stratify by list that is '
+ 'not the first entry.\n'
+ 'Please remove the "None" to proceed with the list of '
+ 'stratify by options, \n'
+ 'or move the "None" to start of the list to disable '
+ 'stratification. Thank you.')
+
+ seed = text_to_value(seed, to_int=True)
+ radius = text_to_value(radius, to_float=True)
+ coord_type = text_to_value(coord_type)
+ color_bar_range = text_to_value(
+ color_bar_range,
+ "Automatic",
+ to_float=True)
+
+ if color_bar_range is not None:
+ cmap = "seismic"
+ vmin = -abs(color_bar_range)
+ vmax = abs(color_bar_range)
+ else:
+ cmap = "seismic"
+ vmin = vmax = color_bar_range
+
+ plt.rcParams['font.size'] = font_size
+
+ result_dictionary = spatial_interaction(
+ adata=adata,
+ annotation=annotation,
+ analysis_method=analysis_method,
+ stratify_by=stratify_by,
+ return_matrix=True,
+ seed=seed,
+ coord_type=coord_type,
+ n_rings=n_rings,
+ n_neighs=n_neighs,
+ radius=radius,
+ cmap=cmap,
+ vmin=vmin,
+ vmax=vmax,
+ figsize=(image_width, image_height),
+ dpi=dpi
+ )
+
+ # Track figures and matrices for optional saving
+ figures = []
+ matrices = {}
+
+ if not stratify_by:
+ axs = result_dictionary['Ax']
+ fig, axs = update_nidap_display(
+ axs=axs,
+ image_width=image_width,
+ image_height=image_height,
+ dpi=dpi,
+ font_size=font_size
+ )
+ figures.append(fig)
+ if show_plot:
+ plt.show()
+
+ matrix = result_dictionary['Matrix']['annotation']
+ save_matrix(matrix)
+ else:
+ plt.close(1)
+ axs_dict = result_dictionary['Ax']
+ for key in axs_dict:
+ axs = axs_dict[key]
+ fig, axs = update_nidap_display(
+ axs=axs,
+ image_width=image_width,
+ image_height=image_height,
+ dpi=dpi,
+ font_size=font_size
+ )
+ figures.append(fig)
+ if show_plot:
+ plt.show()
+
+ matrix_dict = result_dictionary['Matrix']
+ for identifier in matrix_dict:
+ matrix = matrix_dict[identifier]
+ save_matrix(matrix)
+
+ # Handle saving if requested (separate from NIDAP logic)
+ if save_to_disk:
+ # Ensure outputs configuration exists
+ if "outputs" not in params:
+ # Provide default outputs config if not present
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframes": {"type": "directory", "name": "matrices"}
+ }
+
+ # Prepare results dictionary
+ results_dict = {}
+
+ # Package figures in a dictionary for directory saving
+ if figures:
+ # Store figures with meaningful names
+ figures_dict = {}
+ for i, fig in enumerate(figures):
+ # Extract title if available for better naming
+ try:
+ ax = fig.axes[0] if fig.axes else None
+ title = ax.get_title() if ax and ax.get_title() else f"interaction_plot_{i+1}"
+ # Clean title for filename
+ title = title.replace(" ", "_").replace("/", "_").replace(":", "")
+ figures_dict[f"{title}.png"] = fig
+ except Exception:
+ figures_dict[f"interaction_plot_{i+1}.png"] = fig
+
+ results_dict["figures"] = figures_dict
+
+ # Add matrices (keys already end in .csv; see save_matrix)
+ if matrices:
+ results_dict["dataframes"] = matrices
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ # Close figures after saving to free memory
+ for fig in figures:
+ plt.close(fig)
+
+ print(
+ f"Spatial Interaction completed -> "
+ f"{list(saved_files.keys())}"
+ )
+ return saved_files
+ else:
+ # Return objects directly for in-memory workflows
+ return figures, matrices
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python spatial_interaction_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ figures_list, matrices_dict = result
+ print("\nReturned figures and matrices for in-memory use")
+ print(f"Number of figures: {len(figures_list)}")
+ print(f"Number of matrices: {len(matrices_dict)}")
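Two rules in the Spatial Interaction template are easy to miss:

- `Stratify_By` may contain the text `"None"` only as its first entry, which disables stratification.
- A numeric `Color_Bar_Range` becomes limits symmetric about zero for the diverging `seismic` colormap, while `"Automatic"` leaves both limits as `None`.

Below is a sketch of both rules. It assumes plain string comparison can stand in for `text_to_value`, whose exact matching rules are not shown here. The function names are hypothetical.

```python
def normalize_stratify_by(items, none_text="None"):
    """Apply the Spatial Interaction rule for Stratify_By.

    A leading "None" disables stratification (returns None); "None"
    anywhere after the first entry is rejected.
    """
    for i, item in enumerate(items):
        if item == none_text and i != 0:
            raise ValueError(
                'Found "None" in Stratify_By after the first entry; '
                'remove it or move it to the start of the list.'
            )
    if not items or items[0] == none_text:
        return None
    return list(items)


def symmetric_color_limits(color_bar_range, auto_text="Automatic"):
    """Return (vmin, vmax) centred on zero, or (None, None) for automatic."""
    if color_bar_range is None or color_bar_range == auto_text:
        return None, None
    bound = abs(float(color_bar_range))
    return -bound, bound


print(normalize_stratify_by(["slide_id", "roi"]))  # ['slide_id', 'roi']
print(symmetric_color_limits("2.5"))               # (-2.5, 2.5)
```

Centering the range on zero keeps a neighborhood-enrichment z-score of 0 at the white midpoint of `seismic`, so attraction reads as red and avoidance as blue. For the Cluster Interaction Matrix, whose counts are never negative, only the upper half of the colormap is used.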
diff --git a/src/spac/templates/spatial_plot_template.py b/src/spac/templates/spatial_plot_template.py
new file mode 100644
index 00000000..deb93239
--- /dev/null
+++ b/src/spac/templates/spatial_plot_template.py
@@ -0,0 +1,271 @@
+"""
+Platform-agnostic Spatial Plot template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.spatial_plot_template import run_from_json
+>>> run_from_json("examples/spatial_plot_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import matplotlib.pyplot as plt
+from functools import partial
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import spatial_plot
+from spac.data_utils import select_values
+from spac.utils import check_annotation
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plots: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], List[plt.Figure]]:
+ """
+ Execute Spatial Plot analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Stratify": true,
+ "Stratify_By": ["slide_id"],
+ "Color_By": "Annotation",
+ ...
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the figures
+ directly for in-memory workflows. Default is True.
+ show_plots : bool, optional
+ Whether to display the plots. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or list
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: List of matplotlib figures
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters exactly as in NIDAP template
+ annotation = params.get("Annotation_to_Highlight", "None")
+ feature = params.get("Feature_to_Highlight", "")
+ layer = params.get("Table", "Original")
+
+ alpha = params.get("Dot_Transparency", 0.5)
+ spot_size = params.get("Dot_Size", 25)
+ image_height = params.get("Figure_Height", 6)
+ image_width = params.get("Figure_Width", 12)
+ dpi = params.get("Figure_DPI", 200)
+ font_size = params.get("Font_Size", 12)
+ vmin = params.get("Lower_Colorbar_Bound", 999)
+ vmax = params.get("Upper_Colorbar_Bound", -999)
+ color_by = params.get("Color_By", "Annotation")
+ stratify = params.get("Stratify", True)
+ stratify_by = params.get("Stratify_By", [])
+
+ if stratify and len(stratify_by) == 0:
+ raise ValueError(
+ 'Please set at least one annotation in the "Stratify By" '
+ 'option, or set the "Stratify" to False.'
+ )
+
+ if stratify:
+ check_annotation(
+ adata,
+ annotations=stratify_by
+ )
+
+ # Process feature and annotation with text_to_value
+ feature = text_to_value(feature)
+ annotation = text_to_value(annotation)
+
+ if color_by == "Annotation":
+ feature = None
+ else:
+ annotation = None
+
+ layer = text_to_value(layer, "Original")
+
+ prefilled_spatial = partial(
+ spatial_plot,
+ spot_size=spot_size,
+ alpha=alpha,
+ vmin=vmin,
+ vmax=vmax,
+ annotation=annotation,
+ feature=feature,
+ layer=layer
+ )
+
+ # Track figures for saving
+ figures_dict = {}
+
+ if not stratify:
+ plt.rcParams['font.size'] = font_size
+ fig, ax = plt.subplots(
+ figsize=(image_width, image_height), dpi=dpi
+ )
+
+ ax = prefilled_spatial(adata=adata, ax=ax)
+
+ if color_by == "Annotation":
+ title = f'Annotation: {annotation}'
+ else:
+ title = f'Table:"{layer}" \n Feature:"{feature}"'
+ ax[0].set_title(title)
+
+ figures_dict["spatial_plot"] = fig
+
+ if show_plots:
+ plt.show()
+ else:
+ combined_label = "concatenated_label"
+
+ adata.obs[combined_label] = adata.obs[stratify_by].astype(str).agg(
+ '_'.join, axis=1
+ )
+
+ unique_values = adata.obs[combined_label].unique()
+
+ logger.info(f"Unique stratification values: {unique_values}")
+
+ max_length = min(len(unique_values), 20)
+ if len(unique_values) > 20:
+ logger.warning(
+ f'There are "{len(unique_values)}" unique plots, '
+ 'displaying only the first 20 plots.'
+ )
+
+ for idx, value in enumerate(unique_values[:max_length]):
+ filtered_adata = select_values(
+ data=adata, annotation=combined_label, values=value
+ )
+
+ fig, ax = plt.subplots(
+ figsize=(image_width, image_height), dpi=dpi
+ )
+
+ ax = prefilled_spatial(adata=filtered_adata, ax=ax)
+
+ if color_by == "Annotation":
+ title = f'Annotation: {annotation}'
+ else:
+ title = f'Table:"{layer}" \n Feature:"{feature}"'
+ title = f'{title}\n Stratify by: {value}'
+ ax[0].set_title(title)
+
+ # Use sanitized value for figure name
+ safe_value = str(value).replace('/', '_').replace('\\', '_')
+ figures_dict[f"spatial_plot_{safe_value}"] = fig
+
+ if show_plots:
+ plt.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for figures output
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = figures_dict
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ # Close figures after saving
+ for fig in figures_dict.values():
+ plt.close(fig)
+
+ logger.info("Spatial Plot analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the figures directly for in-memory workflows
+ logger.info("Returning figures for in-memory use")
+ return list(figures_dict.values())
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python spatial_plot_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print(f"\nReturned {len(result)} figures")
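When `stratify` is set, the spatial plot template joins the `stratify_by` columns into one label per cell with `_`, draws one figure per unique label (at most 20), and replaces path separators in each label before using it as a figure key. The sketch below covers only that labelling step. It uses plain pandas on an `obs`-like DataFrame, and `combined_strata` and `safe_figure_key` are hypothetical helper names, not SPAC functions.

```python
from typing import List

import pandas as pd

MAX_PLOTS = 20  # same cap the template hard-codes


def combined_strata(obs: pd.DataFrame, stratify_by: List[str]) -> List[str]:
    """Hypothetical helper: join the stratify_by columns row-wise with '_'
    and return the unique labels in first-seen order, capped at MAX_PLOTS."""
    labels = obs[stratify_by].astype(str).agg("_".join, axis=1)
    return list(labels.unique()[:MAX_PLOTS])


def safe_figure_key(value: str) -> str:
    """Hypothetical helper: build the figures_dict key, replacing path
    separators so the label can double as a file name."""
    return "spatial_plot_" + str(value).replace("/", "_").replace("\\", "_")


obs = pd.DataFrame({
    "slide": ["s1", "s1", "s2", "s2"],
    "region": ["tumor", "stroma", "tumor", "tumor"],
})
print(combined_strata(obs, ["slide", "region"]))
# ['s1_tumor', 's1_stroma', 's2_tumor']
```

`unique()` keeps first-appearance order, so when there are more than 20 labels, the first 20 encountered in `obs` are plotted, not the first 20 in sorted order.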
diff --git a/src/spac/templates/subset_analysis_template.py b/src/spac/templates/subset_analysis_template.py
new file mode 100644
index 00000000..e32286de
--- /dev/null
+++ b/src/spac/templates/subset_analysis_template.py
@@ -0,0 +1,219 @@
+"""
+Platform-agnostic Subset Analysis template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.subset_analysis_template import run_from_json
+>>> run_from_json("examples/subset_analysis_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Tuple
+import pandas as pd
+import warnings
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+# Import SPAC functions from NIDAP template
+from spac.data_utils import select_values
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Subset Analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Annotation_of_interest": "cell_type",
+ "Labels": ["T cells", "B cells"],
+ "Include_Exclude": "Include Selected Labels",
+ "outputs": {
+ "analysis": {"type": "file", "name": "transform_output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the filtered AnnData
+ to a pickle file. If False, returns the AnnData object directly for
+ in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "analysis": "path/to/transform_output.pickle"
+ }
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+
+ >>> # Get results in memory
+ >>> filtered_adata = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis outputs use file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "transform_output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ # Use direct dictionary access for required parameters (NIDAP style)
+ annotation = params["Annotation_of_interest"]
+ labels = params["Labels"]
+
+ # Use .get() with defaults for optional parameters from JSON template
+ toggle = params.get("Include_Exclude", "Include Selected Labels")
+
+ if toggle == "Include Selected Labels":
+ values_to_include = labels
+ values_to_exclude = None
+ else:
+ values_to_include = None
+ values_to_exclude = labels
+
+ with warnings.catch_warnings(record=True) as caught_warnings:
+ warnings.simplefilter("always")
+ filtered_adata = select_values(
+ data=adata,
+ annotation=annotation,
+ values=values_to_include,
+ exclude_values=values_to_exclude
+ )
+ # Only process warnings that are relevant to the select_values operation
+ if caught_warnings:
+ for warning in caught_warnings:
+ # Skip deprecation warnings from numpy/pandas
+ if (hasattr(warning, 'category') and
+ issubclass(warning.category, DeprecationWarning)):
+ continue
+ # Raise actual operational warnings as errors
+ if hasattr(warning, 'message'):
+ raise ValueError(str(warning.message))
+
+ logging.info(filtered_adata)
+ logging.info("\n")
+
+ # Count and display occurrences of each label in the annotation
+ label_counts = filtered_adata.obs[annotation].value_counts()
+ logging.info(label_counts)
+ logging.info("\n")
+
+ dataframe = pd.DataFrame(
+ filtered_adata.X,
+ columns=filtered_adata.var.index,
+ index=filtered_adata.obs.index
+ )
+ logging.info(dataframe.describe())
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for analysis output
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = filtered_adata
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Subset Analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logging.info("Returning AnnData object for in-memory use")
+ return filtered_adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python subset_analysis_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned AnnData object")
+ print(f"AnnData shape: {result.shape}")
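The subset template maps the `Include_Exclude` toggle to either `values` or `exclude_values`. It then re-raises any non-deprecation warning from `select_values` as a `ValueError`, so a problem that `select_values` only warns about stops the run instead of producing an unexpected subset. The sketch below shows the same pattern using only the standard `warnings` module. `include_exclude`, `call_strict` and `fake_select` are hypothetical names. `fake_select` is an assumed stand-in that warns on unknown labels; it does not reproduce `select_values`' actual messages.

```python
import warnings
from typing import Any, Callable, List, Optional, Tuple


def include_exclude(
    labels: List[str], toggle: str = "Include Selected Labels"
) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Map the Include_Exclude toggle to (values, exclude_values) the way
    the template does: any text other than the include option excludes."""
    if toggle == "Include Selected Labels":
        return labels, None
    return None, labels


def call_strict(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run func and re-raise the first non-deprecation warning it emits as
    a ValueError; DeprecationWarning and its subclasses are ignored."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeats, not just the first
        result = func(*args, **kwargs)
    for w in caught:
        if issubclass(w.category, DeprecationWarning):
            continue
        raise ValueError(str(w.message))
    return result


def fake_select(values: List[str]) -> List[str]:
    """Hypothetical stand-in for select_values: warns on an unknown label."""
    known = {"T cells", "B cells"}
    missing = [v for v in values if v not in known]
    if missing:
        warnings.warn(f"Labels not found: {missing}")
    return [v for v in values if v in known]


values, _ = include_exclude(["T cells", "NK cells"])
try:
    call_strict(fake_select, values)
except ValueError as err:
    print(f"Rejected: {err}")
```

The template skips only `DeprecationWarning`. A `FutureWarning` from pandas or numpy would therefore still be raised as an error under this scheme.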
diff --git a/src/spac/templates/summarize_annotation_statistics_template.py b/src/spac/templates/summarize_annotation_statistics_template.py
new file mode 100644
index 00000000..04557b35
--- /dev/null
+++ b/src/spac/templates/summarize_annotation_statistics_template.py
@@ -0,0 +1,185 @@
+"""
+Platform-agnostic Summarize Annotation's Statistics template converted from
+NIDAP. Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.summarize_annotation_statistics_template import (
+... run_from_json)
+>>> run_from_json("examples/summarize_annotation_statistics_params.json")
+"""
+import json
+import sys
+import logging
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional
+import pandas as pd
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import get_cluster_info
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], pd.DataFrame]:
+ """
+ Execute Summarize Annotation's Statistics analysis with parameters from
+ JSON. Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "Annotation": "phenotype",
+ "Feature_s_": ["All"],
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the dataframe
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or DataFrame
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"dataframe": "path/to/dataframe.csv"}
+ If save_to_disk=False: The processed DataFrame
+
+ Notes
+ -----
+ Output Structure:
+ - DataFrame is saved as a single CSV file
+ - When save_to_disk=False, the DataFrame is returned for programmatic use
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ layer = params.get("Table_to_Process", "Original")
+ features = params.get("Feature_s_", ["All"])
+ annotation = params.get("Annotation", "None")
+
+ if layer == "Original":
+ layer = None
+
+ if len(features) == 1 and features[0] == "All":
+ features = None
+
+ if annotation == "None":
+ annotation = None
+
+ info = get_cluster_info(
+ adata=adata,
+ layer=layer,
+ annotation=annotation,
+ features=features
+ )
+
+ df = pd.DataFrame(info)
+
+ # Rename columns, replacing spaces and hyphens with underscores
+ df.columns = [
+ col.replace(" ", "_").replace("-", "_") for col in df.columns
+ ]
+
+ # Get summary statistics of returned dataset
+ logger.info(f"Summary statistics of the dataset:\n{df.describe()}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "dataframe" in params["outputs"]:
+ results_dict["dataframe"] = df
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info(
+ "Summarize Annotation's Statistics analysis completed successfully."
+ )
+ return saved_files
+ else:
+ # Return the dataframe directly for in-memory workflows
+ logger.info("Returning DataFrame for in-memory use")
+ return df
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python summarize_annotation_statistics_template.py "
+ "<params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned DataFrame")
+ print(f"DataFrame shape: {result.shape}")
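Before calling `get_cluster_info`, the annotation-statistics template converts the UI sentinel values to `None`: `"Original"` for the table, `["All"]` for features and `"None"` for the annotation. After the call, it normalises the output column names. The sketch below covers only that parameter and column handling and has no SPAC dependency. `normalize_stat_params` and `clean_columns` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional, Tuple


def normalize_stat_params(
    params: Dict[str, Any]
) -> Tuple[Optional[str], Optional[List[str]], Optional[str]]:
    """Return (layer, features, annotation) with the template's sentinel
    values ("Original", ["All"], "None") replaced by None."""
    layer = params.get("Table_to_Process", "Original")
    features = params.get("Feature_s_", ["All"])
    annotation = params.get("Annotation", "None")

    if layer == "Original":
        layer = None  # no named layer is passed on
    if len(features) == 1 and features[0] == "All":
        features = None  # no feature filter is passed on
    if annotation == "None":
        annotation = None
    return layer, features, annotation


def clean_columns(columns: List[str]) -> List[str]:
    """Replace spaces and hyphens with underscores, as the template does."""
    return [c.replace(" ", "_").replace("-", "_") for c in columns]


print(normalize_stat_params({"Annotation": "phenotype"}))
# (None, None, 'phenotype')
```

`["All"]` is treated as a sentinel only when it is the sole entry. A list such as `["All", "CD4"]` is passed through unchanged.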
diff --git a/src/spac/templates/summarize_dataframe_template.py b/src/spac/templates/summarize_dataframe_template.py
new file mode 100644
index 00000000..92a43e0d
--- /dev/null
+++ b/src/spac/templates/summarize_dataframe_template.py
@@ -0,0 +1,207 @@
+"""
+Platform-agnostic Summarize DataFrame template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.summarize_dataframe_template import run_from_json
+>>> run_from_json("examples/summarize_dataframe_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional, Tuple
+import pandas as pd
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.data_utils import summarize_dataframe
+from spac.visualization import present_summary_as_figure
+from spac.templates.template_utils import (
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+ show_plot: bool = False,
+) -> Union[Dict[str, str], Tuple[Any, pd.DataFrame]]:
+ """
+ Execute Summarize DataFrame analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Dataset": "path/to/dataframe.csv",
+ "Columns": ["col1", "col2"],
+ "Print_Missing_Location": false,
+ "outputs": {
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the HTML summary
+ to a directory. If False, returns the figure and dataframe directly
+ for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+ show_plot : bool, optional
+ Whether to display the plot interactively. Default is False.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {
+ "html": ["path/to/html_dir/summary.html"]
+ }
+ If save_to_disk=False: Tuple of (figure, summary_dataframe)
+
+ Notes
+ -----
+ Output Structure:
+ - HTML is saved to a directory as specified in outputs config
+ - When save_to_disk=False, returns (figure, summary_df) for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["html"]) # List of paths to saved HTML files
+
+ >>> # Get results in memory
+ >>> fig, summary_df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory with interactive display
+ >>> saved = run_from_json("params.json", output_dir="/custom/path", show_plot=True)
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # HTML outputs use directory type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+
+ # Load upstream data - DataFrame or CSV file
+ # Corrected "Calculate_Centroids" to "Upstream_Dataset" in the blueprint
+ input_path = params.get("Upstream_Dataset")
+ if isinstance(input_path, pd.DataFrame):
+ df = input_path # Direct DataFrame from previous step
+ elif isinstance(input_path, (str, Path)):
+ # Galaxy passes .dat files, but they contain CSV data
+ # Don't check extension - directly read as CSV
+ path = Path(input_path)
+ try:
+ df = pd.read_csv(path)
+ logging.info(f"Successfully loaded CSV data from: {path}")
+ except Exception as e:
+ raise ValueError(
+ f"Failed to read CSV data from '{path}'. "
+ f"This tool expects CSV/tabular format. "
+ f"Error: {str(e)}"
+ ) from e
+ else:
+ raise TypeError(
+ f"Input dataset must be DataFrame or file path. "
+ f"Got {type(input_path)}"
+ )
+
+ # Extract parameters
+ columns = params["Columns"]
+ print_missing_location = params.get("Print_Missing_Location", False)
+
+ # Run the analysis exactly as in NIDAP template
+ summary = summarize_dataframe(
+ df,
+ columns=columns,
+ print_nan_locations=print_missing_location
+ )
+
+ # Generate figure from the summary
+ fig = present_summary_as_figure(summary)
+
+ if show_plot:
+ fig.show() # Opens in an interactive Plotly window
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for html output - convert figure to HTML string
+ if "html" in params["outputs"]:
+ # Convert Plotly figure to HTML string for save_results
+ html_content = fig.to_html(full_html=True, include_plotlyjs='cdn')
+ results_dict["html"] = {"summary": html_content}
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info("Summarize DataFrame analysis completed successfully.")
+ return saved_files
+ else:
+ # Return the figure and summary dataframe directly for in-memory workflows
+ logging.info("Returning figure and dataframe for in-memory use")
+ return fig, summary
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python summarize_dataframe_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned figure and dataframe")
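Every template above passes its outputs to `save_results` in `template_utils.py`. That function matches each result key to an `outputs` entry case-insensitively. When the entry omits `type`, it infers one from the key name, with "neighborhood profile" dataframes as the documented exception. The sketch below reproduces only that decision logic, following the order of checks in `save_results`, and writes nothing to disk. `resolve_output` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_output(
    result_key: str,
    data: Any,
    outputs_config: Dict[str, Dict[str, str]],
) -> Optional[Tuple[str, str]]:
    """Return (output_type, output_name) for one result, or None when no
    outputs entry matches (save_results logs a warning and skips it)."""
    config = None
    for key, value in outputs_config.items():
        if key.lower() == result_key.lower():  # case-insensitive match
            config = value
            break
    if not config:
        return None

    output_type = config.get("type")
    output_name = config.get("name", result_key)
    if not output_type:
        key = result_key.lower()
        name = output_name.lower()
        if "figures" in key:
            output_type = "directory"
        elif "analysis" in key:
            output_type = "file"
        elif "dataframe" in key:
            # Neighborhood Profile tables are the documented exception.
            is_profile = "neighborhood" in name and "profile" in name
            output_type = "directory" if is_profile else "file"
        elif "html" in key:
            output_type = "directory"
        else:
            # Fallback: dicts/lists become directories, anything else a file.
            is_collection = isinstance(data, (dict, list))
            output_type = "directory" if is_collection else "file"
    return output_type, output_name


config = {
    "Figures": {"name": "figure_outputs"},
    "dataframe": {"name": "neighborhood_profile"},
    "analysis": {"type": "file", "name": "output.pickle"},
}
print(resolve_output("figures", {}, config))
# ('directory', 'figure_outputs')
```

An explicit `"type"` in the blueprint always takes precedence. The name-based rules apply only when the entry omits it.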
diff --git a/src/spac/templates/template_utils.py b/src/spac/templates/template_utils.py
new file mode 100644
index 00000000..7c7e9872
--- /dev/null
+++ b/src/spac/templates/template_utils.py
@@ -0,0 +1,876 @@
+from pathlib import Path
+import pickle
+from typing import Any, Dict, Union, Optional, List
+import json
+import pandas as pd
+import anndata as ad
+import re
+import logging
+import matplotlib.pyplot as plt
+
+logger = logging.getLogger(__name__)
+
+
+def load_input(file_path: Union[str, Path]):
+ """
+ Load input data from either h5ad or pickle file.
+
+ Parameters
+ ----------
+ file_path : str or Path
+ Path to input file (h5ad or pickle)
+
+ Returns
+ -------
+ Loaded data object (typically AnnData)
+ """
+ path = Path(file_path)
+
+ if not path.exists():
+ raise FileNotFoundError(f"Input file not found: {file_path}")
+
+ # Check file extension
+ suffix = path.suffix.lower()
+
+ if suffix in ['.h5ad', '.h5']:
+ # Load h5ad file
+ try:
+ return ad.read_h5ad(path)
+ except ImportError:
+ raise ImportError(
+ "anndata package required to read h5ad files"
+ )
+ except Exception as e:
+ raise ValueError(f"Error reading h5ad file: {e}")
+
+ elif suffix in ['.pickle', '.pkl', '.p']:
+ # Load pickle file
+ with path.open('rb') as fh:
+ return pickle.load(fh)
+
+ else:
+ # Try to detect file type by content
+ try:
+ # First try h5ad
+ return ad.read_h5ad(path)
+ except Exception:
+ # Fall back to pickle
+ try:
+ with path.open('rb') as fh:
+ return pickle.load(fh)
+ except Exception as e:
+ raise ValueError(
+ f"Unable to load file '{file_path}'. "
+ f"Supported formats: h5ad, pickle. Error: {e}"
+ )
+
+
+def save_results(
+ results: Dict[str, Any],
+ params: Dict[str, Any],
+ output_base_dir: Union[str, Path] = None
+) -> Dict[str, Union[str, List[str]]]:
+ """
+ Save results based on output configuration in params.
+
+ This function reads the output configuration from the params dictionary
+ and saves results accordingly. It applies a standardized schema where:
+ - figures → directory (may contain one or many)
+ - analysis → file
+ - dataframe → file (or directory for exceptions like "Neighborhood Profile")
+ - html → directory
+
+ Parameters
+ ----------
+ results : dict
+ Dictionary of results to save where:
+ - key: result type ("analysis", "dataframe", "figures", "html")
+ - value: object(s) to save (single object, list, or dict of objects)
+ params : dict
+ Parameters dict containing 'outputs' configuration with structure:
+ {
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "html": {"type": "directory", "name": "html_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ output_base_dir : str or Path, optional
+ Base directory for outputs. If None, uses params['Output_Directory'] or '.'
+
+ Returns
+ -------
+ dict
+ Dictionary mapping output types to saved file paths:
+ - For files: string path
+ - For directories: list of string paths
+
+ Examples
+ --------
+ >>> params = {
+ ... "outputs": {
+ ... "figures": {"type": "directory", "name": "figure_outputs"},
+ ... "dataframe": {"type": "file", "name": "summary.csv"}
+ ... }
+ ... }
+ >>> results = {"figures": {"boxplot": fig}, "dataframe": df}
+ >>> saved = save_results(results, params)
+ """
+ # Get output directory from params if not provided
+ if output_base_dir is None:
+ output_base_dir = params.get("Output_Directory", ".")
+ output_base_dir = Path(output_base_dir)
+
+ # Get outputs config from params
+ outputs_config = params.get("outputs", {})
+ if not outputs_config:
+ logger.warning("No outputs configuration found in params")
+ return {}
+
+ saved_files = {}
+
+ # Process each result based on configuration
+ for result_key, data in results.items():
+ # Find matching config (case-insensitive match)
+ config = None
+ config_key = None
+
+ for key, value in outputs_config.items():
+ if key.lower() == result_key.lower():
+ config = value
+ config_key = key
+ break
+
+ if not config:
+ logger.warning(f"No output config for '{result_key}', skipping")
+ continue
+
+ # Determine output type and name
+ output_type = config.get("type")
+ output_name = config.get("name", result_key)
+
+ # Apply standardized schema if type not explicitly specified
+ if not output_type:
+ result_key_lower = result_key.lower()
+ if "figures" in result_key_lower:
+ output_type = "directory"
+ elif "analysis" in result_key_lower:
+ output_type = "file"
+ elif "dataframe" in result_key_lower:
+ # Special case: Neighborhood Profile gets directory treatment
+ if "neighborhood" in output_name.lower() and "profile" in output_name.lower():
+ output_type = "directory"
+ else:
+ output_type = "file"
+ elif "html" in result_key_lower:
+ output_type = "directory"
+ else:
+ # Default based on data structure
+ output_type = "directory" if isinstance(data, (dict, list)) else "file"
+
+ logger.debug(f"Auto-determined type '{output_type}' for '{result_key}'")
+
+ # Save based on determined type
+ if output_type == "directory":
+ # Create directory and save multiple files
+ output_dir = output_base_dir / output_name
+ output_dir.mkdir(parents=True, exist_ok=True)
+ saved_files[config_key or result_key] = []
+
+ if isinstance(data, dict):
+ # Dictionary of named items
+ for name, obj in data.items():
+ filepath = _save_single_object(obj, name, output_dir)
+ saved_files[config_key or result_key].append(str(filepath))
+
+ elif isinstance(data, (list, tuple)):
+ # List of items - auto-name them
+ for idx, obj in enumerate(data):
+ name = f"{result_key}_{idx}"
+ filepath = _save_single_object(obj, name, output_dir)
+ saved_files[config_key or result_key].append(str(filepath))
+
+ else:
+ # Single item saved to directory
+ filepath = _save_single_object(data, result_key, output_dir)
+ saved_files[config_key or result_key] = [str(filepath)]
+
+ elif output_type == "file":
+ # Save as single file
+ output_path = output_base_dir / output_name
+ output_path.parent.mkdir(parents=True, exist_ok=True)
+
+ # Handle different file types based on extension
+ if output_name.endswith('.pickle'):
+ with open(output_path, 'wb') as f:
+ pickle.dump(data, f)
+
+ elif output_name.endswith('.csv'):
+ if isinstance(data, pd.DataFrame):
+ data.to_csv(output_path, index=False)
+ else:
+ # Convert to DataFrame if possible
+ df = pd.DataFrame(data)
+ df.to_csv(output_path, index=False)
+
+ elif output_name.endswith('.h5ad'):
+ if hasattr(data, 'write_h5ad'):
+ data.write_h5ad(str(output_path))
+
+ elif output_name.endswith('.html'):
+ with open(output_path, 'w') as f:
+ f.write(str(data))
+
+ elif output_name.endswith(('.png', '.pdf', '.svg')):
+ if hasattr(data, 'savefig'):
+ data.savefig(output_path, dpi=300, bbox_inches='tight')
+ plt.close(data) # Close figure to free memory
+
+ else:
+ # Default to pickle for unknown types
+ if not output_name.endswith('.pickle'):
+ output_path = output_path.with_suffix('.pickle')
+ with open(output_path, 'wb') as f:
+ pickle.dump(data, f)
+
+ saved_files[config_key or result_key] = str(output_path)
+
+ # Log summary of saved files
+ logger.info(f"Results saved to {output_base_dir}:")
+ for key, paths in saved_files.items():
+ if isinstance(paths, list):
+ output_name = outputs_config.get(key, {}).get('name', key)
+ logger.info(f" {key}: {len(paths)} files in {output_base_dir}/{output_name}/")
+ for path in paths[:3]: # Show first 3 files
+ logger.debug(f" - {Path(path).name}")
+ if len(paths) > 3:
+ logger.debug(f" ... and {len(paths) - 3} more files")
+ else:
+ logger.info(f" {key}: {Path(paths).name}")
+
+ return saved_files
+
+
+def _save_single_object(obj: Any, name: str, output_dir: Path) -> Path:
+ """
+ Save a single object to file with appropriate format.
+ Internal helper function for save_results.
+
+ Parameters
+ ----------
+ obj : Any
+ Object to save
+ name : str
+ Base name for the file (extension will be added if needed)
+ output_dir : Path
+ Directory to save to
+
+ Returns
+ -------
+ Path
+ Path to saved file
+ """
+ # Determine file format based on object type
+ if isinstance(obj, pd.DataFrame):
+ # DataFrames -> CSV
+ if not name.endswith('.csv'):
+ name = f"{name}.csv"
+ filepath = output_dir / name
+ obj.to_csv(filepath, index=False)
+
+ elif hasattr(obj, 'savefig'):
+ # Matplotlib figures -> PNG only
+ if not name.endswith('.png'):
+ name = f"{name}.png"
+ filepath = output_dir / name
+ obj.savefig(filepath, dpi=300, bbox_inches='tight')
+ plt.close(obj) # Close figure to free memory
+
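+ elif isinstance(obj, str) and (
+ '<html' in obj.lower() or '<!doctype' in obj.lower()
+ ):
+ # HTML strings -> HTML file
+ if not name.endswith('.html'):
+ name = f"{name}.html"
+ filepath = output_dir / name
+ with open(filepath, 'w') as f:
+ f.write(obj)
+
+ elif isinstance(obj, ad.AnnData):
+ # AnnData -> pickle (for consistency, could be h5ad)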
+ if not name.endswith('.pickle'):
+ name = f"{name}.pickle"
+ filepath = output_dir / name
+ with open(filepath, 'wb') as f:
+ pickle.dump(obj, f)
+
+ else:
+ # Everything else -> pickle
+ if '.' not in name:
+ name = f"{name}.pickle"
+ filepath = output_dir / name
+ with open(filepath, 'wb') as f:
+ pickle.dump(obj, f)
+
+ logger.debug(f"Saved {type(obj).__name__} to {filepath}")
+ return filepath
+
+
+def parse_params(
+ json_input: Union[str, Path, Dict[str, Any]]
+) -> Dict[str, Any]:
+ """
+ Parse parameters from JSON file, string, or dict.
+
+ Parameters
+ ----------
+ json_input : str, Path, or dict
+ JSON file path, JSON string, or dictionary
+
+ Returns
+ -------
+ dict
+ Parsed parameters
+ """
+ if isinstance(json_input, dict):
+ return json_input
+
+ if isinstance(json_input, (str, Path)):
+ path = Path(json_input)
+
+ # Check if it's a file path
+ if path.exists() or str(json_input).endswith('.json'):
+ with open(path, 'r') as file:
+ return json.load(file)
+ else:
+ # It's a JSON string
+ return json.loads(str(json_input))
+
+ raise TypeError(
+ "json_input must be dict, JSON string, or path to JSON file"
+ )
+
+
+def text_to_value(
+ var: Any,
+ default_none_text: str = "None",
+ value_to_convert_to: Any = None,
+ to_float: bool = False,
+ to_int: bool = False,
+ param_name: str = ''
+):
+ """
+ Converts a string to a specified value or type. Handles conversion to
+ float or integer and provides a default value if the input string
+ matches a specified 'None' text.
+
+ Parameters
+ ----------
+ var : Any
+ The value to convert; non-string inputs are passed through str() first.
+ default_none_text : str, optional
+ The string that represents a 'None' value. If `var` matches this
+ string, it will be converted to `value_to_convert_to`.
+ Default is "None".
+ value_to_convert_to : any, optional
+ The value to assign to `var` if it matches `default_none_text` or
+ is an empty string. Default is None.
+ to_float : bool, optional
+ If True, attempt to convert `var` to a float. Default is False.
+ to_int : bool, optional
+ If True, attempt to convert `var` to an integer. Default is False.
+ param_name : str, optional
+ The name of the parameter, used in error messages for conversion
+ failures. Default is ''.
+
+ Returns
+ -------
+ any
+ The converted value, which may be the original string, a float,
+ an integer, or the specified `value_to_convert_to`.
+
+ Raises
+ ------
+ ValueError
+ If `to_float` or `to_int` is set to True and conversion fails.
+
+ Notes
+ -----
+ - If both `to_float` and `to_int` are set to True, the function will
+ prioritize conversion to float.
+ - If the string `var` matches `default_none_text` or is an empty
+ string, `value_to_convert_to` is returned.
+
+ Examples
+ --------
+ Convert a string representing a float:
+
+ >>> text_to_value("3.14", to_float=True)
+ 3.14
+
+ Handle a 'None' string:
+
+ >>> text_to_value("None", value_to_convert_to=None) is None
+ True
+
+ Convert a string to an integer:
+
+ >>> text_to_value("42", to_int=True)
+ 42
+
+ Handle invalid conversion:
+
+ >>> text_to_value("abc", to_int=True, param_name="test_param")
+ Traceback (most recent call last):
+ ValueError: Error: can't convert test_param to integer. Received:"abc"
+ """
+ # Handle non-string inputs
+ if not isinstance(var, str):
+ var = str(var)
+
+ none_condition = (
+ var.lower().strip() == default_none_text.lower().strip() or
+ var.strip() == ''
+ )
+
+ if none_condition:
+ var = value_to_convert_to
+
+ elif to_float:
+ try:
+ var = float(var)
+ except ValueError:
+ error_msg = (
+ f'Error: can\'t convert {param_name} to float. '
+ f'Received:"{var}"'
+ )
+ raise ValueError(error_msg)
+
+ elif to_int:
+ try:
+ var = int(var)
+ except ValueError:
+ error_msg = (
+ f'Error: can\'t convert {param_name} to integer. '
+ f'Received:"{var}"'
+ )
+ raise ValueError(error_msg)
+
+ return var
+
+
+def convert_to_floats(text_list: List[Any]) -> List[float]:
+ """
+ Convert list of text values to floats.
+
+ Parameters
+ ----------
+ text_list : list
+ List of values to convert
+
+ Returns
+ -------
+ list
+ List of float values
+
+ Raises
+ ------
+ ValueError
+ If any value cannot be converted to float
+ """
+ float_list = []
+ for value in text_list:
+ try:
+ float_list.append(float(value))
+        except (TypeError, ValueError):
+ msg = f"Failed to convert value: '{value}' to float."
+ raise ValueError(msg)
+ return float_list
+
+
+def convert_pickle_to_h5ad(
+ pickle_path: Union[str, Path],
+ h5ad_path: Optional[Union[str, Path]] = None
+) -> str:
+ """
+ Convert a pickle file containing AnnData to h5ad format.
+
+ Parameters
+ ----------
+ pickle_path : str or Path
+ Path to input pickle file
+ h5ad_path : str or Path, optional
+ Path for output h5ad file. If None, uses same name with .h5ad
+ extension
+
+ Returns
+ -------
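+    str
+        Path to saved h5ad file
+
+    Raises
+    ------
+    FileNotFoundError
+        If `pickle_path` does not exist.
+    TypeError
+        If the unpickled object is not an AnnData instance.
+    ImportError
+        If the anndata package is not installed.
+    """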
+ pickle_path = Path(pickle_path)
+
+ if not pickle_path.exists():
+ raise FileNotFoundError(f"Pickle file not found: {pickle_path}")
+
+ # Load from pickle
+ with pickle_path.open('rb') as fh:
+ adata = pickle.load(fh)
+
+ # Check if it's AnnData
+ try:
+ import anndata as ad
+ if not isinstance(adata, ad.AnnData):
+ raise TypeError(
+ f"Loaded object is not AnnData, got {type(adata)}"
+ )
+ except ImportError:
+ raise ImportError(
+ "anndata package required for conversion to h5ad"
+ )
+
+ # Determine output path
+ if h5ad_path is None:
+ h5ad_path = pickle_path.with_suffix('.h5ad')
+ else:
+ h5ad_path = Path(h5ad_path)
+
+ # Save as h5ad
+ adata.write_h5ad(h5ad_path)
+
+ return str(h5ad_path)
+
+
+def spell_out_special_characters(text: str) -> str:
+ """
+ Clean column names by replacing special characters with text equivalents.
+
+ Handles biological marker names like:
+ - "CD4+" → "CD4_pos"
+ - "CD8-" → "CD8_neg"
+ - "CD4+CD20-" → "CD4_pos_CD20_neg"
+ - "CD4+/CD20-" → "CD4_pos_slashCD20_neg"
+ - "CD4+ CD20-" → "CD4_pos_CD20_neg"
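+    - "Area µm²" → "Area_um2"
+
+    A hyphen between two letters or digits is treated as a separator, not
+    a negative sign, e.g. "CD4-CD20+" → "CD4_CD20_pos".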
+
+ Parameters
+ ----------
+ text : str
+ The text to clean
+
+ Returns
+ -------
+ str
+ Cleaned text with special characters replaced
+ """
+ # Replace spaces with underscores
+ text = text.replace(' ', '_')
+
+ # Replace specific substrings for units
+ text = text.replace('µm²', 'um2')
+ text = text.replace('µm', 'um')
+
+ # Handle hyphens between alphanumeric characters FIRST
+ # (before + and - replacements)
+ # This pattern matches a hyphen that has alphanumeric on both sides
+ text = re.sub(r'(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])', '_', text)
+
+ # Now replace remaining '+' with '_pos_' and '-' with '_neg_'
+ text = text.replace('+', '_pos_')
+ text = text.replace('-', '_neg_')
+
+ # Mapping for specific characters
+ special_char_map = {
+ 'µ': 'u', # Micro symbol replaced with 'u'
+ '²': '2', # Superscript two replaced with '2'
+ '@': 'at',
+ '#': 'hash',
+ '$': 'dollar',
+ '%': 'percent',
+ '&': 'and',
+ '*': 'asterisk',
+ '/': 'slash',
+ '\\': 'backslash',
+ '=': 'equals',
+ '^': 'caret',
+ '!': 'exclamation',
+ '?': 'question',
+ '~': 'tilde',
+ '|': 'pipe',
+ ',': '', # Remove commas
+ '(': '', # Remove parentheses
+ ')': '', # Remove parentheses
+ '[': '', # Remove brackets
+ ']': '', # Remove brackets
+ '{': '', # Remove braces
+ '}': '', # Remove braces
+ }
+
+ # Replace special characters using special_char_map
+ for char, replacement in special_char_map.items():
+ text = text.replace(char, replacement)
+
+ # Remove any remaining disallowed characters
+ # (keep only alphanumeric and underscore)
+ text = re.sub(r'[^a-zA-Z0-9_]', '', text)
+
+ # Remove multiple consecutive underscores and
+ # replace with single underscore
+ text = re.sub(r'_+', '_', text)
+
+ # Strip both leading and trailing underscores
+ text = text.strip('_')
+
+ return text
+
+
+def clean_column_name(column_name: str) -> str:
+ """
+ Clean a single column name using spell_out_special_characters.
+
+ Parameters
+ ----------
+ column_name : str
+ Original column name
+
+ Returns
+ -------
+ str
+ Cleaned column name
+ """
+ original = column_name
+ cleaned = spell_out_special_characters(column_name)
+    # Ensure the cleaned name does not start with a digit
+ if cleaned and cleaned[0].isdigit():
+ cleaned = f'col_{cleaned}'
+ if original != cleaned:
+ logger.info(f'Column Name Updated: "{original}" -> "{cleaned}"')
+ return cleaned
+
+
+def load_csv_files(
+ csv_input: Union[str, Path, List[str]],
+ files_config: pd.DataFrame,
+ string_columns: Optional[List[str]] = None
+) -> pd.DataFrame:
+ """
+ Load and combine CSV files based on configuration.
+
+ Supports both:
+ - Galaxy input: list of file paths
+ - NIDAP input: directory path
+
+ Parameters
+ ----------
+ csv_input : str, Path, or list
+ Either a directory path (NIDAP) or list of file paths (Galaxy)
+ files_config : pd.DataFrame
+ Configuration dataframe with 'file_name' column and optional metadata
+ string_columns : list, optional
+ Columns to force as string type
+
+ Returns
+ -------
+ pd.DataFrame
+ Combined dataframe with all CSV data
+ """
+ filename_col = "file_name"
+
+ # Build file path mapping based on input type
+ if isinstance(csv_input, list):
+ # Galaxy: list of file paths
+ file_path_map = {Path(p).name: Path(p) for p in csv_input}
+ logger.info(f"Galaxy mode: {len(file_path_map)} files provided")
+ else:
+ # NIDAP: directory path
+ csv_dir = Path(csv_input)
+ file_path_map = {p.name: p for p in csv_dir.glob("*.csv")}
+ logger.info(f"NIDAP mode: {len(file_path_map)} CSV files in {csv_dir}")
+
+    # Strip surrounding whitespace from every string cell in the config
+ files_config = files_config.applymap(
+ lambda x: x.strip() if isinstance(x, str) else x
+ )
+
+ # Get column names
+ all_column_names = files_config.columns.tolist()
+ metadata_columns = [
+ col for col in all_column_names if col != filename_col
+ ]
+
+ # Validate string_columns
+ if string_columns is None:
+ string_columns = []
+ elif not isinstance(string_columns, list):
+ raise ValueError(
+ "String Columns must be a *list* of column names (strings)."
+ )
+
+ # Handle ["None"] or [""] => empty list
+ if (len(string_columns) == 1 and
+ isinstance(string_columns[0], str) and
+ text_to_value(string_columns[0]) is None):
+ string_columns = []
+
+ # Extract data types
+ dtypes = files_config.dtypes.to_dict()
+
+ # Get files to process
+ files_config = files_config.astype(str)
+ files_to_use = [
+ f.strip() for f in files_config[filename_col].tolist()
+ ]
+
+ # Check all files exist
+ missing_files = [f for f in files_to_use if f not in file_path_map]
+ if missing_files:
+ raise FileNotFoundError(
+ f"Files not found: {', '.join(missing_files)}\n"
+ f"Available: {', '.join(file_path_map.keys())}"
+ )
+
+ # Prepare dtype override
+ dtype_override = (
+ {col: str for col in string_columns} if string_columns else None
+ )
+
+ # Process files
+ processed_df_list = []
+
+ for file_name in files_to_use:
+ file_path = file_path_map[file_name]
+
+ try:
+ current_df = pd.read_csv(file_path, dtype=dtype_override)
+ logger.info(f'Processing: "{file_name}"')
+ current_df.columns = [
+ clean_column_name(col) for col in current_df.columns
+ ]
+
+ except pd.errors.EmptyDataError:
+ raise ValueError(f'File "{file_name}" is empty.')
+ except pd.errors.ParserError:
+ raise ValueError(
+ f'File "{file_name}" could not be parsed as CSV.'
+ )
+
+ current_df[filename_col] = file_name
+
+ # Reorder columns: filename first
+ cols = [filename_col] + [c for c in current_df.columns if c != filename_col]
+ current_df = current_df[cols]
+
+ processed_df_list.append(current_df)
+ logger.info(f'File "{file_name}" processed: {current_df.shape}')
+
+ # Combine dataframes
+ final_df = pd.concat(processed_df_list, ignore_index=True)
+
+ # Ensure string columns remain strings
+ for col in string_columns:
+ if col in final_df.columns:
+ final_df[col] = final_df[col].astype(str)
+
+ # Add metadata columns
+ if metadata_columns:
+ for column in metadata_columns:
+ file_to_value = (
+ files_config.set_index(filename_col)[column].to_dict()
+ )
+ final_df[column] = final_df[filename_col].map(file_to_value)
+ final_df[column] = final_df[column].astype(dtypes[column])
+
+ logger.info(f'Added metadata column "{column}"')
+ logger.debug(f'Mapping: {file_to_value}')
+
+ logger.info(f"Combined {len(processed_df_list)} files -> {final_df.shape}")
+
+ return final_df
+
+
+def string_list_to_dictionary(
+ input_list: List[str],
+ key_name: str = "key",
+ value_name: str = "color"
+) -> Dict[str, str]:
+ """
+ Validate that a list contains strings in the "key:value" format
+ and return the parsed dictionary. Reports all invalid entries with
+ custom key and value names in error messages.
+
+ Parameters
+ ----------
+ input_list : list
+ List of strings to validate and parse
+ key_name : str, optional
+ Name to describe the 'key' part in error messages. Default is "key"
+ value_name : str, optional
+ Name to describe the 'value' part in error messages. Default is "color"
+
+ Returns
+ -------
+ dict
+ A dictionary parsed from the input list if all entries are valid
+
+ Raises
+ ------
+ TypeError
+ If input is not a list
+ ValueError
+ If any entry in the list is not a valid "key:value" format
+
+ Examples
+ --------
+ >>> string_list_to_dictionary(["red:#FF0000", "blue:#0000FF"])
+ {'red': '#FF0000', 'blue': '#0000FF'}
+
+ >>> string_list_to_dictionary(["TypeA:Cancer", "TypeB:Normal"], "cell_type", "diagnosis")
+ {'TypeA': 'Cancer', 'TypeB': 'Normal'}
+ """
+ if not isinstance(input_list, list):
+ raise TypeError("Input must be a list.")
+
+ parsed_dict = {}
+ errors = []
+ seen_keys = set()
+
+ for entry in input_list:
+ if not isinstance(entry, str):
+ errors.append(
+ f"\nInvalid entry '{entry}': Must be a string in the "
+ f"'{key_name}:{value_name}' format."
+ )
+ continue
+ if ":" not in entry:
+ errors.append(
+ f"\nInvalid entry '{entry}': Missing ':' separator to "
+ f"separate '{key_name}' and '{value_name}'."
+ )
+ continue
+
+        # Split on the first ':' only, so values may themselves contain ':'
+        key, value = (part.strip() for part in entry.split(":", 1))
+        if not key or not value:
+            errors.append(
+                f"\nInvalid entry '{entry}': Both '{key_name}' and "
+                f"'{value_name}' must be non-empty."
+            )
+            continue
+
+        if key in seen_keys:
+            errors.append(f"\nDuplicate {key_name} '{key}' found.")
+        else:
+            # Add to dictionary if valid
+            seen_keys.add(key)
+            parsed_dict[key] = value
+
+ # Raise error if there are invalid entries
+ if errors:
+ raise ValueError(
+ "\nValidation failed for the following entries:\n" +
+ "\n".join(errors)
+ )
+
+ return parsed_dict
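The ordering inside `spell_out_special_characters` matters: a hyphen flanked by letters or digits is turned into a plain separator before any remaining `+`/`-` is spelled out as `_pos_`/`_neg_`. The stdlib-only sketch below reproduces just that ordering with a hypothetical `clean_marker` helper. It is a simplified illustration, not the full function: it omits the unit replacements and the special-character map, so for example `/` is dropped here instead of becoming `slash`.

```python
import re


def clean_marker(text: str) -> str:
    """Hypothetical, simplified sketch of the +/- handling only."""
    text = text.replace(" ", "_")
    # 1) A hyphen between two alphanumerics is a separator, not "negative".
    text = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", "_", text)
    # 2) Any remaining signs are spelled out.
    text = text.replace("+", "_pos_").replace("-", "_neg_")
    # 3) Keep alphanumerics/underscores, collapse runs, trim the ends.
    text = re.sub(r"[^A-Za-z0-9_]", "", text)
    text = re.sub(r"_+", "_", text)
    return text.strip("_")


for name in ["CD4+", "CD8-", "CD4+CD20-", "CD4+ CD20-", "CD4-CD20+"]:
    print(f"{name!r:14} -> {clean_marker(name)}")
```

As the last case shows, `CD4-CD20+` becomes `CD4_CD20_pos`: the hyphen after `CD4` is read as a separator, so a negative marker written immediately before another marker loses its sign.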
diff --git a/src/spac/templates/tsne_analysis_template.py b/src/spac/templates/tsne_analysis_template.py
new file mode 100644
index 00000000..d72de6ec
--- /dev/null
+++ b/src/spac/templates/tsne_analysis_template.py
@@ -0,0 +1,163 @@
+"""
+Platform-agnostic tSNE Analysis template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.tsne_analysis_template import run_from_json
+>>> run_from_json("examples/tsne_analysis_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import tsne
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute tSNE Analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the AnnData object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+
+ >>> # Get results in memory
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ all_data = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ # Select layer to perform tSNE
+ Layer_to_Analysis = params.get("Table_to_Process", "Original")
+
+ print(all_data)
+ if Layer_to_Analysis == "Original":
+ Layer_to_Analysis = None
+
+ print("tSNE Layer: \n", Layer_to_Analysis)
+
+ print("Performing tSNE ...")
+
+ tsne(all_data, layer=Layer_to_Analysis)
+
+ print("tSNE Done!")
+
+ print(all_data)
+
+ object_to_output = all_data
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary
+ results_dict = {}
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = object_to_output
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(f"tSNE Analysis completed → {saved_files['analysis']}")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ print("Returning AnnData object (not saving to file)")
+ return object_to_output
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+            "Usage: python tsne_analysis_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned AnnData object")
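A minimal sketch of preparing a parameter file for the tSNE template, using only the keys documented and read in `run_from_json`. The input path, the output directory, and the file name `tsne_params.json` are placeholders. `"Original"` in `Table_to_Process` selects the main matrix and is passed to `tsne` as `layer=None`.

```python
import json
from pathlib import Path

# Keys follow the structure documented in run_from_json; values are placeholders.
params = {
    "Upstream_Analysis": "path/to/data.pickle",
    "Table_to_Process": "Original",
    "Output_Directory": "results",
    "outputs": {
        "analysis": {"type": "file", "name": "output.pickle"},
    },
}

params_path = Path("tsne_params.json")
params_path.write_text(json.dumps(params, indent=2))

# Round-trip check: the file parses back to the same dictionary.
loaded = json.loads(params_path.read_text())
print(loaded["outputs"]["analysis"]["name"])
```

The file is then given as the first command-line argument; an optional second argument overrides `Output_Directory`, e.g. `python tsne_analysis_template.py tsne_params.json results/`.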
diff --git a/src/spac/templates/umap_transformation_template.py b/src/spac/templates/umap_transformation_template.py
new file mode 100644
index 00000000..388f3d11
--- /dev/null
+++ b/src/spac/templates/umap_transformation_template.py
@@ -0,0 +1,174 @@
+"""
+Platform-agnostic UMAP transformation template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.umap_transformation_template import run_from_json
+>>> run_from_json("examples/umap_transformation_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pickle
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+# Import SPAC functions used by the NIDAP template
+from spac.transformations import run_umap
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute UMAP transformation analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "Number_of_Neighbors": 75,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the AnnData object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+
+ >>> # Get results in memory
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters - Note: HPC parameters are ignored in SPAC version
+ n_neighbors = params.get("Number_of_Neighbors", 75)
+ min_dist = params.get("Minimum_Distance_between_Points", 0.1)
+ n_components = params.get("Target_Dimension_Number", 2)
+ metric = params.get("Computational_Metric", "euclidean")
+ random_state = params.get("Random_State", 0)
+ transform_seed = params.get("Transform_Seed", 42)
+ layer = params.get("Table_to_Process", "Original")
+
+ if layer == "Original":
+ layer = None
+
+ updated_dataset = run_umap(
+ adata=adata,
+ n_neighbors=n_neighbors,
+ min_dist=min_dist,
+ n_components=n_components,
+ metric=metric,
+ random_state=random_state,
+ transform_seed=transform_seed,
+ layer=layer,
+ verbose=True
+ )
+
+ # Print adata info as in NIDAP
+ print(adata)
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary
+ results_dict = {}
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = updated_dataset
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(f"UMAP transformation completed → {saved_files['analysis']}")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ print("Returning AnnData object (not saving to file)")
+        return updated_dataset
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+            "Usage: python umap_transformation_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned data object")
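For reference, the mapping from blueprint keys to `run_umap` keyword arguments, with the defaults applied when a key is absent, can be summarized as a small pure function. `umap_kwargs_from_params` is a hypothetical helper written for illustration; it mirrors the `params.get(...)` calls in `run_from_json` but is not part of SPAC, and `"arcsinh"` below is only an example layer name.

```python
from typing import Any, Dict


def umap_kwargs_from_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the parameter extraction above."""
    layer = params.get("Table_to_Process", "Original")
    return {
        "n_neighbors": params.get("Number_of_Neighbors", 75),
        "min_dist": params.get("Minimum_Distance_between_Points", 0.1),
        "n_components": params.get("Target_Dimension_Number", 2),
        "metric": params.get("Computational_Metric", "euclidean"),
        "random_state": params.get("Random_State", 0),
        "transform_seed": params.get("Transform_Seed", 42),
        # "Original" means the main matrix, passed to run_umap as layer=None.
        "layer": None if layer == "Original" else layer,
    }


print(umap_kwargs_from_params(
    {"Number_of_Neighbors": 30, "Table_to_Process": "arcsinh"}
))
```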
diff --git a/src/spac/templates/umap_tsne_pca_visualization_template.py b/src/spac/templates/umap_tsne_pca_visualization_template.py
new file mode 100644
index 00000000..47394b04
--- /dev/null
+++ b/src/spac/templates/umap_tsne_pca_visualization_template.py
@@ -0,0 +1,242 @@
+"""
+Platform-agnostic UMAP\\tSNE\\PCA Visualization template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.umap_tsne_pca_visualization_template import run_from_json
+>>> run_from_json("examples/umap_tsne_pca_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, Optional, List
+import matplotlib.pyplot as plt
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import dimensionality_reduction_plot
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, Union[str, List[str]]], plt.Figure]:
+ """
+ Execute UMAP\\tSNE\\PCA Visualization analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Color_By": "Annotation",
+ "Annotation_to_Highlight": "cell_type",
+ "Dimension_Reduction_Method": "umap",
+ ...
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the figure
+ directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or Figure
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: The matplotlib figure
+ """
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures_dir"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ annotation = params.get("Annotation_to_Highlight", "None")
+ feature = params.get("Feature_to_Highlight", "None")
+ layer = params.get("Table", "Original")
+ method = params.get("Dimension_Reduction_Method", "umap")
+ fig_width = params.get("Figure_Width", 12)
+ fig_height = params.get("Figure_Height", 12)
+ font_size = params.get("Font_Size", 12)
+ fig_dpi = params.get("Figure_DPI", 300)
+ legend_location = params.get("Legend_Location", "best")
+ legend_label_size = params.get("Legend_Font_Size", 16)
+ legend_marker_scale = params.get("Legend_Marker_Size", 5.0)
+ color_by = params.get("Color_By", "Annotation")
+ point_size = params.get("Dot_Size", 1)
+ v_min = params.get("Value_Min", "None")
+ v_max = params.get("Value_Max", "None")
+
+ feature = text_to_value(feature)
+ annotation = text_to_value(annotation)
+
+ if color_by == "Annotation":
+ feature = None
+ else:
+ annotation = None
+
+ # Store the original value of layer
+ layer_input = layer
+
+ layer = text_to_value(layer, default_none_text="Original")
+
+ vmin = text_to_value(
+ v_min,
+ default_none_text="None",
+ value_to_convert_to=None,
+ to_float=True,
+ param_name="Value Min"
+ )
+
+ vmax = text_to_value(
+ v_max,
+ default_none_text="None",
+ value_to_convert_to=None,
+ to_float=True,
+ param_name="Value Max"
+ )
+
+ plt.rcParams.update({'font.size': font_size})
+
+ fig, ax = dimensionality_reduction_plot(
+ adata=adata,
+ method=method,
+ annotation=annotation,
+ feature=feature,
+ layer=layer,
+ point_size=point_size,
+ vmin=vmin,
+ vmax=vmax
+ )
+
+ if color_by == "Annotation":
+ title = annotation
+ else:
+ title = f'Table:"{layer_input}" \n Feature:"{feature}"'
+ ax.set_title(title)
+
+ fig = ax.get_figure()
+
+ fig.set_size_inches(
+ fig_width,
+ fig_height
+ )
+ fig.set_dpi(fig_dpi)
+
+ legend = ax.get_legend()
+ has_legend = legend is not None
+
+ if has_legend:
+ ax.legend(
+ loc=legend_location,
+ bbox_to_anchor=(1, 0.5),
+ fontsize=legend_label_size,
+ markerscale=legend_marker_scale
+ )
+
+ plt.tight_layout()
+
+ if show_plot:
+ plt.show()
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for figures output
+ if "figures" in params["outputs"]:
+ results_dict["figures"] = {f"{method}_plot": fig}
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ plt.close(fig)
+
+ logger.info(
+ f"{method.upper()} Visualization completed successfully."
+ )
+ return saved_files
+ else:
+ # Return the figure directly for in-memory workflows
+ logger.info("Returning figure for in-memory use")
+ return fig
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+            "Usage: python umap_tsne_pca_visualization_template.py <params.json> [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ print("\nReturned figure")
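`Color_By` decides which of `Annotation_to_Highlight` and `Feature_to_Highlight` reaches `dimensionality_reduction_plot`; the other is forced to `None`, and `Value_Min`/`Value_Max` are parsed as floats, with the text `"None"` meaning no limit. The sketch below restates that resolution with a hypothetical `resolve_plot_target` helper and simplified stand-ins for `text_to_value` (assumed to behave like the `template_utils` version for these inputs), so it runs without SPAC or matplotlib.

```python
from typing import Any, Dict, Optional


def _none_if_blank(value: Any) -> Any:
    """Map "None" (any case) or blank text to None, else keep the value."""
    text = str(value).strip()
    return None if text == "" or text.lower() == "none" else value


def _none_or_float(value: Any, name: str) -> Optional[float]:
    """Simplified stand-in for text_to_value(..., to_float=True)."""
    if _none_if_blank(value) is None:
        return None
    try:
        return float(str(value))
    except ValueError:
        raise ValueError(
            f'Error: can\'t convert {name} to float. Received:"{value}"'
        ) from None


def resolve_plot_target(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: choose annotation or feature, parse limits."""
    annotation = _none_if_blank(params.get("Annotation_to_Highlight", "None"))
    feature = _none_if_blank(params.get("Feature_to_Highlight", "None"))
    # Only one of the two is forwarded to the plotting call.
    if params.get("Color_By", "Annotation") == "Annotation":
        feature = None
    else:
        annotation = None
    return {
        "annotation": annotation,
        "feature": feature,
        "vmin": _none_or_float(params.get("Value_Min", "None"), "Value Min"),
        "vmax": _none_or_float(params.get("Value_Max", "None"), "Value Max"),
    }


example = resolve_plot_target({
    "Color_By": "Feature",
    "Annotation_to_Highlight": "cell_type",
    "Feature_to_Highlight": "CD4",
    "Value_Max": "2.5",
})
print(example)
```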
diff --git a/src/spac/templates/utag_clustering_template.py b/src/spac/templates/utag_clustering_template.py
new file mode 100644
index 00000000..304f6149
--- /dev/null
+++ b/src/spac/templates/utag_clustering_template.py
@@ -0,0 +1,229 @@
+"""
+Platform-agnostic UTAG Clustering template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Reads outputs configuration from blueprint JSON file.
+
+Usage
+-----
+>>> from spac.templates.utag_clustering_template import run_from_json
+>>> run_from_json("examples/utag_clustering_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import run_utag_clustering
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute UTAG Clustering analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "K_Nearest_Neighbors": 15,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the AnnData object
+ directly for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+
+ >>> # Get results in memory
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ layer = params.get("Table_to_Process", "Original")
+ features = params.get("Features", ["All"])
+ slide = params.get("Slide_Annotation", "None")
+ Distance_threshold = params.get("Distance_Threshold", 20.0)
+ K_neighbors = params.get("K_Nearest_Neighbors", 15)
+ resolution = params.get("Resolution_Parameter", 1)
+ principal_components = params.get("PCA_Components", "None")
+ random_seed = params.get("Random_Seed", 42)
+ n_jobs = params.get("N_Jobs", 1)
+ N_iterations = params.get("Leiden_Iterations", 5)
+ Parallel_processes = params.get("Parellel_Processes", False)
+ output_annotation = params.get("Output_Annotation_Name", "UTAG")
+
+ # layer: convert "Original" → None
+ layer_arg = None if layer.lower().strip() == "original" else layer
+
+ # features: ["All"] → None, else leave list and print selection
+ if isinstance(features, list) and any(
+ item == "All" for item in features
+ ):
+ print("Clustering all features")
+ features_arg = None
+ else:
+ feature_str = "\n".join(features)
+ print(f"Clustering features:\n{feature_str}")
+ features_arg = features
+
+ # slide: "None" → None
+ slide_arg = text_to_value(
+ slide,
+ default_none_text="None",
+ value_to_convert_to=None
+ )
+
+ # principal_components: "None" or integer string → None or int
+ principal_components_arg = text_to_value(
+ principal_components,
+ default_none_text="None",
+ value_to_convert_to=None,
+ to_int=True,
+ param_name="principal_components"
+ )
+
+ print("\nBefore UTAG Clustering: \n", adata)
+
+ run_utag_clustering(
+ adata,
+ features=features_arg,
+ k=K_neighbors,
+ resolution=resolution,
+ max_dist=Distance_threshold,
+ n_pcs=principal_components_arg,
+ random_state=random_seed,
+ n_jobs=n_jobs,
+ n_iterations=N_iterations,
+ slide_key=slide_arg,
+ layer=layer_arg,
+ output_annotation=output_annotation,
+ parallel=Parallel_processes,
+ )
+
+ print("\nAfter UTAG Clustering: \n", adata)
+
+ print(
+ "\nUTAG Cluster Count: \n",
+ len(adata.obs[output_annotation].unique().tolist())
+ )
+
+ print(
+ "\nUTAG Cluster Names: \n",
+ adata.obs[output_annotation].unique().tolist()
+ )
+
+ # Count and display occurrences of each label in the annotation
+ print(
+        f'\nCount of cells in the output annotation '
+        f'"{output_annotation}":'
+ )
+ label_counts = adata.obs[output_annotation].value_counts()
+ print(label_counts)
+ print("\n")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary
+ results_dict = {}
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ print(f"UTAG Clustering completed → {saved_files['analysis']}")
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ print("Returning AnnData object (not saving to file)")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python utag_clustering_template.py params_json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned AnnData object")
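The UTAG template normalizes several UI text parameters before calling `run_utag_clustering`:

- `"Original"` becomes `None` for the layer.
- A feature list containing `"All"` becomes `None`.
- `"None"` or an integer string becomes `None` or an `int` for the principal components.

The following is a minimal standalone sketch of those three conversions, based only on the behavior visible in the template. The helper names are hypothetical, and this is not the real `text_to_value` implementation from `spac.templates.template_utils`.

```python
from typing import List, Optional, Union


def normalize_layer(layer: str) -> Optional[str]:
    """Hypothetical helper: map "Original" (any case/whitespace) to None, i.e. adata.X."""
    return None if layer.strip().lower() == "original" else layer


def normalize_features(features: List[str]) -> Optional[List[str]]:
    """Hypothetical helper: a selection containing "All" means every feature (None)."""
    return None if "All" in features else features


def normalize_optional_int(value: Union[str, int, None]) -> Optional[int]:
    """Hypothetical helper: "None" (or None) maps to None, anything else is parsed as int."""
    if value is None or (isinstance(value, str) and value.strip() == "None"):
        return None
    return int(value)


if __name__ == "__main__":
    # Values shaped like the JSON parameters the template reads
    print(normalize_layer(" Original "))
    print(normalize_features(["CD4", "CD8"]))
    print(normalize_optional_int("10"))
```

Keeping these conversions in small pure functions would let them be unit-tested without loading an AnnData object.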
diff --git a/src/spac/templates/visualize_nearest_neighbor_template.py b/src/spac/templates/visualize_nearest_neighbor_template.py
new file mode 100644
index 00000000..42193531
--- /dev/null
+++ b/src/spac/templates/visualize_nearest_neighbor_template.py
@@ -0,0 +1,523 @@
+"""
+Platform-agnostic Visualize Nearest Neighbor template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.visualize_nearest_neighbor_template import (
+... run_from_json
+... )
+>>> run_from_json("examples/visualize_nearest_neighbor_params.json")
+"""
+import logging
+import sys
+from pathlib import Path
+from typing import Any, Dict, List, Tuple, Union
+import pandas as pd
+import numpy as np
+from matplotlib.axes import Axes
+import matplotlib.pyplot as plt
+import matplotlib.patches as mpatches
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import visualize_nearest_neighbor
+from spac.templates.template_utils import (
+ load_input,
+ parse_params,
+ save_results,
+ text_to_value,
+)
+
+# Set up logging
+logger = logging.getLogger(__name__)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: Union[str, Path] = None,
+ show_plot: bool = True
+) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]:
+ """
+ Execute Visualize Nearest Neighbor analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/input.pickle",
+ "Annotation": "cell_type",
+ "Source_Anchor_Cell_Label": "CD4_T",
+ "Target_Cell_Label": "All",
+ "Plot_Method": "numeric",
+ "Plot_Type": "boxen",
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If False, returns the figure and
+ dataframe directly for in-memory workflows. Default is True.
+ output_dir : str or Path, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"figures": ["path/to/fig1.png", ...], "dataframe": "path/to/df.csv"}
+ If save_to_disk=False: Tuple of (figure(s), dataframe)
+
+ Notes
+ -----
+ Output Structure:
+ - Figures are saved as a directory containing one or more plot files (standardized)
+ - DataFrame is saved as a single CSV file (standardized)
+ - When save_to_disk=False, returns (figure(s), dataframe) for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["figures"]) # List of figure paths
+ >>> print(saved_files["dataframe"]) # Path to CSV
+
+ >>> # Get results in memory for further processing
+ >>> figures, df = run_from_json("params.json", save_to_disk=False)
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Figures use directory type, dataframe uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ # Use direct dictionary access for required parameters
+ # Will raise KeyError if missing
+ annotation = params["Annotation"]
+ source_label = params["Source_Anchor_Cell_Label"]
+
+ # Use .get() with defaults for optional parameters from JSON template
+ image_id = params.get("ImageID", "None")
+ method = params.get("Plot_Method", "numeric")
+ plot_type = params.get("Plot_Type", "boxen")
+ target_label = params.get("Target_Cell_Label", "All")
+ distance_key = params.get(
+ "Nearest_Neighbor_Associated_Table", "spatial_distance"
+ )
+ log_scale = params.get("Log_Scale", False)
+ facet_plot = params.get("Facet_Plot", False)
+ x_axis_title_rotation = params.get("X_Axis_Label_Rotation", 0)
+ shared_x_axis_title = params.get("Shared_X_Axis_Title_", True)
+ x_axis_title_fontsize = params.get("X_Axis_Title_Font_Size", "None")
+
+ defined_color_map = text_to_value(
+ params.get("Defined_Color_Mapping", "None"),
+ param_name="Define Label Color Mapping"
+ )
+ annotation_colorscale = "rainbow"
+
+ fig_width = params.get("Figure_Width", 12)
+ fig_height = params.get("Figure_Height", 6)
+ fig_dpi = params.get("Figure_DPI", 300)
+ global_font_size = params.get("Font_Size", 12)
+ fig_title = (
+ f'Nearest Neighbor Distance Distribution\nMeasured from '
+ f'"{source_label}"'
+ )
+
+ image_id = text_to_value(
+ image_id,
+ default_none_text="None",
+ value_to_convert_to=None
+ )
+
+ # If target_label is None, it means "All distance columns"
+ # If it's a comma-separated string (e.g. "Stroma,Immune"),
+ # split into a list
+ target_label = text_to_value(
+ target_label,
+ default_none_text="All",
+ value_to_convert_to=None
+ )
+
+ if target_label is not None:
+ distance_to_processed = [x.strip() for x in target_label.split(",")]
+ else:
+ distance_to_processed = None
+
+ x_axis_title_fontsize = text_to_value(
+ x_axis_title_fontsize,
+ default_none_text="None",
+ to_int=True
+ )
+
+ # Configure Matplotlib font size
+ plt.rcParams.update({'font.size': global_font_size})
+
+ # If facet_plot=True but no valid stratify column => revert to single figure
+ if facet_plot and image_id is None:
+ warning_message = (
+ "Facet plotting was requested, but there is no annotation "
+ "to group by. Switching to a single-figure display."
+ )
+ logger.warning(warning_message)
+ facet_plot = False
+
+ result_dict = visualize_nearest_neighbor(
+ adata=adata,
+ annotation=annotation,
+ spatial_distance=distance_key,
+ distance_from=source_label,
+ distance_to=distance_to_processed,
+ method=method,
+ plot_type=plot_type,
+ stratify_by=image_id,
+ facet_plot=facet_plot,
+ log=log_scale,
+ annotation_colorscale=annotation_colorscale,
+ defined_color_map=defined_color_map,
+ )
+
+ # Extract the data and figure(s)
+ df_long = result_dict["data"]
+ figs_out = result_dict["fig"] # Single Figure or List of Figures
+ palette_hex = result_dict["palette"]
+ axes_out = result_dict["ax"]
+
+ logger.info("Summary statistics of the dataset:")
+ logger.info(f"\n{df_long.describe()}")
+
+ # Customize figure legends & X-axis rotation
+ legend_labels = (
+ distance_to_processed or df_long["group"].unique().tolist()
+ )
+ legend_labels = (
+ legend_labels if distance_to_processed else sorted(legend_labels)
+ )
+
+ handles = [
+ mpatches.Patch(
+ facecolor=palette_hex[label],
+ edgecolor='none',
+ label=label
+ )
+ for label in legend_labels
+ ]
+
+ def _flatten_axes(ax_input):
+ if isinstance(ax_input, Axes):
+ return [ax_input]
+ if isinstance(ax_input, (list, tuple, np.ndarray)):
+ return [
+ ax for ax in np.ravel(ax_input) if isinstance(ax, Axes)
+ ]
+ return []
+
+ flat_axes_list = _flatten_axes(axes_out)
+ shared_x_title_applied_to_fig = None
+
+ if flat_axes_list:
+ # Attach legend to the last axis
+ flat_axes_list[-1].legend(
+ handles=handles,
+ title="Target phenotype",
+ bbox_to_anchor=(1.02, 1),
+ loc="upper left",
+ frameon=False,
+ )
+
+ # X-Axis Title Handling
+ current_x_label_text = ""
+ if flat_axes_list[0].get_xlabel():
+ current_x_label_text = flat_axes_list[0].get_xlabel()
+
+ if not current_x_label_text:
+ current_x_label_text = (
+ f"Log({distance_key})" if log_scale else distance_key
+ )
+ if not current_x_label_text:
+ current_x_label_text = "Distance" # Ultimate fallback
+
+ effective_fontsize = (
+ x_axis_title_fontsize if x_axis_title_fontsize is not None
+ else global_font_size
+ )
+
+ if (facet_plot and shared_x_axis_title and
+ isinstance(figs_out, plt.Figure)):
+ for ax_item in flat_axes_list:
+ ax_item.set_xlabel('')
+
+ sup_ha_align = 'center'
+ if 0 < x_axis_title_rotation % 360 < 180:
+ sup_ha_align = 'right'
+ elif 180 < x_axis_title_rotation % 360 < 360:
+ sup_ha_align = 'left'
+
+ figs_out.supxlabel(
+ current_x_label_text, y=0.02, fontsize=effective_fontsize,
+ rotation=x_axis_title_rotation, ha=sup_ha_align
+ )
+ shared_x_title_applied_to_fig = figs_out
+
+ else: # Apply to individual subplot x-axis titles
+ for ax_item in flat_axes_list:
+ label_object = ax_item.xaxis.get_label()
+ if not label_object.get_text(): # If no label, set it
+ ax_item.set_xlabel(current_x_label_text)
+ label_object = ax_item.xaxis.get_label()
+
+ if label_object.get_text(): # Configure if actual label
+ label_object.set_rotation(x_axis_title_rotation)
+ label_object.set_fontsize(effective_fontsize)
+ ha_align_val = 'center'
+ if 0 < x_axis_title_rotation % 360 < 180:
+ ha_align_val = 'right'
+ elif 180 < x_axis_title_rotation % 360 < 360:
+ ha_align_val = 'left'
+ label_object.set_ha(ha_align_val)
+
+ # Stratification Info
+ if image_id is not None and image_id in df_long.columns:
+ unique_vals = df_long[image_id].unique()
+ n_unique = len(unique_vals)
+
+ if n_unique == 0:
+ logger.warning(
+ f"The annotation '{image_id}' has 0 unique values or is empty. "
+ "No data to plot => Potential empty plot."
+ )
+ elif n_unique == 1 and facet_plot:
+ logger.info(
+ f"The annotation '{image_id}' has only one unique value "
+ f"({unique_vals[0]}). Facet plot will resemble a single plot."
+ )
+ elif n_unique > 1:
+ logger.info(
+ f"The annotation '{image_id}' has {n_unique} unique values: "
+ f"{unique_vals}"
+ )
+
+ # Figure Configuration & Display
+ def _title_main(fig, title):
+ """
+ Sets a bold, centered main title on the figure, and
+ adjusts figure size and layout accordingly.
+ """
+ fig.set_size_inches(fig_width, fig_height)
+ fig.set_dpi(fig_dpi)
+ fig.suptitle(
+ title,
+ fontsize=global_font_size + 4,
+ weight='bold',
+ x=0.5, # center horizontally
+ horizontalalignment='center'
+ )
+
+ def _label_each_figure(fig_list, categories):
+ """
+ Adds a title to each figure, typically used when multiple
+ separate figures are returned (one per category).
+ """
+ for fig, cat in zip(fig_list, categories):
+ if fig:
+ _title_main(fig, f"{fig_title}\n{image_id}: {cat}")
+ # Adjust top for the suptitle
+ fig.tight_layout(rect=[0.01, 0.01, 0.99, 0.96])
+ if show_plot:
+ plt.show()
+
+ # Determine the actual distance column name used in df_long for summary
+ distance_col = (
+ "log_distance" if "log_distance" in df_long.columns else "distance"
+ )
+
+ # Displaying Figures
+ cat_list = []
+ if image_id and (image_id in df_long.columns):
+ if isinstance(df_long[image_id].dtype, pd.CategoricalDtype):
+ cat_list = list(df_long[image_id].cat.categories)
+ else:
+ cat_list = df_long[image_id].unique().tolist()
+
+ # Track figures for saving
+ figures_to_save = []
+
+ if isinstance(figs_out, list) and not facet_plot and \
+ cat_list and len(figs_out) == len(cat_list):
+ # Scenario: Multiple separate figures, one per category (non-faceted)
+ figures_to_save = figs_out
+ _label_each_figure(figs_out, cat_list)
+ if show_plot:
+ plt.show()
+ else:
+ # Scenario: Single figure (faceted) or list of figures not matching categories
+ figures_to_display = (
+ figs_out if isinstance(figs_out, list) else [figs_out]
+ )
+ figures_to_save = figures_to_display
+ for fig_item_to_display in figures_to_display:
+ if fig_item_to_display is not None:
+ _title_main(fig_item_to_display, fig_title)
+
+ bottom_padding = 0.01
+ # Padding when a shared x-title is applied (currently equal to the default)
+ if fig_item_to_display is shared_x_title_applied_to_fig:
+ bottom_padding = 0.01 # Adjusted from 0.05
+
+ top_padding = 0.99 # Adjusted from 0.90
+
+ # rect=[left, bottom, right, top]
+ fig_item_to_display.tight_layout(
+ rect=[0.01, bottom_padding, 0.99, top_padding]
+ )
+ if show_plot:
+ plt.show()
+
+ # Summary statistics
+ # 1) Per-group summary
+ df_summary_group = (
+ df_long
+ .groupby("group")[distance_col]
+ .describe()
+ .reset_index()
+ )
+
+ # 2) Per-group-and-stratify, if image_id is valid
+ if image_id and (image_id in df_long.columns):
+ df_summary_group_strat = (
+ df_long
+ .groupby([image_id, "group"])[distance_col]
+ .describe()
+ .reset_index()
+ )
+ else:
+ df_summary_group_strat = None
+
+ if df_summary_group_strat is not None:
+ logger.info(f"\nSummary by group (target phenotypes) AND '{image_id}':")
+ logger.info(f"\n{df_summary_group_strat}")
+ else:
+ logger.info("\nSummary by group (target phenotypes) only:")
+ logger.info(f"\n{df_summary_group}")
+
+ # CSV Output
+ final_df = (
+ df_summary_group_strat if df_summary_group_strat is not None
+ else df_summary_group
+ )
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Package figures in a dictionary for directory saving
+ # This ensures they're saved in a directory per standardized schema
+ if "figures" in params["outputs"] and figures_to_save:
+ # Create a dictionary with named figures
+ figures_dict = {}
+ for idx, fig in enumerate(figures_to_save):
+ if fig is not None:
+ # Name figures appropriately
+ if cat_list and len(cat_list) == len(figures_to_save):
+ fig_name = f"nearest_neighbor_{cat_list[idx]}"
+ else:
+ fig_name = f"nearest_neighbor_{idx}"
+ figures_dict[fig_name] = fig
+ results_dict["figures"] = figures_dict # Dict triggers directory save
+
+ # Check for DataFrame output (case-insensitive)
+ if any(k.lower() == "dataframe" for k in params["outputs"].keys()):
+ results_dict["dataframe"] = final_df
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logger.info("Visualize Nearest Neighbor completed successfully.")
+ logger.info("Saved summary statistics to dataframe output.")
+ return saved_files
+ else:
+ # Return the figure(s) and dataframe directly for in-memory workflows
+ logger.info("Returning figure(s) and dataframe (not saving to file)")
+ # If single figure, return it directly; if multiple, return list
+ if len(figures_to_save) == 1:
+ return figures_to_save[0], final_df
+ else:
+ return figures_to_save, final_df
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python visualize_nearest_neighbor_template.py "
+ "params_json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, paths in result.items():
+ if isinstance(paths, list):
+ print(f" {key}:")
+ for path in paths:
+ print(f" - {path}")
+ else:
+ print(f" {key}: {paths}")
+ else:
+ figures, df = result
+ print("\nReturned figure(s) and dataframe for in-memory use")
+ if isinstance(figures, list):
+ print(f"Number of figures: {len(figures)}")
+ else:
+ print(f"Figure size: {figures.get_size_inches()}")
+ print(f"DataFrame shape: {df.shape}")
+ print("\nSummary statistics preview:")
+ print(df.head())
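The nearest-neighbor template picks the horizontal alignment of a rotated x-axis title in two places. One is the shared `supxlabel` (`sup_ha_align`); the other is the per-axis labels (`ha_align_val`). Both use the same rule.

Below is a sketch of that rule factored into a single helper. The name `ha_for_rotation` is hypothetical and does not exist in the template. The logic copies the template's comparisons on `rotation % 360`.

```python
def ha_for_rotation(rotation_deg: float) -> str:
    """Return the matplotlib horizontal alignment used for a rotated label.

    Mirrors the template: angles strictly between 0 and 180 degrees
    (after % 360) are right-aligned, angles strictly between 180 and 360
    are left-aligned, and exactly 0 or 180 stay centered.
    """
    angle = rotation_deg % 360  # Python's % keeps the result in [0, 360)
    if 0 < angle < 180:
        return "right"
    if 180 < angle < 360:
        return "left"
    return "center"


if __name__ == "__main__":
    for angle in (0, 45, 90, 180, 270, -90):
        print(angle, ha_for_rotation(angle))
```

With such a helper, both `sup_ha_align` and `ha_align_val` could be set to `ha_for_rotation(x_axis_title_rotation)`. That would keep the shared and per-axis code paths from drifting apart.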
diff --git a/src/spac/templates/visualize_ripley_l_template.py b/src/spac/templates/visualize_ripley_l_template.py
new file mode 100644
index 00000000..17e8b7b2
--- /dev/null
+++ b/src/spac/templates/visualize_ripley_l_template.py
@@ -0,0 +1,155 @@
+"""
+Platform-agnostic Visualize Ripley L template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Usage
+-----
+>>> from spac.templates.visualize_ripley_l_template import run_from_json
+>>> run_from_json("examples/visualize_ripley_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union, List, Optional, Tuple
+import pandas as pd
+import matplotlib.pyplot as plt
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.visualization import plot_ripley_l
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+ text_to_value,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ show_plot: bool = True,
+ output_dir: Optional[Union[str, Path]] = None
+) -> Union[Dict[str, str], Tuple[Any, pd.DataFrame]]:
+ """
+ Execute Visualize Ripley L analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary
+ save_to_disk : bool, optional
+ Whether to save results to file. If False, returns the figure and
+ dataframe directly for in-memory workflows. Default is True.
+ show_plot : bool, optional
+ Whether to display the plot. Default is True.
+ output_dir : str or Path, optional
+ Directory for outputs. If None, uses current directory.
+
+ Returns
+ -------
+ dict or tuple
+ If save_to_disk=True: Dictionary of saved file paths
+ If save_to_disk=False: Tuple of (figure, dataframe)
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ center_phenotype = params["Center_Phenotype"]
+ neighbor_phenotype = params["Neighbor_Phenotype"]
+ plot_specific_regions = params.get("Plot_Specific_Regions", False)
+ regions_labels = params.get("Regions_Labels", [])
+ plot_simulations = params.get("Plot_Simulations", True)
+
+ logging.info(f"Running with center_phenotype: {center_phenotype}, neighbor_phenotype: {neighbor_phenotype}")
+
+ # Process regions parameter exactly as in NIDAP template
+ if plot_specific_regions:
+ if len(regions_labels) == 0:
+ raise ValueError(
+ 'Please identify at least one region in the '
+ '"Regions Label(s)" parameter.'
+ )
+ else:
+ regions_labels = None
+
+ # Run the visualization exactly as in NIDAP template
+ fig, plots_df = plot_ripley_l(
+ adata,
+ phenotypes=(center_phenotype, neighbor_phenotype),
+ regions=regions_labels,
+ sims=plot_simulations,
+ return_df=True
+ )
+
+ if show_plot:
+ plt.show()
+
+ # Log the dataframe (shown when logging is configured at INFO level)
+ logging.info(f"\n{plots_df.to_string()}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ # Check for dataframe output in config
+ if "dataframe" in params.get("outputs", {}):
+ results_dict["dataframe"] = plots_df
+
+ # Add figure if configured (usually not in the original template)
+ # but we can add it as an enhancement
+ if "figures" in params.get("outputs", {}):
+ # Package figure in a dictionary for directory saving
+ results_dict["figures"] = {"ripley_l_plot": fig}
+
+ # Add analysis output if in config (for compatibility)
+ if "analysis" in params.get("outputs", {}):
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info(f"Visualize Ripley L completed → {list(saved_files.keys())}")
+ return saved_files
+ else:
+ # Return the figure and dataframe directly for in-memory workflows
+ logging.info("Returning figure and dataframe (not saving to file)")
+ return fig, plots_df
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print("Usage: python visualize_ripley_l_template.py params_json [output_dir]", file=sys.stderr)
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(sys.argv[1], output_dir=output_dir)
+
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for filename, filepath in result.items():
+ print(f" {filename}: {filepath}")
+ else:
+ print("\nReturned figure and dataframe")
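The Ripley L template turns the `Plot_Specific_Regions` and `Regions_Labels` parameters into the `regions` argument of `plot_ripley_l`. If specific regions are requested, at least one label is required. Otherwise `None` is passed.

Here is a minimal sketch of that validation as a standalone function. `resolve_regions` is a hypothetical name. The sketch assumes only the behavior shown in the template, not how `plot_ripley_l` interprets `None`.

```python
from typing import List, Optional


def resolve_regions(plot_specific_regions: bool,
                    regions_labels: List[str]) -> Optional[List[str]]:
    """Hypothetical helper mirroring the template's regions handling.

    Returns the labels when specific regions are requested (raising if
    none were given), and None otherwise, which is the value the template
    passes to plot_ripley_l in that case.
    """
    if plot_specific_regions:
        if len(regions_labels) == 0:
            raise ValueError(
                'Please identify at least one region in the '
                '"Regions Label(s)" parameter.'
            )
        return regions_labels
    return None


if __name__ == "__main__":
    print(resolve_regions(False, ["Tumor"]))
    print(resolve_regions(True, ["Tumor", "Stroma"]))
```

Validating before the plotting call means a misconfigured JSON fails fast with a parameter-level message, instead of an error from inside the visualization code.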
diff --git a/src/spac/templates/z_score_normalization_template.py b/src/spac/templates/z_score_normalization_template.py
new file mode 100644
index 00000000..38e90a30
--- /dev/null
+++ b/src/spac/templates/z_score_normalization_template.py
@@ -0,0 +1,181 @@
+"""
+Platform-agnostic Z-Score Normalization template converted from NIDAP.
+Maintains the exact logic from the NIDAP template.
+
+Refactored to use centralized save_results from template_utils.
+Follows standardized output schema where analysis is saved as a file.
+
+Usage
+-----
+>>> from spac.templates.z_score_normalization_template import run_from_json
+>>> run_from_json("examples/zscore_normalization_params.json")
+"""
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+import pandas as pd
+import pickle
+import logging
+
+# Add parent directory to path for imports
+sys.path.append(str(Path(__file__).parent.parent.parent))
+
+from spac.transformations import z_score_normalization
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ parse_params,
+)
+
+
+def run_from_json(
+ json_path: Union[str, Path, Dict[str, Any]],
+ save_to_disk: bool = True,
+ output_dir: str = None,
+) -> Union[Dict[str, str], Any]:
+ """
+ Execute Z-Score Normalization analysis with parameters from JSON.
+ Replicates the NIDAP template functionality exactly.
+
+ Parameters
+ ----------
+ json_path : str, Path, or dict
+ Path to JSON file, JSON string, or parameter dictionary.
+ Expected JSON structure:
+ {
+ "Upstream_Analysis": "path/to/data.pickle",
+ "Table_to_Process": "Original",
+ "Output_Table_Name": "z_scores",
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+ }
+ save_to_disk : bool, optional
+ Whether to save results to disk. If True, saves the AnnData object
+ to a pickle file. If False, returns the AnnData object directly
+ for in-memory workflows. Default is True.
+ output_dir : str, optional
+ Base directory for outputs. If None, uses params['Output_Directory']
+ or current directory. All outputs will be saved relative to this directory.
+
+ Returns
+ -------
+ dict or AnnData
+ If save_to_disk=True: Dictionary of saved file paths with structure:
+ {"analysis": "path/to/output.pickle"}
+ If save_to_disk=False: The processed AnnData object for in-memory use
+
+ Notes
+ -----
+ Output Structure:
+ - Analysis output is saved as a single pickle file (standardized for analysis outputs)
+ - When save_to_disk=False, the AnnData object is returned for programmatic use
+
+ Examples
+ --------
+ >>> # Save results to disk
+ >>> saved_files = run_from_json("params.json")
+ >>> print(saved_files["analysis"]) # Path to saved pickle file
+ >>> # './output.pickle'
+
+ >>> # Get results in memory for further processing
+ >>> adata = run_from_json("params.json", save_to_disk=False)
+ >>> # Can now work with adata object directly
+
+ >>> # Custom output directory
+ >>> saved = run_from_json("params.json", output_dir="/custom/path")
+ """
+ # Parse parameters from JSON
+ params = parse_params(json_path)
+
+ # Set output directory
+ if output_dir is None:
+ output_dir = params.get("Output_Directory", ".")
+
+ # Ensure outputs configuration exists with standardized defaults
+ # Analysis uses file type per standardized schema
+ if "outputs" not in params:
+ params["outputs"] = {
+ "analysis": {"type": "file", "name": "output.pickle"}
+ }
+
+ # Load the upstream analysis data
+ adata = load_input(params["Upstream_Analysis"])
+
+ # Extract parameters
+ input_layer = params["Table_to_Process"]
+ output_layer = params["Output_Table_Name"]
+
+ if input_layer == "Original":
+ input_layer = None
+
+ z_score_normalization(
+ adata,
+ output_layer=output_layer,
+ input_layer=input_layer
+ )
+
+ # Convert the normalized layer to a DataFrame and log its summary
+ post_dataframe = adata.to_df(layer=output_layer)
+ logging.info(f"Z-score normalization summary:\n{post_dataframe.describe()}")
+ logging.info(f"Transformed data:\n{adata}")
+
+ # Handle results based on save_to_disk flag
+ if save_to_disk:
+ # Prepare results dictionary based on outputs config
+ results_dict = {}
+
+ if "analysis" in params["outputs"]:
+ results_dict["analysis"] = adata
+
+ # Use centralized save_results function
+ # All file handling and logging is now done by save_results
+ saved_files = save_results(
+ results=results_dict,
+ params=params,
+ output_base_dir=output_dir
+ )
+
+ logging.info(
+ f"Z-Score Normalization completed → {saved_files['analysis']}"
+ )
+ return saved_files
+ else:
+ # Return the adata object directly for in-memory workflows
+ logging.info("Returning AnnData object (not saving to file)")
+ return adata
+
+
+# CLI interface
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print(
+ "Usage: python z_score_normalization_template.py params_json [output_dir]",
+ file=sys.stderr
+ )
+ sys.exit(1)
+
+ # Set up logging for CLI usage
+ logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+
+ # Get output directory if provided
+ output_dir = sys.argv[2] if len(sys.argv) > 2 else None
+
+ # Run analysis
+ result = run_from_json(
+ json_path=sys.argv[1],
+ output_dir=output_dir
+ )
+
+ # Display results based on return type
+ if isinstance(result, dict):
+ print("\nOutput files:")
+ for key, path in result.items():
+ print(f" {key}: {path}")
+ else:
+ print("\nReturned AnnData object for in-memory use")
+ print(f"AnnData: {result}")
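The z-score template delegates the math to `spac.transformations.z_score_normalization`. It then reads the result back with `adata.to_df(layer=output_layer)`.

The following is a NumPy sketch of what a per-feature z-score computes. It makes three assumptions: standardization runs per feature (column), it uses the population standard deviation (`ddof=0`), and zero-variance features map to 0. The real spac function may differ on any of these points. `zscore_columns` is a hypothetical name.

```python
import numpy as np


def zscore_columns(x: np.ndarray) -> np.ndarray:
    """Standardize each column (feature) to mean 0 and standard deviation 1.

    Assumes population standard deviation (ddof=0). Zero-variance columns
    are returned as zeros instead of dividing by zero.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    safe_std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return (x - mean) / safe_std


if __name__ == "__main__":
    # 3 cells x 2 features; the second feature is constant
    cells_by_features = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
    print(zscore_columns(cells_by_features))
```

After such a transform, every non-constant feature has mean 0 and standard deviation 1. That is a quick sanity check to apply to the `describe()` summary the template logs.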
diff --git a/src/spac/transformations.py b/src/spac/transformations.py
index b2044f1c..228c55f1 100644
--- a/src/spac/transformations.py
+++ b/src/spac/transformations.py
@@ -8,7 +8,7 @@
from spac.utils import check_table, check_annotation, check_feature
from scipy import stats
import umap as umap_lib
-from scipy.sparse import issparse
+from scipy.sparse import issparse, csr_matrix
from typing import List, Union, Optional
from numpy.lib import NumpyVersion
from sklearn.neighbors import KNeighborsClassifier
@@ -16,6 +16,9 @@
import multiprocessing
import parmap
from spac.utag_functions import utag
+from anndata import AnnData
+from spac.utils import compute_summary_qc_stats
+from typing import List, Optional
# Configure logging
logging.basicConfig(level=logging.INFO,
@@ -1286,3 +1289,179 @@ def run_utag_clustering(
cluster_list = utag_results.obs[cur_cluster_col].copy()
adata.obs[output_annotation] = cluster_list.copy()
adata.uns["utag_features"] = features
+
+# add QC metrics to AnnData object
+def add_qc_metrics(adata,
+ organism="hs",
+ mt_match_pattern=None,
+ layer=None):
+ """
+ Adds quality control (QC) metrics to the AnnData object.
+
+ Parameters:
+ -----------
+ adata : AnnData
+ The AnnData object containing single-cell or spatial
+ transcriptomics data.
+ organism : str, optional
+ The organism type. Default is "hs" (human). Use "mm" for mouse.
+ Determines the mitochondrial gene prefix
+ ("MT-" for human, "mt-" for mouse).
+ mt_match_pattern : str, optional
+ A custom pattern to identify mitochondrial genes. If None, it defaults
+ to "MT-" for human or "mt-" for mouse based on the `organism` parameter.
+ Takes precedence over the default patterns.
+ If provided, it should match the prefix of mitochondrial gene names in
+ `adata.var_names`.
+ layer : str, optional
+ The name of the layer in `adata.layers` to use for calculations.
+ If None, the default `adata.X` matrix is used.
+
+ Modifies:
+ ---------
+ adata.obs : pandas.DataFrame
+ Adds the following QC metrics as new columns:
+ - "nFeature": Number of genes with non-zero expression for each cell.
+ - "nCount": Total counts (sum of all gene expression values)
+ for each cell.
+ - "nCount_mt": Total counts for mitochondrial genes for each cell.
+ - "percent.mt": Percentage of counts in mitochondrial genes
+ for each cell.
+
+ Raises:
+ -------
+ ValueError
+ If `organism` is unsupported or `layer` is not found in `adata.layers`.
+
+ Notes:
+ ------
+ - If the input matrix (`adata.X` or the specified layer) is dense,
+ it is converted to a sparse matrix for efficient computation.
+ - Mitochondrial genes are identified based on the `mt_match_pattern`.
+
+ Example:
+ --------
+ >>> add_qc_metrics(adata, organism="hs")
+ >>> print(adata.obs[["nFeature", "nCount", "nCount_mt", "percent.mt"]])
+ """
+ # identify mitochondrial genes pattern
+ if mt_match_pattern is None:
+ if organism == "hs":
+ mt_match_pattern = "MT-"
+ elif organism == "mm":
+ mt_match_pattern = "mt-"
+ else:
+ raise ValueError(f"Unsupported organism '{organism}'. Supported values are 'hs' and 'mm'.")
+
+ if layer is None:
+ test_matrix = adata.X
+ else:
+ check_table(adata, tables=layer)
+ test_matrix = adata.layers[layer]
+
+ # Convert the selected matrix (adata.X or the layer) to CSR if it is dense
+ if not issparse(test_matrix):
+ test_matrix = csr_matrix(test_matrix)
+
+ # Calculate total number of genes with values > 0 for each cell
+ adata.obs["nFeature"] = np.array((test_matrix > 0).sum(axis=1)).flatten()
+ # Calculate the sum of counts for all genes for each cell
+ adata.obs["nCount"] = np.array(test_matrix.sum(axis=1)).flatten()
+ # Identify mitochondrial genes based on the match pattern
+ mt_genes = adata.var_names.str.startswith(mt_match_pattern)
+ # Calculate the sum of counts for mitochondrial genes for each cell
+ adata.obs["nCount_mt"] = np.array(test_matrix[:, mt_genes]
+ .sum(axis=1)).flatten()
+ # Calculate the percentage of counts in mitochondrial genes for each cell
+ adata.obs["percent.mt"] = (adata.obs["nCount_mt"] /
+ adata.obs["nCount"]) * 100
+ # Handle NaN values in percent.mt
+ adata.obs["percent.mt"] = adata.obs["percent.mt"].fillna(0)
+ # Ensure percent.mt is stored as a float
+ adata.obs["percent.mt"] = adata.obs["percent.mt"].astype(float)
+
+# Add the QC summary table to AnnData object
+def get_qc_summary_table(
+ adata: AnnData,
+ n_mad: int = 5,
+ upper_quantile: float = 0.95,
+ lower_quantile: float = 0.05,
+ stat_columns_list: Optional[List[str]] = None,
+ sample_column: Optional[str] = None
+) -> None:
+ """
+ Compute summary statistics for quality control metrics in an AnnData object
+ and store the result in adata.uns['qc_summary_table'].
+ If the QC columns are missing from adata.obs, run add_qc_metrics first.
+
+ Parameters:
+ adata (AnnData): The AnnData object containing the data.
+ n_mad (int): Number of MADs to use for upper/lower thresholds.
+ upper_quantile (float): Upper quantile to compute (e.g., 0.95).
+ lower_quantile (float): Lower quantile to compute (e.g., 0.05).
+ stat_columns_list (list): List of column names to compute statistics for.
+ If None, defaults to ['nFeature', 'nCount', 'percent.mt'].
+ sample_column (str, optional): Column name to group by sample.
+ If None, computes for all data.
+
+ Returns:
+ None. The summary table is stored in adata.uns['qc_summary_table'].
+ """
+    # If not provided, select the default stat columns
+    if stat_columns_list is None:
+        stat_columns_list = ['nFeature', 'nCount', 'percent.mt']
+
+    # Check that stat_columns_list is not empty (catches [] and ())
+    if not stat_columns_list:
+        raise ValueError(
+            'Parameter "stat_columns_list" must contain at least one column name.'
+        )
+
+    # Check that the required columns exist in adata.obs
+    check_annotation(
+        adata,
+        annotations=stat_columns_list,
+        should_exist=True)
+
+ # check grouping column
+ if sample_column is not None:
+ check_annotation(adata, annotations=[sample_column], should_exist=True)
+
+    # Validate numerical parameters
+    if not 0 <= upper_quantile <= 1:
+        raise ValueError(
+            'Parameter "upper_quantile" must be between 0 and 1, '
+            f'got "{upper_quantile}"'
+        )
+    if not 0 <= lower_quantile <= 1:
+        raise ValueError(
+            'Parameter "lower_quantile" must be between 0 and 1, '
+            f'got "{lower_quantile}"'
+        )
+    if n_mad < 0:
+        raise ValueError(f'Parameter "n_mad" must be non-negative, got "{n_mad}"')
+
+ obs_df = adata.obs
+ summary_table = pd.DataFrame()
+ # If no sample_column, compute stats for all data
+ if sample_column is None:
+ stat_df = compute_summary_qc_stats(df=obs_df,
+ n_mad=n_mad,
+ upper_quantile=upper_quantile,
+ lower_quantile=lower_quantile,
+ stat_columns_list=stat_columns_list)
+ stat_df["Sample"] = "All"
+ summary_table = stat_df
+ else:
+ # Otherwise, compute stats for each sample group
+ samples_list = pd.unique(obs_df[sample_column])
+ stat_dfs = []
+ for current_sample in samples_list:
+ sample_df = obs_df[obs_df[sample_column] == current_sample].copy()
+ stat_df = compute_summary_qc_stats(df=sample_df,
+ n_mad=n_mad,
+ upper_quantile=upper_quantile,
+ lower_quantile=lower_quantile,
+ stat_columns_list=stat_columns_list)
+ stat_df["Sample"] = current_sample
+ stat_dfs.append(stat_df)
+ summary_table = pd.concat(stat_dfs, ignore_index=True)
+ # Reset index and store in adata.uns
+ summary_table = summary_table.reset_index(drop=True)
+ adata.uns["qc_summary_table"] = summary_table
\ No newline at end of file
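The per-cell metrics that add_qc_metrics writes to adata.obs (nFeature, nCount, nCount_mt, percent.mt) can be illustrated with a small NumPy/pandas sketch. This is not the spac implementation. It assumes a dense cells x genes array (the function above converts to CSR instead), hard-codes the human "MT-" prefix, and uses a hypothetical helper name, qc_metrics_sketch.

```python
import numpy as np
import pandas as pd


def qc_metrics_sketch(counts, gene_names, mt_prefix: str = "MT-") -> pd.DataFrame:
    """Per-cell QC metrics on a dense cells x genes array (illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    # Boolean mask of mitochondrial genes, selected by name prefix
    mt_mask = np.asarray(pd.Index(gene_names).str.startswith(mt_prefix), dtype=bool)

    n_feature = (counts > 0).sum(axis=1)         # genes detected per cell
    n_count = counts.sum(axis=1)                 # total counts per cell
    n_count_mt = counts[:, mt_mask].sum(axis=1)  # mitochondrial counts per cell

    # 0/0 for empty cells yields NaN, which is then reported as 0%
    with np.errstate(divide="ignore", invalid="ignore"):
        percent_mt = n_count_mt / n_count * 100

    qc = pd.DataFrame({
        "nFeature": n_feature,
        "nCount": n_count,
        "nCount_mt": n_count_mt,
        "percent.mt": percent_mt,
    })
    qc["percent.mt"] = qc["percent.mt"].fillna(0).astype(float)
    return qc


counts = np.array([
    [10, 0, 5, 5],   # half of the counts are mitochondrial
    [0, 0, 0, 0],    # empty cell
    [1, 2, 3, 4],
])
qc = qc_metrics_sketch(counts, ["GeneA", "GeneB", "MT-CO1", "MT-ND1"])
print(qc)
```

The empty second cell shows why the NaN handling matters: without the fill step its percent.mt would be NaN rather than 0.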
diff --git a/src/spac/utils.py b/src/spac/utils.py
index f3506a20..2a616b68 100644
--- a/src/spac/utils.py
+++ b/src/spac/utils.py
@@ -7,6 +7,8 @@
import logging
import warnings
import numbers
+from scipy.stats import median_abs_deviation
+from typing import List, Optional
# Configure logging
logging.basicConfig(level=logging.INFO,
@@ -1108,12 +1110,13 @@ def compute_metrics(data):
# Ensure the maximum and minimum outliers are included
max_outlier = outlier_series.max()
min_outlier = outlier_series.min()
- outliers_sampled = outliers_sampled.append(
- pd.Series([max_outlier, min_outlier])
+ outliers_sampled = pd.concat(
+ [outliers_sampled, pd.Series([max_outlier, min_outlier])],
+ ignore_index=True
)
# Convert the sampled values back to a list
- outliers = outliers_sampled.reset_index(drop=True).tolist()
+ outliers = outliers_sampled.tolist()
metrics = [
lower_whisker,
@@ -1190,3 +1193,84 @@ def compute_metrics(data):
return metrics
return metrics
+
+# compute summary statistics for the specified columns
+def compute_summary_qc_stats(
+ df: pd.DataFrame,
+ n_mad: int = 5,
+ upper_quantile: float = 0.95,
+ lower_quantile: float = 0.05,
+ stat_columns_list: List[str] = ['nFeature', 'nCount', 'percent.mt']
+ ) -> pd.DataFrame:
+
+ """
+ Compute summary quality control statistics for specified columns in a dataset.
+
+ For each column in stat_columns_list, this function calculates:
+ - Mean
+ - Median
+ - Upper and lower thresholds based on median ± n_mad * MAD
+ (median absolute deviation)
+ - Upper and lower quantiles
+
+ Parameters
+ ----------
+ df : pd.DataFrame
+ Input DataFrame containing the data.
+ n_mad : int, optional
+ Number of MADs to use for upper and lower thresholds (default is 5).
+ upper_quantile : float, optional
+ Upper quantile to compute (default is 0.95).
+ lower_quantile : float, optional
+ Lower quantile to compute (default is 0.05).
+ stat_columns_list : list of str, optional
+ List of column names to compute statistics for. Columns must be numeric.
+
+ Returns
+ -------
+ pd.DataFrame
+ DataFrame with summary statistics for each specified column.
+ Columns: ["metric_name", "mean", "median", "upper_mad", "lower_mad",
+ "upper_quantile", "lower_quantile"]
+
+ Raises
+ ------
+ TypeError
+ If any column in stat_columns_list is not numeric or all values are NaN.
+ """
+ stat_vals = []
+ for col_name in stat_columns_list:
+ # Ensure the column is numeric
+ if not pd.api.types.is_numeric_dtype(df[col_name]):
+ raise TypeError(
+ f'Column "{col_name}" must be numeric to compute statistics.'
+ )
+        # Check for all-NaN column
+        if df[col_name].isna().all():
+            raise TypeError(
+                f'Column "{col_name}" contains only NaN values; '
+                'cannot compute statistics.'
+            )
+ # Compute median and MAD (median absolute deviation)
+ median = df[col_name].median()
+ mad = median_abs_deviation(df[col_name], nan_policy='omit')
+ # Collect statistics for this column
+ col_stats = [
+ col_name,
+ df[col_name].mean(),
+ median,
+ median + n_mad * mad,
+ median - n_mad * mad,
+ df[col_name].quantile(upper_quantile),
+ df[col_name].quantile(lower_quantile)
+ ]
+ stat_vals.append(col_stats)
+ # Return DataFrame with statistics for all columns
+ return pd.DataFrame(
+ stat_vals,
+ columns=[
+ "metric_name", "mean", "median",
+ "upper_mad", "lower_mad",
+ "upper_quantile", "lower_quantile"
+ ]
+ )
\ No newline at end of file
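The upper_mad and lower_mad columns are computed as median ± n_mad * MAD. Here is a minimal sketch of that calculation on a toy nCount vector. It computes the unscaled MAD by hand, which should match scipy.stats.median_abs_deviation with its default scale=1.0 as used above. The helper name mad_thresholds is hypothetical.

```python
import numpy as np
import pandas as pd


def mad_thresholds(values, n_mad: float = 5):
    """Return (lower, upper) = median -/+ n_mad * MAD, ignoring NaN."""
    x = pd.Series(values, dtype=float).dropna().to_numpy()
    median = np.median(x)
    # Unscaled MAD: median of the absolute deviations from the median
    mad = np.median(np.abs(x - median))
    return median - n_mad * mad, median + n_mad * mad


n_count = [100, 120, 130, 140, 1000, np.nan]
lower, upper = mad_thresholds(n_count, n_mad=3)
print(lower, upper)
```

With median 130 and MAD 10, the cell with 1000 counts falls well above the upper threshold of 160. Because the median and MAD are robust, that single extreme value barely moves the thresholds.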
diff --git a/tests/templates/__init__.py b/tests/templates/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/templates/test_add_pin_color_rule.py b/tests/templates/test_add_pin_color_rule.py
new file mode 100644
index 00000000..762da8d6
--- /dev/null
+++ b/tests/templates/test_add_pin_color_rule.py
@@ -0,0 +1,98 @@
+# tests/templates/test_add_pin_color_rule.py
+"""
+Real (non-mocked) unit test for the Append Pin Color Rule template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.append_pin_color_rule_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells for color rule assignment."""
+ rng = np.random.default_rng(42)
+ X = rng.random((4, 2))
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestAddPinColorRuleTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the append pin color rule template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Label_Color_Map": ["A:red", "B:blue"],
+ "Color_Map_Name": "_spac_colors",
+ "Overwrite_Previous_Color_Map": True,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_add_pin_color_rule_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run pin color rule and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists and contains AnnData
+ 3. Color map is stored in .uns
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("_spac_colors", result_adata.uns)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("_spac_colors", mem_adata.uns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_analysis_to_csv_template.py b/tests/templates/test_analysis_to_csv_template.py
new file mode 100644
index 00000000..3eaeb051
--- /dev/null
+++ b/tests/templates/test_analysis_to_csv_template.py
@@ -0,0 +1,98 @@
+# tests/templates/test_analysis_to_csv_template.py
+"""
+Real (non-mocked) unit test for the Analysis to CSV template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.analysis_to_csv_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells, 2 genes for CSV export."""
+ rng = np.random.default_rng(42)
+ X = rng.random((4, 2))
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = rng.random((4, 2)) * 100
+ return adata
+
+
+class TestAnalysisToCSVTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the analysis to CSV template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Export": "Original",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_analysis_to_csv_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: export AnnData to CSV and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'dataframe' key
+ 2. CSV exists, is non-empty
+        3. CSV has one column per gene and one row per cell
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("dataframe", saved_files)
+
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ result_df = pd.read_csv(csv_path)
+        # Should have one column per gene and one row per cell
+ self.assertIn("Gene_0", result_df.columns)
+ self.assertIn("Gene_1", result_df.columns)
+ self.assertEqual(len(result_df), 4)
+
+ mem_df = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_df, pd.DataFrame)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_append_annotation_template.py b/tests/templates/test_append_annotation_template.py
new file mode 100644
index 00000000..fcec1b56
--- /dev/null
+++ b/tests/templates/test_append_annotation_template.py
@@ -0,0 +1,114 @@
+# tests/templates/test_append_annotation_template.py
+"""
+Real (non-mocked) unit test for the Append Annotation template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.append_annotation_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame for append annotation testing.
+
+ 4 rows, 2 columns -- the smallest dataset that exercises the
+ template's column-append code path.
+ """
+ return pd.DataFrame({
+ "cell_type": ["B cell", "T cell", "B cell", "T cell"],
+ "marker": [1.0, 2.0, 3.0, 4.0],
+ })
+
+
+class TestAppendAnnotationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the append annotation template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Annotation_Pair_List": ["batch_id:batch_1", "site:lung"],
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_append_annotation_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run append annotation template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. New annotation columns are present in the output
+ 4. In-memory return is a DataFrame with the appended columns
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: appended columns present --------------------------
+ result_df = pd.read_csv(csv_path)
+ self.assertIn("batch_id", result_df.columns)
+ self.assertIn("site", result_df.columns)
+ self.assertEqual(result_df["batch_id"].unique().tolist(), ["batch_1"])
+ self.assertEqual(result_df["site"].unique().tolist(), ["lung"])
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertIn("batch_id", mem_df.columns)
+ self.assertIn("site", mem_df.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_arcsinh_normalization_template.py b/tests/templates/test_arcsinh_normalization_template.py
new file mode 100644
index 00000000..9e9c61f3
--- /dev/null
+++ b/tests/templates/test_arcsinh_normalization_template.py
@@ -0,0 +1,98 @@
+# tests/templates/test_arcsinh_normalization_template.py
+"""
+Real (non-mocked) unit test for the Arcsinh Normalization template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.arcsinh_normalization_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells, 2 genes with positive values."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 100, size=(4, 2)).astype(float)
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestArcsinhNormalizationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the arcsinh normalization template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Co_Factor": "5",
+ "Percentile": "None",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_arcsinh_normalization_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run arcsinh normalization and verify outputs.
+
+ Validates:
+ 1. saved_files is a dict with 'analysis' key
+ 2. Output pickle exists and is non-empty
+ 3. Output pickle contains an AnnData with 'arcsinh' layer
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists(), f"Pickle not found: {pkl_path}")
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("arcsinh", result_adata.layers)
+
+ # -- save_to_disk=False returns AnnData in memory --------------
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("arcsinh", mem_adata.layers)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_binary_to_categorical_annotation_template.py b/tests/templates/test_binary_to_categorical_annotation_template.py
new file mode 100644
index 00000000..9e5a44f4
--- /dev/null
+++ b/tests/templates/test_binary_to_categorical_annotation_template.py
@@ -0,0 +1,117 @@
+# tests/templates/test_binary_to_categorical_annotation_template.py
+"""
+Real (non-mocked) unit test for the Binary to Categorical Annotation template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.binary_to_categorical_annotation_template import (
+ run_from_json,
+)
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame with binary one-hot columns.
+
+ 4 rows -- each row has exactly one 1 across the binary columns.
+ """
+ return pd.DataFrame({
+ "B_cell": [1, 0, 0, 0],
+ "T_cell": [0, 1, 0, 1],
+ "NK_cell": [0, 0, 1, 0],
+ "marker": [1.5, 2.5, 3.5, 4.5],
+ })
+
+
+class TestBinaryToCategoricalAnnotationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the binary-to-categorical template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Binary_Annotation_Columns": ["B_cell", "T_cell", "NK_cell"],
+ "New_Annotation_Name": "cell_labels",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_bin2cat_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run binary-to-categorical template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. New categorical column 'cell_labels' is present
+ 4. Categorical values match the original binary column names
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: categorical column present with expected values ---
+ result_df = pd.read_csv(csv_path)
+ self.assertIn("cell_labels", result_df.columns)
+ expected_labels = {"B_cell", "T_cell", "NK_cell"}
+ actual_labels = set(result_df["cell_labels"].dropna().unique())
+ self.assertEqual(actual_labels, expected_labels)
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertIn("cell_labels", mem_df.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_boxplot_template.py b/tests/templates/test_boxplot_template.py
new file mode 100644
index 00000000..3588f618
--- /dev/null
+++ b/tests/templates/test_boxplot_template.py
@@ -0,0 +1,194 @@
+# tests/templates/test_boxplot_template.py
+"""
+Real (non-mocked) unit test for the Boxplot template.
+
+Snowball seed test — validates template I/O behaviour only:
+ • Expected output files are produced on disk
+ • Filenames follow the convention
+ • Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg") # Headless backend for CI
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.boxplot_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """
+ Minimal synthetic AnnData for boxplot template testing.
+
+ 4 cells, 2 genes, 2 cell types — the smallest dataset that exercises
+ the template's grouping, plotting, and summary-stats code paths.
+ """
+ rng = np.random.default_rng(42)
+
+ # 4 cells × 2 genes — small enough to reason about,
+ # large enough for describe() to return meaningful stats
+ n_cells, n_genes = 4, 2
+ X = rng.integers(1, 10, size=(n_cells, n_genes)).astype(float)
+
+ obs = pd.DataFrame(
+ {"cell_type": ["B cell", "T cell", "B cell", "T cell"]},
+ )
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestBoxplotTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the boxplot template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ # Save minimal real data as pickle (simulates upstream analysis)
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ # Write a JSON params file — the actual input the template receives
+ # in production (from Galaxy / Code Ocean)
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Primary_Annotation": "cell_type",
+ "Secondary_Annotation": "None",
+ "Table_to_Visualize": "Original",
+ "Feature_s_to_Plot": ["All"],
+ "Value_Axis_Log_Scale": False,
+ "Figure_Title": "Test BoxPlot",
+ "Horizontal_Plot": False,
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72, # low DPI for fast save
+ "Font_Size": 10,
+ "Keep_Outliers": True,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframe": {"type": "file", "name": "output.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_boxplot_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run boxplot template and verify output
+ artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'figures' and 'dataframe' keys
+ 2. A figures directory is created containing a non-empty PNG
+ 3. The figure title matches the "Figure_Title" param
+ 4. A summary CSV is created with the exact describe() rows
+ """
+ # -- Act (save_to_disk=True): write outputs to disk ------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False, # no GUI in CI
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Act (save_to_disk=False): get figure + df in memory -------
+ fig, summary_df_mem = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ show_plot=False,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(
+ saved_files, dict,
+ f"Expected dict from run_from_json, got {type(saved_files)}"
+ )
+
+ # -- Assert: figures directory contains at least one PNG -------
+ self.assertIn("figures", saved_files,
+ "Missing 'figures' key in saved_files")
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(
+ len(figure_paths), 1, "No figure files were saved"
+ )
+
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(
+ fig_file.exists(), f"Figure not found: {fig_path}"
+ )
+ self.assertGreater(
+ fig_file.stat().st_size, 0,
+ f"Figure file is empty: {fig_path}"
+ )
+ # Template saves matplotlib figures as .png
+ self.assertEqual(
+ fig_file.suffix, ".png",
+ f"Expected .png extension, got {fig_file.suffix}"
+ )
+
+ # -- Assert: figure has the correct title ----------------------
+ # The template calls ax.set_title(figure_title), so the axes
+ # title must match the "Figure_Title" parameter we passed in.
+ axes_title = fig.axes[0].get_title()
+ self.assertEqual(
+ axes_title, "Test BoxPlot",
+ f"Expected figure title 'Test BoxPlot', got '{axes_title}'"
+ )
+
+ # -- Assert: summary CSV exists and is non-empty ---------------
+ self.assertIn("dataframe", saved_files,
+ "Missing 'dataframe' key in saved_files")
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(
+ csv_path.exists(), f"Summary CSV not found: {csv_path}"
+ )
+ self.assertGreater(
+ csv_path.stat().st_size, 0,
+ f"Summary CSV is empty: {csv_path}"
+ )
+
+ # -- Assert: CSV has the exact describe() stat rows ------------
+ # The template calls df.describe().reset_index() which produces
+ # exactly these 8 rows in this order.
+ summary_df = pd.read_csv(csv_path)
+ expected_stats = [
+ "count", "mean", "std", "min",
+ "25%", "50%", "75%", "max",
+ ]
+
+ # First column after reset_index() is called "index"
+ actual_stats = summary_df["index"].tolist()
+ self.assertEqual(
+ actual_stats, expected_stats,
+ f"Summary CSV stat rows don't match.\n"
+ f" Expected: {expected_stats}\n"
+ f" Actual: {actual_stats}"
+ )
+
+
+if __name__ == "__main__":
+ unittest.main()
\ No newline at end of file
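The expected stat rows in this test rely on standard pandas behaviour. For an all-numeric frame, DataFrame.describe() returns the eight rows below. Because the describe index is unnamed, reset_index() moves those labels into a column called "index". A quick standalone check, using toy values rather than the template's data and assuming the CSV is written with index=False:

```python
import io

import pandas as pd

df = pd.DataFrame({"Gene_0": [1.0, 4.0, 2.0, 8.0], "Gene_1": [3.0, 3.0, 5.0, 9.0]})
summary = df.describe().reset_index()
print(summary["index"].tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Round-trip through CSV, as the test does when it reads the saved summary
buf = io.StringIO()
summary.to_csv(buf, index=False)
buf.seek(0)
round_trip = pd.read_csv(buf)
```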
diff --git a/tests/templates/test_calculate_centroid_template.py b/tests/templates/test_calculate_centroid_template.py
new file mode 100644
index 00000000..7093034e
--- /dev/null
+++ b/tests/templates/test_calculate_centroid_template.py
@@ -0,0 +1,126 @@
+# tests/templates/test_calculate_centroid_template.py
+"""
+Real (non-mocked) unit test for the Calculate Centroid template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.calculate_centroid_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame with bounding-box coordinate columns.
+
+ 4 rows -- enough to exercise the centroid calculation.
+ """
+ return pd.DataFrame({
+ "XMin": [0.0, 10.0, 20.0, 30.0],
+ "XMax": [10.0, 20.0, 30.0, 40.0],
+ "YMin": [0.0, 5.0, 10.0, 15.0],
+ "YMax": [4.0, 9.0, 14.0, 19.0],
+ "cell_type": ["A", "B", "A", "B"],
+ })
+
+
+class TestCalculateCentroidTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the calculate centroid template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Min_X_Coordinate_Column_Name": "XMin",
+ "Max_X_Coordinate_Column_Name": "XMax",
+ "Min_Y_Coordinate_Column_Name": "YMin",
+ "Max_Y_Coordinate_Column_Name": "YMax",
+ "X_Centroid_Name": "XCentroid",
+ "Y_Centroid_Name": "YCentroid",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_calculate_centroid_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run calculate centroid template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. Centroid columns are present and correctly computed
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: centroid columns are present and correct ----------
+ result_df = pd.read_csv(csv_path)
+ self.assertIn("XCentroid", result_df.columns)
+ self.assertIn("YCentroid", result_df.columns)
+
+ # XCentroid = (XMin + XMax) / 2
+ expected_x = [5.0, 15.0, 25.0, 35.0]
+ self.assertEqual(result_df["XCentroid"].tolist(), expected_x)
+
+ # YCentroid = (YMin + YMax) / 2
+ expected_y = [2.0, 7.0, 12.0, 17.0]
+ self.assertEqual(result_df["YCentroid"].tolist(), expected_y)
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertIn("XCentroid", mem_df.columns)
+ self.assertIn("YCentroid", mem_df.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_combine_annotations_template.py b/tests/templates/test_combine_annotations_template.py
new file mode 100644
index 00000000..a8f91a23
--- /dev/null
+++ b/tests/templates/test_combine_annotations_template.py
@@ -0,0 +1,110 @@
+# tests/templates/test_combine_annotations_template.py
+"""
+Real (non-mocked) unit test for the Combine Annotations template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.combine_annotations_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells with two annotation columns to combine."""
+ rng = np.random.default_rng(42)
+ X = rng.random((4, 2))
+ obs = pd.DataFrame({
+ "tissue": ["lung", "liver", "lung", "liver"],
+ "cell_type": ["B cell", "T cell", "T cell", "B cell"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestCombineAnnotationsTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the combine annotations template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotations_Names": ["tissue", "cell_type"],
+ "Separator": "_",
+ "New_Annotation_Name": "combined",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_combine_annotations_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run combine annotations and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' and 'dataframe' keys
+ 2. Pickle contains AnnData with 'combined' obs column
+ 3. CSV exists and is non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+ self.assertIn("dataframe", saved_files)
+
+ # -- Pickle output --
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("combined", result_adata.obs.columns)
+
+ # -- CSV output --
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- In-memory --
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("combined", mem_adata.obs.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_combine_dataframes_template.py b/tests/templates/test_combine_dataframes_template.py
new file mode 100644
index 00000000..6942eb78
--- /dev/null
+++ b/tests/templates/test_combine_dataframes_template.py
@@ -0,0 +1,114 @@
+# tests/templates/test_combine_dataframes_template.py
+"""
+Real (non-mocked) unit test for the Combine DataFrames template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.combine_dataframes_template import run_from_json
+
+
+def _make_tiny_dataframes():
+ """Two minimal DataFrames with the same schema for concatenation."""
+ df_a = pd.DataFrame({
+ "cell_type": ["B cell", "T cell"],
+ "marker": [1.0, 2.0],
+ })
+ df_b = pd.DataFrame({
+ "cell_type": ["NK cell", "Monocyte"],
+ "marker": [3.0, 4.0],
+ })
+ return df_a, df_b
+
+
+class TestCombineDataFramesTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the combine dataframes template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+
+ df_a, df_b = _make_tiny_dataframes()
+ self.file_a = os.path.join(self.tmp_dir.name, "first.csv")
+ self.file_b = os.path.join(self.tmp_dir.name, "second.csv")
+ df_a.to_csv(self.file_a, index=False)
+ df_b.to_csv(self.file_b, index=False)
+
+ params = {
+ "First_Dataframe": self.file_a,
+ "Second_Dataframe": self.file_b,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_combine_dataframes_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run combine dataframes template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. Combined DataFrame has all rows from both inputs
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: combined row count --------------------------------
+ result_df = pd.read_csv(csv_path)
+ self.assertEqual(len(result_df), 4)
+ expected_types = {"B cell", "T cell", "NK cell", "Monocyte"}
+ self.assertEqual(set(result_df["cell_type"]), expected_types)
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertEqual(len(mem_df), 4)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_downsample_cells_template.py b/tests/templates/test_downsample_cells_template.py
new file mode 100644
index 00000000..7f76d5b3
--- /dev/null
+++ b/tests/templates/test_downsample_cells_template.py
@@ -0,0 +1,116 @@
+# tests/templates/test_downsample_cells_template.py
+"""
+Real (non-mocked) unit test for the Downsample Cells template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.downsample_cells_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame for downsampling.
+
+ 8 rows, 2 groups of 4 -- enough to exercise group-based downsampling.
+ """
+ return pd.DataFrame({
+ "cell_type": ["A", "A", "A", "A", "B", "B", "B", "B"],
+ "marker": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
+ })
+
+
+class TestDownsampleCellsTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the downsample cells template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Annotations_List": ["cell_type"],
+ "Number_of_Samples": 2,
+ "Stratify_Option": False,
+ "Random_Selection": False,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_downsample_cells_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run downsample cells template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. Row count is reduced (2 per group = 4 total from 8)
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: downsampled row count -----------------------------
+ result_df = pd.read_csv(csv_path)
+ # 2 samples per group * 2 groups = 4 rows
+ self.assertEqual(len(result_df), 4)
+ # Both groups should still be present
+ self.assertEqual(
+ set(result_df["cell_type"].unique()), {"A", "B"}
+ )
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertEqual(len(mem_df), 4)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_hierarchical_heatmap_template.py b/tests/templates/test_hierarchical_heatmap_template.py
new file mode 100644
index 00000000..1964b9a0
--- /dev/null
+++ b/tests/templates/test_hierarchical_heatmap_template.py
@@ -0,0 +1,106 @@
+# tests/templates/test_hierarchical_heatmap_template.py
+"""
+Real (non-mocked) unit test for the Hierarchical Heatmap template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.hierarchical_heatmap_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells, 3 genes, 2 groups for heatmap."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 20, size=(8, 3)).astype(float)
+ obs = pd.DataFrame({
+ "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestHierarchicalHeatmapTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the hierarchical heatmap template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "Table_to_Visualize": "Original",
+ "Features_to_Visualize": ["All"],
+ "Standard_Scale": "None",
+ "Method": "average",
+ "Metric": "euclidean",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 8,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_hierarchical_heatmap_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run hierarchical heatmap and verify outputs.
+
+ Validates:
+ 1. saved_files is a dict with a 'figures' key
+ 2. At least one figure file path is returned
+ 3. Every returned figure file exists and is non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_results_flag=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_histogram_template.py b/tests/templates/test_histogram_template.py
new file mode 100644
index 00000000..5a8e49e8
--- /dev/null
+++ b/tests/templates/test_histogram_template.py
@@ -0,0 +1,111 @@
+# tests/templates/test_histogram_template.py
+"""
+Real (non-mocked) unit test for the Histogram template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.histogram_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells, 2 genes for histogram plotting."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 10, size=(4, 2)).astype(float)
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestHistogramTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the histogram template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "Table_to_Visualize": "Original",
+ "Feature_s_to_Plot": ["All"],
+ "Figure_Title": "Test Histogram",
+ "Legend_Title": "Cell Type",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Number_of_Bins": 20,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "figures": {"type": "directory", "name": "figures_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_histogram_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run histogram and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'figures' and 'dataframe' keys
+ 2. Figures directory contains non-empty PNG(s)
+ 3. Summary CSV exists and is non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+ self.assertIn("dataframe", saved_files)
+
+ # Figures
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+ # CSV
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_interactive_spatial_plot_template.py b/tests/templates/test_interactive_spatial_plot_template.py
new file mode 100644
index 00000000..ecb9e8f4
--- /dev/null
+++ b/tests/templates/test_interactive_spatial_plot_template.py
@@ -0,0 +1,97 @@
+# tests/templates/test_interactive_spatial_plot_template.py
+"""
+Real (non-mocked) unit test for the Interactive Spatial Plot template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.interactive_spatial_plot_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells with spatial coords."""
+ rng = np.random.default_rng(42)
+ X = rng.random((8, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((8, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestInteractiveSpatialPlotTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the interactive spatial plot template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Color_By": "Annotation",
+ "Annotation_s_to_Highlight": ["cell_type"],
+ "Feature_to_Highlight": "None",
+ "Dot_Size": 5,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "html": {"type": "directory", "name": "html_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_interactive_spatial_plot_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run interactive spatial plot and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'html' key
+ 2. HTML directory contains non-empty file(s)
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("html", saved_files)
+
+ html_paths = saved_files["html"]
+ self.assertGreaterEqual(len(html_paths), 1)
+ for html_path in html_paths:
+ html_file = Path(html_path)
+ self.assertTrue(html_file.exists())
+ self.assertGreater(html_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_load_csv_files_with_config.py b/tests/templates/test_load_csv_files_with_config.py
new file mode 100644
index 00000000..38a8fa1e
--- /dev/null
+++ b/tests/templates/test_load_csv_files_with_config.py
@@ -0,0 +1,104 @@
+# tests/templates/test_load_csv_files_with_config.py
+"""
+Real (non-mocked) unit test for the Load CSV Files template.
+
+Snowball test -- validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.load_csv_files_template import run_from_json
+
+
+class TestLoadCSVFilesWithConfig(unittest.TestCase):
+ """Real (non-mocked) tests for the load CSV files template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+
+ # Create CSV data directory
+ csv_dir = os.path.join(self.tmp_dir.name, "csv_data")
+ os.makedirs(csv_dir)
+
+ df1 = pd.DataFrame({
+ "Feature_A": [1.0, 2.0],
+ "Feature_B": [3.0, 4.0],
+ "ID": ["cell_1", "cell_2"],
+ })
+ df2 = pd.DataFrame({
+ "Feature_A": [5.0, 6.0],
+ "Feature_B": [7.0, 8.0],
+ "ID": ["cell_3", "cell_4"],
+ })
+
+ df1.to_csv(os.path.join(csv_dir, "data1.csv"), index=False)
+ df2.to_csv(os.path.join(csv_dir, "data2.csv"), index=False)
+
+ # Configuration CSV with file_name column + metadata
+ config_df = pd.DataFrame({
+ "file_name": ["data1.csv", "data2.csv"],
+ "experiment": ["Exp1", "Exp2"],
+ })
+ config_file = os.path.join(self.tmp_dir.name, "config.csv")
+ config_df.to_csv(config_file, index=False)
+
+ params = {
+ "CSV_Files": csv_dir,
+ "CSV_Files_Configuration": config_file,
+ "String_Columns": ["ID"],
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_load_csv_files_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: load CSV files with config and verify.
+
+ Validates:
+ 1. saved_files dict has 'dataframe' key
+ 2. CSV exists and is non-empty
+ 3. Combined data has rows from both input files
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("dataframe", saved_files)
+
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ result_df = pd.read_csv(csv_path)
+ self.assertEqual(len(result_df), 4)
+
+ mem_df = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_df, pd.DataFrame)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_manual_phenotyping_template.py b/tests/templates/test_manual_phenotyping_template.py
new file mode 100644
index 00000000..c3a9227b
--- /dev/null
+++ b/tests/templates/test_manual_phenotyping_template.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+# tests/templates/test_manual_phenotyping_template.py
+"""
+Real (non-mocked) unit test for the Manual Phenotyping template.
+
+Validates template I/O behaviour only:
+ - Expected output files are produced on disk
+ - Filenames follow the convention
+ - Output artifacts are non-empty
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.manual_phenotyping_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame with binary phenotype marker columns.
+
+ 4 rows -- each row has one positive marker matching a phenotype rule.
+ """
+ return pd.DataFrame({
+ "cd4": [1, 0, 0, 1],
+ "cd8": [0, 1, 0, 0],
+ "cd20": [0, 0, 1, 0],
+ "marker_intensity": [1.5, 2.5, 3.5, 4.5],
+ })
+
+
+def _make_phenotype_rules() -> pd.DataFrame:
+ """
+ Phenotype rule table: maps binary codes to phenotype names.
+
+ Each code lists marker column names, each followed by '+' or '-'.
+ """
+ return pd.DataFrame({
+ "phenotype_name": ["T_helper", "Cytotoxic_T", "B_cell"],
+ "phenotype_code": ["cd4+cd8-", "cd4-cd8+", "cd20+"],
+ })
+
+
+class TestManualPhenotypingTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the manual phenotyping template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+ self.rules_file = os.path.join(self.tmp_dir.name, "phenotypes.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+ _make_phenotype_rules().to_csv(self.rules_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Phenotypes_Code": self.rules_file,
+ "Classification_Column_Prefix": "",
+ "Classification_Column_Suffix": "",
+ "Allow_Multiple_Phenotypes": True,
+ "Manual_Annotation_Name": "manual_phenotype",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_manual_phenotyping_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run manual phenotyping template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. Phenotype annotation column is present in output
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: phenotype column present --------------------------
+ result_df = pd.read_csv(csv_path)
+ self.assertIn("manual_phenotype", result_df.columns)
+ # At least some rows should have assigned phenotypes
+ non_null = result_df["manual_phenotype"].dropna()
+ self.assertGreater(len(non_null), 0)
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertIn("manual_phenotype", mem_df.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_nearest_neighbor_calculation_template.py b/tests/templates/test_nearest_neighbor_calculation_template.py
new file mode 100644
index 00000000..cc888a57
--- /dev/null
+++ b/tests/templates/test_nearest_neighbor_calculation_template.py
@@ -0,0 +1,99 @@
+# tests/templates/test_nearest_neighbor_calculation_template.py
+"""
+Real (non-mocked) unit test for the Nearest Neighbor Calculation template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.nearest_neighbor_calculation_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells with spatial coords and annotation."""
+ rng = np.random.default_rng(42)
+ X = rng.random((8, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((8, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestNearestNeighborCalculationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for nearest neighbor calculation."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "ImageID": "None",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_nearest_neighbor_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: calculate nearest neighbors and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle output and in-memory result are both AnnData objects
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_neighborhood_profile_template.py b/tests/templates/test_neighborhood_profile_template.py
new file mode 100644
index 00000000..36a1fad9
--- /dev/null
+++ b/tests/templates/test_neighborhood_profile_template.py
@@ -0,0 +1,97 @@
+# tests/templates/test_neighborhood_profile_template.py
+"""
+Real (non-mocked) unit test for the Neighborhood Profile template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.neighborhood_profile_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 20 cells with spatial coords and annotation."""
+ rng = np.random.default_rng(42)
+ X = rng.random((20, 2))
+ obs = pd.DataFrame({
+ "cell_type": (["A"] * 10) + (["B"] * 10),
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((20, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestNeighborhoodProfileTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the neighborhood profile template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation_of_interest": "cell_type",
+ "Bins": [10, 25, 50],
+ "Anchor_Neighbor_List": ["A;B"],
+ "Stratify_By": "None",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "directory", "name": "dataframe_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_neighborhood_profile_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: compute neighborhood profiles and verify.
+
+ Validates:
+ 1. saved_files dict has 'dataframe' key
+ 2. At least one output CSV path is returned, and each exists and is non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("dataframe", saved_files)
+
+ csv_paths = saved_files["dataframe"]
+ self.assertGreaterEqual(len(csv_paths), 1)
+ for csv_path in csv_paths:
+ csv_file = Path(csv_path)
+ self.assertTrue(csv_file.exists())
+ self.assertGreater(csv_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_normalize_batch_template.py b/tests/templates/test_normalize_batch_template.py
new file mode 100644
index 00000000..f6d87e61
--- /dev/null
+++ b/tests/templates/test_normalize_batch_template.py
@@ -0,0 +1,97 @@
+# tests/templates/test_normalize_batch_template.py
+"""
+Real (non-mocked) unit test for the Normalize Batch template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.normalize_batch_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 6 cells, 2 genes, 2 batches for batch normalization."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 50, size=(6, 2)).astype(float)
+ obs = pd.DataFrame({
+ "batch": ["A", "A", "A", "B", "B", "B"],
+ "cell_type": ["T", "B", "T", "B", "T", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
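+# Hypothetical reference sketch: per-batch z-score normalization, where
+# each feature is centered and scaled using only the cells of its own
+# batch. This is an assumption for illustration; the normalize batch
+# template may use a different normalization method.
+def _reference_batch_zscore(X, batches):
+    """Z-score each column of X separately within each batch label."""
+    import numpy as np
+
+    X = np.asarray(X, dtype=float)
+    batches = np.asarray(batches)
+    out = np.empty_like(X)
+    for batch in np.unique(batches):
+        mask = batches == batch
+        mean = X[mask].mean(axis=0)
+        std = X[mask].std(axis=0)
+        # Avoid division by zero for features constant within a batch.
+        std[std == 0] = 1.0
+        out[mask] = (X[mask] - mean) / std
+    return out
+
+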
+class TestNormalizeBatchTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the normalize batch template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "batch",
+ "Need_Normalization": True,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_normalize_batch_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run normalize batch and verify outputs.
+
+ Validates:
+ 1. saved_files is a dict with 'analysis' key
+ 2. Output pickle exists, is non-empty, and contains AnnData
+ 3. save_to_disk=False returns the AnnData in memory
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_phenograph_clustering_template.py b/tests/templates/test_phenograph_clustering_template.py
new file mode 100644
index 00000000..d87dc3fe
--- /dev/null
+++ b/tests/templates/test_phenograph_clustering_template.py
@@ -0,0 +1,103 @@
+# tests/templates/test_phenograph_clustering_template.py
+"""
+Real (non-mocked) unit test for the Phenograph Clustering template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.phenograph_clustering_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 50 cells, 5 genes for Phenograph clustering."""
+ rng = np.random.default_rng(42)
+ # Two distinct clusters
+ X_a = rng.normal(0, 1, size=(25, 5))
+ X_b = rng.normal(5, 1, size=(25, 5))
+ X = np.vstack([X_a, X_b])
+ obs = pd.DataFrame({"cell_type": ["A"] * 25 + ["B"] * 25})
+ var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestPhenographClusteringTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the phenograph clustering template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Process": "Original",
+ "K_Nearest_Neighbors": 10,
+ "Seed": 42,
+ "Resolution_Parameter": 1.0,
+ "Output_Annotation_Name": "phenograph",
+ "Number_of_Iterations": 10,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_phenograph_clustering_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run phenograph clustering and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle contains AnnData with 'phenograph' obs column
+ 3. save_to_disk=False returns AnnData with 'phenograph' obs column
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("phenograph", result_adata.obs.columns)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("phenograph", mem_adata.obs.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_posit_it_python_template.py b/tests/templates/test_posit_it_python_template.py
new file mode 100644
index 00000000..fdfd64f6
--- /dev/null
+++ b/tests/templates/test_posit_it_python_template.py
@@ -0,0 +1,131 @@
+# tests/templates/test_posit_it_python_template.py
+"""
+Real (non-mocked) unit test for the Posit-It Python template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.posit_it_python_template import run_from_json
+
+
+class TestPositItPythonTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the posit-it python template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+
+ params = {
+ "Label": "Test Note",
+ "Label_font_color": "Black",
+ "Label_font_size": "40",
+ "Label_font_type": "normal",
+ "Label_font_family": "Arial",
+ "Label_Bold": "False",
+ "Background_fill_color": "Yellow1",
+ "Background_fill_opacity": "10",
+ "Page_width": "6",
+ "Page_height": "2",
+ "Page_DPI": "72",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_posit_it_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run posit-it template and verify outputs.
+
+ Validates:
+ 1. save_to_disk=True returns a dict with 'figures' key
+ 2. Figures directory contains a non-empty PNG
+ 3. save_to_disk=False returns a matplotlib Figure with correct text
+ """
+ # -- Act (save_to_disk=True): write outputs to disk ------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Act (save_to_disk=False): get figure in memory ------------
+ fig = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ show_plot=False,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(
+ saved_files, dict,
+ f"Expected dict from run_from_json, got {type(saved_files)}"
+ )
+
+ # -- Assert: figures directory contains at least one PNG -------
+ self.assertIn("figures", saved_files,
+ "Missing 'figures' key in saved_files")
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(
+ len(figure_paths), 1, "No figure files were saved"
+ )
+
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(
+ fig_file.exists(), f"Figure not found: {fig_path}"
+ )
+ self.assertGreater(
+ fig_file.stat().st_size, 0,
+ f"Figure file is empty: {fig_path}"
+ )
+ self.assertEqual(
+ fig_file.suffix, ".png",
+ f"Expected .png extension, got {fig_file.suffix}"
+ )
+
+ # -- Assert: in-memory figure is valid -------------------------
+ import matplotlib.figure
+ self.assertIsInstance(
+ fig, matplotlib.figure.Figure,
+ f"Expected matplotlib Figure, got {type(fig)}"
+ )
+
+ # The label placed at figure coordinates (0.5, 0.5) should read exactly "Test Note"
+ text_artists = fig.texts
+ self.assertGreaterEqual(
+ len(text_artists), 1,
+ "Figure has no text artists"
+ )
+ # First text artist is the label placed by fig.text(0.5, 0.5, ...)
+ self.assertEqual(
+ text_artists[0].get_text(), "Test Note",
+ f"Expected figure text 'Test Note', "
+ f"got '{text_artists[0].get_text()}'"
+ )
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_quantile_scaling_template.py b/tests/templates/test_quantile_scaling_template.py
new file mode 100644
index 00000000..13c93046
--- /dev/null
+++ b/tests/templates/test_quantile_scaling_template.py
@@ -0,0 +1,100 @@
+# tests/templates/test_quantile_scaling_template.py
+"""
+Real (non-mocked) unit test for the Quantile Scaling template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.quantile_scaling_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells, 2 genes for quantile scaling."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 100, size=(4, 2)).astype(float)
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
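+# Hypothetical reference sketch of quantile scaling, assuming the template
+# clips each feature to its [Lower_Quantile, Upper_Quantile] range and then
+# min-max scales the clipped values to [0, 1]. spac's implementation may
+# differ in details (for example interpolation or layer naming).
+def _reference_quantile_scale(X, lower=0.01, upper=0.99):
+    """Clip each column to its quantile range, then rescale to [0, 1]."""
+    import numpy as np
+
+    X = np.asarray(X, dtype=float)
+    low = np.quantile(X, lower, axis=0)
+    high = np.quantile(X, upper, axis=0)
+    # Guard against zero-width ranges (constant columns).
+    span = np.where(high > low, high - low, 1.0)
+    return (np.clip(X, low, high) - low) / span
+
+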
+class TestQuantileScalingTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the quantile scaling template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Normalize": "Original",
+ "Lower_Quantile": "0.01",
+ "Upper_Quantile": "0.99",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ "html": {"type": "directory", "name": "html_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_quantile_scaling_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run quantile scaling and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists, is non-empty, contains AnnData with normalized layer
+ 3. save_to_disk=False returns the result in memory
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ # quantile scaling creates a layer named "quantile__"
+ layer_names = list(result_adata.layers.keys())
+ self.assertGreater(len(layer_names), 0)
+
+ mem_result = run_from_json(self.json_file, save_to_disk=False)
+ # save_to_disk=False returns an (adata, fig) tuple
+ self.assertIsInstance(mem_result, tuple)
+ self.assertIsInstance(mem_result[0], ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_relational_heatmap_template.py b/tests/templates/test_relational_heatmap_template.py
new file mode 100644
index 00000000..8c2db32a
--- /dev/null
+++ b/tests/templates/test_relational_heatmap_template.py
@@ -0,0 +1,139 @@
+# tests/templates/test_relational_heatmap_template.py
+"""
+Real (non-mocked) unit test for the Relational Heatmap template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.relational_heatmap_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells, 3 genes, 2 groups for heatmap."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 20, size=(8, 3)).astype(float)
+ obs = pd.DataFrame({
+ "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
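+# Hypothetical reference sketch: a relational heatmap between two
+# annotations is assumed here to be a source-by-target contingency table,
+# row-normalized to percentages; the template may compute something else.
+# With the same annotation as source and target (as in setUp), every label
+# would then map 100% onto itself.
+def _reference_relational_table(source, target):
+    """Row-normalized (percent) crosstab of source vs. target labels."""
+    import pandas as pd
+
+    return pd.crosstab(
+        pd.Series(list(source), name="source"),
+        pd.Series(list(target), name="target"),
+        normalize="index",
+    ) * 100
+
+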
+class TestRelationalHeatmapTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the relational heatmap template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Source_Annotation_Name": "cell_type",
+ "Target_Annotation_Name": "cell_type",
+ "Figure_Width_inch": 6,
+ "Figure_Height_inch": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 8,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "html": {"type": "directory", "name": "html_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_relational_heatmap_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run relational heatmap with show_static_image=False
+ (default).
+
+ Validates:
+ 1. saved_files dict has 'html' key (interactive HTML is default output)
+ 2. HTML file exists and is non-empty
+ 3. No 'figures' key when show_static_image=False
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("html", saved_files)
+
+ html_paths = saved_files["html"]
+ self.assertGreaterEqual(len(html_paths), 1)
+ for html_path in html_paths:
+ html_file = Path(html_path)
+ self.assertTrue(html_file.exists())
+ self.assertGreater(html_file.stat().st_size, 0)
+
+ # When show_static_image defaults to False, no figures produced
+ self.assertNotIn("figures", saved_files)
+
+ def test_relational_heatmap_with_static_image(self) -> None:
+ """
+ End-to-end I/O test: run relational heatmap with show_static_image=True.
+
+ Validates:
+ 1. saved_files dict has both 'figures' and 'html' keys
+ 2. Figure PNG and HTML files exist and are non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ show_static_image=True,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+ self.assertIn("html", saved_files)
+
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+ html_paths = saved_files["html"]
+ self.assertGreaterEqual(len(html_paths), 1)
+ for html_path in html_paths:
+ html_file = Path(html_path)
+ self.assertTrue(html_file.exists())
+ self.assertGreater(html_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_rename_labels_template.py b/tests/templates/test_rename_labels_template.py
new file mode 100644
index 00000000..41842124
--- /dev/null
+++ b/tests/templates/test_rename_labels_template.py
@@ -0,0 +1,109 @@
+# tests/templates/test_rename_labels_template.py
+"""
+Real (non-mocked) unit test for the Rename Labels template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.rename_labels_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells with cell_type annotation to rename."""
+ rng = np.random.default_rng(42)
+ X = rng.random((4, 2))
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestRenameLabelsTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the rename labels template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ # Create mapping CSV: old_label -> new_label
+ mapping_df = pd.DataFrame({
+ "Original": ["A", "B"],
+ "New": ["Alpha", "Beta"],
+ })
+ self.mapping_file = os.path.join(self.tmp_dir.name, "mapping.csv")
+ mapping_df.to_csv(self.mapping_file, index=False)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Source_Annotation": "cell_type",
+ "Cluster_Mapping_Dictionary": self.mapping_file,
+ "New_Annotation": "cell_type_renamed",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_rename_labels_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run rename labels and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists, is non-empty, contains AnnData
+ 3. Renamed annotation column is present with new values
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("cell_type_renamed", result_adata.obs.columns)
+ self.assertEqual(
+ set(result_adata.obs["cell_type_renamed"].unique()),
+ {"Alpha", "Beta"},
+ )
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_ripley_l_template.py b/tests/templates/test_ripley_l_template.py
new file mode 100644
index 00000000..1bc926ac
--- /dev/null
+++ b/tests/templates/test_ripley_l_template.py
@@ -0,0 +1,108 @@
+# tests/templates/test_ripley_l_template.py
+"""
+Real (non-mocked) unit test for the Ripley L Calculation template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.ripley_l_calculation_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 20 cells with spatial coords for Ripley L."""
+ rng = np.random.default_rng(42)
+ X = rng.random((20, 2))
+ obs = pd.DataFrame({
+ "cell_type": (["A"] * 10) + (["B"] * 10),
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((20, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
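+# Hypothetical reference sketch of a naive cross-type Ripley's L estimator,
+# without edge correction or simulations (unlike the template, which is
+# run with Edge_Correction=True and Number_of_Simulations=5):
+#   K(r) = area / (n_center * n_neighbor) * #{(i, j): d(i, j) <= r}
+#   L(r) = sqrt(K(r) / pi)
+# Useful only as a rough cross-check, not as expected template values.
+def _reference_cross_ripley_l(center_xy, neighbor_xy, radii, area):
+    """Naive cross-type Ripley's L at each radius (no edge correction)."""
+    import numpy as np
+
+    center_xy = np.asarray(center_xy, dtype=float)
+    neighbor_xy = np.asarray(neighbor_xy, dtype=float)
+    dists = np.linalg.norm(
+        center_xy[:, None, :] - neighbor_xy[None, :, :], axis=-1
+    )
+    scale = area / (len(center_xy) * len(neighbor_xy))
+    k = np.array([scale * np.count_nonzero(dists <= r) for r in radii])
+    return np.sqrt(k / np.pi)
+
+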
+class TestRipleyLTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the Ripley L calculation template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Radii": [5, 10, 20],
+ "Annotation": "cell_type",
+ "Center_Phenotype": "A",
+ "Neighbor_Phenotype": "B",
+ "Stratify_By": "None",
+ "Number_of_Simulations": 5,
+ "Seed": 42,
+ "Spatial_Key": "spatial",
+ "Edge_Correction": True,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_ripley_l_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run Ripley L calculation and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle contains AnnData with Ripley results in .uns
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ # Ripley results stored in .uns
+ self.assertGreater(len(result_adata.uns), 0)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_sankey_plot_template.py b/tests/templates/test_sankey_plot_template.py
new file mode 100644
index 00000000..dc73a2c0
--- /dev/null
+++ b/tests/templates/test_sankey_plot_template.py
@@ -0,0 +1,132 @@
+# tests/templates/test_sankey_plot_template.py
+"""
+Real (non-mocked) unit test for the Sankey Plot template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.sankey_plot_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells with two annotation columns for Sankey."""
+ rng = np.random.default_rng(42)
+ X = rng.random((8, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"],
+ "cluster": ["1", "2", "1", "2", "1", "2", "1", "2"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
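+# Hypothetical reference sketch: assuming Sankey link widths are the number
+# of cells shared by each (source label, target label) pair, the expected
+# flows can be tabulated with pandas.crosstab. For the fixture above, every
+# cell_type/cluster pair holds exactly 2 cells.
+def _reference_flow_counts(source, target):
+    """Cell counts per (source, target) label pair."""
+    import pandas as pd
+
+    return pd.crosstab(
+        pd.Series(list(source), name="source"),
+        pd.Series(list(target), name="target"),
+    )
+
+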
+class TestSankeyPlotTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the sankey plot template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Source_Annotation_Name": "cell_type",
+ "Target_Annotation_Name": "cluster",
+ "Figure_Width_inch": 6,
+ "Figure_Height_inch": 6,
+ "Font_Size": 10,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "html": {"type": "directory", "name": "html_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_sankey_plot_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run sankey plot with show_static_image=False
+ (default).
+
+ Validates:
+ 1. saved_files dict has 'html' key (interactive HTML is default)
+ 2. HTML output files exist and are non-empty
+ 3. No 'figures' key when show_static_image=False
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("html", saved_files)
+
+ html_paths = saved_files["html"]
+ self.assertGreaterEqual(len(html_paths), 1)
+ for html_path in html_paths:
+ html_file = Path(html_path)
+ self.assertTrue(html_file.exists())
+ self.assertGreater(html_file.stat().st_size, 0)
+
+ # When show_static_image defaults to False, no figures produced
+ self.assertNotIn("figures", saved_files)
+
+ def test_sankey_plot_with_static_image(self) -> None:
+ """
+ End-to-end I/O test: run sankey plot with show_static_image=True.
+
+ Validates:
+ 1. saved_files dict has both 'figures' and 'html' keys
+ 2. Figure PNG and HTML files exist and are non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ show_static_image=True,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+ self.assertIn("html", saved_files)
+
+ for key in ["html", "figures"]:
+ paths = saved_files[key]
+ self.assertGreaterEqual(len(paths), 1)
+ for path in paths:
+ output_file = Path(path)
+ self.assertTrue(output_file.exists())
+ self.assertGreater(output_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_select_values_template.py b/tests/templates/test_select_values_template.py
new file mode 100644
index 00000000..abfd4c8d
--- /dev/null
+++ b/tests/templates/test_select_values_template.py
@@ -0,0 +1,112 @@
+# tests/templates/test_select_values_template.py
+"""
+Real (non-mocked) unit test for the Select Values template.
+
+Validates template I/O behaviour only:
+ - The output CSV is produced on disk and is non-empty
+ - Only rows carrying the selected labels remain
+ - save_to_disk=False returns the filtered DataFrame in memory
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.select_values_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame for value filtering.
+
+ 6 rows, 3 cell types -- enough to test include-based selection.
+ """
+ return pd.DataFrame({
+ "cell_type": ["A", "B", "C", "A", "B", "C"],
+ "marker": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
+ })
+
+
+class TestSelectValuesTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the select values template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Annotation_of_Interest": "cell_type",
+ "Label_s_of_Interest": ["A", "B"],
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_select_values_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run select values template and verify
+ output artifacts.
+
+ Validates:
+ 1. saved_files is a dict with 'dataframe' key
+ 2. Output CSV exists and is non-empty
+ 3. Only selected values (A, B) remain in the output
+ """
+ # -- Act (save_to_disk=True) -----------------------------------
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ # -- Assert: return type ---------------------------------------
+ self.assertIsInstance(saved_files, dict)
+
+ # -- Assert: CSV file exists and is non-empty ------------------
+ self.assertIn("dataframe", saved_files)
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}")
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ # -- Assert: only selected values remain -----------------------
+ result_df = pd.read_csv(csv_path)
+ self.assertEqual(len(result_df), 4)
+ self.assertEqual(
+ set(result_df["cell_type"].unique()), {"A", "B"}
+ )
+
+ # -- Act (save_to_disk=False) ----------------------------------
+ mem_df = run_from_json(
+ self.json_file,
+ save_to_disk=False,
+ )
+
+ # -- Assert: in-memory return is DataFrame ---------------------
+ self.assertIsInstance(mem_df, pd.DataFrame)
+ self.assertEqual(len(mem_df), 4)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_setup_analysis_template.py b/tests/templates/test_setup_analysis_template.py
new file mode 100644
index 00000000..79fe9fa3
--- /dev/null
+++ b/tests/templates/test_setup_analysis_template.py
@@ -0,0 +1,106 @@
+# tests/templates/test_setup_analysis_template.py
+"""
+Real (non-mocked) unit test for the Setup Analysis template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.setup_analysis_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """
+ Minimal synthetic DataFrame simulating raw cell data.
+
+ 4 cells with spatial coordinates, features, and an annotation column.
+ """
+ return pd.DataFrame({
+ "Gene_0": [1.0, 2.0, 3.0, 4.0],
+ "Gene_1": [5.0, 6.0, 7.0, 8.0],
+ "X_coord": [10.0, 20.0, 30.0, 40.0],
+ "Y_coord": [11.0, 21.0, 31.0, 41.0],
+ "cell_type": ["A", "B", "A", "B"],
+ })
+
+
+class TestSetupAnalysisTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the setup analysis template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Features_to_Analyze": ["Gene_0", "Gene_1"],
+ "Annotation_s_": ["cell_type"],
+ "X_Coordinate_Column": "X_coord",
+ "Y_Coordinate_Column": "Y_coord",
+ "Output_File": "output.pickle",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_setup_analysis_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run setup analysis and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists, is non-empty, contains AnnData
+ 3. AnnData has 4 cells, the cell_type annotation, and spatial coords
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertEqual(result_adata.n_obs, 4)
+ self.assertIn("cell_type", result_adata.obs.columns)
+ self.assertIn("spatial", result_adata.obsm)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_spatial_interaction_template.py b/tests/templates/test_spatial_interaction_template.py
new file mode 100644
index 00000000..e531f8c9
--- /dev/null
+++ b/tests/templates/test_spatial_interaction_template.py
@@ -0,0 +1,112 @@
+# tests/templates/test_spatial_interaction_template.py
+"""
+Real (non-mocked) unit test for the Spatial Interaction template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.spatial_interaction_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 20 cells with spatial coords for interaction."""
+ rng = np.random.default_rng(42)
+ X = rng.random((20, 2))
+ obs = pd.DataFrame({
+ "cell_type": (["A"] * 10) + (["B"] * 10),
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((20, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestSpatialInteractionTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the spatial interaction template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "Spatial_Analysis_Method": "Neighborhood Enrichment",
+ "Stratify_By": ["None"],
+ "K_Nearest_Neighbors": 6,
+ "Seed": 42,
+ "Coordinate_Type": "None",
+ "Radius": "None",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Color_Bar_Range": "Automatic",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframes": {"type": "directory", "name": "matrices"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_spatial_interaction_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run spatial interaction and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'figures' and/or 'dataframes' keys
+ 2. Output files exist and are non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertGreater(len(saved_files), 0)
+
+ for key in ["figures", "dataframes"]:
+ if key in saved_files:
+ paths = saved_files[key]
+ self.assertGreaterEqual(len(paths), 1)
+ for p in paths:
+ pf = Path(p)
+ self.assertTrue(pf.exists())
+ self.assertGreater(pf.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_spatial_plot_template.py b/tests/templates/test_spatial_plot_template.py
new file mode 100644
index 00000000..2373f894
--- /dev/null
+++ b/tests/templates/test_spatial_plot_template.py
@@ -0,0 +1,107 @@
+# tests/templates/test_spatial_plot_template.py
+"""
+Real (non-mocked) unit test for the Spatial Plot template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.spatial_plot_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 8 cells with spatial coords for plotting."""
+ rng = np.random.default_rng(42)
+ X = rng.random((8, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((8, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestSpatialPlotTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the spatial plot template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Color_By": "Annotation",
+ "Annotation_to_Highlight": "cell_type",
+ "Feature_to_Highlight": "None",
+ "Stratify": False,
+ "Stratify_By": [],
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Dot_Size": 50,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_spatial_plot_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run spatial plot and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'figures' key
+ 2. Every returned figure path exists and is non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plots=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_subset_analysis_template.py b/tests/templates/test_subset_analysis_template.py
new file mode 100644
index 00000000..6d601c4a
--- /dev/null
+++ b/tests/templates/test_subset_analysis_template.py
@@ -0,0 +1,103 @@
+# tests/templates/test_subset_analysis_template.py
+"""
+Real (non-mocked) unit test for the Subset Analysis template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.subset_analysis_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 6 cells, 3 cell types for subset filtering."""
+ rng = np.random.default_rng(42)
+ X = rng.random((6, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "B", "C", "A", "B", "C"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestSubsetAnalysisTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the subset analysis template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation_of_interest": "cell_type",
+ "Labels": ["A", "B"],
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "transform_output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_subset_analysis_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run subset analysis and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists, is non-empty, contains AnnData
+ 3. Subset keeps only the 4 cells labelled A or B
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ # 6 original cells, selecting A and B = 4 cells
+ self.assertEqual(result_adata.n_obs, 4)
+ self.assertEqual(
+ set(result_adata.obs["cell_type"].unique()), {"A", "B"}
+ )
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertEqual(mem_adata.n_obs, 4)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_summarize_annotation_statistics_template.py b/tests/templates/test_summarize_annotation_statistics_template.py
new file mode 100644
index 00000000..7454a2ff
--- /dev/null
+++ b/tests/templates/test_summarize_annotation_statistics_template.py
@@ -0,0 +1,97 @@
+# tests/templates/test_summarize_annotation_statistics_template.py
+"""
+Real (non-mocked) unit test for the Summarize Annotation Statistics template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.summarize_annotation_statistics_template import (
+ run_from_json,
+)
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 6 cells with cell_type annotation for statistics."""
+ rng = np.random.default_rng(42)
+ X = rng.random((6, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "A", "B", "B", "B", "C"],
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestSummarizeAnnotationStatisticsTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for summarize annotation statistics."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_summarize_annotation_stats_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: summarize annotation stats and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'dataframe' key
+ 2. CSV exists and is non-empty
+ 3. Summary includes count/percentage information
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("dataframe", saved_files)
+
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+ result_df = pd.read_csv(csv_path)
+ self.assertGreater(len(result_df), 0)
+
+ mem_df = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_df, pd.DataFrame)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_summarize_dataframe_template.py b/tests/templates/test_summarize_dataframe_template.py
new file mode 100644
index 00000000..516a3053
--- /dev/null
+++ b/tests/templates/test_summarize_dataframe_template.py
@@ -0,0 +1,85 @@
+# tests/templates/test_summarize_dataframe_template.py
+"""
+Real (non-mocked) unit test for the Summarize DataFrame template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.summarize_dataframe_template import run_from_json
+
+
+def _make_tiny_dataframe() -> pd.DataFrame:
+ """Minimal synthetic DataFrame for summarization."""
+ return pd.DataFrame({
+ "cell_type": ["A", "B", "A", "B", "C", "C"],
+ "marker_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
+ "marker_2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
+ })
+
+
+class TestSummarizeDataFrameTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the summarize dataframe template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.csv")
+
+ _make_tiny_dataframe().to_csv(self.in_file, index=False)
+
+ params = {
+ "Upstream_Dataset": self.in_file,
+ "Columns": ["cell_type", "marker_1", "marker_2"],
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "html": {"type": "directory", "name": "html_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_summarize_dataframe_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: summarize dataframe and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'html' key
+ 2. HTML directory contains non-empty file(s)
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("html", saved_files)
+
+ html_paths = saved_files["html"]
+ self.assertGreaterEqual(len(html_paths), 1)
+ for html_path in html_paths:
+ html_file = Path(html_path)
+ self.assertTrue(html_file.exists())
+ self.assertGreater(html_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_template_utils.py b/tests/templates/test_template_utils.py
new file mode 100644
index 00000000..76cd2a58
--- /dev/null
+++ b/tests/templates/test_template_utils.py
@@ -0,0 +1,989 @@
+# tests/templates/test_template_utils.py
+"""
+Real (non-mocked) unit tests for template utility functions.
+
+Validates utility I/O behaviour only:
+ • Functions produce correct outputs from real inputs
+ • File I/O operations work on real filesystem
+ • Error messages are accurate
+
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+import warnings
+import anndata as ad
+import numpy as np
+import pandas as pd
+from pathlib import Path
+import matplotlib.pyplot as plt
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.template_utils import (
+ load_input,
+ save_results,
+ _save_single_object,
+ text_to_value,
+ convert_pickle_to_h5ad,
+ convert_to_floats,
+ spell_out_special_characters,
+ load_csv_files,
+ parse_params,
+ string_list_to_dictionary,
+ clean_column_name,
+)
+
+
+def create_test_adata(n_cells: int = 10) -> ad.AnnData:
+ """Return a minimal synthetic AnnData for fast tests."""
+ rng = np.random.default_rng(0)
+ obs = pd.DataFrame({
+ "cell_type": [["TypeA", "TypeB"][i % 2] for i in range(n_cells)]
+ })
+ x_mat = rng.normal(size=(n_cells, 2))
+ adata = ad.AnnData(X=x_mat, obs=obs)
+ return adata
+
+
+def create_test_dataframe(n_rows: int = 5) -> pd.DataFrame:
+ """Return a minimal DataFrame for fast tests."""
+ return pd.DataFrame({
+ "col1": range(n_rows),
+ "col2": [f"value_{i}" for i in range(n_rows)]
+ })
+
+
+class TestTemplateUtils(unittest.TestCase):
+ """Unit tests for template utility functions."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.test_adata = create_test_adata()
+ self.test_df = create_test_dataframe()
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_complete_io_workflow(self) -> None:
+ """Exercise load_input across formats and convert_pickle_to_h5ad."""
+ # Suppress warnings for cleaner test output
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+
+ # Test 1: Load h5ad file
+ h5ad_path = os.path.join(self.tmp_dir.name, "test.h5ad")
+ self.test_adata.write_h5ad(h5ad_path)
+ loaded_h5ad = load_input(h5ad_path)
+ self.assertEqual(loaded_h5ad.n_obs, 10)
+ self.assertIn("cell_type", loaded_h5ad.obs.columns)
+
+ # Test 2: Load pickle file
+ pickle_path = os.path.join(self.tmp_dir.name, "test.pickle")
+ with open(pickle_path, "wb") as f:
+ pickle.dump(self.test_adata, f)
+ loaded_pickle = load_input(pickle_path)
+ self.assertEqual(loaded_pickle.n_obs, 10)
+
+ # Test 3: Load .pkl extension
+ pkl_path = os.path.join(self.tmp_dir.name, "test.pkl")
+ with open(pkl_path, "wb") as f:
+ pickle.dump(self.test_adata, f)
+ loaded_pkl = load_input(pkl_path)
+ self.assertEqual(loaded_pkl.n_obs, 10)
+
+ # Test 4: Load .p extension
+ p_path = os.path.join(self.tmp_dir.name, "test.p")
+ with open(p_path, "wb") as f:
+ pickle.dump(self.test_adata, f)
+ loaded_p = load_input(p_path)
+ self.assertEqual(loaded_p.n_obs, 10)
+
+ # Test 5: Convert pickle to h5ad
+ pickle_src = os.path.join(
+ self.tmp_dir.name, "convert_src.pickle"
+ )
+ with open(pickle_src, "wb") as f:
+ pickle.dump(self.test_adata, f)
+
+ h5ad_dest = convert_pickle_to_h5ad(pickle_src)
+ self.assertTrue(os.path.exists(h5ad_dest))
+ self.assertTrue(h5ad_dest.endswith(".h5ad"))
+
+ # Test 6: Convert pickle to h5ad with a custom output path
+ custom_dest = os.path.join(
+ self.tmp_dir.name, "custom_output.h5ad"
+ )
+ h5ad_custom = convert_pickle_to_h5ad(pickle_src, custom_dest)
+ self.assertEqual(h5ad_custom, custom_dest)
+ self.assertTrue(os.path.exists(custom_dest))
+
+ # Test 7: Load file with no extension (content detection)
+ no_ext_path = os.path.join(self.tmp_dir.name, "noextension")
+ with open(no_ext_path, "wb") as f:
+ pickle.dump(self.test_adata, f)
+ loaded_no_ext = load_input(no_ext_path)
+ self.assertEqual(loaded_no_ext.n_obs, 10)
+
+ def test_text_to_value_conversions(self) -> None:
+ """Test all text_to_value conversion scenarios."""
+ # Test 1: Convert to float
+ result = text_to_value("3.14", to_float=True)
+ self.assertEqual(result, 3.14)
+ self.assertIsInstance(result, float)
+
+ # Test 2: Convert to int
+ result = text_to_value("42", to_int=True)
+ self.assertEqual(result, 42)
+ self.assertIsInstance(result, int)
+
+ # Test 3: None text handling
+ result = text_to_value("None", value_to_convert_to=None)
+ self.assertIsNone(result)
+
+ # Test 4: Empty string handling
+ result = text_to_value("", value_to_convert_to=-1)
+ self.assertEqual(result, -1)
+
+ # Test 5: Case insensitive None
+ result = text_to_value("none", value_to_convert_to=0)
+ self.assertEqual(result, 0)
+
+ # Test 6: Custom none text
+ result = text_to_value(
+ "NA", default_none_text="NA", value_to_convert_to=999
+ )
+ self.assertEqual(result, 999)
+
+ # Test 7: No conversion
+ result = text_to_value("keep_as_string")
+ self.assertEqual(result, "keep_as_string")
+ self.assertIsInstance(result, str)
+
+ # Test 8: Whitespace handling
+ result = text_to_value(" None ", value_to_convert_to=None)
+ self.assertIsNone(result)
+
+ # Test 9: Non-string input
+ result = text_to_value(123, to_float=True)
+ self.assertEqual(result, 123.0)
+ self.assertIsInstance(result, float)
+
+ def test_convert_to_floats(self) -> None:
+ """Test convert_to_floats function."""
+ # Test 1: String list
+ result = convert_to_floats(["1.5", "2.0", "3.14"])
+ self.assertEqual(result, [1.5, 2.0, 3.14])
+ self.assertTrue(all(isinstance(x, float) for x in result))
+
+ # Test 2: Mixed numeric types
+ result = convert_to_floats([1, "2.5", 3.0])
+ self.assertEqual(result, [1.0, 2.5, 3.0])
+
+ # Test 3: Invalid value
+ with self.assertRaises(ValueError) as context:
+ convert_to_floats(["1.0", "invalid", "3.0"])
+ expected_msg = "Failed to convert value: 'invalid' to float"
+ self.assertIn(expected_msg, str(context.exception))
+
+ # Test 4: Empty list
+ result = convert_to_floats([])
+ self.assertEqual(result, [])
+
+ def test_load_input_missing_file_error_message(self) -> None:
+ """Test exact error message for missing input file."""
+ missing_path = "/nonexistent/path/file.h5ad"
+
+ with self.assertRaises(FileNotFoundError) as context:
+ load_input(missing_path)
+
+ expected_msg = f"Input file not found: {missing_path}"
+ actual_msg = str(context.exception)
+ self.assertEqual(expected_msg, actual_msg)
+
+ def test_load_input_unsupported_format_error_message(self) -> None:
+ """Test exact error message for unsupported file format."""
+ # Create a text file with unsupported content
+ txt_path = os.path.join(self.tmp_dir.name, "test.txt")
+ with open(txt_path, "w") as f:
+ f.write("This is not a valid data file")
+
+ with self.assertRaises(ValueError) as context:
+ load_input(txt_path)
+
+ actual_msg = str(context.exception)
+ self.assertTrue(actual_msg.startswith("Unable to load file"))
+ self.assertIn("Supported formats: h5ad, pickle", actual_msg)
+
+ def test_text_to_value_float_conversion_error_message(self) -> None:
+ """Test exact error message for invalid float conversion."""
+ with self.assertRaises(ValueError) as context:
+ text_to_value(
+ "not_a_number", to_float=True, param_name="test_param"
+ )
+
+ expected_msg = (
+ 'Error: can\'t convert test_param to float. '
+ 'Received:"not_a_number"'
+ )
+ actual_msg = str(context.exception)
+ self.assertEqual(expected_msg, actual_msg)
+
+ def test_text_to_value_int_conversion_error_message(self) -> None:
+ """Test exact error message for invalid integer conversion."""
+ with self.assertRaises(ValueError) as context:
+ text_to_value("3.14", to_int=True, param_name="count")
+
+ expected_msg = (
+ 'Error: can\'t convert count to integer. '
+ 'Received:"3.14"'
+ )
+ actual_msg = str(context.exception)
+ self.assertEqual(expected_msg, actual_msg)
+
+ def test_convert_pickle_to_h5ad_missing_file_error_message(self) -> None:
+ """Test exact error message for missing pickle file."""
+ missing_pickle = "/nonexistent/file.pickle"
+
+ with self.assertRaises(FileNotFoundError) as context:
+ convert_pickle_to_h5ad(missing_pickle)
+
+ expected_msg = f"Pickle file not found: {missing_pickle}"
+ actual_msg = str(context.exception)
+ self.assertEqual(expected_msg, actual_msg)
+
+ def test_convert_pickle_to_h5ad_wrong_type_error_message(self) -> None:
+ """Test exact error message when pickle doesn't contain AnnData."""
+ # Create pickle with wrong type
+ wrong_pickle = os.path.join(self.tmp_dir.name, "wrong_type.pickle")
+ with open(wrong_pickle, "wb") as f:
+ pickle.dump({"not": "anndata"}, f)
+
+ with self.assertRaises(TypeError) as context:
+ convert_pickle_to_h5ad(wrong_pickle)
+
+ expected_prefix = "Loaded object is not AnnData, got "
+ actual_msg = str(context.exception)
+ self.assertTrue(actual_msg.startswith(expected_prefix))
+
+ def test_spell_out_special_characters(self) -> None:
+ """Test spell_out_special_characters function."""
+ from spac.templates.template_utils import spell_out_special_characters
+
+ # Test space replacement
+ result = spell_out_special_characters("Cell Type")
+ self.assertEqual(result, "Cell_Type")
+
+ # Test special units
+ result = spell_out_special_characters("Area µm²")
+ self.assertEqual(result, "Area_um2")
+
+ # Test hyphen between letters
+ result = spell_out_special_characters("CD4-positive")
+ self.assertEqual(result, "CD4_positive")
+
+ # Test plus/minus
+ result = spell_out_special_characters("CD4+")
+ self.assertEqual(result, "CD4_pos") # Trailing underscore is stripped
+ result = spell_out_special_characters("CD8-")
+ self.assertEqual(result, "CD8_neg") # Trailing underscore is stripped
+
+ # Test combination markers
+ result = spell_out_special_characters("CD4+CD20-")
+ self.assertEqual(result, "CD4_pos_CD20_neg")
+
+ # Test edge cases with special separators
+ result = spell_out_special_characters("CD4+/CD20-")
+ self.assertEqual(result, "CD4_pos_slashCD20_neg")
+
+ result = spell_out_special_characters("CD4+ CD20-")
+ self.assertEqual(result, "CD4_pos_CD20_neg")
+
+ result = spell_out_special_characters("CD4+,CD20-")
+ self.assertEqual(result, "CD4_pos_CD20_neg")
+
+ # Test parentheses removal
+ result = spell_out_special_characters("CD4+ (bright)")
+ self.assertEqual(result, "CD4_pos_bright")
+
+ # Test special characters
+ result = spell_out_special_characters("Cell@100%")
+ self.assertEqual(result, "Cellat100percent")
+
+ # Test multiple underscores
+ result = spell_out_special_characters("Cell___Type")
+ self.assertEqual(result, "Cell_Type")
+
+ # Test leading/trailing underscores
+ result = spell_out_special_characters("_Cell_Type_")
+ self.assertEqual(result, "Cell_Type")
+
+ # Test complex case
+ result = spell_out_special_characters("CD4+ T-cells (µm²)")
+ self.assertEqual(result, "CD4_pos_T_cells_um2")
+
+ # Test empty string
+ result = spell_out_special_characters("")
+ self.assertEqual(result, "")
+
+ # Additional edge cases
+ result = spell_out_special_characters("CD3+CD4+CD8-")
+ self.assertEqual(result, "CD3_pos_CD4_pos_CD8_neg")
+
+ result = spell_out_special_characters("PD-1/PD-L1")
+ self.assertEqual(result, "PD_1slashPD_L1")
+
+ result = spell_out_special_characters("CD45RA+CD45RO-")
+ self.assertEqual(result, "CD45RA_pos_CD45RO_neg")
+
+ result = spell_out_special_characters("CD4+CD25+FOXP3+")
+ self.assertEqual(result, "CD4_pos_CD25_pos_FOXP3_pos")
+
+ # Test with multiple special characters
+ result = spell_out_special_characters("CD4+ & CD8+ (double positive)")
+ self.assertEqual(result, "CD4_pos_and_CD8_pos_double_positive")
+
+ # Test with numbers at start (should add col_ prefix in
+ # clean_column_name)
+ result = spell_out_special_characters("123ABC")
+ # Note: col_ prefix is added by clean_column_name
+ self.assertEqual(result, "123ABC")
+
+ def test_load_csv_files(self) -> None:
+ """Test load_csv_files function."""
+
+ # Create test CSV files
+ csv_dir = Path(self.tmp_dir.name) / "csv_data"
+ csv_dir.mkdir()
+
+ # CSV 1: Normal data
+ csv1 = pd.DataFrame({
+ 'ID': ['001', '002', '003'],
+ 'Value': [1.5, 2.5, 3.5],
+ 'Type': ['A', 'B', 'A']
+ })
+ csv1.to_csv(csv_dir / 'data1.csv', index=False)
+
+ # CSV 2: Special characters in columns
+ csv2 = pd.DataFrame({
+ 'ID': ['004', '005'],
+ 'Value': [4.5, 5.5],
+ 'Type': ['B', 'C'],
+ 'Area µm²': [100, 200]
+ })
+ csv2.to_csv(csv_dir / 'data2.csv', index=False)
+
+ # Test 1: Basic loading with metadata
+ config = pd.DataFrame({
+ 'file_name': ['data1.csv', 'data2.csv'],
+ 'experiment': ['Exp1', 'Exp2'],
+ 'batch': [1, 2]
+ })
+
+ result = load_csv_files(csv_dir, config)
+
+ # Verify basic structure
+ self.assertEqual(len(result), 5) # 3 + 2 rows
+ self.assertIn('file_name', result.columns)
+ self.assertIn('experiment', result.columns)
+ self.assertIn('batch', result.columns)
+ self.assertIn('ID', result.columns)
+ self.assertIn('Area_um2', result.columns) # Cleaned name
+
+ # Verify metadata mapping
+ exp1_rows = result[result['file_name'] == 'data1.csv']
+ self.assertTrue(all(exp1_rows['experiment'] == 'Exp1'))
+ self.assertTrue(all(exp1_rows['batch'] == 1))
+
+ # Test 2: String columns preservation
+ result_str = load_csv_files(
+ csv_dir, config, string_columns=['ID']
+ )
+ self.assertEqual(result_str['ID'].dtype, 'object')
+ self.assertTrue(all(isinstance(x, str) for x in result_str['ID']))
+
+ # Test 3: Empty string_columns list
+ result_empty = load_csv_files(csv_dir, config, string_columns=[])
+ self.assertIsInstance(result_empty, pd.DataFrame)
+
+ # Test 4: A non-list string_columns value is rejected
+ config_spaces = pd.DataFrame({
+ 'file_name': ['data1.csv'],
+ 'Sample Type': ['Control'] # Space in column name
+ })
+ with self.assertRaises(ValueError):
+ # Should fail validation because string_columns is not a list
+ load_csv_files(csv_dir, config_spaces, string_columns="ID")
+
+ # Test 5: Missing file in config
+ config_missing = pd.DataFrame({
+ 'file_name': ['missing.csv'],
+ 'experiment': ['Exp3']
+ })
+ with self.assertRaises(FileNotFoundError) as context:
+ load_csv_files(csv_dir, config_missing)
+ self.assertIn("not found", str(context.exception))
+
+ # Test 6: Empty CSV file
+ empty_csv = csv_dir / 'empty.csv'
+ empty_csv.write_text('')
+ config_empty = pd.DataFrame({
+ 'file_name': ['empty.csv'],
+ 'experiment': ['Exp4']
+ })
+ with self.assertRaises(ValueError) as context:
+ load_csv_files(csv_dir, config_empty)
+ self.assertIn("empty", str(context.exception))
+
+ # Test 7: Non-existent string_columns are silently ignored
+ config_single = pd.DataFrame({
+ 'file_name': ['data1.csv']
+ })
+ result_nonexist = load_csv_files(
+ csv_dir, config_single,
+ string_columns=['NonExistentColumn']
+ )
+ self.assertIsInstance(result_nonexist, pd.DataFrame)
+
+ def test_load_csv_files_special_character_column_cleaning(self) -> None:
+ """Test that load_csv_files cleans special character column names."""
+ # Setup test data with special character columns
+ csv_dir = Path(self.tmp_dir.name) / "csv_test"
+ csv_dir.mkdir()
+
+ csv_data = pd.DataFrame({
+ 'ID': [1, 2],
+ 'CD4+': ['pos', 'neg'], # Special character
+ 'Area µm²': [100.0, 200.0],
+ })
+ csv_data.to_csv(csv_dir / 'test.csv', index=False)
+
+ config = pd.DataFrame({
+ 'file_name': ['test.csv'],
+ 'group': ['A']
+ })
+
+ result = load_csv_files(csv_dir, config)
+
+ # Assert: special character columns cleaned
+ self.assertIn('CD4_pos', result.columns)
+ self.assertIn('Area_um2', result.columns)
+ self.assertNotIn('CD4+', result.columns)
+ self.assertNotIn('Area µm²', result.columns)
+
+ # Assert: data integrity preserved
+ self.assertEqual(len(result), 2)
+ self.assertEqual(result['group'].unique().tolist(), ['A'])
+
+ def test_save_results_single_csv_file(self) -> None:
+ """Test saving DataFrame as single CSV file using save_results."""
+ # Setup
+ df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
+
+ params = {
+ "outputs": {
+ "dataframe": {"type": "file", "name": "data.csv"}
+ }
+ }
+
+ results = {
+ "dataframe": df
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify
+ csv_path = Path(self.tmp_dir.name) / "data.csv"
+ self.assertTrue(csv_path.exists())
+ self.assertTrue(csv_path.is_file())
+
+ # Check content
+ loaded_df = pd.read_csv(csv_path)
+ pd.testing.assert_frame_equal(loaded_df, df)
+
+ def test_save_results_multiple_csvs_directory(self) -> None:
+ """Test saving multiple DataFrames in directory using save_results."""
+ # Setup
+ df1 = pd.DataFrame({'X': [1, 2]})
+ df2 = pd.DataFrame({'Y': [3, 4]})
+
+ params = {
+ "outputs": {
+ "dataframe": {"type": "directory", "name": "dataframe_dir"}
+ }
+ }
+
+ results = {
+ "dataframe": {
+ "first": df1,
+ "second": df2
+ }
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify
+ dir_path = Path(self.tmp_dir.name) / "dataframe_dir"
+ self.assertTrue(dir_path.exists())
+ self.assertTrue(dir_path.is_dir())
+ self.assertTrue((dir_path / "first.csv").exists())
+ self.assertTrue((dir_path / "second.csv").exists())
+
+ def test_save_results_figures_directory(self) -> None:
+ """Test saving multiple figures in directory using save_results."""
+ # Suppress matplotlib warnings
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+
+ # Setup
+ fig1, ax1 = plt.subplots()
+ ax1.plot([1, 2, 3])
+
+ fig2, ax2 = plt.subplots()
+ ax2.bar(['A', 'B'], [5, 10])
+
+ params = {
+ "outputs": {
+ "figures": {"type": "directory", "name": "plots"}
+ }
+ }
+
+ results = {
+ "figures": {
+ "line_plot": fig1,
+ "bar_plot": fig2
+ }
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify
+ plots_dir = Path(self.tmp_dir.name) / "plots"
+ self.assertTrue(plots_dir.exists())
+ self.assertTrue(plots_dir.is_dir())
+ self.assertTrue((plots_dir / "line_plot.png").exists())
+ self.assertTrue((plots_dir / "bar_plot.png").exists())
+
+ # Clean up
+ plt.close('all')
+
+ def test_save_results_analysis_pickle_file(self) -> None:
+ """Test saving analysis object as pickle file using save_results."""
+ # Setup
+ analysis = {
+ "method": "test_analysis",
+ "results": [1, 2, 3, 4, 5],
+ "params": {"alpha": 0.05}
+ }
+
+ params = {
+ "outputs": {
+ "analysis": {"type": "file", "name": "results.pickle"}
+ }
+ }
+
+ results = {
+ "analysis": analysis
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify
+ pickle_path = Path(self.tmp_dir.name) / "results.pickle"
+ self.assertTrue(pickle_path.exists())
+ self.assertTrue(pickle_path.is_file())
+
+ # Check content
+ with open(pickle_path, 'rb') as f:
+ loaded = pickle.load(f)
+ self.assertEqual(loaded, analysis)
+
+ def test_save_results_html_directory(self) -> None:
+ """Test saving HTML reports in directory using save_results."""
+ # Setup
+ html1 = "Report 1"
+ html2 = "Report 2"
+
+ params = {
+ "outputs": {
+ "html": {"type": "directory", "name": "reports"}
+ }
+ }
+
+ results = {
+ "html": {
+ "main": html1,
+ "summary": html2
+ }
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify
+ reports_dir = Path(self.tmp_dir.name) / "reports"
+ self.assertTrue(reports_dir.exists())
+ self.assertTrue(reports_dir.is_dir())
+ self.assertTrue((reports_dir / "main.html").exists())
+ self.assertTrue((reports_dir / "summary.html").exists())
+
+ # Check content
+ with open(reports_dir / "main.html") as f:
+ content = f.read()
+ self.assertIn("Report 1", content)
+
+ def test_save_results_complete_configuration(self) -> None:
+ """Test complete configuration with all output types using save_results."""
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+
+ # Setup
+ fig, ax = plt.subplots()
+ ax.plot([1, 2, 3])
+
+ df = pd.DataFrame({'A': [1, 2]})
+ analysis = {"result": "complete"}
+ html = "Report"
+
+ params = {
+ "outputs": {
+ "figures": {"type": "directory", "name": "figure_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ "analysis": {"type": "file", "name": "output.pickle"},
+ "html": {"type": "directory", "name": "html_dir"}
+ }
+ }
+
+ results = {
+ "figures": {"plot": fig},
+ "dataframe": df,
+ "analysis": analysis,
+ "html": {"report": html}
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify all outputs created
+ self.assertTrue((Path(self.tmp_dir.name) / "figure_dir").is_dir())
+ self.assertTrue((Path(self.tmp_dir.name) / "dataframe.csv").is_file())
+ self.assertTrue((Path(self.tmp_dir.name) / "output.pickle").is_file())
+ self.assertTrue((Path(self.tmp_dir.name) / "html_dir").is_dir())
+
+ # Clean up
+ plt.close('all')
+
+ def test_save_results_case_insensitive_matching(self) -> None:
+ """Test case-insensitive matching of result keys to config."""
+ # Setup
+ df = pd.DataFrame({'A': [1, 2]})
+
+ params = {
+ "outputs": {
+ "DataFrame": {"type": "file", "name": "data.csv"} # Capital D
+ }
+ }
+
+ results = {
+ "dataframe": df # lowercase d
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Should still match and save
+ self.assertTrue((Path(self.tmp_dir.name) / "data.csv").exists())
+
+ def test_save_results_missing_config(self) -> None:
+ """Test that missing config for result type generates warning."""
+ # Setup
+ df = pd.DataFrame({'A': [1, 2]})
+
+ params = {
+ "outputs": {
+ # No config for "dataframe"
+ "figures": {"type": "directory", "name": "plots"}
+ }
+ }
+
+ results = {
+ "dataframe": df, # No matching config
+ "figures": {}
+ }
+
+ # Execute (should not raise, just warn)
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Only figures should be in saved files; the unconfigured dataframe is skipped
+ self.assertIn("figures", saved)
+ self.assertNotIn("dataframe", saved)
+
+ def test_save_single_object_dataframe(self) -> None:
+ """Test _save_single_object helper with DataFrame."""
+ df = pd.DataFrame({'A': [1, 2]})
+
+ path = _save_single_object(df, "test", Path(self.tmp_dir.name))
+
+ self.assertEqual(path.name, "test.csv")
+ self.assertTrue(path.exists())
+
+ def test_save_single_object_figure(self) -> None:
+ """Test _save_single_object helper with matplotlib figure."""
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+
+ fig, ax = plt.subplots()
+ ax.plot([1, 2, 3])
+
+ path = _save_single_object(fig, "plot", Path(self.tmp_dir.name))
+
+ self.assertEqual(path.name, "plot.png")
+ self.assertTrue(path.exists())
+
+ plt.close('all')
+
+ def test_save_single_object_html(self) -> None:
+ """Test _save_single_object helper with HTML string."""
+ html = "Test"
+
+ path = _save_single_object(html, "report.html", Path(self.tmp_dir.name))
+
+ self.assertEqual(path.name, "report.html")
+ self.assertTrue(path.exists())
+
+ def test_save_single_object_generic(self) -> None:
+ """Test _save_single_object helper with generic object."""
+ data = {"test": "data", "value": 123}
+
+ path = _save_single_object(data, "data", Path(self.tmp_dir.name))
+
+ self.assertEqual(path.name, "data.pickle")
+ self.assertTrue(path.exists())
+
+ def test_save_results_dataframes_both_configurations(self) -> None:
+ """Test DataFrames can be saved as both file and directory."""
+ # Test 1: Single DataFrame as file
+ df_single = pd.DataFrame({'A': [1, 2, 3]})
+
+ params_file = {
+ "outputs": {
+ "dataframe": {"type": "file", "name": "single.csv"}
+ }
+ }
+
+ results_single = {"dataframe": df_single}
+
+ saved = save_results(results_single, params_file, self.tmp_dir.name)
+ self.assertTrue((Path(self.tmp_dir.name) / "single.csv").exists())
+
+ # Test 2: Multiple DataFrames as directory
+ df1 = pd.DataFrame({'X': [1, 2]})
+ df2 = pd.DataFrame({'Y': [3, 4]})
+
+ params_dir = {
+ "outputs": {
+ "dataframe": {"type": "directory", "name": "multi_df"}
+ }
+ }
+
+ results_multi = {
+ "dataframe": {
+ "data1": df1,
+ "data2": df2
+ }
+ }
+
+ saved = save_results(results_multi, params_dir,
+ os.path.join(self.tmp_dir.name, "test2"))
+
+ dir_path = Path(self.tmp_dir.name) / "test2" / "multi_df"
+ self.assertTrue(dir_path.exists())
+ self.assertTrue(dir_path.is_dir())
+ self.assertTrue((dir_path / "data1.csv").exists())
+ self.assertTrue((dir_path / "data2.csv").exists())
+
+ def test_save_results_auto_type_detection(self) -> None:
+ """Test automatic type detection based on standardized schema."""
+ # Setup - params with no explicit type
+ params = {
+ "outputs": {
+ "figures": {"name": "plot.png"}, # No type specified
+ "analysis": {"name": "results.pickle"}, # No type specified
+ "dataframe": {"name": "data.csv"}, # No type specified
+ "html": {"name": "report_dir"} # No type specified
+ }
+ }
+
+ # Create test data
+ fig, ax = plt.subplots()
+ ax.plot([1, 2, 3])
+
+ results = {
+ "figures": {"plot1": fig, "plot2": fig}, # Should auto-detect as directory
+ "analysis": {"data": [1, 2, 3]}, # Should auto-detect as file
+ "dataframe": pd.DataFrame({'A': [1, 2]}), # Should auto-detect as file
+ "html": {"report": ""} # Should auto-detect as directory
+ }
+
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify auto-detection worked correctly
+ # figures should be a directory (standard for figures), even though the name ends in .png
+ self.assertTrue((Path(self.tmp_dir.name) / "plot.png").is_dir())
+
+ # analysis should be file
+ self.assertTrue((Path(self.tmp_dir.name) / "results.pickle").is_file())
+
+ # dataframe should be a file (standard case)
+ self.assertTrue((Path(self.tmp_dir.name) / "data.csv").is_file())
+
+ # html should be directory (standardized for html)
+ self.assertTrue((Path(self.tmp_dir.name) / "report_dir").is_dir())
+
+ plt.close('all')
+
+ def test_save_results_neighborhood_profile_special_case(self) -> None:
+ """Test special case for Neighborhood Profile as directory."""
+ # Setup - Neighborhood Profile should be directory even though it's a dataframe
+ params = {
+ "outputs": {
+ "dataframes": {"name": "Neighborhood_Profile_Results"} # No type, should auto-detect
+ }
+ }
+
+ df1 = pd.DataFrame({'X': [1, 2]})
+ df2 = pd.DataFrame({'Y': [3, 4]})
+
+ results = {
+ "dataframes": {
+ "profile1": df1,
+ "profile2": df2
+ }
+ }
+
+ # Execute
+ saved = save_results(results, params, self.tmp_dir.name)
+
+ # Verify it was saved as directory (special case)
+ dir_path = Path(self.tmp_dir.name) / "Neighborhood_Profile_Results"
+ self.assertTrue(dir_path.exists())
+ self.assertTrue(dir_path.is_dir())
+ self.assertTrue((dir_path / "profile1.csv").exists())
+ self.assertTrue((dir_path / "profile2.csv").exists())
+
+ def test_save_results_with_output_directory_param(self) -> None:
+ """Test using Output_Directory from params."""
+ custom_dir = os.path.join(self.tmp_dir.name, "custom_output")
+
+ # Setup - params includes Output_Directory
+ params = {
+ "Output_Directory": custom_dir,
+ "outputs": {
+ "dataframes": {"type": "file", "name": "data.csv"}
+ }
+ }
+
+ results = {
+ "dataframes": pd.DataFrame({'A': [1, 2]})
+ }
+
+ # Execute without specifying output_base_dir (should use params)
+ saved = save_results(results, params)
+
+ # Verify it used the Output_Directory from params
+ csv_path = Path(custom_dir) / "data.csv"
+ self.assertTrue(csv_path.exists())
+
+ def test_parse_params_from_json_file(self) -> None:
+ """Test parse_params loads parameters from a JSON file."""
+ params = {"key1": "value1", "key2": 42, "nested": {"a": True}}
+ json_path = os.path.join(self.tmp_dir.name, "params.json")
+ with open(json_path, "w") as f:
+ json.dump(params, f)
+
+ result = parse_params(json_path)
+
+ self.assertEqual(result, params)
+ self.assertEqual(result["key1"], "value1")
+ self.assertEqual(result["key2"], 42)
+ self.assertTrue(result["nested"]["a"])
+
+ def test_parse_params_from_dict(self) -> None:
+ """Test parse_params passes through a dict unchanged."""
+ params = {"key": "value"}
+ result = parse_params(params)
+ self.assertIs(result, params)
+
+ def test_parse_params_from_json_string(self) -> None:
+ """Test parse_params parses a raw JSON string."""
+ json_str = '{"key": "value", "num": 7}'
+ result = parse_params(json_str)
+ self.assertEqual(result, {"key": "value", "num": 7})
+
+ def test_parse_params_invalid_type_raises(self) -> None:
+ """Test parse_params raises TypeError for unsupported input."""
+ with self.assertRaises(TypeError):
+ parse_params(12345)
+
+ def test_string_list_to_dictionary_valid(self) -> None:
+ """Test string_list_to_dictionary with valid key:value pairs."""
+ result = string_list_to_dictionary(
+ ["red:#FF0000", "blue:#0000FF"]
+ )
+ self.assertEqual(result, {"red": "#FF0000", "blue": "#0000FF"})
+
+ def test_string_list_to_dictionary_custom_names(self) -> None:
+ """Test string_list_to_dictionary with custom key/value names."""
+ result = string_list_to_dictionary(
+ ["TypeA:Cancer", "TypeB:Normal"],
+ key_name="cell_type",
+ value_name="diagnosis",
+ )
+ self.assertEqual(
+ result, {"TypeA": "Cancer", "TypeB": "Normal"}
+ )
+
+ def test_string_list_to_dictionary_invalid_entry(self) -> None:
+ """Test string_list_to_dictionary raises on missing colon."""
+ with self.assertRaises(ValueError) as ctx:
+ string_list_to_dictionary(["valid:pair", "no_colon"])
+ self.assertIn("Missing ':'", str(ctx.exception))
+
+ def test_string_list_to_dictionary_not_list_raises(self) -> None:
+ """Test string_list_to_dictionary raises TypeError for non-list."""
+ with self.assertRaises(TypeError):
+ string_list_to_dictionary("not_a_list")
+
+ def test_clean_column_name_basic(self) -> None:
+ """Test clean_column_name on normal and special-char columns."""
+ # Normal name — unchanged
+ self.assertEqual(clean_column_name("cell_type"), "cell_type")
+
+ # Special characters cleaned
+ self.assertEqual(clean_column_name("CD4+"), "CD4_pos")
+ self.assertEqual(clean_column_name("Area µm²"), "Area_um2")
+
+ def test_clean_column_name_digit_prefix(self) -> None:
+ """Test clean_column_name adds col_ prefix for digit-leading names."""
+ result = clean_column_name("123ABC")
+ self.assertEqual(result, "col_123ABC")
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_tsne_analysis_template.py b/tests/templates/test_tsne_analysis_template.py
new file mode 100644
index 00000000..72039f1e
--- /dev/null
+++ b/tests/templates/test_tsne_analysis_template.py
@@ -0,0 +1,95 @@
+# tests/templates/test_tsne_analysis_template.py
+"""
+Real (non-mocked) unit test for the tSNE Analysis template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.tsne_analysis_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 50 cells, 5 genes (tSNE needs more cells than its perplexity)."""
+ rng = np.random.default_rng(42)
+ X = rng.random((50, 5))
+ obs = pd.DataFrame({"cell_type": ["A", "B"] * 25})
+ var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestTsneAnalysisTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the tSNE analysis template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Process": "Original",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_tsne_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run tSNE and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle contains AnnData with 'X_tsne' in .obsm
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("X_tsne", result_adata.obsm)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("X_tsne", mem_adata.obsm)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_umap_transformation_template.py b/tests/templates/test_umap_transformation_template.py
new file mode 100644
index 00000000..96cd2b8e
--- /dev/null
+++ b/tests/templates/test_umap_transformation_template.py
@@ -0,0 +1,101 @@
+# tests/templates/test_umap_transformation_template.py
+"""
+Real (non-mocked) unit test for the UMAP Transformation template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.umap_transformation_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 20 cells, 5 genes for UMAP."""
+ rng = np.random.default_rng(42)
+ X = rng.random((20, 5))
+ obs = pd.DataFrame({"cell_type": ["A", "B"] * 10})
+ var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestUmapTransformationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the UMAP transformation template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Process": "Original",
+ "Number_of_Neighbors": 5,
+ "Minimum_Distance_between_Points": 0.1,
+ "Target_Dimension_Number": 2,
+ "Computational_Metric": "euclidean",
+ "Random_State": 0,
+ "Transform_Seed": 42,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_umap_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run UMAP and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle contains AnnData with 'X_umap' in .obsm
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("X_umap", result_adata.obsm)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("X_umap", mem_adata.obsm)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_umap_tsne_pca_template.py b/tests/templates/test_umap_tsne_pca_template.py
new file mode 100644
index 00000000..f04215f2
--- /dev/null
+++ b/tests/templates/test_umap_tsne_pca_template.py
@@ -0,0 +1,103 @@
+# tests/templates/test_umap_tsne_pca_template.py
+"""
+Real (non-mocked) unit test for the UMAP/tSNE/PCA Visualization template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.umap_tsne_pca_visualization_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData with pre-computed UMAP embedding for visualization."""
+ rng = np.random.default_rng(42)
+ X = rng.random((8, 2))
+ obs = pd.DataFrame({"cell_type": ["A", "B"] * 4})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["X_umap"] = rng.random((8, 2)) * 10
+ return adata
+
+
+class TestUmapTsnePcaTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the UMAP/tSNE/PCA visualization."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Dimensionality_Reduction_Method": "UMAP",
+ "Color_By": "Annotation",
+ "Annotation": "cell_type",
+ "Feature": "None",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Spot_Size": 50,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_umap_tsne_pca_visualization_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run dim reduction visualization and verify.
+
+ Validates:
+ 1. saved_files dict has 'figures' key
+ 2. Figures directory contains non-empty PNG(s)
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("figures", saved_files)
+
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_utag_clustering_template.py b/tests/templates/test_utag_clustering_template.py
new file mode 100644
index 00000000..dc5f172d
--- /dev/null
+++ b/tests/templates/test_utag_clustering_template.py
@@ -0,0 +1,109 @@
+# tests/templates/test_utag_clustering_template.py
+"""
+Real (non-mocked) unit test for the UTAG Clustering template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.utag_clustering_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 30 cells with spatial coords for UTAG clustering."""
+ rng = np.random.default_rng(42)
+ X = rng.random((30, 3))
+ obs = pd.DataFrame({"cell_type": ["A", "B", "C"] * 10})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"])
+ spatial = rng.random((30, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+ return adata
+
+
+class TestUTAGClusteringTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the UTAG clustering template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Process": "Original",
+ "Features": ["All"],
+ "Slide_Annotation": "None",
+ "Distance_Threshold": 20.0,
+ "K_Nearest_Neighbors": 5,
+ "Resolution_Parameter": 1,
+ "PCA_Components": "None",
+ "Random_Seed": 42,
+ "N_Jobs": 1,
+ "Leiden_Iterations": 3,
+ "Parellel_Processes": False,
+ "Output_Annotation_Name": "UTAG",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_utag_clustering_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run UTAG clustering and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle contains AnnData with UTAG obs column
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ self.assertIn("UTAG", result_adata.obs.columns)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("UTAG", mem_adata.obs.columns)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_visualize_nearest_neighbor_template.py b/tests/templates/test_visualize_nearest_neighbor_template.py
new file mode 100644
index 00000000..ba62948c
--- /dev/null
+++ b/tests/templates/test_visualize_nearest_neighbor_template.py
@@ -0,0 +1,130 @@
+# tests/templates/test_visualize_nearest_neighbor_template.py
+"""
+Real (non-mocked) unit test for the Visualize Nearest Neighbor template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.visualize_nearest_neighbor_template import run_from_json
+from spac.templates.nearest_neighbor_calculation_template import (
+ run_from_json as run_nn,
+)
+
+
+def _make_adata_with_nn() -> ad.AnnData:
+ """Create AnnData with pre-computed nearest neighbor results."""
+ rng = np.random.default_rng(42)
+ X = rng.random((12, 2))
+ obs = pd.DataFrame({
+ "cell_type": ["A", "B", "C"] * 4,
+ })
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((12, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+
+ # Run the real nearest neighbor calculation template to populate .obsm
+ # (tempfile is already imported at module level)
+ with tempfile.TemporaryDirectory() as td:
+ pkl_in = os.path.join(td, "in.pickle")
+ with open(pkl_in, "wb") as f:
+ pickle.dump(adata, f)
+ nn_params = {
+ "Upstream_Analysis": pkl_in,
+ "Annotation": "cell_type",
+ "ImageID": "None",
+ }
+ json_path = os.path.join(td, "p.json")
+ with open(json_path, "w") as f:
+ json.dump(nn_params, f)
+ adata = run_nn(json_path, save_to_disk=False)
+ return adata
+
+
+class TestVisualizeNearestNeighborTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for visualize nearest neighbor template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_adata_with_nn(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Annotation": "cell_type",
+ "Source_Anchor_Cell_Label": "A",
+ "Nearest_Neighbor_Associated_Table": "spatial_distance",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_visualize_nn_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: visualize nearest neighbors and verify.
+
+ Validates:
+ 1. saved_files dict has 'figures' and/or 'dataframe' keys
+ 2. Output files exist and are non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertGreater(len(saved_files), 0)
+
+ if "figures" in saved_files:
+ figure_paths = saved_files["figures"]
+ self.assertGreaterEqual(len(figure_paths), 1)
+ for fig_path in figure_paths:
+ fig_file = Path(fig_path)
+ self.assertTrue(fig_file.exists())
+ self.assertGreater(fig_file.stat().st_size, 0)
+
+ if "dataframe" in saved_files:
+ csv_path = Path(saved_files["dataframe"])
+ self.assertTrue(csv_path.exists())
+ self.assertGreater(csv_path.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/templates/test_visualize_ripley_template.py b/tests/templates/test_visualize_ripley_template.py
new file mode 100644
index 00000000..c7ada182
--- /dev/null
+++ b/tests/templates/test_visualize_ripley_template.py
@@ -0,0 +1,134 @@
+# tests/templates/test_visualize_ripley_template.py
+"""
+Real (non-mocked) unit test for the Visualize Ripley L template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import matplotlib
+matplotlib.use("Agg")
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.visualize_ripley_l_template import run_from_json
+from spac.templates.ripley_l_calculation_template import (
+ run_from_json as run_ripley,
+)
+
+
+def _make_adata_with_ripley() -> ad.AnnData:
+ """Create AnnData with pre-computed Ripley L results in .uns."""
+ rng = np.random.default_rng(42)
+ X = rng.random((20, 2))
+ obs = pd.DataFrame({"cell_type": (["A"] * 10) + (["B"] * 10)})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ spatial = rng.random((20, 2)) * 100
+ adata = ad.AnnData(X=X, obs=obs, var=var)
+ adata.obsm["spatial"] = spatial
+
+ # Run the real Ripley L calculation template to populate .uns
+ # (tempfile is already imported at module level)
+ with tempfile.TemporaryDirectory() as td:
+ pkl_in = os.path.join(td, "in.pickle")
+ with open(pkl_in, "wb") as f:
+ pickle.dump(adata, f)
+ ripley_params = {
+ "Upstream_Analysis": pkl_in,
+ "Radii": [5, 10, 20],
+ "Annotation": "cell_type",
+ "Center_Phenotype": "A",
+ "Neighbor_Phenotype": "B",
+ "Number_of_Simulations": 5,
+ "Seed": 42,
+ "Spatial_Key": "spatial",
+ "Edge_Correction": True,
+ }
+ json_path = os.path.join(td, "p.json")
+ with open(json_path, "w") as f:
+ json.dump(ripley_params, f)
+ adata = run_ripley(json_path, save_to_disk=False)
+ return adata
+
+
+class TestVisualizeRipleyTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the visualize Ripley L template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_adata_with_ripley(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Radii": [5, 10, 20],
+ "Annotation": "cell_type",
+ "Center_Phenotype": "A",
+ "Neighbor_Phenotype": "B",
+ "Figure_Width": 6,
+ "Figure_Height": 4,
+ "Figure_DPI": 72,
+ "Font_Size": 10,
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "figures": {"type": "directory", "name": "figures_dir"},
+ "dataframe": {"type": "file", "name": "dataframe.csv"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_visualize_ripley_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: visualize Ripley L and verify outputs.
+
+ Validates:
+ 1. saved_files is a non-empty dict
+ 2. Output files exist and are non-empty
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ show_plot=False,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ # Check that at least some output was produced
+ self.assertGreater(len(saved_files), 0)
+
+ for value in saved_files.values():
+ if isinstance(value, list):
+ for p in value:
+ pf = Path(p)
+ self.assertTrue(pf.exists())
+ self.assertGreater(pf.stat().st_size, 0)
+ elif isinstance(value, str):
+ pf = Path(value)
+ self.assertTrue(pf.exists())
+ self.assertGreater(pf.stat().st_size, 0)
+
+
+if __name__ == "__main__":
+ unittest.main()
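The Ripley test writes a pickle and a params JSON twice: once in `_make_adata_with_ripley` and once in `setUp`. A small helper could factor that out. In the sketch below, `write_template_inputs` is a hypothetical name and any picklable object stands in for the AnnData. The helper copies `params` rather than mutating the caller's dict, and sets `Upstream_Analysis` to the pickle it just wrote.

```python
import json
import os
import pickle
import tempfile


def write_template_inputs(work_dir, obj, params, stem="input"):
    """Pickle `obj`, point params["Upstream_Analysis"] at it, and write params JSON.

    Returns the JSON path, ready to pass to a template's run_from_json().
    The helper name and the `stem` argument are hypothetical.
    """
    pickle_path = os.path.join(work_dir, f"{stem}.pickle")
    with open(pickle_path, "wb") as f:
        pickle.dump(obj, f)

    # Build a new dict so the caller's params are left untouched.
    full_params = dict(params, Upstream_Analysis=pickle_path)
    json_path = os.path.join(work_dir, f"{stem}_params.json")
    with open(json_path, "w") as f:
        json.dump(full_params, f)
    return json_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as td:
        path = write_template_inputs(td, {"demo": [1, 2, 3]}, {"Radii": [5, 10, 20]})
        print(path)
```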
diff --git a/tests/templates/test_zscore_normalization_template.py b/tests/templates/test_zscore_normalization_template.py
new file mode 100644
index 00000000..6c0049e7
--- /dev/null
+++ b/tests/templates/test_zscore_normalization_template.py
@@ -0,0 +1,98 @@
+# tests/templates/test_zscore_normalization_template.py
+"""
+Real (non-mocked) unit test for the Z-Score Normalization template.
+
+Validates template I/O behaviour only.
+No mocking. Uses real data, real filesystem, and tempfile.
+"""
+
+import json
+import os
+import pickle
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+sys.path.append(
+ os.path.dirname(os.path.realpath(__file__)) + "/../../src"
+)
+
+from spac.templates.z_score_normalization_template import run_from_json
+
+
+def _make_tiny_adata() -> ad.AnnData:
+ """Minimal AnnData: 4 cells, 2 genes for z-score normalization."""
+ rng = np.random.default_rng(42)
+ X = rng.integers(1, 100, size=(4, 2)).astype(float)
+ obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]})
+ var = pd.DataFrame(index=["Gene_0", "Gene_1"])
+ return ad.AnnData(X=X, obs=obs, var=var)
+
+
+class TestZScoreNormalizationTemplate(unittest.TestCase):
+ """Real (non-mocked) tests for the z-score normalization template."""
+
+ def setUp(self) -> None:
+ self.tmp_dir = tempfile.TemporaryDirectory()
+ self.in_file = os.path.join(self.tmp_dir.name, "input.pickle")
+
+ with open(self.in_file, "wb") as f:
+ pickle.dump(_make_tiny_adata(), f)
+
+ params = {
+ "Upstream_Analysis": self.in_file,
+ "Table_to_Process": "Original",
+ "Output_Table_Name": "zscore",
+ "Output_Directory": self.tmp_dir.name,
+ "outputs": {
+ "analysis": {"type": "file", "name": "output.pickle"},
+ },
+ }
+
+ self.json_file = os.path.join(self.tmp_dir.name, "params.json")
+ with open(self.json_file, "w") as f:
+ json.dump(params, f)
+
+ def tearDown(self) -> None:
+ self.tmp_dir.cleanup()
+
+ def test_zscore_normalization_produces_expected_outputs(self) -> None:
+ """
+ End-to-end I/O test: run z-score normalization and verify outputs.
+
+ Validates:
+ 1. saved_files dict has 'analysis' key
+ 2. Pickle exists, is non-empty, contains AnnData
+ 3. 'zscore' layer is present in the saved and in-memory AnnData
+ """
+ saved_files = run_from_json(
+ self.json_file,
+ save_to_disk=True,
+ output_dir=self.tmp_dir.name,
+ )
+
+ self.assertIsInstance(saved_files, dict)
+ self.assertIn("analysis", saved_files)
+
+ pkl_path = Path(saved_files["analysis"])
+ self.assertTrue(pkl_path.exists())
+ self.assertGreater(pkl_path.stat().st_size, 0)
+
+ with open(pkl_path, "rb") as f:
+ result_adata = pickle.load(f)
+ self.assertIsInstance(result_adata, ad.AnnData)
+ # z-score normalization creates a 'zscore' layer
+ self.assertIn("zscore", result_adata.layers)
+
+ mem_adata = run_from_json(self.json_file, save_to_disk=False)
+ self.assertIsInstance(mem_adata, ad.AnnData)
+ self.assertIn("zscore", mem_adata.layers)
+
+
+if __name__ == "__main__":
+ unittest.main()
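The z-score test only checks that a `'zscore'` layer exists. A stronger assertion would check that every feature in that layer has mean close to 0 and standard deviation close to 1. The pure-Python sketch below shows the computation those checks rely on; `zscore_columns` is hypothetical. It assumes per-feature (column-wise) scaling with the population standard deviation (`ddof=0`), which has not been confirmed against spac's implementation.

```python
import statistics


def zscore_columns(rows):
    """Z-score each column of a row-major matrix (list of equal-length rows).

    Assumes population standard deviation (ddof=0) and per-feature scaling.
    """
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        # A constant column has zero spread; map it to 0 instead of dividing by 0.
        scaled.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    # Transpose back to row-major order.
    return [list(r) for r in zip(*scaled)]


print(zscore_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]))
```

Against the real layer, the mean check would be `np.allclose(layer.mean(axis=0), 0)`. The standard-deviation check should use whichever `ddof` spac actually applies.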
diff --git a/tests/test_performance/__init__.py b/tests/test_performance/__init__.py
new file mode 100644
index 00000000..8042e032
--- /dev/null
+++ b/tests/test_performance/__init__.py
@@ -0,0 +1,3 @@
+import os
+import sys
+sys.path.append(os.path.dirname(os.path.realpath(__file__)) + "/../../src")
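Both performance classes below build their 1M, 5M and 10M cell datasets in `setUpClass` and keep them as class attributes, so all three are resident at once. The estimate below gives the dense-array footprint. It assumes float64 storage for `X` and for the `normalized`, `log_transformed` and `scaled` layers, and ignores `obs` and the temporary arrays created during generation. It helps explain why these suites only run when `SPAC_RUN_PERF=1` is set. `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(n_obs, n_features=5, n_matrices=4, itemsize=8):
    """Bytes held by `n_matrices` dense arrays of shape (n_obs, n_features).

    Defaults assume float64 (8 bytes) and four matrices per dataset:
    X plus the 'normalized', 'log_transformed' and 'scaled' layers.
    """
    return n_obs * n_features * n_matrices * itemsize


datasets = {"1M": 1_000_000, "5M": 5_000_000, "10M": 10_000_000}
for label, n_obs in datasets.items():
    print(f"{label:>3}: {dense_bytes(n_obs) / 1e9:.2f} GB")
total = sum(dense_bytes(n) for n in datasets.values())
print(f"total: {total / 1e9:.2f} GB")
```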
diff --git a/tests/test_performance/test_boxplot_performance.py b/tests/test_performance/test_boxplot_performance.py
new file mode 100644
index 00000000..1e3ea434
--- /dev/null
+++ b/tests/test_performance/test_boxplot_performance.py
@@ -0,0 +1,210 @@
+import os
+import unittest
+import time
+import numpy as np
+import pandas as pd
+import anndata as ad
+import matplotlib
+# Select the non-interactive backend before pyplot is imported
+matplotlib.use('Agg')  # Suppress plot windows
+import matplotlib.pyplot as plt
+from sklearn.datasets import make_blobs
+from sklearn.preprocessing import StandardScaler
+from spac.visualization import boxplot, boxplot_interactive
+
+
+skip_perf = unittest.skipUnless(
+ os.getenv("SPAC_RUN_PERF") == "1",
+ "Perf tests disabled by default"
+)
+
+@skip_perf
+class TestBoxplotPerformance(unittest.TestCase):
+ """Performance comparison tests for boxplot vs boxplot_interactive."""
+
+ @classmethod
+ def setUpClass(cls):
+ """Generate large datasets once for all tests."""
+ print("\n" + "=" * 70)
+ print("Setting up large datasets for boxplot performance tests...")
+ print("=" * 70)
+
+ # Generate 1M cell dataset
+ print("\nGenerating 1M cell dataset...")
+ start = time.time()
+ cls.adata_1m = cls._generate_dataset(n_obs=1_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+
+ # Generate 5M cell dataset
+ print("\nGenerating 5M cell dataset...")
+ start = time.time()
+ cls.adata_5m = cls._generate_dataset(n_obs=5_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+
+ # Generate 10M cell dataset
+ print("\nGenerating 10M cell dataset...")
+ start = time.time()
+ cls.adata_10m = cls._generate_dataset(n_obs=10_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+ print("=" * 70 + "\n")
+
+ @staticmethod
+ def _generate_dataset(n_obs: int, random_state: int = 42) -> ad.AnnData:
+ """
+ Generate a synthetic AnnData object with realistic clustering.
+
+ Creates dataset with:
+ - 5 features (marker_1 to marker_5)
+ - 5 annotations (cell_type, phenotype, region, batch, treatment)
+ - 3 layers (normalized, log_transformed, scaled)
+ """
+ np.random.seed(random_state)
+
+ # Generate base data with natural clustering
+ n_features = 5
+ n_centers = 5
+
+ X, cluster_labels = make_blobs(
+ n_samples=n_obs,
+ n_features=n_features,
+ centers=n_centers,
+ cluster_std=1.5,
+ random_state=random_state
+ )
+
+ # Make values positive and add variation
+ X = np.abs(X) + np.random.exponential(scale=2.0, size=X.shape)
+
+ # Create feature names
+ feature_names = [f"marker_{i+1}" for i in range(n_features)]
+
+ # Create annotations based on clusters
+ cell_types = [f"Type_{chr(65+i)}" for i in range(5)]
+ cell_type = np.array([cell_types[i % 5] for i in cluster_labels])
+
+ phenotypes = [f"Pheno_{i+1}" for i in range(4)]
+ phenotype = np.array([phenotypes[i % 4] for i in cluster_labels])
+ random_mask = np.random.random(n_obs) < 0.2
+ phenotype[random_mask] = np.random.choice(phenotypes, size=random_mask.sum())
+
+ regions = ["Region_X", "Region_Y", "Region_Z"]
+ region = np.random.choice(regions, size=n_obs)
+
+ batches = ["Batch_1", "Batch_2", "Batch_3"]
+ batch = np.random.choice(batches, size=n_obs)
+
+ treatments = ["Control", "Treated"]
+ treatment = np.random.choice(treatments, size=n_obs, p=[0.5, 0.5])
+
+ # Create observations DataFrame
+ obs = pd.DataFrame({
+ 'cell_type': pd.Categorical(cell_type),
+ 'phenotype': pd.Categorical(phenotype),
+ 'region': pd.Categorical(region),
+ 'batch': pd.Categorical(batch),
+ 'treatment': pd.Categorical(treatment)
+ })
+
+ # Create AnnData object
+ adata = ad.AnnData(X=X, obs=obs)
+ adata.var_names = feature_names
+
+ # Create layers with different transformations
+ X_normalized = np.zeros_like(X)
+ for i in range(n_features):
+ feature_min = X[:, i].min()
+ feature_max = X[:, i].max()
+ X_normalized[:, i] = (X[:, i] - feature_min) / (feature_max - feature_min)
+ adata.layers['normalized'] = X_normalized
+
+ adata.layers['log_transformed'] = np.log1p(X)
+
+ scaler = StandardScaler()
+ adata.layers['scaled'] = scaler.fit_transform(X)
+
+ return adata
+
+ def tearDown(self):
+ """Clean up matplotlib figures after each test."""
+ plt.close('all')
+
+ def _run_comparison(self, adata, test_name):
+ """Run comparison between boxplot and boxplot_interactive."""
+ n_obs = adata.n_obs
+ features = ['marker_1', 'marker_2', 'marker_3', 'marker_4', 'marker_5']
+ annotation = 'cell_type'
+ layer = 'normalized'
+
+ print(f"\n{'=' * 70}")
+ print(f"{test_name}: {n_obs:,} cells")
+ print(f" Features: {', '.join(features)}")
+ print(f" Annotation: {annotation}")
+ print(f" Layer: {layer}")
+ print(f"{'=' * 70}")
+
+ # Test boxplot
+ print("\n Running boxplot...")
+ start = time.time()
+ fig, ax, df = boxplot(
+ adata,
+ features=features,
+ annotation=annotation,
+ layer=layer
+ )
+ boxplot_time = time.time() - start
+ print(f" Time: {boxplot_time:.2f} seconds")
+ plt.close('all')
+
+ # Test boxplot_interactive with downsampling
+ print("\n Running boxplot_interactive (with downsampling)...")
+ start = time.time()
+ result = boxplot_interactive(
+ adata,
+ features=features,
+ annotation=annotation,
+ layer=layer,
+ showfliers='downsample'
+ )
+ interactive_time = time.time() - start
+ print(f" Time: {interactive_time:.2f} seconds")
+
+ # Calculate speedup
+ speedup = boxplot_time / interactive_time if interactive_time > 0 else float('inf')
+
+ print(f"\n Results:")
+ print(f" boxplot: {boxplot_time:.2f}s")
+ print(f" boxplot_interactive: {interactive_time:.2f}s")
+ print(f" Speedup factor: {speedup:.2f}x")
+
+ if speedup > 1:
+ print(f" → boxplot_interactive is {speedup:.2f}x faster")
+ elif speedup < 1:
+ print(f" → boxplot is {1/speedup:.2f}x faster")
+ else:
+ print(f" → Both functions have similar performance")
+
+ print(f"{'=' * 70}\n")
+
+ # Store results for potential further analysis
+ return {
+ 'n_obs': n_obs,
+ 'boxplot_time': boxplot_time,
+ 'boxplot_interactive_time': interactive_time,
+ 'speedup_factor': speedup
+ }
+
+ def test_comparison_1m(self):
+ """Compare boxplot vs boxplot_interactive with 1M cells."""
+ self._run_comparison(self.adata_1m, "Boxplot Performance Comparison [1M cells]")
+
+ def test_comparison_5m(self):
+ """Compare boxplot vs boxplot_interactive with 5M cells."""
+ self._run_comparison(self.adata_5m, "Boxplot Performance Comparison [5M cells]")
+
+ def test_comparison_10m(self):
+ """Compare boxplot vs boxplot_interactive with 10M cells."""
+ self._run_comparison(self.adata_10m, "Boxplot Performance Comparison [10M cells]")
+
+
+if __name__ == '__main__':
+ unittest.main(verbosity=2)
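`_run_comparison` times each call once with `time.time()`. A single wall-clock sample can be noisy, and `time.time()` follows the system clock, so it can jump. The sketch below uses `time.perf_counter()` and keeps the best of a few repeats. It also returns `inf` rather than `0` when the candidate's time is zero, so a later `1 / speedup` never divides by zero. `best_time` and `speedup` are hypothetical helpers, not part of spac.

```python
import time


def best_time(func, *args, repeats=3, **kwargs):
    """Fastest duration in seconds over `repeats` calls of func(*args, **kwargs)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline (inf if ~0 s)."""
    return baseline_s / candidate_s if candidate_s > 0 else float("inf")


if __name__ == "__main__":
    data = list(range(1_000_000))
    t_sorted = best_time(sorted, data)
    t_sum = best_time(sum, data)
    print(f"speedup of sum over sorted: {speedup(t_sorted, t_sum):.1f}x")
```

For the plotting calls, wrap each call in a small function that also runs `plt.close('all')`, so figures from earlier repeats do not accumulate. Note that `repeats` multiplies the runtime of the 10M-cell cases.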
diff --git a/tests/test_performance/test_histogram_performance.py b/tests/test_performance/test_histogram_performance.py
new file mode 100644
index 00000000..308e80b3
--- /dev/null
+++ b/tests/test_performance/test_histogram_performance.py
@@ -0,0 +1,386 @@
+import os
+import unittest
+import time
+import warnings
+import numpy as np
+import pandas as pd
+import anndata as ad
+import matplotlib
+# Select the non-interactive 'Agg' backend before matplotlib.pyplot is
+# imported (directly, or indirectly via seaborn or spac) so that running
+# the benchmark never opens a plot window.
+matplotlib.use('Agg')
+import matplotlib.pyplot as plt
+import seaborn as sns
+from sklearn.datasets import make_blobs
+from sklearn.preprocessing import StandardScaler
+from spac.visualization import histogram
+from spac.utils import check_annotation, check_feature, check_table
+
+
+skip_perf = unittest.skipUnless(
+ os.getenv("SPAC_RUN_PERF") == "1",
+ "Perf tests disabled by default"
+)
+
+@skip_perf
+class TestHistogramPerformance(unittest.TestCase):
+ """Performance comparison tests for histogram vs histogram_old."""
+
+ @classmethod
+ def setUpClass(cls):
+ """Generate large datasets once for all tests."""
+ print("\n" + "=" * 70)
+ print("Setting up large datasets for histogram performance tests...")
+ print("=" * 70)
+
+ # Generate 1M cell dataset
+ print("\nGenerating 1M cell dataset...")
+ start = time.time()
+ cls.adata_1m = cls._generate_dataset(n_obs=1_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+
+ # Generate 5M cell dataset
+ print("\nGenerating 5M cell dataset...")
+ start = time.time()
+ cls.adata_5m = cls._generate_dataset(n_obs=5_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+
+ # Generate 10M cell dataset
+ print("\nGenerating 10M cell dataset...")
+ start = time.time()
+ cls.adata_10m = cls._generate_dataset(n_obs=10_000_000, random_state=42)
+ print(f" Completed in {time.time() - start:.2f} seconds")
+ print("=" * 70 + "\n")
+
+ @staticmethod
+ def _generate_dataset(n_obs: int, random_state: int = 42) -> ad.AnnData:
+ """
+ Generate a synthetic AnnData object with realistic clustering.
+
+ Creates dataset with:
+ - 5 features (marker_1 to marker_5)
+ - 5 annotations (cell_type, phenotype, region, batch, treatment)
+ - 3 layers (normalized, log_transformed, scaled)
+ """
+ np.random.seed(random_state)
+
+ # Generate base data with natural clustering
+ n_features = 5
+ n_centers = 5
+
+ X, cluster_labels = make_blobs(
+ n_samples=n_obs,
+ n_features=n_features,
+ centers=n_centers,
+ cluster_std=1.5,
+ random_state=random_state
+ )
+
+ # Make values positive and add variation
+ X = np.abs(X) + np.random.exponential(scale=2.0, size=X.shape)
+
+ # Create feature names
+ feature_names = [f"marker_{i+1}" for i in range(n_features)]
+
+ # Create annotations based on clusters
+ cell_types = [f"Type_{chr(65+i)}" for i in range(5)]
+ cell_type = np.array([cell_types[i % 5] for i in cluster_labels])
+
+ phenotypes = [f"Pheno_{i+1}" for i in range(4)]
+ phenotype = np.array([phenotypes[i % 4] for i in cluster_labels])
+ random_mask = np.random.random(n_obs) < 0.2
+ phenotype[random_mask] = np.random.choice(phenotypes, size=random_mask.sum())
+
+ regions = ["Region_X", "Region_Y", "Region_Z"]
+ region = np.random.choice(regions, size=n_obs)
+
+ batches = ["Batch_1", "Batch_2", "Batch_3"]
+ batch = np.random.choice(batches, size=n_obs)
+
+ treatments = ["Control", "Treated"]
+ treatment = np.random.choice(treatments, size=n_obs, p=[0.5, 0.5])
+
+ # Create observations DataFrame
+ obs = pd.DataFrame({
+ 'cell_type': pd.Categorical(cell_type),
+ 'phenotype': pd.Categorical(phenotype),
+ 'region': pd.Categorical(region),
+ 'batch': pd.Categorical(batch),
+ 'treatment': pd.Categorical(treatment)
+ })
+
+ # Create AnnData object
+ adata = ad.AnnData(X=X, obs=obs)
+ adata.var_names = feature_names
+
+ # Create layers with different transformations
+ X_normalized = np.zeros_like(X)
+ for i in range(n_features):
+ feature_min = X[:, i].min()
+ feature_max = X[:, i].max()
+ X_normalized[:, i] = (X[:, i] - feature_min) / (feature_max - feature_min)
+ adata.layers['normalized'] = X_normalized
+
+ adata.layers['log_transformed'] = np.log1p(X)
+
+ scaler = StandardScaler()
+ adata.layers['scaled'] = scaler.fit_transform(X)
+
+ return adata
+
+ def tearDown(self):
+ """Clean up matplotlib figures after each test."""
+ plt.close('all')
+
+ @staticmethod
+ def histogram_old(adata, feature=None, annotation=None, layer=None,
+ group_by=None, together=False, ax=None,
+ x_log_scale=False, y_log_scale=False, **kwargs):
+ """
+ Old histogram implementation for performance comparison.
+
+ Copied from visualization.py at commit 1cfad52f00aa6c1b8384f727b60e3bf07f57bee6,
+ before histogram was refactored.
+ """
+ # If no feature or annotation is specified, apply default behavior
+ if feature is None and annotation is None:
+ feature = adata.var_names[0]
+ warnings.warn(
+ "No feature or annotation specified. "
+ "Defaulting to the first feature: "
+ f"'{feature}'.",
+ UserWarning
+ )
+
+ # Use utility functions for input validation
+ if layer:
+ check_table(adata, tables=layer)
+ if annotation:
+ check_annotation(adata, annotations=annotation)
+ if feature:
+ check_feature(adata, features=feature)
+ if group_by:
+ check_annotation(adata, annotations=group_by)
+
+ # If layer is specified, get the data from that layer
+ if layer:
+ df = pd.DataFrame(
+ adata.layers[layer], index=adata.obs.index, columns=adata.var_names
+ )
+ else:
+ df = pd.DataFrame(
+ adata.X, index=adata.obs.index, columns=adata.var_names
+ )
+ layer = 'Original'
+
+ df = pd.concat([df, adata.obs], axis=1)
+
+ if feature and annotation:
+ raise ValueError("Cannot pass both feature and annotation,"
+ " choose one.")
+
+ data_column = feature if feature else annotation
+
+ # Check for negative values and apply log1p transformation if x_log_scale is True
+ if x_log_scale:
+ if (df[data_column] < 0).any():
+ print(
+ "There are negative values in the data, disabling x_log_scale."
+ )
+ x_log_scale = False
+ else:
+ df[data_column] = np.log1p(df[data_column])
+
+ if ax is not None:
+ fig = ax.get_figure()
+ else:
+ fig, ax = plt.subplots()
+
+ axs = []
+
+ # Prepare the data for plotting
+ plot_data = df.dropna(subset=[data_column])
+
+ # Bin calculation section
+ def cal_bin_num(num_rows):
+ bins = max(int(2*(num_rows ** (1/3))), 1)
+ print(f'Automatically calculated number of bins is: {bins}')
+ return(bins)
+
+ num_rows = plot_data.shape[0]
+
+ # Check if bins is being passed
+ if 'bins' not in kwargs:
+ kwargs['bins'] = cal_bin_num(num_rows)
+
+ # Plotting with or without grouping
+ if group_by:
+ groups = df[group_by].dropna().unique().tolist()
+ n_groups = len(groups)
+ if n_groups == 0:
+ raise ValueError("There must be at least one group to create a"
+ " histogram.")
+
+ if together:
+ kwargs.setdefault("multiple", "stack")
+ kwargs.setdefault("element", "bars")
+
+ sns.histplot(data=df.dropna(), x=data_column, hue=group_by,
+ ax=ax, **kwargs)
+ if feature:
+ ax.set_title(f'Layer: {layer}')
+ axs.append(ax)
+ else:
+ fig, ax_array = plt.subplots(
+ n_groups, 1, figsize=(5, 5 * n_groups)
+ )
+
+ if n_groups == 1:
+ ax_array = [ax_array]
+ else:
+ ax_array = ax_array.flatten()
+
+ for i, ax_i in enumerate(ax_array):
+ group_data = plot_data[plot_data[group_by] == groups[i]]
+
+ sns.histplot(data=group_data, x=data_column, ax=ax_i, **kwargs)
+ if feature:
+ ax_i.set_title(f'{groups[i]} with Layer: {layer}')
+ else:
+ ax_i.set_title(f'{groups[i]}')
+
+ if y_log_scale:
+ ax_i.set_yscale('log')
+
+ if x_log_scale:
+ xlabel = f'log({data_column})'
+ else:
+ xlabel = data_column
+ ax_i.set_xlabel(xlabel)
+
+ stat = kwargs.get('stat', 'count')
+ ylabel_map = {
+ 'count': 'Count',
+ 'frequency': 'Frequency',
+ 'density': 'Density',
+ 'probability': 'Probability'
+ }
+ ylabel = ylabel_map.get(stat, 'Count')
+ if y_log_scale:
+ ylabel = f'log({ylabel})'
+ ax_i.set_ylabel(ylabel)
+
+ axs.append(ax_i)
+ else:
+ sns.histplot(data=plot_data, x=data_column, ax=ax, **kwargs)
+ if feature:
+ ax.set_title(f'Layer: {layer}')
+ axs.append(ax)
+
+ if y_log_scale:
+ ax.set_yscale('log')
+
+ if x_log_scale:
+ xlabel = f'log({data_column})'
+ else:
+ xlabel = data_column
+ ax.set_xlabel(xlabel)
+
+ stat = kwargs.get('stat', 'count')
+ ylabel_map = {
+ 'count': 'Count',
+ 'frequency': 'Frequency',
+ 'density': 'Density',
+ 'probability': 'Probability'
+ }
+ ylabel = ylabel_map.get(stat, 'Count')
+ if y_log_scale:
+ ylabel = f'log({ylabel})'
+ ax.set_ylabel(ylabel)
+
+ if len(axs) == 1:
+ return fig, axs[0]
+ else:
+ return fig, axs
+
+ def _run_comparison(self, adata, test_name):
+ """Run comparison between histogram_old and histogram."""
+ n_obs = adata.n_obs
+ feature = 'marker_1'
+ annotation = None
+ layer = 'normalized'
+
+ print(f"\n{'=' * 70}")
+ print(f"{test_name}: {n_obs:,} cells")
+ print(f" Feature: {feature}")
+ print(f" Annotation: {annotation}")
+ print(f" Layer: {layer}")
+ print(f"{'=' * 70}")
+
+ # Test histogram_old
+ print("\n Running histogram_old...")
+ start = time.time()
+ fig_old, ax_old = self.histogram_old(
+ adata,
+ feature=feature,
+ annotation=annotation,
+ layer=layer
+ )
+ old_time = time.time() - start
+ print(f" Time: {old_time:.2f} seconds")
+ plt.close('all')
+
+ # Test histogram from SPAC
+ print("\n Running histogram (SPAC)...")
+ start = time.time()
+ result = histogram(
+ adata,
+ feature=feature,
+ annotation=annotation,
+ layer=layer
+ )
+ new_time = time.time() - start
+ print(f" Time: {new_time:.2f} seconds")
+ plt.close('all')
+
+ # Calculate speedup
+ speedup = old_time / new_time if new_time > 0 else float('inf')
+
+ print(f"\n Results:")
+ print(f" histogram_old: {old_time:.2f}s")
+ print(f" histogram: {new_time:.2f}s")
+ print(f" Speedup factor: {speedup:.2f}x")
+
+ if speedup > 1:
+ print(f" → histogram (SPAC) is {speedup:.2f}x faster")
+ elif speedup < 1:
+ print(f" → histogram_old is {1/speedup:.2f}x faster")
+ else:
+ print(f" → Both functions have similar performance")
+
+ print(f"{'=' * 70}\n")
+
+ # Store results for potential further analysis
+ return {
+ 'n_obs': n_obs,
+ 'histogram_old_time': old_time,
+ 'histogram_time': new_time,
+ 'speedup_factor': speedup
+ }
+
+ def test_comparison_1m(self):
+ """Compare histogram_old vs histogram with 1M cells."""
+ self._run_comparison(self.adata_1m, "Histogram Performance Comparison [1M cells]")
+
+ def test_comparison_5m(self):
+ """Compare histogram_old vs histogram with 5M cells."""
+ self._run_comparison(self.adata_5m, "Histogram Performance Comparison [5M cells]")
+
+ def test_comparison_10m(self):
+ """Compare histogram_old vs histogram with 10M cells."""
+ self._run_comparison(self.adata_10m, "Histogram Performance Comparison [10M cells]")
+
+
+if __name__ == '__main__':
+ unittest.main(verbosity=2)
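`histogram_old`'s nested `cal_bin_num` chooses `max(int(2 * n ** (1/3)), 1)` bins. This is the Rice rule, truncated to an integer rather than rounded up. Because `1/3` is not exactly representable, `n ** (1/3)` can land just below a whole number. On IEEE-754 doubles, `1_000_000 ** (1/3)` evaluates to slightly less than 100, so the formula can return 199 bins for the 1M-cell case rather than 200.

The sketch below contrasts that formula with an integer-only variant based on floor(2·∛n) = floor(∛(8n)). All three function names are hypothetical. This diff does not show whether the refactored `histogram` uses the same rule.

```python
def rice_bins_truncated(num_rows):
    """histogram_old's rule: max(int(2 * n ** (1/3)), 1), using floating point."""
    return max(int(2 * (num_rows ** (1 / 3))), 1)


def icbrt(n):
    """Largest integer r with r ** 3 <= n, for n >= 0."""
    r = round(n ** (1 / 3))  # float estimate, corrected below
    while r ** 3 > n:
        r -= 1
    while (r + 1) ** 3 <= n:
        r += 1
    return r


def rice_bins_exact(num_rows):
    """Exact value of the same rule in integers: floor(2 * cbrt(n)) == floor(cbrt(8 * n))."""
    return max(icbrt(8 * num_rows), 1)


for n in (1_000_000, 5_000_000, 10_000_000):
    print(n, rice_bins_truncated(n), rice_bins_exact(n))
```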
diff --git a/tests/test_transformations/test_add_qc_metrics.py b/tests/test_transformations/test_add_qc_metrics.py
new file mode 100644
index 00000000..65d650fb
--- /dev/null
+++ b/tests/test_transformations/test_add_qc_metrics.py
@@ -0,0 +1,62 @@
+import unittest
+import numpy as np
+import scanpy as sc
+from scipy.sparse import csr_matrix
+from spac.transformations import add_qc_metrics
+
+class TestAddQCMetrics(unittest.TestCase):
+ @classmethod
+ def setUpClass(cls):
+ np.random.seed(42)
+
+ def create_test_adata(self, sparse=False):
+ X = np.array([
+ [1, 0, 3, 0],
+ [0, 2, 0, 4],
+ [5, 0, 0, 6]
+ ])
+ var_names = ["MT-CO1", "MT-CO2", "GeneA", "GeneB"]
+ obs_names = ["cell1", "cell2", "cell3"]
+ adata = sc.AnnData(X=csr_matrix(X) if sparse else X)
+ adata.var_names = var_names
+ adata.obs_names = obs_names
+ return adata
+
+ def test_qc_metrics_dense(self):
+ adata = self.create_test_adata(sparse=False)
+ add_qc_metrics(adata, organism="hs")
+ self.assertIn("nFeature", adata.obs)
+ self.assertIn("nCount", adata.obs)
+ self.assertIn("nCount_mt", adata.obs)
+ self.assertIn("percent.mt", adata.obs)
+ np.testing.assert_array_equal(adata.obs["nFeature"].values, [2, 2, 2])
+ np.testing.assert_array_equal(adata.obs["nCount"].values, [4, 6, 11])
+ np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [1, 2, 5])
+ np.testing.assert_allclose(adata.obs["percent.mt"].values,
+ [25.0, 33.333333, 45.454545], rtol=1e-4)
+
+ def test_qc_metrics_sparse(self):
+ adata = self.create_test_adata(sparse=True)
+ add_qc_metrics(adata, organism="hs")
+ self.assertIn("nFeature", adata.obs)
+ self.assertIn("nCount", adata.obs)
+ self.assertIn("nCount_mt", adata.obs)
+ self.assertIn("percent.mt", adata.obs)
+ np.testing.assert_array_equal(adata.obs["nFeature"].values, [2, 2, 2])
+ np.testing.assert_array_equal(adata.obs["nCount"].values, [4, 6, 11])
+ np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [1, 2, 5])
+ np.testing.assert_allclose(adata.obs["percent.mt"].values,
+ [25.0, 33.333333, 45.454545], rtol=1e-4)
+
+ def test_custom_mt_pattern(self):
+ adata = self.create_test_adata()
+ add_qc_metrics(adata, mt_match_pattern="Gene")
+ np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [3, 4, 6])
+
+ def test_invalid_layer(self):
+ adata = self.create_test_adata()
+ with self.assertRaises(ValueError):
+ add_qc_metrics(adata, layer="not_a_layer")
+
+if __name__ == "__main__":
+ unittest.main()
\ No newline at end of file
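The expected values in `test_add_qc_metrics.py` follow directly from the 3 x 4 count matrix. The sketch below recomputes them in plain Python so the arithmetic is easy to audit. It is a hypothetical helper, not spac's `add_qc_metrics`. It assumes mitochondrial genes are matched by name prefix: `"MT-"` for the human case the tests use, and `"Gene"` for the custom-pattern test. The `0.0` returned for an empty cell is an arbitrary choice.

```python
def qc_metrics_sketch(matrix, var_names, mt_prefix="MT-"):
    """Per-cell QC metrics for a dense count matrix given as a list of rows.

      nFeature   - number of genes with a non-zero count in the cell
      nCount     - total counts in the cell
      nCount_mt  - counts from genes whose name starts with `mt_prefix`
      percent.mt - 100 * nCount_mt / nCount
    Prefix matching is an assumption made for illustration.
    """
    is_mt = [name.startswith(mt_prefix) for name in var_names]
    metrics = {"nFeature": [], "nCount": [], "nCount_mt": [], "percent.mt": []}
    for row in matrix:
        total = sum(row)
        mt_total = sum(v for v, mt in zip(row, is_mt) if mt)
        metrics["nFeature"].append(sum(1 for v in row if v != 0))
        metrics["nCount"].append(total)
        metrics["nCount_mt"].append(mt_total)
        # Guard against empty cells to avoid dividing by zero.
        metrics["percent.mt"].append(100 * mt_total / total if total else 0.0)
    return metrics


X = [[1, 0, 3, 0], [0, 2, 0, 4], [5, 0, 0, 6]]
genes = ["MT-CO1", "MT-CO2", "GeneA", "GeneB"]
print(qc_metrics_sketch(X, genes))
```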
diff --git a/tests/test_transformations/test_get_qc_summary_table.py b/tests/test_transformations/test_get_qc_summary_table.py
new file mode 100644
index 00000000..2894a3af
--- /dev/null
+++ b/tests/test_transformations/test_get_qc_summary_table.py
@@ -0,0 +1,95 @@
+import unittest
+import numpy as np
+import pandas as pd
+import scanpy as sc
+from anndata import AnnData
+from spac.transformations import add_qc_metrics
+from spac.transformations import get_qc_summary_table
+
+class TestGetQCSummaryTable(unittest.TestCase):
+ @classmethod
+ def setUpClass(cls):
+ # Set a random seed for reproducibility
+ np.random.seed(42)
+
+ # Create a small AnnData object for testing
+ def create_test_adata(self):
+ X = np.array([
+ [1, 0, 3, 0],
+ [0, 2, 0, 4],
+ [5, 0, 0, 6]
+ ])
+ var_names = ["MT-CO1", "MT-CO2", "GeneA", "GeneB"]
+ obs_names = ["cell1", "cell2", "cell3"]
+ adata = AnnData(X=X)
+ adata.var_names = var_names
+ adata.obs_names = obs_names
+ # Compute QC metrics using the provided function
+ add_qc_metrics(adata)
+ return adata
+
+ # Test that the summary table is created and has the correct structure
+ def test_qc_summary_table_basic(self):
+ adata = self.create_test_adata()
+ get_qc_summary_table(adata)
+ self.assertIn("qc_summary_table", adata.uns)
+ summary = adata.uns["qc_summary_table"]
+ self.assertIsInstance(summary, pd.DataFrame)
+ # Check that all expected columns are present
+ self.assertIn("mean", summary.columns)
+ self.assertIn("median", summary.columns)
+ self.assertIn("upper_mad", summary.columns)
+ self.assertIn("lower_mad", summary.columns)
+ self.assertIn("upper_quantile", summary.columns)
+ self.assertIn("lower_quantile", summary.columns)
+ self.assertIn("Sample", summary.columns)
+ # Check that the correct metrics are summarized
+ self.assertEqual(set(summary["metric_name"]),
+ {"nFeature", "nCount", "percent.mt"})
+ # Check that the sample label is correct when not grouping
+ self.assertEqual(summary["Sample"].iloc[0], "All")
+
+ # Test that a TypeError is raised if a non-numeric column is included
+ def test_qc_summary_table_non_numeric(self):
+ adata = self.create_test_adata()
+ adata.obs["non_numeric"] = ["a", "b", "c"]
+ with self.assertRaises(TypeError) as exc_info:
+ get_qc_summary_table(adata,
+ stat_columns_list=["nFeature", "non_numeric"])
+ expected_msg = 'Column "non_numeric" must be numeric to compute statistics.'
+ self.assertEqual(str(exc_info.exception), expected_msg)
+
+ # Test that summary statistics are computed correctly with
+ # sample_column grouping
+ def test_qc_summary_table_grouping(self):
+ adata = self.create_test_adata()
+ get_qc_summary_table(adata)
+ # Add a sample column with two groups
+ adata.obs["batch"] = ["A", "A", "B"]
+ get_qc_summary_table(adata, sample_column="batch")
+ summary = adata.uns["qc_summary_table"]
+ # There should be two groups: A and B
+ self.assertEqual(set(summary["Sample"]), {"A", "B"})
+ # For group A (cells 0 and 1): nCount = [4, 6]
+ group_a = summary[(summary["Sample"] == "A") &
+ (summary["metric_name"] == "nCount")].iloc[0]
+ self.assertAlmostEqual(group_a["mean"], 5.0)
+ self.assertAlmostEqual(group_a["median"], 5.0)
+ # For group B (cell 2): nCount = [11]
+ group_b = summary[(summary["Sample"] == "B") &
+ (summary["metric_name"] == "nCount")].iloc[0]
+ self.assertAlmostEqual(group_b["mean"], 11.0)
+ self.assertAlmostEqual(group_b["median"], 11.0)
+
+ # Test that a ValueError is raised if stat_columns_list is empty
+ def test_qc_summary_table_empty_stat_columns_list(self):
+ adata = self.create_test_adata()
+ with self.assertRaises(ValueError) as exc_info:
+ get_qc_summary_table(adata, stat_columns_list=[])
+ expected_msg = (
+ 'Parameter "stat_columns_list" must contain at least one column name.'
+ )
+ self.assertEqual(str(exc_info.exception), expected_msg)
+
+if __name__ == "__main__":
+ unittest.main()
\ No newline at end of file
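`test_qc_summary_table_grouping` expects one set of statistics per value of `sample_column`, and the label `"All"` when no column is given. The sketch below shows that grouping for a single metric, using the `nCount` values from the fixture. `summarize_by_group` is hypothetical and computes only the mean and median.

```python
import statistics
from collections import defaultdict


def summarize_by_group(values, groups=None, all_label="All"):
    """Mean and median of `values`, per group label or pooled under `all_label`."""
    if groups is None:
        groups = [all_label] * len(values)
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {
        group: {"mean": statistics.fmean(vals), "median": statistics.median(vals)}
        for group, vals in buckets.items()
    }


n_count = [4, 6, 11]
print(summarize_by_group(n_count))
print(summarize_by_group(n_count, ["A", "A", "B"]))
```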
diff --git a/tests/test_utils/test_compute_summary_qc_stats.py b/tests/test_utils/test_compute_summary_qc_stats.py
new file mode 100644
index 00000000..9ae628d4
--- /dev/null
+++ b/tests/test_utils/test_compute_summary_qc_stats.py
@@ -0,0 +1,74 @@
+import unittest
+import numpy as np
+import pandas as pd
+from spac.utils import compute_summary_qc_stats
+
+class TestComputeSummaryQCStats(unittest.TestCase):
+ def setUp(self):
+ # Create a simple DataFrame for testing
+ self.df = pd.DataFrame({
+ "nFeature": [2, 2, 2],
+ "nCount": [4, 6, 11],
+ "percent.mt": [25.0, 33.33333333333333, 45.45454545454545],
+ "all_nan": [np.nan, np.nan, np.nan],
+ "non_numeric": ["a", "b", "c"]
+ })
+
+ # Test that summary statistics are computed correctly for nFeature
+ def test_basic_statistics(self):
+ result = compute_summary_qc_stats(self.df,
+ stat_columns_list=["nFeature"])
+ row = result.iloc[0]
+ self.assertEqual(row["mean"], 2)
+ self.assertEqual(row["median"], 2)
+ self.assertEqual(row["upper_mad"], 2)
+ self.assertEqual(row["lower_mad"], 2)
+ self.assertEqual(row["upper_quantile"], 2)
+ self.assertEqual(row["lower_quantile"], 2)
+
+ # Test that summary statistics are computed correctly for nCount
+ def test_ncount_statistics(self):
+ # nCount: [4, 6, 11] -> mean 7.0, median 6.0, 95th pct 10.5, 5th pct 4.2
+ result = compute_summary_qc_stats(self.df,
+ stat_columns_list=["nCount"])
+ row = result.iloc[0]
+ self.assertAlmostEqual(row["mean"], 7.0)
+ self.assertAlmostEqual(row["median"], 6.0)
+ self.assertAlmostEqual(row["upper_quantile"], 10.5)
+ self.assertAlmostEqual(row["lower_quantile"], 4.2)
+
+ # Test that summary statistics are computed correctly for percent.mt
+ def test_percent_mt_statistics(self):
+ # percent.mt: [25.0, 33.33333333333333, 45.45454545454545] ->
+ # mean 34.59596, median 33.33333, upper_quantile 44.24242,
+ # lower_quantile 25.83333
+ result = compute_summary_qc_stats(self.df,
+ stat_columns_list=["percent.mt"])
+ row = result.iloc[0]
+ self.assertAlmostEqual(row["mean"], 34.59596, places=5)
+ self.assertAlmostEqual(row["median"], 33.33333, places=5)
+ self.assertAlmostEqual(row["upper_quantile"], 44.24242, places=5)
+ self.assertAlmostEqual(row["lower_quantile"], 25.83333, places=5)
+
+ # Test that a TypeError is raised if a non-numeric column is included
+ def test_non_numeric_column_raises(self):
+ with self.assertRaises(TypeError) as exc_info:
+ compute_summary_qc_stats(self.df,
+ stat_columns_list=["non_numeric"])
+ expected_msg = (
+ 'Column "non_numeric" must be numeric to compute statistics.'
+ )
+ self.assertEqual(str(exc_info.exception), expected_msg)
+
+ # Test that an all-NaN column raises a TypeError
+ def test_all_nan_column_raises(self):
+ with self.assertRaises(TypeError) as exc_info:
+ compute_summary_qc_stats(self.df, stat_columns_list=["all_nan"])
+ expected_msg = (
+ 'Column "all_nan" must be numeric to compute statistics. '
+ 'All values are NaN.'
+ )
+ self.assertEqual(str(exc_info.exception), expected_msg)
+
+if __name__ == "__main__":
+ unittest.main()
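The expected quantiles in `test_compute_summary_qc_stats.py` match linear interpolation between order statistics, which is NumPy's default percentile method. For `nCount = [4, 6, 11]`, the 95th percentile sits at position 0.95 × 2 = 1.9, giving 6 + 0.9 × (11 - 6) = 10.5.

The sketch below reproduces those numbers and the two `TypeError` messages the tests check. The MAD bounds (median ± k·MAD with k = 3) are an assumption for illustration. The tests only establish that both bounds equal the median when every value is identical. Both function names are hypothetical.

```python
import math
import statistics


def linear_quantile(values, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summary_stats_sketch(values, name="column", k=3.0):
    """Mean, median, 5th/95th percentiles and median +/- k * MAD for one column.

    k=3 is an assumed multiplier for illustration only.
    """
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError(f'Column "{name}" must be numeric to compute statistics.')
    clean = [float(v) for v in values if not math.isnan(v)]
    if not clean:
        raise TypeError(
            f'Column "{name}" must be numeric to compute statistics. '
            "All values are NaN."
        )
    med = statistics.median(clean)
    mad = statistics.median([abs(v - med) for v in clean])
    return {
        "mean": statistics.fmean(clean),
        "median": med,
        "upper_mad": med + k * mad,
        "lower_mad": med - k * mad,
        "upper_quantile": linear_quantile(clean, 0.95),
        "lower_quantile": linear_quantile(clean, 0.05),
    }


print(summary_stats_sketch([4, 6, 11], name="nCount"))
```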