diff --git a/CHANGELOG.md b/CHANGELOG.md index b99276e2..395f9d26 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,15 +1,341 @@ # CHANGELOG -## v0.9.0 (2025-05-23) +## v0.9.1 (2026-02-27) -### Step +### Bug Fixes -- Bumping minor version - ([`e333641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e3336417a09b4ef26e71bde1b54da840f0980ab9)) +- Add missing 'import os' in performance tests + ([`7a5ec6d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7a5ec6d57ca7934d7d1003d417b3907f6d692308)) + +- test_boxplot_performance.py - test_histogram_performance.py + +- Add missing __init__.py in tests/templates + ([`7b7e6cb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7b7e6cbbe3bf9ea591de2075cb40861c2136a879)) + +- Remove 6 deprecated templates (sync with tools_refactor) + ([`d0bbc5e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d0bbc5ea8a7151bbf5f389549612b01862d6f382)) + +- Spac_boxplot outputs in json and validated ha5d + ([`c87e782`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c87e782ff03c3ec52a2ae4c4353f5b426a6fc9d0)) + +- **boxplot**: Replace deprecated append call with concat + ([`4906439`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4906439dfdbec132ba675e4d13b9c48cd33d8c38)) + +- **boxplot_template**: Address minor comments from copilot + ([`cd5abd0`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cd5abd005d35678351867ae14020ca9d57317e02)) + +- **check_layer**: Use check_table spac function to evaluate if adata.layer is present + ([`0cf530b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0cf530bb603c0d74beeaa797df5f8ad222512921)) + +- **combine_annotations_template**: Address comments from copilot CR for + combine_annotations_template function + ([`9d8582a`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9d8582a88b1d61cd1912aa146153178d8287d82a)) + +- **histogram_performance**: Add clarifying comment for old hist implementation + 
([`179482e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/179482eb16bbfaa7566bfcc8aae0adb29c0d2429)) + +- **histogram_template**: Fix odd number of cells in test + ([`51ba1c4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/51ba1c4609e5d4435e612faf6b632bd8f1f76927)) + +- **interactive_spatial_plot_template**: Remove nidap comments + ([`64ff302`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/64ff3023155345b86ad9b72838bae0b53db21930)) + +- **nearest_neighbor_template**: Break the title in two lines + ([`4e083fb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4e083fbe77a1aaa1dac1d5b3d7841ed172721132)) + +- **normalize_batch_template**: Fix typo and unused import + ([`9edf8ce`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9edf8cee2972ecfd34244d7f3482a7ca5be94b2e)) + +- **performance_test**: Fix the speedup calculation logic + ([`c31c3ff`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c31c3ffec708bf2d2cf0a9ce487f54ccd04fe874)) + +- **posit_it_python_template**: Fixed typo + ([`e70f547`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e70f547aae17a011ce52162c0bf6fd42a74902ed)) + +- **quantile_scaling_template**: Fix typo in both function and unit tests + ([`de6ee91`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/de6ee910c5a36eb7fa2c6429f36695317ceebb03)) + +- **relational_heatmap_template**: Address the issue of insecure temporary file and comments from + copilot + ([`5662bdb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5662bdbf77980af891bd55e45c02a20ed7af7546)) + +- **ripley_template**: Address review comments - merge dev into the branch and fix unit test + ([`415df89`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/415df89b0d3d368fa126b6167d2fb68028a8b512)) + +- **ripley_template**: Address review comments - replace debug prints with logging + ([`9914716`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9914716db356d8ca217963a0d95c85bb37e2ff19)) + +- **sankey_plot_template**: Address the comments from copilot + 
([`07baeb9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/07baeb9037fd6bc3cefee2b71abb1b2d777223d9)) + +- **scripts**: Remove old performance testing script + ([`c0762c3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c0762c34c453e35bbe07e9a57180686cde1de7e8)) + +- **select_values_template**: Fix pandas/numpy version compatibility issue + ([`cedb6d1`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cedb6d163d1ffc7383961c0291a20a474f35843f)) + +- **setup_analysis_template**: Fix setup_analysis_template function + ([`dddc33a`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/dddc33a78a0d6af0262eefd4b9250c0cfc19b77e)) + +- **spatial_interaction_template**: Fix typo + ([`1a3d03d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1a3d03daa6e0630132c2596e149a74cdba0520a0)) + +- **spatial_plot_temp**: Addrss copilot comments spatial_plot_template.py + ([`028a049`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/028a049bf5857cdb51eff4a76a4940dc50100872)) + +- **subset_analysis_template**: Fix typo and enhance function + ([`aefcb29`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/aefcb294f3d3ab5059e76232c667b5c3a37963a1)) + +- **summarize_dataframe_template**: Address comments from copilot + ([`30118f9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/30118f9813cc8768401b2206ca1ba867a79175aa)) + +- **template_utils**: Address review comments + ([`e9f0883`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e9f088335c76c093aa75edc81b3bab33b53a30c8)) + +- **template_utils**: Address review comments again + ([`c94b219`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c94b219f30b551caeffa5074dda6173cf3a9ab8f)) + +- **template_utils**: Use applymap instead of map for pandas compatibility + ([`bbfa2f6`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bbfa2f6cc82aadaf67b904642ec3f0ff61b3a816)) + +- **test_arcsinh_normalization_template**: Handle odd numbers with better list slicing + 
([`afca7ff`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/afca7ff1eb153e3545a805a04be2df4acd283d15)) + +- **test_manual_phenotyping_temp**: Address comments of copilot review for unit tests + ([`e4d61cf`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e4d61cf3748e6eb9f3aca7cad9faf2b41b6ea652)) + +- **test_performance**: Set the path to include spac + ([`9c4d606`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9c4d606b9baf125dd5084125eb29e439b1930ab4)) + +- **tsne_analysis_template**: Fixed typo + ([`7ae8e57`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7ae8e5725326e68e9f2ce440be62089c06ad5f36)) + +- **umap_transformation_template**: Return adata in place and fix comments of copilot + ([`9d24638`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9d24638e5a235f3e128e72fc1e11f28f52bb1822)) + +- **umap_tsne_pca_template**: Address the comments from copilot + ([`7fb9b2c`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/7fb9b2cd291e3aa3651e5b3d01240772330609ff)) + +- **visualize_nearest_neighbor_template**: Fix typo + ([`52a4ee6`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/52a4ee6ef66f9ffaed59391bbe6d0fd4f003a816)) + +- **visualize_ripley**: Add missing __init__.py for templates module + ([`ff9238c`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ff9238c126ab5e524608b24c28a47bebc3ed487d)) + +- **visualize_ripley**: Make plt.show() conditional based on show_plot parameter + ([`0e2747e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0e2747e8d05547c2e2dca4ad9e2b8ec730e24260)) + +### Code Style + +- **qc-metrics**: Fix spelling typo in nFeature metric + ([`59675ca`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/59675cad6540787be3a8a8a300dcc6c7e398dec9)) + +### Features + +- Add refactored galaxy tools + ([`4d2e3d7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4d2e3d722472e4a1808e936bd6267cc59e709a55)) + +- Add spac arcsinh_norm interactive_spatial_plot galaxy tools + 
([`67e3ec4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/67e3ec45822572f147f785999dfa0c4121d01635)) + +- Add SPAC boxplot Galaxy tool for Docker deployment + ([`9e3bea0`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9e3bea0de6400b3a1031f831f8ced81f410b9007)) + +- Add spac_load_csv_files galaxy tools + ([`d2526a7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d2526a79965acb886c0100a04385f5325ec1923b)) + +- Add spac_setup_analysis galaxy tools + ([`bb6834b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bb6834bc2d292604f7499e09e9245384a3b0f694)) + +- Add spac_zscore_normalization galaxy tools + ([`cbbcd9e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/cbbcd9e47930cf9b258a8668af6ec81238ad09d9)) + +- Refactor all templates and unit tests + ([`8005111`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/800511118ab5970754b4e09bf7017b423328da92)) + +- Refactored all template run_from_json() functions to use centralized save_results from + template_utils - Added show_static_image toggle (default False) to relational_heatmap_template and + sankey_plot_template to prevent Plotly-to-PNG hang on Galaxy - Refactored all unit tests in + tests/templates/ using snowball approach: real data, real filesystem, no mocking - One test file + per template validating output file existence, naming conventions, and non-empty artifacts - + Updated posit_it_python_template to use centralized save_results + +Templates changed: 43 files in src/spac/templates/ Tests changed: 37 files in tests/templates/ + +- **add_pin_color_rule_template**: Add add_pin_color_rule_template fnction and unit tests + ([`2477266`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/2477266b7de03e603b1dc9d48b9a53bed0af61ad)) + +- **analysis_to_csv_template**: Add analysis_to_csv_template function and unit tests + ([`448a980`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/448a980c496dfa5295a33e4432649947da6e6af7)) + +- **append_annotation_template**: Add append_annotation_template function 
and unit tests + ([`5e68e02`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5e68e02faaa423f381d3d7321c1378f51e7f3c7f)) + +- **arcsinh_normalization_template**: Add arcsinh_normalization_template function and unit tests + ([`ff6cce4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ff6cce42b602642a4bd2f211d1dbd1fe6c6fd65e)) + +- **binary_to_categorical_annotation_template**: Add binary_to_categorical_annotation_template + function and unit tests + ([`8e500ec`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/8e500ecfe264b6125b6ace828003684e4b1b5cad)) +- **boxplot_template**: Add boxplot_template function and unit tests + ([`eb810ab`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/eb810ab4a532901817a5652ad59733af38b083fd)) -## v0.8.11 (2025-05-23) +- **calculate_centroid_template**: Add calculate_centroid_template function and unit tests + ([`4fea9c3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/4fea9c336dabc207d6fa66de8553d3fe89c9dd62)) + +- **combine_annotations_template**: Add combine_annotations_template function and unit tests + ([`829a4bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/829a4bdbc95575079d9e24492f0e6bc2ea57475e)) + +- **combine_dataframes_template**: Add combine_dataframes_template function and unit tests + ([`3e24237`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3e24237c5a4e3f57e09c0396532200dcdf471990)) + +- **downsample_cells_template**: Add downsample_cells_template function and unit tests + ([`47adf3e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/47adf3e881f28d05b4627b1ec3d0954b403f634e)) + +- **hierarchical_heatmap_template**: Add hierarchical_heatmap_template and unit tests + ([`67e5a80`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/67e5a802500459e01a918c7fe96d20ab45c373c2)) + +- **hierarchical_heatmap_template**: Add hierarchical_heatmap_template function and unit tests + ([`6466e2f`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6466e2f8daa4c95c49e0b6d0b0e5ec1460064d09)) + +- 
**histogram_template**: Add histogram_template and unit tests + ([`3380427`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3380427ecbd6d743aaedfb954d612905153079e4)) + +- **interactive_spatial_plot_template**: Add interactive_spatial_plot_template function and unit + tests + ([`3f4336b`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3f4336bc75f04ef1f4f8ca740a8b9b678a288da2)) + +- **load_csv**: Add load_csv template function with configuration support + ([`5456658`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5456658e8d7b0ab0475ead65bbe982ccefbacf47)) + +- Add load_csv_files() to template_utils.py for loading and combining CSV files - Add + spell_out_special_characters() to handle biological marker names - Add + load_csv_files_with_config.py template wrapper for NIDAP compatibility - Add comprehensive unit + tests for both functions - Support column name cleaning, metadata mapping, and string column + enforcement + +- **manual_phenotyping_template**: Add manual_phenotyping_template function and unit tests + ([`941d641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/941d641352c6eaee1c293b5ef98f1a9d92646c0c)) + +- **nearest_neighbor_calculation_template**: Add nearest_neighbor_calculation_template function and + unit tests + ([`19cd477`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/19cd477fbb3af575ae2b90615b34e801fb4dc66c)) + +- **neighborhood_profile_template**: Add neighborhood_profile_template function and unit tests + ([`824d131`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/824d131fed2216a5a291f61fc53d53e5e2c98c11)) + +- **normalize_batch_template**: Add normalize_batch_template functionand unit tests + ([`a71e865`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a71e8656033f79736330a1f902200a6c81c4b37e)) + +- **phenograph_clustering_template**: Add phenograph_clustering_template function and unit tests + ([`ca29330`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/ca2933068c63c0a877970a496d4d9e2afe34d447)) + +- 
**posit_it_python_template**: Add posit_it_python_template functionand unit tests + ([`bbb53f7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/bbb53f712285a19f738aa117205e907c1aa0404d)) + +- **qc-metrics**: Add common single cell quality control metrics + ([`994bac4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/994bac4896650ecaeee7c81e2084872509a4b815)) + +- **qc_summary_statistics**: Add summary statistics table for sc/spatial transcriptomics quality + control metrics + ([`a228e5e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a228e5eb33777d03bec96af826568545e44157fd)) + +- **quantile_scaling_template**: Refactor nidap code, add quantile_scaling_template function and + unit tests + ([`542f985`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/542f985f4d1a2811f010b68da8dadffe8ac64220)) + +- **relational_heatmap_template**: Add relational_heatmap_template function and unit tests + ([`c57075d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c57075d87995b7f35b0c9065a354363f7cceb623)) + +- **rename_labels_template**: Add rename_labels_template function and unit tests + ([`96446d7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/96446d70e392aa00e1ba77b2abddaa040d6aea7a)) + +- **ripley_l_template**: Add ripley_l_template and unit tests + ([`c889259`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/c88925913faf4c99c906e9b55a073759395faa78)) + +- **sankey_plot_template**: Add sankey_plot_template function and unit tests + ([`34b4eee`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/34b4eee45ae98fd0ce7c5f184611f4993125949f)) + +- **select_values_template**: Add select_values_template function and unit tests + ([`e59c994`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e59c994352f09ee1398f9cea6cc02041daa0cd03)) + +- **setup_analysis_template**: Add setup_analysis_template function and unit tests + ([`1cfb39e`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1cfb39ed56e30e805bf7bcf29399e8ce0f20333c)) + +- **spatial_interaction_template**: Add 
spatial_interaction_template and unit tests + ([`a7b1349`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/a7b13494e86b14555aba6a6fbcc0c7c12c1441f1)) + +- **spatial_plot_temp**: Add spatial_plot_template.py and unit tests + ([`0f26c08`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/0f26c08744f5611f0aa92a9a60b0ad20815e8d6e)) + +- **subset_analysis_template**: Add subset_analysis_template function and unit tests + ([`1db00a8`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/1db00a88dfdfbbc68862b3aac99aea5332e2255c)) + +- **summarize_annotation_statistics**: Add summarize_annotation_statistics template function and + unit tests + ([`34961bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/34961bde03bfc47211047ec58cbacc0814712e3c)) + +- **summarize_dataframe_template**: Add summarize_dataframe_template function and unit tests + ([`06d8feb`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/06d8feb4d72d132efb6c4db04f6254a8bd69ca04)) + +- **template_utils**: Add string_list_to_dictionary to template utils + ([`6ab7a9d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6ab7a9d91111c6fd1aa4a771bf85f8d07bd01b28)) + +- **template_utils**: Add template_utils and unit tests + ([`b960684`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b960684f3e1887330f91ad307cc36b467d68bea3)) + +- **test_performance**: Add performance tests for boxplot/histogram + ([`862e523`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/862e523d08f40bb1e0aee53437fa06bdb533ac45)) + +- **tsne_analysis_template**: Add tsne_analysis_template function and unit tests + ([`abda610`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/abda61091b2a94251eccbe2535efc300d79a7e73)) + +- **umap_transformation_template**: Add umap_transformation_template function and unit tests + ([`e79fd78`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e79fd7814a8728c9f9a529e90256ac44726d1571)) + +- **umap_tsne_pca_template**: Add umap_tsne_pca_template function and unit tests + 
([`d67f6c7`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d67f6c7278e7c8622ca43832acf8b74ae4a4363e)) + +- **utag_clustering_template**: Add utag_clustering_template and unit tests + ([`6da3985`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/6da39852b1d2ee8b3a1b4fda5c040055d6ae3cd4)) + +- **utag_clustering_template**: Add utag_clustering_template and unit tests + ([`743fb10`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/743fb10f71dc015b8892a4b291d3a2a9069e89a2)) + +- **visualize_nearest_neighbor_template**: Add visualize_nearest_neighbor_template function and unit + tests + ([`07ecdfa`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/07ecdfa15c3b5c4c7f65288811306bf68cef4962)) + +- **visualize_ripley_template**: Add visualize_ripley_template and unit tests + ([`48608e2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/48608e26ca16c1e89a573286553b370f2a3f508b)) + +- **zscore_normalization_template**: Add zscore_normalization_template and unit tests + ([`b2d68c5`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/b2d68c5fb6cd1dfba38684d855f38aea98d56296)) + +### Refactoring + +- Merge paper.bib and paper.md updates from address-reviewer-comments branch + ([`9ae3ef3`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/9ae3ef331290197d3fcc305fb88f9b2baac3bccc)) + +- Streamline galaxy tools implementation + ([`009d010`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/009d01000c6a80dad9f6dc4a508b334a19796b3b)) + +- **get_qc_summary_table**: Adjust code style to adhere to spac guidlines closer + ([`5e03dc2`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/5e03dc23b5deaadb853fd26f327f643d7e19ad12)) + +- **get_qc_summary_table**: Refactor quality control summary statistics function and tests based on + the PR review + ([`d5061c4`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/d5061c43d31c20338576c58d207619f9ae789143)) + +### Testing + +- **perforamnce**: Skip performance tests by default + 
([`fc664ad`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/fc664ad96b3375a3255f9bb36e51d0e4a505daba)) + + +## v0.9.0 (2025-05-23) ### Bug Fixes @@ -82,6 +408,9 @@ ### Continuous Integration +- **version**: Automatic development release + ([`3e126e9`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/3e126e9711be5d485010ced7460f99a180c8089e)) + - **version**: Automatic development release ([`195761d`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/195761de5563e80a60a7ea43ecb73e6105dc7d1d)) @@ -173,6 +502,11 @@ - **interactive_spatial_plot**: Used partial for better readability ([`60283bd`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/60283bd7671d2f2a65b52d77f4792b7461a8e407)) +### Step + +- Bumping minor version + ([`e333641`](https://github.com/FNLCR-DMAP/SCSAWorkflow/commit/e3336417a09b4ef26e71bde1b54da840f0980ab9)) + ### Testing - **comments**: Add extensive comments for complex data set generation in utag tests diff --git a/galaxy_tools/README.md b/galaxy_tools/README.md new file mode 100644 index 00000000..c615436a --- /dev/null +++ b/galaxy_tools/README.md @@ -0,0 +1,12 @@ +# SPAC Galaxy Tools + + ## Requirements + - Galaxy instance with Docker enabled + - Docker image: nciccbr/spac:v1 + + ## Installation + 1. Pull Docker image: `docker pull nciccbr/spac:v1` + 2. Copy tool directory to Galaxy's tools folder + 3. 
Add to tool_conf.xml: +```xml + \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py b/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py new file mode 100644 index 00000000..d695dc7d --- /dev/null +++ b/galaxy_tools/refactor_tools/nidap_to_galaxy_synthesizer.py @@ -0,0 +1,569 @@ +#!/usr/bin/env python3 +""" +Generalized NIDAP to Galaxy synthesizer - Production Version v11 +- No hardcoded tool-specific logic +- Blueprint-driven for all tools +- Handles multiple files/columns via blueprint flags +- FIXED: Use 'binary' instead of 'pickle' for Galaxy compatibility +- FIXED: Use 'set -eu' instead of 'set -euo pipefail' for broader shell compatibility +- FIXED: Pass outputs spec as environment variable to avoid encoding issues +- FIXED: Method signature for build_command_section +""" + +import argparse +import json +import re +import shutil +from pathlib import Path +from typing import Dict, List, Tuple + +class GeneralizedNIDAPToGalaxySynthesizer: + + def __init__(self, docker_image: str = "nciccbr/spac:v1"): + self.docker_image = docker_image + self.galaxy_profile = "24.2" + self.wrapper_script = Path('run_spac_template.sh') + self.runner_script = Path('spac_galaxy_runner.py') + + def slugify(self, name: str) -> str: + """Convert name to valid Galaxy tool ID component""" + s = re.sub(r'\[.*?\]', '', name).strip() + s = s.lower() + s = re.sub(r'\s+', '_', s) + s = re.sub(r'[^a-z0-9_]+', '', s) + s = re.sub(r'_+', '_', s) + return s.strip('_') + + def escape_xml(self, text: str, is_attribute: bool = True) -> str: + """Escape XML special characters""" + if text is None: + return "" + text = str(text) + text = text.replace('&', '&amp;') + text = text.replace('<', '&lt;') + text = text.replace('>', '&gt;') + if is_attribute: + text = text.replace('"', '&quot;') + text = text.replace("'", '&apos;') + return text + + def clean_description(self, description: str) -> str: + """Clean NIDAP-specific content from descriptions""" + if not
description: + return "" + + desc = str(description).replace('\r\n', '\n').replace('\r', '\n') + desc = re.sub(r'\[DUET\s*Documentation\]\([^)]+\)', '', desc, flags=re.IGNORECASE) + desc = re.sub(r'Please refer to\s+(?:,?\s*and\s*)+', '', desc, flags=re.IGNORECASE) + desc = re.sub(r'\\(?=\s*(?:\n|$))', '', desc) + desc = re.sub(r'[ \t]{2,}', ' ', desc) + desc = re.sub(r'\n{3,}', '\n\n', desc) + + return desc.strip() + + def determine_input_format(self, dataset: Dict, tool_name: str) -> str: + """ + Determine the correct format for an input dataset. + Simple mapping based on dataType field. + Uses 'binary' instead of 'pickle' for Galaxy compatibility. + """ + data_type = dataset.get('dataType', '').upper() + + # Handle comma-separated types (e.g., "CSV, Tabular") + data_types = [dt.strip() for dt in data_type.split(',')] + + # Check for CSV/Tabular types + if any(dt in ['CSV', 'TABULAR', 'TSV', 'TXT'] for dt in data_types): + return 'csv,tabular,tsv,txt' + + # DataFrame types + if any('DATAFRAME' in dt for dt in data_types): + return 'csv,tabular,tsv,txt' + + # AnnData/H5AD types + if any(dt in ['ANNDATA', 'H5AD', 'HDF5'] for dt in data_types): + return 'h5ad,h5,hdf5' + + # Pickle - use 'binary' for Galaxy compatibility + if any('PICKLE' in dt for dt in data_types): + return 'binary' + + # PYTHON_TRANSFORM_INPUT - default to binary (analysis objects) + if 'PYTHON_TRANSFORM_INPUT' in data_type: + return 'h5ad,binary' # Use binary instead of pickle + + # Default fallback + return 'h5ad,binary' # Use binary instead of pickle + + def build_inputs_section(self, blueprint: Dict, tool_name: str) -> Tuple[List[str], List[str]]: + """Build inputs from blueprint - generalized for all tools""" + lines = [] + multiple_file_inputs = [] # Track which inputs accept multiple files + + # Handle input datasets + for dataset in blueprint.get('inputDatasets', []): + name = dataset.get('key', 'input_data') + label = self.escape_xml(dataset.get('displayName', 'Input Data')) + desc = 
self.escape_xml(self.clean_description(dataset.get('description', ''))) + + # Determine format - now simpler with direct dataType mapping + formats = self.determine_input_format(dataset, tool_name) + + # Check if multiple files allowed (from blueprint) + is_multiple = dataset.get('isMultiple', False) + + if is_multiple: + multiple_file_inputs.append(name) + lines.append( + f' ' + ) + else: + lines.append( + f' ' + ) + + # Handle explicit column definitions from 'columns' schema + for col in blueprint.get('columns', []): + key = col.get('key') + if not key: + continue + + label = self.escape_xml(col.get('displayName', key)) + desc = self.escape_xml(col.get('description', '')) + # isMulti can be True, False, or None (None means False) + is_multi = col.get('isMulti') == True + + # Use text inputs for column names + if is_multi: + lines.append( + f' ' + ) + else: + lines.append( + f' ' + ) + + # Handle regular parameters + for param in blueprint.get('parameters', []): + key = param.get('key') + if not key: + continue + + label = self.escape_xml(param.get('displayName', key)) + desc = self.escape_xml(self.clean_description(param.get('description', ''))) + param_type = param.get('paramType', 'STRING').upper() + default = param.get('defaultValue', '') + is_optional = param.get('isOptional', False) + + # Add optional attribute if needed + optional_attr = ' optional="true"' if is_optional else '' + + if param_type == 'BOOLEAN': + checked = 'true' if str(default).strip().lower() == 'true' else 'false' + lines.append( + f' ' + ) + + elif param_type == 'INTEGER': + lines.append( + f' ' + ) + + elif param_type in ['NUMBER', 'FLOAT']: + lines.append( + f' ' + ) + + elif param_type == 'SELECT': + options = param.get('paramValues', []) + lines.append(f' ') + for opt in options: + selected = ' selected="true"' if str(opt) == str(default) else '' + opt_escaped = self.escape_xml(str(opt)) + lines.append(f' ') + lines.append(' ') + + elif param_type == 'LIST': + # Handle LIST type 
parameters - convert list to simple string + if isinstance(default, list): + # Filter out empty strings and join + filtered = [str(x) for x in default if x and str(x).strip()] + default = ', '.join(filtered) if filtered else '' + elif default == '[""]' or default == "['']" or default == '[]': + # Handle common empty list representations + default = '' + lines.append( + f' ' + ) + + else: # STRING + lines.append( + f' ' + ) + + return lines, multiple_file_inputs + + def build_outputs_section(self, outputs: Dict) -> List[str]: + """Build outputs section based on blueprint specification""" + lines = [] + + for output_type, output_path in outputs.items(): + + # Determine if single file or collection + is_collection = (output_path.endswith('_folder') or + output_path.endswith('_dir')) + + if not is_collection: + # Single file output + if output_type == 'analysis': + if '.h5ad' in output_path: + fmt = 'h5ad' + elif '.pickle' in output_path or '.pkl' in output_path: + fmt = 'binary' # Use binary instead of pickle + else: + fmt = 'binary' + + lines.append( + f' ' + ) + + elif output_type == 'DataFrames' and (output_path.endswith('.csv') or output_path.endswith('.tsv')): + # Single DataFrame file output + fmt = 'csv' if output_path.endswith('.csv') else 'tabular' + lines.append( + f' ' + ) + + elif output_type == 'figure': + ext = output_path.split('.')[-1] if '.' 
in output_path else 'png' + lines.append( + f' ' + ) + + elif output_type == 'html': + lines.append( + f' ' + ) + + else: + # Collection outputs + if output_type == 'DataFrames': + lines.append( + ' ' + ) + lines.append(f' ') + lines.append(f' ') + lines.append(' ') + + elif output_type == 'figures': + lines.append( + ' ' + ) + lines.append(f' ') + lines.append(f' ') + lines.append(f' ') + lines.append(' ') + + elif output_type == 'html': + lines.append( + ' ' + ) + lines.append(f' ') + lines.append(' ') + + # Debug outputs + lines.append(' ') + lines.append(' ') + lines.append(' ') + + return lines + + def build_command_section(self, tool_name: str, blueprint: Dict, multiple_file_inputs: List[str], outputs_spec: Dict) -> str: + """Build command section - generalized for all tools + FIXED: Use 'set -eu' instead of 'set -euo pipefail' for broader shell compatibility + FIXED: Pass outputs spec as environment variable to avoid encoding issues + """ + + # Convert outputs spec to JSON string + outputs_json = json.dumps(outputs_spec) + + # Check if any inputs accept multiple files + has_multiple_files = len(multiple_file_inputs) > 0 + + if has_multiple_files: + # Generate file copying logic for each multiple input + copy_sections = [] + for input_name in multiple_file_inputs: + # Use double curly braces to escape them in f-strings + copy_sections.append(f''' + ## Create directory for {input_name} + mkdir -p {input_name}_dir && + + ## Copy files to directory with original names + #for $i, $file in enumerate(${input_name}) + cp '${{file}}' '{input_name}_dir/${{file.name}}' && + #end for''') + + copy_logic = ''.join(copy_sections) + + command_section = f''' &2 && + cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "{tool_name}" + ]]>''' + else: + # Standard command for single-file inputs + command_section = f''' &2 && 
+ cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "{tool_name}" + ]]>''' + + return command_section + + def get_template_filename(self, title: str, tool_name: str) -> str: + """Get the correct template filename""" + # Check if there's a custom mapping in the blueprint + # Otherwise use standard naming convention + if title == 'Load CSV Files' or tool_name == 'load_csv_files': + return 'load_csv_files_with_config.py' + else: + return f'{tool_name}_template.py' + + def generate_tool(self, json_path: Path, output_dir: Path) -> Dict: + """Generate Galaxy tool from NIDAP JSON blueprint""" + + with open(json_path, 'r') as f: + blueprint = json.load(f) + + title = blueprint.get('title', 'Unknown Tool') + clean_title = re.sub(r'\[.*?\]', '', title).strip() + + tool_name = self.slugify(clean_title) + tool_id = f'spac_{tool_name}' + + # Get outputs from blueprint + outputs_spec = blueprint.get('outputs', {}) + if not outputs_spec: + outputs_spec = {'analysis': 'transform_output.pickle'} + + # Get template filename (could be in blueprint too) + template_filename = blueprint.get('templateFilename', + self.get_template_filename(clean_title, tool_name)) + + # Build sections - pass tool_name and outputs_spec for context + inputs_lines, multiple_file_inputs = self.build_inputs_section(blueprint, tool_name) + outputs_lines = self.build_outputs_section(outputs_spec) + command_section = self.build_command_section(tool_name, blueprint, multiple_file_inputs, outputs_spec) + + # Generate description + full_desc = self.clean_description(blueprint.get('description', '')) + short_desc = full_desc.split('\n')[0] if full_desc else '' + if len(short_desc) > 100: + short_desc = short_desc[:97] + '...' 
+ + # Build help section + help_sections = [] + help_sections.append(f'**{title}**\n') + help_sections.append(f'{full_desc}\n') + help_sections.append('This tool is part of the SPAC (SPAtial single-Cell analysis) toolkit.\n') + + # Add usage notes based on input types + if blueprint.get('columns'): + help_sections.append('**Column Parameters:** Enter column names as text. Use comma-separation or one per line for multiple columns.') + + if any(p.get('paramType') == 'LIST' for p in blueprint.get('parameters', [])): + help_sections.append('**List Parameters:** Use comma-separated values or one per line.') + help_sections.append('**Special Values:** Enter "All" to select all items.') + + if multiple_file_inputs: + help_sections.append(f'**Multiple File Inputs:** This tool accepts multiple files for: {", ".join(multiple_file_inputs)}') + + help_text = '\n'.join(help_sections) + + # Generate complete XML + xml_content = f''' + {self.escape_xml(short_desc, False)} + + + {self.docker_image} + + + + python3 + + +{command_section} + + + + + + +{chr(10).join(inputs_lines)} + + + +{chr(10).join(outputs_lines)} + + + + + + +@misc{{spac_toolkit, + author = {{FNLCR DMAP Team}}, + title = {{SPAC: SPAtial single-Cell analysis}}, + year = {{2024}}, + url = {{https://github.com/FNLCR-DMAP/SCSAWorkflow}} +}} + + +''' + + # Write files + tool_dir = output_dir / tool_id + tool_dir.mkdir(parents=True, exist_ok=True) + + xml_path = tool_dir / f'{tool_id}.xml' + with open(xml_path, 'w') as f: + f.write(xml_content) + + # Copy wrapper script + if self.wrapper_script.exists(): + shutil.copy2(self.wrapper_script, tool_dir / 'run_spac_template.sh') + + # Copy runner script + if self.runner_script.exists(): + shutil.copy2(self.runner_script, tool_dir / 'spac_galaxy_runner.py') + else: + print(f" Warning: spac_galaxy_runner.py not found in current directory") + + return { + 'tool_id': tool_id, + 'tool_name': title, + 'xml_path': xml_path, + 'tool_dir': tool_dir, + 'template': template_filename, 
+ 'outputs': outputs_spec + } + +def main(): + parser = argparse.ArgumentParser( + description='Convert NIDAP templates to Galaxy tools - Generalized Version' + ) + parser.add_argument('json_input', help='JSON file or directory') + parser.add_argument('-o', '--output-dir', default='galaxy_tools') + parser.add_argument('--docker-image', default='nciccbr/spac:v1') + + args = parser.parse_args() + + synthesizer = GeneralizedNIDAPToGalaxySynthesizer( + docker_image=args.docker_image + ) + + json_input = Path(args.json_input) + if json_input.is_file(): + json_files = [json_input] + elif json_input.is_dir(): + json_files = sorted(json_input.glob('*.json')) + else: + print(f"Error: {json_input} not found") + return 1 + + print(f"Processing {len(json_files)} files") + print(f"Docker image: {args.docker_image}") + + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + + successful = [] + failed = [] + + for json_file in json_files: + print(f"\nProcessing: {json_file.name}") + try: + result = synthesizer.generate_tool(json_file, output_dir) + successful.append(result) + print(f" ✔ Created: {result['tool_id']}") + print(f" Template: {result['template']}") + print(f" Outputs: {list(result['outputs'].keys())}") + except Exception as e: + failed.append(json_file.name) + print(f" ✗ Failed: {e}") + import traceback + traceback.print_exc() + + print(f"\n{'='*60}") + print(f"Summary: {len(successful)} successful, {len(failed)} failed") + + if successful: + snippet_path = output_dir / 'tool_conf_snippet.xml' + with open(snippet_path, 'w') as f: + f.write('
\n') + for result in sorted(successful, key=lambda x: x['tool_id']): + tool_id = result['tool_id'] + f.write(f' \n') + f.write('
\n') + + print(f"\nGenerated tool configuration snippet: {snippet_path}") + + return 0 if not failed else 1 + +if __name__ == '__main__': + exit(main()) \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/run_spac_template.sh b/galaxy_tools/refactor_tools/run_spac_template.sh new file mode 100644 index 00000000..3f2a7a3e --- /dev/null +++ b/galaxy_tools/refactor_tools/run_spac_template.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash +# run_spac_template.sh - Universal wrapper for SPAC Galaxy tools +set -eu + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_NAME="${2:?Missing template name}" + +# Get the directory where this script is located (the tool directory) +SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" + +# Look for spac_galaxy_runner.py in multiple locations +if [ -f "$SCRIPT_DIR/spac_galaxy_runner.py" ]; then + # If it's in the same directory as this script + RUNNER_PATH="$SCRIPT_DIR/spac_galaxy_runner.py" +elif [ -f "$__tool_directory__/spac_galaxy_runner.py" ]; then + # If Galaxy provides tool directory + RUNNER_PATH="$__tool_directory__/spac_galaxy_runner.py" +else + # Fallback to trying the module approach + echo "Warning: spac_galaxy_runner.py not found locally, trying as module" >&2 + python3 -m spac_galaxy_runner "$PARAMS_JSON" "$TEMPLATE_NAME" + exit $? +fi + +# Run the runner script directly +echo "Running: python3 $RUNNER_PATH $PARAMS_JSON $TEMPLATE_NAME" >&2 +python3 "$RUNNER_PATH" "$PARAMS_JSON" "$TEMPLATE_NAME" \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml b/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml new file mode 100644 index 00000000..69a183b7 --- /dev/null +++ b/galaxy_tools/refactor_tools/spac_arcsinh_normalization.xml @@ -0,0 +1,69 @@ + + Normalize features either by a user-defined co-factor or a determined percentile, allowing for ef... 
+ + + nciccbr/spac:v1 + + + + python3 + + + &2 && + cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "arcsinh_normalization" + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + +@misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} +} + + + \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/spac_boxplot.xml b/galaxy_tools/refactor_tools/spac_boxplot.xml new file mode 100644 index 00000000..97c9ef88 --- /dev/null +++ b/galaxy_tools/refactor_tools/spac_boxplot.xml @@ -0,0 +1,85 @@ + + Create a boxplot visualization of the features in the analysis dataset. + + + nciccbr/spac:v1 + + + + python3 + + + &2 && + cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "boxplot" + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} +} + + + \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/spac_galaxy_runner.py b/galaxy_tools/refactor_tools/spac_galaxy_runner.py new file mode 100644 index 00000000..d8535936 --- /dev/null +++ b/galaxy_tools/refactor_tools/spac_galaxy_runner.py @@ -0,0 +1,515 @@ +#!/usr/bin/env python3 +""" +spac_galaxy_runner.py - Hybrid version combining refactored structure with robust parameter handling +Incorporates critical fixes from original wrapper for parameter processing +""" + +import json +import os +import sys +import subprocess +import shutil +from pathlib import Path +import re + +def 
main(): + """Main entry point for SPAC Galaxy runner""" + if len(sys.argv) != 3: + print("Usage: spac_galaxy_runner.py ") + sys.exit(1) + + params_path = sys.argv[1] + template_name = sys.argv[2] + + print(f"=== SPAC Galaxy Runner v2.0 (Hybrid) ===") + print(f"Template: {template_name}") + print(f"Parameters: {params_path}") + + # Load parameters + with open(params_path) as f: + params = json.load(f) + + # Extract outputs specification from environment variable + outputs_spec_env = os.environ.get('GALAXY_OUTPUTS_SPEC', '') + if outputs_spec_env: + try: + outputs = json.loads(outputs_spec_env) + except json.JSONDecodeError: + print(f"WARNING: Could not parse GALAXY_OUTPUTS_SPEC: {outputs_spec_env}") + outputs = determine_default_outputs(template_name) + else: + # Fallback: try to get from params + outputs = params.pop('outputs', {}) + if isinstance(outputs, str): + try: + outputs = json.loads(unsanitize_galaxy_params(outputs)) + except json.JSONDecodeError: + print(f"WARNING: Could not parse outputs: {outputs}") + outputs = determine_default_outputs(template_name) + + print(f"Outputs specification: {outputs}") + + # CRITICAL: Unsanitize and normalize parameters (from original) + params = process_galaxy_parameters(params, template_name) + + # Handle multiple file inputs that were copied to directories by Galaxy + handle_multiple_file_inputs(params) + + # Create output directories + create_output_directories(outputs) + + # Add output paths to params - critical for templates that save results + params['save_results'] = True + + if 'analysis' in outputs: + params['output_path'] = outputs['analysis'] + params['Output_Path'] = outputs['analysis'] + params['Output_File'] = outputs['analysis'] + + if 'DataFrames' in outputs: + df_path = outputs['DataFrames'] + # Check if it's a single file or a directory + if df_path.endswith('.csv') or df_path.endswith('.tsv'): + # Single file output (like Load CSV Files) + params['output_file'] = df_path + params['Output_File'] = df_path 
+ print(f" Set output_file to: {df_path}") + else: + # Directory for multiple files (like boxplot) + params['output_dir'] = df_path + params['Export_Dir'] = df_path + params['Output_File'] = os.path.join(df_path, f'{template_name}_output.csv') + print(f" Set output_dir to: {df_path}") + + if 'figures' in outputs: + fig_dir = outputs['figures'] + params['figure_dir'] = fig_dir + params['Figure_Dir'] = fig_dir + params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png') + print(f" Set figure_dir to: {fig_dir}") + + if 'html' in outputs: + html_dir = outputs['html'] + params['html_dir'] = html_dir + params['Output_File'] = os.path.join(html_dir, f'{template_name}.html') + print(f" Set html_dir to: {html_dir}") + + # Save config for debugging (without outputs key) + with open('config_used.json', 'w') as f: + config_data = {k: v for k, v in params.items() if k not in ['outputs']} + json.dump(config_data, f, indent=2) + + # Save params for template execution + with open('params_exec.json', 'w') as f: + json.dump(params, f, indent=2) + + # Find and execute template + template_path = find_template(template_name) + if not template_path: + print(f"ERROR: Template for {template_name} not found") + sys.exit(1) + + # Run template + exit_code = execute_template(template_path, 'params_exec.json') + if exit_code != 0: + print(f"ERROR: Template failed with exit code {exit_code}") + sys.exit(exit_code) + + # Handle output mapping for specific tools + handle_output_mapping(template_name, outputs) + + # Verify outputs + verify_outputs(outputs) + + # Save snapshot for debugging + with open('params_snapshot.json', 'w') as f: + json.dump(params, f, indent=2) + + print("=== Execution Complete ===") + sys.exit(0) + +def unsanitize_galaxy_params(s: str) -> str: + """Remove Galaxy's parameter sanitization tokens""" + if not isinstance(s, str): + return s + replacements = { + '__ob__': '[', '__cb__': ']', + '__oc__': '{', '__cc__': '}', + '__dq__': '"', '__sq__': "'", + '__gt__': 
'>', '__lt__': '<', + '__cn__': '\n', '__cr__': '\r', + '__tc__': '\t', '__pd__': '#', + '__at__': '@', '__cm__': ',' + } + for token, char in replacements.items(): + s = s.replace(token, char) + return s + +def process_galaxy_parameters(params: dict, template_name: str) -> dict: + """Process Galaxy parameters - unsanitize and normalize (from original wrapper)""" + print("\n=== Processing Galaxy Parameters ===") + + # Step 1: Recursively unsanitize all parameters + def recursive_unsanitize(obj): + if isinstance(obj, str): + unsanitized = unsanitize_galaxy_params(obj).strip() + # Try to parse JSON strings + if (unsanitized.startswith('[') and unsanitized.endswith(']')) or \ + (unsanitized.startswith('{') and unsanitized.endswith('}')): + try: + return json.loads(unsanitized) + except json.JSONDecodeError: + return unsanitized + return unsanitized + elif isinstance(obj, dict): + return {k: recursive_unsanitize(v) for k, v in obj.items()} + elif isinstance(obj, list): + return [recursive_unsanitize(item) for item in obj] + return obj + + params = recursive_unsanitize(params) + + # Step 2: Handle specific parameter normalizations + + # Special handling for String_Columns in load_csv templates + if 'load_csv' in template_name and 'String_Columns' in params: + value = params['String_Columns'] + if not isinstance(value, list): + if value in [None, "", "[]", "__ob____cb__", []]: + params['String_Columns'] = [] + elif isinstance(value, str): + s = value.strip() + if s and s != '[]': + if ',' in s: + params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()] + else: + params['String_Columns'] = [s] if s else [] + else: + params['String_Columns'] = [] + else: + params['String_Columns'] = [] + print(f" Normalized String_Columns: {params['String_Columns']}") + + # Handle Feature_Regex specially: clear empty values and join list patterns with '|' + if 'Feature_Regex' in params: + value = params['Feature_Regex'] + if value in [[], [""], 
"__ob____cb__", "[]", "", None]: 
params['Feature_Regex'] = [] + print(" Cleared empty Feature_Regex parameter") + elif isinstance(value, list) and value: + # Join regex patterns with | + params['Feature_Regex'] = "|".join(str(v) for v in value if v) + print(f" Joined Feature_Regex list: {params['Feature_Regex']}") + + # Handle Features_to_Analyze - split if it's a single string with spaces or commas + if 'Features_to_Analyze' in params: + value = params['Features_to_Analyze'] + if isinstance(value, str): + # Check for comma-separated or space-separated features + if ',' in value: + params['Features_to_Analyze'] = [item.strip() for item in value.split(',') if item.strip()] + print(f" Split Features_to_Analyze on comma: {value} -> {params['Features_to_Analyze']}") + elif ' ' in value: + # This is likely multiple features in a single string + params['Features_to_Analyze'] = [item.strip() for item in value.split() if item.strip()] + print(f" Split Features_to_Analyze on space: {value} -> {params['Features_to_Analyze']}") + elif value: + params['Features_to_Analyze'] = [value] + print(f" Wrapped Features_to_Analyze in list: {params['Features_to_Analyze']}") + + # Handle Feature_s_to_Plot for boxplot + if 'Feature_s_to_Plot' in params: + value = params['Feature_s_to_Plot'] + # Check if it's "All" + if value == "All" or value == ["All"]: + params['Feature_s_to_Plot'] = ["All"] + print(" Set Feature_s_to_Plot to ['All']") + elif isinstance(value, str) and value not in ["", "[]"]: + params['Feature_s_to_Plot'] = [value] + print(f" Wrapped Feature_s_to_Plot in list: {params['Feature_s_to_Plot']}") + + # Normalize list parameters + list_params = ['Annotation_s_', 'Features', 'Markers', 'Markers_to_Plot', + 'Phenotypes', 'Binary_Phenotypes', 'Features_to_Analyze'] + + for key in list_params: + if key in params: + value = params[key] + if not isinstance(value, list): + if value in [None, ""]: + continue + elif isinstance(value, str): + if ',' in value: + params[key] = [item.strip() for item in value.split(',') 
if item.strip()] + print(f" Split {key} on comma: {params[key]}") + else: + params[key] = [value] + print(f" Wrapped {key} in list: {params[key]}") + + # Fix single-element lists for coordinate columns + coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', + 'X_centroid', 'Y_centroid', 'Primary_Annotation', + 'Secondary_Annotation', 'Annotation'] + + for key in coordinate_keys: + if key in params: + value = params[key] + if isinstance(value, list) and len(value) == 1: + params[key] = value[0] + print(f" Extracted single value from {key}: {params[key]}") + + return params + +def determine_default_outputs(template_name: str) -> dict: + """Determine default outputs based on template name""" + if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name: + return {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'} + elif 'load_csv' in template_name: + # Load CSV Files produces a single CSV file, not a folder + return {'DataFrames': 'combined_data.csv'} + elif 'interactive' in template_name: + return {'html': 'html_folder'} + else: + return {'analysis': 'transform_output.pickle'} + +def handle_multiple_file_inputs(params): + """ + Handle multiple file inputs that Galaxy copies to directories. + Galaxy copies multiple files to xxx_dir directories. + """ + print("\n=== Handling Multiple File Inputs ===") + + # Check for directory inputs that indicate multiple files + for key in list(params.keys()): + # Check if Galaxy created a _dir directory for this input + dir_name = f"{key}_dir" + if os.path.isdir(dir_name): + params[key] = dir_name + print(f" Updated {key} -> {dir_name}") + # List files in the directory + files = os.listdir(dir_name) + print(f" Contains {len(files)} files") + for f in files[:3]: + print(f" - {f}") + if len(files) > 3: + print(f" ... 
and {len(files)-3} more") + + # Special case for CSV_Files (Load CSV Files tool) + if 'CSV_Files' in params: + # Check for csv_input_dir created by Galaxy command + if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'): + params['CSV_Files'] = 'csv_input_dir' + print(f" Using csv_input_dir for CSV_Files") + elif os.path.isdir('CSV_Files_dir'): + params['CSV_Files'] = 'CSV_Files_dir' + print(f" Updated CSV_Files -> CSV_Files_dir") + elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']): + # Single file - get its directory (abspath avoids '' for a bare filename) + params['CSV_Files'] = os.path.dirname(os.path.abspath(params['CSV_Files'])) + print(f" Using directory of CSV file: {params['CSV_Files']}") + +def create_output_directories(outputs): + """Create directories for collection outputs""" + print("\n=== Creating Output Directories ===") + + for output_type, path in outputs.items(): + if path.endswith('_folder') or path.endswith('_dir'): + # This is a directory for multiple files + os.makedirs(path, exist_ok=True) + print(f" Created directory: {path}") + else: + # For single files, ensure parent directory exists if there is one + parent = os.path.dirname(path) + if parent and not os.path.exists(parent): + os.makedirs(parent, exist_ok=True) + print(f" Created parent directory: {parent}") + else: + print(f" Single file output: {path} (no directory needed)") + + # Output paths are added to params later, in main(); templates + # such as boxplot rely on those keys being present + return outputs + +def find_template(template_name): + """Find the template Python file""" + print("\n=== Finding Template ===") + + # Determine template filename + if template_name == 'load_csv_files': + template_py = 'load_csv_files_with_config.py' + else: + template_py = f'{template_name}_template.py' + + # Search paths (adjust based on your container/environment) + search_paths = [ + f'/opt/spac/templates/{template_py}', + 
f'/app/spac/templates/{template_py}', 
f'/opt/SCSAWorkflow/src/spac/templates/{template_py}', + f'/usr/local/lib/python3.9/site-packages/spac/templates/{template_py}', + f'./templates/{template_py}', + f'./{template_py}' + ] + + for path in search_paths: + if os.path.exists(path): + print(f" Found: {path}") + return path + + print(f" ERROR: {template_py} not found in:") + for path in search_paths: + print(f" - {path}") + return None + +def execute_template(template_path, params_file): + """Execute the SPAC template""" + print("\n=== Executing Template ===") + print(f" Command: python3 {template_path} {params_file}") + + # Run template and capture output + result = subprocess.run( + ['python3', template_path, params_file], + capture_output=True, + text=True + ) + + # Save stdout and stderr + with open('tool_stdout.txt', 'w') as f: + f.write("=== STDOUT ===\n") + f.write(result.stdout) + if result.stderr: + f.write("\n=== STDERR ===\n") + f.write(result.stderr) + + # Display output + if result.stdout: + print(" Output:") + lines = result.stdout.split('\n') + for line in lines[:20]: # First 20 lines + print(f" {line}") + if len(lines) > 20: + print(f" ... ({len(lines)-20} more lines)") + + if result.stderr: + print(" Errors:", file=sys.stderr) + for line in result.stderr.split('\n'): + if line.strip(): + print(f" {line}", file=sys.stderr) + + return result.returncode + +def handle_output_mapping(template_name, outputs): + """ + Map template outputs to expected locations. + Generic approach: find outputs based on pattern matching. 
+ """ + print("\n=== Output Mapping ===") + + for output_type, expected_path in outputs.items(): + # Skip if already exists at expected location + if os.path.exists(expected_path): + print(f" {output_type}: Already at {expected_path}") + continue + + # Handle single file outputs + if expected_path.endswith('.csv') or expected_path.endswith('.tsv') or \ + expected_path.endswith('.pickle') or expected_path.endswith('.h5ad'): + find_and_move_output(output_type, expected_path) + + # Handle folder outputs - check if a default folder exists + elif expected_path.endswith('_folder') or expected_path.endswith('_dir'): + default_folder = output_type.lower() + '_folder' + if default_folder != expected_path and os.path.isdir(default_folder): + print(f" Moving {default_folder} to {expected_path}") + shutil.move(default_folder, expected_path) + +def find_and_move_output(output_type, expected_path): + """ + Find output file based on extension and move to expected location. + More generic approach without hardcoded paths. 
+ """ + ext = os.path.splitext(expected_path)[1] # e.g., '.csv' + basename = os.path.basename(expected_path) + + print(f" Looking for {output_type} output ({ext} file)...") + + # Search in common output locations + search_dirs = ['.', 'dataframe_folder', 'output', 'results'] + + for search_dir in search_dirs: + if not os.path.exists(search_dir): + continue + + if os.path.isdir(search_dir): + # Find files with matching extension + matches = [f for f in os.listdir(search_dir) + if f.endswith(ext)] + + if len(matches) == 1: + source = os.path.join(search_dir, matches[0]) + print(f" Found: {source}") + print(f" Moving to: {expected_path}") + shutil.move(source, expected_path) + return + elif len(matches) > 1: + # Multiple matches - use the largest file + matches_with_size = [(f, os.path.getsize(os.path.join(search_dir, f))) + for f in matches] + matches_with_size.sort(key=lambda x: x[1], reverse=True) + source = os.path.join(search_dir, matches_with_size[0][0]) + print(f" Found multiple {ext} files, using largest: {source}") + shutil.move(source, expected_path) + return + + # Fallback: current directory (normally already covered by '.' in search_dirs) + current_dir_matches = [f for f in os.listdir('.') + if f.endswith(ext) and f != basename] + if current_dir_matches: + source = current_dir_matches[0] + print(f" Found: {source}") + print(f" Moving to: {expected_path}") + shutil.move(source, expected_path) + return + + print(f" WARNING: No {ext} file found for {output_type}") + +def verify_outputs(outputs): + """Verify that expected outputs were created""" + print("\n=== Output Verification ===") + + all_found = True + for output_type, path in outputs.items(): + if os.path.exists(path): + if os.path.isdir(path): + files = os.listdir(path) + total_size = sum(os.path.getsize(os.path.join(path, f)) + for f in files) + print(f" ✔ {output_type}: {len(files)} files in {path} " + f"({format_size(total_size)})") + # Show first few files + for f in files[:3]: + size = 
os.path.getsize(os.path.join(path, f)) + print(f" - {f} ({format_size(size)})") + if len(files) > 3: + print(f" ... and {len(files)-3} more") + else: + size = os.path.getsize(path) + print(f" ✔ {output_type}: {path} ({format_size(size)})") + else: + print(f" ✗ {output_type}: NOT FOUND at {path}") + all_found = False + + if not all_found: + print("\n WARNING: Some outputs not found!") + print(" Check tool_stdout.txt for errors") + # Don't exit with error - let Galaxy handle missing outputs + +def format_size(num_bytes): + """Format byte size in human-readable format""" + for unit in ['B', 'KB', 'MB', 'GB']: + if num_bytes < 1024.0: + return f"{num_bytes:.1f} {unit}" + num_bytes /= 1024.0 + return f"{num_bytes:.1f} TB" + +if __name__ == '__main__': + main() \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/spac_load_csv_files.xml b/galaxy_tools/refactor_tools/spac_load_csv_files.xml new file mode 100644 index 00000000..5d71d104 --- /dev/null +++ b/galaxy_tools/refactor_tools/spac_load_csv_files.xml @@ -0,0 +1,75 @@ + + Load CSV files from NIDAP dataset and combine them into a single pandas dataframe for downstream ... 
+ + + nciccbr/spac:v1 + + + + python3 + + + &2 && + cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "load_csv_files" + ]]> + + + + + + + + + + + + + + + + + + + + + + +@misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} +} + + + \ No newline at end of file diff --git a/galaxy_tools/refactor_tools/spac_setup_analysis.xml b/galaxy_tools/refactor_tools/spac_setup_analysis.xml new file mode 100644 index 00000000..f762f78d --- /dev/null +++ b/galaxy_tools/refactor_tools/spac_setup_analysis.xml @@ -0,0 +1,71 @@ + + Convert the pre-processed dataset to the analysis object for downstream analysis. + + + nciccbr/spac:v1 + + + + python3 + + + &2 && + cat "$params_json" >&2 && + echo "==================" >&2 && + + ## Save snapshot + cp "$params_json" params_snapshot.json && + + ## Run wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" "setup_analysis" + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + +@misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} +} + + + \ No newline at end of file diff --git a/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh b/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh new file mode 100644 index 00000000..a93b2d6e --- /dev/null +++ b/galaxy_tools/spac_arcsinh_normalization/run_spac_template.sh @@ -0,0 +1,710 @@ +#!/usr/bin/env bash +# run_spac_template.sh - SPAC wrapper with column index conversion +# Version: 5.4.1 - Integrated column conversion +set -euo pipefail + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_BASE="${2:?Missing template base name}" + +# Handle both base names and full .py filenames 
+if [[ "$TEMPLATE_BASE" == *.py ]]; then + TEMPLATE_PY="$TEMPLATE_BASE" +elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then + TEMPLATE_PY="load_csv_files_with_config.py" +else + TEMPLATE_PY="${TEMPLATE_BASE}_template.py" +fi + +# Use SPAC Python environment +SPAC_PYTHON="${SPAC_PYTHON:-python3}" + +echo "=== SPAC Template Wrapper v5.4.1 ===" +echo "Parameters: $PARAMS_JSON" +echo "Template base: $TEMPLATE_BASE" +echo "Template file: $TEMPLATE_PY" +echo "Python: $SPAC_PYTHON" + +# Run template through Python +"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt +import json +import os +import sys +import copy +import traceback +import inspect +import shutil +import re +import csv + +# Get arguments +params_path = sys.argv[1] +template_filename = sys.argv[2] + +print(f"[Runner] Loading parameters from: {params_path}") +print(f"[Runner] Template: {template_filename}") + +# Load parameters +with open(params_path, 'r') as f: + params = json.load(f) + +# Extract template name +template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '') + +# =========================================================================== +# DE-SANITIZATION AND PARSING +# =========================================================================== +def _unsanitize(s: str) -> str: + """Remove Galaxy's parameter sanitization tokens""" + if not isinstance(s, str): + return s + replacements = { + '__ob__': '[', '__cb__': ']', + '__oc__': '{', '__cc__': '}', + '__dq__': '"', '__sq__': "'", + '__gt__': '>', '__lt__': '<', + '__cn__': '\n', '__cr__': '\r', + '__tc__': '\t', '__pd__': '#', + '__at__': '@', '__cm__': ',' + } + for token, char in replacements.items(): + s = s.replace(token, char) + return s + +def _maybe_parse(v): + """Recursively de-sanitize and JSON-parse strings where possible.""" + if isinstance(v, str): + u = _unsanitize(v).strip() + if (u.startswith('[') and 
u.endswith(']')) or 
(u.startswith('{') and u.endswith('}')):
+            try:
+                return json.loads(u)
+            except Exception:
+                return u
+        return u
+    elif isinstance(v, dict):
+        return {k: _maybe_parse(val) for k, val in v.items()}
+    elif isinstance(v, list):
+        return [_maybe_parse(item) for item in v]
+    return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+    """Some templates don't need column index conversion"""
+    return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+    """Read column headers from various file formats"""
+    try:
+        import pandas as pd
+
+        # Try pandas auto-detect
+        try:
+            df = pd.read_csv(filepath, nrows=1)
+            if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+                columns = df.columns.tolist()
+                print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+                return columns
+        except:
+            pass
+
+        # Try common delimiters
+        for sep in ['\t', ',', ';', '|', ' ']:
+            try:
+                df = pd.read_csv(filepath, sep=sep, nrows=1)
+                if len(df.columns) > 1:
+                    columns = df.columns.tolist()
+                    sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+                                '|': 'pipe', ' ': 'space'}.get(sep, sep)
+                    print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+                    return columns
+            except:
+                continue
+    except ImportError:
+        print("[Runner] pandas not available, using csv fallback")
+
+    # CSV module fallback
+    try:
+        with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+            sample = f.read(8192)
+            f.seek(0)
+
+            try:
+                dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+                reader = csv.reader(f, dialect)
+                header = next(reader)
+                columns = [h.strip().strip('"') for h in header if h.strip()]
+                if columns:
+                    print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+                    return columns
+            except:
+                f.seek(0)
+                first_line = f.readline().strip()
+                for sep in ['\t', ',', ';', '|']:
+                    if sep in first_line:
+                        columns = [h.strip().strip('"') for h in first_line.split(sep)]
+                        if len(columns) > 1:
+                            print(f"[Runner] Manual parsing found {len(columns)} columns")
+                            return columns
+    except Exception as e:
+        print(f"[Runner] Failed to read headers: {e}")
+
+    return None
+
+def should_convert_param(key, value):
+    """Check if parameter contains column indices"""
+    if value is None or value == "" or value == [] or value == {}:
+        return False
+
+    key_lower = key.lower()
+
+    # Skip String_Columns - it's names not indices
+    if key == 'String_Columns':
+        return False
+
+    # Skip output/path parameters
+    if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+        return False
+
+    # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Parameters with 'column' likely have indices
+    if 'column' in key_lower or '_col' in key_lower:
+        return True
+
+    # Known index parameters
+    if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+        return True
+
+    # Check if values look like indices
+    if isinstance(value, list):
+        return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+    elif isinstance(value, (int, str)):
+        return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+    return False
+
+def convert_single_index(item, columns):
+    """Convert a single column index to name"""
+    if isinstance(item, str) and not item.strip().isdigit():
+        return item
+
+    try:
+        if isinstance(item, str):
+            item = int(item.strip())
+        elif isinstance(item, float):
+            item = int(item)
+    except (ValueError, AttributeError):
+        return item
+
+    if isinstance(item, int):
+        idx = item - 1  # Galaxy uses 1-based indexing
+        if 0 <= idx < len(columns):
+            return columns[idx]
+        elif 0 <= item < len(columns):  # Fallback for 0-based
+            print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+            return columns[item]
+        else:
+            print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+    return item
+
+def convert_column_indices_to_names(params, template_name):
+    """Convert column indices to names for templates that need it"""
+
+    if should_skip_column_conversion(template_name):
+        print(f"[Runner] Skipping column conversion for {template_name}")
+        return params
+
+    print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+    # Find input file
+    input_file = None
+    input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+                  'Input_File', 'Input_Dataset', 'Data_File']
+
+    for key in input_keys:
+        if key in params:
+            value = params[key]
+            if isinstance(value, list) and value:
+                value = value[0]
+            if value and os.path.exists(str(value)):
+                input_file = str(value)
+                print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+                break
+
+    if not input_file:
+        print("[Runner] No input file found for column conversion")
+        return params
+
+    # Read headers
+    columns = read_file_headers(input_file)
+    if not columns:
+        print("[Runner] Could not read column headers, skipping conversion")
+        return params
+
+    print(f"[Runner] Successfully read {len(columns)} columns")
+    if len(columns) <= 10:
+        print(f"[Runner] Columns: {columns}")
+    else:
+        print(f"[Runner] First 10 columns: {columns[:10]}")
+
+    # Convert indices to names
+    converted_count = 0
+    for key, value in params.items():
+        # Skip non-column parameters
+        if not should_convert_param(key, value):
+            continue
+
+        # Convert indices
+        if isinstance(value, list):
+            converted_items = []
+            for item in value:
+                converted = convert_single_index(item, columns)
+                if converted is not None:
+                    converted_items.append(converted)
+            converted_value = converted_items
+        else:
+            converted_value = convert_single_index(value, columns)
+
+        if value != converted_value:
+            params[key] = converted_value
+            converted_count += 1
+            print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+    if converted_count > 0:
+        print(f"[Runner] Total conversions: {converted_count} parameters")
+
+    # CRITICAL: Handle Feature_Regex specially
+    if 'Feature_Regex' in params:
+        value = params['Feature_Regex']
+        if value in [[], [""], "__ob____cb__", "[]", "", None]:
+            params['Feature_Regex'] = ""
+            print("[Runner] Cleared empty Feature_Regex parameter")
+        elif isinstance(value, list) and value:
+            params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+            print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+    return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+    """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+    if 'load_csv' not in template_name:
+        return params
+    for key in ('CSV_Files', 'CSV_Files_Configuration'):
+        val = params.get(key)
+        if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+            params[key] = val[0]
+            print(f"[Runner] Coerced {key} from list -> string")
+    return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+    value = params['String_Columns']
+    if not isinstance(value, list):
+        if value in [None, "", "[]", "__ob____cb__"]:
+            params['String_Columns'] = []
+        elif isinstance(value, str):
+            s = value.strip()
+            if s.startswith('[') and s.endswith(']'):
+                try:
+                    params['String_Columns'] = json.loads(s)
+                except:
+                    params['String_Columns'] = [s] if s else []
+            elif ',' in s:
+                params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+            else:
+                params['String_Columns'] = [s] if s else []
+        else:
+            params['String_Columns'] = []
+    print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+    # Check if csv_input_dir was created by Galaxy command
+    if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+        params['CSV_Files'] = 'csv_input_dir'
+        print("[Runner] Using csv_input_dir created by Galaxy")
+    elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+        # We have a single file path, need to get its directory
+        params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+        print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+    """Determine if a parameter should be normalized as a list"""
+    if isinstance(value, list):
+        return True
+
+    if value is None or value == "":
+        return False
+
+    key_lower = key.lower()
+
+    # Skip regex parameters
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Skip known single-value parameters
+    if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+        return False
+
+    # Plural forms suggest lists
+    if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations',
+                                    'columns', 'types', 'labels', 'regions', 'radii']):
+        return True
+
+    # Check for list separators
+    if isinstance(value, str):
+        if ',' in value or '\n' in value:
+            return True
+        if value.strip().startswith('[') and value.strip().endswith(']'):
+            return True
+
+    return False
+
+def normalize_to_list(value):
+    """Convert various input formats to a proper Python list"""
+    if value in (None, "", "All", ["All"], "all", ["all"]):
+        return ["All"]
+
+    if isinstance(value, list):
+        return value
+
+    if isinstance(value, str):
+        s = value.strip()
+
+        # Try JSON parsing
+        if s.startswith('[') and s.endswith(']'):
+            try:
+                parsed = json.loads(s)
+                return parsed if isinstance(parsed, list) else [str(parsed)]
+            except:
+                pass
+
+        # Split by comma
+        if ',' in s:
+            return [item.strip() for item in s.split(',') if item.strip()]
+
+        # Split by newline
+        if '\n' in s:
+            return [item.strip() for item in s.split('\n') if item.strip()]
+
+        # Single value
+        return [s] if s else []
+
+    return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+    if should_normalize_as_list(key, value):
+        original = value
+        normalized = normalize_to_list(value)
+        if original != normalized:
+            params[key] = normalized
+            list_count += 1
+            if len(str(normalized)) > 100:
+                print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+            else:
+                print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+    print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+    if key in params:
+        value = params[key]
+        if isinstance(value, list) and len(value) == 1:
+            params[key] = value[0]
+            print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element list
+for key in list(params.keys()):
+    if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1:
+        original = params[key]
+        params[key] = params[key][0]
+        print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}")
+
+# ===========================================================================
+# OUTPUTS HANDLING
+# ===========================================================================
+
+# Extract outputs specification
+raw_outputs = params.pop('outputs', {})
+outputs = {}
+
+if isinstance(raw_outputs, dict):
+    outputs = raw_outputs
+elif isinstance(raw_outputs, str):
+    try:
+        maybe = json.loads(_unsanitize(raw_outputs))
+        if isinstance(maybe, dict):
+            outputs = maybe
+    except Exception:
+        pass
+
+if not isinstance(outputs, dict) or not outputs:
+    print("[Runner] Warning: 'outputs' missing or not a dict; using defaults")
+    if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name:
+        outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'}
+    elif 'load_csv' in template_name:
+        outputs = {'DataFrames': 'dataframe_folder'}
+    elif 'interactive' in template_name:
+        outputs = {'html': 'html_folder'}
+    else:
+        outputs = {'analysis': 'transform_output.pickle'}
+
+print(f"[Runner] Outputs -> {list(outputs.keys())}")
+
+# Create output directories
+for output_type, path in outputs.items():
+    if output_type != 'analysis' and path:
+        os.makedirs(path, exist_ok=True)
+        print(f"[Runner] Created {output_type} directory: {path}")
+
+# Add output paths to params
+params['save_results'] = True
+
+if 'analysis' in outputs:
+    params['output_path'] = outputs['analysis']
+    params['Output_Path'] = outputs['analysis']
+    params['Output_File'] = outputs['analysis']
+
+if 'DataFrames' in outputs:
+    df_dir = outputs['DataFrames']
+    params['output_dir'] = df_dir
+    params['Export_Dir'] = df_dir
+    params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv')
+
+if 'figures' in outputs:
+    fig_dir = outputs['figures']
+    params['figure_dir'] = fig_dir
+    params['Figure_Dir'] = fig_dir
+    params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png')
+
+if 'html' in outputs:
+    html_dir = outputs['html']
+    params['html_dir'] = html_dir
+    params['Output_File'] = os.path.join(html_dir, f'{template_name}.html')
+
+# Save runtime parameters
+with open('params.runtime.json', 'w') as f:
+    json.dump(params, f, indent=2)
+
+# Save clean params for Galaxy display
+params_display = {k: v for k, v in params.items()
+                  if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']}
+with open('config_used.json', 'w') as f:
+    json.dump(params_display, f, indent=2)
+
+print(f"[Runner] Saved runtime parameters")
+
+# ============================================================================
+# LOAD AND EXECUTE TEMPLATE
+# ============================================================================
+
+# Try to import from installed package first (Docker environment)
+template_module_name = template_filename.replace('.py', '')
+try:
+    import importlib
+    mod = importlib.import_module(f'spac.templates.{template_module_name}')
+    print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}")
+except (ImportError, ModuleNotFoundError):
+    # Fallback to loading from file
+    print(f"[Runner] Package import failed, trying file load")
+    import importlib.util
+
+    # Standard locations
+    template_paths = [
+        f'/app/spac/templates/{template_filename}',
+        f'/opt/spac/templates/{template_filename}',
+        f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}',
+        template_filename  # Current directory
+    ]
+
+    spec = None
+    for path in template_paths:
+        if os.path.exists(path):
+            spec = importlib.util.spec_from_file_location("template_mod", path)
+            if spec:
+                print(f"[Runner] Found template at: {path}")
+                break
+
+    if not spec or not spec.loader:
+        print(f"[Runner] ERROR: Could not find template: {template_filename}")
+        sys.exit(1)
+
+    mod = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(mod)
+
+# Verify run_from_json exists
+if not hasattr(mod, 'run_from_json'):
+    print('[Runner] ERROR: Template missing run_from_json function')
+    sys.exit(2)
+
+# Check function signature
+sig = inspect.signature(mod.run_from_json)
+kwargs = {}
+
+if 'save_results' in sig.parameters:
+    kwargs['save_results'] = True
+if 'show_plot' in sig.parameters:
+    kwargs['show_plot'] = False
+
+print(f"[Runner] Executing template with kwargs: {kwargs}")
+
+# Execute template
+try:
+    result = mod.run_from_json('params.runtime.json', **kwargs)
+    print(f"[Runner] Template completed, returned: {type(result).__name__}")
+
+    # Handle different return types
+    if result is not None:
+        if isinstance(result, dict):
+            print(f"[Runner] Template saved files: {list(result.keys())}")
+        elif isinstance(result, tuple):
+            # Handle tuple returns
+            saved_count = 0
+            for i, item in enumerate(result):
+                if hasattr(item, 'savefig') and 'figures' in outputs:
+                    import matplotlib
+                    matplotlib.use('Agg')
+                    import matplotlib.pyplot as plt
+                    fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png')
+                    item.savefig(fig_path, dpi=300, bbox_inches='tight')
+                    plt.close(item)
+                    saved_count += 1
+                    print(f"[Runner] Saved figure to {fig_path}")
+                elif hasattr(item, 'to_csv') and 'DataFrames' in outputs:
+                    df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv')
+                    item.to_csv(df_path, index=True)
+                    saved_count += 1
+                    print(f"[Runner] Saved DataFrame to {df_path}")
+
+            if saved_count > 0:
+                print(f"[Runner] Saved {saved_count} in-memory results")
+
+        elif hasattr(result, 'to_csv') and 'DataFrames' in outputs:
+            df_path = os.path.join(outputs['DataFrames'], 'output.csv')
+            result.to_csv(df_path, index=True)
+            print(f"[Runner] Saved DataFrame to {df_path}")
+
+        elif hasattr(result, 'savefig') and 'figures' in outputs:
+            import matplotlib
+            matplotlib.use('Agg')
+            import matplotlib.pyplot as plt
+            fig_path = os.path.join(outputs['figures'], 'figure.png')
+            result.savefig(fig_path, dpi=300, bbox_inches='tight')
+            plt.close(result)
+            print(f"[Runner] Saved figure to {fig_path}")
+
+        elif hasattr(result, 'write_h5ad') and 'analysis' in outputs:
+            result.write_h5ad(outputs['analysis'])
+            print(f"[Runner] Saved AnnData to {outputs['analysis']}")
+
+except Exception as e:
+    print(f"[Runner] ERROR in template execution: {e}")
+    print(f"[Runner] Error type: {type(e).__name__}")
+    traceback.print_exc()
+
+    # Debug help for common issues
+    if "String Columns must be a *list*" in str(e):
+        print("\n[Runner] DEBUG: String_Columns validation failed")
+        print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}")
+        print(f"[Runner] Type: {type(params.get('String_Columns'))}")
+
+    elif "regex pattern" in str(e).lower() or "^8$" in str(e):
+        print("\n[Runner] DEBUG: This appears to be a column index issue")
+        print("[Runner] Check that column indices were properly converted to names")
+        print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze'))
+        print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex'))
+
+    sys.exit(1)
+
+# Verify outputs
+print("[Runner] Verifying outputs...")
+found_outputs = False
+
+for output_type, path in outputs.items():
+    if output_type == 'analysis':
+        if os.path.exists(path):
+            size = os.path.getsize(path)
+            print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)")
+            found_outputs = True
+        else:
+            print(f"[Runner] ✗ {output_type}: NOT FOUND")
+    else:
+        if os.path.exists(path) and os.path.isdir(path):
+            files = os.listdir(path)
+            if files:
+                print(f"[Runner] ✔ {output_type}: {len(files)} files")
+                for f in files[:3]:
+                    print(f"[Runner] - {f}")
+                if len(files) > 3:
+                    print(f"[Runner] ... and {len(files)-3} more")
+                found_outputs = True
+            else:
+                print(f"[Runner] ⚠ {output_type}: directory empty")
+
+# Check for files in working directory and move them
+print("[Runner] Checking for files in working directory...")
+for file in os.listdir('.'):
+    if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json',
+                                       'tool_stdout.txt', 'outputs_returned.json']:
+        continue
+
+    if file.endswith('.csv') and 'DataFrames' in outputs:
+        if not os.path.exists(os.path.join(outputs['DataFrames'], file)):
+            target = os.path.join(outputs['DataFrames'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+    elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs:
+        if not os.path.exists(os.path.join(outputs['figures'], file)):
+            target = os.path.join(outputs['figures'], file)
+            shutil.move(file, target)
+            print(f"[Runner] Moved {file} to {target}")
+            found_outputs = True
+
+if found_outputs:
+    print("[Runner] === SUCCESS ===")
+else:
+    print("[Runner] WARNING: No outputs created")
+
+PYTHON_RUNNER
+
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+    echo "ERROR: Template execution failed with exit code $EXIT_CODE"
+    exit 1
+fi
+
+echo "=== Execution Complete ==="
+exit 0
\ No newline at end of file
diff --git a/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml b/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml
new file mode 100644
index 00000000..ad0f4baf
--- /dev/null
+++ b/galaxy_tools/spac_arcsinh_normalization/spac_arcsinh_normalization.xml
@@ -0,0 +1,70 @@
+
+ Normalize features either by a user-defined co-factor or a determined percentile, allowing for ef...
+ + + nciccbr/spac:v1 + + + + python3 + + + tool_stdout.txt && + + ## Run the universal wrapper (template name without .py extension) + bash $__tool_directory__/run_spac_template.sh "$params_json" arcsinh_normalization + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} + } + + + \ No newline at end of file diff --git a/galaxy_tools/spac_boxplot/run_spac_template.sh b/galaxy_tools/spac_boxplot/run_spac_template.sh new file mode 100644 index 00000000..5e08ae50 --- /dev/null +++ b/galaxy_tools/spac_boxplot/run_spac_template.sh @@ -0,0 +1,782 @@ +#!/usr/bin/env bash +# run_spac_template.sh - SPAC wrapper with column index conversion +# Version: 5.4.2 - Integrated column conversion +set -euo pipefail + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_BASE="${2:?Missing template base name}" + +# Handle both base names and full .py filenames +if [[ "$TEMPLATE_BASE" == *.py ]]; then + TEMPLATE_PY="$TEMPLATE_BASE" +elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then + TEMPLATE_PY="load_csv_files_with_config.py" +else + TEMPLATE_PY="${TEMPLATE_BASE}_template.py" +fi + +# Use SPAC Python environment +SPAC_PYTHON="${SPAC_PYTHON:-python3}" + +echo "=== SPAC Template Wrapper v5.4 ===" +echo "Parameters: $PARAMS_JSON" +echo "Template base: $TEMPLATE_BASE" +echo "Template file: $TEMPLATE_PY" +echo "Python: $SPAC_PYTHON" + +# Run template through Python +"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt +import json +import os +import sys +import copy +import traceback +import inspect +import shutil +import re +import csv + +# Get arguments +params_path = sys.argv[1] +template_filename = sys.argv[2] + +print(f"[Runner] Loading parameters from: {params_path}") +print(f"[Runner] Template: {template_filename}") + +# Load parameters +with 
open(params_path, 'r') as f:
+    params = json.load(f)
+
+# Extract template name
+template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '')
+
+# ===========================================================================
+# DE-SANITIZATION AND PARSING
+# ===========================================================================
+def _unsanitize(s: str) -> str:
+    """Remove Galaxy's parameter sanitization tokens"""
+    if not isinstance(s, str):
+        return s
+    replacements = {
+        '__ob__': '[', '__cb__': ']',
+        '__oc__': '{', '__cc__': '}',
+        '__dq__': '"', '__sq__': "'",
+        '__gt__': '>', '__lt__': '<',
+        '__cn__': '\n', '__cr__': '\r',
+        '__tc__': '\t', '__pd__': '#',
+        '__at__': '@', '__cm__': ','
+    }
+    for token, char in replacements.items():
+        s = s.replace(token, char)
+    return s
+
+def _maybe_parse(v):
+    """Recursively de-sanitize and JSON-parse strings where possible."""
+    if isinstance(v, str):
+        u = _unsanitize(v).strip()
+        if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')):
+            try:
+                return json.loads(u)
+            except Exception:
+                return u
+        return u
+    elif isinstance(v, dict):
+        return {k: _maybe_parse(val) for k, val in v.items()}
+    elif isinstance(v, list):
+        return [_maybe_parse(item) for item in v]
+    return v
+
+# Normalize the whole params tree
+params = _maybe_parse(params)
+
+# ===========================================================================
+# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS
+# ===========================================================================
+def should_skip_column_conversion(template_name):
+    """Some templates don't need column index conversion"""
+    return 'load_csv' in template_name
+
+def read_file_headers(filepath):
+    """Read column headers from various file formats"""
+    try:
+        import pandas as pd
+
+        # Try pandas auto-detect
+        try:
+            df = pd.read_csv(filepath, nrows=1)
+            if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'):
+                columns = df.columns.tolist()
+                print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns")
+                return columns
+        except:
+            pass
+
+        # Try common delimiters
+        for sep in ['\t', ',', ';', '|', ' ']:
+            try:
+                df = pd.read_csv(filepath, sep=sep, nrows=1)
+                if len(df.columns) > 1:
+                    columns = df.columns.tolist()
+                    sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon',
+                                '|': 'pipe', ' ': 'space'}.get(sep, sep)
+                    print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns")
+                    return columns
+            except:
+                continue
+    except ImportError:
+        print("[Runner] pandas not available, using csv fallback")
+
+    # CSV module fallback
+    try:
+        with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
+            sample = f.read(8192)
+            f.seek(0)
+
+            try:
+                dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ')
+                reader = csv.reader(f, dialect)
+                header = next(reader)
+                columns = [h.strip().strip('"') for h in header if h.strip()]
+                if columns:
+                    print(f"[Runner] csv.Sniffer detected {len(columns)} columns")
+                    return columns
+            except:
+                f.seek(0)
+                first_line = f.readline().strip()
+                for sep in ['\t', ',', ';', '|']:
+                    if sep in first_line:
+                        columns = [h.strip().strip('"') for h in first_line.split(sep)]
+                        if len(columns) > 1:
+                            print(f"[Runner] Manual parsing found {len(columns)} columns")
+                            return columns
+    except Exception as e:
+        print(f"[Runner] Failed to read headers: {e}")
+
+    return None
+
+def should_convert_param(key, value):
+    """Check if parameter contains column indices"""
+    if value is None or value == "" or value == [] or value == {}:
+        return False
+
+    key_lower = key.lower()
+
+    # Skip String_Columns - it's names not indices
+    if key == 'String_Columns':
+        return False
+
+    # Skip output/path parameters
+    if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']):
+        return False
+
+    # Skip regex/pattern parameters (but we'll handle Feature_Regex specially)
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Parameters with 'column' likely have indices
+    if 'column' in key_lower or '_col' in key_lower:
+        return True
+
+    # Known index parameters
+    if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}:
+        return True
+
+    # Check if values look like indices
+    if isinstance(value, list):
+        return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v)
+    elif isinstance(value, (int, str)):
+        return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit())
+
+    return False
+
+def convert_single_index(item, columns):
+    """Convert a single column index to name"""
+    if isinstance(item, str) and not item.strip().isdigit():
+        return item
+
+    try:
+        if isinstance(item, str):
+            item = int(item.strip())
+        elif isinstance(item, float):
+            item = int(item)
+    except (ValueError, AttributeError):
+        return item
+
+    if isinstance(item, int):
+        idx = item - 1  # Galaxy uses 1-based indexing
+        if 0 <= idx < len(columns):
+            return columns[idx]
+        elif 0 <= item < len(columns):  # Fallback for 0-based
+            print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}")
+            return columns[item]
+        else:
+            print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)")
+
+    return item
+
+def convert_column_indices_to_names(params, template_name):
+    """Convert column indices to names for templates that need it"""
+
+    if should_skip_column_conversion(template_name):
+        print(f"[Runner] Skipping column conversion for {template_name}")
+        return params
+
+    print(f"[Runner] Checking for column index conversion (template: {template_name})")
+
+    # Find input file
+    input_file = None
+    input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files',
+                  'Input_File', 'Input_Dataset', 'Data_File']
+
+    for key in input_keys:
+        if key in params:
+            value = params[key]
+            if isinstance(value, list) and value:
+                value = value[0]
+            if value and os.path.exists(str(value)):
+                input_file = str(value)
+                print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}")
+                break
+
+    if not input_file:
+        print("[Runner] No input file found for column conversion")
+        return params
+
+    # Read headers
+    columns = read_file_headers(input_file)
+    if not columns:
+        print("[Runner] Could not read column headers, skipping conversion")
+        return params
+
+    print(f"[Runner] Successfully read {len(columns)} columns")
+    if len(columns) <= 10:
+        print(f"[Runner] Columns: {columns}")
+    else:
+        print(f"[Runner] First 10 columns: {columns[:10]}")
+
+    # Convert indices to names
+    converted_count = 0
+    for key, value in params.items():
+        # Skip non-column parameters
+        if not should_convert_param(key, value):
+            continue
+
+        # Convert indices
+        if isinstance(value, list):
+            converted_items = []
+            for item in value:
+                converted = convert_single_index(item, columns)
+                if converted is not None:
+                    converted_items.append(converted)
+            converted_value = converted_items
+        else:
+            converted_value = convert_single_index(value, columns)
+
+        if value != converted_value:
+            params[key] = converted_value
+            converted_count += 1
+            print(f"[Runner] Converted {key}: {value} -> {converted_value}")
+
+    if converted_count > 0:
+        print(f"[Runner] Total conversions: {converted_count} parameters")
+
+    # CRITICAL: Handle Feature_Regex specially
+    if 'Feature_Regex' in params:
+        value = params['Feature_Regex']
+        if value in [[], [""], "__ob____cb__", "[]", "", None]:
+            params['Feature_Regex'] = ""
+            print("[Runner] Cleared empty Feature_Regex parameter")
+        elif isinstance(value, list) and value:
+            params['Feature_Regex'] = "|".join(str(v) for v in value if v)
+            print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}")
+
+    return params
+
+# ===========================================================================
+# APPLY COLUMN CONVERSION
+# ===========================================================================
+print("[Runner] Step 1: Converting column indices to names")
+params = convert_column_indices_to_names(params, template_name)
+
+# ===========================================================================
+# SPECIAL HANDLING FOR SPECIFIC TEMPLATES
+# ===========================================================================
+
+# Helper function to coerce singleton lists to strings for load_csv
+def _coerce_singleton_paths_for_load_csv(params, template_name):
+    """For load_csv templates, flatten 1-item lists to strings for path-like params."""
+    if 'load_csv' not in template_name:
+        return params
+    for key in ('CSV_Files', 'CSV_Files_Configuration'):
+        val = params.get(key)
+        if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)):
+            params[key] = val[0]
+            print(f"[Runner] Coerced {key} from list -> string")
+    return params
+
+# Special handling for String_Columns in load_csv templates
+if 'load_csv' in template_name and 'String_Columns' in params:
+    value = params['String_Columns']
+    if not isinstance(value, list):
+        if value in [None, "", "[]", "__ob____cb__"]:
+            params['String_Columns'] = []
+        elif isinstance(value, str):
+            s = value.strip()
+            if s.startswith('[') and s.endswith(']'):
+                try:
+                    params['String_Columns'] = json.loads(s)
+                except:
+                    params['String_Columns'] = [s] if s else []
+            elif ',' in s:
+                params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()]
+            else:
+                params['String_Columns'] = [s] if s else []
+        else:
+            params['String_Columns'] = []
+    print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}")
+
+# Apply coercion for load_csv files
+params = _coerce_singleton_paths_for_load_csv(params, template_name)
+
+# Fix for Load CSV Files directory
+if 'load_csv' in template_name and 'CSV_Files' in params:
+    # Check if csv_input_dir was created by Galaxy command
+    if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'):
+        params['CSV_Files'] = 'csv_input_dir'
+        print("[Runner] Using csv_input_dir created by Galaxy")
+    elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']):
+        # We have a single file path, need to get its directory
+        params['CSV_Files'] = os.path.dirname(params['CSV_Files'])
+        print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}")
+
+# ===========================================================================
+# LIST PARAMETER NORMALIZATION
+# ===========================================================================
+def should_normalize_as_list(key, value):
+    """Determine if a parameter should be normalized as a list"""
+    # CRITICAL: Skip outputs and other non-list parameters
+    key_lower = key.lower()
+    if key_lower in {'outputs', 'output', 'upstream_analysis', 'upstream_dataset',
+                     'table_to_visualize', 'figure_title', 'figure_width',
+                     'figure_height', 'figure_dpi', 'font_size'}:
+        return False
+
+    # Already a proper list?
+    if isinstance(value, list):
+        # Only re-process if it's a single JSON string that needs parsing
+        if len(value) == 1 and isinstance(value[0], str):
+            s = value[0].strip()
+            return s.startswith('[') and s.endswith(']')
+        return False
+
+    # Nothing to normalize
+    if value is None or value == "":
+        return False
+
+    # CRITICAL: Explicitly mark Feature_s_to_Plot as a list parameter
+    if key == 'Feature_s_to_Plot' or key_lower == 'feature_s_to_plot':
+        return True
+
+    # Other explicit list parameters
+    explicit_list_keys = {
+        'features_to_analyze', 'features', 'markers', 'markers_to_plot',
+        'phenotypes', 'labels', 'annotation_s_', 'string_columns'
+    }
+    if key_lower in explicit_list_keys:
+        return True
+
+    # Skip regex parameters
+    if 'regex' in key_lower or 'pattern' in key_lower:
+        return False
+
+    # Skip known single-value parameters
+    if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']):
+        return False
+
+    # Plural forms suggest lists
+    if any(x in key_lower for x in [
+        'features', 'markers', 'phenotypes', 'annotations',
+        'columns', 'types', 'labels', 'regions', 'radii'
+    ]):
+        return True
+
+    # List-like syntax in string values
+    if isinstance(value, str):
+        s = value.strip()
+        if s.startswith('[') and s.endswith(']'):
+            return True
+        # Only treat comma/newline as list separator if not in outputs-like params
+        if 'output' not in key_lower and 'path' not in key_lower:
+            if ',' in s or '\n' in s:
+                return True
+
+    return False
+
+def normalize_to_list(value):
+    """Convert various input formats to a proper Python list"""
+    # Handle special "All" cases first
+    if value in (None, "", "All", "all"):
+        return ["All"]
+
+    # If it's already a list
+    if isinstance(value, list):
+        # Check for already-correct lists
+        if value == ["All"] or value == ["all"]:
+            return ["All"]
+
+        # Check if it's a single-element list with a JSON string
+        if len(value) == 1 and isinstance(value[0], str):
+            s = value[0].strip()
+            # If the single element looks like JSON
+            if s.startswith('[') and s.endswith(']'):
+                try:
+                    parsed = json.loads(s)
+                    if isinstance(parsed, list):
+                        return parsed
+                except:
+                    pass
+            # If single element is "All" or "all"
+            elif s.lower() == "all":
+                return ["All"]
+
+        # Already a proper list, return as-is
+        return value
+
+    if isinstance(value, str):
+        s = value.strip()
+
+        # Check for "All" string
+        if s.lower() == "all":
+            return ["All"]
+
+        # Try JSON parsing
+        if s.startswith('[') and s.endswith(']'):
+            try:
+                parsed = json.loads(s)
+                return parsed if isinstance(parsed, list) else [str(parsed)]
+            except:
+                pass
+
+        # Split by comma
+        if ',' in s:
+            return [item.strip() for item in s.split(',') if item.strip()]
+
+        # Split by newline
+        if '\n' in s:
+            return [item.strip() for item in s.split('\n') if item.strip()]
+
+        # Single value
+        return [s] if s else []
+
+    return [value] if value is not None else []
+
+# Normalize list parameters
+print("[Runner] Step 2: Normalizing list parameters")
+list_count = 0
+for key, value in list(params.items()):
+    if should_normalize_as_list(key, value):
+        original = value
+        normalized = normalize_to_list(value)
+        if original != normalized:
+            params[key] = normalized
+            list_count += 1
+            if len(str(normalized)) > 100:
+                print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items")
+            else:
+                print(f"[Runner] Normalized {key}: {original} -> {normalized}")
+
+if list_count > 0:
+    print(f"[Runner] Normalized {list_count} list parameters")
+
+# CRITICAL FIX: Handle single-element lists for coordinate columns
+# These should be strings, not lists
+coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid']
+for key in coordinate_keys:
+    if key in params:
+        value = params[key]
+        if isinstance(value, list) and len(value) == 1:
+            params[key] = value[0]
+            print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}")
+
+# Also check for any key ending with '_Column' that has a single-element
list +for key in list(params.keys()): + if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1: + original = params[key] + params[key] = params[key][0] + print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}") + +# =========================================================================== +# OUTPUTS HANDLING +# =========================================================================== + +# Extract outputs specification +raw_outputs = params.pop('outputs', {}) +outputs = {} + +if isinstance(raw_outputs, dict): + outputs = raw_outputs +elif isinstance(raw_outputs, str): + try: + maybe = json.loads(_unsanitize(raw_outputs)) + if isinstance(maybe, dict): + outputs = maybe + except Exception: + pass + +# CRITICAL FIX: Handle outputs if it was mistakenly normalized as a list +if isinstance(raw_outputs, list) and raw_outputs: + # Try to reconstruct the dict from the list + if len(raw_outputs) >= 2: + # Assume format like ["{'DataFrames': 'dataframe_folder'", "'figures': 'figure_folder'}"] + # Rejoin with ', ' to restore the commas removed when the string was split into a list + combined = ', '.join(str(item) for item in raw_outputs) + # Clean up the string + combined = combined.replace("'", '"') + try: + outputs = json.loads(combined) + except Exception: + # Try another approach - look for dict-like patterns + try: + dict_str = '{' + combined.split('{')[1].split('}')[0] + '}' + outputs = json.loads(dict_str.replace("'", '"')) + except Exception: + pass + +if not isinstance(outputs, dict) or not outputs: + print("[Runner] Warning: 'outputs' missing or not a dict; using defaults") + if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name: + outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'} + elif 'load_csv' in template_name: + outputs = {'DataFrames': 'dataframe_folder'} + elif 'interactive' in template_name: + outputs = {'html': 'html_folder'} + else: + outputs = {'analysis': 'transform_output.pickle'} + +print(f"[Runner] Outputs -> {list(outputs.keys())}") + +#
Create output directories +for output_type, path in outputs.items(): + if output_type != 'analysis' and path: + os.makedirs(path, exist_ok=True) + print(f"[Runner] Created {output_type} directory: {path}") + +# Add output paths to params +params['save_results'] = True + +if 'analysis' in outputs: + params['output_path'] = outputs['analysis'] + params['Output_Path'] = outputs['analysis'] + params['Output_File'] = outputs['analysis'] + +if 'DataFrames' in outputs: + df_dir = outputs['DataFrames'] + params['output_dir'] = df_dir + params['Export_Dir'] = df_dir + params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv') + +if 'figures' in outputs: + fig_dir = outputs['figures'] + params['figure_dir'] = fig_dir + params['Figure_Dir'] = fig_dir + params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png') + +if 'html' in outputs: + html_dir = outputs['html'] + params['html_dir'] = html_dir + params['Output_File'] = os.path.join(html_dir, f'{template_name}.html') + +# Save runtime parameters +with open('params.runtime.json', 'w') as f: + json.dump(params, f, indent=2) + +# Save clean params for Galaxy display +params_display = {k: v for k, v in params.items() + if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']} +with open('config_used.json', 'w') as f: + json.dump(params_display, f, indent=2) + +print(f"[Runner] Saved runtime parameters") + +# ============================================================================ +# LOAD AND EXECUTE TEMPLATE +# ============================================================================ + +# Try to import from installed package first (Docker environment) +template_module_name = template_filename.replace('.py', '') +try: + import importlib + mod = importlib.import_module(f'spac.templates.{template_module_name}') + print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}") +except (ImportError, ModuleNotFoundError): + # Fallback to loading from file + 
print(f"[Runner] Package import failed, trying file load") + import importlib.util + + # Standard locations + template_paths = [ + f'/app/spac/templates/{template_filename}', + f'/opt/spac/templates/{template_filename}', + f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}', + template_filename # Current directory + ] + + spec = None + for path in template_paths: + if os.path.exists(path): + spec = importlib.util.spec_from_file_location("template_mod", path) + if spec: + print(f"[Runner] Found template at: {path}") + break + + if not spec or not spec.loader: + print(f"[Runner] ERROR: Could not find template: {template_filename}") + sys.exit(1) + + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + +# Verify run_from_json exists +if not hasattr(mod, 'run_from_json'): + print('[Runner] ERROR: Template missing run_from_json function') + sys.exit(2) + +# Check function signature +sig = inspect.signature(mod.run_from_json) +kwargs = {} + +if 'save_results' in sig.parameters: + kwargs['save_results'] = True +if 'show_plot' in sig.parameters: + kwargs['show_plot'] = False + +print(f"[Runner] Executing template with kwargs: {kwargs}") + +# Execute template +try: + result = mod.run_from_json('params.runtime.json', **kwargs) + print(f"[Runner] Template completed, returned: {type(result).__name__}") + + # Handle different return types + if result is not None: + if isinstance(result, dict): + print(f"[Runner] Template saved files: {list(result.keys())}") + elif isinstance(result, tuple): + # Handle tuple returns + saved_count = 0 + for i, item in enumerate(result): + if hasattr(item, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png') + item.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(item) + saved_count += 1 + print(f"[Runner] Saved figure to {fig_path}") + elif hasattr(item, 'to_csv') and 
'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv') + item.to_csv(df_path, index=True) + saved_count += 1 + print(f"[Runner] Saved DataFrame to {df_path}") + + if saved_count > 0: + print(f"[Runner] Saved {saved_count} in-memory results") + + elif hasattr(result, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], 'output.csv') + result.to_csv(df_path, index=True) + print(f"[Runner] Saved DataFrame to {df_path}") + + elif hasattr(result, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], 'figure.png') + result.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(result) + print(f"[Runner] Saved figure to {fig_path}") + + elif hasattr(result, 'write_h5ad') and 'analysis' in outputs: + result.write_h5ad(outputs['analysis']) + print(f"[Runner] Saved AnnData to {outputs['analysis']}") + +except Exception as e: + print(f"[Runner] ERROR in template execution: {e}") + print(f"[Runner] Error type: {type(e).__name__}") + traceback.print_exc() + + # Debug help for common issues + if "String Columns must be a *list*" in str(e): + print("\n[Runner] DEBUG: String_Columns validation failed") + print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}") + print(f"[Runner] Type: {type(params.get('String_Columns'))}") + + elif "regex pattern" in str(e).lower() or "^8$" in str(e): + print("\n[Runner] DEBUG: This appears to be a column index issue") + print("[Runner] Check that column indices were properly converted to names") + print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze')) + print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex')) + + sys.exit(1) + +# Verify outputs +print("[Runner] Verifying outputs...") +found_outputs = False + +for output_type, path in outputs.items(): + if output_type == 'analysis': + if 
os.path.exists(path): + size = os.path.getsize(path) + print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)") + found_outputs = True + else: + print(f"[Runner] ✗ {output_type}: NOT FOUND") + else: + if os.path.exists(path) and os.path.isdir(path): + files = os.listdir(path) + if files: + print(f"[Runner] ✔ {output_type}: {len(files)} files") + for f in files[:3]: + print(f"[Runner] - {f}") + if len(files) > 3: + print(f"[Runner] ... and {len(files)-3} more") + found_outputs = True + else: + print(f"[Runner] ⚠ {output_type}: directory empty") + +# Check for files in working directory and move them +print("[Runner] Checking for files in working directory...") +for file in os.listdir('.'): + if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json', + 'tool_stdout.txt', 'outputs_returned.json']: + continue + + if file.endswith('.csv') and 'DataFrames' in outputs: + if not os.path.exists(os.path.join(outputs['DataFrames'], file)): + target = os.path.join(outputs['DataFrames'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs: + if not os.path.exists(os.path.join(outputs['figures'], file)): + target = os.path.join(outputs['figures'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + +if found_outputs: + print("[Runner] === SUCCESS ===") +else: + print("[Runner] WARNING: No outputs created") + +PYTHON_RUNNER + +EXIT_CODE=$? 
+ +if [ $EXIT_CODE -ne 0 ]; then + echo "ERROR: Template execution failed with exit code $EXIT_CODE" + exit 1 +fi + +echo "=== Execution Complete ===" +exit 0 \ No newline at end of file diff --git a/galaxy_tools/spac_boxplot/spac_boxplot.xml b/galaxy_tools/spac_boxplot/spac_boxplot.xml new file mode 100644 index 00000000..18d80004 --- /dev/null +++ b/galaxy_tools/spac_boxplot/spac_boxplot.xml @@ -0,0 +1,92 @@ + + Create a boxplot visualization of the features in the analysis dataset. + + + nciccbr/spac:v1 + + + + python3 + + + tool_stdout.txt && + + ## Run the universal wrapper (template name without .py extension) + bash $__tool_directory__/run_spac_template.sh "$params_json" boxplot + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} + } + + + \ No newline at end of file diff --git a/galaxy_tools/spac_load_csv_files/run_spac_template.sh b/galaxy_tools/spac_load_csv_files/run_spac_template.sh new file mode 100644 index 00000000..4ec7c784 --- /dev/null +++ b/galaxy_tools/spac_load_csv_files/run_spac_template.sh @@ -0,0 +1,786 @@ +#!/usr/bin/env bash +# run_spac_template.sh - SPAC wrapper with column index conversion +# Version: 5.5.0 - Fixed load_csv_files to output single CSV +set -euo pipefail + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_BASE="${2:?Missing template base name}" + +# Handle both base names and full .py filenames +if [[ "$TEMPLATE_BASE" == *.py ]]; then + TEMPLATE_PY="$TEMPLATE_BASE" +elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then + TEMPLATE_PY="load_csv_files_with_config.py" +else + TEMPLATE_PY="${TEMPLATE_BASE}_template.py" +fi + +# Use SPAC Python environment +SPAC_PYTHON="${SPAC_PYTHON:-python3}" + +echo "=== SPAC Template Wrapper v5.5 ===" +echo "Parameters: $PARAMS_JSON" +echo "Template base: 
$TEMPLATE_BASE" +echo "Template file: $TEMPLATE_PY" +echo "Python: $SPAC_PYTHON" + +# Run template through Python +"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt +import json +import os +import sys +import copy +import traceback +import inspect +import shutil +import re +import csv + +# Get arguments +params_path = sys.argv[1] +template_filename = sys.argv[2] + +print(f"[Runner] Loading parameters from: {params_path}") +print(f"[Runner] Template: {template_filename}") + +# Load parameters +with open(params_path, 'r') as f: + params = json.load(f) + +# Extract template name +template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '') + +# =========================================================================== +# DE-SANITIZATION AND PARSING +# =========================================================================== +def _unsanitize(s: str) -> str: + """Remove Galaxy's parameter sanitization tokens""" + if not isinstance(s, str): + return s + replacements = { + '__ob__': '[', '__cb__': ']', + '__oc__': '{', '__cc__': '}', + '__dq__': '"', '__sq__': "'", + '__gt__': '>', '__lt__': '<', + '__cn__': '\n', '__cr__': '\r', + '__tc__': '\t', '__pd__': '#', + '__at__': '@', '__cm__': ',' + } + for token, char in replacements.items(): + s = s.replace(token, char) + return s + +def _maybe_parse(v): + """Recursively de-sanitize and JSON-parse strings where possible.""" + if isinstance(v, str): + u = _unsanitize(v).strip() + if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')): + try: + return json.loads(u) + except Exception: + return u + return u + elif isinstance(v, dict): + return {k: _maybe_parse(val) for k, val in v.items()} + elif isinstance(v, list): + return [_maybe_parse(item) for item in v] + return v + +# Normalize the whole params tree +params = _maybe_parse(params) + +# 
=========================================================================== +# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS +# =========================================================================== +def should_skip_column_conversion(template_name): + """Some templates don't need column index conversion""" + return 'load_csv' in template_name + +def read_file_headers(filepath): + """Read column headers from various file formats""" + try: + import pandas as pd + + # Try pandas auto-detect + try: + df = pd.read_csv(filepath, nrows=1) + if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'): + columns = df.columns.tolist() + print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns") + return columns + except: + pass + + # Try common delimiters + for sep in ['\t', ',', ';', '|', ' ']: + try: + df = pd.read_csv(filepath, sep=sep, nrows=1) + if len(df.columns) > 1: + columns = df.columns.tolist() + sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon', + '|': 'pipe', ' ': 'space'}.get(sep, sep) + print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns") + return columns + except: + continue + except ImportError: + print("[Runner] pandas not available, using csv fallback") + + # CSV module fallback + try: + with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f: + sample = f.read(8192) + f.seek(0) + + try: + dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ') + reader = csv.reader(f, dialect) + header = next(reader) + columns = [h.strip().strip('"') for h in header if h.strip()] + if columns: + print(f"[Runner] csv.Sniffer detected {len(columns)} columns") + return columns + except: + f.seek(0) + first_line = f.readline().strip() + for sep in ['\t', ',', ';', '|']: + if sep in first_line: + columns = [h.strip().strip('"') for h in first_line.split(sep)] + if len(columns) > 1: + print(f"[Runner] Manual parsing found {len(columns)} columns") + return columns + except 
Exception as e: + print(f"[Runner] Failed to read headers: {e}") + + return None + +def should_convert_param(key, value): + """Check if parameter contains column indices""" + if value is None or value == "" or value == [] or value == {}: + return False + + key_lower = key.lower() + + # Skip String_Columns - it's names not indices + if key == 'String_Columns': + return False + + # Skip output/path parameters + if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']): + return False + + # Skip regex/pattern parameters (but we'll handle Feature_Regex specially) + if 'regex' in key_lower or 'pattern' in key_lower: + return False + + # Parameters with 'column' likely have indices + if 'column' in key_lower or '_col' in key_lower: + return True + + # Known index parameters + if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}: + return True + + # Check if values look like indices + if isinstance(value, list): + return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v) + elif isinstance(value, (int, str)): + return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit()) + + return False + +def convert_single_index(item, columns): + """Convert a single column index to name""" + if isinstance(item, str) and not item.strip().isdigit(): + return item + + try: + if isinstance(item, str): + item = int(item.strip()) + elif isinstance(item, float): + item = int(item) + except (ValueError, AttributeError): + return item + + if isinstance(item, int): + idx = item - 1 # Galaxy uses 1-based indexing + if 0 <= idx < len(columns): + return columns[idx] + elif 0 <= item < len(columns): # Fallback for 0-based + print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}") + return columns[item] + else: + print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)") + + return item + +def 
convert_column_indices_to_names(params, template_name): + """Convert column indices to names for templates that need it""" + + if should_skip_column_conversion(template_name): + print(f"[Runner] Skipping column conversion for {template_name}") + return params + + print(f"[Runner] Checking for column index conversion (template: {template_name})") + + # Find input file + input_file = None + input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files', + 'Input_File', 'Input_Dataset', 'Data_File'] + + for key in input_keys: + if key in params: + value = params[key] + if isinstance(value, list) and value: + value = value[0] + if value and os.path.exists(str(value)): + input_file = str(value) + print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}") + break + + if not input_file: + print("[Runner] No input file found for column conversion") + return params + + # Read headers + columns = read_file_headers(input_file) + if not columns: + print("[Runner] Could not read column headers, skipping conversion") + return params + + print(f"[Runner] Successfully read {len(columns)} columns") + if len(columns) <= 10: + print(f"[Runner] Columns: {columns}") + else: + print(f"[Runner] First 10 columns: {columns[:10]}") + + # Convert indices to names + converted_count = 0 + for key, value in params.items(): + # Skip non-column parameters + if not should_convert_param(key, value): + continue + + # Convert indices + if isinstance(value, list): + converted_items = [] + for item in value: + converted = convert_single_index(item, columns) + if converted is not None: + converted_items.append(converted) + converted_value = converted_items + else: + converted_value = convert_single_index(value, columns) + + if value != converted_value: + params[key] = converted_value + converted_count += 1 + print(f"[Runner] Converted {key}: {value} -> {converted_value}") + + if converted_count > 0: + print(f"[Runner] Total conversions: {converted_count} parameters") + + # 
CRITICAL: Handle Feature_Regex specially + if 'Feature_Regex' in params: + value = params['Feature_Regex'] + if value in [[], [""], "__ob____cb__", "[]", "", None]: + params['Feature_Regex'] = "" + print("[Runner] Cleared empty Feature_Regex parameter") + elif isinstance(value, list) and value: + params['Feature_Regex'] = "|".join(str(v) for v in value if v) + print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}") + + return params + +# =========================================================================== +# APPLY COLUMN CONVERSION +# =========================================================================== +print("[Runner] Step 1: Converting column indices to names") +params = convert_column_indices_to_names(params, template_name) + +# =========================================================================== +# SPECIAL HANDLING FOR SPECIFIC TEMPLATES +# =========================================================================== + +# Helper function to coerce singleton lists to strings for load_csv +def _coerce_singleton_paths_for_load_csv(params, template_name): + """For load_csv templates, flatten 1-item lists to strings for path-like params.""" + if 'load_csv' not in template_name: + return params + for key in ('CSV_Files', 'CSV_Files_Configuration'): + val = params.get(key) + if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)): + params[key] = val[0] + print(f"[Runner] Coerced {key} from list -> string") + return params + +# Special handling for String_Columns in load_csv templates +if 'load_csv' in template_name and 'String_Columns' in params: + value = params['String_Columns'] + if not isinstance(value, list): + if value in [None, "", "[]", "__ob____cb__"]: + params['String_Columns'] = [] + elif isinstance(value, str): + s = value.strip() + if s.startswith('[') and s.endswith(']'): + try: + params['String_Columns'] = json.loads(s) + except: + params['String_Columns'] = [s] if s else [] + elif ',' in s: + 
+ params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()] + else: + params['String_Columns'] = [s] if s else [] + else: + params['String_Columns'] = [] + print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}") + +# Apply coercion for load_csv files +params = _coerce_singleton_paths_for_load_csv(params, template_name) + +# Fix for Load CSV Files directory +if 'load_csv' in template_name and 'CSV_Files' in params: + # Check if csv_input_dir was created by Galaxy command + if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'): + params['CSV_Files'] = 'csv_input_dir' + print("[Runner] Using csv_input_dir created by Galaxy") + elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']): + # Single file path: use its directory (abspath avoids '' for a bare filename in the cwd) + params['CSV_Files'] = os.path.dirname(os.path.abspath(params['CSV_Files'])) + print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}") + +# =========================================================================== +# LIST PARAMETER NORMALIZATION +# =========================================================================== +def should_normalize_as_list(key, value): + """Determine if a parameter should be normalized as a list""" + if isinstance(value, list): + return True + + if value is None or value == "": + return False + + key_lower = key.lower() + + # Skip regex parameters + if 'regex' in key_lower or 'pattern' in key_lower: + return False + + # Skip known single-value parameters + if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']): + return False + + # Plural forms suggest lists + if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations', + 'columns', 'types', 'labels', 'regions', 'radii']): + return True + + # Check for list separators + if isinstance(value, str): + if ',' in value or '\n' in value: + return True + if value.strip().startswith('[') and value.strip().endswith(']'): + return
True + + return False + +def normalize_to_list(value): + """Convert various input formats to a proper Python list""" + if value in (None, "", "All", ["All"], "all", ["all"]): + return ["All"] + + if isinstance(value, list): + return value + + if isinstance(value, str): + s = value.strip() + + # Try JSON parsing + if s.startswith('[') and s.endswith(']'): + try: + parsed = json.loads(s) + return parsed if isinstance(parsed, list) else [str(parsed)] + except: + pass + + # Split by comma + if ',' in s: + return [item.strip() for item in s.split(',') if item.strip()] + + # Split by newline + if '\n' in s: + return [item.strip() for item in s.split('\n') if item.strip()] + + # Single value + return [s] if s else [] + + return [value] if value is not None else [] + +# Normalize list parameters +print("[Runner] Step 2: Normalizing list parameters") +list_count = 0 +for key, value in list(params.items()): + if should_normalize_as_list(key, value): + original = value + normalized = normalize_to_list(value) + if original != normalized: + params[key] = normalized + list_count += 1 + if len(str(normalized)) > 100: + print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items") + else: + print(f"[Runner] Normalized {key}: {original} -> {normalized}") + +if list_count > 0: + print(f"[Runner] Normalized {list_count} list parameters") + +# CRITICAL FIX: Handle single-element lists for coordinate columns +# These should be strings, not lists +coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid'] +for key in coordinate_keys: + if key in params: + value = params[key] + if isinstance(value, list) and len(value) == 1: + params[key] = value[0] + print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}") + +# Also check for any key ending with '_Column' that has a single-element list +for key in list(params.keys()): + if key.endswith('_Column') and isinstance(params[key], list) and 
len(params[key]) == 1: + original = params[key] + params[key] = params[key][0] + print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}") + +# =========================================================================== +# OUTPUTS HANDLING +# =========================================================================== + +# Extract outputs specification +raw_outputs = params.pop('outputs', {}) +outputs = {} + +if isinstance(raw_outputs, dict): + outputs = raw_outputs +elif isinstance(raw_outputs, str): + try: + maybe = json.loads(_unsanitize(raw_outputs)) + if isinstance(maybe, dict): + outputs = maybe + except Exception: + pass + +if not isinstance(outputs, dict) or not outputs: + print("[Runner] Warning: 'outputs' missing or not a dict; using defaults") + if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name: + outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'} + elif 'load_csv' in template_name: + outputs = {'DataFrames': 'dataframe_folder'} + elif 'interactive' in template_name: + outputs = {'html': 'html_folder'} + else: + outputs = {'analysis': 'transform_output.pickle'} + +print(f"[Runner] Outputs -> {list(outputs.keys())}") + +# Create output directories +for output_type, path in outputs.items(): + if output_type != 'analysis' and path: + os.makedirs(path, exist_ok=True) + print(f"[Runner] Created {output_type} directory: {path}") + +# Add output paths to params +params['save_results'] = True + +if 'analysis' in outputs: + params['output_path'] = outputs['analysis'] + params['Output_Path'] = outputs['analysis'] + params['Output_File'] = outputs['analysis'] + +if 'DataFrames' in outputs: + df_dir = outputs['DataFrames'] + params['output_dir'] = df_dir + params['Export_Dir'] = df_dir + # For load_csv, use a specific filename for the combined dataframe + if 'load_csv' in template_name: + params['Output_File'] = os.path.join(df_dir, 'combined_dataframe.csv') + else: + 
params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv') + +if 'figures' in outputs: + fig_dir = outputs['figures'] + params['figure_dir'] = fig_dir + params['Figure_Dir'] = fig_dir + params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png') + +if 'html' in outputs: + html_dir = outputs['html'] + params['html_dir'] = html_dir + params['Output_File'] = os.path.join(html_dir, f'{template_name}.html') + +# Save runtime parameters +with open('params.runtime.json', 'w') as f: + json.dump(params, f, indent=2) + +# Save clean params for Galaxy display +params_display = {k: v for k, v in params.items() + if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']} +with open('config_used.json', 'w') as f: + json.dump(params_display, f, indent=2) + +print(f"[Runner] Saved runtime parameters") + +# ============================================================================ +# LOAD AND EXECUTE TEMPLATE +# ============================================================================ + +# Try to import from installed package first (Docker environment) +template_module_name = template_filename.replace('.py', '') +try: + import importlib + mod = importlib.import_module(f'spac.templates.{template_module_name}') + print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}") +except (ImportError, ModuleNotFoundError): + # Fallback to loading from file + print(f"[Runner] Package import failed, trying file load") + import importlib.util + + # Standard locations + template_paths = [ + f'/app/spac/templates/{template_filename}', + f'/opt/spac/templates/{template_filename}', + f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}', + template_filename # Current directory + ] + + spec = None + for path in template_paths: + if os.path.exists(path): + spec = importlib.util.spec_from_file_location("template_mod", path) + if spec: + print(f"[Runner] Found template at: {path}") + break + + if not spec or not spec.loader: 
+ print(f"[Runner] ERROR: Could not find template: {template_filename}") + sys.exit(1) + + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + +# Verify run_from_json exists +if not hasattr(mod, 'run_from_json'): + print('[Runner] ERROR: Template missing run_from_json function') + sys.exit(2) + +# Check function signature +sig = inspect.signature(mod.run_from_json) +kwargs = {} + +if 'save_results' in sig.parameters: + kwargs['save_results'] = True +if 'show_plot' in sig.parameters: + kwargs['show_plot'] = False + +print(f"[Runner] Executing template with kwargs: {kwargs}") + +# Execute template +try: + result = mod.run_from_json('params.runtime.json', **kwargs) + print(f"[Runner] Template completed, returned: {type(result).__name__}") + + # =========================================================================== + # SPECIAL HANDLING FOR LOAD_CSV_FILES TEMPLATE + # =========================================================================== + if 'load_csv' in template_name: + print("[Runner] Special handling for load_csv_files template") + + # The template should return a DataFrame or save CSV files + if result is not None: + try: + import pandas as pd + + # If result is a DataFrame, save it directly + if hasattr(result, 'to_csv'): + output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv') + result.to_csv(output_path, index=False, header=True) + print(f"[Runner] Saved combined DataFrame to {output_path}") + + # If result is a dict of DataFrames, combine them + elif isinstance(result, dict): + dfs = [] + for name, df in result.items(): + if hasattr(df, 'to_csv'): + # Add a source column to track origin + df['_source_file'] = name + dfs.append(df) + + if dfs: + combined = pd.concat(dfs, ignore_index=True) + output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv') + combined.to_csv(output_path, index=False, header=True) + print(f"[Runner] Combined 
{len(dfs)} DataFrames into {output_path}") + except Exception as e: + print(f"[Runner] Could not combine DataFrames: {e}") + + # Check if CSV files were saved in the dataframe folder + df_dir = outputs.get('DataFrames', 'dataframe_folder') + if os.path.exists(df_dir): + csv_files = [f for f in os.listdir(df_dir) if f.endswith('.csv')] + + # If we have multiple CSV files but no combined_dataframe.csv, create it + if len(csv_files) > 1 and 'combined_dataframe.csv' not in csv_files: + try: + import pandas as pd + dfs = [] + for csv_file in csv_files: + filepath = os.path.join(df_dir, csv_file) + df = pd.read_csv(filepath) + df['_source_file'] = csv_file.replace('.csv', '') + dfs.append(df) + + combined = pd.concat(dfs, ignore_index=True) + output_path = os.path.join(df_dir, 'combined_dataframe.csv') + combined.to_csv(output_path, index=False, header=True) + print(f"[Runner] Combined {len(csv_files)} CSV files into {output_path}") + except Exception as e: + print(f"[Runner] Could not combine CSV files: {e}") + # If combination fails, fall back to copying the first CSV + if csv_files: + src = os.path.join(df_dir, csv_files[0]) + dst = os.path.join(df_dir, 'combined_dataframe.csv') + shutil.copy2(src, dst) + print(f"[Runner] Copied {csv_files[0]} to combined_dataframe.csv") + + # If we have exactly one CSV file and it's not named combined_dataframe.csv, rename it + elif len(csv_files) == 1 and csv_files[0] != 'combined_dataframe.csv': + src = os.path.join(df_dir, csv_files[0]) + dst = os.path.join(df_dir, 'combined_dataframe.csv') + shutil.move(src, dst) + print(f"[Runner] Renamed {csv_files[0]} to combined_dataframe.csv") + + # =========================================================================== + # HANDLE OTHER RETURN TYPES + # =========================================================================== + elif result is not None: + if isinstance(result, dict): + print(f"[Runner] Template saved files: {list(result.keys())}") + elif isinstance(result, tuple): + # Handle 
tuple returns + saved_count = 0 + for i, item in enumerate(result): + if hasattr(item, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png') + item.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(item) + saved_count += 1 + print(f"[Runner] Saved figure to {fig_path}") + elif hasattr(item, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv') + item.to_csv(df_path, index=True) + saved_count += 1 + print(f"[Runner] Saved DataFrame to {df_path}") + + if saved_count > 0: + print(f"[Runner] Saved {saved_count} in-memory results") + + elif hasattr(result, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], 'output.csv') + result.to_csv(df_path, index=False, header=True) + print(f"[Runner] Saved DataFrame to {df_path}") + + elif hasattr(result, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], 'figure.png') + result.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(result) + print(f"[Runner] Saved figure to {fig_path}") + + elif hasattr(result, 'write_h5ad') and 'analysis' in outputs: + result.write_h5ad(outputs['analysis']) + print(f"[Runner] Saved AnnData to {outputs['analysis']}") + +except Exception as e: + print(f"[Runner] ERROR in template execution: {e}") + print(f"[Runner] Error type: {type(e).__name__}") + traceback.print_exc() + + # Debug help for common issues + if "String Columns must be a *list*" in str(e): + print("\n[Runner] DEBUG: String_Columns validation failed") + print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}") + print(f"[Runner] Type: {type(params.get('String_Columns'))}") + + elif "regex pattern" in str(e).lower() or "^8$" in str(e): + print("\n[Runner] DEBUG: This appears to 
be a column index issue") + print("[Runner] Check that column indices were properly converted to names") + print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze')) + print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex')) + + sys.exit(1) + +# Verify outputs +print("[Runner] Verifying outputs...") +found_outputs = False + +for output_type, path in outputs.items(): + if output_type == 'analysis': + if os.path.exists(path): + size = os.path.getsize(path) + print(f"[Runner] ✓ {output_type}: {path} ({size:,} bytes)") + found_outputs = True + else: + print(f"[Runner] ✗ {output_type}: NOT FOUND") + else: + if os.path.exists(path) and os.path.isdir(path): + files = os.listdir(path) + if files: + print(f"[Runner] ✓ {output_type}: {len(files)} files") + for f in files[:3]: + print(f"[Runner] - {f}") + if len(files) > 3: + print(f"[Runner] ... and {len(files)-3} more") + found_outputs = True + else: + print(f"[Runner] ⚠ {output_type}: directory empty") + +# Check for files in working directory and move them +print("[Runner] Checking for files in working directory...") +for file in os.listdir('.'): + if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json', + 'tool_stdout.txt', 'outputs_returned.json']: + continue + + if file.endswith('.csv') and 'DataFrames' in outputs: + if not os.path.exists(os.path.join(outputs['DataFrames'], file)): + target = os.path.join(outputs['DataFrames'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs: + if not os.path.exists(os.path.join(outputs['figures'], file)): + target = os.path.join(outputs['figures'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + +if found_outputs: + print("[Runner] === SUCCESS ===") +else: + print("[Runner] WARNING: No outputs created") + +PYTHON_RUNNER + 
+EXIT_CODE=$? + +if [ $EXIT_CODE -ne 0 ]; then + echo "ERROR: Template execution failed with exit code $EXIT_CODE" + exit 1 +fi + +echo "=== Execution Complete ===" +exit 0 \ No newline at end of file diff --git a/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml b/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml new file mode 100644 index 00000000..ec185659 --- /dev/null +++ b/galaxy_tools/spac_load_csv_files/spac_load_csv_files.xml @@ -0,0 +1,89 @@ + + Load CSV files from NIDAP dataset and combine them into a single pandas dataframe for downstream ... + + + nciccbr/spac:v1 + + + + python3 + + + tool_stdout.txt && + + ## Run the universal wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" load_csv_files_with_config + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} + } + + + \ No newline at end of file diff --git a/galaxy_tools/spac_setup_analysis/run_spac_template.sh b/galaxy_tools/spac_setup_analysis/run_spac_template.sh new file mode 100644 index 00000000..15d7afee --- /dev/null +++ b/galaxy_tools/spac_setup_analysis/run_spac_template.sh @@ -0,0 +1,849 @@ +#!/usr/bin/env bash +# run_spac_template.sh - SPAC wrapper with column index conversion +# Version: 5.5.0 - Enhanced text input handling for setup_analysis +set -euo pipefail + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_BASE="${2:?Missing template base name}" + +# Handle both base names and full .py filenames +if [[ "$TEMPLATE_BASE" == *.py ]]; then + TEMPLATE_PY="$TEMPLATE_BASE" +elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then + TEMPLATE_PY="load_csv_files_with_config.py" +else + TEMPLATE_PY="${TEMPLATE_BASE}_template.py" +fi + +# Use SPAC Python environment +SPAC_PYTHON="${SPAC_PYTHON:-python3}" + +echo "=== SPAC Template Wrapper v5.5 ===" +echo "Parameters: 
$PARAMS_JSON" +echo "Template base: $TEMPLATE_BASE" +echo "Template file: $TEMPLATE_PY" +echo "Python: $SPAC_PYTHON" + +# Run template through Python +"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt +import json +import os +import sys +import copy +import traceback +import inspect +import shutil +import re +import csv + +# Get arguments +params_path = sys.argv[1] +template_filename = sys.argv[2] + +print(f"[Runner] Loading parameters from: {params_path}") +print(f"[Runner] Template: {template_filename}") + +# Load parameters +with open(params_path, 'r') as f: + params = json.load(f) + +# Extract template name +template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '') + +# =========================================================================== +# DE-SANITIZATION AND PARSING +# =========================================================================== +def _unsanitize(s: str) -> str: + """Remove Galaxy's parameter sanitization tokens""" + if not isinstance(s, str): + return s + replacements = { + '__ob__': '[', '__cb__': ']', + '__oc__': '{', '__cc__': '}', + '__dq__': '"', '__sq__': "'", + '__gt__': '>', '__lt__': '<', + '__cn__': '\n', '__cr__': '\r', + '__tc__': '\t', '__pd__': '#', + '__at__': '@', '__cm__': ',' + } + for token, char in replacements.items(): + s = s.replace(token, char) + return s + +def _maybe_parse(v): + """Recursively de-sanitize and JSON-parse strings where possible.""" + if isinstance(v, str): + u = _unsanitize(v).strip() + if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')): + try: + return json.loads(u) + except Exception: + return u + return u + elif isinstance(v, dict): + return {k: _maybe_parse(val) for k, val in v.items()} + elif isinstance(v, list): + return [_maybe_parse(item) for item in v] + return v + +# Normalize the whole params tree +params = _maybe_parse(params) + +# 
=========================================================================== +# SETUP ANALYSIS SPECIAL HANDLING - Process text inputs before column conversion +# =========================================================================== +def process_setup_analysis_text_inputs(params, template_name): + """Process text-based column inputs for setup_analysis template""" + if 'setup_analysis' not in template_name: + return params + + print("[Runner] Processing setup_analysis text inputs") + + # Handle X_centroid and Y_centroid (single text values) + for coord_key in ['X_centroid', 'Y_centroid']: + if coord_key in params: + value = params[coord_key] + if isinstance(value, list) and len(value) == 1: + params[coord_key] = value[0] + # Ensure it's a string (use the unwrapped value, not the original list) + if params[coord_key]: + params[coord_key] = str(params[coord_key]).strip() + print(f"[Runner] {coord_key} = '{params[coord_key]}'") + + # Handle Annotation_s_ (text area, can be comma-separated or newline-separated) + if 'Annotation_s_' in params: + value = params['Annotation_s_'] + if value: + # Convert to list if it's a string + if isinstance(value, str): + # Check for comma separation first, then newline + if ',' in value: + items = [item.strip() for item in value.split(',') if item.strip()] + elif '\n' in value: + items = [item.strip() for item in value.split('\n') if item.strip()] + else: + # Single value + items = [value.strip()] if value.strip() else [] + params['Annotation_s_'] = items + print(f"[Runner] Parsed Annotation_s_: {len(items)} items -> {items}") + elif not isinstance(value, list): + params['Annotation_s_'] = [] + else: + params['Annotation_s_'] = [] + + # Handle Feature_s_ (text area, can be comma-separated or newline-separated) + if 'Feature_s_' in params: + value = params['Feature_s_'] + if value: + # Convert to list if it's a string + if isinstance(value, str): + # Check for comma separation first, then newline + if ',' in value: + items = [item.strip() for item in value.split(',') if item.strip()] + elif '\n' in value: 
+ items = [item.strip() for item in value.split('\n') if item.strip()] + else: + # Single value + items = [value.strip()] if value.strip() else [] + params['Feature_s_'] = items + print(f"[Runner] Parsed Feature_s_: {len(items)} items") + if len(items) <= 10: + print(f"[Runner] Features: {items}") + elif not isinstance(value, list): + params['Feature_s_'] = [] + else: + params['Feature_s_'] = [] + + # Handle Feature_Regex (optional text field) + if 'Feature_Regex' in params: + value = params['Feature_Regex'] + if value in [[], [""], "__ob____cb__", "[]", "", None]: + params['Feature_Regex'] = "" + elif isinstance(value, str): + params['Feature_Regex'] = value.strip() + print(f"[Runner] Feature_Regex = '{params.get('Feature_Regex', '')}'") + + return params + +# =========================================================================== +# COLUMN INDEX CONVERSION - For tools using column indices +# =========================================================================== +def should_skip_column_conversion(template_name): + """Some templates don't need column index conversion""" + # setup_analysis uses text inputs now, not indices + return 'load_csv' in template_name or 'setup_analysis' in template_name + +def read_file_headers(filepath): + """Read column headers from various file formats""" + try: + import pandas as pd + + # Try pandas auto-detect + try: + df = pd.read_csv(filepath, nrows=1) + if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'): + columns = df.columns.tolist() + print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns") + return columns + except: + pass + + # Try common delimiters + for sep in ['\t', ',', ';', '|', ' ']: + try: + df = pd.read_csv(filepath, sep=sep, nrows=1) + if len(df.columns) > 1: + columns = df.columns.tolist() + sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon', + '|': 'pipe', ' ': 'space'}.get(sep, sep) + print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} 
columns") + return columns + except: + continue + except ImportError: + print("[Runner] pandas not available, using csv fallback") + + # CSV module fallback + try: + with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f: + sample = f.read(8192) + f.seek(0) + + try: + dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ') + reader = csv.reader(f, dialect) + header = next(reader) + columns = [h.strip().strip('"') for h in header if h.strip()] + if columns: + print(f"[Runner] csv.Sniffer detected {len(columns)} columns") + return columns + except: + f.seek(0) + first_line = f.readline().strip() + for sep in ['\t', ',', ';', '|']: + if sep in first_line: + columns = [h.strip().strip('"') for h in first_line.split(sep)] + if len(columns) > 1: + print(f"[Runner] Manual parsing found {len(columns)} columns") + return columns + except Exception as e: + print(f"[Runner] Failed to read headers: {e}") + + return None + +def should_convert_param(key, value): + """Check if parameter contains column indices""" + if value is None or value == "" or value == [] or value == {}: + return False + + key_lower = key.lower() + + # Skip String_Columns - it's names not indices + if key == 'String_Columns': + return False + + # Skip the text-based parameters from setup_analysis + if key in ['X_centroid', 'Y_centroid', 'Annotation_s_', 'Feature_s_', 'Feature_Regex']: + return False + + # Skip output/path parameters + if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']): + return False + + # Skip regex/pattern parameters + if 'regex' in key_lower or 'pattern' in key_lower: + return False + + # Parameters with 'column' likely have indices + if 'column' in key_lower or '_col' in key_lower: + return True + + # Known index parameters (but not the text-based ones) + if key in {'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}: + return True + + # Check if values look like indices + if isinstance(value, 
list): + return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v) + elif isinstance(value, (int, str)): + return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit()) + + return False + +def convert_single_index(item, columns): + """Convert a single column index to name""" + if isinstance(item, str) and not item.strip().isdigit(): + return item + + try: + if isinstance(item, str): + item = int(item.strip()) + elif isinstance(item, float): + item = int(item) + except (ValueError, AttributeError): + return item + + if isinstance(item, int): + idx = item - 1 # Galaxy uses 1-based indexing + if 0 <= idx < len(columns): + return columns[idx] + elif 0 <= item < len(columns): # Fallback for 0-based + print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}") + return columns[item] + else: + print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)") + + return item + +def convert_column_indices_to_names(params, template_name): + """Convert column indices to names for templates that need it""" + + if should_skip_column_conversion(template_name): + print(f"[Runner] Skipping column conversion for {template_name}") + return params + + print(f"[Runner] Checking for column index conversion (template: {template_name})") + + # Find input file + input_file = None + input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files', + 'Input_File', 'Input_Dataset', 'Data_File'] + + for key in input_keys: + if key in params: + value = params[key] + if isinstance(value, list) and value: + value = value[0] + if value and os.path.exists(str(value)): + input_file = str(value) + print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}") + break + + if not input_file: + print("[Runner] No input file found for column conversion") + return params + + # Read headers + columns = read_file_headers(input_file) + if not columns: + print("[Runner] Could not read 
column headers, skipping conversion") + return params + + print(f"[Runner] Successfully read {len(columns)} columns") + if len(columns) <= 10: + print(f"[Runner] Columns: {columns}") + else: + print(f"[Runner] First 10 columns: {columns[:10]}") + + # Convert indices to names + converted_count = 0 + for key, value in params.items(): + # Skip non-column parameters + if not should_convert_param(key, value): + continue + + # Convert indices + if isinstance(value, list): + converted_items = [] + for item in value: + converted = convert_single_index(item, columns) + if converted is not None: + converted_items.append(converted) + converted_value = converted_items + else: + converted_value = convert_single_index(value, columns) + + if value != converted_value: + params[key] = converted_value + converted_count += 1 + print(f"[Runner] Converted {key}: {value} -> {converted_value}") + + if converted_count > 0: + print(f"[Runner] Total conversions: {converted_count} parameters") + + return params + +# =========================================================================== +# APPLY TEXT PROCESSING AND COLUMN CONVERSION +# =========================================================================== +print("[Runner] Step 1: Processing text inputs for setup_analysis") +params = process_setup_analysis_text_inputs(params, template_name) + +print("[Runner] Step 2: Converting column indices to names (if needed)") +params = convert_column_indices_to_names(params, template_name) + +# =========================================================================== +# SPECIAL HANDLING FOR SPECIFIC TEMPLATES +# =========================================================================== + +# Helper function to coerce singleton lists to strings for load_csv +def _coerce_singleton_paths_for_load_csv(params, template_name): + """For load_csv templates, flatten 1-item lists to strings for path-like params.""" + if 'load_csv' not in template_name: + return params + for key in ('CSV_Files', 
'CSV_Files_Configuration'): + val = params.get(key) + if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)): + params[key] = val[0] + print(f"[Runner] Coerced {key} from list -> string") + return params + +# Special handling for String_Columns in load_csv templates +if 'load_csv' in template_name and 'String_Columns' in params: + value = params['String_Columns'] + if not isinstance(value, list): + if value in [None, "", "[]", "__ob____cb__"]: + params['String_Columns'] = [] + elif isinstance(value, str): + s = value.strip() + if s.startswith('[') and s.endswith(']'): + try: + params['String_Columns'] = json.loads(s) + except: + params['String_Columns'] = [s] if s else [] + elif ',' in s: + params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()] + elif '\n' in s: + params['String_Columns'] = [item.strip() for item in s.split('\n') if item.strip()] + else: + params['String_Columns'] = [s] if s else [] + else: + params['String_Columns'] = [] + print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}") + +# Apply coercion for load_csv files +params = _coerce_singleton_paths_for_load_csv(params, template_name) + +# Fix for Load CSV Files directory +if 'load_csv' in template_name and 'CSV_Files' in params: + # Check if csv_input_dir was created by Galaxy command + if os.path.exists('csv_input_dir') and os.path.isdir('csv_input_dir'): + params['CSV_Files'] = 'csv_input_dir' + print("[Runner] Using csv_input_dir created by Galaxy") + elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']): + # We have a single file path, need to get its directory + params['CSV_Files'] = os.path.dirname(params['CSV_Files']) + print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}") + +# =========================================================================== +# LIST PARAMETER NORMALIZATION (for other tools) +# 
=========================================================================== +def should_normalize_as_list(key, value): + """Determine if a parameter should be normalized as a list""" + # Skip if already handled by text processing + if key in ['Annotation_s_', 'Feature_s_'] and 'setup_analysis' in template_name: + return False + + if isinstance(value, list): + return True + + if value is None or value == "": + return False + + key_lower = key.lower() + + # Skip regex parameters + if 'regex' in key_lower or 'pattern' in key_lower: + return False + + # Skip known single-value parameters + if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary', 'centroid']): + return False + + # Plural forms suggest lists + if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations', + 'columns', 'types', 'labels', 'regions', 'radii']): + return True + + # Check for list separators + if isinstance(value, str): + if ',' in value or '\n' in value: + return True + if value.strip().startswith('[') and value.strip().endswith(']'): + return True + + return False + +def normalize_to_list(value): + """Convert various input formats to a proper Python list""" + if value in (None, "", "All", ["All"], "all", ["all"]): + return ["All"] + + if isinstance(value, list): + return value + + if isinstance(value, str): + s = value.strip() + + # Try JSON parsing + if s.startswith('[') and s.endswith(']'): + try: + parsed = json.loads(s) + return parsed if isinstance(parsed, list) else [str(parsed)] + except: + pass + + # Split by comma + if ',' in s: + return [item.strip() for item in s.split(',') if item.strip()] + + # Split by newline + if '\n' in s: + return [item.strip() for item in s.split('\n') if item.strip()] + + # Single value + return [s] if s else [] + + return [value] if value is not None else [] + +# Normalize list parameters +print("[Runner] Step 3: Normalizing list parameters") +list_count = 0 +for key, value in list(params.items()): + if 
should_normalize_as_list(key, value): + original = value + normalized = normalize_to_list(value) + if original != normalized: + params[key] = normalized + list_count += 1 + if len(str(normalized)) > 100: + print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items") + else: + print(f"[Runner] Normalized {key}: {original} -> {normalized}") + +if list_count > 0: + print(f"[Runner] Normalized {list_count} list parameters") + +# =========================================================================== +# OUTPUTS HANDLING +# =========================================================================== + +# Extract outputs specification +raw_outputs = params.pop('outputs', {}) +outputs = {} + +if isinstance(raw_outputs, dict): + outputs = raw_outputs +elif isinstance(raw_outputs, str): + try: + maybe = json.loads(_unsanitize(raw_outputs)) + if isinstance(maybe, dict): + outputs = maybe + except Exception: + pass + +if not isinstance(outputs, dict) or not outputs: + print("[Runner] Warning: 'outputs' missing or not a dict; using defaults") + if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name: + outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'} + elif 'load_csv' in template_name: + outputs = {'DataFrames': 'dataframe_folder'} + elif 'interactive' in template_name: + outputs = {'html': 'html_folder'} + else: + outputs = {'analysis': 'transform_output.pickle'} + +print(f"[Runner] Outputs -> {list(outputs.keys())}") + +# Create output directories +for output_type, path in outputs.items(): + if output_type != 'analysis' and path: + os.makedirs(path, exist_ok=True) + print(f"[Runner] Created {output_type} directory: {path}") + +# Add output paths to params +params['save_results'] = True + +if 'analysis' in outputs: + params['output_path'] = outputs['analysis'] + params['Output_Path'] = outputs['analysis'] + params['Output_File'] = outputs['analysis'] + +if 'DataFrames' in outputs: 
+ df_dir = outputs['DataFrames'] + params['output_dir'] = df_dir + params['Export_Dir'] = df_dir + # For load_csv, use a specific filename for the combined dataframe + if 'load_csv' in template_name: + params['Output_File'] = os.path.join(df_dir, 'combined_dataframe.csv') + else: + params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv') + +if 'figures' in outputs: + fig_dir = outputs['figures'] + params['figure_dir'] = fig_dir + params['Figure_Dir'] = fig_dir + params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png') + +if 'html' in outputs: + html_dir = outputs['html'] + params['html_dir'] = html_dir + params['Output_File'] = os.path.join(html_dir, f'{template_name}.html') + +# Save runtime parameters +with open('params.runtime.json', 'w') as f: + json.dump(params, f, indent=2) + +# Save clean params for Galaxy display +params_display = {k: v for k, v in params.items() + if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']} +with open('config_used.json', 'w') as f: + json.dump(params_display, f, indent=2) + +print(f"[Runner] Saved runtime parameters") + +# ============================================================================ +# LOAD AND EXECUTE TEMPLATE +# ============================================================================ + +# Try to import from installed package first (Docker environment) +template_module_name = template_filename.replace('.py', '') +try: + import importlib + mod = importlib.import_module(f'spac.templates.{template_module_name}') + print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}") +except (ImportError, ModuleNotFoundError): + # Fallback to loading from file + print(f"[Runner] Package import failed, trying file load") + import importlib.util + + # Standard locations + template_paths = [ + f'/app/spac/templates/{template_filename}', + f'/opt/spac/templates/{template_filename}', + f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}', + 
template_filename # Current directory + ] + + spec = None + for path in template_paths: + if os.path.exists(path): + spec = importlib.util.spec_from_file_location("template_mod", path) + if spec: + print(f"[Runner] Found template at: {path}") + break + + if not spec or not spec.loader: + print(f"[Runner] ERROR: Could not find template: {template_filename}") + sys.exit(1) + + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + +# Verify run_from_json exists +if not hasattr(mod, 'run_from_json'): + print('[Runner] ERROR: Template missing run_from_json function') + sys.exit(2) + +# Check function signature +sig = inspect.signature(mod.run_from_json) +kwargs = {} + +if 'save_results' in sig.parameters: + kwargs['save_results'] = True +if 'show_plot' in sig.parameters: + kwargs['show_plot'] = False + +print(f"[Runner] Executing template with kwargs: {kwargs}") + +# Execute template +try: + result = mod.run_from_json('params.runtime.json', **kwargs) + print(f"[Runner] Template completed, returned: {type(result).__name__}") + + # =========================================================================== + # SPECIAL HANDLING FOR LOAD_CSV_FILES TEMPLATE + # =========================================================================== + if 'load_csv' in template_name: + print("[Runner] Special handling for load_csv_files template") + + # The template should return a DataFrame or save CSV files + if result is not None: + try: + import pandas as pd + + # If result is a DataFrame, save it directly + if hasattr(result, 'to_csv'): + output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv') + result.to_csv(output_path, index=False, header=True) + print(f"[Runner] Saved combined DataFrame to {output_path}") + + # If result is a dict of DataFrames, combine them + elif isinstance(result, dict): + dfs = [] + for name, df in result.items(): + if hasattr(df, 'to_csv'): + # Add a source column to track origin + 
df['_source_file'] = name + dfs.append(df) + + if dfs: + combined = pd.concat(dfs, ignore_index=True) + output_path = os.path.join(outputs.get('DataFrames', 'dataframe_folder'), 'combined_dataframe.csv') + combined.to_csv(output_path, index=False, header=True) + print(f"[Runner] Combined {len(dfs)} DataFrames into {output_path}") + except Exception as e: + print(f"[Runner] Could not combine DataFrames: {e}") + + # Check if CSV files were saved in the dataframe folder + df_dir = outputs.get('DataFrames', 'dataframe_folder') + if os.path.exists(df_dir): + csv_files = [f for f in os.listdir(df_dir) if f.endswith('.csv')] + + # If we have multiple CSV files but no combined_dataframe.csv, create it + if len(csv_files) > 1 and 'combined_dataframe.csv' not in csv_files: + try: + import pandas as pd + dfs = [] + for csv_file in csv_files: + filepath = os.path.join(df_dir, csv_file) + df = pd.read_csv(filepath) + df['_source_file'] = csv_file.replace('.csv', '') + dfs.append(df) + + combined = pd.concat(dfs, ignore_index=True) + output_path = os.path.join(df_dir, 'combined_dataframe.csv') + combined.to_csv(output_path, index=False, header=True) + print(f"[Runner] Combined {len(csv_files)} CSV files into {output_path}") + except Exception as e: + print(f"[Runner] Could not combine CSV files: {e}") + # If combination fails, fall back to copying the first CSV + if csv_files: + src = os.path.join(df_dir, csv_files[0]) + dst = os.path.join(df_dir, 'combined_dataframe.csv') + shutil.copy2(src, dst) + print(f"[Runner] Copied {csv_files[0]} to combined_dataframe.csv") + + # If we have exactly one CSV file and it's not named combined_dataframe.csv, rename it + elif len(csv_files) == 1 and csv_files[0] != 'combined_dataframe.csv': + src = os.path.join(df_dir, csv_files[0]) + dst = os.path.join(df_dir, 'combined_dataframe.csv') + shutil.move(src, dst) + print(f"[Runner] Renamed {csv_files[0]} to combined_dataframe.csv") + + # 
=========================================================================== + # HANDLE OTHER RETURN TYPES + # =========================================================================== + elif result is not None: + if isinstance(result, dict): + print(f"[Runner] Template saved files: {list(result.keys())}") + elif isinstance(result, tuple): + # Handle tuple returns + saved_count = 0 + for i, item in enumerate(result): + if hasattr(item, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png') + item.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(item) + saved_count += 1 + print(f"[Runner] Saved figure to {fig_path}") + elif hasattr(item, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv') + item.to_csv(df_path, index=True) + saved_count += 1 + print(f"[Runner] Saved DataFrame to {df_path}") + + if saved_count > 0: + print(f"[Runner] Saved {saved_count} in-memory results") + + elif hasattr(result, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], 'output.csv') + result.to_csv(df_path, index=False, header=True) + print(f"[Runner] Saved DataFrame to {df_path}") + + elif hasattr(result, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], 'figure.png') + result.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(result) + print(f"[Runner] Saved figure to {fig_path}") + + elif hasattr(result, 'write_h5ad') and 'analysis' in outputs: + result.write_h5ad(outputs['analysis']) + print(f"[Runner] Saved AnnData to {outputs['analysis']}") + +except Exception as e: + print(f"[Runner] ERROR in template execution: {e}") + print(f"[Runner] Error type: {type(e).__name__}") + traceback.print_exc() + + # Debug help for common issues + if 
"String Columns must be a *list*" in str(e): + print("\n[Runner] DEBUG: String_Columns validation failed") + print(f"[Runner] Current String_Columns value: {params.get('String_Columns')}") + print(f"[Runner] Type: {type(params.get('String_Columns'))}") + + elif "regex pattern" in str(e).lower() or "^8$" in str(e): + print("\n[Runner] DEBUG: This appears to be a column index issue") + print("[Runner] Check that column indices were properly converted to names") + print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze')) + print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex')) + + sys.exit(1) + +# Verify outputs +print("[Runner] Verifying outputs...") +found_outputs = False + +for output_type, path in outputs.items(): + if output_type == 'analysis': + if os.path.exists(path): + size = os.path.getsize(path) + print(f"[Runner] ✓ {output_type}: {path} ({size:,} bytes)") + found_outputs = True + else: + print(f"[Runner] ✗ {output_type}: NOT FOUND") + else: + if os.path.exists(path) and os.path.isdir(path): + files = os.listdir(path) + if files: + print(f"[Runner] ✓ {output_type}: {len(files)} files") + for f in files[:3]: + print(f"[Runner] - {f}") + if len(files) > 3: + print(f"[Runner] ... 
and {len(files)-3} more") + found_outputs = True + else: + print(f"[Runner] ⚠ {output_type}: directory empty") + +# Check for files in working directory and move them +print("[Runner] Checking for files in working directory...") +for file in os.listdir('.'): + if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json', + 'tool_stdout.txt', 'outputs_returned.json']: + continue + + if file.endswith('.csv') and 'DataFrames' in outputs: + if not os.path.exists(os.path.join(outputs['DataFrames'], file)): + target = os.path.join(outputs['DataFrames'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs: + if not os.path.exists(os.path.join(outputs['figures'], file)): + target = os.path.join(outputs['figures'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + +if found_outputs: + print("[Runner] === SUCCESS ===") +else: + print("[Runner] WARNING: No outputs created") + +PYTHON_RUNNER + +EXIT_CODE=$? + +if [ $EXIT_CODE -ne 0 ]; then + echo "ERROR: Template execution failed with exit code $EXIT_CODE" + exit 1 +fi + +echo "=== Execution Complete ===" +exit 0 \ No newline at end of file diff --git a/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml b/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml new file mode 100644 index 00000000..fefc6f95 --- /dev/null +++ b/galaxy_tools/spac_setup_analysis/spac_setup_analysis.xml @@ -0,0 +1,121 @@ + + Set up an analysis data object for downstream processing. 
+ + + nciccbr/spac:v1 + + + + python3 + + + tool_stdout.txt && + + ## Run the universal wrapper + bash $__tool_directory__/run_spac_template.sh "$params_json" setup_analysis + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} +} + + + \ No newline at end of file diff --git a/galaxy_tools/spac_zscore_normalization/run_spac_template.sh b/galaxy_tools/spac_zscore_normalization/run_spac_template.sh new file mode 100644 index 00000000..a93b2d6e --- /dev/null +++ b/galaxy_tools/spac_zscore_normalization/run_spac_template.sh @@ -0,0 +1,710 @@ +#!/usr/bin/env bash +# run_spac_template.sh - SPAC wrapper with column index conversion +# Version: 5.4.1 - Integrated column conversion +set -euo pipefail + +PARAMS_JSON="${1:?Missing params.json path}" +TEMPLATE_BASE="${2:?Missing template base name}" + +# Handle both base names and full .py filenames +if [[ "$TEMPLATE_BASE" == *.py ]]; then + TEMPLATE_PY="$TEMPLATE_BASE" +elif [[ "$TEMPLATE_BASE" == "load_csv_files_with_config" ]]; then + TEMPLATE_PY="load_csv_files_with_config.py" +else + TEMPLATE_PY="${TEMPLATE_BASE}_template.py" +fi + +# Use SPAC Python environment +SPAC_PYTHON="${SPAC_PYTHON:-python3}" + +echo "=== SPAC Template Wrapper v5.3 ===" +echo "Parameters: $PARAMS_JSON" +echo "Template base: $TEMPLATE_BASE" +echo "Template file: $TEMPLATE_PY" +echo "Python: $SPAC_PYTHON" + +# Run template through Python +"$SPAC_PYTHON" - <<'PYTHON_RUNNER' "$PARAMS_JSON" "$TEMPLATE_PY" 2>&1 | tee tool_stdout.txt +import json +import os +import sys +import copy +import traceback +import inspect +import shutil +import re +import csv + +# Get arguments +params_path = sys.argv[1] +template_filename = sys.argv[2] + +print(f"[Runner] Loading parameters from: {params_path}") +print(f"[Runner] Template: {template_filename}") + +# Load parameters +with 
open(params_path, 'r') as f: + params = json.load(f) + +# Extract template name +template_name = os.path.basename(template_filename).replace('_template.py', '').replace('.py', '') + +# =========================================================================== +# DE-SANITIZATION AND PARSING +# =========================================================================== +def _unsanitize(s: str) -> str: + """Remove Galaxy's parameter sanitization tokens""" + if not isinstance(s, str): + return s + replacements = { + '__ob__': '[', '__cb__': ']', + '__oc__': '{', '__cc__': '}', + '__dq__': '"', '__sq__': "'", + '__gt__': '>', '__lt__': '<', + '__cn__': '\n', '__cr__': '\r', + '__tc__': '\t', '__pd__': '#', + '__at__': '@', '__cm__': ',' + } + for token, char in replacements.items(): + s = s.replace(token, char) + return s + +def _maybe_parse(v): + """Recursively de-sanitize and JSON-parse strings where possible.""" + if isinstance(v, str): + u = _unsanitize(v).strip() + if (u.startswith('[') and u.endswith(']')) or (u.startswith('{') and u.endswith('}')): + try: + return json.loads(u) + except Exception: + return u + return u + elif isinstance(v, dict): + return {k: _maybe_parse(val) for k, val in v.items()} + elif isinstance(v, list): + return [_maybe_parse(item) for item in v] + return v + +# Normalize the whole params tree +params = _maybe_parse(params) + +# =========================================================================== +# COLUMN INDEX CONVERSION - CRITICAL FOR SETUP ANALYSIS +# =========================================================================== +def should_skip_column_conversion(template_name): + """Some templates don't need column index conversion""" + return 'load_csv' in template_name + +def read_file_headers(filepath): + """Read column headers from various file formats""" + try: + import pandas as pd + + # Try pandas auto-detect + try: + df = pd.read_csv(filepath, nrows=1) + if len(df.columns) > 1 or not df.columns[0].startswith('Unnamed'): 
+ columns = df.columns.tolist() + print(f"[Runner] Pandas auto-detected delimiter, found {len(columns)} columns") + return columns + except: + pass + + # Try common delimiters + for sep in ['\t', ',', ';', '|', ' ']: + try: + df = pd.read_csv(filepath, sep=sep, nrows=1) + if len(df.columns) > 1: + columns = df.columns.tolist() + sep_name = {'\t': 'tab', ',': 'comma', ';': 'semicolon', + '|': 'pipe', ' ': 'space'}.get(sep, sep) + print(f"[Runner] Pandas found {sep_name}-delimited file with {len(columns)} columns") + return columns + except: + continue + except ImportError: + print("[Runner] pandas not available, using csv fallback") + + # CSV module fallback + try: + with open(filepath, 'r', encoding='utf-8', errors='replace', newline='') as f: + sample = f.read(8192) + f.seek(0) + + try: + dialect = csv.Sniffer().sniff(sample, delimiters='\t,;| ') + reader = csv.reader(f, dialect) + header = next(reader) + columns = [h.strip().strip('"') for h in header if h.strip()] + if columns: + print(f"[Runner] csv.Sniffer detected {len(columns)} columns") + return columns + except: + f.seek(0) + first_line = f.readline().strip() + for sep in ['\t', ',', ';', '|']: + if sep in first_line: + columns = [h.strip().strip('"') for h in first_line.split(sep)] + if len(columns) > 1: + print(f"[Runner] Manual parsing found {len(columns)} columns") + return columns + except Exception as e: + print(f"[Runner] Failed to read headers: {e}") + + return None + +def should_convert_param(key, value): + """Check if parameter contains column indices""" + if value is None or value == "" or value == [] or value == {}: + return False + + key_lower = key.lower() + + # Skip String_Columns - it's names not indices + if key == 'String_Columns': + return False + + # Skip output/path parameters + if any(x in key_lower for x in ['output', 'path', 'file', 'directory', 'save', 'export']): + return False + + # Skip regex/pattern parameters (but we'll handle Feature_Regex specially) + if 'regex' in key_lower 
or 'pattern' in key_lower: + return False + + # Parameters with 'column' likely have indices + if 'column' in key_lower or '_col' in key_lower: + return True + + # Known index parameters + if key in {'Annotation_s_', 'Features_to_Analyze', 'Features', 'Markers', 'Markers_to_Plot', 'Phenotypes'}: + return True + + # Check if values look like indices + if isinstance(value, list): + return all(isinstance(v, int) or (isinstance(v, str) and v.strip().isdigit()) for v in value if v) + elif isinstance(value, (int, str)): + return isinstance(value, int) or (isinstance(value, str) and value.strip().isdigit()) + + return False + +def convert_single_index(item, columns): + """Convert a single column index to name""" + if isinstance(item, str) and not item.strip().isdigit(): + return item + + try: + if isinstance(item, str): + item = int(item.strip()) + elif isinstance(item, float): + item = int(item) + except (ValueError, AttributeError): + return item + + if isinstance(item, int): + idx = item - 1 # Galaxy uses 1-based indexing + if 0 <= idx < len(columns): + return columns[idx] + elif 0 <= item < len(columns): # Fallback for 0-based + print(f"[Runner] Note: Found 0-based index {item}, converting to {columns[item]}") + return columns[item] + else: + print(f"[Runner] Warning: Index {item} out of range (have {len(columns)} columns)") + + return item + +def convert_column_indices_to_names(params, template_name): + """Convert column indices to names for templates that need it""" + + if should_skip_column_conversion(template_name): + print(f"[Runner] Skipping column conversion for {template_name}") + return params + + print(f"[Runner] Checking for column index conversion (template: {template_name})") + + # Find input file + input_file = None + input_keys = ['Upstream_Dataset', 'Upstream_Analysis', 'CSV_Files', + 'Input_File', 'Input_Dataset', 'Data_File'] + + for key in input_keys: + if key in params: + value = params[key] + if isinstance(value, list) and value: + value = 
value[0] + if value and os.path.exists(str(value)): + input_file = str(value) + print(f"[Runner] Found input file via {key}: {os.path.basename(input_file)}") + break + + if not input_file: + print("[Runner] No input file found for column conversion") + return params + + # Read headers + columns = read_file_headers(input_file) + if not columns: + print("[Runner] Could not read column headers, skipping conversion") + return params + + print(f"[Runner] Successfully read {len(columns)} columns") + if len(columns) <= 10: + print(f"[Runner] Columns: {columns}") + else: + print(f"[Runner] First 10 columns: {columns[:10]}") + + # Convert indices to names + converted_count = 0 + for key, value in params.items(): + # Skip non-column parameters + if not should_convert_param(key, value): + continue + + # Convert indices + if isinstance(value, list): + converted_items = [] + for item in value: + converted = convert_single_index(item, columns) + if converted is not None: + converted_items.append(converted) + converted_value = converted_items + else: + converted_value = convert_single_index(value, columns) + + if value != converted_value: + params[key] = converted_value + converted_count += 1 + print(f"[Runner] Converted {key}: {value} -> {converted_value}") + + if converted_count > 0: + print(f"[Runner] Total conversions: {converted_count} parameters") + + # CRITICAL: Handle Feature_Regex specially + if 'Feature_Regex' in params: + value = params['Feature_Regex'] + if value in [[], [""], "__ob____cb__", "[]", "", None]: + params['Feature_Regex'] = "" + print("[Runner] Cleared empty Feature_Regex parameter") + elif isinstance(value, list) and value: + params['Feature_Regex'] = "|".join(str(v) for v in value if v) + print(f"[Runner] Joined Feature_Regex list: {params['Feature_Regex']}") + + return params + +# =========================================================================== +# APPLY COLUMN CONVERSION +# 
=========================================================================== +print("[Runner] Step 1: Converting column indices to names") +params = convert_column_indices_to_names(params, template_name) + +# =========================================================================== +# SPECIAL HANDLING FOR SPECIFIC TEMPLATES +# =========================================================================== + +# Helper function to coerce singleton lists to strings for load_csv +def _coerce_singleton_paths_for_load_csv(params, template_name): + """For load_csv templates, flatten 1-item lists to strings for path-like params.""" + if 'load_csv' not in template_name: + return params + for key in ('CSV_Files', 'CSV_Files_Configuration'): + val = params.get(key) + if isinstance(val, list) and len(val) == 1 and isinstance(val[0], (str, bytes)): + params[key] = val[0] + print(f"[Runner] Coerced {key} from list -> string") + return params + +# Special handling for String_Columns in load_csv templates +if 'load_csv' in template_name and 'String_Columns' in params: + value = params['String_Columns'] + if not isinstance(value, list): + if value in [None, "", "[]", "__ob____cb__"]: + params['String_Columns'] = [] + elif isinstance(value, str): + s = value.strip() + if s.startswith('[') and s.endswith(']'): + try: + params['String_Columns'] = json.loads(s) + except: + params['String_Columns'] = [s] if s else [] + elif ',' in s: + params['String_Columns'] = [item.strip() for item in s.split(',') if item.strip()] + else: + params['String_Columns'] = [s] if s else [] + else: + params['String_Columns'] = [] + print(f"[Runner] Ensured String_Columns is list: {params['String_Columns']}") + +# Apply coercion for load_csv files +params = _coerce_singleton_paths_for_load_csv(params, template_name) + +# Fix for Load CSV Files directory +if 'load_csv' in template_name and 'CSV_Files' in params: + # Check if csv_input_dir was created by Galaxy command + if os.path.exists('csv_input_dir') and 
os.path.isdir('csv_input_dir'): + params['CSV_Files'] = 'csv_input_dir' + print("[Runner] Using csv_input_dir created by Galaxy") + elif isinstance(params['CSV_Files'], str) and os.path.isfile(params['CSV_Files']): + # We have a single file path, need to get its directory + params['CSV_Files'] = os.path.dirname(params['CSV_Files']) + print(f"[Runner] Using directory of CSV file: {params['CSV_Files']}") + +# =========================================================================== +# LIST PARAMETER NORMALIZATION +# =========================================================================== +def should_normalize_as_list(key, value): + """Determine if a parameter should be normalized as a list""" + if isinstance(value, list): + return True + + if value is None or value == "": + return False + + key_lower = key.lower() + + # Skip regex parameters + if 'regex' in key_lower or 'pattern' in key_lower: + return False + + # Skip known single-value parameters + if any(x in key_lower for x in ['single', 'one', 'first', 'second', 'primary']): + return False + + # Plural forms suggest lists + if any(x in key_lower for x in ['features', 'markers', 'phenotypes', 'annotations', + 'columns', 'types', 'labels', 'regions', 'radii']): + return True + + # Check for list separators + if isinstance(value, str): + if ',' in value or '\n' in value: + return True + if value.strip().startswith('[') and value.strip().endswith(']'): + return True + + return False + +def normalize_to_list(value): + """Convert various input formats to a proper Python list""" + if value in (None, "", "All", ["All"], "all", ["all"]): + return ["All"] + + if isinstance(value, list): + return value + + if isinstance(value, str): + s = value.strip() + + # Try JSON parsing + if s.startswith('[') and s.endswith(']'): + try: + parsed = json.loads(s) + return parsed if isinstance(parsed, list) else [str(parsed)] + except: + pass + + # Split by comma + if ',' in s: + return [item.strip() for item in s.split(',') if 
item.strip()] + + # Split by newline + if '\n' in s: + return [item.strip() for item in s.split('\n') if item.strip()] + + # Single value + return [s] if s else [] + + return [value] if value is not None else [] + +# Normalize list parameters +print("[Runner] Step 2: Normalizing list parameters") +list_count = 0 +for key, value in list(params.items()): + if should_normalize_as_list(key, value): + original = value + normalized = normalize_to_list(value) + if original != normalized: + params[key] = normalized + list_count += 1 + if len(str(normalized)) > 100: + print(f"[Runner] Normalized {key}: {type(original).__name__} -> list of {len(normalized)} items") + else: + print(f"[Runner] Normalized {key}: {original} -> {normalized}") + +if list_count > 0: + print(f"[Runner] Normalized {list_count} list parameters") + +# CRITICAL FIX: Handle single-element lists for coordinate columns +# These should be strings, not lists +coordinate_keys = ['X_Coordinate_Column', 'Y_Coordinate_Column', 'X_centroid', 'Y_centroid'] +for key in coordinate_keys: + if key in params: + value = params[key] + if isinstance(value, list) and len(value) == 1: + params[key] = value[0] + print(f"[Runner] Extracted single value from {key}: {value} -> {params[key]}") + +# Also check for any key ending with '_Column' that has a single-element list +for key in list(params.keys()): + if key.endswith('_Column') and isinstance(params[key], list) and len(params[key]) == 1: + original = params[key] + params[key] = params[key][0] + print(f"[Runner] Extracted single value from {key}: {original} -> {params[key]}") + +# =========================================================================== +# OUTPUTS HANDLING +# =========================================================================== + +# Extract outputs specification +raw_outputs = params.pop('outputs', {}) +outputs = {} + +if isinstance(raw_outputs, dict): + outputs = raw_outputs +elif isinstance(raw_outputs, str): + try: + maybe = 
json.loads(_unsanitize(raw_outputs)) + if isinstance(maybe, dict): + outputs = maybe + except Exception: + pass + +if not isinstance(outputs, dict) or not outputs: + print("[Runner] Warning: 'outputs' missing or not a dict; using defaults") + if 'boxplot' in template_name or 'plot' in template_name or 'histogram' in template_name: + outputs = {'DataFrames': 'dataframe_folder', 'figures': 'figure_folder'} + elif 'load_csv' in template_name: + outputs = {'DataFrames': 'dataframe_folder'} + elif 'interactive' in template_name: + outputs = {'html': 'html_folder'} + else: + outputs = {'analysis': 'transform_output.pickle'} + +print(f"[Runner] Outputs -> {list(outputs.keys())}") + +# Create output directories +for output_type, path in outputs.items(): + if output_type != 'analysis' and path: + os.makedirs(path, exist_ok=True) + print(f"[Runner] Created {output_type} directory: {path}") + +# Add output paths to params +params['save_results'] = True + +if 'analysis' in outputs: + params['output_path'] = outputs['analysis'] + params['Output_Path'] = outputs['analysis'] + params['Output_File'] = outputs['analysis'] + +if 'DataFrames' in outputs: + df_dir = outputs['DataFrames'] + params['output_dir'] = df_dir + params['Export_Dir'] = df_dir + params['Output_File'] = os.path.join(df_dir, f'{template_name}_output.csv') + +if 'figures' in outputs: + fig_dir = outputs['figures'] + params['figure_dir'] = fig_dir + params['Figure_Dir'] = fig_dir + params['Figure_File'] = os.path.join(fig_dir, f'{template_name}.png') + +if 'html' in outputs: + html_dir = outputs['html'] + params['html_dir'] = html_dir + params['Output_File'] = os.path.join(html_dir, f'{template_name}.html') + +# Save runtime parameters +with open('params.runtime.json', 'w') as f: + json.dump(params, f, indent=2) + +# Save clean params for Galaxy display +params_display = {k: v for k, v in params.items() + if k not in ['Output_File', 'Figure_File', 'output_dir', 'figure_dir']} +with open('config_used.json', 'w') as 
f: + json.dump(params_display, f, indent=2) + +print(f"[Runner] Saved runtime parameters") + +# ============================================================================ +# LOAD AND EXECUTE TEMPLATE +# ============================================================================ + +# Try to import from installed package first (Docker environment) +template_module_name = template_filename.replace('.py', '') +try: + import importlib + mod = importlib.import_module(f'spac.templates.{template_module_name}') + print(f"[Runner] Loaded template from package: spac.templates.{template_module_name}") +except (ImportError, ModuleNotFoundError): + # Fallback to loading from file + print(f"[Runner] Package import failed, trying file load") + import importlib.util + + # Standard locations + template_paths = [ + f'/app/spac/templates/{template_filename}', + f'/opt/spac/templates/{template_filename}', + f'/opt/SCSAWorkflow/src/spac/templates/{template_filename}', + template_filename # Current directory + ] + + spec = None + for path in template_paths: + if os.path.exists(path): + spec = importlib.util.spec_from_file_location("template_mod", path) + if spec: + print(f"[Runner] Found template at: {path}") + break + + if not spec or not spec.loader: + print(f"[Runner] ERROR: Could not find template: {template_filename}") + sys.exit(1) + + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + +# Verify run_from_json exists +if not hasattr(mod, 'run_from_json'): + print('[Runner] ERROR: Template missing run_from_json function') + sys.exit(2) + +# Check function signature +sig = inspect.signature(mod.run_from_json) +kwargs = {} + +if 'save_results' in sig.parameters: + kwargs['save_results'] = True +if 'show_plot' in sig.parameters: + kwargs['show_plot'] = False + +print(f"[Runner] Executing template with kwargs: {kwargs}") + +# Execute template +try: + result = mod.run_from_json('params.runtime.json', **kwargs) + print(f"[Runner] Template completed, returned: 
{type(result).__name__}") + + # Handle different return types + if result is not None: + if isinstance(result, dict): + print(f"[Runner] Template saved files: {list(result.keys())}") + elif isinstance(result, tuple): + # Handle tuple returns + saved_count = 0 + for i, item in enumerate(result): + if hasattr(item, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], f'figure_{i+1}.png') + item.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(item) + saved_count += 1 + print(f"[Runner] Saved figure to {fig_path}") + elif hasattr(item, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], f'table_{i+1}.csv') + item.to_csv(df_path, index=True) + saved_count += 1 + print(f"[Runner] Saved DataFrame to {df_path}") + + if saved_count > 0: + print(f"[Runner] Saved {saved_count} in-memory results") + + elif hasattr(result, 'to_csv') and 'DataFrames' in outputs: + df_path = os.path.join(outputs['DataFrames'], 'output.csv') + result.to_csv(df_path, index=True) + print(f"[Runner] Saved DataFrame to {df_path}") + + elif hasattr(result, 'savefig') and 'figures' in outputs: + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + fig_path = os.path.join(outputs['figures'], 'figure.png') + result.savefig(fig_path, dpi=300, bbox_inches='tight') + plt.close(result) + print(f"[Runner] Saved figure to {fig_path}") + + elif hasattr(result, 'write_h5ad') and 'analysis' in outputs: + result.write_h5ad(outputs['analysis']) + print(f"[Runner] Saved AnnData to {outputs['analysis']}") + +except Exception as e: + print(f"[Runner] ERROR in template execution: {e}") + print(f"[Runner] Error type: {type(e).__name__}") + traceback.print_exc() + + # Debug help for common issues + if "String Columns must be a *list*" in str(e): + print("\n[Runner] DEBUG: String_Columns validation failed") + print(f"[Runner] Current 
String_Columns value: {params.get('String_Columns')}") + print(f"[Runner] Type: {type(params.get('String_Columns'))}") + + elif "regex pattern" in str(e).lower() or "^8$" in str(e): + print("\n[Runner] DEBUG: This appears to be a column index issue") + print("[Runner] Check that column indices were properly converted to names") + print("[Runner] Current Features_to_Analyze value:", params.get('Features_to_Analyze')) + print("[Runner] Current Feature_Regex value:", params.get('Feature_Regex')) + + sys.exit(1) + +# Verify outputs +print("[Runner] Verifying outputs...") +found_outputs = False + +for output_type, path in outputs.items(): + if output_type == 'analysis': + if os.path.exists(path): + size = os.path.getsize(path) + print(f"[Runner] ✔ {output_type}: {path} ({size:,} bytes)") + found_outputs = True + else: + print(f"[Runner] ✗ {output_type}: NOT FOUND") + else: + if os.path.exists(path) and os.path.isdir(path): + files = os.listdir(path) + if files: + print(f"[Runner] ✔ {output_type}: {len(files)} files") + for f in files[:3]: + print(f"[Runner] - {f}") + if len(files) > 3: + print(f"[Runner] ... 
and {len(files)-3} more") + found_outputs = True + else: + print(f"[Runner] ⚠ {output_type}: directory empty") + +# Check for files in working directory and move them +print("[Runner] Checking for files in working directory...") +for file in os.listdir('.'): + if os.path.isdir(file) or file in ['params.runtime.json', 'config_used.json', + 'tool_stdout.txt', 'outputs_returned.json']: + continue + + if file.endswith('.csv') and 'DataFrames' in outputs: + if not os.path.exists(os.path.join(outputs['DataFrames'], file)): + target = os.path.join(outputs['DataFrames'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + elif file.endswith(('.png', '.pdf', '.jpg', '.svg')) and 'figures' in outputs: + if not os.path.exists(os.path.join(outputs['figures'], file)): + target = os.path.join(outputs['figures'], file) + shutil.move(file, target) + print(f"[Runner] Moved {file} to {target}") + found_outputs = True + +if found_outputs: + print("[Runner] === SUCCESS ===") +else: + print("[Runner] WARNING: No outputs created") + +PYTHON_RUNNER + +EXIT_CODE=$? + +if [ $EXIT_CODE -ne 0 ]; then + echo "ERROR: Template execution failed with exit code $EXIT_CODE" + exit 1 +fi + +echo "=== Execution Complete ===" +exit 0 \ No newline at end of file diff --git a/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml b/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml new file mode 100644 index 00000000..52be678c --- /dev/null +++ b/galaxy_tools/spac_zscore_normalization/spac_zscore_normalization.xml @@ -0,0 +1,67 @@ + + Perform z-score normalization for the selected data table in the analysis. Normalized data table...
+ + + nciccbr/spac:v1 + + + + python3 + + + tool_stdout.txt && + + ## Run the universal wrapper (template name without .py extension) + bash $__tool_directory__/run_spac_template.sh "$params_json" zscore_normalization + ]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + @misc{spac_toolkit, + author = {FNLCR DMAP Team}, + title = {SPAC: SPAtial single-Cell analysis}, + year = {2024}, + url = {https://github.com/FNLCR-DMAP/SCSAWorkflow} + } + + + \ No newline at end of file diff --git a/galaxy_tools/test-data/setup_analysis.h5ad b/galaxy_tools/test-data/setup_analysis.h5ad new file mode 100644 index 00000000..11cc7eec Binary files /dev/null and b/galaxy_tools/test-data/setup_analysis.h5ad differ diff --git a/galaxy_tools/test-data/setup_analysis.pickle b/galaxy_tools/test-data/setup_analysis.pickle new file mode 100644 index 00000000..2aa845ab Binary files /dev/null and b/galaxy_tools/test-data/setup_analysis.pickle differ diff --git a/setup.py b/setup.py index 945475ad..79b1bbff 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='spac', - version="0.9.0", + version="0.9.1", description=( 'SPatial Analysis for single-Cell analysis (SPAC)' 'is a Scalable Python package for single-cell spatial protein data ' diff --git a/src/spac/__init__.py b/src/spac/__init__.py index f8b63dd6..c7a7ff09 100644 --- a/src/spac/__init__.py +++ b/src/spac/__init__.py @@ -22,7 +22,7 @@ functions.extend(module_functions) # Define the package version before using it in __all__ -__version__ = "0.9.0" +__version__ = "0.9.1" # Define a __all__ list to specify which functions should be considered public __all__ = functions diff --git a/src/spac/templates/__init__.py b/src/spac/templates/__init__.py new file mode 100644 index 00000000..89c61771 --- /dev/null +++ b/src/spac/templates/__init__.py @@ -0,0 +1,13 @@ +""" +Canonical SPAC template sub‑package. 
+ +Each template is a self‑contained module that + • reads parameters from JSON/dict + • runs a SPAC analysis function + • returns / saves results + +Available templates +------------------- +- ripley_l_template.run_from_json +""" + diff --git a/src/spac/templates/analysis_to_csv_template.py b/src/spac/templates/analysis_to_csv_template.py new file mode 100644 index 00000000..b079439e --- /dev/null +++ b/src/spac/templates/analysis_to_csv_template.py @@ -0,0 +1,199 @@ +""" +Platform-agnostic Analysis to CSV template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.analysis_to_csv_template import run_from_json +>>> run_from_json("examples/analysis_to_csv_params.json") +""" +import json +import sys +import logging +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.utils import check_table +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Analysis to CSV analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Export": "Original", + "Save_as_CSV_File": false, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. 
If False, returns the dataframe + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + {"dataframe": "path/to/dataframe.csv"} + If save_to_disk=False: The processed DataFrame + + Notes + ----- + Output Structure: + - DataFrame is saved as a CSV file when save_to_disk is True + - Otherwise, the DataFrame is returned for programmatic use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + input_layer = params.get("Table_to_Export", "Original") + + if input_layer == "Original": + input_layer = None + + def export_layer_to_csv(adata, layer=None): + """ + Exports the specified layer or the default .X data matrix of an + AnnData object to a CSV file. 
+ """ + # Check if the provided layer exists in the AnnData object + if layer: + check_table(adata, tables=layer) + data_to_export = pd.DataFrame( + adata.layers[layer], + index=adata.obs.index, + columns=adata.var.index + ) + else: + data_to_export = pd.DataFrame( + adata.X, + index=adata.obs.index, + columns=adata.var.index + ) + + # Join with the observation metadata + full_data_df = data_to_export.join(adata.obs) + + # Join the spatial coordinates + # Extract the spatial coordinates + spatial_df = pd.DataFrame( + adata.obsm['spatial'], + index=adata.obs.index, + columns=['spatial_x', 'spatial_y'] + ) + + # Join spatial_df with full_data_df + full_data_df = full_data_df.join(spatial_df) + + return full_data_df + + csv_data = export_layer_to_csv( + adata=adata, + layer=input_layer + ) + + logger.info(f"Exported DataFrame shape: {csv_data.shape}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = csv_data + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Analysis to CSV completed successfully.") + return saved_files + else: + # Return the dataframe directly for in-memory workflows + logger.info("Returning DataFrame for in-memory use") + return csv_data + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python analysis_to_csv_template.py " + "[output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + 
print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/append_annotation_template.py b/src/spac/templates/append_annotation_template.py new file mode 100644 index 00000000..1d51a83c --- /dev/null +++ b/src/spac/templates/append_annotation_template.py @@ -0,0 +1,208 @@ +""" +Platform-agnostic Append Annotation template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.append_annotation_template import run_from_json +>>> run_from_json("examples/append_annotation_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import append_annotation +from spac.utils import check_column_name +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Append Annotation analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Annotation_Pair_List": ["column1:value1", "column2:value2"], + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the DataFrame with + appended annotations to a CSV file. If False, returns the DataFrame + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The processed DataFrame with appended annotations + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> annotated_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load upstream data - DataFrame or CSV file + upstream_dataset = params["Upstream_Dataset"] + if isinstance(upstream_dataset, pd.DataFrame): + input_dataframe = upstream_dataset # Direct DataFrame from previous step + elif 
isinstance(upstream_dataset, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(upstream_dataset) + try: + input_dataframe = pd.read_csv(path) + logging.info(f"Successfully loaded CSV data from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path. " + f"Got {type(upstream_dataset)}" + ) + + # Extract parameters + dataset_mapping_rules = params.get( + "Annotation_Pair_List", ["Example:Example"] + ) + + # Initialize an empty dictionary + parsed_dict = {} + + # Loop through each string pair in the list + for pair in dataset_mapping_rules: + # Split the string on the colon + key, value = pair.split(":") + check_column_name(key, pair) + # Add the key-value pair to the dictionary + parsed_dict[key] = value + + logging.info(f"The pairs to add are:\n{parsed_dict}") + + output_dataframe = append_annotation( + input_dataframe, + parsed_dict + ) + + # DataFrame.info() writes its summary to stdout and returns None, + # so call it directly instead of logging its return value + output_dataframe.info() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = output_dataframe + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Append Annotation analysis completed successfully.") + return saved_files + else: + # Return the DataFrame directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return output_dataframe + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python append_annotation_template.py [output_dir]", + 
file=sys.stderr + ) + sys.exit(1) + +
# Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/append_pin_color_rule_template.py b/src/spac/templates/append_pin_color_rule_template.py new file mode 100644 index 00000000..eb9f06d8 --- /dev/null +++ b/src/spac/templates/append_pin_color_rule_template.py @@ -0,0 +1,168 @@ +""" +Platform-agnostic Append Pin Color Rule template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. 
+ +Usage +----- +>>> from spac.templates.append_pin_color_rule_template import run_from_json +>>> run_from_json("examples/add_pin_color_rule_params.json") +""" +import json +import sys +import logging +from pathlib import Path +from typing import Any, Dict, Union + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import add_pin_color_rules +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + string_list_to_dictionary, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Append Pin Color Rule analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Label_Color_Map": ["label1:red", "label2:blue"], + "Color_Map_Name": "_spac_colors", + "Overwrite_Previous_Color_Map": true, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory.
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file + - When save_to_disk=False, the AnnData object is returned for programmatic use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + color_dict_string_list = params.get("Label_Color_Map", []) + color_map_name = params.get("Color_Map_Name", "_spac_colors") + overwrite = params.get("Overwrite_Previous_Color_Map", True) + + color_dict = string_list_to_dictionary( + color_dict_string_list, + key_name="label", + value_name="color" + ) + + add_pin_color_rules( + adata, + label_color_dict=color_dict, + color_map_name=color_map_name, + overwrite=overwrite + ) + logger.info(f"{adata.uns[f'{color_map_name}_summary']}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Append Pin Color Rule analysis completed successfully.") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logger.info("Returning AnnData object for in-memory use") + return adata + + +# CLI interface +if __name__ == "__main__": + if 
len(sys.argv) < 2: + print( + "Usage: python append_pin_color_rule_template.py " + "[output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/arcsinh_normalization_template.py b/src/spac/templates/arcsinh_normalization_template.py new file mode 100644 index 00000000..fcdf62da --- /dev/null +++ b/src/spac/templates/arcsinh_normalization_template.py @@ -0,0 +1,218 @@ +""" +Platform-agnostic Arcsinh Normalization template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where analysis is saved as a file.
+ +Usage +----- +>>> from spac.templates.arcsinh_normalization_template import run_from_json +>>> run_from_json("examples/arcsinh_normalization_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import arcsinh_transformation +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Arcsinh Normalization analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "Co_Factor": "5.0", + "Percentile": "None", + "Output_Table_Name": "arcsinh", + "Per_Batch": "False", + "Annotation": "None", + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the AnnData object + to a pickle file. If False, returns the AnnData object directly + for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object for in-memory use + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + >>> # './output.pickle' + + >>> # Get results in memory for further processing + >>> adata = run_from_json("params.json", save_to_disk=False) + >>> # Can now work with adata object directly + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + input_layer = params.get("Table_to_Process", "Original") + co_factor = params.get("Co_Factor", "5.0") + percentile = params.get("Percentile", "None") + output_layer = params.get("Output_Table_Name", "arcsinh") + per_batch = params.get("Per_Batch", "False") + annotation = params.get("Annotation", "None") + + input_layer = text_to_value( + input_layer, + default_none_text="Original" + ) + + co_factor = text_to_value( + co_factor, + default_none_text="None", + to_float=True, + param_name="co_factor" + ) + + percentile = text_to_value( + percentile, + default_none_text="None", 
+ to_float=True, + param_name="percentile" + ) + + if per_batch == "True": + per_batch = True + else: + per_batch = False + + annotation = text_to_value( + annotation, + default_none_text="None" + ) + + transformed_data = arcsinh_transformation( + adata, + input_layer=input_layer, + co_factor=co_factor, + percentile=percentile, + output_layer=output_layer, + per_batch=per_batch, + annotation=annotation + ) + + logging.info(f"Transformed data stored in layer: {output_layer}") + dataframe = pd.DataFrame(transformed_data.layers[output_layer]) + logging.info(f"Arcsinh transformation summary:\n{dataframe.describe()}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = transformed_data + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info( + f"Arcsinh Normalization completed → {saved_files['analysis']}" + ) + return saved_files + else: + # Return the adata object directly for in-memory workflows + logging.info("Returning AnnData object (not saving to file)") + return transformed_data + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python arcsinh_normalization_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, path in result.items(): + print(f" 
{key}: {path}") + else: + print("\nReturned AnnData object for in-memory use") + print(f"AnnData: {result}") diff --git a/src/spac/templates/binary_to_categorical_annotation_template.py b/src/spac/templates/binary_to_categorical_annotation_template.py new file mode 100644 index 00000000..127a8e4a --- /dev/null +++ b/src/spac/templates/binary_to_categorical_annotation_template.py @@ -0,0 +1,203 @@ +""" +Platform-agnostic Binary to Categorical Annotation template converted from +NIDAP. Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.binary_to_categorical_annotation_template import \ +... run_from_json +>>> run_from_json("examples/binary_to_categorical_annotation_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import bin2cat +from spac.utils import check_column_name +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Binary to Categorical Annotation analysis with parameters from + JSON. Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Binary_Annotation_Columns": ["Col1", "Col2", "Col3"], + "New_Annotation_Name": "cell_labels", + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. 
If True, saves the DataFrame with + converted annotations to a CSV file. If False, returns the DataFrame + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The processed DataFrame with categorical annotation + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> converted_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load upstream data - DataFrame or CSV file + upstream_dataset = params["Upstream_Dataset"] + if isinstance(upstream_dataset, pd.DataFrame): + input_dataset = upstream_dataset # Direct DataFrame from previous step + elif isinstance(upstream_dataset, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(upstream_dataset) + try: + input_dataset = pd.read_csv(path) + logging.info(f"Successfully loaded CSV data from: {path}") + except Exception as e: + 
raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path. " + f"Got {type(upstream_dataset)}" + ) + + # Extract parameters + one_hot_annotations = params.get( + "Binary_Annotation_Columns", + ["Normal_Cells", "Cancer_Cells", "Immuno_Cells"] + ) + new_annotation = params.get("New_Annotation_Name", "cell_labels") + + check_column_name(new_annotation, "New Annotation Name") + + converted_df = bin2cat( + data=input_dataset, + one_hot_annotations=one_hot_annotations, + new_annotation=new_annotation + ) + + # DataFrame.info() writes its summary to stdout and returns None, + # so call it directly instead of logging its return value + converted_df.info() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = converted_df + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Binary to Categorical Annotation completed successfully.") + return saved_files + else: + # Return the DataFrame directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return converted_df + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python binary_to_categorical_annotation_template.py " + " [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # 
Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key,
paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/boxplot_template.py b/src/spac/templates/boxplot_template.py new file mode 100644 index 00000000..791e5e5e --- /dev/null +++ b/src/spac/templates/boxplot_template.py @@ -0,0 +1,269 @@ +""" +Platform-agnostic Boxplot template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where figures are saved as directories. + +Usage +----- +>>> from spac.templates.boxplot_template import run_from_json +>>> run_from_json("examples/boxplot_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional, Tuple +import logging +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import boxplot +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]: + """ + Execute Boxplot analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Primary_Annotation": "cell_type", + "Feature_s_to_Plot": ["CD4", "CD8"], + "outputs": { + "figures": {"type": "directory", "name": "figures"}, + "dataframe": {"type": "file", "name": "output.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves figures to a directory + and summary statistics to a CSV file. If False, returns the figure and + summary dataframe directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "figures": ["path/to/figures/boxplot.png"], # List of figure paths + "dataframe": "path/to/output.csv" # Single file path + } + If save_to_disk=False: Tuple of (matplotlib.figure.Figure, pd.DataFrame) + containing the figure object and summary statistics dataframe + + Notes + ----- + Output Structure: + - Figures are saved in a directory (standardized for all figure outputs) + - Summary statistics are saved as a single CSV file + - When save_to_disk=False, objects are returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["figures"]) # List of paths to saved plots + >>> # ['./figures/boxplot.png'] + + >>> # Get results in memory + >>> fig, summary_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = 
params.get("Output_Directory", ".") + + #
Ensure outputs configuration exists with standardized defaults + # Figures use directory type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures"}, + "dataframe": {"type": "file", "name": "output.csv"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotation = params.get("Primary_Annotation", "None") + second_annotation = params.get("Secondary_Annotation", "None") + layer_to_plot = params.get("Table_to_Visualize", "Original") + feature_to_plot = params.get("Feature_s_to_Plot", ["All"]) + log_scale = params.get("Value_Axis_Log_Scale", False) + + # Extract figure parameters with defaults + figure_title = params.get("Figure_Title", "BoxPlot") + figure_horizontal = params.get("Horizontal_Plot", False) + fig_width = params.get("Figure_Width", 12) + fig_height = params.get("Figure_Height", 8) + fig_dpi = params.get("Figure_DPI", 300) + font_size = params.get("Font_Size", 10) + showfliers = params.get("Keep_Outliers", True) + + # Process parameters to match expected format + # Convert "None" strings to actual None values + layer_to_plot = None if layer_to_plot == "Original" else layer_to_plot + second_annotation = None if second_annotation == "None" else second_annotation + annotation = None if annotation == "None" else annotation + + # Convert horizontal flag to orientation string + figure_orientation = "h" if figure_horizontal else "v" + + # Handle feature selection + if isinstance(feature_to_plot, str): + # Convert single string to list + feature_to_plot = [feature_to_plot] + + # Check for "All" features selection + if any(item == "All" for item in feature_to_plot): + logging.info("Plotting All Features") + feature_to_plot = adata.var_names.tolist() + else: + feature_str = "\n".join(feature_to_plot) + logging.info(f"Plotting Feature:\n{feature_str}") + + # Create the plot exactly as in NIDAP template + fig, ax = 
plt.subplots() + plt.rcParams.update({'font.size': font_size}) + fig.set_size_inches(fig_width, fig_height) + fig.set_dpi(fig_dpi) + + fig, ax, df = boxplot( + adata=adata, + ax=ax, + layer=layer_to_plot, + annotation=annotation, + second_annotation=second_annotation, + features=feature_to_plot, + log_scale=log_scale, + orient=figure_orientation, + showfliers=showfliers + ) + + # Set the figure title + ax.set_title(figure_title) + + # Get summary statistics of the dataset + logging.info("Summary statistics of the dataset:") + summary = df.describe() + + # Convert the summary to a DataFrame that includes the index as a column + summary_df = summary.reset_index() + logging.info(f"\n{summary_df.to_string()}") + + # Move the legend outside the plotting area + # Check if a legend exists + try: + sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1)) + except Exception as e: + logging.debug("Legend does not exist.") + + # Apply tight layout to prevent label cutoff + plt.tight_layout() + + if show_plot: + plt.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Package figure in a dictionary for directory saving + # This ensures it's saved in a directory per standardized schema + if "figures" in params["outputs"]: + results_dict["figures"] = {"boxplot": fig} # Dict triggers directory save + + # Check for DataFrames output (case-insensitive) + if any(k.lower() == "dataframe" for k in params["outputs"].keys()): + results_dict["dataframe"] = summary_df + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Boxplot analysis completed successfully.") + return saved_files + else: + # Return objects directly for in-memory workflows + logging.info( + "Returning figure and summary dataframe for in-memory use" 
+ ) + return fig, summary_df + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python boxplot_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + fig, summary_df = result + print("\nReturned figure and summary dataframe for in-memory use") + print(f"Figure size: {fig.get_size_inches()}") + print(f"Summary shape: {summary_df.shape}") + print("\nSummary statistics preview:") + print(summary_df.head()) \ No newline at end of file diff --git a/src/spac/templates/calculate_centroid_template.py b/src/spac/templates/calculate_centroid_template.py new file mode 100644 index 00000000..b59add49 --- /dev/null +++ b/src/spac/templates/calculate_centroid_template.py @@ -0,0 +1,211 @@ +""" +Platform-agnostic Calculate Centroid template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. 
+ +Usage +----- +>>> from spac.templates.calculate_centroid_template import run_from_json +>>> run_from_json("examples/calculate_centroid_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Tuple +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import calculate_centroid +from spac.utils import check_column_name +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Calculate Centroid analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Min_X_Coordinate_Column_Name": "XMin", + "Max_X_Coordinate_Column_Name": "XMax", + "Min_Y_Coordinate_Column_Name": "YMin", + "Max_Y_Coordinate_Column_Name": "YMax", + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the DataFrame with + calculated centroids to a CSV file. If False, returns the DataFrame + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. 
+ + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The processed DataFrame with centroids + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> centroid_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # DataFrames typically use file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load upstream data - DataFrame or CSV file + upstream_dataset = params["Upstream_Dataset"] + if isinstance(upstream_dataset, pd.DataFrame): + input_dataset = upstream_dataset # Direct DataFrame from previous step + elif isinstance(upstream_dataset, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(upstream_dataset) + try: + input_dataset = pd.read_csv(path) + logging.info(f"Successfully loaded CSV data from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path.
" + f"Got {type(upstream_dataset)}" + ) + + # Extract parameters using .get() with defaults from JSON template + x_min = params.get("Min_X_Coordinate_Column_Name", "XMin") + x_max = params.get("Max_X_Coordinate_Column_Name", "XMax") + y_min = params.get("Min_Y_Coordinate_Column_Name", "YMin") + y_max = params.get("Max_Y_Coordinate_Column_Name", "YMax") + new_x = params.get("X_Centroid_Name", "XCentroid") + new_y = params.get("Y_Centroid_Name", "YCentroid") + + check_column_name(new_x, "X Centroid Name") + check_column_name(new_y, "Y Centroid Name") + + centroid_calculated = calculate_centroid( + input_dataset, + x_min=x_min, + x_max=x_max, + y_min=y_min, + y_max=y_max, + new_x=new_x, + new_y=new_y + ) + + # DataFrame.info() writes its summary to a buffer (stdout by default) + # and returns None, so capture the summary as a string before logging it + import io + info_buffer = io.StringIO() + centroid_calculated.info(buf=info_buffer) + logging.info(info_buffer.getvalue()) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = centroid_calculated + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Calculate Centroid analysis completed successfully.") + return saved_files + else: + # Return the DataFrame directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return centroid_calculated + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python calculate_centroid_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) +
+ # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/combine_annotations_template.py b/src/spac/templates/combine_annotations_template.py new file mode 100644 index 00000000..b152978b --- /dev/null +++ b/src/spac/templates/combine_annotations_template.py @@ -0,0 +1,181 @@ +""" +Platform-agnostic Combine Annotations template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.combine_annotations_template import run_from_json +>>> run_from_json("examples/combine_annotations_params.json") +""" +import json +import sys +import logging +from pathlib import Path +from typing import Any, Dict, Union, List + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import combine_annotations +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Combine Annotations analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Annotations_Names": ["annotation1", "annotation2"], + "New_Annotation_Name": "combined_annotation", + "Separator": "_", + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv", + "analysis": "path/to/output.pickle" + } + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a pickle file + - DataFrame (label counts) is saved as a CSV file + - When save_to_disk=False, the AnnData object is returned for programmatic use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotations_list = params["Annotations_Names"] + new_annotation = params.get("New_Annotation_Name", "combined_annotation") + separator = params.get("Separator", "_") + + combine_annotations( + adata, + annotations=annotations_list, + separator=separator, + new_annotation_name=new_annotation + ) + + logger.info(f"After combining annotations: \n{adata}") + value_counts = 
adata.obs[new_annotation].value_counts(dropna=False) + logger.info(f"Unique labels in {new_annotation}") + logger.info(f"{value_counts}") + + # Create the frequency CSV for download + df_counts = ( + value_counts + .rename_axis(new_annotation) # move index to a column name + .reset_index(name='count') # two columns: label | count + ) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = df_counts + + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Combine Annotations analysis completed successfully.") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logger.info("Returning AnnData object for in-memory use") + return adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python combine_annotations_template.py " + "[output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/combine_dataframes_template.py b/src/spac/templates/combine_dataframes_template.py new file mode 100644 index 00000000..6d23bbc4 --- /dev/null +++ 
b/src/spac/templates/combine_dataframes_template.py @@ -0,0 +1,217 @@ +""" +Platform-agnostic Combine DataFrames template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.combine_dataframes_template import run_from_json +>>> run_from_json("examples/combine_dataframes_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import combine_dfs +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Combine DataFrames analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "First_Dataframe": "path/to/first.csv", + "Second_Dataframe": "path/to/second.csv", + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the combined DataFrame + to a CSV file. If False, returns the DataFrame directly for in-memory + workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. 
+ + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The combined DataFrame + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> combined_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load the first dataframe + dataset_A = params["First_Dataframe"] + if isinstance(dataset_A, pd.DataFrame): + dataset_A = dataset_A # Direct DataFrame from previous step + elif isinstance(dataset_A, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(dataset_A) + try: + dataset_A = pd.read_csv(path) + logging.info(f"Successfully loaded first DataFrame from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"First_Dataframe must be DataFrame or file path.
" + f"Got {type(dataset_A)}" + ) + + # Load the second dataframe + dataset_B = params["Second_Dataframe"] + if isinstance(dataset_B, pd.DataFrame): + dataset_B = dataset_B # Direct DataFrame from previous step + elif isinstance(dataset_B, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(dataset_B) + try: + dataset_B = pd.read_csv(path) + logging.info(f"Successfully loaded second DataFrame from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Second_Dataframe must be DataFrame or file path. " + f"Got {type(dataset_B)}" + ) + + # Collect the input DataFrames to combine + input_df_lists = [dataset_A, dataset_B] + + # DataFrame.info() writes its summary to a buffer (stdout by default) + # and returns None, so capture the summary as a string before logging it + import io + + def _df_info(df): + buffer = io.StringIO() + df.info(buf=buffer) + return buffer.getvalue() + + logging.info("Information about the first dataset:") + logging.info(_df_info(dataset_A)) + logging.info("\n\nInformation about the second dataset:") + logging.info(_df_info(dataset_B)) + + combined_dfs = combine_dfs(input_df_lists) + logging.info("\n\nInformation about the combined dataset:") + logging.info(_df_info(combined_dfs)) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = combined_dfs + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Combine DataFrames completed successfully.") + return saved_files + else: + # Return the DataFrame directly for in-memory workflows + logging.info("Returning combined DataFrame for in-memory use") + return combined_dfs + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python combine_dataframes_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up
logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned combined DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/downsample_cells_template.py b/src/spac/templates/downsample_cells_template.py new file mode 100644 index 00000000..761135e0 --- /dev/null +++ b/src/spac/templates/downsample_cells_template.py @@ -0,0 +1,208 @@ +""" +Platform-agnostic Downsample Cells template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.downsample_cells_template import run_from_json +>>> run_from_json("examples/downsample_cells_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Tuple +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import downsample_cells +from spac.utils import check_column_name +from spac.templates.template_utils import ( + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Downsample Cells analysis with parameters from JSON. 
+ Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Annotations_List": ["cell_type", "tissue"], + "Number_of_Samples": 1000, + "Stratify_Option": true, + "Random_Selection": true, + "New_Combined_Annotation_Name": "_combined_", + "Minimum_Threshold": 5, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the downsampled DataFrame + to a CSV file. If False, returns the DataFrame directly for in-memory + workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The downsampled DataFrame + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> downsampled_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # DataFrames typically use file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name":
"dataframe.csv"} + } + + # Load upstream data: a DataFrame from a previous step or a CSV file path + upstream_dataset = params["Upstream_Dataset"] + if isinstance(upstream_dataset, pd.DataFrame): + input_dataset = upstream_dataset # Direct DF from previous step + elif isinstance(upstream_dataset, (str, Path)): + try: + input_dataset = pd.read_csv(upstream_dataset) + except Exception as e: + raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}") from e + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path. " + f"Got {type(upstream_dataset)}" + ) + + # Extract parameters + annotations = params["Annotations_List"] + n_samples = params["Number_of_Samples"] + stratify = params["Stratify_Option"] + rand = params["Random_Selection"] + combined_col_name = params.get( + "New_Combined_Annotation_Name", "_combined_" + ) + min_threshold = params.get("Minimum_Threshold", 5) + + check_column_name( + combined_col_name, "New Combined Annotation Name" + ) + + down_sampled_dataset = downsample_cells( + input_data=input_dataset, + annotations=annotations, + n_samples=n_samples, + stratify=stratify, + rand=rand, + combined_col_name=combined_col_name, + min_threshold=min_threshold + ) + + logging.info("Downsampled!
Processed dataset info:") + # DataFrame.info() writes its summary to a buffer (stdout by default) + # and returns None, so capture the summary as a string before logging it + import io + info_buffer = io.StringIO() + down_sampled_dataset.info(buf=info_buffer) + logging.info(info_buffer.getvalue()) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = down_sampled_dataset + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Downsample Cells analysis completed successfully.") + return saved_files + else: + # Return the dataframe directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return down_sampled_dataset + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python downsample_cells_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/hierarchical_heatmap_template.py b/src/spac/templates/hierarchical_heatmap_template.py new file mode 100644 index 00000000..92bec6cc --- /dev/null +++ b/src/spac/templates/hierarchical_heatmap_template.py @@ -0,0 +1,215 @@ +""" +Platform-agnostic Hierarchical Heatmap template converted from NIDAP.
+Maintains the exact logic from the NIDAP template. + +Usage +----- +>>> from spac.templates.hierarchical_heatmap_template import run_from_json +>>> run_from_json("examples/hierarchical_heatmap_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import pandas as pd +import matplotlib.pyplot as plt + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import hierarchical_heatmap +from spac.utils import check_feature +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_results_flag: bool = True, + show_plot: bool = True, + output_dir: Union[str, Path] = None +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Hierarchical Heatmap analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + save_results_flag : bool, optional + Whether to save results to disk. If False, returns the mean intensity + DataFrame directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str or Path, optional + Directory for outputs. If None, uses params['Output_Directory'] or '.'
+ + Returns + ------- + dict or DataFrame + If save_results_flag=True: Dictionary of saved file paths + If save_results_flag=False: The mean intensity dataframe + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotation = params["Annotation"] + layer_to_plot = params.get("Table_to_Visualize", "Original") + features = params.get("Feature_s_", ["All"]) + standard_scale = params.get("Standard_Scale_", "None") + z_score = params.get("Z_Score", "None") + cluster_feature = params.get("Feature_Dendrogram", True) + cluster_annotations = params.get("Annotation_Dendrogram", True) + Figure_Title = params.get("Figure_Title", "Hierarchical Heatmap") + fig_width = params.get("Figure_Width", 8) + fig_height = params.get("Figure_Height", 8) + fig_dpi = params.get("Figure_DPI", 300) + font_size = params.get("Font_Size", 10) + matrix_ratio = params.get("Matrix_Plot_Ratio", 0.8) + swap_axes = params.get("Swap_Axes", False) + rotate_label = params.get("Rotate_Label_", False) + r_h_axis_dendrogram = params.get( + "Horizontal_Dendrogram_Display_Ratio", 0.2 + ) + r_v_axis_dendrogram = params.get( + "Vertical_Dendrogram_Display_Ratio", 0.2 + ) + v_min = params.get("Value_Min", "None") + v_max = params.get("Value_Max", "None") + color_map = params.get("Color_Map", 'seismic') + + # Use check_feature to validate features + if len(features) == 1 and features[0] == "All": + features = None + else: + check_feature(adata, features) + + if not swap_axes: + features = None + + # Use text_to_value for parameter conversions + standard_scale = text_to_value( + standard_scale, to_int=True, param_name='Standard Scale' + ) + layer_to_plot = text_to_value( + layer_to_plot, default_none_text="Original" + ) + z_score = text_to_value(z_score, param_name='Z Score') + vmin = text_to_value( + v_min, default_none_text="none", to_float=True, + param_name="Value Min" + ) +
vmax = text_to_value( + v_max, default_none_text="none", to_float=True, + param_name="Value Max" + ) + + fig, ax = plt.subplots() + plt.rcParams.update({'font.size': font_size}) + fig.set_size_inches(fig_width, fig_height) + fig.set_dpi(fig_dpi) + + mean_intensity, clustergrid, dendrogram_data = hierarchical_heatmap( + adata, + annotation=annotation, + features=features, + layer=layer_to_plot, + cluster_feature=cluster_feature, + cluster_annotations=cluster_annotations, + standard_scale=standard_scale, + z_score=z_score, + swap_axes=swap_axes, + rotate_label=rotate_label, + figsize=(fig_width, fig_height), + dendrogram_ratio=(r_h_axis_dendrogram, r_v_axis_dendrogram), + vmin=vmin, + vmax=vmax, + cmap=color_map + ) + print("Printing mean intensity data.") + print(mean_intensity) + print() + print("Printing dendrogram data.") + for data in dendrogram_data: + print(data) + print(dendrogram_data[data]) + + # Copy the annotation labels (the mean_intensity index) into a column + mean_intensity[annotation] = mean_intensity.index.astype(str) + + # Reorder columns so the annotation column comes first + cols = mean_intensity.columns.tolist() + cols = [annotation] + [col for col in cols if col != annotation] + mean_intensity = mean_intensity[cols] + + # Set the heatmap title and the matrix size attributes + clustergrid.ax_heatmap.set_title(Figure_Title) + clustergrid.height = fig_height * matrix_ratio + clustergrid.width = fig_width * matrix_ratio + # Close the blank figure from plt.subplots(); the heatmap has its own figure + plt.close(fig) + + if show_plot: + plt.show() + + # Handle results based on save_results_flag + if save_results_flag: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Package figure in a dictionary for directory saving + # This ensures it's saved in a directory per standardized schema + if "figures" in params.get("outputs", {}): + results_dict["figures"] = {"hierarchical_heatmap": clustergrid.fig} + + # Check for dataframe output + if "dataframe" in params.get("outputs", {}):
+ results_dict["dataframe"] = mean_intensity + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + print("Hierarchical Heatmap completed successfully.") + return saved_files + else: + # Return the dataframe directly for in-memory workflows + print("Returning mean intensity dataframe (not saving to file)") + return mean_intensity + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python hierarchical_heatmap_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json(sys.argv[1], output_dir=output_dir) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + if isinstance(filepath, list): + print(f" {filename}: {len(filepath)} files in directory") + else: + print(f" {filename}: {filepath}") + else: + print("\nReturned mean intensity dataframe") diff --git a/src/spac/templates/histogram_template.py b/src/spac/templates/histogram_template.py new file mode 100644 index 00000000..0a3924d4 --- /dev/null +++ b/src/spac/templates/histogram_template.py @@ -0,0 +1,349 @@ +""" +Platform-agnostic Histogram template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. 
+ +Usage +----- +>>> from spac.templates.histogram_template import run_from_json +>>> run_from_json("examples/histogram_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Optional, Tuple, List +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import histogram +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = False, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]: + """ + Execute Histogram analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Plot_By": "Annotation", + "Annotation": "cell_type", + ... + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "figures": {"type": "directory", "name": "figures_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the figure and + dataframe directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is False. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
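The docstring allows `json_path` to be a path to a JSON file, a raw JSON string, or a dictionary that has already been parsed. The real dispatch happens in `parse_params` in `template_utils`, and its body is not part of this diff. The sketch below shows one plausible way to accept all three forms. The name `parse_params_sketch` and the "starts with `{`" heuristic are both assumptions.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def parse_params_sketch(source: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical stand-in for template_utils.parse_params."""
    if isinstance(source, dict):
        # Already parsed: return a shallow copy so later mutation
        # (e.g. injecting a default "outputs" block) leaves the caller's dict untouched.
        return dict(source)
    if isinstance(source, str) and source.lstrip().startswith("{"):
        # Assumption: a string that looks like a JSON object is parsed directly.
        return json.loads(source)
    # Anything else is treated as a path to a JSON file.
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


params = parse_params_sketch('{"Plot_By": "Annotation", "Annotation": "cell_type"}')
print(params["Annotation"])  # cell_type
```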
+ + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: Tuple of (figure, dataframe) + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "figures": {"type": "directory", "name": "figures_dir"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + feature = text_to_value(params.get("Feature", "None")) + annotation = text_to_value(params.get("Annotation", "None")) + layer = params.get("Table_", "Original") + group_by = params.get("Group_by", "None") + together = params.get("Together", True) + fig_width = params.get("Figure_Width", 8) + fig_height = params.get("Figure_Height", 6) + font_size = params.get("Font_Size", 12) + fig_dpi = params.get("Figure_DPI", 300) + legend_location = params.get("Legend_Location", "best") + legend_in_figure = params.get("Legend_in_Figure", False) + take_X_log = params.get("Take_X_Log", False) + take_Y_log = params.get("Take_Y_log", False) + multiple = params.get("Multiple", "dodge") + shrink = params.get("Shrink_Number", 1) + bins = params.get("Bins", "auto") + alpha = params.get("Bin_Transparency", 0.75) + stat = params.get("Stat", "count") + x_rotate = params.get("X_Axis_Label_Rotation", 0) + histplot_by = params.get("Plot_By", "Annotation") + + # Close all existing figures to prevent extra plots + plt.close('all') + existing_fig_nums = plt.get_fignums() + + plt.rcParams.update({'font.size': font_size}) + + # Adjust feature and annotation based on histplot_by + if histplot_by == "Annotation": + feature = None + 
else: + annotation = None + + # If both feature and annotation are None, set default + if feature is None and annotation is None: + if histplot_by == "Annotation": + if adata.obs.columns.size > 0: + annotation = adata.obs.columns[0] + logger.info( + f'No annotation specified. Using the first annotation ' + f'"{annotation}" as default.' + ) + else: + raise ValueError( + 'No annotations available in adata.obs to plot.' + ) + else: + if adata.var_names.size > 0: + feature = adata.var_names[0] + logger.info( + f'No feature specified. Using the first feature ' + f'"{feature}" as default.' + ) + else: + raise ValueError( + 'No features available in adata.var_names to plot.' + ) + + # Validate and set bins + if feature is not None: + bins = text_to_value( + bins, + default_none_text="auto", + to_int=True, + param_name="bins" + ) + if bins is None: + num_rows = adata.X.shape[0] + bins = max(int(2 * (num_rows ** (1/3))), 1) + elif bins <= 0: + raise ValueError( + f'Bins should be a positive integer. Received "{bins}"' + ) + elif annotation is not None: + if take_X_log: + take_X_log = False + logger.warning( + "Take X log should only apply to feature. " + "Setting Take X Log to False." + ) + if bins != 'auto': + bins = 'auto' + logger.warning( + "Bin number should only apply to feature. " + "Setting bin number calculation to auto." + ) + + if (x_rotate < 0) or (x_rotate > 360): + raise ValueError( + f'The X label rotation should fall within 0 to 360 degree. ' + f'Received "{x_rotate}".' 
+ ) + + # Initialize the x-variable before the loop + if histplot_by == "Annotation": + x_var = annotation + else: + x_var = feature + + result = histogram( + adata=adata, + feature=feature, + annotation=annotation, + layer=text_to_value(layer, "Original"), + group_by=text_to_value(group_by), + together=together, + ax=None, + x_log_scale=take_X_log, + y_log_scale=take_Y_log, + multiple=multiple, + shrink=shrink, + bins=bins, + alpha=alpha, + stat=stat + ) + + fig = result["fig"] + axs = result["axs"] + df_counts = result["df"] + + # Set figure size and dpi + fig.set_size_inches(fig_width, fig_height) + fig.set_dpi(fig_dpi) + + # Ensure axes is a list + if isinstance(axs, list): + axes = axs + else: + axes = [axs] + + # Close any extra figures created during the histogram call + fig_nums_after = plt.get_fignums() + new_fig_nums = [ + num for num in fig_nums_after if num not in existing_fig_nums + ] + histogram_fig_num = fig.number + + for num in new_fig_nums: + if num != histogram_fig_num: + plt.close(plt.figure(num)) + logger.debug(f"Closed extra figure {num}") + + # Process each axis + for ax in axes: + if feature: + logger.info(f'Plotting Feature: "{feature}"') + if ax.get_legend() is not None: + if legend_in_figure: + sns.move_legend(ax, legend_location) + else: + sns.move_legend( + ax, legend_location, bbox_to_anchor=(1, 1) + ) + + # Rotate x labels + ax.tick_params(axis='x', rotation=x_rotate) + + # Set titles based on group_by + if text_to_value(group_by): + if together: + for ax in axes: + ax.set_title( + f'Histogram of "{x_var}" grouped by "{group_by}"' + ) + else: + # compute unique groups directly from adata.obs. + unique_groups = adata.obs[ + text_to_value(group_by) + ].dropna().unique() + if len(axes) != len(unique_groups): + logger.warning( + "Number of axes does not match number of " + "groups. Titles may not correspond correctly." 
+ ) + for ax, grp in zip(axes, unique_groups): + ax.set_title( + f'Histogram of "{x_var}" for group: "{grp}"' + ) + else: + for ax in axes: + ax.set_title(f'Count plot of "{x_var}"') + + plt.tight_layout() + + logger.info("Displaying top 10 rows of histogram dataframe:") + print(df_counts.head(10)) + + if show_plot: + plt.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = df_counts + + # Check for figures output + if "figures" in params["outputs"]: + results_dict["figures"] = {"histogram": fig} + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + plt.close('all') + + logger.info("Histogram analysis completed successfully.") + return saved_files + else: + # Return the figure and dataframe directly for in-memory workflows + logger.info("Returning figure and dataframe for in-memory use") + return fig, df_counts + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python histogram_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned figure and dataframe") diff --git a/src/spac/templates/interactive_spatial_plot_template.py 
b/src/spac/templates/interactive_spatial_plot_template.py new file mode 100644 index 00000000..e63e0df2 --- /dev/null +++ b/src/spac/templates/interactive_spatial_plot_template.py @@ -0,0 +1,241 @@ +""" +Platform-agnostic Interactive Spatial Plot template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where HTML files are saved as a directory. + +Usage +----- +>>> from spac.templates.interactive_spatial_plot_template import run_from_json +>>> run_from_json("examples/interactive_spatial_plot_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import pandas as pd +import plotly.io as pio + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +# Import SPAC functions from NIDAP template +from spac.visualization import interactive_spatial_plot +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], None]: + """ + Execute Interactive Spatial Plot analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Color_By": "Annotation", + "Annotation_s_to_Highlight": ["renamed_phenotypes"], + "outputs": { + "html": {"type": "directory", "name": "html_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns None as plots are + shown interactively. Default is True. + output_dir : str, optional + Base directory for outputs. 
If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or None + If save_to_disk=True: Dictionary of saved file paths with structure: + {"html": ["path/to/html_dir/plot1.html", ...]} + If save_to_disk=False: None (plots are shown interactively) + + Notes + ----- + Output Structure: + - HTML files are saved in a directory (standardized for HTML outputs) + - When save_to_disk=False, plots are shown interactively + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["html"]) # List of HTML file paths + >>> # ['./html_dir/plot_1.html', './html_dir/plot_2.html'] + + >>> # Display plots interactively without saving + >>> run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # HTML uses directory type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "html": {"type": "directory", "name": "html_dir"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + color_by = params["Color_By"] + annotations = params.get("Annotation_s_to_Highlight", [""]) + feature = params.get("Feature_to_Highlight", "None") + layer = params.get("Table", "Original") + + dot_size = params.get("Dot_Size", 1.5) + dot_transparency = params.get("Dot_Transparency", 0.75) + color_map = params.get("Feature_Color_Scale", "balance") + desired_width_in = params.get("Figure_Width", 6) + desired_height_in = params.get("Figure_Height", 4) + dpi = params.get("Figure_DPI", 200) + Font_size = params.get("Font_Size", 12) + 
stratify_by = text_to_value( + params.get("Stratify_By", "None"), + param_name="Stratify By" + ) + + defined_color_map = text_to_value( + params.get("Define_Label_Color_Mapping", "None"), + param_name="Define Label Color Mapping" + ) + + cmin = params.get("Lower_Colorbar_Bound", 999) + cmax = params.get("Upper_Colorbar_Bound", -999) + + flip_y = params.get("Flip_Vertical_Axis", False) + + # Process parameters + feature = text_to_value(feature) + if color_by == "Annotation": + feature = None + if len(annotations) == 0: + raise ValueError( + 'Please set at least one value in the ' + '"Annotation(s) to Highlight" parameter' + ) + else: + annotations = None + if feature is None: + raise ValueError('Please set the "Feature to Highlight" parameter.') + + layer = text_to_value(layer, "Original") + + # Execute the interactive spatial plot + result_list = interactive_spatial_plot( + adata=adata, + annotations=annotations, + feature=feature, + layer=layer, + dot_size=dot_size, + dot_transparency=dot_transparency, + feature_colorscale=color_map, + figure_width=desired_width_in, + figure_height=desired_height_in, + figure_dpi=dpi, + font_size=Font_size, + stratify_by=stratify_by, + defined_color_map=defined_color_map, + reverse_y_axis=flip_y, + cmin=cmin, + cmax=cmax + ) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare HTML outputs as a dictionary for directory saving + html_dict = {} + + for result in result_list: + image_name = result['image_name'] + image_object = result['image_object'] + + # Show the plot (as in NIDAP template) + image_object.show() + + # Convert to HTML + html_content = pio.to_html(image_object, full_html=True) + + # Add to dictionary with appropriate name + html_dict[image_name] = html_content + + # Prepare results dictionary based on outputs config + results_dict = {} + if "html" in params["outputs"]: + results_dict["html"] = html_dict + + # Use centralized save_results function + # All file handling and logging is now done 
by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + print( + f"Interactive Spatial Plot completed → " + f"{saved_files.get('html', [])}" + ) + return saved_files + else: + # Just show the plots without saving + for result in result_list: + result['image_object'].show() + + print("Displayed interactive plots without saving") + return None + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python interactive_spatial_plot_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nDisplayed interactive plots") diff --git a/src/spac/templates/load_csv_files_template.py b/src/spac/templates/load_csv_files_template.py new file mode 100644 index 00000000..0bb7cf87 --- /dev/null +++ b/src/spac/templates/load_csv_files_template.py @@ -0,0 +1,94 @@ +""" +Platform-agnostic Load CSV Files template converted from NIDAP. +Handles both Galaxy (list of file paths) and NIDAP (directory path) inputs. 
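The Load CSV Files template accepts two input shapes: a list of file paths (Galaxy) or a directory path (NIDAP). `load_csv_files` is not shown in this diff. The following is therefore only a hypothetical sketch of how the two shapes could be normalized into a single list of CSV paths. The function name `resolve_csv_inputs` and the `*.csv` glob for directories are assumptions.

```python
from pathlib import Path
from typing import List, Sequence, Union


def resolve_csv_inputs(
    csv_input: Union[str, Path, Sequence[Union[str, Path]]]
) -> List[Path]:
    """Hypothetical helper: normalize Galaxy-style lists and NIDAP-style directories."""
    if isinstance(csv_input, (str, Path)):
        path = Path(csv_input)
        if path.is_dir():
            # NIDAP-style input: every CSV file in the directory, in a stable order.
            return sorted(path.glob("*.csv"))
        # A single file path is treated as a one-element list.
        return [path]
    # Galaxy-style input: an explicit list of paths, kept in the given order.
    return [Path(p) for p in csv_input]
```

A Galaxy-style call such as `resolve_csv_inputs(["a.csv", "b.csv"])` keeps the caller's order. The directory branch sorts the files so that repeated runs combine them in the same order.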
+ +Usage +----- +>>> from spac.templates.load_csv_files_template import run_from_json +>>> run_from_json("examples/load_csv_params.json") +""" +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.templates.template_utils import ( + save_results, + parse_params, + load_csv_files, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Load CSV Files analysis with parameters from JSON. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file or parameter dictionary + save_to_disk : bool, optional + Whether to save results to disk. Default is True. + output_dir : str, optional + Base directory for outputs. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: The processed DataFrame + """ + params = parse_params(json_path) + + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + if "outputs" not in params: + params["outputs"] = {"dataframe": {"type": "file", "name": "dataframe.csv"}} + + # Load configuration + files_config = pd.read_csv(params["CSV_Files_Configuration"]) + + # Load and combine CSV files using centralized utility + final_df = load_csv_files( + csv_input=params["CSV_Files"], + files_config=files_config, + string_columns=params.get("String_Columns", []) + ) + + logger.info(f"Load CSV Files completed: {final_df.shape}") + + # Save or return results + if save_to_disk: + saved_files = save_results( + results={"dataframe": final_df}, + params=params, + output_base_dir=output_dir + ) + return saved_files + else: + return final_df + + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: 
python load_csv_files_template.py [output_dir]") + sys.exit(1) + + logging.basicConfig(level=logging.INFO) + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + result = run_from_json(sys.argv[1], output_dir=output_dir) + + if isinstance(result, dict): + for key, path in result.items(): + print(f"{key}: {path}") diff --git a/src/spac/templates/manual_phenotyping_template.py b/src/spac/templates/manual_phenotyping_template.py new file mode 100644 index 00000000..85f11024 --- /dev/null +++ b/src/spac/templates/manual_phenotyping_template.py @@ -0,0 +1,236 @@ +#!/usr/bin/env python3 +""" +Platform-agnostic Manual Phenotyping template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.manual_phenotyping_template import run_from_json +>>> run_from_json("examples/manual_phenotyping_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.phenotyping import assign_manual_phenotypes +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Manual Phenotyping analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Phenotypes_Code": "path/to/phenotypes.csv", + "Classification_Column_Prefix": "", + "Classification_Column_Suffix": "", + "Allow_Multiple_Phenotypes": true, + "Manual_Annotation_Name": "manual_phenotype", + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the DataFrame with + phenotype annotations to a CSV file. If False, returns the DataFrame + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The processed DataFrame with phenotype annotations + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> phenotyped_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load upstream data - DataFrame or CSV file + upstream = params['Upstream_Dataset'] + if 
isinstance(upstream, pd.DataFrame): + dataframe = upstream # Direct DataFrame from previous step + elif isinstance(upstream, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(upstream) + try: + dataframe = pd.read_csv(path) + logging.info(f"Successfully loaded CSV data from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path. " + f"Got {type(upstream)}" + ) + + # Load phenotypes code - DataFrame or CSV file + phenotypes_input = params['Phenotypes_Code'] + if isinstance(phenotypes_input, pd.DataFrame): + phenotypes = phenotypes_input + elif isinstance(phenotypes_input, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(phenotypes_input) + try: + phenotypes = pd.read_csv(path) + logging.info(f"Successfully loaded phenotypes from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. " + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Phenotypes_Code must be DataFrame or file path. 
" + f"Got {type(phenotypes_input)}" + ) + + # Extract parameters + prefix = params.get('Classification_Column_Prefix', '') + suffix = params.get('Classification_Column_Suffix', '') + multiple = params.get('Allow_Multiple_Phenotypes', True) + manual_annotation = params.get('Manual_Annotation_Name', 'manual_phenotype') + + logging.info(f"Phenotypes configuration:\n{phenotypes}") + + # returned_dic is not used, but copy from original NIDAP logic + returned_dic = assign_manual_phenotypes( + dataframe, + phenotypes, + prefix=prefix, + suffix=suffix, + annotation=manual_annotation, + multiple=multiple + ) + + # The dataframe changes in place + + # Print summary statistics + phenotype_counts = dataframe[manual_annotation].value_counts() + logging.info(f"\nPhenotype distribution:\n{phenotype_counts}") + + logging.info("\nManual Phenotyping completed successfully.") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = dataframe + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Manual Phenotyping analysis completed successfully.") + return saved_files + else: + # Return the DataFrame directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return dataframe + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python manual_phenotyping_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + 
json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/nearest_neighbor_calculation_template.py b/src/spac/templates/nearest_neighbor_calculation_template.py new file mode 100644 index 00000000..45dabb71 --- /dev/null +++ b/src/spac/templates/nearest_neighbor_calculation_template.py @@ -0,0 +1,207 @@ +""" +Platform-agnostic Nearest Neighbor Calculation template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Usage +----- +>>> from spac.templates.nearest_neighbor_calculation_template import ( +... run_from_json +... ) +>>> run_from_json("examples/nearest_neighbor_calculation_params.json") +""" +import logging +import sys +from pathlib import Path +from typing import Any, Dict, Union + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.spatial_analysis import calculate_nearest_neighbor +from spac.templates.template_utils import ( + load_input, + parse_params, + save_results, + text_to_value, +) + +# Set up logging +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: Union[str, Path] = None +) -> Union[Dict[str, str], Any]: + """ + Execute Nearest Neighbor Calculation analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
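The Manual Phenotyping template loads `Upstream_Dataset` and `Phenotypes_Code` with two nearly identical branches. Each accepts an in-memory DataFrame or a file path, and each reads paths as CSV without checking the extension, because Galaxy passes `.dat` files. The sketch below is a hypothetical refactor of those branches into a single helper that preserves the same error behavior. `load_table` is not a SPAC function.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def load_table(source: Union[pd.DataFrame, str, Path], label: str) -> pd.DataFrame:
    """Hypothetical helper mirroring the template's DataFrame-or-path logic."""
    if isinstance(source, pd.DataFrame):
        # Already in memory, e.g. handed over by a previous workflow step.
        return source
    if isinstance(source, (str, Path)):
        path = Path(source)
        try:
            # Galaxy may deliver CSV content in a .dat file, so the extension is ignored.
            return pd.read_csv(path)
        except Exception as e:
            raise ValueError(
                f"Failed to read CSV data from '{path}'. "
                f"This tool expects CSV/tabular format. Error: {e}"
            ) from e
    raise TypeError(f"{label} must be DataFrame or file path. Got {type(source)}")
```

Under this sketch, the template would call `load_table(params['Upstream_Dataset'], "Upstream_Dataset")` and `load_table(params['Phenotypes_Code'], "Phenotypes_Code")`.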
+ Expected JSON structure: + { + "Upstream_Analysis": "path/to/input.pickle", + "Annotation": "cell_type", + "ImageID": "None", + "Nearest_Neighbor_Associated_Table": "spatial_distance", + "Verbose": true, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str or Path, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + >>> # './output.pickle' + + >>> # Get results in memory for further processing + >>> adata = run_from_json("params.json", save_to_disk=False) + >>> # Can now work with adata object directly + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) 
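Every template in this diff resolves the output directory in the same order: the explicit `output_dir` argument first, then `params['Output_Directory']`, then the current directory. Each template also injects a default `outputs` block when the blueprint does not provide one. The sketch below captures that shared pattern. The helper name `resolve_outputs` is hypothetical, and the deep copy of the defaults is an added precaution that the templates do not perform.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def resolve_outputs(
    params: Dict[str, Any],
    default_outputs: Dict[str, Dict[str, str]],
    output_dir: Optional[str] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: apply the templates' output-directory and outputs defaults."""
    # Explicit argument wins, then the blueprint's Output_Directory, then the CWD.
    if output_dir is None:
        output_dir = params.get("Output_Directory", ".")
    # Only fill in outputs when the blueprint does not define them.
    if "outputs" not in params:
        params["outputs"] = copy.deepcopy(default_outputs)
    return output_dir, params


NEAREST_NEIGHBOR_DEFAULTS = {"analysis": {"type": "file", "name": "output.pickle"}}

out_dir, params = resolve_outputs(
    {"Upstream_Analysis": "input.pickle"}, NEAREST_NEIGHBOR_DEFAULTS
)
print(out_dir, params["outputs"]["analysis"]["name"])  # . output.pickle
```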
+ + # Extract parameters + annotation = params["Annotation"] + spatial_associated_table = "spatial" + imageid = params.get("ImageID", "None") + label = params.get( + "Nearest_Neighbor_Associated_Table", "spatial_distance" + ) + verbose = params.get("Verbose", True) + + # Convert any string "None" to actual None for Python + imageid = text_to_value(imageid, default_none_text="None") + + logger.info( + "Running `calculate_nearest_neighbor` with the following parameters:" + ) + logger.info(f" annotation: {annotation}") + logger.info(f" spatial_associated_table: {spatial_associated_table}") + logger.info(f" imageid: {imageid}") + logger.info(f" label: {label}") + logger.info(f" verbose: {verbose}") + + # Perform the nearest neighbor calculation + calculate_nearest_neighbor( + adata=adata, + annotation=annotation, + spatial_associated_table=spatial_associated_table, + imageid=imageid, + label=label, + verbose=verbose + ) + + logger.info("Nearest neighbor calculation complete.") + logger.info(f"adata.obsm keys: {list(adata.obsm.keys())}") + if label in adata.obsm: + logger.info( + f"Preview of adata.obsm['{label}']:\n{adata.obsm[label].head()}" + ) + + logger.info(f"{adata}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info( + f"Nearest Neighbor Calculation completed → " + f"{saved_files['analysis']}" + ) + return saved_files + else: + # Return the adata object directly for in-memory workflows + logger.info("Returning AnnData object (not saving to file)") + return adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python 
nearest_neighbor_calculation_template.py " + " [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, path in result.items(): + print(f" {key}: {path}") + else: + print("\nReturned AnnData object for in-memory use") + print(f"AnnData: {result}") + print(f"Shape: {result.shape}") diff --git a/src/spac/templates/neighborhood_profile_template.py b/src/spac/templates/neighborhood_profile_template.py new file mode 100644 index 00000000..fabe5e21 --- /dev/null +++ b/src/spac/templates/neighborhood_profile_template.py @@ -0,0 +1,272 @@ +""" +Platform-agnostic Neighborhood Profile template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Usage +----- +>>> from spac.templates.neighborhood_profile_template import run_from_json +>>> run_from_json("examples/neighborhood_profile_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional, Tuple +import pandas as pd +import numpy as np + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.spatial_analysis import neighborhood_profile +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: Union[str, Path] = None +) -> Union[Dict[str, str], Dict[Tuple[str, str], pd.DataFrame]]: + """ + Execute Neighborhood Profile analysis with parameters from JSON. 
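The Nearest Neighbor template passes `ImageID` through `text_to_value(imageid, default_none_text="None")`, which turns the blueprint string "None" into Python `None`. Other templates in this diff call the same helper with `to_int`, `to_float`, and `param_name`. Its real implementation in `template_utils` is not shown here. The sketch below is a simplified, hypothetical approximation of that behavior. Treating empty strings as `None` and matching the sentinel case-insensitively are both assumptions.

```python
from typing import Any, Optional


def text_to_value_sketch(
    value: Any,
    default_none_text: str = "None",
    to_int: bool = False,
    to_float: bool = False,
    param_name: Optional[str] = None,
) -> Any:
    """Simplified, hypothetical approximation of template_utils.text_to_value."""
    if not isinstance(value, str):
        # Non-string values (e.g. an integer bin count from JSON) pass through unchanged.
        return value
    stripped = value.strip()
    # Assumption: empty text and the sentinel (case-insensitive) both mean None.
    if stripped == "" or stripped.lower() == default_none_text.lower():
        return None
    try:
        if to_int:
            return int(stripped)
        if to_float:
            return float(stripped)
    except ValueError as e:
        raise ValueError(f'Could not convert {param_name or "value"} "{value}"') from e
    return value
```

Because the second positional argument is the sentinel, a call such as `text_to_value(layer, "Original")` in the templates maps the "Original" layer name to `None`.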
+ Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary + save_to_disk : bool, optional + Whether to save results to file. If False, returns the dataframes + directly for in-memory workflows. Default is True. + output_dir : str or Path, optional + Output directory for results. If None, uses params['Output_Directory'] or '.' + + Returns + ------- + dict + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: Dictionary of (anchor, neighbor) tuples + to DataFrames + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Neighborhood Profile dataframes use directory type per special case in template_utils + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "directory", "name": "dataframe_dir"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + cell_types_annotation = params["Annotation_of_interest"] + bins = params["Bins"] + slide_names = params.get("Stratify_By", "None") + normalization = None + output_table = "neighborhood_profile" + + anchor_neighbor_list = params["Anchor_Neighbor_List"] + anchor_neighbor_list = [ + tuple(map(str.strip, item.split(";"))) + for item in anchor_neighbor_list + ] + + # Convert bin edges to floats and "None" to None, then compute the neighborhood profile + bins = [float(radius) for radius in bins] + slide_names = text_to_value(slide_names) + + neighborhood_profile( + adata, + phenotypes=cell_types_annotation, + distances=bins, + regions=slide_names, + spatial_key="spatial", + normalize=normalization, + associated_table_name=output_table + ) + + print(adata) + print(adata.obsm[output_table].shape) + print(adata.uns[output_table]) + + dataframes, filenames =
neighborhood_profiles_for_pairs( + adata, + cell_types_annotation, + slide_names, + bins, + anchor_neighbor_list, + output_table + ) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Package dataframes in a dictionary for directory saving + # This ensures they're saved in a directory per standardized schema + results_dict = {} + + # Create a dictionary of dataframes with their filenames as keys + dataframe_dict = {} + for (anchor_label, neighbor_label), filename in zip( + dataframes.keys(), filenames + ): + df = dataframes[(anchor_label, neighbor_label)] + # Remove .csv extension as save_results will add it + key = filename.replace('.csv', '') + dataframe_dict[key] = df + + # Store in results with "dataframe" key to match outputs config + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = dataframe_dict + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + print(f"Neighborhood Profile completed → {len(saved_files.get('dataframe', []))} files") + return saved_files + else: + # Return the dataframes directly for in-memory workflows + print("Returning dataframes (not saving to file)") + return dataframes + + +# Global imports and functions included below + +def neighborhood_profiles_for_pairs( + adata, + cell_types_annotation, + slide_names, + bins, + anchor_neighbor_list, + output_table +): + """ + Compute neighborhood profiles for all anchor-neighbor pairs and return + a tuple containing a dictionary of DataFrames and a list of filenames + for saving. + + Parameters + ---------- + adata : AnnData + The AnnData object containing spatial and phenotypic data. + + cell_types_annotation : str + The column name in adata.obs containing the cell phenotype labels. + + slide_names : str + The column name in adata.obs containing the slide names. 
+ + bins : list + List of increasing distance bins. + + anchor_neighbor_list : list of tuples + List of (anchor_label, neighbor_label) pairs. + + output_table : str + The key in adata.obsm containing neighborhood profile data. + + Returns + ------- + tuple + - A dictionary of DataFrames for each (anchor, neighbor) pair. + - A list of filenames where each DataFrame should be saved. + """ + + dataframes = {} + filenames = [] + + # Get the array of neighbor labels + neighbor_labels = adata.uns[output_table]["labels"] + + for anchor_label, neighbor_label in anchor_neighbor_list: + # Create bin labels with the neighbor type + bins_with_ranges = [ + f"{neighbor_label}_{bins[i]}-{bins[i+1]}" + for i in range(len(bins) - 1) + ] + + # Find the index of the requested neighbor label + neighbor_index = np.where(neighbor_labels == neighbor_label)[0] + + if len(neighbor_index) == 0: + raise ValueError( + f"Neighbor label '{neighbor_label}' not found in " + f"{output_table} labels." + ) + + neighbor_index = neighbor_index[0] # Extract the first index + + # Extract the neighborhood profile for the specific neighbor + # Shape: (n_cells, n_bins) + profile_data = adata.obsm[output_table][:, neighbor_index, :] + + # Construct DataFrame + df = pd.DataFrame(profile_data, columns=bins_with_ranges) + + # Add cell phenotype labels and slide names + df.insert( + 0, cell_types_annotation, + adata.obs[cell_types_annotation].values + ) + if slide_names is not None: + df.insert(0, slide_names, adata.obs[slide_names].values) + + # Filter for the anchor cell type + filtered_df = df[df[cell_types_annotation] == anchor_label] + + # Generate a filename for saving + filename = f"anchor_{anchor_label}_neighbor_{neighbor_label}.csv" + + # Store the DataFrame and filename + dataframes[(anchor_label, neighbor_label)] = filtered_df + filenames.append(filename) + + return dataframes, filenames + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python 
neighborhood_profile_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths[:3]: # Show first 3 files + print(f" - {path}") + if len(paths) > 3: + print(f" ... and {len(paths) - 3} more files") + else: + print(f" {key}: {paths}") + else: + print("\nReturned dataframes for in-memory use") diff --git a/src/spac/templates/normalize_batch_template.py b/src/spac/templates/normalize_batch_template.py new file mode 100644 index 00000000..73ef838e --- /dev/null +++ b/src/spac/templates/normalize_batch_template.py @@ -0,0 +1,187 @@ +""" +Platform-agnostic Normalize Batch template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.normalize_batch_template import run_from_json +>>> run_from_json("examples/normalize_batch_params.json") +""" +import json +import sys +import logging +from pathlib import Path +from typing import Any, Dict, Union + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import batch_normalize +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Normalize Batch analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. 
+ + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Annotation": "batch_column", + "Input_Table_Name": "Original", + "Output_Table_Name": "batch_normalized_table", + "Normalization_Method": "median", + "Take_Log": false, + "Need_Normalization": true, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file + - When save_to_disk=False, the AnnData object is returned for programmatic use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + all_data = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotation = params["Annotation"] + input_layer = params.get("Input_Table_Name", "Original") + + if input_layer == 'Original': + input_layer = None + + output_layer = params.get("Output_Table_Name", "batch_normalized_table") + method = params.get("Normalization_Method", "median") + take_log = params.get("Take_Log", False) + + need_normalization = params.get("Need_Normalization", False) 
+ if need_normalization: + batch_normalize( + adata=all_data, + annotation=annotation, + input_layer=input_layer, + output_layer=output_layer, + method=method, + log=take_log + ) + + logger.info( + f"Statistics of original data:\n{all_data.to_df().describe()}" + ) + logger.info( + f"Statistics of layer data:\n" + f"{all_data.to_df(layer=output_layer).describe()}" + ) + else: + logger.info( + f"Statistics of original data:\n{all_data.to_df().describe()}" + ) + + logger.info(f"Current Analysis contains:\n{all_data}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = all_data + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Normalize Batch analysis completed successfully.") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logger.info("Returning AnnData object for in-memory use") + return all_data + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python normalize_batch_template.py " + "[output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/phenograph_clustering_template.py 
b/src/spac/templates/phenograph_clustering_template.py new file mode 100644 index 00000000..99d84f62 --- /dev/null +++ b/src/spac/templates/phenograph_clustering_template.py @@ -0,0 +1,197 @@ +""" +Platform-agnostic Phenograph Clustering template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where analysis is saved as a file. + +Usage +----- +>>> from spac.templates.phenograph_clustering_template import run_from_json +>>> run_from_json("examples/phenograph_clustering_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import phenograph_clustering +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Phenograph Clustering analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "K_Nearest_Neighbors": 30, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the AnnData object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + >>> # './output.pickle' + + >>> # Get results in memory + >>> adata = run_from_json("params.json", save_to_disk=False) + >>> # Can now work with adata object directly + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + Layer_name = params.get("Table_to_Process", "Original") + K_cluster = params.get("K_Nearest_Neighbors", 30) + Seed = params.get("Seed", 42) + resolution_parameter = params.get("Resolution_Parameter", 1.0) + output_annotation_name = params.get( + "Output_Annotation_Name", "phenograph" + ) + # Used only in HPC profiling mode (not implemented in SPAC) + resolution_list = params.get("Resolution_List", []) + + n_iterations = params.get("Number_of_Iterations", 100) + + if Layer_name == "Original": + Layer_name = None + + intensities = adata.var.index.to_list() + + print("Before Phenograph Clustering: \n", adata) + + 
+ phenograph_clustering( + adata=adata, + features=intensities, + layer=Layer_name, + k=K_cluster, + seed=Seed, + resolution_parameter=resolution_parameter, + n_iterations=n_iterations + ) + if output_annotation_name != "phenograph": + adata.obs = adata.obs.rename( + columns={'phenograph': output_annotation_name} + ) + + print("After Phenograph Clustering: \n", adata) + + # Count and display occurrences of each label in the annotation + print( + f'Count of cells in the output annotation ' + f'"{output_annotation_name}":' + ) + label_counts = adata.obs[output_annotation_name].value_counts() + print(label_counts) + print("\n") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary + results_dict = {} + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + print( + f"Phenograph Clustering completed → " + f"{saved_files['analysis']}" + ) + return saved_files + else: + # Return the adata object directly for in-memory workflows + print("Returning AnnData object (not saving to file)") + return adata + + + # CLI interface + if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python phenograph_clustering_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/posit_it_python_template.py b/src/spac/templates/posit_it_python_template.py new file mode 100644 index 00000000..2b4bf440 --- /dev/null +++
b/src/spac/templates/posit_it_python_template.py @@ -0,0 +1,281 @@ +""" +Platform-agnostic Post-It-Python template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.posit_it_python_template import run_from_json +>>> run_from_json("examples/posit_it_python_params.json") +""" +import sys +from pathlib import Path +from typing import Any, Dict, Union, List +import logging +import matplotlib.pyplot as plt + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.templates.template_utils import ( + save_results, + parse_params, + text_to_value, +) + +# Color palette mapping color names to hex codes +PAINTS = { + 'White': '#FFFFFF', + 'LightGrey': '#D3D3D3', + 'Grey': '#999999', + 'Black': '#000000', + 'Red1': '#F44E3B', + 'Red2': '#D33115', + 'Red3': '#9F0500', + 'Orange1': '#FE9200', + 'Orange2': '#E27300', + 'Orange3': '#C45100', + 'Yellow1': '#FCDC00', + 'Yellow2': '#FCC400', + 'Yellow3': '#FB9E00', + 'YellowGreen1': '#DBDF00', + 'YellowGreen2': '#B0BC00', + 'YellowGreen3': '#808900', + 'Green1': '#A4DD00', + 'Green2': '#68BC00', + 'Green3': '#194D33', + 'Teal1': '#68CCCA', + 'Teal2': '#16A5A5', + 'Teal3': '#0C797D', + 'Blue1': '#73D8FF', + 'Blue2': '#009CE0', + 'Blue3': '#0062B1', + 'Purple1': '#AEA1FF', + 'Purple2': '#7B64FF', + 'Purple3': '#653294', + 'Magenta1': '#FDA1FF', + 'Magenta2': '#FA28FF', + 'Magenta3': '#AB149E', +} + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = False, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], plt.Figure]: + """ + Execute Post-It-Python analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. 
+ + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Label": "Post-It", + "Label_font_color": "Black", + "Label_font_size": "80", + ... + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the figure + directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is False. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + + Returns + ------- + dict or Figure + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: The matplotlib figure object + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures_dir"} + } + + # Extract parameters using .get() with defaults from JSON template + text = params.get("Label", "Post-It") + text_color = params.get("Label_font_color", "Black") + text_size = params.get("Label_font_size", "80") + text_fontface = params.get("Label_font_type", "normal") + text_fontfamily = params.get("Label_font_family", "Arial") + bold = params.get("Label_Bold", "False") + + # background params + fill_color = params.get("Background_fill_color", "Yellow1") + fill_alpha = params.get("Background_fill_opacity", "10") + + # image params + image_width = params.get("Page_width", "18") + image_height = params.get("Page_height", "6") + image_resolution = params.get("Page_DPI", "300") + + # Convert string parameters 
to appropriate types + text_size = text_to_value( + text_size, + to_int=True, + param_name="Label_font_size" + ) + + bold = text_to_value(bold) == "True" + + fill_alpha = text_to_value( + fill_alpha, + to_float=True, + param_name="Background_fill_opacity" + ) + + image_width = text_to_value( + image_width, + to_float=True, + param_name="Page_width" + ) + + image_height = text_to_value( + image_height, + to_float=True, + param_name="Page_height" + ) + + image_resolution = text_to_value( + image_resolution, + to_int=True, + param_name="Page_DPI" + ) + + # RUN ==== + + # Create figure + fig = plt.figure( + figsize=(image_width, image_height), + dpi=image_resolution + ) + fig.patch.set_facecolor(PAINTS[fill_color]) + fig.patch.set_alpha(fill_alpha / 100) + for ax in fig.get_axes(): + for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] + + ax.get_xticklabels() + ax.get_yticklabels()): + item.set_fontsize(text_size) + item.set_fontfamily(text_fontfamily) + item.set_fontstyle(text_fontface) + if bold: + item.set_fontweight('bold') + + fig.text( + 0.5, 0.5, text, + fontsize=text_size, + color=PAINTS[text_color], + ha='center', + va='center', + fontfamily=text_fontfamily, + fontstyle=text_fontface, + fontweight='bold' if bold else 'normal' + ) + + if show_plot: + plt.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "figures" in params["outputs"]: + results_dict["figures"] = {"postit": fig} + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + # Close figure after saving + plt.close(fig) + + logger.info("Post-It-Python completed successfully.") + return saved_files + else: + # Return the figure object directly for in-memory workflows + logger.info("Returning figure object for in-memory use") + return fig + + +# CLI interface +if __name__ == "__main__": + if 
len(sys.argv) < 2: + print( + "Usage: python posit_it_python_template.py " + " [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned figure object") \ No newline at end of file diff --git a/src/spac/templates/quantile_scaling_template.py b/src/spac/templates/quantile_scaling_template.py new file mode 100644 index 00000000..48cc8bf8 --- /dev/null +++ b/src/spac/templates/quantile_scaling_template.py @@ -0,0 +1,319 @@ +""" +Platform-agnostic Quantile Scaling template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where html outputs are saved as directories.
+ +Usage +----- +>>> from spac.templates.quantile_scaling_template import run_from_json +>>> run_from_json("examples/quantile_scaling_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Tuple +import logging +import pandas as pd +import plotly.graph_objects as go + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import normalize_features +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, go.Figure]]: + """ + Execute Quantile Scaling analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Low_Quantile": "0.02", + "High_Quantile": "0.98", + "Interpolation": "nearest", + "Table_to_Process": "Original", + "Output_Table_Name": "normalized_feature", + "Per_Batch": "False", + "Annotation": null, + "outputs": { + "analysis": {"type": "file", "name": "quantile_scaled_data.pickle"}, + "html": {"type": "directory", "name": "normalization_summary"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + and figure directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str, optional + Override output directory from params. Default uses params value. 
+ + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: Tuple of (adata, figure) + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Load the upstream analysis data + logger.info(f"Loading upstream analysis data from {params['Upstream_Analysis']}") + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters using .get() with defaults from JSON template + low_quantile = params.get("Low_Quantile", "0.02") + high_quantile = params.get("High_Quantile", "0.98") + interpolation = params.get("Interpolation", "nearest") + input_layer = params.get("Table_to_Process", "Original") + output_layer = params.get("Output_Table_Name", "normalized_feature") + per_batch = params.get("Per_Batch", "False") + # Annotation may be None, '', 'None', or a real name + annotation = params.get("Annotation") + + # Convert parameters using text_to_value + if input_layer == "Original": + input_layer = None + + low_quantile = text_to_value( + low_quantile, + to_float=True, + param_name='Low_Quantile' + ) + + high_quantile = text_to_value( + high_quantile, + to_float=True, + param_name='High_Quantile' + ) + + # Convert "True"/"False" string to boolean (case-insensitive) + per_batch = str(per_batch).strip().lower() == "true" + + # Annotation is optional - empty string or "None" becomes None + annotation = text_to_value(annotation) + + # Validate annotation is provided when per_batch is True + if per_batch and annotation is None: + raise ValueError( + 'Parameter "Annotation" is required when "Per Batch" is set ' + 'to True.' 
+ ) + + # Check if output_layer already exists in adata + logger.info(f"Checking if output layer '{output_layer}' exists in adata layers...") + if output_layer in adata.layers.keys(): + raise ValueError( + f"Output Table Name '{output_layer}' already exists, " + f"please rename it." + ) + else: + logger.info(f"Output layer '{output_layer}' does not exist. " + f"Proceeding with normalization.") + + def df_as_html( + df, + columns_to_plot, + font_size=12, + column_scaler=1 + ): + df = df.reset_index() + df = df[columns_to_plot] + df_str = df.astype(str) + + column_widths = [ + max(df_str[col].apply(len)) * font_size * column_scaler + for col in df.columns + ] + column_widths[0] = 200 + + fig_width = sum(column_widths) * 1.1 + # Create a table trace with the DataFrame data + table_trace = go.Table( + header=dict(values=list(df.columns), + font=dict(size=font_size)), + cells=dict(values=df_str.values.T, + font=dict(size=font_size), + align='left'), + columnwidth=column_widths + ) + + layout = go.Layout( + autosize=True + ) + + fig = go.Figure( + data=[table_trace], + layout=layout + ) + + return fig + + def create_normalization_info( + adata, + low_quantile, + high_quantile, + input_layer, + output_layer + ): + pre_dataframe = adata.to_df(layer=input_layer) + quantiles = pre_dataframe.quantile([low_quantile, high_quantile]) + new_row_names = { + high_quantile: 'quantile_high', + low_quantile: 'quantile_low' + } + quantiles.index = quantiles.index.map(new_row_names) + + pre_info = pre_dataframe.describe() + pre_info = pd.concat([pre_info, quantiles]) + pre_info = pre_info.reset_index() + pre_info['index'] = 'Pre-Norm: ' + pre_info['index'].astype(str) + del pre_dataframe + + post_dataframe = adata.to_df(layer=output_layer) + post_info = post_dataframe.describe() + post_info = post_info.reset_index() + post_info['index'] = 'Post-Norm: ' + post_info['index'].astype(str) + del post_dataframe + + normalization_info = pd.concat([pre_info, post_info]).transpose() + 
normalization_info.columns = normalization_info.iloc[0] + normalization_info = normalization_info.drop( + normalization_info.index[0] + ) + normalization_info = normalization_info.astype(float) + normalization_info = normalization_info.round(3) + normalization_info = normalization_info.astype(str) + + return normalization_info + + logger.info(f"High quantile used: {str(high_quantile)}") + logger.info(f"Low quantile used: {str(low_quantile)}") + + transformed_data = normalize_features( + adata=adata, + low_quantile=low_quantile, + high_quantile=high_quantile, + interpolation=interpolation, + input_layer=input_layer, + output_layer=output_layer, + per_batch=per_batch, + annotation=annotation + ) + + logger.info(f"Transformed data stored in layer: {output_layer}") + dataframe = pd.DataFrame(transformed_data.layers[output_layer]) + logger.info(f"Transform summary:\n{dataframe.describe()}") + + normalization_info = create_normalization_info( + adata, + low_quantile, + high_quantile, + input_layer, + output_layer + ) + + columns_to_plot = [ + 'index', 'Pre-Norm: mean', 'Pre-Norm: std', + 'Pre-Norm: quantile_high', 'Pre-Norm: quantile_low', + 'Post-Norm: mean', 'Post-Norm: std', + ] + + html_plot = df_as_html( + normalization_info, + columns_to_plot + ) + + if show_plot: + html_plot.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Add analysis output (single file) + if "analysis" in params["outputs"]: + results_dict["analysis"] = transformed_data + + # Add HTML output (directory) + if "html" in params["outputs"]: + results_dict["html"] = {"normalization_summary": html_plot} + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Quantile Scaling analysis completed successfully.") + return saved_files + else: + # Return the adata object and figure directly for 
in-memory workflows + logger.info("Returning AnnData object and figure for in-memory use") + return transformed_data, html_plot + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python quantile_scaling_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + adata, html_plot = result + print("\nReturned AnnData object and figure for in-memory use") + print(f"AnnData shape: {adata.shape}") + print(f"Output layer: {list(adata.layers.keys())}") diff --git a/src/spac/templates/relational_heatmap_template.py b/src/spac/templates/relational_heatmap_template.py new file mode 100644 index 00000000..2087f5ff --- /dev/null +++ b/src/spac/templates/relational_heatmap_template.py @@ -0,0 +1,246 @@ +""" +Relational Heatmap with Plotly-matplotlib color synchronization. +Extracts actual colors from Plotly and uses them in matplotlib. 
+""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Tuple +import pandas as pd +import numpy as np +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import matplotlib.colors as mcolors +import plotly.io as pio +import plotly.express as px + +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import relational_heatmap +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def get_plotly_colorscale_as_matplotlib(plotly_colormap: str) -> mcolors.Colormap: + """ + Extract the colors of a Plotly colorscale and build a matching matplotlib colormap. + Plotly accepts colorscale names case-insensitively (e.g. "darkmint" refers to + px.colors.sequential.Darkmint), and some Plotly scales store colors as + 'rgb(r, g, b)' strings that matplotlib cannot parse, so both cases are handled here. + Falls back to matplotlib's viridis if the colors cannot be extracted. + """ + def _to_matplotlib_color(color): + # 'rgb(r, g, b)' / 'rgba(r, g, b, a)' -> (r, g, b) scaled to 0-1; + # hex strings such as '#440154' are already valid matplotlib colors + if isinstance(color, str) and color.startswith('rgb'): + values = color[color.index('(') + 1:color.index(')')].split(',') + return tuple(float(v) / 255 for v in values[:3]) + return color + + try: + # Look the name up case-insensitively in Plotly's built-in colorscale modules + colorscale = None + target = plotly_colormap.lower() + for module in (px.colors.sequential, px.colors.diverging, px.colors.cyclical): + name = next((n for n in dir(module) if n.lower() == target), None) + if name is not None: + colorscale = getattr(module, name) + break + + if not isinstance(colorscale, list): + # Fallback to a default + print(f"Warning: Could not find Plotly colorscale '{plotly_colormap}', using Viridis") + colorscale = px.colors.sequential.Viridis + + # Create custom colormap from the converted color list + return mcolors.LinearSegmentedColormap.from_list( + f"plotly_{plotly_colormap}", + [_to_matplotlib_color(c) for c in colorscale] + ) + except Exception as e: + print(f"Error extracting Plotly colors: {e}") + + # Fallback to matplotlib's viridis + return plt.cm.viridis + + +def create_matplotlib_heatmap_matching_plotly( + data: pd.DataFrame, + plotly_fig: Any, + source_annotation: str, + target_annotation: str, + colormap_name: str, + figsize: tuple, + dpi: int, + font_size: int +) -> plt.Figure: + """ + Create matplotlib heatmap that matches Plotly's appearance.
+ Extracts color information from the Plotly figure. + """ + fig, ax = plt.subplots(figsize=figsize, dpi=dpi) + + # Get the actual colormap from Plotly + cmap = get_plotly_colorscale_as_matplotlib(colormap_name) + + # Use the Plotly trace's zmin/zmax when they are set. When they are + # unset (None), imshow autoscales vmin/vmax to the data range. + zmin = zmax = None + try: + trace = plotly_fig.data[0] + zmin = getattr(trace, 'zmin', None) + zmax = getattr(trace, 'zmax', None) + except (AttributeError, IndexError, TypeError): + # No usable Plotly trace; let imshow use the data range + pass + + # Create heatmap matching Plotly's style + im = ax.imshow( + data.values, + aspect='auto', + cmap=cmap, + interpolation='nearest', + vmin=zmin, + vmax=zmax + ) + + # Match Plotly's tick placement + ax.set_xticks(np.arange(len(data.columns))) + ax.set_yticks(np.arange(len(data.index))) + ax.set_xticklabels(data.columns, rotation=45, ha='right', fontsize=font_size) + ax.set_yticklabels(data.index, fontsize=font_size) + + # Add colorbar + cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04) + cbar.set_label('Count', fontsize=font_size) + cbar.ax.tick_params(labelsize=font_size) + + # Title matching Plotly + ax.set_title( + f'Relational Heatmap: {source_annotation} vs {target_annotation}', + fontsize=font_size + 2, + pad=20 + ) + ax.set_xlabel(target_annotation, fontsize=font_size) + ax.set_ylabel(source_annotation, fontsize=font_size) + + # Add grid for clarity (like Plotly) + ax.set_xticks(np.arange(len(data.columns) + 1) - 0.5, minor=True) + ax.set_yticks(np.arange(len(data.index) + 1) - 0.5, minor=True) + ax.grid(which='minor', color='gray', linestyle='-', linewidth=0.3, alpha=0.3) + ax.tick_params(which='both', length=0) + + plt.tight_layout() + return fig + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, + show_static_image: bool = False +) -> Union[Dict, Tuple]: + """Execute Relational Heatmap with color-matched outputs.
+ + Parameters + ---------- + json_path : str, Path, or dict + Path to parameters JSON file or dict of parameters. + save_to_disk : bool, default True + Whether to save results to disk. + output_dir : str, optional + Output directory. If None, read from params. + show_static_image : bool, default False + When True, generate a static PNG figure using matplotlib. + When False (default), only produce interactive HTML output. + Disabled by default because Plotly HTML-to-PNG conversion + hangs inside the Galaxy container environment. + """ + + params = parse_params(json_path) + + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures_dir"}, + "html": {"type": "directory", "name": "html_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load data + adata = load_input(params["Upstream_Analysis"]) + print(f"Data loaded: {adata.shape[0]} cells, {adata.shape[1]} genes") + + # Parameters + source_annotation = text_to_value(params.get("Source_Annotation_Name", "None")) + target_annotation = text_to_value(params.get("Target_Annotation_Name", "None")) + + dpi = float(params.get("Figure_DPI", 300)) + width_in = float(params.get("Figure_Width_inch", 8)) + height_in = float(params.get("Figure_Height_inch", 10)) + font_size = float(params.get("Font_Size", 8)) + colormap = params.get("Colormap", "darkmint") + + print(f"Creating heatmap: {source_annotation} vs {target_annotation}") + + # Run SPAC relational heatmap + result_dict = relational_heatmap( + adata=adata, + source_annotation=source_annotation, + target_annotation=target_annotation, + color_map=colormap, + font_size=font_size + ) + + rhmap_data = result_dict['data'] + plotly_fig = result_dict['figure'] + + # Update Plotly figure + if plotly_fig: + plotly_fig.update_layout( + width=width_in * 96, + height=height_in * 96, + font=dict(size=font_size) + ) + + if save_to_disk: + 
results_dict = { + "html": {"relational_heatmap": pio.to_html(plotly_fig, full_html=True, include_plotlyjs='cdn')}, + "dataframe": rhmap_data + } + + if show_static_image: + # Generate static matplotlib figure matching Plotly colors. + # Disabled by default on Galaxy because Plotly HTML-to-PNG + # conversion hangs in the Galaxy container environment. + print("Creating color-matched matplotlib figure...") + static_fig = create_matplotlib_heatmap_matching_plotly( + rhmap_data, + plotly_fig, + source_annotation, + target_annotation, + colormap, + (width_in, height_in), + int(dpi), + int(font_size) + ) + results_dict["figures"] = {"relational_heatmap": static_fig} + + saved_files = save_results(results_dict, params, output_base_dir=output_dir) + + if show_static_image: + plt.close(static_fig) + + print("✓ Relational Heatmap completed") + return saved_files + else: + return plotly_fig, rhmap_data + + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python relational_heatmap_template.py ", file=sys.stderr) + sys.exit(1) + + try: + run_from_json(sys.argv[1], save_to_disk=True) + sys.exit(0) + except Exception as e: + print(f"ERROR: {e}", file=sys.stderr) + import traceback + traceback.print_exc() + sys.exit(1) diff --git a/src/spac/templates/rename_labels_template.py b/src/spac/templates/rename_labels_template.py new file mode 100644 index 00000000..5527e3b7 --- /dev/null +++ b/src/spac/templates/rename_labels_template.py @@ -0,0 +1,173 @@ +""" +Platform-agnostic Rename Labels template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema. 
+ +Usage +----- +>>> from spac.templates.rename_labels_template import run_from_json +>>> run_from_json("examples/rename_labels_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import logging +import pandas as pd +import pickle + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import rename_annotations +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Rename Labels analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Cluster_Mapping_Dictionary": "path/to/mapping.csv", + "Source_Annotation": "original_column", + "New_Annotation": "new_column", + "outputs": { + "analysis": {"type": "file", "name": "renamed_data.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Override output directory from params. Default uses params value. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: The processed AnnData object + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Load the upstream analysis data + logger.info(f"Loading upstream analysis data from {params['Upstream_Analysis']}") + all_data = load_input(params["Upstream_Analysis"]) + + # Extract parameters + rename_list_path = params["Cluster_Mapping_Dictionary"] + original_column = params.get("Source_Annotation", "None") + renamed_column = params.get("New_Annotation", "None") + + # Load the mapping dictionary CSV + logger.info(f"Loading cluster mapping dictionary from {rename_list_path}") + rename_list = pd.read_csv(rename_list_path) + + original_column = text_to_value(original_column) + renamed_column = text_to_value(renamed_column) + + # Create a new dictionary with the desired format + dict_list = rename_list.to_dict('records') + mappings = {d['Original']: d['New'] for d in dict_list} + + logger.info(f"Cluster Name Mapping: \n{mappings}") + + rename_annotations( + all_data, + src_annotation=original_column, + dest_annotation=renamed_column, + mappings=mappings) + + logger.info(f"After Renaming Clusters: \n{all_data}") + + # Count and display occurrences of each label in the annotation + logger.info(f'Count of cells in the output annotation:"{renamed_column}":') + label_counts = all_data.obs[renamed_column].value_counts() + logger.info(f"{label_counts}") + + object_to_output = all_data + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Add analysis output (single file) + if "analysis" in params["outputs"]: + results_dict["analysis"] = object_to_output + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + 
params=params, + output_base_dir=output_dir + ) + + logger.info("Rename Labels analysis completed successfully.") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logger.info("Returning AnnData object for in-memory use") + return object_to_output + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python rename_labels_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned AnnData object") + print(f"AnnData shape: {result.shape}") + print(f"Observations columns: {list(result.obs.columns)}") diff --git a/src/spac/templates/ripley_l_calculation_template.py b/src/spac/templates/ripley_l_calculation_template.py new file mode 100644 index 00000000..68b12812 --- /dev/null +++ b/src/spac/templates/ripley_l_calculation_template.py @@ -0,0 +1,151 @@ +""" +Platform-agnostic Ripley-L template converted from NIDAP. +Maintains the exact logic from the NIDAP template. 
+ +Usage +----- +>>> from spac.templates.ripley_l_calculation_template import run_from_json +>>> run_from_json("examples/ripley_l_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.spatial_analysis import ripley_l +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, + convert_to_floats +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: Optional[Union[str, Path]] = None +) -> Union[Dict[str, str], Any]: + """ + Execute Ripley-L analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + save_to_disk : bool, optional + Whether to save results to file. If False, returns the adata object + directly for in-memory workflows. Default is True. + output_dir : str or Path, optional + Directory for outputs. If None, uses current directory.
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: The processed AnnData object + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + radii = params["Radii"] + annotation = params["Annotation"] + phenotypes = [params["Center_Phenotype"], params["Neighbor_Phenotype"]] + regions = params.get("Stratify_By", "None") + n_simulations = params.get("Number_of_Simulations", 100) + area = params.get("Area", "None") + seed = params.get("Seed", 42) + spatial_key = params.get("Spatial_Key", "spatial") + edge_correction = params.get("Edge_Correction", True) + + # Process parameters + regions = text_to_value( + regions, + default_none_text="None" + ) + + area = text_to_value( + area, + default_none_text="None", + value_to_convert_to=None, + to_float=True, + param_name='Area' + ) + + # Convert radii to floats + radii = convert_to_floats(radii) + + # Run the analysis + ripley_l( + adata, + annotation=annotation, + phenotypes=phenotypes, + distances=radii, + regions=regions, + n_simulations=n_simulations, + area=area, + seed=seed, + spatial_key=spatial_key, + edge_correction=edge_correction + ) + + logging.info("Ripley-L analysis completed successfully.") + logging.debug(f"AnnData object: {adata}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info(f"Ripley-L completed → {saved_files['analysis']}") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logging.info("Returning AnnData object (not saving to file)") + return 
+ adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python ripley_l_calculation_template.py [output_dir]", file=sys.stderr) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json(sys.argv[1], output_dir=output_dir) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/sankey_plot_template.py b/src/spac/templates/sankey_plot_template.py new file mode 100644 index 00000000..c34a2c81 --- /dev/null +++ b/src/spac/templates/sankey_plot_template.py @@ -0,0 +1,236 @@ +""" +Production version of Sankey Plot template for Galaxy. +Saves files only: no show() calls and no blocking operations. +""" +import json +import sys +import os +from pathlib import Path +from typing import Any, Dict, List, Union, Optional, Tuple +import pandas as pd +import matplotlib +# Set non-interactive backend for Galaxy +matplotlib.use('Agg') +import matplotlib.pyplot as plt +import plotly.io as pio + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import sankey_plot +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, # Always True for Galaxy + output_dir: str = None, + show_static_image: bool = False, +) -> Union[Dict[str, Union[str, List[str]]], None]: + """ + Execute Sankey Plot analysis for Galaxy. + + Parameters + ---------- + json_path : str, Path, or dict + Path to parameters JSON file or dict of parameters.
+ save_to_disk : bool, default True + Whether to save results to disk. Always True for Galaxy. + output_dir : str, optional + Output directory. If None, read from params. + show_static_image : bool, default False + When True, generate a static PNG placeholder figure. + When False (default), only produce interactive HTML output. + Disabled by default because Plotly HTML-to-PNG conversion + hangs inside the Galaxy container environment. + """ + # Parse parameters from JSON + params = parse_params(json_path) + print(f"Loaded parameters for {params.get('Source_Annotation_Name')} -> {params.get('Target_Annotation_Name')}") + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures_dir"}, + "html": {"type": "directory", "name": "html_dir"} + } + + # Load the upstream analysis data + print("Loading upstream analysis data...") + adata = load_input(params["Upstream_Analysis"]) + print(f"Data loaded: {adata.shape[0]} cells, {adata.shape[1]} genes") + + # Extract parameters + annotation_columns = [ + params.get("Source_Annotation_Name", "None"), + params.get("Target_Annotation_Name", "None") + ] + + # Parse numeric parameters with error handling + try: + dpi = float(params.get("Figure_DPI", 300)) + except (ValueError, TypeError): + dpi = 300 + print(f"Warning: Invalid DPI value, using default {dpi}") + + width_num = float(params.get("Figure_Width_inch", 6)) + height_num = float(params.get("Figure_Height_inch", 6)) + + source_color_map = params.get("Source_Annotation_Color_Map", "tab20") + target_color_map = params.get("Target_Annotation_Color_Map", "tab20b") + + try: + sankey_font = float(params.get("Font_Size", 12)) + except (ValueError, TypeError): + sankey_font = 12 + print(f"Warning: Invalid font size, using default {sankey_font}") + + target_annotation = 
text_to_value(annotation_columns[1]) + source_annotation = text_to_value(annotation_columns[0]) + + print(f"Creating Sankey plot: {source_annotation} -> {target_annotation}") + + # Execute the sankey plot + fig = sankey_plot( + adata=adata, + source_annotation=source_annotation, + target_annotation=target_annotation, + source_color_map=source_color_map, + target_color_map=target_color_map, + sankey_font=sankey_font + ) + + # Customize the Sankey diagram layout + width_in_pixels = width_num * dpi + height_in_pixels = height_num * dpi + + fig.update_layout( + width=width_in_pixels, + height=height_in_pixels + ) + + print("Sankey plot generated") + + # IMPORTANT: No show() calls — causes hang in Galaxy + # plt.show() - REMOVED + # fig.show() - REMOVED + + # Handle saving — always save to disk for Galaxy + if save_to_disk: + # Prepare results dictionary + results_dict = {} + + # Save Plotly HTML (the actual interactive Sankey diagram) + if "html" in params["outputs"]: + html_content = pio.to_html(fig, full_html=True, include_plotlyjs='cdn') + results_dict["html"] = {"sankey_plot": html_content} + print("Plotly HTML prepared for saving") + + if show_static_image: + # Generate a static matplotlib placeholder figure. + # Disabled by default on Galaxy because Plotly HTML-to-PNG + # conversion hangs in the Galaxy container environment. + # The interactive HTML is the first-class output. 
+ print("Creating matplotlib figure...") + static_fig, ax = plt.subplots( + figsize=(width_num, height_num), dpi=dpi + ) + ax.text( + 0.5, 0.6, 'Sankey Diagram', + ha='center', va='center', transform=ax.transAxes, + fontsize=16, fontweight='bold' + ) + ax.text( + 0.5, 0.5, + f'{source_annotation} → {target_annotation}', + ha='center', va='center', transform=ax.transAxes, + fontsize=12 + ) + ax.text( + 0.5, 0.3, + 'View HTML output for interactive diagram', + ha='center', va='center', transform=ax.transAxes, + fontsize=10, style='italic' + ) + ax.axis('off') + ax.add_patch(plt.Rectangle( + (0.1, 0.2), 0.8, 0.5, + fill=False, edgecolor='gray', linewidth=1, + transform=ax.transAxes + )) + + if "figures" in params["outputs"]: + results_dict["figures"] = {"sankey_plot": static_fig} + print("Matplotlib figure prepared for saving") + + # Use centralized save_results function + print("Saving all results...") + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + if show_static_image: + plt.close(static_fig) + + print(f"✓ Sankey Plot completed successfully") + print(f" Outputs saved: {list(saved_files.keys())}") + + return saved_files + else: + # For non-Galaxy use (testing) + print("Returning None (display mode not supported)") + return None + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python sankey_plot_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + print("\n" + "="*60) + print("SANKEY PLOT - GALAXY PRODUCTION VERSION") + print("="*60 + "\n") + + try: + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir, + save_to_disk=True # Always save for Galaxy + ) + + if isinstance(result, dict): + print("\nOutput files generated:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" 
- {path}") + else: + print(f" {key}: {paths}") + + print("\n✓ SUCCESS - Job completed without hanging") + sys.exit(0) + + except Exception as e: + print(f"\n✗ ERROR: {e}", file=sys.stderr) + import traceback + traceback.print_exc() + sys.exit(1) diff --git a/src/spac/templates/select_values_template.py b/src/spac/templates/select_values_template.py new file mode 100644 index 00000000..e84723a4 --- /dev/null +++ b/src/spac/templates/select_values_template.py @@ -0,0 +1,204 @@ +""" +Platform-agnostic Select Values template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.select_values_template import run_from_json +>>> run_from_json("examples/select_values_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Tuple +import pandas as pd +import warnings +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import select_values +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Select Values analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Annotation_of_Interest": "cell_type", + "Label_s_of_Interest": ["T cells", "B cells"], + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. 
If True, saves the filtered DataFrame + to a CSV file. If False, returns the DataFrame directly for in-memory + workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "dataframe": "path/to/dataframe.csv" + } + If save_to_disk=False: The filtered DataFrame + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["dataframe"]) # Path to saved CSV file + + >>> # Get results in memory + >>> filtered_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # DataFrames typically use file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load upstream data - could be DataFrame, CSV + upstream_dataset = params["Upstream_Dataset"] + + if isinstance(upstream_dataset, pd.DataFrame): + input_dataset = upstream_dataset # Direct DataFrame from previous step + elif isinstance(upstream_dataset, (str, Path)): + try: + input_dataset = pd.read_csv(upstream_dataset) + except Exception as e: + raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}") + else: + raise TypeError( + f"Upstream_Dataset must be DataFrame or file path. 
" + f"Got {type(upstream_dataset)}" + ) + + # Extract parameters - support both "Label_s_of_Interest" and "Labels_of_Interest" + # for backward compatibility with JSON template + observation = params.get("Annotation_of_Interest") + values = params.get("Label_s_of_Interest") or params.get("Labels_of_Interest") + + with warnings.catch_warnings(record=True) as caught_warnings: + warnings.simplefilter("always") + filtered_dataset = select_values( + data=input_dataset, + annotation=observation, + values=values + ) + # Only process warnings that are relevant to the select_values operation + if caught_warnings: + for warning in caught_warnings: + # Skip deprecation warnings from numpy/pandas + if (hasattr(warning, 'category') and + issubclass(warning.category, DeprecationWarning)): + continue + # Raise actual operational warnings as errors + if hasattr(warning, 'message'): + raise ValueError(str(warning.message)) + + logging.info(filtered_dataset.info()) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = filtered_dataset + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Select Values analysis completed successfully.") + return saved_files + else: + # Return the dataframe directly for in-memory workflows + logging.info("Returning DataFrame for in-memory use") + return filtered_dataset + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python select_values_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # 
Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/setup_analysis_template.py b/src/spac/templates/setup_analysis_template.py new file mode 100644 index 00000000..8bc8100b --- /dev/null +++ b/src/spac/templates/setup_analysis_template.py @@ -0,0 +1,233 @@ +""" +Platform-agnostic Setup Analysis template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where analysis is saved as a file. + +Usage +----- +>>> from spac.templates.setup_analysis_template import run_from_json +>>> run_from_json("examples/setup_analysis_params.json") +""" + +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import ast +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import ingest_cells +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Setup Analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Dataset": "path/to/data.csv", + "Features_to_Analyze": ["CD25", "CD3D"], + "Feature_Regex": [], + "X_Coordinate_Column": "X_centroid", + "Y_Coordinate_Column": "Y_centroid", + "Annotation_s_": ["cell_type"], + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the AnnData object + to a pickle file. If False, returns the AnnData object directly + for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object for in-memory use + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + >>> # './output.pickle' + + >>> # Get results in memory for further processing + >>> adata = run_from_json("params.json", save_to_disk=False) + >>> # Can now work with adata object directly + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + # Get output filename from params or use default + output_file = params.get("Output_File", "output.pickle") + if not output_file.endswith(('.pickle', '.pkl', 
'.h5ad')): + output_file = output_file + '.pickle' + + params["outputs"] = { + "analysis": {"type": "file", "name": output_file} + } + + # Extract parameters + upstream_dataset = params["Upstream_Dataset"] + feature_names = params["Features_to_Analyze"] + regex_str = params.get("Feature_Regex", []) + x_col = params["X_Coordinate_Column"] + y_col = params["Y_Coordinate_Column"] + annotation = params["Annotation_s_"] + + # Load upstream data - could be DataFrame or CSV + if isinstance(upstream_dataset, (str, Path)): + try: + input_dataset = pd.read_csv(upstream_dataset) + # Validate it's a proper DataFrame + if input_dataset.empty: + raise ValueError("CSV file is empty") + except Exception as e: + raise ValueError(f"Failed to read CSV from {upstream_dataset}: {e}") + else: + # Already a DataFrame + input_dataset = upstream_dataset + + # Process annotation parameter + if isinstance(annotation, str): + annotation = [annotation] + + if len(annotation) == 1 and annotation[0] == "None": + annotation = None + + if annotation and len(annotation) != 1 and "None" in annotation: + error_msg = 'String "None" found in the annotation list' + raise ValueError(error_msg) + + # Process coordinate columns + x_col = text_to_value(x_col, default_none_text="None") + y_col = text_to_value(y_col, default_none_text="None") + + # Process feature names and regex + if isinstance(feature_names, str): + feature_names = [feature_names] + if isinstance(regex_str, str): + try: + regex_str = ast.literal_eval(regex_str) + except (ValueError, SyntaxError): + regex_str = [regex_str] if regex_str else [] + + # Processing two search methods + for feature in feature_names: + regex_str.append(f"^{feature}$") + + # Sanitizing search list + regex_str_set = set(regex_str) + regex_str_list = list(regex_str_set) + + # Run the ingestion + ingested_anndata = ingest_cells( + dataframe=input_dataset, + regex_str=regex_str_list, + x_col=x_col, + y_col=y_col, + annotation=annotation + ) + + logging.info("Analysis 
Setup:") + logging.info(f"{ingested_anndata}") + logging.info("Schema:") + logging.info(f"{ingested_anndata.var_names.tolist()}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = ingested_anndata + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info( + f"Setup Analysis completed → {saved_files['analysis']}" + ) + return saved_files + else: + # Return the adata object directly for in-memory workflows + logging.info("Returning AnnData object (not saving to file)") + return ingested_anndata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python setup_analysis_template.py ", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, path in result.items(): + print(f" {key}: {path}") + else: + print("\nReturned AnnData object for in-memory use") + print(f"AnnData: {result}") + print(f"Shape: {result.shape}") \ No newline at end of file diff --git a/src/spac/templates/spatial_interaction_template.py b/src/spac/templates/spatial_interaction_template.py new file mode 100644 index 00000000..bee16a41 --- /dev/null +++ b/src/spac/templates/spatial_interaction_template.py @@ -0,0 +1,324 @@ +""" +Platform-agnostic Spatial Interaction template converted from NIDAP. 
+Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where figures are saved as directories. + +Usage +----- +>>> from spac.templates.spatial_interaction_template import run_from_json +>>> run_from_json("examples/spatial_interaction_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional, Tuple +import pandas as pd +import numpy as np +from PIL import Image +from pprint import pprint +import matplotlib.pyplot as plt + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.spatial_analysis import spatial_interaction +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], Tuple[List[Any], Dict[str, pd.DataFrame]]]: + """ + Execute Spatial Interaction analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Annotation": "cell_type", + "Spatial_Analysis_Method": "Neighborhood Enrichment", + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "dataframes": {"type": "directory", "name": "matrices"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves figures to a directory + and matrices to CSV files using centralized save_results. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str or Path, optional + Base directory for outputs. 
If None, uses params['Output_Directory'] or '.' + + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary mapping output types to saved file paths + If save_to_disk=False: Tuple of (figures_list, matrices_dict) for in-memory use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotation = params["Annotation"] + analysis_method = params["Spatial_Analysis_Method"] + # Two analysis methods available: + # 1. "Neighborhood Enrichment": Calculates how often pairs of cell types + # are neighbors compared to random chance. Positive scores indicate + # attraction/co-location, negative scores indicate avoidance. + # Output: z-scores (can be positive or negative) + # Files: neighborhood_enrichment_{identifier}.csv + # 2. "Cluster Interaction Matrix": Counts the number of edges/connections + # between different cell types in the spatial neighborhood graph. + # Shows absolute interaction frequencies rather than enrichment. + # Output: raw counts (non-negative integers) + # Files: cluster_interaction_matrix_{identifier}.csv + # Both methods produce the same data structure, just different values + stratify_by = params.get("Stratify_By", ["None"]) + seed = params.get("Seed", "None") + coord_type = params.get("Coordinate_Type", "None") + n_rings = 1 + n_neighs = params.get("K_Nearest_Neighbors", 6) + radius = params.get("Radius", "None") + image_width = params.get("Figure_Width", 15) + image_height = params.get("Figure_Height", 12) + dpi = params.get("Figure_DPI", 200) + font_size = params.get("Font_Size", 12) + color_bar_range = params.get("Color_Bar_Range", "Automatic") + + def save_matrix(matrix): + for file_name in matrix: + data_df = matrix[file_name] + print("\n") + print(file_name) + print(data_df) + # In SPAC, collect matrices for later saving instead of + # direct file write.
Store them with proper extension if missing. + if not file_name.endswith('.csv'): + file_name = f"{file_name}.csv" + matrices[file_name] = data_df + + def update_nidap_display( + axs, + image_width, + image_height, + dpi, + font_size + ): + # NIDAP display logic differs from generic Python + # image output. For example, a 12 in x 8 in image with font size 12 + # displays all text properly in a generic image viewer, + # but NIDAP Code Workbook resizing shrinks the text. + # This function adjusts the image size and font sizes + # to fit the NIDAP display. + # Get the figure associated with the axes + fig = axs.get_figure() + + # Set figure size and DPI + fig.set_size_inches(image_width, image_height) + fig.set_dpi(dpi) + + # Customize font sizes + axs.title.set_fontsize(font_size) # Title font size + axs.xaxis.label.set_fontsize(font_size) # X-axis label font size + axs.yaxis.label.set_fontsize(font_size) # Y-axis label font size + axs.tick_params(axis='both', labelsize=font_size) # Tick labels + # Return the updated figure and axes for chaining or further use + # Note: This adjustment was specific to NIDAP display resizing + # behavior and may not be necessary in other environments + return fig, axs + + for i, item in enumerate(stratify_by): + item_is_none = text_to_value(item) + if item_is_none is None and i == 0: + stratify_by = item_is_none + elif item_is_none is None and i != 0: + raise ValueError( + 'Found string "None" in the stratify by list that is ' + 'not the first entry.\n' + 'Please remove the "None" to proceed with the list of ' + 'stratify by options, \n' + 'or move the "None" to the start of the list to disable ' + 'stratification.
Thank you.') + + seed = text_to_value(seed, to_int=True) + radius = text_to_value(radius, to_float=True) + coord_type = text_to_value(coord_type) + color_bar_range = text_to_value( + color_bar_range, + "Automatic", + to_float=True) + + if color_bar_range is not None: + cmap = "seismic" + vmin = -abs(color_bar_range) + vmax = abs(color_bar_range) + else: + cmap = "seismic" + vmin = vmax = color_bar_range + + plt.rcParams['font.size'] = font_size + + result_dictionary = spatial_interaction( + adata=adata, + annotation=annotation, + analysis_method=analysis_method, + stratify_by=stratify_by, + return_matrix=True, + seed=seed, + coord_type=coord_type, + n_rings=n_rings, + n_neighs=n_neighs, + radius=radius, + cmap=cmap, + vmin=vmin, + vmax=vmax, + figsize=(image_width, image_height), + dpi=dpi + ) + + # Track figures and matrices for optional saving + figures = [] + matrices = {} + + if not stratify_by: + axs = result_dictionary['Ax'] + fig, axs = update_nidap_display( + axs=axs, + image_width=image_width, + image_height=image_height, + dpi=dpi, + font_size=font_size + ) + figures.append(fig) + if show_plot: + plt.show() + + matrix = result_dictionary['Matrix']['annotation'] + save_matrix(matrix) + else: + plt.close(1) + axs_dict = result_dictionary['Ax'] + for key in axs_dict: + axs = axs_dict[key] + fig, axs = update_nidap_display( + axs=axs, + image_width=image_width, + image_height=image_height, + dpi=dpi, + font_size=font_size + ) + figures.append(fig) + if show_plot: + plt.show() + + matrix_dict = result_dictionary['Matrix'] + for identifier in matrix_dict: + matrix = matrix_dict[identifier] + save_matrix(matrix) + + # Handle saving if requested (separate from NIDAP logic) + if save_to_disk: + # Ensure outputs configuration exists + if "outputs" not in params: + # Provide default outputs config if not present + params["outputs"] = { + "figures": {"type": "directory", "name": "figures"}, + "dataframes": {"type": "directory", "name": "matrices"} + } + + # Prepare 
results dictionary + results_dict = {} + + # Package figures in a dictionary for directory saving + if figures: + # Store figures with meaningful names + figures_dict = {} + for i, fig in enumerate(figures): + # Extract title if available for better naming + try: + ax = fig.axes[0] if fig.axes else None + title = ax.get_title() if ax and ax.get_title() else f"interaction_plot_{i+1}" + # Clean title for filename + title = title.replace(" ", "_").replace("/", "_").replace(":", "") + figures_dict[f"{title}.png"] = fig + except Exception: + # Fall back to an index-based name if the title cannot be read + figures_dict[f"interaction_plot_{i+1}.png"] = fig + + results_dict["figures"] = figures_dict + + # Add matrices (already have .csv extension added) + if matrices: + results_dict["dataframes"] = matrices + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + # Close figures after saving to free memory + for fig in figures: + plt.close(fig) + + print( + f"Spatial Interaction completed -> " + f"{list(saved_files.keys())}" + ) + return saved_files + else: + # Return objects directly for in-memory workflows + return figures, matrices + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python spatial_interaction_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + figures_list, matrices_dict = result + print("\nReturned figures and matrices for in-memory 
use") + print(f"Number of figures: {len(figures_list)}") + print(f"Number of matrices: 
{len(matrices_dict)}") diff --git a/src/spac/templates/spatial_plot_template.py b/src/spac/templates/spatial_plot_template.py new file mode 100644 index 00000000..deb93239 --- /dev/null +++ b/src/spac/templates/spatial_plot_template.py @@ -0,0 +1,271 @@ +""" +Platform-agnostic Spatial Plot template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.spatial_plot_template import run_from_json +>>> run_from_json("examples/spatial_plot_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import matplotlib.pyplot as plt +from functools import partial +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import spatial_plot +from spac.data_utils import select_values +from spac.utils import check_annotation +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plots: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], List[plt.Figure]]: + """ + Execute Spatial Plot analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Stratify": true, + "Stratify_By": ["slide_id"], + "Color_By": "Annotation", + ... + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. 
If False, returns the figures + directly for in-memory workflows. Default is True. + show_plots : bool, optional + Whether to display the plots. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + + Returns + ------- + dict or list + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: List of matplotlib figures + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures_dir"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters exactly as in NIDAP template + annotation = params.get("Annotation_to_Highlight", "None") + feature = params.get("Feature_to_Highlight", "") + layer = params.get("Table", "Original") + + alpha = params.get("Dot_Transparency", 0.5) + spot_size = params.get("Dot_Size", 25) + image_height = params.get("Figure_Height", 6) + image_width = params.get("Figure_Width", 12) + dpi = params.get("Figure_DPI", 200) + font_size = params.get("Font_Size", 12) + vmin = params.get("Lower_Colorbar_Bound", 999) + vmax = params.get("Upper_Colorbar_Bound", -999) + color_by = params.get("Color_By", "Annotation") + stratify = params.get("Stratify", True) + stratify_by = params.get("Stratify_By", []) + + if stratify and len(stratify_by) == 0: + raise ValueError( + 'Please set at least one annotation in the "Stratify By" ' + 'option, or set the "Stratify" to False.' 
+ ) + + if stratify: + check_annotation( + adata, + annotations=stratify_by + ) + + # Process feature and annotation with text_to_value + feature = text_to_value(feature) + annotation = text_to_value(annotation) + + if color_by == "Annotation": + feature = None + else: + annotation = None + + layer = text_to_value(layer, "Original") + + prefilled_spatial = partial( + spatial_plot, + spot_size=spot_size, + alpha=alpha, + vmin=vmin, + vmax=vmax, + annotation=annotation, + feature=feature, + layer=layer + ) + + # Track figures for saving + figures_dict = {} + + if not stratify: + plt.rcParams['font.size'] = font_size + fig, ax = plt.subplots( + figsize=(image_width, image_height), dpi=dpi + ) + + ax = prefilled_spatial(adata=adata, ax=ax) + + if color_by == "Annotation": + title = f'Annotation: {annotation}' + else: + title = f'Table:"{layer}" \n Feature:"{feature}"' + ax[0].set_title(title) + + figures_dict["spatial_plot"] = fig + + if show_plots: + plt.show() + else: + combined_label = "concatenated_label" + + adata.obs[combined_label] = adata.obs[stratify_by].astype(str).agg( + '_'.join, axis=1 + ) + + unique_values = adata.obs[combined_label].unique() + + logger.info(f"Unique stratification values: {unique_values}") + + max_length = min(len(unique_values), 20) + if len(unique_values) > 20: + logger.warning( + f'There are "{len(unique_values)}" unique plots, ' + 'displaying only the first 20 plots.' 
+ ) + + for idx, value in enumerate(unique_values[:max_length]): + filtered_adata = select_values( + data=adata, annotation=combined_label, values=value + ) + + fig, ax = plt.subplots( + figsize=(image_width, image_height), dpi=dpi + ) + + ax = prefilled_spatial(adata=filtered_adata, ax=ax) + + if color_by == "Annotation": + title = f'Annotation: {annotation}' + else: + title = f'Table:"{layer}" \n Feature:"{feature}"' + title = f'{title}\n Stratify by: {value}' + ax[0].set_title(title) + + # Use sanitized value for figure name + safe_value = str(value).replace('/', '_').replace('\\', '_') + figures_dict[f"spatial_plot_{safe_value}"] = fig + + if show_plots: + plt.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for figures output + if "figures" in params["outputs"]: + results_dict["figures"] = figures_dict + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + # Close figures after saving + for fig in figures_dict.values(): + plt.close(fig) + + logger.info("Spatial Plot analysis completed successfully.") + return saved_files + else: + # Return the figures directly for in-memory workflows + logger.info("Returning figures for in-memory use") + return list(figures_dict.values()) + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python spatial_plot_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in 
result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print(f"\nReturned {len(result)} figures") diff --git a/src/spac/templates/subset_analysis_template.py b/src/spac/templates/subset_analysis_template.py new file mode 100644 index 00000000..e32286de --- /dev/null +++ b/src/spac/templates/subset_analysis_template.py @@ -0,0 +1,219 @@ +""" +Platform-agnostic Subset Analysis template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.subset_analysis_template import run_from_json +>>> run_from_json("examples/subset_analysis_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Tuple +import pandas as pd +import warnings +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +# Import SPAC functions from NIDAP template +from spac.data_utils import select_values +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Subset Analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. 
+ Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Annotation_of_interest": "cell_type", + "Labels": ["T cells", "B cells"], + "Include_Exclude": "Include Selected Labels", + "outputs": { + "analysis": {"type": "file", "name": "transform_output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the filtered AnnData + to a pickle file. If False, returns the AnnData object directly for + in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "analysis": "path/to/transform_output.pickle" + } + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + + >>> # Get results in memory + >>> filtered_adata = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis outputs use file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "transform_output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + # Use 
direct dictionary access for required parameters (NIDAP style) + annotation = params["Annotation_of_interest"] + labels = params["Labels"] + + # Use .get() with defaults for optional parameters from JSON template + toggle = params.get("Include_Exclude", "Include Selected Labels") + + if toggle == "Include Selected Labels": + values_to_include = labels + values_to_exclude = None + else: + values_to_include = None + values_to_exclude = labels + + with warnings.catch_warnings(record=True) as caught_warnings: + warnings.simplefilter("always") + filtered_adata = select_values( + data=adata, + annotation=annotation, + values=values_to_include, + exclude_values=values_to_exclude + ) + # Only process warnings that are relevant to the select_values operation + if caught_warnings: + for warning in caught_warnings: + # Skip deprecation warnings from numpy/pandas + if (hasattr(warning, 'category') and + issubclass(warning.category, DeprecationWarning)): + continue + # Raise actual operational warnings as errors + if hasattr(warning, 'message'): + raise ValueError(str(warning.message)) + + logging.info(filtered_adata) + logging.info("\n") + + # Count and display occurrences of each label in the annotation + label_counts = filtered_adata.obs[annotation].value_counts() + logging.info(label_counts) + logging.info("\n") + + dataframe = pd.DataFrame( + filtered_adata.X, + columns=filtered_adata.var.index, + index=filtered_adata.obs.index + ) + logging.info(dataframe.describe()) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for analysis output (backward compatibility with "Output_File") + if "analysis" in params["outputs"]: + results_dict["analysis"] = filtered_adata + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + 
logging.info("Subset Analysis completed successfully.") + return saved_files + else: + # Return the adata object directly for in-memory workflows + logging.info("Returning AnnData object for in-memory use") + return filtered_adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python subset_analysis_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned AnnData object") + print(f"AnnData shape: {result.shape}") diff --git a/src/spac/templates/summarize_annotation_statistics_template.py b/src/spac/templates/summarize_annotation_statistics_template.py new file mode 100644 index 00000000..04557b35 --- /dev/null +++ b/src/spac/templates/summarize_annotation_statistics_template.py @@ -0,0 +1,185 @@ +""" +Platform-agnostic Summarize Annotation's Statistics template converted from +NIDAP. Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.summarize_annotation_statistics_template import \ +... 
run_from_json +>>> run_from_json("examples/summarize_annotation_statistics_params.json") +""" +import json +import sys +import logging +from pathlib import Path +from typing import Any, Dict, Union, List, Optional +import pandas as pd + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import get_cluster_info +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], pd.DataFrame]: + """ + Execute Summarize Annotation's Statistics analysis with parameters from + JSON. Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "Annotation": "phenotype", + "Feature_s_": ["All"], + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the dataframe + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or DataFrame + If save_to_disk=True: Dictionary of saved file paths with structure: + {"dataframe": "path/to/dataframe.csv"} + If save_to_disk=False: The processed DataFrame + + Notes + ----- + Output Structure: + - DataFrame is saved as a single CSV file + - When save_to_disk=False, the DataFrame is returned for programmatic use + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + layer = params.get("Table_to_Process", "Original") + features = params.get("Feature_s_", ["All"]) + annotation = params.get("Annotation", "None") + + if layer == "Original": + layer = None + + if len(features) == 1 and features[0] == "All": + features = None + + if annotation == "None": + annotation = None + + info = get_cluster_info( + adata=adata, + layer=layer, + annotation=annotation, + features=features + ) + + df = pd.DataFrame(info) + + # Renaming columns to avoid spaces and special characters + df.columns = [ + col.replace(" ", "_").replace("-", "_") for col in df.columns + ] + + # Get summary statistics of returned dataset + logger.info(f"Summary statistics of the dataset:\n{df.describe()}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = df + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info( + "Summarize Annotation's Statistics analysis completed successfully." 
+ ) + return saved_files + else: + # Return the dataframe directly for in-memory workflows + logger.info("Returning DataFrame for in-memory use") + return df + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python summarize_annotation_statistics_template.py " + " [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned DataFrame") + print(f"DataFrame shape: {result.shape}") diff --git a/src/spac/templates/summarize_dataframe_template.py b/src/spac/templates/summarize_dataframe_template.py new file mode 100644 index 00000000..92a43e0d --- /dev/null +++ b/src/spac/templates/summarize_dataframe_template.py @@ -0,0 +1,207 @@ +""" +Platform-agnostic Summarize DataFrame template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. 
+ +Usage +----- +>>> from spac.templates.summarize_dataframe_template import run_from_json +>>> run_from_json("examples/summarize_dataframe_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional, Tuple +import pandas as pd +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.data_utils import summarize_dataframe +from spac.visualization import present_summary_as_figure +from spac.templates.template_utils import ( + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, + show_plot: bool = False, +) -> Union[Dict[str, str], Tuple[Any, pd.DataFrame]]: + """ + Execute Summarize DataFrame analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Dataset": "path/to/dataframe.csv", + "Columns": ["col1", "col2"], + "Print_Missing_Location": false, + "outputs": { + "html": {"type": "directory", "name": "html_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the HTML summary + to a directory. If False, returns the figure and dataframe directly + for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory. + show_plot : bool, optional + Whether to display the plot interactively. Default is False. 
+ + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths with structure: + { + "html": ["path/to/html_dir/summary.html"] + } + If save_to_disk=False: Tuple of (figure, summary_dataframe) + + Notes + ----- + Output Structure: + - HTML is saved to a directory as specified in outputs config + - When save_to_disk=False, returns (figure, summary_df) for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["html"]) # List of paths to saved HTML files + + >>> # Get results in memory + >>> fig, summary_df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory with interactive display + >>> saved = run_from_json("params.json", output_dir="/custom/path", show_plot=True) + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # HTML outputs use directory type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "html": {"type": "directory", "name": "html_dir"} + } + + # Load upstream data - DataFrame or CSV file + # Corrected "Calculate_Centroids" to "Upstream_Dataset" in the blueprint + input_path = params.get("Upstream_Dataset") + if isinstance(input_path, pd.DataFrame): + df = input_path # Direct DataFrame from previous step + elif isinstance(input_path, (str, Path)): + # Galaxy passes .dat files, but they contain CSV data + # Don't check extension - directly read as CSV + path = Path(input_path) + try: + df = pd.read_csv(path) + logging.info(f"Successfully loaded CSV data from: {path}") + except Exception as e: + raise ValueError( + f"Failed to read CSV data from '{path}'. " + f"This tool expects CSV/tabular format. 
" + f"Error: {str(e)}" + ) + else: + raise TypeError( + f"Input dataset must be DataFrame or file path. " + f"Got {type(input_path)}" + ) + + # Extract parameters + columns = params["Columns"] + print_missing_location = params.get("Print_Missing_Location", False) + + # Run the analysis exactly as in NIDAP template + summary = summarize_dataframe( + df, + columns=columns, + print_nan_locations=print_missing_location + ) + + # Generate figure from the summary + fig = present_summary_as_figure(summary) + + if show_plot: + fig.show() # Opens in an interactive Plotly window + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for html output - convert figure to HTML string + if "html" in params["outputs"]: + # Convert Plotly figure to HTML string for save_results + html_content = fig.to_html(full_html=True, include_plotlyjs='cdn') + results_dict["html"] = {"summary": html_content} + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info("Summarize DataFrame analysis completed successfully.") + return saved_files + else: + # Return the figure and summary dataframe directly for in-memory workflows + logging.info("Returning figure and dataframe for in-memory use") + return fig, summary + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python summarize_dataframe_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, 
dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned figure and dataframe") diff --git a/src/spac/templates/template_utils.py b/src/spac/templates/template_utils.py new file mode 100644 index 00000000..7c7e9872 --- /dev/null +++ b/src/spac/templates/template_utils.py @@ -0,0 +1,876 @@ +from pathlib import Path +import pickle +from typing import Any, Dict, Union, Optional, List +import json +import pandas as pd +import anndata as ad +import re +import logging +import matplotlib.pyplot as plt + +logger = logging.getLogger(__name__) + + +def load_input(file_path: Union[str, Path]): + """ + Load input data from either h5ad or pickle file. + + Parameters + ---------- + file_path : str or Path + Path to input file (h5ad or pickle) + + Returns + ------- + Loaded data object (typically AnnData) + """ + path = Path(file_path) + + if not path.exists(): + raise FileNotFoundError(f"Input file not found: {file_path}") + + # Check file extension + suffix = path.suffix.lower() + + if suffix in ['.h5ad', '.h5']: + # Load h5ad file + try: + return ad.read_h5ad(path) + except ImportError: + raise ImportError( + "anndata package required to read h5ad files" + ) + except Exception as e: + raise ValueError(f"Error reading h5ad file: {e}") + + elif suffix in ['.pickle', '.pkl', '.p']: + # Load pickle file + with path.open('rb') as fh: + return pickle.load(fh) + + else: + # Try to detect file type by content + try: + # First try h5ad + return ad.read_h5ad(path) + except Exception: + # Fall back to pickle + try: + with path.open('rb') as fh: + return pickle.load(fh) + except Exception as e: + raise ValueError( + f"Unable to load file '{file_path}'. " + f"Supported formats: h5ad, pickle. 
Error: {e}" + ) + + +def save_results( + results: Dict[str, Any], + params: Dict[str, Any], + output_base_dir: Union[str, Path] = None +) -> Dict[str, Union[str, List[str]]]: + """ + Save results based on output configuration in params. + + This function reads the output configuration from the params dictionary + and saves results accordingly. If an output's "type" is not set, it defaults to: + - figures → directory (may contain one or many) + - analysis → file + - dataframe → file (or directory for exceptions like "Neighborhood Profile") + - html → directory + + Parameters + ---------- + results : dict + Dictionary of results to save where: + - key: output key in params["outputs"] (e.g. "analysis", "dataframe", "figures", "html") + - value: object(s) to save (single object, list, or dict of objects) + params : dict + Parameters dict containing 'outputs' configuration with structure: + { + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "html": {"type": "directory", "name": "html_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "analysis": {"type": "file", "name": "output.pickle"} + } + } + output_base_dir : str or Path, optional + Base directory for outputs. If None, uses params['Output_Directory'] or '.' + + Returns + ------- + dict + Dictionary mapping output types to saved file paths: + - For files: string path + - For directories: list of string paths + + Example + ------- + >>> params = { + ... "outputs": { + ... "figures": {"type": "directory", "name": "figure_outputs"}, + ... "dataframe": {"type": "file", "name": "summary.csv"} + ... } + ...
} + >>> results = {"figures": {"boxplot": fig}, "dataframe": df} + >>> saved = save_results(results, params) + """ + # Get output directory from params if not provided + if output_base_dir is None: + output_base_dir = params.get("Output_Directory", ".") + output_base_dir = Path(output_base_dir) + + # Get outputs config from params + outputs_config = params.get("outputs", {}) + if not outputs_config: + logger.warning("No outputs configuration found in params") + return {} + + saved_files = {} + + # Process each result based on configuration + for result_key, data in results.items(): + # Find matching config (case-insensitive match) + config = None + config_key = None + + for key, value in outputs_config.items(): + if key.lower() == result_key.lower(): + config = value + config_key = key + break + + if not config: + logger.warning(f"No output config for '{result_key}', skipping") + continue + + # Determine output type and name + output_type = config.get("type") + output_name = config.get("name", result_key) + + # Apply standardized schema if type not explicitly specified + if not output_type: + result_key_lower = result_key.lower() + if "figures" in result_key_lower: + output_type = "directory" + elif "analysis" in result_key_lower: + output_type = "file" + elif "dataframe" in result_key_lower: + # Special case: Neighborhood Profile gets directory treatment + if "neighborhood" in output_name.lower() and "profile" in output_name.lower(): + output_type = "directory" + else: + output_type = "file" + elif "html" in result_key_lower: + output_type = "directory" + else: + # Default based on data structure + output_type = "directory" if isinstance(data, (dict, list)) else "file" + + logger.debug(f"Auto-determined type '{output_type}' for '{result_key}'") + + # Save based on determined type + if output_type == "directory": + # Create directory and save multiple files + output_dir = output_base_dir / output_name + output_dir.mkdir(parents=True, exist_ok=True) + 
saved_files[config_key or result_key] = [] + + if isinstance(data, dict): + # Dictionary of named items + for name, obj in data.items(): + filepath = _save_single_object(obj, name, output_dir) + saved_files[config_key or result_key].append(str(filepath)) + + elif isinstance(data, (list, tuple)): + # List of items - auto-name them + for idx, obj in enumerate(data): + name = f"{result_key}_{idx}" + filepath = _save_single_object(obj, name, output_dir) + saved_files[config_key or result_key].append(str(filepath)) + + else: + # Single item saved to directory + filepath = _save_single_object(data, result_key, output_dir) + saved_files[config_key or result_key] = [str(filepath)] + + elif output_type == "file": + # Save as single file + output_path = output_base_dir / output_name + output_path.parent.mkdir(parents=True, exist_ok=True) + + # Handle different file types based on extension + if output_name.endswith('.pickle'): + with open(output_path, 'wb') as f: + pickle.dump(data, f) + + elif output_name.endswith('.csv'): + if isinstance(data, pd.DataFrame): + data.to_csv(output_path, index=False) + else: + # Convert to DataFrame if possible + df = pd.DataFrame(data) + df.to_csv(output_path, index=False) + + elif output_name.endswith('.h5ad'): + if hasattr(data, 'write_h5ad'): + data.write_h5ad(str(output_path)) + + elif output_name.endswith('.html'): + with open(output_path, 'w') as f: + f.write(str(data)) + + elif output_name.endswith(('.png', '.pdf', '.svg')): + if hasattr(data, 'savefig'): + data.savefig(output_path, dpi=300, bbox_inches='tight') + plt.close(data) # Close figure to free memory + + else: + # Default to pickle for unknown types + if not output_name.endswith('.pickle'): + output_path = output_path.with_suffix('.pickle') + with open(output_path, 'wb') as f: + pickle.dump(data, f) + + saved_files[config_key or result_key] = str(output_path) + + # Log summary of saved files + logger.info(f"Results saved to {output_base_dir}:") + for key, paths in 
saved_files.items(): + if isinstance(paths, list): + output_name = outputs_config.get(key, {}).get('name', key) + logger.info(f" {key}: {len(paths)} files in {output_base_dir}/{output_name}/") + for path in paths[:3]: # Show first 3 files + logger.debug(f" - {Path(path).name}") + if len(paths) > 3: + logger.debug(f" ... and {len(paths) - 3} more files") + else: + logger.info(f" {key}: {Path(paths).name}") + + return saved_files + + +def _save_single_object(obj: Any, name: str, output_dir: Path) -> Path: + """ + Save a single object to file with appropriate format. + Internal helper function for save_results. + + Parameters + ---------- + obj : Any + Object to save + name : str + Base name for the file (extension will be added if needed) + output_dir : Path + Directory to save to + + Returns + ------- + Path + Path to saved file + """ + # Determine file format based on object type + if isinstance(obj, pd.DataFrame): + # DataFrames -> CSV + if not name.endswith('.csv'): + name = f"{name}.csv" + filepath = output_dir / name + obj.to_csv(filepath, index=False) + + elif hasattr(obj, 'savefig'): + # Matplotlib figures -> PNG only + if not name.endswith('.png'): + name = f"{name}.png" + filepath = output_dir / name + obj.savefig(filepath, dpi=300, bbox_inches='tight') + plt.close(obj) # Close figure to free memory + + elif isinstance(obj, str) and (' pickle (for consistency, could be h5ad) + if not name.endswith('.pickle'): + name = f"{name}.pickle" + filepath = output_dir / name + with open(filepath, 'wb') as f: + pickle.dump(obj, f) + + else: + # Everything else -> pickle + if '.' not in name: + name = f"{name}.pickle" + filepath = output_dir / name + with open(filepath, 'wb') as f: + pickle.dump(obj, f) + + logger.debug(f"Saved {type(obj).__name__} to {filepath}") + return filepath + + +def parse_params( + json_input: Union[str, Path, Dict[str, Any]] +) -> Dict[str, Any]: + """ + Parse parameters from JSON file, string, or dict. 
+ + Parameters + ---------- + json_input : str, Path, or dict + JSON file path, JSON string, or dictionary + + Returns + ------- + dict + Parsed parameters + """ + if isinstance(json_input, dict): + return json_input + + if isinstance(json_input, (str, Path)): + path = Path(json_input) + + # Check if it's a file path + if path.exists() or str(json_input).endswith('.json'): + with open(path, 'r') as file: + return json.load(file) + else: + # It's a JSON string + return json.loads(str(json_input)) + + raise TypeError( + "json_input must be dict, JSON string, or path to JSON file" + ) + + +def text_to_value( + var: Any, + default_none_text: str = "None", + value_to_convert_to: Any = None, + to_float: bool = False, + to_int: bool = False, + param_name: str = '' +): + """ + Converts a string to a specified value or type. Handles conversion to + float or integer and provides a default value if the input string + matches a specified 'None' text. + + Parameters + ---------- + var : str + The input string to be converted. + default_none_text : str, optional + The string that represents a 'None' value. If `var` matches this + string, it will be converted to `value_to_convert_to`. + Default is "None". + value_to_convert_to : any, optional + The value to assign to `var` if it matches `default_none_text` or + is an empty string. Default is None. + to_float : bool, optional + If True, attempt to convert `var` to a float. Default is False. + to_int : bool, optional + If True, attempt to convert `var` to an integer. Default is False. + param_name : str, optional + The name of the parameter, used in error messages for conversion + failures. Default is ''. + + Returns + ------- + any + The converted value, which may be the original string, a float, + an integer, or the specified `value_to_convert_to`. + + Raises + ------ + ValueError + If `to_float` or `to_int` is set to True and conversion fails. 
+ + Notes + ----- + - If both `to_float` and `to_int` are set to True, the function will + prioritize conversion to float. + - If the string `var` matches `default_none_text` or is an empty + string, `value_to_convert_to` is returned. + + Examples + -------- + Convert a string representing a float: + + >>> text_to_value("3.14", to_float=True) + 3.14 + + Handle a 'None' string: + + >>> print(text_to_value("None", value_to_convert_to=None)) + None + + Convert a string to an integer: + + >>> text_to_value("42", to_int=True) + 42 + + Handle invalid conversion: + + >>> text_to_value("abc", to_int=True, param_name="test_param") + Traceback (most recent call last): + ValueError: Error: can't convert test_param to integer. Received:"abc" + """ + # Handle non-string inputs + if not isinstance(var, str): + var = str(var) + + none_condition = ( + var.lower().strip() == default_none_text.lower().strip() or + var.strip() == '' + ) + + if none_condition: + var = value_to_convert_to + + elif to_float: + try: + var = float(var) + except ValueError: + error_msg = ( + f'Error: can\'t convert {param_name} to float. ' + f'Received:"{var}"' + ) + raise ValueError(error_msg) + + elif to_int: + try: + var = int(var) + except ValueError: + error_msg = ( + f'Error: can\'t convert {param_name} to integer. ' + f'Received:"{var}"' + ) + raise ValueError(error_msg) + + return var + + + +def convert_to_floats(text_list: List[Any]) -> List[float]: + """ + Convert list of text values to floats. + + Parameters + ---------- + text_list : list + List of values to convert + + Returns + ------- + list + List of float values + + Raises + ------ + ValueError + If any value cannot be converted to float + """ + float_list = [] + for value in text_list: + try: + float_list.append(float(value)) + except ValueError: + msg = f"Failed to convert value: '{value}' to float."
+ raise ValueError(msg) + return float_list + + +def convert_pickle_to_h5ad( + pickle_path: Union[str, Path], + h5ad_path: Optional[Union[str, Path]] = None +) -> str: + """ + Convert a pickle file containing AnnData to h5ad format. + + Parameters + ---------- + pickle_path : str or Path + Path to input pickle file + h5ad_path : str or Path, optional + Path for output h5ad file. If None, uses same name with .h5ad + extension + + Returns + ------- + str + Path to saved h5ad file + """ + pickle_path = Path(pickle_path) + + if not pickle_path.exists(): + raise FileNotFoundError(f"Pickle file not found: {pickle_path}") + + # Load from pickle + with pickle_path.open('rb') as fh: + adata = pickle.load(fh) + + # Check if it's AnnData + try: + import anndata as ad + if not isinstance(adata, ad.AnnData): + raise TypeError( + f"Loaded object is not AnnData, got {type(adata)}" + ) + except ImportError: + raise ImportError( + "anndata package required for conversion to h5ad" + ) + + # Determine output path + if h5ad_path is None: + h5ad_path = pickle_path.with_suffix('.h5ad') + else: + h5ad_path = Path(h5ad_path) + + # Save as h5ad + adata.write_h5ad(h5ad_path) + + return str(h5ad_path) + + +def spell_out_special_characters(text: str) -> str: + """ + Clean column names by replacing special characters with text equivalents. 
+ + Handles biological marker names like: + - "CD4+" → "CD4_pos" + - "CD8-" → "CD8_neg" + - "CD4+CD20-" → "CD4_pos_CD20_neg" + - "CD4+/CD20-" → "CD4_pos_slashCD20_neg" + - "CD4+ CD20-" → "CD4_pos_CD20_neg" + - "Area µm²" → "Area_um2" + + Parameters + ---------- + text : str + The text to clean + + Returns + ------- + str + Cleaned text with special characters replaced + """ + # Replace spaces with underscores + text = text.replace(' ', '_') + + # Replace specific substrings for units + text = text.replace('µm²', 'um2') + text = text.replace('µm', 'um') + + # Handle hyphens between alphanumeric characters FIRST + # (before + and - replacements) + # This pattern matches a hyphen that has alphanumeric on both sides + text = re.sub(r'(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])', '_', text) + + # Now replace remaining '+' with '_pos_' and '-' with '_neg_' + text = text.replace('+', '_pos_') + text = text.replace('-', '_neg_') + + # Mapping for specific characters + special_char_map = { + 'µ': 'u', # Micro symbol replaced with 'u' + '²': '2', # Superscript two replaced with '2' + '@': 'at', + '#': 'hash', + '$': 'dollar', + '%': 'percent', + '&': 'and', + '*': 'asterisk', + '/': 'slash', + '\\': 'backslash', + '=': 'equals', + '^': 'caret', + '!': 'exclamation', + '?': 'question', + '~': 'tilde', + '|': 'pipe', + ',': '', # Remove commas + '(': '', # Remove parentheses + ')': '', # Remove parentheses + '[': '', # Remove brackets + ']': '', # Remove brackets + '{': '', # Remove braces + '}': '', # Remove braces + } + + # Replace special characters using special_char_map + for char, replacement in special_char_map.items(): + text = text.replace(char, replacement) + + # Remove any remaining disallowed characters + # (keep only alphanumeric and underscore) + text = re.sub(r'[^a-zA-Z0-9_]', '', text) + + # Remove multiple consecutive underscores and + # replace with single underscore + text = re.sub(r'_+', '_', text) + + # Strip both leading and trailing underscores + text = 
text.strip('_') + + return text + + +def clean_column_name(column_name: str) -> str: + """ + Clean a single column name using spell_out_special_characters. + + Parameters + ---------- + column_name : str + Original column name + + Returns + ------- + str + Cleaned column name + """ + original = column_name + cleaned = spell_out_special_characters(column_name) + # Ensure doesn't start with digit + if cleaned and cleaned[0].isdigit(): + cleaned = f'col_{cleaned}' + if original != cleaned: + logger.info(f'Column Name Updated: "{original}" -> "{cleaned}"') + return cleaned + + +def load_csv_files( + csv_input: Union[str, Path, List[str]], + files_config: pd.DataFrame, + string_columns: Optional[List[str]] = None +) -> pd.DataFrame: + """ + Load and combine CSV files based on configuration. + + Supports both: + - Galaxy input: list of file paths + - NIDAP input: directory path + + Parameters + ---------- + csv_input : str, Path, or list + Either a directory path (NIDAP) or list of file paths (Galaxy) + files_config : pd.DataFrame + Configuration dataframe with 'file_name' column and optional metadata + string_columns : list, optional + Columns to force as string type + + Returns + ------- + pd.DataFrame + Combined dataframe with all CSV data + """ + import pprint + + filename_col = "file_name" + + # Build file path mapping based on input type + if isinstance(csv_input, list): + # Galaxy: list of file paths + file_path_map = {Path(p).name: Path(p) for p in csv_input} + logger.info(f"Galaxy mode: {len(file_path_map)} files provided") + else: + # NIDAP: directory path + csv_dir = Path(csv_input) + file_path_map = {p.name: p for p in csv_dir.glob("*.csv")} + logger.info(f"NIDAP mode: {len(file_path_map)} CSV files in {csv_dir}") + + # Clean configuration + files_config = files_config.applymap( + lambda x: x.strip() if isinstance(x, str) else x + ) + + # Get column names + all_column_names = files_config.columns.tolist() + metadata_columns = [ + col for col in 
all_column_names if col != filename_col + ] + + # Validate string_columns + if string_columns is None: + string_columns = [] + elif not isinstance(string_columns, list): + raise ValueError( + "String Columns must be a *list* of column names (strings)." + ) + + # Handle ["None"] or [""] => empty list + if (len(string_columns) == 1 and + isinstance(string_columns[0], str) and + text_to_value(string_columns[0]) is None): + string_columns = [] + + # Extract data types + dtypes = files_config.dtypes.to_dict() + + # Get files to process + files_config = files_config.astype(str) + files_to_use = [ + f.strip() for f in files_config[filename_col].tolist() + ] + + # Check all files exist + missing_files = [f for f in files_to_use if f not in file_path_map] + if missing_files: + raise FileNotFoundError( + f"Files not found: {', '.join(missing_files)}\n" + f"Available: {', '.join(file_path_map.keys())}" + ) + + # Prepare dtype override + dtype_override = ( + {col: str for col in string_columns} if string_columns else None + ) + + # Process files + processed_df_list = [] + + for file_name in files_to_use: + file_path = file_path_map[file_name] + + try: + current_df = pd.read_csv(file_path, dtype=dtype_override) + logger.info(f'Processing: "{file_name}"') + current_df.columns = [ + clean_column_name(col) for col in current_df.columns + ] + + except pd.errors.EmptyDataError: + raise ValueError(f'File "{file_name}" is empty.') + except pd.errors.ParserError: + raise ValueError( + f'File "{file_name}" could not be parsed as CSV.' 
+ ) + + current_df[filename_col] = file_name + + # Reorder columns: filename first + cols = [filename_col] + [c for c in current_df.columns if c != filename_col] + current_df = current_df[cols] + + processed_df_list.append(current_df) + logger.info(f'File "{file_name}" processed: {current_df.shape}') + + # Combine dataframes + final_df = pd.concat(processed_df_list, ignore_index=True) + + # Ensure string columns remain strings + for col in string_columns: + if col in final_df.columns: + final_df[col] = final_df[col].astype(str) + + # Add metadata columns + if metadata_columns: + for column in metadata_columns: + file_to_value = ( + files_config.set_index(filename_col)[column].to_dict() + ) + final_df[column] = final_df[filename_col].map(file_to_value) + final_df[column] = final_df[column].astype(dtypes[column]) + + logger.info(f'Added metadata column "{column}"') + logger.debug(f'Mapping: {file_to_value}') + + logger.info(f"Combined {len(processed_df_list)} files -> {final_df.shape}") + + return final_df + + +def string_list_to_dictionary( + input_list: List[str], + key_name: str = "key", + value_name: str = "color" +) -> Dict[str, str]: + """ + Validate that a list contains strings in the "key:value" format + and return the parsed dictionary. Reports all invalid entries with + custom key and value names in error messages. + + Parameters + ---------- + input_list : list + List of strings to validate and parse + key_name : str, optional + Name to describe the 'key' part in error messages. Default is "key" + value_name : str, optional + Name to describe the 'value' part in error messages. 
Default is "color" + + Returns + ------- + dict + A dictionary parsed from the input list if all entries are valid + + Raises + ------ + TypeError + If input is not a list + ValueError + If any entry in the list is not a valid "key:value" format + + Examples + -------- + >>> string_list_to_dictionary(["red:#FF0000", "blue:#0000FF"]) + {'red': '#FF0000', 'blue': '#0000FF'} + + >>> string_list_to_dictionary(["TypeA:Cancer", "TypeB:Normal"], "cell_type", "diagnosis") + {'TypeA': 'Cancer', 'TypeB': 'Normal'} + """ + if not isinstance(input_list, list): + raise TypeError("Input must be a list.") + + parsed_dict = {} + errors = [] + seen_keys = set() + + for entry in input_list: + if not isinstance(entry, str): + errors.append( + f"\nInvalid entry '{entry}': Must be a string in the " + f"'{key_name}:{value_name}' format." + ) + continue + if ":" not in entry: + errors.append( + f"\nInvalid entry '{entry}': Missing ':' separator to " + f"separate '{key_name}' and '{value_name}'." + ) + continue + + key, value = (part.strip() for part in entry.split(":", 1)) + if not key or not value: + errors.append( + f"\nInvalid entry '{entry}': Both '{key_name}' and " + f"'{value_name}' must be non-empty." + ) + continue + + if key in seen_keys: + errors.append(f"\nDuplicate {key_name} '{key}' found.") + continue + + # Add to dictionary if valid and remember the key + # so later duplicates are reported + seen_keys.add(key) + parsed_dict[key] = value + + # Raise error if there are invalid entries + if errors: + raise ValueError( + "\nValidation failed for the following entries:\n" + + "\n".join(errors) + ) + + return parsed_dict diff --git a/src/spac/templates/tsne_analysis_template.py b/src/spac/templates/tsne_analysis_template.py new file mode 100644 index 00000000..d72de6ec --- /dev/null +++ b/src/spac/templates/tsne_analysis_template.py @@ -0,0 +1,163 @@ +""" +Platform-agnostic tSNE Analysis template converted from NIDAP. +Maintains the exact logic from the NIDAP template.
+ +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. + +Usage +----- +>>> from spac.templates.tsne_analysis_template import run_from_json +>>> run_from_json("examples/tsne_analysis_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import tsne +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute tSNE Analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the AnnData object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + + >>> # Get results in memory + >>> adata = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + all_data = load_input(params["Upstream_Analysis"]) + + # Extract parameters + # Select layer to perform tSNE + Layer_to_Analysis = params.get("Table_to_Process", "Original") + + print(all_data) + if Layer_to_Analysis == "Original": + Layer_to_Analysis = None + + print("tSNE Layer: \n", Layer_to_Analysis) + + print("Performing tSNE ...") + + tsne(all_data, layer=Layer_to_Analysis) + + print("tSNE Done!") + + print(all_data) + + object_to_output = all_data + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary + results_dict = {} + if "analysis" in params["outputs"]: + results_dict["analysis"] = object_to_output + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + 
params=params, + output_base_dir=output_dir + ) + + print(f"tSNE Analysis completed → {saved_files['analysis']}") + return saved_files + else: + # Return the adata object directly for in-memory workflows + print("Returning AnnData object (not saving to file)") + return object_to_output + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python tsne_analysis_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/umap_transformation_template.py b/src/spac/templates/umap_transformation_template.py new file mode 100644 index 00000000..388f3d11 --- /dev/null +++ b/src/spac/templates/umap_transformation_template.py @@ -0,0 +1,174 @@ +""" +Platform-agnostic UMAP transformation template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file. 
+ +Usage +----- +>>> from spac.templates.umap_transformation_template import run_from_json +>>> run_from_json("examples/umap_transformation_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pickle + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +# Import SPAC functions from NIDAP template +from spac.transformations import run_umap +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute UMAP transformation analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "Number_of_Neighbors": 75, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the AnnData object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + + >>> # Get results in memory + >>> adata = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters - Note: HPC parameters are ignored in SPAC version + n_neighbors = params.get("Number_of_Neighbors", 75) + min_dist = params.get("Minimum_Distance_between_Points", 0.1) + n_components = params.get("Target_Dimension_Number", 2) + metric = params.get("Computational_Metric", "euclidean") + random_state = params.get("Random_State", 0) + transform_seed = params.get("Transform_Seed", 42) + layer = params.get("Table_to_Process", "Original") + + if layer == "Original": + layer = None + + updated_dataset = run_umap( + adata=adata, + n_neighbors=n_neighbors, + min_dist=min_dist, + n_components=n_components, + metric=metric, + random_state=random_state, + transform_seed=transform_seed, + 
layer=layer, + verbose=True + ) + + # Print adata info as in NIDAP + print(adata) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary + results_dict = {} + if "analysis" in params["outputs"]: + results_dict["analysis"] = updated_dataset + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + print(f"UMAP transformation completed → {saved_files['analysis']}") + return saved_files + else: + # Return the updated AnnData object (same one that would be saved) + print("Returning AnnData object (not saving to file)") + return updated_dataset + + + # CLI interface + if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python umap_transformation_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/umap_tsne_pca_visualization_template.py b/src/spac/templates/umap_tsne_pca_visualization_template.py new file mode 100644 index 00000000..47394b04 --- /dev/null +++ b/src/spac/templates/umap_tsne_pca_visualization_template.py @@ -0,0 +1,242 @@ +""" +Platform-agnostic UMAP\\tSNE\\PCA Visualization template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file.
+ +Usage +----- +>>> from spac.templates.umap_tsne_pca_template import run_from_json +>>> run_from_json("examples/umap_tsne_pca_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, Optional, List +import matplotlib.pyplot as plt +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import dimensionality_reduction_plot +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = True, + output_dir: str = None, +) -> Union[Dict[str, Union[str, List[str]]], plt.Figure]: + """ + Execute UMAP\\tSNE\\PCA Visualization analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Color_By": "Annotation", + "Annotation_to_Highlight": "cell_type", + "Dimension_Reduction_Method": "umap", + ... + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the figure + directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or Figure + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: The matplotlib figure + """ + # Set up logging + logging.basicConfig(level=logging.INFO) + logger = logging.getLogger(__name__) + + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures_dir"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + annotation = params.get("Annotation_to_Highlight", "None") + feature = params.get("Feature_to_Highlight", "None") + layer = params.get("Table", "Original") + method = params.get("Dimension_Reduction_Method", "umap") + fig_width = params.get("Figure_Width", 12) + fig_height = params.get("Figure_Height", 12) + font_size = params.get("Font_Size", 12) + fig_dpi = params.get("Figure_DPI", 300) + legend_location = params.get("Legend_Location", "best") + legend_label_size = params.get("Legend_Font_Size", 16) + legend_marker_scale = params.get("Legend_Marker_Size", 5.0) + color_by = params.get("Color_By", "Annotation") + point_size = params.get("Dot_Size", 1) + v_min = params.get("Value_Min", "None") + v_max = params.get("Value_Max", "None") + + feature = text_to_value(feature) + annotation = text_to_value(annotation) + + if color_by == "Annotation": + feature = None + else: + annotation = None + + # Store the original value of layer + layer_input = layer + + layer = text_to_value(layer, default_none_text="Original") + + vmin = text_to_value( + v_min, + default_none_text="None", + value_to_convert_to=None, + to_float=True, + param_name="Value Min" + ) + + vmax = text_to_value( + v_max, + default_none_text="None", + value_to_convert_to=None, + to_float=True, + 
param_name="Value Max" + ) + + plt.rcParams.update({'font.size': font_size}) + + fig, ax = dimensionality_reduction_plot( + adata=adata, + method=method, + annotation=annotation, + feature=feature, + layer=layer, + point_size=point_size, + vmin=vmin, + vmax=vmax + ) + + if color_by == "Annotation": + title = annotation + else: + title = f'Table:"{layer_input}" \n Feature:"{feature}"' + ax.set_title(title) + + fig = ax.get_figure() + + fig.set_size_inches( + fig_width, + fig_height + ) + fig.set_dpi(fig_dpi) + + legend = ax.get_legend() + has_legend = legend is not None + + if has_legend: + ax.legend( + loc=legend_location, + bbox_to_anchor=(1, 0.5), + fontsize=legend_label_size, + markerscale=legend_marker_scale + ) + + plt.tight_layout() + + if show_plot: + plt.show() + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for figures output + if "figures" in params["outputs"]: + results_dict["figures"] = {f"{method}_plot": fig} + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + plt.close(fig) + + logger.info( + f"{method.upper()} Visualization completed successfully." 
+ ) + return saved_files + else: + # Return the figure directly for in-memory workflows + logger.info("Returning figure for in-memory use") + return fig + + + # CLI interface + if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python umap_tsne_pca_visualization_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + print("\nReturned figure") diff --git a/src/spac/templates/utag_clustering_template.py b/src/spac/templates/utag_clustering_template.py new file mode 100644 index 00000000..304f6149 --- /dev/null +++ b/src/spac/templates/utag_clustering_template.py @@ -0,0 +1,229 @@ +""" +Platform-agnostic UTAG Clustering template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Reads outputs configuration from blueprint JSON file.
+ +Usage +----- +>>> from spac.templates.utag_clustering_template import run_from_json +>>> run_from_json("examples/utag_clustering_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import run_utag_clustering +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute UTAG Clustering analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "K_Nearest_Neighbors": 15, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the AnnData object + directly for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. 
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + + >>> # Get results in memory + >>> adata = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + layer = params.get("Table_to_Process", "Original") + features = params.get("Features", ["All"]) + slide = params.get("Slide_Annotation", "None") + Distance_threshold = params.get("Distance_Threshold", 20.0) + K_neighbors = params.get("K_Nearest_Neighbors", 15) + resolution = params.get("Resolution_Parameter", 1) + principal_components = params.get("PCA_Components", "None") + random_seed = params.get("Random_Seed", 42) + n_jobs = params.get("N_Jobs", 1) + N_iterations = params.get("Leiden_Iterations", 5) + Parallel_processes = params.get("Parellel_Processes", False) + output_annotation = params.get("Output_Annotation_Name", "UTAG") + + # layer: convert "Original" → None + layer_arg = 
None if layer.lower().strip() == "original" else layer + + # features: ["All"] → None, else leave list and print selection + if isinstance(features, list) and any( + item == "All" for item in features + ): + print("Clustering all features") + features_arg = None + else: + feature_str = "\n".join(features) + print(f"Clustering features:\n{feature_str}") + features_arg = features + + # slide: "None" → None + slide_arg = text_to_value( + slide, + default_none_text="None", + value_to_convert_to=None + ) + + # principal_components: "None" or integer string → None or int + principal_components_arg = text_to_value( + principal_components, + default_none_text="None", + value_to_convert_to=None, + to_int=True, + param_name="principal_components" + ) + + print("\nBefore UTAG Clustering: \n", adata) + + run_utag_clustering( + adata, + features=features_arg, + k=K_neighbors, + resolution=resolution, + max_dist=Distance_threshold, + n_pcs=principal_components_arg, + random_state=random_seed, + n_jobs=n_jobs, + n_iterations=N_iterations, + slide_key=slide_arg, + layer=layer_arg, + output_annotation=output_annotation, + parallel=Parallel_processes, + ) + + print("\nAfter UTAG Clustering: \n", adata) + + print( + "\nUTAG Cluster Count: \n", + len(adata.obs[output_annotation].unique().tolist()) + ) + + print( + "\nUTAG Cluster Names: \n", + adata.obs[output_annotation].unique().tolist() + ) + + # Count and display occurrences of each label in the annotation + print( + f'\nCount of cells in the output annotation:' + f'"{output_annotation}":' + ) + label_counts = adata.obs[output_annotation].value_counts() + print(label_counts) + print("\n") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary + results_dict = {} + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + 
print(f"UTAG Clustering completed → {saved_files['analysis']}") + return saved_files + else: + # Return the adata object directly for in-memory workflows + print("Returning AnnData object (not saving to file)") + return adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python utag_clustering_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned AnnData object") diff --git a/src/spac/templates/visualize_nearest_neighbor_template.py b/src/spac/templates/visualize_nearest_neighbor_template.py new file mode 100644 index 00000000..42193531 --- /dev/null +++ b/src/spac/templates/visualize_nearest_neighbor_template.py @@ -0,0 +1,523 @@ +""" +Platform-agnostic Visualize Nearest Neighbor template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Usage +----- +>>> from spac.templates.visualize_nearest_neighbor_template import ( +... run_from_json +... 
) +>>> run_from_json("examples/visualize_nearest_neighbor_params.json") +""" +import logging +import sys +from pathlib import Path +from typing import Any, Dict, List, Tuple, Union +import pandas as pd +import numpy as np +from matplotlib.axes import Axes +import matplotlib.pyplot as plt +import matplotlib.patches as mpatches + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import visualize_nearest_neighbor +from spac.templates.template_utils import ( + load_input, + parse_params, + save_results, + text_to_value, +) + +# Set up logging +logger = logging.getLogger(__name__) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: Union[str, Path] = None, + show_plot: bool = True +) -> Union[Dict[str, Union[str, List[str]]], Tuple[Any, pd.DataFrame]]: + """ + Execute Visualize Nearest Neighbor analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/input.pickle", + "Annotation": "cell_type", + "Source_Anchor_Cell_Label": "CD4_T", + "Target_Cell_Label": "All", + "Plot_Method": "numeric", + "Plot_Type": "boxen", + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If False, returns the figure and + dataframe directly for in-memory workflows. Default is True. + output_dir : str or Path, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. + show_plot : bool, optional + Whether to display the plot. Default is True. 
+ + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths with structure: + {"figures": ["path/to/fig1.png", ...], "dataframe": "path/to/df.csv"} + If save_to_disk=False: Tuple of (figure(s), dataframe) + + Notes + ----- + Output Structure: + - Figures are saved as a directory containing one or more plot files (standardized) + - DataFrame is saved as a single CSV file (standardized) + - When save_to_disk=False, returns (figure(s), dataframe) for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["figures"]) # List of figure paths + >>> print(saved_files["dataframe"]) # Path to CSV + + >>> # Get results in memory for further processing + >>> figures, df = run_from_json("params.json", save_to_disk=False) + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Figures use directory type, dataframe uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "figures": {"type": "directory", "name": "figures"}, + "dataframe": {"type": "file", "name": "dataframe.csv"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + # Use direct dictionary access for required parameters + # Will raise KeyError if missing + annotation = params["Annotation"] + source_label = params["Source_Anchor_Cell_Label"] + + # Use .get() with defaults for optional parameters from JSON template + image_id = params.get("ImageID", "None") + method = params.get("Plot_Method", "numeric") + plot_type = params.get("Plot_Type", "boxen") + target_label = params.get("Target_Cell_Label", "All") 
+ distance_key = params.get( + "Nearest_Neighbor_Associated_Table", "spatial_distance" + ) + log_scale = params.get("Log_Scale", False) + facet_plot = params.get("Facet_Plot", False) + x_axis_title_rotation = params.get("X_Axis_Label_Rotation", 0) + shared_x_axis_title = params.get("Shared_X_Axis_Title_", True) + x_axis_title_fontsize = params.get("X_Axis_Title_Font_Size", "None") + + defined_color_map = text_to_value( + params.get("Defined_Color_Mapping", "None"), + param_name="Define Label Color Mapping" + ) + annotation_colorscale = "rainbow" + + fig_width = params.get("Figure_Width", 12) + fig_height = params.get("Figure_Height", 6) + fig_dpi = params.get("Figure_DPI", 300) + global_font_size = params.get("Font_Size", 12) + fig_title = ( + f'Nearest Neighbor Distance Distribution\nMeasured from ' + f'"{source_label}"' + ) + + image_id = text_to_value( + image_id, + default_none_text="None", + value_to_convert_to=None + ) + + # If target_label is None, it means "All distance columns" + # If it's a comma-separated string (e.g. "Stroma,Immune"), + # split into a list + target_label = text_to_value( + target_label, + default_none_text="All", + value_to_convert_to=None + ) + + if target_label is not None: + distance_to_processed = [x.strip() for x in target_label.split(",")] + else: + distance_to_processed = None + + x_axis_title_fontsize = text_to_value( + x_axis_title_fontsize, + default_none_text="None", + to_int=True + ) + + # Configure Matplotlib font size + plt.rcParams.update({'font.size': global_font_size}) + + # If facet_plot=True but no valid stratify column => revert to single figure + if facet_plot and image_id is None: + warning_message = ( + "Facet plotting was requested, but there is no annotation " + "to group by. Switching to a single-figure display."
+ ) + logger.warning(warning_message) + facet_plot = False + + result_dict = visualize_nearest_neighbor( + adata=adata, + annotation=annotation, + spatial_distance=distance_key, + distance_from=source_label, + distance_to=distance_to_processed, + method=method, + plot_type=plot_type, + stratify_by=image_id, + facet_plot=facet_plot, + log=log_scale, + annotation_colorscale=annotation_colorscale, + defined_color_map=defined_color_map, + ) + + # Extract the data and figure(s) + df_long = result_dict["data"] + figs_out = result_dict["fig"] # Single Figure or List of Figures + palette_hex = result_dict["palette"] + axes_out = result_dict["ax"] + + logger.info("Summary statistics of the dataset:") + logger.info(f"\n{df_long.describe()}") + + # Customize figure legends & X-axis rotation + legend_labels = ( + distance_to_processed or df_long["group"].unique().tolist() + ) + legend_labels = ( + legend_labels if distance_to_processed else sorted(legend_labels) + ) + + handles = [ + mpatches.Patch( + facecolor=palette_hex[label], + edgecolor='none', + label=label + ) + for label in legend_labels + ] + + def _flatten_axes(ax_input): + if isinstance(ax_input, Axes): + return [ax_input] + if isinstance(ax_input, (list, tuple, np.ndarray)): + return [ + ax for ax in np.ravel(ax_input) if isinstance(ax, Axes) + ] + return [] + + flat_axes_list = _flatten_axes(axes_out) + shared_x_title_applied_to_fig = None + + if flat_axes_list: + # Attach legend to the last axis + flat_axes_list[-1].legend( + handles=handles, + title="Target phenotype", + bbox_to_anchor=(1.02, 1), + loc="upper left", + frameon=False, + ) + + # X-Axis Title Handling + current_x_label_text = "" + if flat_axes_list[0].get_xlabel(): + current_x_label_text = flat_axes_list[0].get_xlabel() + + if not current_x_label_text: + current_x_label_text = ( + f"Log({distance_key})" if log_scale else distance_key + ) + if not current_x_label_text: + current_x_label_text = "Distance" # Ultimate fallback + + effective_fontsize = 
( + x_axis_title_fontsize if x_axis_title_fontsize is not None + else global_font_size + ) + + if (facet_plot and shared_x_axis_title and + isinstance(figs_out, plt.Figure)): + for ax_item in flat_axes_list: + ax_item.set_xlabel('') + + sup_ha_align = 'center' + if 0 < x_axis_title_rotation % 360 < 180: + sup_ha_align = 'right' + elif 180 < x_axis_title_rotation % 360 < 360: + sup_ha_align = 'left' + + figs_out.supxlabel( + current_x_label_text, y=0.02, fontsize=effective_fontsize, + rotation=x_axis_title_rotation, ha=sup_ha_align + ) + shared_x_title_applied_to_fig = figs_out + + else: # Apply to individual subplot x-axis titles + for ax_item in flat_axes_list: + label_object = ax_item.xaxis.get_label() + if not label_object.get_text(): # If no label, set it + ax_item.set_xlabel(current_x_label_text) + label_object = ax_item.xaxis.get_label() + + if label_object.get_text(): # Configure if actual label + label_object.set_rotation(x_axis_title_rotation) + label_object.set_fontsize(effective_fontsize) + ha_align_val = 'center' + if 0 < x_axis_title_rotation % 360 < 180: + ha_align_val = 'right' + elif 180 < x_axis_title_rotation % 360 < 360: + ha_align_val = 'left' + label_object.set_ha(ha_align_val) + + # Stratification Info + if image_id is not None and image_id in df_long.columns: + unique_vals = df_long[image_id].unique() + n_unique = len(unique_vals) + + if n_unique == 0: + logger.warning( + f"The annotation '{image_id}' has 0 unique values or is empty. " + "No data to plot => Potential empty plot." + ) + elif n_unique == 1 and facet_plot: + logger.info( + f"The annotation '{image_id}' has only one unique value " + f"({unique_vals[0]}). Facet plot will resemble a single plot." 
+ ) + elif n_unique > 1: + logger.info( + f"The annotation '{image_id}' has {n_unique} unique values: " + f"{unique_vals}" + ) + + # Figure Configuration & Display + def _title_main(fig, title): + """ + Sets a bold, centered main title on the figure and + applies the configured figure size and DPI. + """ + fig.set_size_inches(fig_width, fig_height) + fig.set_dpi(fig_dpi) + fig.suptitle( + title, + fontsize=global_font_size + 4, + weight='bold', + x=0.5, # center horizontally + horizontalalignment='center' + ) + + def _label_each_figure(fig_list, categories): + """ + Adds a title to each figure, typically used when multiple + separate figures are returned (one per category). + """ + for fig, cat in zip(fig_list, categories): + if fig: + _title_main(fig, f"{fig_title}\n{image_id}: {cat}") + # Adjust top for the suptitle + fig.tight_layout(rect=[0.01, 0.01, 0.99, 0.96]) + if show_plot: + plt.show() + + # Determine the actual distance column name used in df_long for summary + distance_col = ( + "log_distance" if "log_distance" in df_long.columns else "distance" + ) + + # Displaying Figures + cat_list = [] + if image_id and (image_id in df_long.columns): + # is_categorical_dtype is deprecated in pandas >= 2.1; check the dtype directly + if isinstance(df_long[image_id].dtype, pd.CategoricalDtype): + cat_list = list(df_long[image_id].cat.categories) + else: + cat_list = df_long[image_id].unique().tolist() + + # Track figures for saving + figures_to_save = [] + + if isinstance(figs_out, list) and not facet_plot and \ + cat_list and len(figs_out) == len(cat_list): + # Scenario: Multiple separate figures, one per category (non-faceted) + figures_to_save = figs_out + _label_each_figure(figs_out, cat_list) + if show_plot: + plt.show() + else: + # Scenario: Single figure (faceted) or list of figures not matching categories + figures_to_display = ( + figs_out if isinstance(figs_out, list) else [figs_out] + ) + figures_to_save = figures_to_display + for fig_item_to_display in figures_to_display: + if fig_item_to_display is not None: +
_title_main(fig_item_to_display, fig_title) + + bottom_padding = 0.01 + # Make space for shared x-title + if fig_item_to_display is shared_x_title_applied_to_fig: + bottom_padding = 0.01 # Adjusted from 0.05 + + top_padding = 0.99 # Adjusted from 0.90 + + # rect=[left, bottom, right, top] + fig_item_to_display.tight_layout( + rect=[0.01, bottom_padding, 0.99, top_padding] + ) + if show_plot: + plt.show() + + # Summary statistics + # 1) Per-group summary + df_summary_group = ( + df_long + .groupby("group")[distance_col] + .describe() + .reset_index() + ) + + # 2) Per-group-and-stratify, if image_id is valid + if image_id and (image_id in df_long.columns): + df_summary_group_strat = ( + df_long + .groupby([image_id, "group"])[distance_col] + .describe() + .reset_index() + ) + else: + df_summary_group_strat = None + + if df_summary_group_strat is not None: + logger.info(f"\nSummary by group(target phenotypes) AND '{image_id}':") + logger.info(f"\n{df_summary_group_strat}") + else: + logger.info("\nSummary: By group(target phenotypes) only") + logger.info(f"\n{df_summary_group}") + + # CSV Output + final_df = ( + df_summary_group_strat if df_summary_group_strat is not None + else df_summary_group + ) + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Package figures in a dictionary for directory saving + # This ensures they're saved in a directory per standardized schema + if "figures" in params["outputs"] and figures_to_save: + # Create a dictionary with named figures + figures_dict = {} + for idx, fig in enumerate(figures_to_save): + if fig is not None: + # Name figures appropriately + if cat_list and len(cat_list) == len(figures_to_save): + fig_name = f"nearest_neighbor_{cat_list[idx]}" + else: + fig_name = f"nearest_neighbor_{idx}" + figures_dict[fig_name] = fig + results_dict["figures"] = figures_dict # Dict triggers directory save + + # Check for DataFrame output 
(case-insensitive) + if any(k.lower() == "dataframe" for k in params["outputs"].keys()): + results_dict["dataframe"] = final_df + + # Use centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logger.info("Visualize Nearest Neighbor completed successfully.") + logger.info(f"Saved summary statistics to dataframe output.") + return saved_files + else: + # Return the figure(s) and dataframe directly for in-memory workflows + logger.info("Returning figure(s) and dataframe (not saving to file)") + # If single figure, return it directly; if multiple, return list + if len(figures_to_save) == 1: + return figures_to_save[0], final_df + else: + return figures_to_save, final_df + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python visualize_nearest_neighbor_template.py " + " [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, paths in result.items(): + if isinstance(paths, list): + print(f" {key}:") + for path in paths: + print(f" - {path}") + else: + print(f" {key}: {paths}") + else: + figures, df = result + print("\nReturned figure(s) and dataframe for in-memory use") + if isinstance(figures, list): + print(f"Number of figures: {len(figures)}") + else: + print(f"Figure size: {figures.get_size_inches()}") + print(f"DataFrame shape: {df.shape}") + print("\nSummary statistics preview:") + print(df.head()) diff --git a/src/spac/templates/visualize_ripley_l_template.py 
b/src/spac/templates/visualize_ripley_l_template.py new file mode 100644 index 00000000..17e8b7b2 --- /dev/null +++ b/src/spac/templates/visualize_ripley_l_template.py @@ -0,0 +1,155 @@ +""" +Platform-agnostic Visualize Ripley L template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Usage +----- +>>> from spac.templates.visualize_ripley_l_template import run_from_json +>>> run_from_json("examples/visualize_ripley_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union, List, Optional, Tuple +import pandas as pd +import matplotlib.pyplot as plt +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.visualization import plot_ripley_l +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, + text_to_value, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + show_plot: bool = True, + output_dir: Optional[Union[str, Path]] = None +) -> Union[Dict[str, str], Tuple[Any, pd.DataFrame]]: + """ + Execute Visualize Ripley L analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary + save_to_disk : bool, optional + Whether to save results to file. If False, returns the figure and + dataframe directly for in-memory workflows. Default is True. + show_plot : bool, optional + Whether to display the plot. Default is True. + output_dir : str or Path, optional + Directory for outputs. If None, uses current directory.
+ + Returns + ------- + dict or tuple + If save_to_disk=True: Dictionary of saved file paths + If save_to_disk=False: Tuple of (figure, dataframe) + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + center_phenotype = params["Center_Phenotype"] + neighbor_phenotype = params["Neighbor_Phenotype"] + plot_specific_regions = params.get("Plot_Specific_Regions", False) + regions_labels = params.get("Regions_Labels", []) + plot_simulations = params.get("Plot_Simulations", True) + + logging.info(f"Running with center_phenotype: {center_phenotype}, neighbor_phenotype: {neighbor_phenotype}") + + # Process regions parameter exactly as in NIDAP template + if plot_specific_regions: + if len(regions_labels) == 0: + raise ValueError( + 'Please identify at least one region in the ' + '"Regions Label(s)" parameter.' + ) + else: + regions_labels = None + + # Run the visualization exactly as in NIDAP template + fig, plots_df = plot_ripley_l( + adata, + phenotypes=(center_phenotype, neighbor_phenotype), + regions=regions_labels, + sims=plot_simulations, + return_df=True + ) + + if show_plot: + plt.show() + + # Print the dataframe to console + logging.info(f"\n{plots_df.to_string()}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results dictionary based on outputs config + results_dict = {} + + # Check for dataframe output in config + if "dataframe" in params["outputs"]: + results_dict["dataframe"] = plots_df + + # Add figure if configured (usually not in the original template) + # but we can add it as an enhancement + if "figures" in params.get("outputs", {}): + # Package figure in a dictionary for directory saving + results_dict["figures"] = {"ripley_l_plot": fig} + + # Add analysis output if in config (for compatibility) + if "analysis" in params.get("outputs", {}): + results_dict["analysis"] = adata + + # Use
centralized save_results function + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info(f"Visualize Ripley L completed → {list(saved_files.keys())}") + return saved_files + else: + # Return the figure and dataframe directly for in-memory workflows + logging.info("Returning figure and dataframe (not saving to file)") + return fig, plots_df + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python visualize_ripley_l_template.py ", file=sys.stderr) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json(sys.argv[1], output_dir=output_dir) + + if isinstance(result, dict): + print("\nOutput files:") + for filename, filepath in result.items(): + print(f" {filename}: {filepath}") + else: + print("\nReturned figure and dataframe") diff --git a/src/spac/templates/z_score_normalization_template.py b/src/spac/templates/z_score_normalization_template.py new file mode 100644 index 00000000..38e90a30 --- /dev/null +++ b/src/spac/templates/z_score_normalization_template.py @@ -0,0 +1,181 @@ +""" +Platform-agnostic Z-Score Normalization template converted from NIDAP. +Maintains the exact logic from the NIDAP template. + +Refactored to use centralized save_results from template_utils. +Follows standardized output schema where analysis is saved as a file.
+ +Usage +----- +>>> from spac.templates.z_score_normalization_template import run_from_json +>>> run_from_json("examples/zscore_normalization_params.json") +""" +import json +import sys +from pathlib import Path +from typing import Any, Dict, Union +import pandas as pd +import pickle +import logging + +# Add parent directory to path for imports +sys.path.append(str(Path(__file__).parent.parent.parent)) + +from spac.transformations import z_score_normalization +from spac.templates.template_utils import ( + load_input, + save_results, + parse_params, +) + + +def run_from_json( + json_path: Union[str, Path, Dict[str, Any]], + save_to_disk: bool = True, + output_dir: str = None, +) -> Union[Dict[str, str], Any]: + """ + Execute Z-Score Normalization analysis with parameters from JSON. + Replicates the NIDAP template functionality exactly. + + Parameters + ---------- + json_path : str, Path, or dict + Path to JSON file, JSON string, or parameter dictionary. + Expected JSON structure: + { + "Upstream_Analysis": "path/to/data.pickle", + "Table_to_Process": "Original", + "Output_Table_Name": "z_scores", + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"} + } + } + save_to_disk : bool, optional + Whether to save results to disk. If True, saves the AnnData object + to a pickle file. If False, returns the AnnData object directly + for in-memory workflows. Default is True. + output_dir : str, optional + Base directory for outputs. If None, uses params['Output_Directory'] + or current directory. All outputs will be saved relative to this directory.
+ + Returns + ------- + dict or AnnData + If save_to_disk=True: Dictionary of saved file paths with structure: + {"analysis": "path/to/output.pickle"} + If save_to_disk=False: The processed AnnData object for in-memory use + + Notes + ----- + Output Structure: + - Analysis output is saved as a single pickle file (standardized for analysis outputs) + - When save_to_disk=False, the AnnData object is returned for programmatic use + + Examples + -------- + >>> # Save results to disk + >>> saved_files = run_from_json("params.json") + >>> print(saved_files["analysis"]) # Path to saved pickle file + >>> # './output.pickle' + + >>> # Get results in memory for further processing + >>> adata = run_from_json("params.json", save_to_disk=False) + >>> # Can now work with adata object directly + + >>> # Custom output directory + >>> saved = run_from_json("params.json", output_dir="/custom/path") + """ + # Parse parameters from JSON + params = parse_params(json_path) + + # Set output directory + if output_dir is None: + output_dir = params.get("Output_Directory", ".") + + # Ensure outputs configuration exists with standardized defaults + # Analysis uses file type per standardized schema + if "outputs" not in params: + params["outputs"] = { + "analysis": {"type": "file", "name": "output.pickle"} + } + + # Load the upstream analysis data + adata = load_input(params["Upstream_Analysis"]) + + # Extract parameters + input_layer = params["Table_to_Process"] + output_layer = params["Output_Table_Name"] + + if input_layer == "Original": + input_layer = None + + z_score_normalization( + adata, + output_layer=output_layer, + input_layer=input_layer + ) + + # Convert the normalized layer to a DataFrame and print its summary + post_dataframe = adata.to_df(layer=output_layer) + logging.info(f"Z-score normalization summary:\n{post_dataframe.describe()}") + logging.info(f"Transformed data:\n{adata}") + + # Handle results based on save_to_disk flag + if save_to_disk: + # Prepare results 
dictionary based on outputs config + results_dict = {} + + if "analysis" in params["outputs"]: + results_dict["analysis"] = adata + + # Use centralized save_results function + # All file handling and logging is now done by save_results + saved_files = save_results( + results=results_dict, + params=params, + output_base_dir=output_dir + ) + + logging.info( + f"Z-Score Normalization completed → {saved_files['analysis']}" + ) + return saved_files + else: + # Return the adata object directly for in-memory workflows + logging.info("Returning AnnData object (not saving to file)") + return adata + + +# CLI interface +if __name__ == "__main__": + if len(sys.argv) < 2: + print( + "Usage: python z_score_normalization_template.py [output_dir]", + file=sys.stderr + ) + sys.exit(1) + + # Set up logging for CLI usage + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + # Get output directory if provided + output_dir = sys.argv[2] if len(sys.argv) > 2 else None + + # Run analysis + result = run_from_json( + json_path=sys.argv[1], + output_dir=output_dir + ) + + # Display results based on return type + if isinstance(result, dict): + print("\nOutput files:") + for key, path in result.items(): + print(f" {key}: {path}") + else: + print("\nReturned AnnData object for in-memory use") + print(f"AnnData: {result}") diff --git a/src/spac/transformations.py b/src/spac/transformations.py index b2044f1c..228c55f1 100644 --- a/src/spac/transformations.py +++ b/src/spac/transformations.py @@ -8,7 +8,7 @@ from spac.utils import check_table, check_annotation, check_feature from scipy import stats import umap as umap_lib -from scipy.sparse import issparse +from scipy.sparse import issparse, csr_matrix from typing import List, Union, Optional from numpy.lib import NumpyVersion from sklearn.neighbors import KNeighborsClassifier @@ -16,6 +16,9 @@ import multiprocessing import parmap from 
spac.utag_functions import utag +from anndata
import AnnData +from spac.utils import compute_summary_qc_stats +from typing import List, Optional # Configure logging logging.basicConfig(level=logging.INFO, @@ -1286,3 +1289,179 @@ def run_utag_clustering( cluster_list = utag_results.obs[cur_cluster_col].copy() adata.obs[output_annotation] = cluster_list.copy() adata.uns["utag_features"] = features + +# add QC metrics to AnnData object +def add_qc_metrics(adata, + organism="hs", + mt_match_pattern=None, + layer=None): + """ + Adds quality control (QC) metrics to the AnnData object. + + Parameters: + ----------- + adata : AnnData + The AnnData object containing single-cell or spatial + transcriptomics data. + organism : str, optional + The organism type. Default is "hs" (human). Use "mm" for mouse. + Determines the mitochondrial gene prefix + ("MT-" for human, "mt-" for mouse). + mt_match_pattern : str, optional + A custom pattern to identify mitochondrial genes. If None, it defaults + to "MT-" for human or "mt-" for mouse based on the `organism` parameter. + Takes precedence over the default patterns. + If provided, it should match the prefix of mitochondrial gene names in + `adata.var_names`. + layer : str, optional + The name of the layer in `adata.layers` to use for calculations. + If None, the default `adata.X` matrix is used. + + Modifies: + --------- + adata.obs : pandas.DataFrame + Adds the following QC metrics as new columns: + - "nFeature": Number of genes with non-zero expression for each cell. + - "nCount": Total counts (sum of all gene expression values) + for each cell. + - "nCount_mt": Total counts for mitochondrial genes for each cell. + - "percent.mt": Percentage of counts in mitochondrial genes + for each cell. + + Raises: + ------- + ValueError + If the specified `layer` is not found in `adata.layers`. + + Notes: + ------ + - If the input matrix (`adata.X` or the specified layer) is dense, + it is converted to a sparse matrix for efficient computation. 
+ - Mitochondrial genes are identified based on the `mt_match_pattern`. + + Example: + -------- + >>> add_qc_metrics(adata, organism="hs") + >>> print(adata.obs[["nFeature", "nCount", "nCount_mt", "percent.mt"]]) + """ + # identify mitochondrial genes pattern + if mt_match_pattern is None: + if organism == "hs": + mt_match_pattern = "MT-" + elif organism == "mm": + mt_match_pattern = "mt-" + else: + raise ValueError(f"Unsupported organism '{organism}'. Supported values are 'hs' and 'mm'.") + + if layer is None: + test_matrix = adata.X + else: + check_table(adata, tables=layer) + test_matrix = adata.layers[layer] + + # Convert the selected matrix (adata.X or layer) to CSR if it is dense + if not issparse(test_matrix): + test_matrix = csr_matrix(test_matrix) + + # Calculate total number of genes with values > 0 for each cell + adata.obs["nFeature"] = np.array((test_matrix > 0).sum(axis=1)).flatten() + # Calculate the sum of counts for all genes for each cell + adata.obs["nCount"] = np.array(test_matrix.sum(axis=1)).flatten() + # Identify mitochondrial genes based on the match pattern + mt_genes = adata.var_names.str.startswith(mt_match_pattern) + # Calculate the sum of counts for mitochondrial genes for each cell + adata.obs["nCount_mt"] = np.array(test_matrix[:, mt_genes] + .sum(axis=1)).flatten() + # Calculate the percentage of counts in mitochondrial genes for each cell + adata.obs["percent.mt"] = (adata.obs["nCount_mt"] / + adata.obs["nCount"]) * 100 + # Handle NaN values in percent.mt + adata.obs["percent.mt"] = adata.obs["percent.mt"].fillna(0) + # Ensure percent.mt is stored as a float + adata.obs["percent.mt"] = adata.obs["percent.mt"].astype(float) + +# Add the QC summary table to AnnData object +def get_qc_summary_table( + adata: AnnData, + n_mad: int = 5, + upper_quantile: float = 0.95, + lower_quantile: float = 0.05, + stat_columns_list: Optional[List[str]] = None, + sample_column: str = None +) -> None: + """ + Compute summary statistics for quality 
control metrics in
an AnnData object + and store the result in adata.uns['qc_summary_table']. + If QC columns are not in the adata.obs, run add_qc_metrics first. + + Parameters: + adata (AnnData): The AnnData object containing the data. + n_mad (int): Number of MADs to use for upper/lower thresholds. + upper_quantile (float): Upper quantile to compute (e.g., 0.95). + lower_quantile (float): Lower quantile to compute (e.g., 0.05). + stat_columns_list (list): List of column names to compute statistics for. + If None, defaults to ['nFeature', 'nCount', 'percent.mt']. + sample_column (str, optional): Column name to group by sample. + If None, computes for all data. + + Returns: + None. The summary table is stored in adata.uns['qc_summary_table']. + """ + # if not provided select default stat columns + if stat_columns_list is None: + stat_columns_list = ['nFeature', 'nCount', 'percent.mt'] + + # Check that required columns exist in adata.obs + check_annotation( + adata, + annotations=stat_columns_list, + should_exist=True) + + # check that stat_column_list is not empty + if not stat_columns_list: # catches [], (), None + raise ValueError( + 'Parameter "stat_columns_list" must contain at least one column name.' 
+ ) + + # check grouping column + if sample_column is not None: + check_annotation(adata, annotations=[sample_column], should_exist=True) + + # validate numerical parameters input + if not 0 <= upper_quantile <= 1: + raise ValueError(f'Parameter "upper_quantile" must be between 0 and 1, got "{upper_quantile}"' + ) + if not 0 <= lower_quantile <= 1: + raise ValueError(f'Parameter "lower_quantile" must be between 0 and 1, got "{lower_quantile}"' + ) + if n_mad < 0: + raise ValueError(f'Parameter "n_mad" must be non-negative, got "{n_mad}"') + + obs_df = adata.obs + summary_table = pd.DataFrame() + # If no sample_column, compute stats for all data + if sample_column is None: + stat_df = compute_summary_qc_stats(df=obs_df, + n_mad=n_mad, + upper_quantile=upper_quantile, + lower_quantile=lower_quantile, + stat_columns_list=stat_columns_list) + stat_df["Sample"] = "All" + summary_table = stat_df + else: + # Otherwise, compute stats for each sample group + samples_list = pd.unique(obs_df[sample_column]) + stat_dfs = [] + for current_sample in samples_list: + sample_df = obs_df[obs_df[sample_column] == current_sample].copy() + stat_df = compute_summary_qc_stats(df=sample_df, + n_mad=n_mad, + upper_quantile=upper_quantile, + lower_quantile=lower_quantile, + stat_columns_list=stat_columns_list) + stat_df["Sample"] = current_sample + stat_dfs.append(stat_df) + summary_table = pd.concat(stat_dfs, ignore_index=True) + # Reset index and store in adata.uns + summary_table = summary_table.reset_index(drop=True) + adata.uns["qc_summary_table"] = summary_table \ No newline at end of file diff --git a/src/spac/utils.py b/src/spac/utils.py index f3506a20..2a616b68 100644 --- a/src/spac/utils.py +++ b/src/spac/utils.py @@ -7,6 +7,8 @@ import logging import warnings import numbers +from scipy.stats import median_abs_deviation +from typing import List, Optional # Configure logging logging.basicConfig(level=logging.INFO, @@ -1108,12 +1110,13 @@ def compute_metrics(data): # Ensure the 
maximum and minimum outliers are included max_outlier = outlier_series.max() min_outlier = outlier_series.min() - outliers_sampled = outliers_sampled.append( - pd.Series([max_outlier, min_outlier]) + outliers_sampled = pd.concat( + [outliers_sampled, pd.Series([max_outlier, min_outlier])], + ignore_index=True ) # Convert the sampled values back to a list - outliers = outliers_sampled.reset_index(drop=True).tolist() + outliers = outliers_sampled.tolist() metrics = [ lower_whisker, @@ -1190,3 +1193,84 @@ def compute_metrics(data): return metrics return metrics + +# compute summary statistics for the specified columns +def compute_summary_qc_stats( + df: pd.DataFrame, + n_mad: int = 5, + upper_quantile: float = 0.95, + lower_quantile: float = 0.05, + stat_columns_list: List[str] = ['nFeature', 'nCount', 'percent.mt'] + ) -> pd.DataFrame: + + """ + Compute summary quality control statistics for specified columns in a dataset. + + For each column in stat_columns_list, this function calculates: + - Mean + - Median + - Upper and lower thresholds based on median ± n_mad * MAD + (median absolute deviation) + - Upper and lower quantiles + + Parameters + ---------- + df : pd.DataFrame + Input DataFrame containing the data. + n_mad : int, optional + Number of MADs to use for upper and lower thresholds (default is 5). + upper_quantile : float, optional + Upper quantile to compute (default is 0.95). + lower_quantile : float, optional + Lower quantile to compute (default is 0.05). + stat_columns_list : list of str, optional + List of column names to compute statistics for. Columns must be numeric. + + Returns + ------- + pd.DataFrame + DataFrame with summary statistics for each specified column. + Columns: ["metric_name", "mean", "median", "upper_mad", "lower_mad", + "upper_quantile", "lower_quantile"] + + Raises + ------ + TypeError + If any column in stat_columns_list is not numeric or all values are NaN. 
+ """ + stat_vals = [] + for col_name in stat_columns_list: + # Ensure the column is numeric + if not pd.api.types.is_numeric_dtype(df[col_name]): + raise TypeError( + f'Column "{col_name}" must be numeric to compute statistics.' + ) + # Check for all-NaN column + if df[col_name].isna().all(): + raise TypeError( + f'Column "{col_name}" must be numeric to compute statistics. ' + 'All values are NaN.' + ) + # Compute median and MAD (median absolute deviation) + median = df[col_name].median() + mad = median_abs_deviation(df[col_name], nan_policy='omit') + # Collect statistics for this column + col_stats = [ + col_name, + df[col_name].mean(), + median, + median + n_mad * mad, + median - n_mad * mad, + df[col_name].quantile(upper_quantile), + df[col_name].quantile(lower_quantile) + ] + stat_vals.append(col_stats) + # Return DataFrame with statistics for all columns + return pd.DataFrame( + stat_vals, + columns=[ + "metric_name", "mean", "median", + "upper_mad", "lower_mad", + "upper_quantile", "lower_quantile" + ] + ) \ No newline at end of file diff --git a/tests/templates/__init__.py b/tests/templates/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/templates/test_add_pin_color_rule.py b/tests/templates/test_add_pin_color_rule.py new file mode 100644 index 00000000..762da8d6 --- /dev/null +++ b/tests/templates/test_add_pin_color_rule.py @@ -0,0 +1,98 @@ +# tests/templates/test_add_pin_color_rule.py +""" +Real (non-mocked) unit test for the Append Pin Color Rule template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.append_pin_color_rule_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells for color rule assignment.""" + rng = np.random.default_rng(42) + X = rng.random((4, 2)) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestAddPinColorRuleTemplate(unittest.TestCase): + """Real (non-mocked) tests for the append pin color rule template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Label_Color_Map": ["A:red", "B:blue"], + "Color_Map_Name": "_spac_colors", + "Overwrite_Previous_Color_Map": True, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_add_pin_color_rule_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run pin color rule and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists and contains AnnData + 3. 
Color map is stored in .uns + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("_spac_colors", result_adata.uns) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("_spac_colors", mem_adata.uns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_analysis_to_csv_template.py b/tests/templates/test_analysis_to_csv_template.py new file mode 100644 index 00000000..3eaeb051 --- /dev/null +++ b/tests/templates/test_analysis_to_csv_template.py @@ -0,0 +1,98 @@ +# tests/templates/test_analysis_to_csv_template.py +""" +Real (non-mocked) unit test for the Analysis to CSV template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.analysis_to_csv_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells, 2 genes for CSV export.""" + rng = np.random.default_rng(42) + X = rng.random((4, 2)) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = rng.random((4, 2)) * 100 + return adata + + +class TestAnalysisToCSVTemplate(unittest.TestCase): + """Real (non-mocked) tests for the analysis to CSV template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Export": "Original", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_analysis_to_csv_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: export AnnData to CSV and verify outputs. + + Validates: + 1. saved_files dict has 'dataframe' key + 2. CSV exists, is non-empty + 3. 
CSV has expected columns (genes + obs) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("dataframe", saved_files) + + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + result_df = pd.read_csv(csv_path) + # Should have gene columns and obs columns + self.assertIn("Gene_0", result_df.columns) + self.assertIn("Gene_1", result_df.columns) + self.assertEqual(len(result_df), 4) + + mem_df = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_df, pd.DataFrame) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_append_annotation_template.py b/tests/templates/test_append_annotation_template.py new file mode 100644 index 00000000..fcec1b56 --- /dev/null +++ b/tests/templates/test_append_annotation_template.py @@ -0,0 +1,114 @@ +# tests/templates/test_append_annotation_template.py +""" +Real (non-mocked) unit test for the Append Annotation template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.append_annotation_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame for append annotation testing. + + 4 rows, 2 columns -- the smallest dataset that exercises the + template's column-append code path. 
+ """ + return pd.DataFrame({ + "cell_type": ["B cell", "T cell", "B cell", "T cell"], + "marker": [1.0, 2.0, 3.0, 4.0], + }) + + +class TestAppendAnnotationTemplate(unittest.TestCase): + """Real (non-mocked) tests for the append annotation template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Annotation_Pair_List": ["batch_id:batch_1", "site:lung"], + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_append_annotation_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run append annotation template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. New annotation columns are present in the output + 4. 
In-memory return is a DataFrame with the appended columns + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: appended columns present -------------------------- + result_df = pd.read_csv(csv_path) + self.assertIn("batch_id", result_df.columns) + self.assertIn("site", result_df.columns) + self.assertEqual(result_df["batch_id"].unique().tolist(), ["batch_1"]) + self.assertEqual(result_df["site"].unique().tolist(), ["lung"]) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertIn("batch_id", mem_df.columns) + self.assertIn("site", mem_df.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_arcsinh_normalization_template.py b/tests/templates/test_arcsinh_normalization_template.py new file mode 100644 index 00000000..9e9c61f3 --- /dev/null +++ b/tests/templates/test_arcsinh_normalization_template.py @@ -0,0 +1,98 @@ +# tests/templates/test_arcsinh_normalization_template.py +""" +Real (non-mocked) unit test for the Arcsinh Normalization template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.arcsinh_normalization_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells, 2 genes with positive values.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 100, size=(4, 2)).astype(float) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestArcsinhNormalizationTemplate(unittest.TestCase): + """Real (non-mocked) tests for the arcsinh normalization template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Co_Factor": "5", + "Percentile": "None", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_arcsinh_normalization_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run arcsinh normalization and verify outputs. + + Validates: + 1. saved_files is a dict with 'analysis' key + 2. Output pickle exists and is non-empty + 3. 
Output pickle contains an AnnData with 'arcsinh' layer + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists(), f"Pickle not found: {pkl_path}") + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("arcsinh", result_adata.layers) + + # -- save_to_disk=False returns AnnData in memory -------------- + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("arcsinh", mem_adata.layers) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_binary_to_categorical_annotation_template.py b/tests/templates/test_binary_to_categorical_annotation_template.py new file mode 100644 index 00000000..9e5a44f4 --- /dev/null +++ b/tests/templates/test_binary_to_categorical_annotation_template.py @@ -0,0 +1,117 @@ +# tests/templates/test_binary_to_categorical_annotation_template.py +""" +Real (non-mocked) unit test for the Binary to Categorical Annotation template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.binary_to_categorical_annotation_template import ( + run_from_json, +) + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame with binary one-hot columns. 
+ + 4 rows -- each row has exactly one 1 across the binary columns. + """ + return pd.DataFrame({ + "B_cell": [1, 0, 0, 0], + "T_cell": [0, 1, 0, 1], + "NK_cell": [0, 0, 1, 0], + "marker": [1.5, 2.5, 3.5, 4.5], + }) + + +class TestBinaryToCategoricalAnnotationTemplate(unittest.TestCase): + """Real (non-mocked) tests for the binary-to-categorical template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Binary_Annotation_Columns": ["B_cell", "T_cell", "NK_cell"], + "New_Annotation_Name": "cell_labels", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_bin2cat_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run binary-to-categorical template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. New categorical column 'cell_labels' is present + 4. 
Categorical values match the original binary column names + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: categorical column present with expected values --- + result_df = pd.read_csv(csv_path) + self.assertIn("cell_labels", result_df.columns) + expected_labels = {"B_cell", "T_cell", "NK_cell"} + actual_labels = set(result_df["cell_labels"].dropna().unique()) + self.assertEqual(actual_labels, expected_labels) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertIn("cell_labels", mem_df.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_boxplot_template.py b/tests/templates/test_boxplot_template.py new file mode 100644 index 00000000..3588f618 --- /dev/null +++ b/tests/templates/test_boxplot_template.py @@ -0,0 +1,194 @@ +# tests/templates/test_boxplot_template.py +""" +Real (non-mocked) unit test for the Boxplot template. + +Snowball seed test — validates template I/O behaviour only: + • Expected output files are produced on disk + • Filenames follow the convention + • Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") # Headless backend for CI + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.boxplot_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """ + Minimal synthetic AnnData for boxplot template testing. + + 4 cells, 2 genes, 2 cell types — the smallest dataset that exercises + the template's grouping, plotting, and summary-stats code paths. + """ + rng = np.random.default_rng(42) + + # 4 cells × 2 genes — small enough to reason about, + # large enough for describe() to return meaningful stats + n_cells, n_genes = 4, 2 + X = rng.integers(1, 10, size=(n_cells, n_genes)).astype(float) + + obs = pd.DataFrame( + {"cell_type": ["B cell", "T cell", "B cell", "T cell"]}, + ) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestBoxplotTemplate(unittest.TestCase): + """Real (non-mocked) tests for the boxplot template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + # Save minimal real data as pickle (simulates upstream analysis) + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + # Write a JSON params file — the actual input the template receives + # in production (from Galaxy / Code Ocean) + params = { + "Upstream_Analysis": self.in_file, + "Primary_Annotation": "cell_type", + "Secondary_Annotation": "None", + "Table_to_Visualize": "Original", + "Feature_s_to_Plot": ["All"], + "Value_Axis_Log_Scale": False, + "Figure_Title": "Test BoxPlot", + "Horizontal_Plot": False, + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, # low DPI for fast save + "Font_Size": 10, + "Keep_Outliers": True, + 
"Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures"}, + "dataframe": {"type": "file", "name": "output.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_boxplot_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run boxplot template and verify output + artifacts. + + Validates: + 1. saved_files is a dict with 'figures' and 'dataframe' keys + 2. A figures directory is created containing a non-empty PNG + 3. The figure title matches the "Figure_Title" param + 4. A summary CSV is created with the exact describe() rows + """ + # -- Act (save_to_disk=True): write outputs to disk ------------ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, # no GUI in CI + output_dir=self.tmp_dir.name, + ) + + # -- Act (save_to_disk=False): get figure + df in memory ------- + fig, summary_df_mem = run_from_json( + self.json_file, + save_to_disk=False, + show_plot=False, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance( + saved_files, dict, + f"Expected dict from run_from_json, got {type(saved_files)}" + ) + + # -- Assert: figures directory contains at least one PNG ------- + self.assertIn("figures", saved_files, + "Missing 'figures' key in saved_files") + figure_paths = saved_files["figures"] + self.assertGreaterEqual( + len(figure_paths), 1, "No figure files were saved" + ) + + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue( + fig_file.exists(), f"Figure not found: {fig_path}" + ) + self.assertGreater( + fig_file.stat().st_size, 0, + f"Figure file is empty: {fig_path}" + ) + # Template saves matplotlib figures as .png + self.assertEqual( + fig_file.suffix, ".png", + f"Expected .png extension, got {fig_file.suffix}" + ) + + # -- Assert: 
figure has the correct title ---------------------- + # The template calls ax.set_title(figure_title), so the axes + # title must match the "Figure_Title" parameter we passed in. + axes_title = fig.axes[0].get_title() + self.assertEqual( + axes_title, "Test BoxPlot", + f"Expected figure title 'Test BoxPlot', got '{axes_title}'" + ) + + # -- Assert: summary CSV exists and is non-empty --------------- + self.assertIn("dataframe", saved_files, + "Missing 'dataframe' key in saved_files") + csv_path = Path(saved_files["dataframe"]) + self.assertTrue( + csv_path.exists(), f"Summary CSV not found: {csv_path}" + ) + self.assertGreater( + csv_path.stat().st_size, 0, + f"Summary CSV is empty: {csv_path}" + ) + + # -- Assert: CSV has the exact describe() stat rows ------------ + # The template calls df.describe().reset_index() which produces + # exactly these 8 rows in this order. + summary_df = pd.read_csv(csv_path) + expected_stats = [ + "count", "mean", "std", "min", + "25%", "50%", "75%", "max", + ] + + # First column after reset_index() is called "index" + actual_stats = summary_df["index"].tolist() + self.assertEqual( + actual_stats, expected_stats, + f"Summary CSV stat rows don't match.\n" + f" Expected: {expected_stats}\n" + f" Actual: {actual_stats}" + ) + + +if __name__ == "__main__": + unittest.main() \ No newline at end of file diff --git a/tests/templates/test_calculate_centroid_template.py b/tests/templates/test_calculate_centroid_template.py new file mode 100644 index 00000000..7093034e --- /dev/null +++ b/tests/templates/test_calculate_centroid_template.py @@ -0,0 +1,126 @@ +# tests/templates/test_calculate_centroid_template.py +""" +Real (non-mocked) unit test for the Calculate Centroid template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.calculate_centroid_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame with bounding-box coordinate columns. + + 4 rows -- enough to exercise the centroid calculation. + """ + return pd.DataFrame({ + "XMin": [0.0, 10.0, 20.0, 30.0], + "XMax": [10.0, 20.0, 30.0, 40.0], + "YMin": [0.0, 5.0, 10.0, 15.0], + "YMax": [4.0, 9.0, 14.0, 19.0], + "cell_type": ["A", "B", "A", "B"], + }) + + +class TestCalculateCentroidTemplate(unittest.TestCase): + """Real (non-mocked) tests for the calculate centroid template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Min_X_Coordinate_Column_Name": "XMin", + "Max_X_Coordinate_Column_Name": "XMax", + "Min_Y_Coordinate_Column_Name": "YMin", + "Max_Y_Coordinate_Column_Name": "YMax", + "X_Centroid_Name": "XCentroid", + "Y_Centroid_Name": "YCentroid", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_calculate_centroid_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run calculate centroid template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. 
Centroid columns are present and correctly computed + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: centroid columns are present and correct ---------- + result_df = pd.read_csv(csv_path) + self.assertIn("XCentroid", result_df.columns) + self.assertIn("YCentroid", result_df.columns) + + # XCentroid = (XMin + XMax) / 2 + expected_x = [5.0, 15.0, 25.0, 35.0] + self.assertEqual(result_df["XCentroid"].tolist(), expected_x) + + # YCentroid = (YMin + YMax) / 2 + expected_y = [2.0, 7.0, 12.0, 17.0] + self.assertEqual(result_df["YCentroid"].tolist(), expected_y) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertIn("XCentroid", mem_df.columns) + self.assertIn("YCentroid", mem_df.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_combine_annotations_template.py b/tests/templates/test_combine_annotations_template.py new file mode 100644 index 00000000..a8f91a23 --- /dev/null +++ b/tests/templates/test_combine_annotations_template.py @@ -0,0 +1,110 @@ +# tests/templates/test_combine_annotations_template.py +""" +Real (non-mocked) unit test for the Combine Annotations template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.combine_annotations_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells with two annotation columns to combine.""" + rng = np.random.default_rng(42) + X = rng.random((4, 2)) + obs = pd.DataFrame({ + "tissue": ["lung", "liver", "lung", "liver"], + "cell_type": ["B cell", "T cell", "T cell", "B cell"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestCombineAnnotationsTemplate(unittest.TestCase): + """Real (non-mocked) tests for the combine annotations template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotations_Names": ["tissue", "cell_type"], + "Separator": "_", + "New_Annotation_Name": "combined", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_combine_annotations_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run combine annotations and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' and 'dataframe' keys + 2. Pickle contains AnnData with 'combined' obs column + 3. 
CSV exists and is non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + self.assertIn("dataframe", saved_files) + + # -- Pickle output -- + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("combined", result_adata.obs.columns) + + # -- CSV output -- + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + # -- In-memory -- + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("combined", mem_adata.obs.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_combine_dataframes_template.py b/tests/templates/test_combine_dataframes_template.py new file mode 100644 index 00000000..6942eb78 --- /dev/null +++ b/tests/templates/test_combine_dataframes_template.py @@ -0,0 +1,114 @@ +# tests/templates/test_combine_dataframes_template.py +""" +Real (non-mocked) unit test for the Combine DataFrames template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.combine_dataframes_template import run_from_json + + +def _make_tiny_dataframes(): + """Two minimal DataFrames with the same schema for concatenation.""" + df_a = pd.DataFrame({ + "cell_type": ["B cell", "T cell"], + "marker": [1.0, 2.0], + }) + df_b = pd.DataFrame({ + "cell_type": ["NK cell", "Monocyte"], + "marker": [3.0, 4.0], + }) + return df_a, df_b + + +class TestCombineDataFramesTemplate(unittest.TestCase): + """Real (non-mocked) tests for the combine dataframes template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + + df_a, df_b = _make_tiny_dataframes() + self.file_a = os.path.join(self.tmp_dir.name, "first.csv") + self.file_b = os.path.join(self.tmp_dir.name, "second.csv") + df_a.to_csv(self.file_a, index=False) + df_b.to_csv(self.file_b, index=False) + + params = { + "First_Dataframe": self.file_a, + "Second_Dataframe": self.file_b, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_combine_dataframes_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run combine dataframes template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. 
Combined DataFrame has all rows from both inputs + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: combined row count -------------------------------- + result_df = pd.read_csv(csv_path) + self.assertEqual(len(result_df), 4) + expected_types = {"B cell", "T cell", "NK cell", "Monocyte"} + self.assertEqual(set(result_df["cell_type"]), expected_types) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertEqual(len(mem_df), 4) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_downsample_cells_template.py b/tests/templates/test_downsample_cells_template.py new file mode 100644 index 00000000..7f76d5b3 --- /dev/null +++ b/tests/templates/test_downsample_cells_template.py @@ -0,0 +1,116 @@ +# tests/templates/test_downsample_cells_template.py +""" +Real (non-mocked) unit test for the Downsample Cells template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.downsample_cells_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame for downsampling. + + 8 rows, 2 groups of 4 -- enough to exercise group-based downsampling. + """ + return pd.DataFrame({ + "cell_type": ["A", "A", "A", "A", "B", "B", "B", "B"], + "marker": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], + }) + + +class TestDownsampleCellsTemplate(unittest.TestCase): + """Real (non-mocked) tests for the downsample cells template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Annotations_List": ["cell_type"], + "Number_of_Samples": 2, + "Stratify_Option": False, + "Random_Selection": False, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_downsample_cells_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run downsample cells template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. 
Row count is reduced (2 per group = 4 total from 8) + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: downsampled row count ----------------------------- + result_df = pd.read_csv(csv_path) + # 2 samples per group * 2 groups = 4 rows + self.assertEqual(len(result_df), 4) + # Both groups should still be present + self.assertEqual( + set(result_df["cell_type"].unique()), {"A", "B"} + ) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertEqual(len(mem_df), 4) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_hierarchical_heatmap_template.py b/tests/templates/test_hierarchical_heatmap_template.py new file mode 100644 index 00000000..1964b9a0 --- /dev/null +++ b/tests/templates/test_hierarchical_heatmap_template.py @@ -0,0 +1,106 @@ +# tests/templates/test_hierarchical_heatmap_template.py +""" +Real (non-mocked) unit test for the Hierarchical Heatmap template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.hierarchical_heatmap_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells, 3 genes, 2 groups for heatmap.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 20, size=(8, 3)).astype(float) + obs = pd.DataFrame({ + "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestHierarchicalHeatmapTemplate(unittest.TestCase): + """Real (non-mocked) tests for the hierarchical heatmap template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "Table_to_Visualize": "Original", + "Features_to_Visualize": ["All"], + "Standard_Scale": "None", + "Method": "average", + "Metric": "euclidean", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 8, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_hierarchical_heatmap_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run hierarchical heatmap and verify outputs. + + Validates: + 1. 
saved_files is a dict with a 'figures' key + 2. At least one figure path is returned + 3. Each figure file exists and is non-empty + """ + saved_files = run_from_json( + self.json_file, + save_results_flag=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_histogram_template.py b/tests/templates/test_histogram_template.py new file mode 100644 index 00000000..5a8e49e8 --- /dev/null +++ b/tests/templates/test_histogram_template.py @@ -0,0 +1,111 @@ +# tests/templates/test_histogram_template.py +""" +Real (non-mocked) unit test for the Histogram template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile.
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.histogram_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells, 2 genes for histogram plotting.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 10, size=(4, 2)).astype(float) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestHistogramTemplate(unittest.TestCase): + """Real (non-mocked) tests for the histogram template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "Table_to_Visualize": "Original", + "Feature_s_to_Plot": ["All"], + "Figure_Title": "Test Histogram", + "Legend_Title": "Cell Type", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + "Number_of_Bins": 20, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "figures": {"type": "directory", "name": "figures_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_histogram_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run histogram and verify outputs. + + Validates: + 1. saved_files dict has 'figures' and 'dataframe' keys + 2. Figures directory contains non-empty PNG(s) + 3. 
Summary CSV exists and is non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + self.assertIn("dataframe", saved_files) + + # Figures + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + # CSV + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_interactive_spatial_plot_template.py b/tests/templates/test_interactive_spatial_plot_template.py new file mode 100644 index 00000000..ecb9e8f4 --- /dev/null +++ b/tests/templates/test_interactive_spatial_plot_template.py @@ -0,0 +1,97 @@ +# tests/templates/test_interactive_spatial_plot_template.py +""" +Real (non-mocked) unit test for the Interactive Spatial Plot template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.interactive_spatial_plot_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells with spatial coords.""" + rng = np.random.default_rng(42) + X = rng.random((8, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((8, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestInteractiveSpatialPlotTemplate(unittest.TestCase): + """Real (non-mocked) tests for the interactive spatial plot template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Color_By": "Annotation", + "Annotation_s_to_Highlight": ["cell_type"], + "Feature_to_Highlight": "None", + "Dot_Size": 5, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "html": {"type": "directory", "name": "html_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_interactive_spatial_plot_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run interactive spatial plot and verify outputs. + + Validates: + 1. saved_files dict has 'html' key + 2. 
HTML directory contains non-empty file(s) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("html", saved_files) + + html_paths = saved_files["html"] + self.assertGreaterEqual(len(html_paths), 1) + for html_path in html_paths: + html_file = Path(html_path) + self.assertTrue(html_file.exists()) + self.assertGreater(html_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_load_csv_files_with_config.py b/tests/templates/test_load_csv_files_with_config.py new file mode 100644 index 00000000..38a8fa1e --- /dev/null +++ b/tests/templates/test_load_csv_files_with_config.py @@ -0,0 +1,104 @@ +# tests/templates/test_load_csv_files_with_config.py +""" +Real (non-mocked) unit test for the Load CSV Files template. + +Snowball test -- validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.load_csv_files_template import run_from_json + + +class TestLoadCSVFilesWithConfig(unittest.TestCase): + """Real (non-mocked) tests for the load CSV files template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + + # Create CSV data directory + csv_dir = os.path.join(self.tmp_dir.name, "csv_data") + os.makedirs(csv_dir) + + df1 = pd.DataFrame({ + "Feature_A": [1.0, 2.0], + "Feature_B": [3.0, 4.0], + "ID": ["cell_1", "cell_2"], + }) + df2 = pd.DataFrame({ + "Feature_A": [5.0, 6.0], + "Feature_B": [7.0, 8.0], + "ID": ["cell_3", "cell_4"], + }) + + df1.to_csv(os.path.join(csv_dir, "data1.csv"), index=False) + df2.to_csv(os.path.join(csv_dir, "data2.csv"), index=False) + + # Configuration CSV with file_name column + metadata + config_df = pd.DataFrame({ + "file_name": ["data1.csv", "data2.csv"], + "experiment": ["Exp1", "Exp2"], + }) + config_file = os.path.join(self.tmp_dir.name, "config.csv") + config_df.to_csv(config_file, index=False) + + params = { + "CSV_Files": csv_dir, + "CSV_Files_Configuration": config_file, + "String_Columns": ["ID"], + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_load_csv_files_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: load CSV files with config and verify. + + Validates: + 1. saved_files dict has 'dataframe' key + 2. CSV exists and is non-empty + 3. 
Combined data has rows from both input files + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("dataframe", saved_files) + + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + result_df = pd.read_csv(csv_path) + self.assertEqual(len(result_df), 4) + + mem_df = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_df, pd.DataFrame) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_manual_phenotyping_template.py b/tests/templates/test_manual_phenotyping_template.py new file mode 100644 index 00000000..c3a9227b --- /dev/null +++ b/tests/templates/test_manual_phenotyping_template.py @@ -0,0 +1,132 @@ +#!/usr/bin/env python3 +# tests/templates/test_manual_phenotyping_template.py +""" +Real (non-mocked) unit test for the Manual Phenotyping template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.manual_phenotyping_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame with binary phenotype marker columns. + + 4 rows -- each row has one positive marker matching a phenotype rule. + """ + return pd.DataFrame({ + "cd4": [1, 0, 0, 1], + "cd8": [0, 1, 0, 0], + "cd20": [0, 0, 1, 0], + "marker_intensity": [1.5, 2.5, 3.5, 4.5], + }) + + +def _make_phenotype_rules() -> pd.DataFrame: + """ + Phenotype rule table: maps binary codes to phenotype names. 
+ + Each row uses a '+' or '-' code referencing column names. + """ + return pd.DataFrame({ + "phenotype_name": ["T_helper", "Cytotoxic_T", "B_cell"], + "phenotype_code": ["cd4+cd8-", "cd4-cd8+", "cd20+"], + }) + + +class TestManualPhenotypingTemplate(unittest.TestCase): + """Real (non-mocked) tests for the manual phenotyping template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + self.rules_file = os.path.join(self.tmp_dir.name, "phenotypes.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + _make_phenotype_rules().to_csv(self.rules_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Phenotypes_Code": self.rules_file, + "Classification_Column_Prefix": "", + "Classification_Column_Suffix": "", + "Allow_Multiple_Phenotypes": True, + "Manual_Annotation_Name": "manual_phenotype", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_manual_phenotyping_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run manual phenotyping template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. 
Phenotype annotation column is present in output + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: phenotype column present -------------------------- + result_df = pd.read_csv(csv_path) + self.assertIn("manual_phenotype", result_df.columns) + # At least some rows should have assigned phenotypes + non_null = result_df["manual_phenotype"].dropna() + self.assertGreater(len(non_null), 0) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertIn("manual_phenotype", mem_df.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_nearest_neighbor_calculation_template.py b/tests/templates/test_nearest_neighbor_calculation_template.py new file mode 100644 index 00000000..cc888a57 --- /dev/null +++ b/tests/templates/test_nearest_neighbor_calculation_template.py @@ -0,0 +1,99 @@ +# tests/templates/test_nearest_neighbor_calculation_template.py +""" +Real (non-mocked) unit test for the Nearest Neighbor Calculation template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.nearest_neighbor_calculation_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells with spatial coords and annotation.""" + rng = np.random.default_rng(42) + X = rng.random((8, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((8, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestNearestNeighborCalculationTemplate(unittest.TestCase): + """Real (non-mocked) tests for nearest neighbor calculation.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "ImageID": "None", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_nearest_neighbor_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: calculate nearest neighbors and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with nearest neighbor results + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_neighborhood_profile_template.py b/tests/templates/test_neighborhood_profile_template.py new file mode 100644 index 00000000..36a1fad9 --- /dev/null +++ b/tests/templates/test_neighborhood_profile_template.py @@ -0,0 +1,97 @@ +# tests/templates/test_neighborhood_profile_template.py +""" +Real (non-mocked) unit test for the Neighborhood Profile template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.neighborhood_profile_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 20 cells with spatial coords and annotation.""" + rng = np.random.default_rng(42) + X = rng.random((20, 2)) + obs = pd.DataFrame({ + "cell_type": (["A"] * 10) + (["B"] * 10), + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((20, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestNeighborhoodProfileTemplate(unittest.TestCase): + """Real (non-mocked) tests for the neighborhood profile template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation_of_interest": "cell_type", + "Bins": [10, 25, 50], + "Anchor_Neighbor_List": ["A;B"], + "Stratify_By": "None", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "directory", "name": "dataframe_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_neighborhood_profile_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: compute neighborhood profiles and verify. + + Validates: + 1. saved_files dict has 'dataframe' key + 2. 
Output directory contains CSV file(s) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("dataframe", saved_files) + + csv_paths = saved_files["dataframe"] + self.assertGreaterEqual(len(csv_paths), 1) + for csv_path in csv_paths: + csv_file = Path(csv_path) + self.assertTrue(csv_file.exists()) + self.assertGreater(csv_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_normalize_batch_template.py b/tests/templates/test_normalize_batch_template.py new file mode 100644 index 00000000..f6d87e61 --- /dev/null +++ b/tests/templates/test_normalize_batch_template.py @@ -0,0 +1,97 @@ +# tests/templates/test_normalize_batch_template.py +""" +Real (non-mocked) unit test for the Normalize Batch template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.normalize_batch_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 6 cells, 2 genes, 2 batches for batch normalization.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 50, size=(6, 2)).astype(float) + obs = pd.DataFrame({ + "batch": ["A", "A", "A", "B", "B", "B"], + "cell_type": ["T", "B", "T", "B", "T", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestNormalizeBatchTemplate(unittest.TestCase): + """Real (non-mocked) tests for the normalize batch template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, 
"input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "batch", + "Need_Normalization": True, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_normalize_batch_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run normalize batch and verify outputs. + + Validates: + 1. saved_files is a dict with 'analysis' key + 2. Output pickle exists, is non-empty, and contains AnnData + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_phenograph_clustering_template.py b/tests/templates/test_phenograph_clustering_template.py new file mode 100644 index 00000000..d87dc3fe --- /dev/null +++ b/tests/templates/test_phenograph_clustering_template.py @@ -0,0 +1,103 @@ +# tests/templates/test_phenograph_clustering_template.py +""" +Real (non-mocked) unit test for the Phenograph Clustering template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.phenograph_clustering_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 50 cells, 5 genes for Phenograph clustering.""" + rng = np.random.default_rng(42) + # Two distinct clusters + X_a = rng.normal(0, 1, size=(25, 5)) + X_b = rng.normal(5, 1, size=(25, 5)) + X = np.vstack([X_a, X_b]) + obs = pd.DataFrame({"cell_type": ["A"] * 25 + ["B"] * 25}) + var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestPhenographClusteringTemplate(unittest.TestCase): + """Real (non-mocked) tests for the phenograph clustering template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Process": "Original", + "K_Nearest_Neighbors": 10, + "Seed": 42, + "Resolution_Parameter": 1.0, + "Output_Annotation_Name": "phenograph", + "Number_of_Iterations": 10, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_phenograph_clustering_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run phenograph clustering and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with 'phenograph' obs column + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("phenograph", result_adata.obs.columns) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("phenograph", mem_adata.obs.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_posit_it_python_template.py b/tests/templates/test_posit_it_python_template.py new file mode 100644 index 00000000..fdfd64f6 --- /dev/null +++ b/tests/templates/test_posit_it_python_template.py @@ -0,0 +1,131 @@ +# tests/templates/test_posit_it_python_template.py +""" +Real (non-mocked) unit test for the Posit-It Python template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.posit_it_python_template import run_from_json + + +class TestPostItPythonTemplate(unittest.TestCase): + """Real (non-mocked) tests for the posit-it python template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + + params = { + "Label": "Test Note", + "Label_font_color": "Black", + "Label_font_size": "40", + "Label_font_type": "normal", + "Label_font_family": "Arial", + "Label_Bold": "False", + "Background_fill_color": "Yellow1", + "Background_fill_opacity": "10", + "Page_width": "6", + "Page_height": "2", + "Page_DPI": "72", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_posit_it_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run posit-it template and verify outputs. + + Validates: + 1. save_to_disk=True returns a dict with 'figures' key + 2. Figures directory contains a non-empty PNG + 3. 
save_to_disk=False returns a matplotlib Figure with correct text + """ + # -- Act (save_to_disk=True): write outputs to disk ------------ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + # -- Act (save_to_disk=False): get figure in memory ------------ + fig = run_from_json( + self.json_file, + save_to_disk=False, + show_plot=False, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance( + saved_files, dict, + f"Expected dict from run_from_json, got {type(saved_files)}" + ) + + # -- Assert: figures directory contains at least one PNG ------- + self.assertIn("figures", saved_files, + "Missing 'figures' key in saved_files") + figure_paths = saved_files["figures"] + self.assertGreaterEqual( + len(figure_paths), 1, "No figure files were saved" + ) + + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue( + fig_file.exists(), f"Figure not found: {fig_path}" + ) + self.assertGreater( + fig_file.stat().st_size, 0, + f"Figure file is empty: {fig_path}" + ) + self.assertEqual( + fig_file.suffix, ".png", + f"Expected .png extension, got {fig_file.suffix}" + ) + + # -- Assert: in-memory figure is valid ------------------------- + import matplotlib.figure + self.assertIsInstance( + fig, matplotlib.figure.Figure, + f"Expected matplotlib Figure, got {type(fig)}" + ) + + # The figure text at (0.5, 0.5) should contain "Test Note" + text_artists = fig.texts + self.assertGreaterEqual( + len(text_artists), 1, + "Figure has no text artists" + ) + # First text artist is the label placed by fig.text(0.5, 0.5, ...) 
+ self.assertEqual( + text_artists[0].get_text(), "Test Note", + f"Expected figure text 'Test Note', " + f"got '{text_artists[0].get_text()}'" + ) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_quantile_scaling_template.py b/tests/templates/test_quantile_scaling_template.py new file mode 100644 index 00000000..13c93046 --- /dev/null +++ b/tests/templates/test_quantile_scaling_template.py @@ -0,0 +1,100 @@ +# tests/templates/test_quantile_scaling_template.py +""" +Real (non-mocked) unit test for the Quantile Scaling template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.quantile_scaling_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells, 2 genes for quantile scaling.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 100, size=(4, 2)).astype(float) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestQuantileScalingTemplate(unittest.TestCase): + """Real (non-mocked) tests for the quantile scaling template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Normalize": "Original", + "Lower_Quantile": "0.01", + "Upper_Quantile": "0.99", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + "html": {"type": "directory", "name": 
"html_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_quantile_scaling_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run quantile scaling and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists, is non-empty, contains AnnData with normalized layer + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + # quantile scaling creates a layer named "quantile__" + layer_names = list(result_adata.layers.keys()) + self.assertGreater(len(layer_names), 0) + + mem_result = run_from_json(self.json_file, save_to_disk=False) + # save_to_disk=False returns (adata, fig) tuple + self.assertIsNotNone(mem_result) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_relational_heatmap_template.py b/tests/templates/test_relational_heatmap_template.py new file mode 100644 index 00000000..8c2db32a --- /dev/null +++ b/tests/templates/test_relational_heatmap_template.py @@ -0,0 +1,139 @@ +# tests/templates/test_relational_heatmap_template.py +""" +Real (non-mocked) unit test for the Relational Heatmap template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.relational_heatmap_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells, 3 genes, 2 groups for heatmap.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 20, size=(8, 3)).astype(float) + obs = pd.DataFrame({ + "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestRelationalHeatmapTemplate(unittest.TestCase): + """Real (non-mocked) tests for the relational heatmap template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Source_Annotation_Name": "cell_type", + "Target_Annotation_Name": "cell_type", + "Figure_Width_inch": 6, + "Figure_Height_inch": 4, + "Figure_DPI": 72, + "Font_Size": 8, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "html": {"type": "directory", "name": "html_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_relational_heatmap_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run relational heatmap with show_static_image=False + (default). + + Validates: + 1. 
saved_files dict has 'html' key (interactive HTML is default output) + 2. HTML file exists and is non-empty + 3. No 'figures' key when show_static_image=False + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("html", saved_files) + + html_paths = saved_files["html"] + self.assertGreaterEqual(len(html_paths), 1) + for html_path in html_paths: + html_file = Path(html_path) + self.assertTrue(html_file.exists()) + self.assertGreater(html_file.stat().st_size, 0) + + # When show_static_image defaults to False, no figures produced + self.assertNotIn("figures", saved_files) + + def test_relational_heatmap_with_static_image(self) -> None: + """ + End-to-end I/O test: run relational heatmap with show_static_image=True. + + Validates: + 1. saved_files dict has both 'figures' and 'html' keys + 2. Figure PNG and HTML files exist and are non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + show_static_image=True, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + self.assertIn("html", saved_files) + + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + html_paths = saved_files["html"] + self.assertGreaterEqual(len(html_paths), 1) + for html_path in html_paths: + html_file = Path(html_path) + self.assertTrue(html_file.exists()) + self.assertGreater(html_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_rename_labels_template.py b/tests/templates/test_rename_labels_template.py new file mode 100644 index 00000000..41842124 --- /dev/null +++ b/tests/templates/test_rename_labels_template.py @@ -0,0 +1,109 @@ +# 
tests/templates/test_rename_labels_template.py +""" +Real (non-mocked) unit test for the Rename Labels template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. +""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.rename_labels_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells with cell_type annotation to rename.""" + rng = np.random.default_rng(42) + X = rng.random((4, 2)) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestRenameLabelsTemplate(unittest.TestCase): + """Real (non-mocked) tests for the rename labels template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + # Create mapping CSV: old_label -> new_label + mapping_df = pd.DataFrame({ + "Original": ["A", "B"], + "New": ["Alpha", "Beta"], + }) + self.mapping_file = os.path.join(self.tmp_dir.name, "mapping.csv") + mapping_df.to_csv(self.mapping_file, index=False) + + params = { + "Upstream_Analysis": self.in_file, + "Source_Annotation": "cell_type", + "Cluster_Mapping_Dictionary": self.mapping_file, + "New_Annotation": "cell_type_renamed", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def 
test_rename_labels_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run rename labels and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists, is non-empty, contains AnnData + 3. Renamed annotation column is present with new values + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("cell_type_renamed", result_adata.obs.columns) + self.assertEqual( + set(result_adata.obs["cell_type_renamed"].unique()), + {"Alpha", "Beta"}, + ) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_ripley_l_template.py b/tests/templates/test_ripley_l_template.py new file mode 100644 index 00000000..1bc926ac --- /dev/null +++ b/tests/templates/test_ripley_l_template.py @@ -0,0 +1,108 @@ +# tests/templates/test_ripley_l_template.py +""" +Real (non-mocked) unit test for the Ripley L Calculation template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.ripley_l_calculation_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 20 cells with spatial coords for Ripley L.""" + rng = np.random.default_rng(42) + X = rng.random((20, 2)) + obs = pd.DataFrame({ + "cell_type": (["A"] * 10) + (["B"] * 10), + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((20, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestRipleyLTemplate(unittest.TestCase): + """Real (non-mocked) tests for the Ripley L calculation template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Radii": [5, 10, 20], + "Annotation": "cell_type", + "Center_Phenotype": "A", + "Neighbor_Phenotype": "B", + "Stratify_By": "None", + "Number_of_Simulations": 5, + "Seed": 42, + "Spatial_Key": "spatial", + "Edge_Correction": True, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_ripley_l_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run Ripley L calculation and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with Ripley results in .uns + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + # Ripley results stored in .uns + self.assertGreater(len(result_adata.uns), 0) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_sankey_plot_template.py b/tests/templates/test_sankey_plot_template.py new file mode 100644 index 00000000..dc73a2c0 --- /dev/null +++ b/tests/templates/test_sankey_plot_template.py @@ -0,0 +1,132 @@ +# tests/templates/test_sankey_plot_template.py +""" +Real (non-mocked) unit test for the Sankey Plot template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.sankey_plot_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells with two annotation columns for Sankey.""" + rng = np.random.default_rng(42) + X = rng.random((8, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "A", "B", "B", "A", "A", "B", "B"], + "cluster": ["1", "2", "1", "2", "1", "2", "1", "2"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestSankeyPlotTemplate(unittest.TestCase): + """Real (non-mocked) tests for the sankey plot template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Source_Annotation_Name": "cell_type", + "Target_Annotation_Name": "cluster", + "Figure_Width_inch": 6, + "Figure_Height_inch": 6, + "Font_Size": 10, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "html": {"type": "directory", "name": "html_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_sankey_plot_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run sankey plot with show_static_image=False + (default). + + Validates: + 1. saved_files dict has 'html' key (interactive HTML is default) + 2. HTML output files exist and are non-empty + 3. 
No 'figures' key when show_static_image=False + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("html", saved_files) + + html_paths = saved_files["html"] + self.assertGreaterEqual(len(html_paths), 1) + for p in html_paths: + pf = Path(p) + self.assertTrue(pf.exists()) + self.assertGreater(pf.stat().st_size, 0) + + # When show_static_image defaults to False, no figures produced + self.assertNotIn("figures", saved_files) + + def test_sankey_plot_with_static_image(self) -> None: + """ + End-to-end I/O test: run sankey plot with show_static_image=True. + + Validates: + 1. saved_files dict has both 'figures' and 'html' keys + 2. Figure PNG and HTML files exist and are non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + show_static_image=True, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + self.assertIn("html", saved_files) + + for key in ["html", "figures"]: + paths = saved_files[key] + self.assertGreaterEqual(len(paths), 1) + for p in paths: + pf = Path(p) + self.assertTrue(pf.exists()) + self.assertGreater(pf.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_select_values_template.py b/tests/templates/test_select_values_template.py new file mode 100644 index 00000000..abfd4c8d --- /dev/null +++ b/tests/templates/test_select_values_template.py @@ -0,0 +1,112 @@ +# tests/templates/test_select_values_template.py +""" +Real (non-mocked) unit test for the Select Values template. + +Validates template I/O behaviour only: + - Expected output files are produced on disk + - Filenames follow the convention + - Output artifacts are non-empty + +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.select_values_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame for value filtering. + + 6 rows, 3 cell types -- enough to test include-based selection. + """ + return pd.DataFrame({ + "cell_type": ["A", "B", "C", "A", "B", "C"], + "marker": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], + }) + + +class TestSelectValuesTemplate(unittest.TestCase): + """Real (non-mocked) tests for the select values template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Annotation_of_Interest": "cell_type", + "Label_s_of_Interest": ["A", "B"], + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_select_values_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run select values template and verify + output artifacts. + + Validates: + 1. saved_files is a dict with 'dataframe' key + 2. Output CSV exists and is non-empty + 3. 
Only selected values (A, B) remain in the output + """ + # -- Act (save_to_disk=True) ----------------------------------- + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + # -- Assert: return type --------------------------------------- + self.assertIsInstance(saved_files, dict) + + # -- Assert: CSV file exists and is non-empty ------------------ + self.assertIn("dataframe", saved_files) + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists(), f"CSV not found: {csv_path}") + self.assertGreater(csv_path.stat().st_size, 0) + + # -- Assert: only selected values remain ----------------------- + result_df = pd.read_csv(csv_path) + self.assertEqual(len(result_df), 4) + self.assertEqual( + set(result_df["cell_type"].unique()), {"A", "B"} + ) + + # -- Act (save_to_disk=False) ---------------------------------- + mem_df = run_from_json( + self.json_file, + save_to_disk=False, + ) + + # -- Assert: in-memory return is DataFrame --------------------- + self.assertIsInstance(mem_df, pd.DataFrame) + self.assertEqual(len(mem_df), 4) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_setup_analysis_template.py b/tests/templates/test_setup_analysis_template.py new file mode 100644 index 00000000..79fe9fa3 --- /dev/null +++ b/tests/templates/test_setup_analysis_template.py @@ -0,0 +1,106 @@ +# tests/templates/test_setup_analysis_template.py +""" +Real (non-mocked) unit test for the Setup Analysis template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.setup_analysis_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """ + Minimal synthetic DataFrame simulating raw cell data. + + 4 cells with spatial coordinates, features, and an annotation column. + """ + return pd.DataFrame({ + "Gene_0": [1.0, 2.0, 3.0, 4.0], + "Gene_1": [5.0, 6.0, 7.0, 8.0], + "X_coord": [10.0, 20.0, 30.0, 40.0], + "Y_coord": [11.0, 21.0, 31.0, 41.0], + "cell_type": ["A", "B", "A", "B"], + }) + + +class TestSetupAnalysisTemplate(unittest.TestCase): + """Real (non-mocked) tests for the setup analysis template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Features_to_Analyze": ["Gene_0", "Gene_1"], + "Annotation_s_": ["cell_type"], + "X_Coordinate_Column": "X_coord", + "Y_Coordinate_Column": "Y_coord", + "Output_File": "output.pickle", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_setup_analysis_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run setup analysis and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists, is non-empty, contains AnnData + 3. 
AnnData has correct features, obs, and spatial coords + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertEqual(result_adata.n_obs, 4) + self.assertIn("cell_type", result_adata.obs.columns) + self.assertIn("spatial", result_adata.obsm) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_spatial_interaction_template.py b/tests/templates/test_spatial_interaction_template.py new file mode 100644 index 00000000..e531f8c9 --- /dev/null +++ b/tests/templates/test_spatial_interaction_template.py @@ -0,0 +1,112 @@ +# tests/templates/test_spatial_interaction_template.py +""" +Real (non-mocked) unit test for the Spatial Interaction template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.spatial_interaction_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 20 cells with spatial coords for interaction.""" + rng = np.random.default_rng(42) + X = rng.random((20, 2)) + obs = pd.DataFrame({ + "cell_type": (["A"] * 10) + (["B"] * 10), + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((20, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestSpatialInteractionTemplate(unittest.TestCase): + """Real (non-mocked) tests for the spatial interaction template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "Spatial_Analysis_Method": "Neighborhood Enrichment", + "Stratify_By": ["None"], + "K_Nearest_Neighbors": 6, + "Seed": 42, + "Coordinate_Type": "None", + "Radius": "None", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + "Color_Bar_Range": "Automatic", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures"}, + "dataframes": {"type": "directory", "name": "matrices"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_spatial_interaction_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: 
run spatial interaction and verify outputs. + + Validates: + 1. saved_files dict has 'figures' and/or 'dataframes' keys + 2. Output files exist and are non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertGreater(len(saved_files), 0) + + for key in ["figures", "dataframes"]: + if key in saved_files: + paths = saved_files[key] + self.assertGreaterEqual(len(paths), 1) + for p in paths: + pf = Path(p) + self.assertTrue(pf.exists()) + self.assertGreater(pf.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_spatial_plot_template.py b/tests/templates/test_spatial_plot_template.py new file mode 100644 index 00000000..2373f894 --- /dev/null +++ b/tests/templates/test_spatial_plot_template.py @@ -0,0 +1,107 @@ +# tests/templates/test_spatial_plot_template.py +""" +Real (non-mocked) unit test for the Spatial Plot template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.spatial_plot_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 8 cells with spatial coords for plotting.""" + rng = np.random.default_rng(42) + X = rng.random((8, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "B", "A", "B", "A", "B", "A", "B"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((8, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestSpatialPlotTemplate(unittest.TestCase): + """Real (non-mocked) tests for the spatial plot template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Color_By": "Annotation", + "Annotation_to_Highlight": "cell_type", + "Feature_to_Highlight": "None", + "Stratify": False, + "Stratify_By": [], + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + "Dot_Size": 50, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_spatial_plot_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run spatial plot and verify outputs. + + Validates: + 1. saved_files dict has 'figures' key + 2. 
Figures directory contains non-empty PNG(s) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plots=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_subset_analysis_template.py b/tests/templates/test_subset_analysis_template.py new file mode 100644 index 00000000..6d601c4a --- /dev/null +++ b/tests/templates/test_subset_analysis_template.py @@ -0,0 +1,103 @@ +# tests/templates/test_subset_analysis_template.py +""" +Real (non-mocked) unit test for the Subset Analysis template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.subset_analysis_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 6 cells, 3 cell types for subset filtering.""" + rng = np.random.default_rng(42) + X = rng.random((6, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "B", "C", "A", "B", "C"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestSubsetAnalysisTemplate(unittest.TestCase): + """Real (non-mocked) tests for the subset analysis template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation_of_interest": "cell_type", + "Labels": ["A", "B"], + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "transform_output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_subset_analysis_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run subset analysis and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists, is non-empty, contains AnnData + 3. 
Subset has fewer cells than original (only A and B) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + # 6 original cells, selecting A and B = 4 cells + self.assertEqual(result_adata.n_obs, 4) + self.assertEqual( + set(result_adata.obs["cell_type"].unique()), {"A", "B"} + ) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertEqual(mem_adata.n_obs, 4) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_summarize_annotation_statistics_template.py b/tests/templates/test_summarize_annotation_statistics_template.py new file mode 100644 index 00000000..7454a2ff --- /dev/null +++ b/tests/templates/test_summarize_annotation_statistics_template.py @@ -0,0 +1,97 @@ +# tests/templates/test_summarize_annotation_statistics_template.py +""" +Real (non-mocked) unit test for the Summarize Annotation Statistics template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.summarize_annotation_statistics_template import ( + run_from_json, +) + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 6 cells with cell_type annotation for statistics.""" + rng = np.random.default_rng(42) + X = rng.random((6, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "A", "B", "B", "B", "C"], + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestSummarizeAnnotationStatisticsTemplate(unittest.TestCase): + """Real (non-mocked) tests for summarize annotation statistics.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_summarize_annotation_stats_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: summarize annotation stats and verify outputs. + + Validates: + 1. saved_files dict has 'dataframe' key + 2. CSV exists and is non-empty + 3. 
Summary CSV contains at least one row + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("dataframe", saved_files) + + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + result_df = pd.read_csv(csv_path) + self.assertGreater(len(result_df), 0) + + mem_df = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_df, pd.DataFrame) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_summarize_dataframe_template.py b/tests/templates/test_summarize_dataframe_template.py new file mode 100644 index 00000000..516a3053 --- /dev/null +++ b/tests/templates/test_summarize_dataframe_template.py @@ -0,0 +1,85 @@ +# tests/templates/test_summarize_dataframe_template.py +""" +Real (non-mocked) unit test for the Summarize DataFrame template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile.
+""" + +import json +import os +import sys +import tempfile +import unittest +from pathlib import Path + +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.summarize_dataframe_template import run_from_json + + +def _make_tiny_dataframe() -> pd.DataFrame: + """Minimal synthetic DataFrame for summarization.""" + return pd.DataFrame({ + "cell_type": ["A", "B", "A", "B", "C", "C"], + "marker_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], + "marker_2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0], + }) + + +class TestSummarizeDataFrameTemplate(unittest.TestCase): + """Real (non-mocked) tests for the summarize dataframe template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.csv") + + _make_tiny_dataframe().to_csv(self.in_file, index=False) + + params = { + "Upstream_Dataset": self.in_file, + "Columns": ["cell_type", "marker_1", "marker_2"], + "Output_Directory": self.tmp_dir.name, + "outputs": { + "html": {"type": "directory", "name": "html_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_summarize_dataframe_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: summarize dataframe and verify outputs. + + Validates: + 1. saved_files dict has 'html' key + 2. 
HTML directory contains non-empty file(s) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("html", saved_files) + + html_paths = saved_files["html"] + self.assertGreaterEqual(len(html_paths), 1) + for html_path in html_paths: + html_file = Path(html_path) + self.assertTrue(html_file.exists()) + self.assertGreater(html_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_template_utils.py b/tests/templates/test_template_utils.py new file mode 100644 index 00000000..76cd2a58 --- /dev/null +++ b/tests/templates/test_template_utils.py @@ -0,0 +1,989 @@ +# tests/templates/test_template_utils.py +""" +Real (non-mocked) unit tests for template utility functions. + +Validates utility I/O behaviour only: + • Functions produce correct outputs from real inputs + • File I/O operations work on real filesystem + • Error messages are accurate + +No mocking. Uses real data, real filesystem, and tempfile. 
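
For orientation, the `text_to_value` contract pinned down by the tests below can be sketched as follows. The contract covers a case-insensitive "None" sentinel, whitespace stripping, an empty string mapped to `value_to_convert_to`, and exact float/integer error messages. `text_to_value_sketch` is a hypothetical reimplementation written only from the assertions in this file. It is not the code in `spac.templates.template_utils`, and its parameter defaults are assumptions.

```python
from typing import Any


def text_to_value_sketch(
    var: Any,
    default_none_text: str = "None",  # assumed default
    value_to_convert_to: Any = None,  # assumed default
    to_float: bool = False,
    to_int: bool = False,
    param_name: str = "",
) -> Any:
    # Hypothetical sketch of the behaviour the tests assert, not spac's code.
    # Normalise to a stripped string so " None " and 123 are handled alike.
    text = str(var).strip()

    # Empty text or the none-sentinel (case-insensitive) maps to the
    # caller-supplied replacement value.
    if text == "" or text.lower() == default_none_text.lower():
        return value_to_convert_to

    if to_float:
        try:
            return float(text)
        except ValueError:
            raise ValueError(
                f"Error: can't convert {param_name} to float. "
                f'Received:"{text}"'
            ) from None
    if to_int:
        try:
            return int(text)
        except ValueError:
            raise ValueError(
                f"Error: can't convert {param_name} to integer. "
                f'Received:"{text}"'
            ) from None

    # No conversion requested: hand the original value back unchanged.
    return var
```

Note that `int("3.14")` raises, which is why the integer test expects an error rather than truncation.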
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +import warnings +import anndata as ad +import numpy as np +import pandas as pd +from pathlib import Path +import matplotlib.pyplot as plt + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.template_utils import ( + load_input, + save_results, + _save_single_object, + text_to_value, + convert_pickle_to_h5ad, + convert_to_floats, + spell_out_special_characters, + load_csv_files, + parse_params, + string_list_to_dictionary, + clean_column_name, +) + + +def create_test_adata(n_cells: int = 10) -> ad.AnnData: + """Return a minimal synthetic AnnData for fast tests.""" + rng = np.random.default_rng(0) + obs = pd.DataFrame({ + "cell_type": ["TypeA", "TypeB"] * (n_cells // 2) + }) + x_mat = rng.normal(size=(n_cells, 2)) + adata = ad.AnnData(X=x_mat, obs=obs) + return adata + + +def create_test_dataframe(n_rows: int = 5) -> pd.DataFrame: + """Return a minimal DataFrame for fast tests.""" + return pd.DataFrame({ + "col1": range(n_rows), + "col2": [f"value_{i}" for i in range(n_rows)] + }) + + +class TestTemplateUtils(unittest.TestCase): + """Unit tests for template utility functions.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.test_adata = create_test_adata() + self.test_df = create_test_dataframe() + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_complete_io_workflow(self) -> None: + """Single I/O test covering all input/output scenarios.""" + # Suppress warnings for cleaner test output + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + + # Test 1: Load h5ad file + h5ad_path = os.path.join(self.tmp_dir.name, "test.h5ad") + self.test_adata.write_h5ad(h5ad_path) + loaded_h5ad = load_input(h5ad_path) + self.assertEqual(loaded_h5ad.n_obs, 10) + self.assertIn("cell_type", loaded_h5ad.obs.columns) + + # Test 2: Load pickle file + pickle_path = 
os.path.join(self.tmp_dir.name, "test.pickle") + with open(pickle_path, "wb") as f: + pickle.dump(self.test_adata, f) + loaded_pickle = load_input(pickle_path) + self.assertEqual(loaded_pickle.n_obs, 10) + + # Test 3: Load .pkl extension + pkl_path = os.path.join(self.tmp_dir.name, "test.pkl") + with open(pkl_path, "wb") as f: + pickle.dump(self.test_adata, f) + loaded_pkl = load_input(pkl_path) + self.assertEqual(loaded_pkl.n_obs, 10) + + # Test 4: Load .p extension + p_path = os.path.join(self.tmp_dir.name, "test.p") + with open(p_path, "wb") as f: + pickle.dump(self.test_adata, f) + loaded_p = load_input(p_path) + self.assertEqual(loaded_p.n_obs, 10) + + # Test 5: Convert pickle to h5ad + pickle_src = os.path.join( + self.tmp_dir.name, "convert_src.pickle" + ) + with open(pickle_src, "wb") as f: + pickle.dump(self.test_adata, f) + + h5ad_dest = convert_pickle_to_h5ad(pickle_src) + self.assertTrue(os.path.exists(h5ad_dest)) + self.assertTrue(h5ad_dest.endswith(".h5ad")) + + # Test 6: Convert pickle to h5ad with a custom output path + custom_dest = os.path.join( + self.tmp_dir.name, "custom_output.h5ad" + ) + h5ad_custom = convert_pickle_to_h5ad(pickle_src, custom_dest) + self.assertEqual(h5ad_custom, custom_dest) + self.assertTrue(os.path.exists(custom_dest)) + + # Test 7: Load file with no extension (content detection) + no_ext_path = os.path.join(self.tmp_dir.name, "noextension") + with open(no_ext_path, "wb") as f: + pickle.dump(self.test_adata, f) + loaded_no_ext = load_input(no_ext_path) + self.assertEqual(loaded_no_ext.n_obs, 10) + + def test_text_to_value_conversions(self) -> None: + """Test all text_to_value conversion scenarios.""" + # Test 1: Convert to float + result = text_to_value("3.14", to_float=True) + self.assertEqual(result, 3.14) + self.assertIsInstance(result, float) + + # Test 2: Convert to int + result = text_to_value("42", to_int=True) + self.assertEqual(result, 42) + self.assertIsInstance(result, int) + + # Test 3: None text handling + result = text_to_value("None", 
value_to_convert_to=None) + self.assertIsNone(result) + + # Test 4: Empty string handling + result = text_to_value("", value_to_convert_to=-1) + self.assertEqual(result, -1) + + # Test 5: Case insensitive None + result = text_to_value("none", value_to_convert_to=0) + self.assertEqual(result, 0) + + # Test 6: Custom none text + result = text_to_value( + "NA", default_none_text="NA", value_to_convert_to=999 + ) + self.assertEqual(result, 999) + + # Test 7: No conversion + result = text_to_value("keep_as_string") + self.assertEqual(result, "keep_as_string") + self.assertIsInstance(result, str) + + # Test 8: Whitespace handling + result = text_to_value(" None ", value_to_convert_to=None) + self.assertIsNone(result) + + # Test 9: Non-string input + result = text_to_value(123, to_float=True) + self.assertEqual(result, 123.0) + self.assertIsInstance(result, float) + + def test_convert_to_floats(self) -> None: + """Test convert_to_floats function.""" + # Test 1: String list + result = convert_to_floats(["1.5", "2.0", "3.14"]) + self.assertEqual(result, [1.5, 2.0, 3.14]) + self.assertTrue(all(isinstance(x, float) for x in result)) + + # Test 2: Mixed numeric types + result = convert_to_floats([1, "2.5", 3.0]) + self.assertEqual(result, [1.0, 2.5, 3.0]) + + # Test 3: Invalid value + with self.assertRaises(ValueError) as context: + convert_to_floats(["1.0", "invalid", "3.0"]) + expected_msg = "Failed to convert value: 'invalid' to float" + self.assertIn(expected_msg, str(context.exception)) + + # Test 4: Empty list + result = convert_to_floats([]) + self.assertEqual(result, []) + + def test_load_input_missing_file_error_message(self) -> None: + """Test exact error message for missing input file.""" + missing_path = "/nonexistent/path/file.h5ad" + + with self.assertRaises(FileNotFoundError) as context: + load_input(missing_path) + + expected_msg = f"Input file not found: {missing_path}" + actual_msg = str(context.exception) + self.assertEqual(expected_msg, actual_msg) + + def 
test_load_input_unsupported_format_error_message(self) -> None: + """Test exact error message for unsupported file format.""" + # Create a text file with unsupported content + txt_path = os.path.join(self.tmp_dir.name, "test.txt") + with open(txt_path, "w") as f: + f.write("This is not a valid data file") + + with self.assertRaises(ValueError) as context: + load_input(txt_path) + + actual_msg = str(context.exception) + self.assertTrue(actual_msg.startswith("Unable to load file")) + self.assertIn("Supported formats: h5ad, pickle", actual_msg) + + def test_text_to_value_float_conversion_error_message(self) -> None: + """Test exact error message for invalid float conversion.""" + with self.assertRaises(ValueError) as context: + text_to_value( + "not_a_number", to_float=True, param_name="test_param" + ) + + expected_msg = ( + 'Error: can\'t convert test_param to float. ' + 'Received:"not_a_number"' + ) + actual_msg = str(context.exception) + self.assertEqual(expected_msg, actual_msg) + + def test_text_to_value_int_conversion_error_message(self) -> None: + """Test exact error message for invalid integer conversion.""" + with self.assertRaises(ValueError) as context: + text_to_value("3.14", to_int=True, param_name="count") + + expected_msg = ( + 'Error: can\'t convert count to integer. 
' + 'Received:"3.14"' + ) + actual_msg = str(context.exception) + self.assertEqual(expected_msg, actual_msg) + + def test_convert_pickle_to_h5ad_missing_file_error_message(self) -> None: + """Test exact error message for missing pickle file.""" + missing_pickle = "/nonexistent/file.pickle" + + with self.assertRaises(FileNotFoundError) as context: + convert_pickle_to_h5ad(missing_pickle) + + expected_msg = f"Pickle file not found: {missing_pickle}" + actual_msg = str(context.exception) + self.assertEqual(expected_msg, actual_msg) + + def test_convert_pickle_to_h5ad_wrong_type_error_message(self) -> None: + """Test exact error message when pickle doesn't contain AnnData.""" + # Create pickle with wrong type + wrong_pickle = os.path.join(self.tmp_dir.name, "wrong_type.pickle") + with open(wrong_pickle, "wb") as f: + pickle.dump({"not": "anndata"}, f) + + with self.assertRaises(TypeError) as context: + convert_pickle_to_h5ad(wrong_pickle) + + expected_msg = "Loaded object is not AnnData, got <class 'dict'>" + actual_msg = str(context.exception) + self.assertEqual(expected_msg, actual_msg) + + def test_spell_out_special_characters(self) -> None: + """Test spell_out_special_characters function.""" + from spac.templates.template_utils import spell_out_special_characters + + # Test space replacement + result = spell_out_special_characters("Cell Type") + self.assertEqual(result, "Cell_Type") + + # Test special units + result = spell_out_special_characters("Area µm²") + self.assertEqual(result, "Area_um2") + + # Test hyphen between letters + result = spell_out_special_characters("CD4-positive") + self.assertEqual(result, "CD4_positive") + + # Test plus/minus + result = spell_out_special_characters("CD4+") + self.assertEqual(result, "CD4_pos") # Trailing underscore is stripped + result = spell_out_special_characters("CD8-") + self.assertEqual(result, "CD8_neg") # Trailing underscore is stripped + + # Test combination markers + result = 
spell_out_special_characters("CD4+CD20-") + 
self.assertEqual(result, "CD4_pos_CD20_neg") + + # Test edge cases with special separators + result = spell_out_special_characters("CD4+/CD20-") + self.assertEqual(result, "CD4_pos_slashCD20_neg") + + result = spell_out_special_characters("CD4+ CD20-") + self.assertEqual(result, "CD4_pos_CD20_neg") + + result = spell_out_special_characters("CD4+,CD20-") + self.assertEqual(result, "CD4_pos_CD20_neg") + + # Test parentheses removal + result = spell_out_special_characters("CD4+ (bright)") + self.assertEqual(result, "CD4_pos_bright") + + # Test special characters + result = spell_out_special_characters("Cell@100%") + self.assertEqual(result, "Cellat100percent") + + # Test multiple underscores + result = spell_out_special_characters("Cell___Type") + self.assertEqual(result, "Cell_Type") + + # Test leading/trailing underscores + result = spell_out_special_characters("_Cell_Type_") + self.assertEqual(result, "Cell_Type") + + # Test complex case + result = spell_out_special_characters("CD4+ T-cells (µm²)") + self.assertEqual(result, "CD4_pos_T_cells_um2") + + # Test empty string + result = spell_out_special_characters("") + self.assertEqual(result, "") + + # Additional edge cases + result = spell_out_special_characters("CD3+CD4+CD8-") + self.assertEqual(result, "CD3_pos_CD4_pos_CD8_neg") + + result = spell_out_special_characters("PD-1/PD-L1") + self.assertEqual(result, "PD_1slashPD_L1") + + result = spell_out_special_characters("CD45RA+CD45RO-") + self.assertEqual(result, "CD45RA_pos_CD45RO_neg") + + result = spell_out_special_characters("CD4+CD25+FOXP3+") + self.assertEqual(result, "CD4_pos_CD25_pos_FOXP3_pos") + + # Test with multiple special characters + result = spell_out_special_characters("CD4+ & CD8+ (double positive)") + self.assertEqual(result, "CD4_pos_and_CD8_pos_double_positive") + + # Test with numbers at start (should add col_ prefix in + # clean_column_name) + result = spell_out_special_characters("123ABC") + # Note: col_ prefix is added by clean_column_name 
+ self.assertEqual(result, "123ABC") + + def test_load_csv_files(self) -> None: + """Test load_csv_files function.""" + + # Create test CSV files + csv_dir = Path(self.tmp_dir.name) / "csv_data" + csv_dir.mkdir() + + # CSV 1: Normal data + csv1 = pd.DataFrame({ + 'ID': ['001', '002', '003'], + 'Value': [1.5, 2.5, 3.5], + 'Type': ['A', 'B', 'A'] + }) + csv1.to_csv(csv_dir / 'data1.csv', index=False) + + # CSV 2: Special characters in columns + csv2 = pd.DataFrame({ + 'ID': ['004', '005'], + 'Value': [4.5, 5.5], + 'Type': ['B', 'C'], + 'Area µm²': [100, 200] + }) + csv2.to_csv(csv_dir / 'data2.csv', index=False) + + # Test 1: Basic loading with metadata + config = pd.DataFrame({ + 'file_name': ['data1.csv', 'data2.csv'], + 'experiment': ['Exp1', 'Exp2'], + 'batch': [1, 2] + }) + + result = load_csv_files(csv_dir, config) + + # Verify basic structure + self.assertEqual(len(result), 5) # 3 + 2 rows + self.assertIn('file_name', result.columns) + self.assertIn('experiment', result.columns) + self.assertIn('batch', result.columns) + self.assertIn('ID', result.columns) + self.assertIn('Area_um2', result.columns) # Cleaned name + + # Verify metadata mapping + exp1_rows = result[result['file_name'] == 'data1.csv'] + self.assertTrue(all(exp1_rows['experiment'] == 'Exp1')) + self.assertTrue(all(exp1_rows['batch'] == 1)) + + # Test 2: String columns preservation + result_str = load_csv_files( + csv_dir, config, string_columns=['ID'] + ) + self.assertEqual(result_str['ID'].dtype, 'object') + self.assertTrue(all(isinstance(x, str) for x in result_str['ID'])) + + # Test 3: Empty string_columns list + result_empty = load_csv_files(csv_dir, config, string_columns=[]) + self.assertIsInstance(result_empty, pd.DataFrame) + + # Test 4: string_columns passed as a str instead of a list raises ValueError + config_spaces = pd.DataFrame({ + 'file_name': ['data1.csv'], + 'Sample Type': ['Control'] # Space in column name + }) + with self.assertRaises(ValueError): + # 
Should fail validation due to string_columns not being list 
+ load_csv_files(csv_dir, config_spaces, string_columns="ID") + + # Test 5: Missing file in config + config_missing = pd.DataFrame({ + 'file_name': ['missing.csv'], + 'experiment': ['Exp3'] + }) + with self.assertRaises(FileNotFoundError) as context: + load_csv_files(csv_dir, config_missing) + self.assertIn("not found", str(context.exception)) + + # Test 6: Empty CSV file + empty_csv = csv_dir / 'empty.csv' + empty_csv.write_text('') + config_empty = pd.DataFrame({ + 'file_name': ['empty.csv'], + 'experiment': ['Exp4'] + }) + with self.assertRaises(ValueError) as context: + load_csv_files(csv_dir, config_empty) + self.assertIn("empty", str(context.exception)) + + # Test 7: Non-existent string_columns are silently ignored + config_single = pd.DataFrame({ + 'file_name': ['data1.csv'] + }) + result_nonexist = load_csv_files( + csv_dir, config_single, + string_columns=['NonExistentColumn'] + ) + self.assertIsInstance(result_nonexist, pd.DataFrame) + + def test_load_csv_files_special_character_column_cleaning(self) -> None: + """Test that load_csv_files cleans special character column names.""" + # Setup test data with special character columns + csv_dir = Path(self.tmp_dir.name) / "csv_test" + csv_dir.mkdir() + + csv_data = pd.DataFrame({ + 'ID': [1, 2], + 'CD4+': ['pos', 'neg'], # Special character + 'Area µm²': [100.0, 200.0], + }) + csv_data.to_csv(csv_dir / 'test.csv', index=False) + + config = pd.DataFrame({ + 'file_name': ['test.csv'], + 'group': ['A'] + }) + + result = load_csv_files(csv_dir, config) + + # Assert: special character columns cleaned + self.assertIn('CD4_pos', result.columns) + self.assertIn('Area_um2', result.columns) + self.assertNotIn('CD4+', result.columns) + self.assertNotIn('Area µm²', result.columns) + + # Assert: data integrity preserved + self.assertEqual(len(result), 2) + self.assertEqual(result['group'].unique().tolist(), ['A']) + + def test_save_results_single_csv_file(self) -> None: + """Test saving DataFrame as single CSV file using 
save_results.""" + # Setup + df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}) + + params = { + "outputs": { + "dataframe": {"type": "file", "name": "data.csv"} + } + } + + results = { + "dataframe": df + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify + csv_path = Path(self.tmp_dir.name) / "data.csv" + self.assertTrue(csv_path.exists()) + self.assertTrue(csv_path.is_file()) + + # Check content + loaded_df = pd.read_csv(csv_path) + pd.testing.assert_frame_equal(loaded_df, df) + + def test_save_results_multiple_csvs_directory(self) -> None: + """Test saving multiple DataFrames in directory using save_results.""" + # Setup + df1 = pd.DataFrame({'X': [1, 2]}) + df2 = pd.DataFrame({'Y': [3, 4]}) + + params = { + "outputs": { + "dataframe": {"type": "directory", "name": "dataframe_dir"} + } + } + + results = { + "dataframe": { + "first": df1, + "second": df2 + } + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify + dir_path = Path(self.tmp_dir.name) / "dataframe_dir" + self.assertTrue(dir_path.exists()) + self.assertTrue(dir_path.is_dir()) + self.assertTrue((dir_path / "first.csv").exists()) + self.assertTrue((dir_path / "second.csv").exists()) + + def test_save_results_figures_directory(self) -> None: + """Test saving multiple figures in directory using save_results.""" + # Suppress matplotlib warnings + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + + # Setup + fig1, ax1 = plt.subplots() + ax1.plot([1, 2, 3]) + + fig2, ax2 = plt.subplots() + ax2.bar(['A', 'B'], [5, 10]) + + params = { + "outputs": { + "figures": {"type": "directory", "name": "plots"} + } + } + + results = { + "figures": { + "line_plot": fig1, + "bar_plot": fig2 + } + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify + plots_dir = Path(self.tmp_dir.name) / "plots" + self.assertTrue(plots_dir.exists()) + self.assertTrue(plots_dir.is_dir()) + 
self.assertTrue((plots_dir / "line_plot.png").exists()) + self.assertTrue((plots_dir / "bar_plot.png").exists()) + + # Clean up + plt.close('all') + + def test_save_results_analysis_pickle_file(self) -> None: + """Test saving analysis object as pickle file using save_results.""" + # Setup + analysis = { + "method": "test_analysis", + "results": [1, 2, 3, 4, 5], + "params": {"alpha": 0.05} + } + + params = { + "outputs": { + "analysis": {"type": "file", "name": "results.pickle"} + } + } + + results = { + "analysis": analysis + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify + pickle_path = Path(self.tmp_dir.name) / "results.pickle" + self.assertTrue(pickle_path.exists()) + self.assertTrue(pickle_path.is_file()) + + # Check content + with open(pickle_path, 'rb') as f: + loaded = pickle.load(f) + self.assertEqual(loaded, analysis) + + def test_save_results_html_directory(self) -> None: + """Test saving HTML reports in directory using save_results.""" + # Setup + html1 = "
<html><body><h1>Report 1</h1></body></html>" + html2 = "<html><body><h1>Report 2</h1></body></html>
" + + params = { + "outputs": { + "html": {"type": "directory", "name": "reports"} + } + } + + results = { + "html": { + "main": html1, + "summary": html2 + } + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify + reports_dir = Path(self.tmp_dir.name) / "reports" + self.assertTrue(reports_dir.exists()) + self.assertTrue(reports_dir.is_dir()) + self.assertTrue((reports_dir / "main.html").exists()) + self.assertTrue((reports_dir / "summary.html").exists()) + + # Check content + with open(reports_dir / "main.html") as f: + content = f.read() + self.assertIn("Report 1", content) + + def test_save_results_complete_configuration(self) -> None: + """Test complete configuration with all output types using save_results.""" + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + + # Setup + fig, ax = plt.subplots() + ax.plot([1, 2, 3]) + + df = pd.DataFrame({'A': [1, 2]}) + analysis = {"result": "complete"} + html = "Report" + + params = { + "outputs": { + "figures": {"type": "directory", "name": "figure_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + "analysis": {"type": "file", "name": "output.pickle"}, + "html": {"type": "directory", "name": "html_dir"} + } + } + + results = { + "figures": {"plot": fig}, + "dataframe": df, + "analysis": analysis, + "html": {"report": html} + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify all outputs created + self.assertTrue((Path(self.tmp_dir.name) / "figure_dir").is_dir()) + self.assertTrue((Path(self.tmp_dir.name) / "dataframe.csv").is_file()) + self.assertTrue((Path(self.tmp_dir.name) / "output.pickle").is_file()) + self.assertTrue((Path(self.tmp_dir.name) / "html_dir").is_dir()) + + # Clean up + plt.close('all') + + def test_save_results_case_insensitive_matching(self) -> None: + """Test case-insensitive matching of result keys to config.""" + # Setup + df = pd.DataFrame({'A': [1, 2]}) + + params = { + "outputs": { + 
"DataFrame": {"type": "file", "name": "data.csv"} # Capital D + } + } + + results = { + "dataframe": df # lowercase d + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Should still match and save + self.assertTrue((Path(self.tmp_dir.name) / "data.csv").exists()) + + def test_save_results_missing_config(self) -> None: + """Test that missing config for result type generates warning.""" + # Setup + df = pd.DataFrame({'A': [1, 2]}) + + params = { + "outputs": { + # No config for "dataframe" + "figures": {"type": "directory", "name": "plots"} + } + } + + results = { + "dataframe": df, # No matching config + "figures": {} + } + + # Execute (should not raise, just warn) + saved = save_results(results, params, self.tmp_dir.name) + + # Only figures should be in saved files + self.assertIn("figures", saved) + self.assertNotIn("dataframe", saved) + + def test_save_single_object_dataframe(self) -> None: + """Test _save_single_object helper with DataFrame.""" + df = pd.DataFrame({'A': [1, 2]}) + + path = _save_single_object(df, "test", Path(self.tmp_dir.name)) + + self.assertEqual(path.name, "test.csv") + self.assertTrue(path.exists()) + + def test_save_single_object_figure(self) -> None: + """Test _save_single_object helper with matplotlib figure.""" + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + + fig, ax = plt.subplots() + ax.plot([1, 2, 3]) + + path = _save_single_object(fig, "plot", Path(self.tmp_dir.name)) + + self.assertEqual(path.name, "plot.png") + self.assertTrue(path.exists()) + + plt.close('all') + + def test_save_single_object_html(self) -> None: + """Test _save_single_object helper with HTML string.""" + html = "Test" + + path = _save_single_object(html, "report.html", Path(self.tmp_dir.name)) + + self.assertEqual(path.name, "report.html") + self.assertTrue(path.exists()) + + def test_save_single_object_generic(self) -> None: + """Test _save_single_object helper with 
generic object.""" + data = {"test": "data", "value": 123} + + path = _save_single_object(data, "data", Path(self.tmp_dir.name)) + + self.assertEqual(path.name, "data.pickle") + self.assertTrue(path.exists()) + + def test_save_results_dataframes_both_configurations(self) -> None: + """Test DataFrames can be saved as both file and directory.""" + # Test 1: Single DataFrame as file + df_single = pd.DataFrame({'A': [1, 2, 3]}) + + params_file = { + "outputs": { + "dataframe": {"type": "file", "name": "single.csv"} + } + } + + results_single = {"dataframe": df_single} + + saved = save_results(results_single, params_file, self.tmp_dir.name) + self.assertTrue((Path(self.tmp_dir.name) / "single.csv").exists()) + + # Test 2: Multiple DataFrames as directory + df1 = pd.DataFrame({'X': [1, 2]}) + df2 = pd.DataFrame({'Y': [3, 4]}) + + params_dir = { + "outputs": { + "dataframe": {"type": "directory", "name": "multi_df"} + } + } + + results_multi = { + "dataframe": { + "data1": df1, + "data2": df2 + } + } + + saved = save_results(results_multi, params_dir, + os.path.join(self.tmp_dir.name, "test2")) + + dir_path = Path(self.tmp_dir.name) / "test2" / "multi_df" + self.assertTrue(dir_path.exists()) + self.assertTrue(dir_path.is_dir()) + self.assertTrue((dir_path / "data1.csv").exists()) + self.assertTrue((dir_path / "data2.csv").exists()) + + def test_save_results_auto_type_detection(self) -> None: + """Test automatic type detection based on standardized schema.""" + # Setup - params with no explicit type + params = { + "outputs": { + "figures": {"name": "plot.png"}, # No type specified + "analysis": {"name": "results.pickle"}, # No type specified + "dataframe": {"name": "data.csv"}, # No type specified + "html": {"name": "report_dir"} # No type specified + } + } + + # Create test data + fig, ax = plt.subplots() + ax.plot([1, 2, 3]) + + results = { + "figures": {"plot1": fig, "plot2": fig}, # Should auto-detect as directory + "analysis": {"data": [1, 2, 3]}, # Should auto-detect 
as file + "dataframe": pd.DataFrame({'A': [1, 2]}), # Should auto-detect as file + "html": {"report": ""} # Should auto-detect as directory + } + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify auto-detection worked correctly + # figure should be directory (standardized for figures) + self.assertTrue((Path(self.tmp_dir.name) / "plot.png").is_dir()) + + # analysis should be file + self.assertTrue((Path(self.tmp_dir.name) / "results.pickle").is_file()) + + # dataframes should be file (standard case) + self.assertTrue((Path(self.tmp_dir.name) / "data.csv").is_file()) + + # html should be directory (standardized for html) + self.assertTrue((Path(self.tmp_dir.name) / "report_dir").is_dir()) + + plt.close('all') + + def test_save_results_neighborhood_profile_special_case(self) -> None: + """Test special case for Neighborhood Profile as directory.""" + # Setup - Neighborhood Profile should be directory even though it's a dataframe + params = { + "outputs": { + "dataframes": {"name": "Neighborhood_Profile_Results"} # No type, should auto-detect + } + } + + df1 = pd.DataFrame({'X': [1, 2]}) + df2 = pd.DataFrame({'Y': [3, 4]}) + + results = { + "dataframes": { + "profile1": df1, + "profile2": df2 + } + } + + # Execute + saved = save_results(results, params, self.tmp_dir.name) + + # Verify it was saved as directory (special case) + dir_path = Path(self.tmp_dir.name) / "Neighborhood_Profile_Results" + self.assertTrue(dir_path.exists()) + self.assertTrue(dir_path.is_dir()) + self.assertTrue((dir_path / "profile1.csv").exists()) + self.assertTrue((dir_path / "profile2.csv").exists()) + + def test_save_results_with_output_directory_param(self) -> None: + """Test using Output_Directory from params.""" + custom_dir = os.path.join(self.tmp_dir.name, "custom_output") + + # Setup - params includes Output_Directory + params = { + "Output_Directory": custom_dir, + "outputs": { + 
"dataframes": {"type": "file", "name": "data.csv"} + } + } + + results = { + "dataframes": pd.DataFrame({'A': [1, 2]}) + } + + # Execute without specifying output_base_dir (should use params) + saved = save_results(results, params) + + # Verify it used the Output_Directory from params + csv_path = Path(custom_dir) / "data.csv" + self.assertTrue(csv_path.exists()) + + def test_parse_params_from_json_file(self) -> None: + """Test parse_params loads parameters from a JSON file.""" + params = {"key1": "value1", "key2": 42, "nested": {"a": True}} + json_path = os.path.join(self.tmp_dir.name, "params.json") + with open(json_path, "w") as f: + json.dump(params, f) + + result = parse_params(json_path) + + self.assertEqual(result, params) + self.assertEqual(result["key1"], "value1") + self.assertEqual(result["key2"], 42) + self.assertTrue(result["nested"]["a"]) + + def test_parse_params_from_dict(self) -> None: + """Test parse_params passes through a dict unchanged.""" + params = {"key": "value"} + result = parse_params(params) + self.assertIs(result, params) + + def test_parse_params_from_json_string(self) -> None: + """Test parse_params parses a raw JSON string.""" + json_str = '{"key": "value", "num": 7}' + result = parse_params(json_str) + self.assertEqual(result, {"key": "value", "num": 7}) + + def test_parse_params_invalid_type_raises(self) -> None: + """Test parse_params raises TypeError for unsupported input.""" + with self.assertRaises(TypeError): + parse_params(12345) + + def test_string_list_to_dictionary_valid(self) -> None: + """Test string_list_to_dictionary with valid key:value pairs.""" + result = string_list_to_dictionary( + ["red:#FF0000", "blue:#0000FF"] + ) + self.assertEqual(result, {"red": "#FF0000", "blue": "#0000FF"}) + + def test_string_list_to_dictionary_custom_names(self) -> None: + """Test string_list_to_dictionary with custom key/value names.""" + result = string_list_to_dictionary( + ["TypeA:Cancer", "TypeB:Normal"], + key_name="cell_type", + 
value_name="diagnosis", + ) + self.assertEqual( + result, {"TypeA": "Cancer", "TypeB": "Normal"} + ) + + def test_string_list_to_dictionary_invalid_entry(self) -> None: + """Test string_list_to_dictionary raises on missing colon.""" + with self.assertRaises(ValueError) as ctx: + string_list_to_dictionary(["valid:pair", "no_colon"]) + self.assertIn("Missing ':'", str(ctx.exception)) + + def test_string_list_to_dictionary_not_list_raises(self) -> None: + """Test string_list_to_dictionary raises TypeError for non-list.""" + with self.assertRaises(TypeError): + string_list_to_dictionary("not_a_list") + + def test_clean_column_name_basic(self) -> None: + """Test clean_column_name on normal and special-char columns.""" + # Normal name — unchanged + self.assertEqual(clean_column_name("cell_type"), "cell_type") + + # Special characters cleaned + self.assertEqual(clean_column_name("CD4+"), "CD4_pos") + self.assertEqual(clean_column_name("Area µm²"), "Area_um2") + + def test_clean_column_name_digit_prefix(self) -> None: + """Test clean_column_name adds col_ prefix for digit-leading names.""" + result = clean_column_name("123ABC") + self.assertEqual(result, "col_123ABC") + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_tsne_analysis_template.py b/tests/templates/test_tsne_analysis_template.py new file mode 100644 index 00000000..72039f1e --- /dev/null +++ b/tests/templates/test_tsne_analysis_template.py @@ -0,0 +1,95 @@ +# tests/templates/test_tsne_analysis_template.py +""" +Real (non-mocked) unit test for the tSNE Analysis template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
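
Each template test in this suite repeats the same `setUp` fixture steps. It pickles the upstream object into a temporary directory, then writes a params JSON that points `Upstream_Analysis` and `Output_Directory` at that directory. Below is a minimal sketch of a helper that factors out those steps. `write_template_inputs` is hypothetical and not part of spac. A plain dict stands in for the AnnData payload, so the sketch needs only the standard library.

```python
import json
import os
import pickle
import tempfile
from typing import Any, Dict


def write_template_inputs(
    tmp_dir: str, payload: Any, params: Dict[str, Any]
) -> str:
    # Hypothetical helper mirroring the setUp steps of the template tests.
    in_file = os.path.join(tmp_dir, "input.pickle")
    with open(in_file, "wb") as f:
        pickle.dump(payload, f)

    # Copy the caller's params, then wire in the upstream file and the
    # output directory so the template reads and writes inside tmp_dir.
    full_params = dict(params)
    full_params["Upstream_Analysis"] = in_file
    full_params["Output_Directory"] = tmp_dir

    json_file = os.path.join(tmp_dir, "params.json")
    with open(json_file, "w") as f:
        json.dump(full_params, f)
    return json_file


if __name__ == "__main__":
    # Usage: build the same params shape the tSNE test uses.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_template_inputs(
            tmp,
            {"cell_type": ["A", "B"]},
            {
                "Table_to_Process": "Original",
                "outputs": {
                    "analysis": {"type": "file", "name": "output.pickle"},
                },
            },
        )
        with open(path) as f:
            print(sorted(json.load(f)))
```

Passing the JSON path, rather than the dict, keeps the helper aligned with how the tests call `run_from_json(self.json_file, ...)`.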
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.tsne_analysis_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 50 cells, 5 genes for tSNE (needs enough cells).""" + rng = np.random.default_rng(42) + X = rng.random((50, 5)) + obs = pd.DataFrame({"cell_type": ["A", "B"] * 25}) + var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestTsneAnalysisTemplate(unittest.TestCase): + """Real (non-mocked) tests for the tSNE analysis template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Process": "Original", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_tsne_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run tSNE and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with 'X_tsne' in .obsm + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("X_tsne", result_adata.obsm) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("X_tsne", mem_adata.obsm) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_umap_transformation_template.py b/tests/templates/test_umap_transformation_template.py new file mode 100644 index 00000000..96cd2b8e --- /dev/null +++ b/tests/templates/test_umap_transformation_template.py @@ -0,0 +1,101 @@ +# tests/templates/test_umap_transformation_template.py +""" +Real (non-mocked) unit test for the UMAP Transformation template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.umap_transformation_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 20 cells, 5 genes for UMAP.""" + rng = np.random.default_rng(42) + X = rng.random((20, 5)) + obs = pd.DataFrame({"cell_type": ["A", "B"] * 10}) + var = pd.DataFrame(index=[f"Gene_{i}" for i in range(5)]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestUmapTransformationTemplate(unittest.TestCase): + """Real (non-mocked) tests for the UMAP transformation template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Process": "Original", + "Number_of_Neighbors": 5, + "Minimum_Distance_between_Points": 0.1, + "Target_Dimension_Number": 2, + "Computational_Metric": "euclidean", + "Random_State": 0, + "Transform_Seed": 42, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_umap_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run UMAP and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with 'X_umap' in .obsm + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("X_umap", result_adata.obsm) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("X_umap", mem_adata.obsm) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_umap_tsne_pca_template.py b/tests/templates/test_umap_tsne_pca_template.py new file mode 100644 index 00000000..f04215f2 --- /dev/null +++ b/tests/templates/test_umap_tsne_pca_template.py @@ -0,0 +1,103 @@ +# tests/templates/test_umap_tsne_pca_template.py +""" +Real (non-mocked) unit test for the UMAP/tSNE/PCA Visualization template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.umap_tsne_pca_visualization_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData with pre-computed UMAP embedding for visualization.""" + rng = np.random.default_rng(42) + X = rng.random((8, 2)) + obs = pd.DataFrame({"cell_type": ["A", "B"] * 4}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["X_umap"] = rng.random((8, 2)) * 10 + return adata + + +class TestUmapTsnePcaTemplate(unittest.TestCase): + """Real (non-mocked) tests for the UMAP/tSNE/PCA visualization.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Dimensionality_Reduction_Method": "UMAP", + "Color_By": "Annotation", + "Annotation": "cell_type", + "Feature": "None", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + "Spot_Size": 50, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_umap_tsne_pca_visualization_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run dim reduction visualization and verify. + + Validates: + 1. saved_files dict has 'figures' key + 2. 
Figures directory contains non-empty PNG(s) + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("figures", saved_files) + + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_utag_clustering_template.py b/tests/templates/test_utag_clustering_template.py new file mode 100644 index 00000000..dc5f172d --- /dev/null +++ b/tests/templates/test_utag_clustering_template.py @@ -0,0 +1,109 @@ +# tests/templates/test_utag_clustering_template.py +""" +Real (non-mocked) unit test for the UTAG Clustering template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.utag_clustering_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 30 cells with spatial coords for UTAG clustering.""" + rng = np.random.default_rng(42) + X = rng.random((30, 3)) + obs = pd.DataFrame({"cell_type": ["A", "B", "C"] * 10}) + var = pd.DataFrame(index=["Gene_0", "Gene_1", "Gene_2"]) + spatial = rng.random((30, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + return adata + + +class TestUTAGClusteringTemplate(unittest.TestCase): + """Real (non-mocked) tests for the UTAG clustering template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Process": "Original", + "Features": ["All"], + "Slide_Annotation": "None", + "Distance_Threshold": 20.0, + "K_Nearest_Neighbors": 5, + "Resolution_Parameter": 1, + "PCA_Components": "None", + "Random_Seed": 42, + "N_Jobs": 1, + "Leiden_Iterations": 3, + "Parellel_Processes": False, + "Output_Annotation_Name": "UTAG", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_utag_clustering_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run UTAG clustering and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. 
Pickle contains AnnData with UTAG obs column + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + self.assertIn("UTAG", result_adata.obs.columns) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("UTAG", mem_adata.obs.columns) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_visualize_nearest_neighbor_template.py b/tests/templates/test_visualize_nearest_neighbor_template.py new file mode 100644 index 00000000..ba62948c --- /dev/null +++ b/tests/templates/test_visualize_nearest_neighbor_template.py @@ -0,0 +1,130 @@ +# tests/templates/test_visualize_nearest_neighbor_template.py +""" +Real (non-mocked) unit test for the Visualize Nearest Neighbor template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.visualize_nearest_neighbor_template import run_from_json +from spac.templates.nearest_neighbor_calculation_template import ( + run_from_json as run_nn, +) + + +def _make_adata_with_nn() -> ad.AnnData: + """Create AnnData with pre-computed nearest neighbor results.""" + rng = np.random.default_rng(42) + X = rng.random((12, 2)) + obs = pd.DataFrame({ + "cell_type": ["A", "B", "C"] * 4, + }) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((12, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + + # Run actual nearest neighbor to populate .obsm + import tempfile as tf + with tf.TemporaryDirectory() as td: + pkl_in = os.path.join(td, "in.pickle") + with open(pkl_in, "wb") as f: + pickle.dump(adata, f) + nn_params = { + "Upstream_Analysis": pkl_in, + "Annotation": "cell_type", + "ImageID": "None", + } + json_path = os.path.join(td, "p.json") + with open(json_path, "w") as f: + json.dump(nn_params, f) + adata = run_nn(json_path, save_to_disk=False) + return adata + + +class TestVisualizeNearestNeighborTemplate(unittest.TestCase): + """Real (non-mocked) tests for visualize nearest neighbor template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_adata_with_nn(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Annotation": "cell_type", + "Source_Anchor_Cell_Label": "A", + "Nearest_Neighbor_Associated_Table": "spatial_distance", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + 
"Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_visualize_nn_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: visualize nearest neighbors and verify. + + Validates: + 1. saved_files dict has 'figures' and/or 'dataframe' keys + 2. Output files exist and are non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertGreater(len(saved_files), 0) + + if "figures" in saved_files: + figure_paths = saved_files["figures"] + self.assertGreaterEqual(len(figure_paths), 1) + for fig_path in figure_paths: + fig_file = Path(fig_path) + self.assertTrue(fig_file.exists()) + self.assertGreater(fig_file.stat().st_size, 0) + + if "dataframe" in saved_files: + csv_path = Path(saved_files["dataframe"]) + self.assertTrue(csv_path.exists()) + self.assertGreater(csv_path.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_visualize_ripley_template.py b/tests/templates/test_visualize_ripley_template.py new file mode 100644 index 00000000..c7ada182 --- /dev/null +++ b/tests/templates/test_visualize_ripley_template.py @@ -0,0 +1,134 @@ +# tests/templates/test_visualize_ripley_template.py +""" +Real (non-mocked) unit test for the Visualize Ripley L template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.visualize_ripley_l_template import run_from_json +from spac.templates.ripley_l_calculation_template import ( + run_from_json as run_ripley, +) + + +def _make_adata_with_ripley() -> ad.AnnData: + """Create AnnData with pre-computed Ripley L results in .uns.""" + rng = np.random.default_rng(42) + X = rng.random((20, 2)) + obs = pd.DataFrame({"cell_type": (["A"] * 10) + (["B"] * 10)}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + spatial = rng.random((20, 2)) * 100 + adata = ad.AnnData(X=X, obs=obs, var=var) + adata.obsm["spatial"] = spatial + + # Run actual Ripley L to populate .uns + import tempfile as tf + with tf.TemporaryDirectory() as td: + pkl_in = os.path.join(td, "in.pickle") + with open(pkl_in, "wb") as f: + pickle.dump(adata, f) + ripley_params = { + "Upstream_Analysis": pkl_in, + "Radii": [5, 10, 20], + "Annotation": "cell_type", + "Center_Phenotype": "A", + "Neighbor_Phenotype": "B", + "Number_of_Simulations": 5, + "Seed": 42, + "Spatial_Key": "spatial", + "Edge_Correction": True, + } + json_path = os.path.join(td, "p.json") + with open(json_path, "w") as f: + json.dump(ripley_params, f) + adata = run_ripley(json_path, save_to_disk=False) + return adata + + +class TestVisualizeRipleyTemplate(unittest.TestCase): + """Real (non-mocked) tests for the visualize Ripley L template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_adata_with_ripley(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Radii": [5, 10, 20], + "Annotation": "cell_type", + 
"Center_Phenotype": "A", + "Neighbor_Phenotype": "B", + "Figure_Width": 6, + "Figure_Height": 4, + "Figure_DPI": 72, + "Font_Size": 10, + "Output_Directory": self.tmp_dir.name, + "outputs": { + "figures": {"type": "directory", "name": "figures_dir"}, + "dataframe": {"type": "file", "name": "dataframe.csv"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_visualize_ripley_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: visualize Ripley L and verify outputs. + + Validates: + 1. saved_files dict has output keys + 2. Output files exist and are non-empty + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + show_plot=False, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + # Check that at least some output was produced + self.assertGreater(len(saved_files), 0) + + for key, value in saved_files.items(): + if isinstance(value, list): + for p in value: + pf = Path(p) + self.assertTrue(pf.exists()) + self.assertGreater(pf.stat().st_size, 0) + elif isinstance(value, str): + pf = Path(value) + self.assertTrue(pf.exists()) + self.assertGreater(pf.stat().st_size, 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/templates/test_zscore_normalization_template.py b/tests/templates/test_zscore_normalization_template.py new file mode 100644 index 00000000..6c0049e7 --- /dev/null +++ b/tests/templates/test_zscore_normalization_template.py @@ -0,0 +1,98 @@ +# tests/templates/test_zscore_normalization_template.py +""" +Real (non-mocked) unit test for the Z-Score Normalization template. + +Validates template I/O behaviour only. +No mocking. Uses real data, real filesystem, and tempfile. 
+""" + +import json +import os +import pickle +import sys +import tempfile +import unittest +from pathlib import Path + +import anndata as ad +import numpy as np +import pandas as pd + +sys.path.append( + os.path.dirname(os.path.realpath(__file__)) + "/../../src" +) + +from spac.templates.z_score_normalization_template import run_from_json + + +def _make_tiny_adata() -> ad.AnnData: + """Minimal AnnData: 4 cells, 2 genes for z-score normalization.""" + rng = np.random.default_rng(42) + X = rng.integers(1, 100, size=(4, 2)).astype(float) + obs = pd.DataFrame({"cell_type": ["A", "B", "A", "B"]}) + var = pd.DataFrame(index=["Gene_0", "Gene_1"]) + return ad.AnnData(X=X, obs=obs, var=var) + + +class TestZScoreNormalizationTemplate(unittest.TestCase): + """Real (non-mocked) tests for the z-score normalization template.""" + + def setUp(self) -> None: + self.tmp_dir = tempfile.TemporaryDirectory() + self.in_file = os.path.join(self.tmp_dir.name, "input.pickle") + + with open(self.in_file, "wb") as f: + pickle.dump(_make_tiny_adata(), f) + + params = { + "Upstream_Analysis": self.in_file, + "Table_to_Process": "Original", + "Output_Table_Name": "zscore", + "Output_Directory": self.tmp_dir.name, + "outputs": { + "analysis": {"type": "file", "name": "output.pickle"}, + }, + } + + self.json_file = os.path.join(self.tmp_dir.name, "params.json") + with open(self.json_file, "w") as f: + json.dump(params, f) + + def tearDown(self) -> None: + self.tmp_dir.cleanup() + + def test_zscore_normalization_produces_expected_outputs(self) -> None: + """ + End-to-end I/O test: run z-score normalization and verify outputs. + + Validates: + 1. saved_files dict has 'analysis' key + 2. Pickle exists, is non-empty, contains AnnData + 3. 
Z-score layer is present in the AnnData + """ + saved_files = run_from_json( + self.json_file, + save_to_disk=True, + output_dir=self.tmp_dir.name, + ) + + self.assertIsInstance(saved_files, dict) + self.assertIn("analysis", saved_files) + + pkl_path = Path(saved_files["analysis"]) + self.assertTrue(pkl_path.exists()) + self.assertGreater(pkl_path.stat().st_size, 0) + + with open(pkl_path, "rb") as f: + result_adata = pickle.load(f) + self.assertIsInstance(result_adata, ad.AnnData) + # z-score normalization creates a 'zscore' layer + self.assertIn("zscore", result_adata.layers) + + mem_adata = run_from_json(self.json_file, save_to_disk=False) + self.assertIsInstance(mem_adata, ad.AnnData) + self.assertIn("zscore", mem_adata.layers) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_performance/__init__.py b/tests/test_performance/__init__.py new file mode 100644 index 00000000..8042e032 --- /dev/null +++ b/tests/test_performance/__init__.py @@ -0,0 +1,3 @@ +import os +import sys +sys.path.append(os.path.dirname(os.path.realpath(__file__)) + "/../../src") diff --git a/tests/test_performance/test_boxplot_performance.py b/tests/test_performance/test_boxplot_performance.py new file mode 100644 index 00000000..1e3ea434 --- /dev/null +++ b/tests/test_performance/test_boxplot_performance.py @@ -0,0 +1,210 @@ +import os +import unittest +import time +import numpy as np +import pandas as pd +import anndata as ad +import matplotlib +import matplotlib.pyplot as plt +from sklearn.datasets import make_blobs +from sklearn.preprocessing import StandardScaler +from spac.visualization import boxplot, boxplot_interactive + +matplotlib.use('Agg') # Set the backend to 'Agg' to suppress plot window + + +skip_perf = unittest.skipUnless( + os.getenv("SPAC_RUN_PERF") == "1", + "Perf tests disabled by default" +) + +@skip_perf +class TestBoxplotPerformance(unittest.TestCase): + """Performance comparison tests for boxplot vs boxplot_interactive.""" + + @classmethod + 
def setUpClass(cls): + """Generate large datasets once for all tests.""" + print("\n" + "=" * 70) + print("Setting up large datasets for boxplot performance tests...") + print("=" * 70) + + # Generate 1M cell dataset + print("\nGenerating 1M cell dataset...") + start = time.time() + cls.adata_1m = cls._generate_dataset(n_obs=1_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + + # Generate 5M cell dataset + print("\nGenerating 5M cell dataset...") + start = time.time() + cls.adata_5m = cls._generate_dataset(n_obs=5_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + + # Generate 10M cell dataset + print("\nGenerating 10M cell dataset...") + start = time.time() + cls.adata_10m = cls._generate_dataset(n_obs=10_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + print("=" * 70 + "\n") + + @staticmethod + def _generate_dataset(n_obs: int, random_state: int = 42) -> ad.AnnData: + """ + Generate a synthetic AnnData object with realistic clustering. 
+ + Creates dataset with: + - 5 features (marker_1 to marker_5) + - 5 annotations (cell_type, phenotype, region, batch, treatment) + - 3 layers (normalized, log_transformed, scaled) + """ + np.random.seed(random_state) + + # Generate base data with natural clustering + n_features = 5 + n_centers = 5 + + X, cluster_labels = make_blobs( + n_samples=n_obs, + n_features=n_features, + centers=n_centers, + cluster_std=1.5, + random_state=random_state + ) + + # Make values positive and add variation + X = np.abs(X) + np.random.exponential(scale=2.0, size=X.shape) + + # Create feature names + feature_names = [f"marker_{i+1}" for i in range(n_features)] + + # Create annotations based on clusters + cell_types = [f"Type_{chr(65+i)}" for i in range(5)] + cell_type = np.array([cell_types[i % 5] for i in cluster_labels]) + + phenotypes = [f"Pheno_{i+1}" for i in range(4)] + phenotype = np.array([phenotypes[i % 4] for i in cluster_labels]) + random_mask = np.random.random(n_obs) < 0.2 + phenotype[random_mask] = np.random.choice(phenotypes, size=random_mask.sum()) + + regions = ["Region_X", "Region_Y", "Region_Z"] + region = np.random.choice(regions, size=n_obs) + + batches = ["Batch_1", "Batch_2", "Batch_3"] + batch = np.random.choice(batches, size=n_obs) + + treatments = ["Control", "Treated"] + treatment = np.random.choice(treatments, size=n_obs, p=[0.5, 0.5]) + + # Create observations DataFrame + obs = pd.DataFrame({ + 'cell_type': pd.Categorical(cell_type), + 'phenotype': pd.Categorical(phenotype), + 'region': pd.Categorical(region), + 'batch': pd.Categorical(batch), + 'treatment': pd.Categorical(treatment) + }) + + # Create AnnData object + adata = ad.AnnData(X=X, obs=obs) + adata.var_names = feature_names + + # Create layers with different transformations + X_normalized = np.zeros_like(X) + for i in range(n_features): + feature_min = X[:, i].min() + feature_max = X[:, i].max() + X_normalized[:, i] = (X[:, i] - feature_min) / (feature_max - feature_min) + 
adata.layers['normalized'] = X_normalized + + adata.layers['log_transformed'] = np.log1p(X) + + scaler = StandardScaler() + adata.layers['scaled'] = scaler.fit_transform(X) + + return adata + + def tearDown(self): + """Clean up matplotlib figures after each test.""" + plt.close('all') + + def _run_comparison(self, adata, test_name): + """Run comparison between boxplot and boxplot_interactive.""" + n_obs = adata.n_obs + features = ['marker_1', 'marker_2', 'marker_3', 'marker_4', 'marker_5'] + annotation = 'cell_type' + layer = 'normalized' + + print(f"\n{'=' * 70}") + print(f"{test_name}: {n_obs:,} cells") + print(f" Features: {', '.join(features)}") + print(f" Annotation: {annotation}") + print(f" Layer: {layer}") + print(f"{'=' * 70}") + + # Test boxplot + print("\n Running boxplot...") + start = time.time() + fig, ax, df = boxplot( + adata, + features=features, + annotation=annotation, + layer=layer + ) + boxplot_time = time.time() - start + print(f" Time: {boxplot_time:.2f} seconds") + plt.close('all') + + # Test boxplot_interactive with downsampling + print("\n Running boxplot_interactive (with downsampling)...") + start = time.time() + result = boxplot_interactive( + adata, + features=features, + annotation=annotation, + layer=layer, + showfliers='downsample' + ) + interactive_time = time.time() - start + print(f" Time: {interactive_time:.2f} seconds") + + # Calculate speedup + speedup = boxplot_time / interactive_time if interactive_time > 0 else 0 + + print("\n Results:") + print(f" boxplot: {boxplot_time:.2f}s") + print(f" boxplot_interactive: {interactive_time:.2f}s") + print(f" Speedup factor: {speedup:.2f}x") + + if speedup > 1: + print(f" → boxplot_interactive is {speedup:.2f}x faster") + elif 0 < speedup < 1: + print(f" → boxplot is {1/speedup:.2f}x faster") + else: + print(" → Both functions have similar performance") + + print(f"{'=' * 70}\n") + + # Store results for potential further analysis + return { + 'n_obs': n_obs, + 'boxplot_time':
boxplot_time, + 'boxplot_interactive_time': interactive_time, + 'speedup_factor': speedup + } + + def test_comparison_1m(self): + """Compare boxplot vs boxplot_interactive with 1M cells.""" + self._run_comparison(self.adata_1m, "Boxplot Performance Comparison [1M cells]") + + def test_comparison_5m(self): + """Compare boxplot vs boxplot_interactive with 5M cells.""" + self._run_comparison(self.adata_5m, "Boxplot Performance Comparison [5M cells]") + + def test_comparison_10m(self): + """Compare boxplot vs boxplot_interactive with 10M cells.""" + self._run_comparison(self.adata_10m, "Boxplot Performance Comparison [10M cells]") + + +if __name__ == '__main__': + unittest.main(verbosity=2) diff --git a/tests/test_performance/test_histogram_performance.py b/tests/test_performance/test_histogram_performance.py new file mode 100644 index 00000000..308e80b3 --- /dev/null +++ b/tests/test_performance/test_histogram_performance.py @@ -0,0 +1,386 @@ +import os +import unittest +import time +import warnings +import numpy as np +import pandas as pd +import anndata as ad +import matplotlib +import matplotlib.pyplot as plt +import seaborn as sns +from sklearn.datasets import make_blobs +from sklearn.preprocessing import StandardScaler +from spac.visualization import histogram +from spac.utils import check_annotation, check_feature, check_table + +matplotlib.use('Agg') # Set the backend to 'Agg' to suppress plot window + + + + +skip_perf = unittest.skipUnless( + os.getenv("SPAC_RUN_PERF") == "1", + "Perf tests disabled by default" +) + +@skip_perf +class TestHistogramPerformance(unittest.TestCase): + """Performance comparison tests for histogram vs histogram_old.""" + + @classmethod + def setUpClass(cls): + """Generate large datasets once for all tests.""" + print("\n" + "=" * 70) + print("Setting up large datasets for histogram performance tests...") + print("=" * 70) + + # Generate 1M cell dataset + print("\nGenerating 1M cell dataset...") + start = time.time() + cls.adata_1m = 
cls._generate_dataset(n_obs=1_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + + # Generate 5M cell dataset + print("\nGenerating 5M cell dataset...") + start = time.time() + cls.adata_5m = cls._generate_dataset(n_obs=5_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + + # Generate 10M cell dataset + print("\nGenerating 10M cell dataset...") + start = time.time() + cls.adata_10m = cls._generate_dataset(n_obs=10_000_000, random_state=42) + print(f" Completed in {time.time() - start:.2f} seconds") + print("=" * 70 + "\n") + + @staticmethod + def _generate_dataset(n_obs: int, random_state: int = 42) -> ad.AnnData: + """ + Generate a synthetic AnnData object with realistic clustering. + + Creates dataset with: + - 5 features (marker_1 to marker_5) + - 5 annotations (cell_type, phenotype, region, batch, treatment) + - 3 layers (normalized, log_transformed, scaled) + """ + np.random.seed(random_state) + + # Generate base data with natural clustering + n_features = 5 + n_centers = 5 + + X, cluster_labels = make_blobs( + n_samples=n_obs, + n_features=n_features, + centers=n_centers, + cluster_std=1.5, + random_state=random_state + ) + + # Make values positive and add variation + X = np.abs(X) + np.random.exponential(scale=2.0, size=X.shape) + + # Create feature names + feature_names = [f"marker_{i+1}" for i in range(n_features)] + + # Create annotations based on clusters + cell_types = [f"Type_{chr(65+i)}" for i in range(5)] + cell_type = np.array([cell_types[i % 5] for i in cluster_labels]) + + phenotypes = [f"Pheno_{i+1}" for i in range(4)] + phenotype = np.array([phenotypes[i % 4] for i in cluster_labels]) + random_mask = np.random.random(n_obs) < 0.2 + phenotype[random_mask] = np.random.choice(phenotypes, size=random_mask.sum()) + + regions = ["Region_X", "Region_Y", "Region_Z"] + region = np.random.choice(regions, size=n_obs) + + batches = ["Batch_1", "Batch_2", "Batch_3"] + batch = 
np.random.choice(batches, size=n_obs) + + treatments = ["Control", "Treated"] + treatment = np.random.choice(treatments, size=n_obs, p=[0.5, 0.5]) + + # Create observations DataFrame + obs = pd.DataFrame({ + 'cell_type': pd.Categorical(cell_type), + 'phenotype': pd.Categorical(phenotype), + 'region': pd.Categorical(region), + 'batch': pd.Categorical(batch), + 'treatment': pd.Categorical(treatment) + }) + + # Create AnnData object + adata = ad.AnnData(X=X, obs=obs) + adata.var_names = feature_names + + # Create layers with different transformations + X_normalized = np.zeros_like(X) + for i in range(n_features): + feature_min = X[:, i].min() + feature_max = X[:, i].max() + X_normalized[:, i] = (X[:, i] - feature_min) / (feature_max - feature_min) + adata.layers['normalized'] = X_normalized + + adata.layers['log_transformed'] = np.log1p(X) + + scaler = StandardScaler() + adata.layers['scaled'] = scaler.fit_transform(X) + + return adata + + def tearDown(self): + """Clean up matplotlib figures after each test.""" + plt.close('all') + + @staticmethod + def histogram_old(adata, feature=None, annotation=None, layer=None, + group_by=None, together=False, ax=None, + x_log_scale=False, y_log_scale=False, **kwargs): + """ + Old histogram implementation for performance comparison. + + Copied from commit 1cfad52f00aa6c1b8384f727b60e3bf07f57bee6 in + visualization.py, before the refactor to histogram + """ + # If no feature or annotation is specified, apply default behavior + if feature is None and annotation is None: + feature = adata.var_names[0] + warnings.warn( + "No feature or annotation specified. 
" + "Defaulting to the first feature: " + f"'{feature}'.", + UserWarning + ) + + # Use utility functions for input validation + if layer: + check_table(adata, tables=layer) + if annotation: + check_annotation(adata, annotations=annotation) + if feature: + check_feature(adata, features=feature) + if group_by: + check_annotation(adata, annotations=group_by) + + # If layer is specified, get the data from that layer + if layer: + df = pd.DataFrame( + adata.layers[layer], index=adata.obs.index, columns=adata.var_names + ) + else: + df = pd.DataFrame( + adata.X, index=adata.obs.index, columns=adata.var_names + ) + layer = 'Original' + + df = pd.concat([df, adata.obs], axis=1) + + if feature and annotation: + raise ValueError("Cannot pass both feature and annotation," + " choose one.") + + data_column = feature if feature else annotation + + # Check for negative values and apply log1p transformation if x_log_scale is True + if x_log_scale: + if (df[data_column] < 0).any(): + print( + "There are negative values in the data, disabling x_log_scale." 
+ ) + x_log_scale = False + else: + df[data_column] = np.log1p(df[data_column]) + + if ax is not None: + fig = ax.get_figure() + else: + fig, ax = plt.subplots() + + axs = [] + + # Prepare the data for plotting + plot_data = df.dropna(subset=[data_column]) + + # Bin calculation section + def cal_bin_num(num_rows): + bins = max(int(2*(num_rows ** (1/3))), 1) + print(f'Automatically calculated number of bins is: {bins}') + return(bins) + + num_rows = plot_data.shape[0] + + # Check if bins is being passed + if 'bins' not in kwargs: + kwargs['bins'] = cal_bin_num(num_rows) + + # Plotting with or without grouping + if group_by: + groups = df[group_by].dropna().unique().tolist() + n_groups = len(groups) + if n_groups == 0: + raise ValueError("There must be at least one group to create a" + " histogram.") + + if together: + kwargs.setdefault("multiple", "stack") + kwargs.setdefault("element", "bars") + + sns.histplot(data=df.dropna(), x=data_column, hue=group_by, + ax=ax, **kwargs) + if feature: + ax.set_title(f'Layer: {layer}') + axs.append(ax) + else: + fig, ax_array = plt.subplots( + n_groups, 1, figsize=(5, 5 * n_groups) + ) + + if n_groups == 1: + ax_array = [ax_array] + else: + ax_array = ax_array.flatten() + + for i, ax_i in enumerate(ax_array): + group_data = plot_data[plot_data[group_by] == groups[i]] + + sns.histplot(data=group_data, x=data_column, ax=ax_i, **kwargs) + if feature: + ax_i.set_title(f'{groups[i]} with Layer: {layer}') + else: + ax_i.set_title(f'{groups[i]}') + + if y_log_scale: + ax_i.set_yscale('log') + + if x_log_scale: + xlabel = f'log({data_column})' + else: + xlabel = data_column + ax_i.set_xlabel(xlabel) + + stat = kwargs.get('stat', 'count') + ylabel_map = { + 'count': 'Count', + 'frequency': 'Frequency', + 'density': 'Density', + 'probability': 'Probability' + } + ylabel = ylabel_map.get(stat, 'Count') + if y_log_scale: + ylabel = f'log({ylabel})' + ax_i.set_ylabel(ylabel) + + axs.append(ax_i) + else: + sns.histplot(data=plot_data, 
x=data_column, ax=ax, **kwargs) + if feature: + ax.set_title(f'Layer: {layer}') + axs.append(ax) + + if y_log_scale: + ax.set_yscale('log') + + if x_log_scale: + xlabel = f'log({data_column})' + else: + xlabel = data_column + ax.set_xlabel(xlabel) + + stat = kwargs.get('stat', 'count') + ylabel_map = { + 'count': 'Count', + 'frequency': 'Frequency', + 'density': 'Density', + 'probability': 'Probability' + } + ylabel = ylabel_map.get(stat, 'Count') + if y_log_scale: + ylabel = f'log({ylabel})' + ax.set_ylabel(ylabel) + + if len(axs) == 1: + return fig, axs[0] + else: + return fig, axs + + def _run_comparison(self, adata, test_name): + """Run comparison between histogram_old and histogram.""" + n_obs = adata.n_obs + feature = 'marker_1' + annotation = None + layer = 'normalized' + + print(f"\n{'=' * 70}") + print(f"{test_name}: {n_obs:,} cells") + print(f" Feature: {feature}") + print(f" Annotation: {annotation}") + print(f" Layer: {layer}") + print(f"{'=' * 70}") + + # Test histogram_old + print("\n Running histogram_old...") + start = time.time() + fig_old, ax_old = self.histogram_old( + adata, + feature=feature, + annotation=annotation, + layer=layer + ) + old_time = time.time() - start + print(f" Time: {old_time:.2f} seconds") + plt.close('all') + + # Test histogram from SPAC + print("\n Running histogram (SPAC)...") + start = time.time() + result = histogram( + adata, + feature=feature, + annotation=annotation, + layer=layer + ) + new_time = time.time() - start + print(f" Time: {new_time:.2f} seconds") + plt.close('all') + + # Calculate speedup + speedup = old_time / new_time if new_time > 0 else 0 + + print(f"\n Results:") + print(f" histogram_old: {old_time:.2f}s") + print(f" histogram: {new_time:.2f}s") + print(f" Speedup factor: {speedup:.2f}x") + + if speedup > 1: + print(f" → histogram (SPAC) is {speedup:.2f}x faster") + elif speedup < 1: + print(f" → histogram_old is {1/speedup:.2f}x faster") + else: + print(f" → Both functions have similar performance") 
+ + print(f"{'=' * 70}\n") + + # Store results for potential further analysis + return { + 'n_obs': n_obs, + 'histogram_old_time': old_time, + 'histogram_time': new_time, + 'speedup_factor': speedup + } + + def test_comparison_1m(self): + """Compare histogram_old vs histogram with 1M cells.""" + self._run_comparison(self.adata_1m, "Histogram Performance Comparison [1M cells]") + + def test_comparison_5m(self): + """Compare histogram_old vs histogram with 5M cells.""" + self._run_comparison(self.adata_5m, "Histogram Performance Comparison [5M cells]") + + def test_comparison_10m(self): + """Compare histogram_old vs histogram with 10M cells.""" + self._run_comparison(self.adata_10m, "Histogram Performance Comparison [10M cells]") + + +if __name__ == '__main__': + unittest.main(verbosity=2) diff --git a/tests/test_transformations/test_add_qc_metrics.py b/tests/test_transformations/test_add_qc_metrics.py new file mode 100644 index 00000000..65d650fb --- /dev/null +++ b/tests/test_transformations/test_add_qc_metrics.py @@ -0,0 +1,62 @@ +import unittest +import numpy as np +import scanpy as sc +from scipy.sparse import csr_matrix +from spac.transformations import add_qc_metrics + +class TestAddQCMetrics(unittest.TestCase): + @classmethod + def setUpClass(cls): + np.random.seed(42) + + def create_test_adata(self, sparse=False): + X = np.array([ + [1, 0, 3, 0], + [0, 2, 0, 4], + [5, 0, 0, 6] + ]) + var_names = ["MT-CO1", "MT-CO2", "GeneA", "GeneB"] + obs_names = ["cell1", "cell2", "cell3"] + adata = sc.AnnData(X=csr_matrix(X) if sparse else X) + adata.var_names = var_names + adata.obs_names = obs_names + return adata + + def test_qc_metrics_dense(self): + adata = self.create_test_adata(sparse=False) + add_qc_metrics(adata, organism="hs") + self.assertIn("nFeature", adata.obs) + self.assertIn("nCount", adata.obs) + self.assertIn("nCount_mt", adata.obs) + self.assertIn("percent.mt", adata.obs) + np.testing.assert_array_equal(adata.obs["nFeature"].values, [2, 2, 2]) + 
np.testing.assert_array_equal(adata.obs["nCount"].values, [4, 6, 11]) + np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [1, 2, 5]) + np.testing.assert_allclose(adata.obs["percent.mt"].values, + [25.0, 33.333333, 45.454545], rtol=1e-4) + + def test_qc_metrics_sparse(self): + adata = self.create_test_adata(sparse=True) + add_qc_metrics(adata, organism="hs") + self.assertIn("nFeature", adata.obs) + self.assertIn("nCount", adata.obs) + self.assertIn("nCount_mt", adata.obs) + self.assertIn("percent.mt", adata.obs) + np.testing.assert_array_equal(adata.obs["nFeature"].values, [2, 2, 2]) + np.testing.assert_array_equal(adata.obs["nCount"].values, [4, 6, 11]) + np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [1, 2, 5]) + np.testing.assert_allclose(adata.obs["percent.mt"].values, + [25.0, 33.333333, 45.454545], rtol=1e-4) + + def test_custom_mt_pattern(self): + adata = self.create_test_adata() + add_qc_metrics(adata, mt_match_pattern="Gene") + np.testing.assert_array_equal(adata.obs["nCount_mt"].values, [3, 4, 6]) + + def test_invalid_layer(self): + adata = self.create_test_adata() + with self.assertRaises(ValueError): + add_qc_metrics(adata, layer="not_a_layer") + +if __name__ == "__main__": + unittest.main() \ No newline at end of file diff --git a/tests/test_transformations/test_get_qc_summary_table.py b/tests/test_transformations/test_get_qc_summary_table.py new file mode 100644 index 00000000..2894a3af --- /dev/null +++ b/tests/test_transformations/test_get_qc_summary_table.py @@ -0,0 +1,95 @@ +import unittest +import numpy as np +import pandas as pd +import scanpy as sc +from anndata import AnnData +from spac.transformations import add_qc_metrics +from spac.transformations import get_qc_summary_table + +class TestGetQCSummaryTable(unittest.TestCase): + @classmethod + def setUpClass(cls): + # Set a random seed for reproducibility + np.random.seed(42) + + # Create a small AnnData object for testing + def create_test_adata(self): + X = 
np.array([ + [1, 0, 3, 0], + [0, 2, 0, 4], + [5, 0, 0, 6] + ]) + var_names = ["MT-CO1", "MT-CO2", "GeneA", "GeneB"] + obs_names = ["cell1", "cell2", "cell3"] + adata = AnnData(X=X) + adata.var_names = var_names + adata.obs_names = obs_names + # Compute QC metrics using the provided function + add_qc_metrics(adata) + return adata + + # Test that the summary table is created and has the correct structure + def test_qc_summary_table_basic(self): + adata = self.create_test_adata() + get_qc_summary_table(adata) + self.assertIn("qc_summary_table", adata.uns) + summary = adata.uns["qc_summary_table"] + self.assertIsInstance(summary, pd.DataFrame) + # Check that all expected columns are present + self.assertIn("mean", summary.columns) + self.assertIn("median", summary.columns) + self.assertIn("upper_mad", summary.columns) + self.assertIn("lower_mad", summary.columns) + self.assertIn("upper_quantile", summary.columns) + self.assertIn("lower_quantile", summary.columns) + self.assertIn("Sample", summary.columns) + # Check that the correct metrics are summarized + self.assertEqual(set(summary["metric_name"]), + {"nFeature", "nCount", "percent.mt"}) + # Check that the sample label is correct when not grouping + self.assertEqual(summary["Sample"].iloc[0], "All") + + # Test that a TypeError is raised if a non-numeric column is included + def test_qc_summary_table_non_numeric(self): + adata = self.create_test_adata() + adata.obs["non_numeric"] = ["a", "b", "c"] + with self.assertRaises(TypeError) as exc_info: + get_qc_summary_table(adata, + stat_columns_list=["nFeature", "non_numeric"]) + expected_msg = 'Column "non_numeric" must be numeric to compute statistics.'
+ self.assertEqual(str(exc_info.exception), expected_msg) + + # Test that summary statistics are computed correctly with + # sample_column grouping + def test_qc_summary_table_grouping(self): + adata = self.create_test_adata() + get_qc_summary_table(adata) + # Add a sample column with two groups + adata.obs["batch"] = ["A", "A", "B"] + get_qc_summary_table(adata, sample_column="batch") + summary = adata.uns["qc_summary_table"] + # There should be two groups: A and B + self.assertEqual(set(summary["Sample"]), {"A", "B"}) + # For group A (cells 0 and 1): nCount = [4, 6] + group_a = summary[(summary["Sample"] == "A") & + (summary["metric_name"] == "nCount")].iloc[0] + self.assertAlmostEqual(group_a["mean"], 5.0) + self.assertAlmostEqual(group_a["median"], 5.0) + # For group B (cell 2): nCount = [11] + group_b = summary[(summary["Sample"] == "B") & + (summary["metric_name"] == "nCount")].iloc[0] + self.assertAlmostEqual(group_b["mean"], 11.0) + self.assertAlmostEqual(group_b["median"], 11.0) + + # Test that a ValueError is raised if stat_columns_list is empty + def test_qc_summary_table_empty_stat_columns_list(self): + adata = self.create_test_adata() + with self.assertRaises(ValueError) as exc_info: + get_qc_summary_table(adata, stat_columns_list=[]) + expected_msg = ( + 'Parameter "stat_columns_list" must contain at least one column name.'
+ ) + self.assertEqual(str(exc_info.exception), expected_msg) + +if __name__ == "__main__": + unittest.main() \ No newline at end of file diff --git a/tests/test_utils/test_compute_summary_qc_stats.py b/tests/test_utils/test_compute_summary_qc_stats.py new file mode 100644 index 00000000..9ae628d4 --- /dev/null +++ b/tests/test_utils/test_compute_summary_qc_stats.py @@ -0,0 +1,74 @@ +import unittest +import numpy as np +import pandas as pd +from spac.utils import compute_summary_qc_stats + +class TestComputeSummaryQCStats(unittest.TestCase): + def setUp(self): + # Create a simple DataFrame for testing + self.df = pd.DataFrame({ + "nFeature": [2, 2, 2], + "nCount": [4, 6, 11], + "percent.mt": [25.0, 33.33333333333333, 45.45454545454545], + "all_nan": [np.nan, np.nan, np.nan], + "non_numeric": ["a", "b", "c"] + }) + + # Test that summary statistics are computed correctly for nFeature + def test_basic_statistics(self): + result = compute_summary_qc_stats(self.df, + stat_columns_list=["nFeature"]) + row = result.iloc[0] + self.assertEqual(row["mean"], 2) + self.assertEqual(row["median"], 2) + self.assertEqual(row["upper_mad"], 2) + self.assertEqual(row["lower_mad"], 2) + self.assertEqual(row["upper_quantile"], 2) + self.assertEqual(row["lower_quantile"], 2) + + # Test that summary statistics are computed correctly for nCount + def test_ncount_statistics(self): + # nCount: [4, 6, 11] -> mean 7.0, median 6.0, 95th pct 10.5, 5th pct 4.2 + result = compute_summary_qc_stats(self.df, + stat_columns_list=["nCount"]) + row = result.iloc[0] + self.assertAlmostEqual(row["mean"], 7.0) + self.assertAlmostEqual(row["median"], 6.0) + self.assertAlmostEqual(row["upper_quantile"], 10.5) + self.assertAlmostEqual(row["lower_quantile"], 4.2) + + # Test that summary statistics are computed correctly for percent.mt + def test_percent_mt_statistics(self): + # percent.mt: [25.0, 33.33333333333333, 45.45454545454545] -> + # mean 34.59596, median 33.33333, upper_quantile 44.24242, + # 
+ # lower_quantile 25.83333 + result = compute_summary_qc_stats(self.df, + stat_columns_list=["percent.mt"]) + row = result.iloc[0] + self.assertAlmostEqual(row["mean"], 34.59596, places=5) + self.assertAlmostEqual(row["median"], 33.33333, places=5) + self.assertAlmostEqual(row["upper_quantile"], 44.24242, places=5) + self.assertAlmostEqual(row["lower_quantile"], 25.83333, places=5) + + # Test that a TypeError is raised if a non-numeric column is included + def test_non_numeric_column_raises(self): + with self.assertRaises(TypeError) as exc_info: + compute_summary_qc_stats(self.df, + stat_columns_list=["non_numeric"]) + expected_msg = ( + 'Column "non_numeric" must be numeric to compute statistics.' + ) + self.assertEqual(str(exc_info.exception), expected_msg) + + # Test that a TypeError is raised for an all-NaN column + def test_all_nan_column_raises(self): + with self.assertRaises(TypeError) as exc_info: + compute_summary_qc_stats(self.df, stat_columns_list=["all_nan"]) + expected_msg = ( + 'Column "all_nan" must be numeric to compute statistics. ' + 'All values are NaN.' + ) + self.assertEqual(str(exc_info.exception), expected_msg) + +if __name__ == "__main__": + unittest.main()
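The hard-coded expectations in the QC tests above (nFeature, nCount, nCount_mt, percent.mt, and the mean/median/quantile values) can be reproduced by hand from the 3x4 toy matrix. The sketch below is a minimal illustration, assuming that `add_qc_metrics` counts detected genes, total counts, and counts for genes prefixed `MT-`, and that `compute_summary_qc_stats` takes the 95th/5th percentiles with NumPy's default linear interpolation. These definitions are inferred from the expected values in the tests, not taken from the spac source; the helper `summarize` is hypothetical. The MAD-based columns are left out because their scaling factor is not visible in this diff.

```python
import numpy as np

# Toy count matrix and gene names copied from the QC unit tests.
X = np.array([
    [1, 0, 3, 0],
    [0, 2, 0, 4],
    [5, 0, 0, 6],
])
var_names = np.array(["MT-CO1", "MT-CO2", "GeneA", "GeneB"])

# Per-cell metrics (assumed definitions, consistent with the test expectations).
n_feature = (X > 0).sum(axis=1)              # genes detected per cell
n_count = X.sum(axis=1)                      # total counts per cell
mt_mask = np.char.startswith(var_names, "MT-")
n_count_mt = X[:, mt_mask].sum(axis=1)       # mitochondrial counts per cell
percent_mt = 100.0 * n_count_mt / n_count    # percentage of mitochondrial counts


def summarize(values):
    """Hypothetical helper: mean, median, and 95th/5th percentiles
    using NumPy's default linear interpolation."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": values.mean(),
        "median": np.median(values),
        "upper_quantile": np.percentile(values, 95),
        "lower_quantile": np.percentile(values, 5),
    }


print(n_feature, n_count, n_count_mt)
print(np.round(percent_mt, 5))
print({k: round(float(v), 5) for k, v in summarize(n_count).items()})
print({k: round(float(v), 5) for k, v in summarize(percent_mt).items()})
```

Working the numbers through this way makes it easy to check that a changed expectation in the tests still matches the underlying arithmetic, for example that the 95th percentile of nCount `[4, 6, 11]` interpolates to 10.5.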