Materials and scripts for building cell type encyclopedia table
Make sure that the latest models are uploaded to s3://celltypist/models/*/. Put all shareable models in a local folder (a subset of s3 models), and run the following:
python src/generate_json_from_latest_models.py /path/to/local_model_folder(Find all settings within the 'setting here' section <- no need to change in most cases)
New json file will stay in json/models.json. Upload to s3://celltypist/models/.
Run the following command with the atlas name (e.g. Pan_Immune_CellTypist) and version (e.g. v1).
python src/generate_encyclopedia.py an_atlas_name a_versionAll settings can be found in the configuration file specific to each atlas (atlases/an_atlas_name/a_version/config/Encyclo.config), including:
filter_out: cell types with <filter_outcells from a tissue-dataset combination are removed (no such cell type in the given tissue and dataset).model: model to extract top marker genes. Make sure the model of interest is exported in CellTypist (or use a local model).no_celltypes: number of cell types to double-check with the meta csv file and with the model.
Details of the four tables specific to each atlas used during the execution can be found in the sections below (Images and Other tables).
The resulting table will stay in atlases/an_atlas_name/a_version/encyclopedia/encyclopedia_table.xlsx, and database in atlases/an_atlas_name/a_version/encyclopedia/encyclopedia.db. Upload the latter to s3://celltypist/atlases/an_atlas_name/a_version/.
Run the following command with the atlas name (e.g. Pan_Immune_CellTypist) and version (e.g. v1).
python src/generate_Heatmap_data.py an_atlas_name a_versionAll settings can be found in the configuration file specific to each atlas (atlases/an_atlas_name/a_version/config/Heatmap.config), including:
adata_path: path to the AnnData.tissue_column: cell metadata column specifying tissue/organ information.celltype_column: cell metadata column specifying cell type information.use_raw: whether to use the.rawattribute for expression matrix in the AnnData.filter_out: cell types with <=filter_outcells from a tissue-celltype combination are thought as non-existing (black grids in the heat map).do_normalize: log-normalise (to 1e4) the data if the AnnData is provided in raw counts.
Tissue and cell type orders are defined in the atlases/an_atlas_name/a_version/Heatmap_data/tissue_order.txt and atlases/an_atlas_name/a_version/Heatmap_data/celltype_order.txt, respectively.
Heatmap data will stay in atlases/an_atlas_name/a_version/Heatmap_data/exp_pct_celltypist.pkl. Upload to s3://celltypist/atlases/an_atlas_name/a_version/.
Images are in images/*.png. White background, 842 x 736 (pixels).
Correspondence between cell type names and images for a given atlas is in atlases/an_atlas_name/a_version/tables/celltype_to_image.csv (no headers).
atlases/an_atlas_name/a_version/tables/Basic_celltype_information.xlsx: free text of basic cell type information. Headers must be High-hierarchy cell types, Low-hierarchy cell types, Description, Cell Ontology ID and Curated markers.
atlases/an_atlas_name/a_version/tables/celltypist_meta.csv: cell meta-information for deriving the tissue and dataset information (e.g. adata.obs[['CellType', 'Tissue', 'Dataset']].to_csv('celltypist_meta.csv', header=True, index=False)). Header names are arbitrary, but should be in such a order (<-).
atlases/an_atlas_name/a_version/tables/dataset_to_PMID.csv: link/paper of each data set. No headers. Datasets without available PMIDs can have urls instead.