Merged

Dev #10

90 changes: 47 additions & 43 deletions .github/workflows/ci.yml
@@ -40,24 +40,24 @@ jobs:
- name: Run Pylint
run: find . -type f -name "*.py" | xargs pylint --disable=import-error,missing-module-docstring,invalid-name,not-callable,duplicate-code --load-plugins=pylint.extensions.docparams

# tests:
# runs-on: ubuntu-22.04
# container:
# image: ghcr.io/bastien-mva/docker_image:latest
# steps:
# - uses: actions/checkout@v4
# - name: Install package locally and run tests
# run: |
# pip install '.[tests]'
# pip install -e .
# jupyter nbconvert Getting_started.ipynb --to python --output tests/untestable_getting_started
# cd tests
# python _create_readme_getting_started_and_docstrings_tests.py
# pytest --cov --cov-branch --cov-report=xml .
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v5
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
tests:
runs-on: ubuntu-22.04
container:
image: ghcr.io/bastien-mva/docker_image:latest
steps:
- uses: actions/checkout@v4
- name: Install package locally and run tests
run: |
pip install '.[tests]'
pip install -e .
jupyter nbconvert Getting_started.ipynb --to python --output tests/untestable_getting_started
cd tests
python _create_readme_getting_started_and_docstrings_tests.py
pytest --cov --cov-branch --cov-report=xml .
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}


build_package:
@@ -80,30 +80,30 @@ jobs:
name: dist
path: dist/

# publish_package:
# runs-on: ubuntu-22.04
# needs:
# - build_package
# - tests
# if: github.event_name == 'release'
# steps:
# - uses: actions/checkout@v4
# - name: Set up Python
# uses: actions/setup-python@v4
# with:
# python-version: '3.9'
# - name: Install Twine
# run: pip install twine
# - name: download artifacts and publish
# uses: actions/download-artifact@v4
# with:
# name: dist
# path: dist/
# - name: Publish package
# env:
# TWINE_USERNAME: __token__
# TWINE_PASSWORD: ${{ secrets.PYPLN_TOKEN }}
# run: python -m twine upload dist/*
publish_package:
runs-on: ubuntu-22.04
needs:
- build_package
- tests
if: github.event_name == 'release'
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Install Twine
run: pip install twine
- name: download artifacts and publish
uses: actions/download-artifact@v4
with:
name: dist
path: dist/
- name: Publish package
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPLN_TOKEN }}
run: python -m twine upload dist/*

pages:
runs-on: ubuntu-22.04
@@ -122,7 +122,10 @@ jobs:
wget https://github.com/jgm/pandoc/releases/download/1.15.1/pandoc-1.15.1-1-amd64.deb
sudo dpkg -i pandoc-1.15.1-1-amd64.deb
- name: Convert README
run: pandoc README.md --from markdown --to rst -s -o docs/source/readme.rst
run: |
pandoc README.md --from markdown --to rst -s -o docs/source/readme.rst
echo "HEEEEEERE"
cat docs/source/readme.rst
- name: Build docs
run: |
pip install .
@@ -136,3 +139,4 @@ jobs:
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./docs/build/html
force: true
3 changes: 3 additions & 0 deletions README.md
@@ -46,6 +46,9 @@ that an `R` version of the package is available [here](https://pln-team.github.i
```sh
pip install pyPLNmodels
```
The package depends on resource-intensive libraries like `torch`, so it may
require significant storage space.


## Statistical description

1 change: 1 addition & 0 deletions docs/source/tutorials/_quarto.yml
@@ -46,3 +46,4 @@ format:
css: custom_css_yml.css
theme: cosme
code-copy: true
# page-navigation: true
2,239 changes: 962 additions & 1,277 deletions docs/source/tutorials/autoreg.html

Large diffs are not rendered by default.

2,619 changes: 1,146 additions & 1,473 deletions docs/source/tutorials/basic_analysis.html

Large diffs are not rendered by default.

57 changes: 38 additions & 19 deletions docs/source/tutorials/basic_analysis.qmd
@@ -49,7 +49,7 @@ and the model parameters are:
These models aim to capture the structure of the data through the latent variables $Z$.

The `Pln` model assumes $\Sigma$ has full rank, while the `PlnPCA` model
assumes $\Sigma$ has a low rank, which must be specified by the user. A lower
assumes $\Sigma$ has a low rank, which must be specified by the user.
A lower rank introduces a trade-off, reducing computational complexity but potentially
compromising parameter estimation accuracy.
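As a rough illustration of this trade-off, the sketch below counts covariance parameters in plain Python. It assumes the low-rank covariance is parameterized as $\Sigma = CC^\top$ with $C$ of shape $p \times$ `rank`; this parameterization and the helper names are illustrative assumptions, not taken from the package code.

```python
def full_cov_params(p: int) -> int:
    """Number of free entries in a symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def low_rank_cov_params(p: int, rank: int) -> int:
    """Number of entries of C, assuming Sigma = C C^T with C of shape (p, rank)."""
    return p * rank


def low_rank_sigma(C):
    """Build Sigma = C C^T from C given as a list of p rows of length rank.

    The result is a p x p symmetric matrix whose rank is at most `rank`.
    """
    p = len(C)
    return [
        [sum(c_ik * c_jk for c_ik, c_jk in zip(C[i], C[j])) for j in range(p)]
        for i in range(p)
    ]


# Compare parameter counts for a full-rank Sigma and a rank-5 factor C
for p in (10, 100, 1000):
    print(f"p={p}: full={full_cov_params(p)}, low-rank (rank=5)={low_rank_cov_params(p, 5)}")
```

For $p = 1000$ and a rank of 5, this gives 500,500 free entries for a full-rank $\Sigma$ versus 5,000 for $C$.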

@@ -61,16 +61,20 @@ The `pyPLNmodels` package is designed to:
* Retrieve the latent variables $Z$ (which typically contains more information than $Y$)
* Visualize the latent variables and their relationships

This is achieved using the input count matrix $Y$, along with optional covariate matrix $X$ (defaulting to a vector of 1's) and offsets $O$ (defaulting to a matrix of 0's).
This is achieved using the input count matrix $Y$, along with optional covariate matrix $X$ (defaulting to a vector of 1s) and offsets $O$ (defaulting to a matrix of 0s).
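A minimal sketch of these defaults for $n$ samples and $p$ variables, using plain Python lists. `default_exog_and_offsets` is a hypothetical helper for illustration; the package's own preprocessing may store these matrices differently (for instance as tensors).

```python
def default_exog_and_offsets(n: int, p: int):
    """Hypothetical helper: default covariates X (n x 1 ones) and offsets O (n x p zeros)."""
    X = [[1.0] for _ in range(n)]      # intercept-only design: one column of 1s
    O = [[0.0] * p for _ in range(n)]  # no offsets: every entry is 0
    return X, O


X, O = default_exog_and_offsets(n=3, p=4)
print(X)     # [[1.0], [1.0], [1.0]]
print(O[0])  # [0.0, 0.0, 0.0, 0.0]
```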


# Importing Data

In this example, we analyze single-cell RNA-seq data provided by the `load_scrna` function in the package. Each column in the dataset represents a gene, while each row corresponds to a cell (i.e., an individual). Covariates for cell types (`labels`) are also included. For simplicity, we limit the analysis to 20 variables (dimensions).
In this example, we analyze single-cell RNA-seq data provided by the
`load_scrna` function in the package. Each column in the dataset represents a
gene, while each row corresponds to a cell (i.e., an individual). Covariates
for cell types (`labels`) are also included. For simplicity, we limit the
analysis to $10$ variables (dimensions).

```{python}
from pyPLNmodels import load_scrna
rna = load_scrna(dim=20)
rna = load_scrna(dim=10)
print('Data: ', rna.keys())
```

@@ -118,17 +122,15 @@ To gain deeper insights into the model parameters and the optimization process,
pln.show()
```

Monitoring the norm of each parameters allows to know if the model has
converged. If it has not converged, one can refit the model with a lower
tolerance (`tol`) and a bigger number iterations than the default (`maxiter=400`):
Monitoring the norm of each parameter is essential to assess model convergence.
If the model has not converged, consider refitting it with additional iterations and
a reduced tolerance (`tol`). To adjust the number of iterations, use the
`maxiter` parameter:

```{python}
#| eval: false
pln.fit(maxiter=1000, tol=0)
pln.fit(maxiter=1000, tol=0).show()
```
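The convergence check described above can be sketched in plain Python: track the norm of a parameter (here the Frobenius norm of a matrix) across iterations and declare convergence once its relative change drops below `tol`. The helpers below are hypothetical and the criterion is an assumption; pyPLNmodels may define its stopping rule differently. In this sketch, a relative change can never be strictly below `tol=0`, so such a criterion would never stop a fit early.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_change(previous: float, current: float) -> float:
    """Relative change between two successive norms, guarded against division by zero."""
    return abs(current - previous) / max(abs(previous), 1e-12)


def has_converged(norm_history, tol: float) -> bool:
    """Hypothetical rule (not pyPLNmodels' implementation):
    converged once the latest relative change is strictly below tol."""
    if len(norm_history) < 2:
        return False
    return relative_change(norm_history[-2], norm_history[-1]) < tol


print(frobenius_norm([[3.0, 4.0]]))  # 5.0
history = [10.0, 12.0, 12.5, 12.501]  # toy norms recorded at successive iterations
print(has_converged(history, tol=1e-3))  # True
print(has_converged(history, tol=0))  # False
```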



## Exploring Latent Variables

The latent variables $Z$, which capture the underlying structure of the data, are accessible via the `latent_variables` attribute, or the `.transform()` method:
@@ -139,20 +141,23 @@ Z = pln.transform()
print('Shape of Z:', Z.shape)
```

The effect of covariates on the latent variables can be removed by using the `remove_exog_effect` keyword:

You can visualize these latent variables using the `.viz()` method:

```{python}
Z_moins_XB = pln.transform(remove_exog_effect=True)
pln.viz(colors=cell_type)
```


You can visualize these latent variables using the `.viz()` method:
By default, the effect of covariates is included in the visualization: the
latent variables $Z$ are shown as they are, including the covariate term $XB$.
This effect can be removed by using the `remove_exog_effect` keyword:

```{python}
pln.viz(colors=cell_type)
Z_moins_XB = pln.transform(remove_exog_effect=True)
```
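As a plain-Python sketch of this operation, assume `Z` is an $n \times p$ matrix of latent variables, `X` an $n \times d$ covariate matrix and `B` a $d \times p$ coefficient matrix (the toy values below are made up, not package outputs). Removing the covariate effect then amounts to subtracting the product $XB$ from $Z$. `subtract_exog_effect` is a hypothetical helper, not a pyPLNmodels function.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def subtract_exog_effect(Z, X, B):
    """Hypothetical helper: return Z - X B, the latent variables without the covariate effect."""
    XB = matmul(X, B)
    return [[z - xb for z, xb in zip(z_row, xb_row)] for z_row, xb_row in zip(Z, XB)]


# Toy example: n=2 samples, p=2 variables, d=1 covariate (the intercept)
Z = [[3.0, 1.0], [5.0, 2.0]]
X = [[1.0], [1.0]]
B = [[2.0, 0.5]]
print(subtract_exog_effect(Z, X, B))  # [[1.0, 0.5], [3.0, 1.5]]
```

With an intercept-only design, as here, this simply subtracts the same row of coefficients from every sample.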

To visualize the latent positions without the effect of covariates (i.e., \(Z - XB\)), set the `remove_exog_effect` parameter to `True`:
To visualize the latent positions without the effect of covariates (i.e., $Z - XB$), set the `remove_exog_effect` parameter to `True` in the `.viz()` method:

```{python}
pln.viz(colors=cell_type, remove_exog_effect=True)
@@ -163,6 +168,7 @@ Additionally, you can generate a pair plot of the first Principal Components (PC
```{python}
pln.pca_pairplot(n_components=4, colors=cell_type)
```
The `remove_exog_effect` parameter is also available in the `pca_pairplot` method.

# Analyzing Covariate Effects

@@ -175,6 +181,8 @@ To summarize the model, including confidence intervals and p-values, use the `su
```{python}
pln.summary()
```
The p-values follow the one-hot encoding of `labels`, with `Macrophages` as
the reference category: each `labels` coefficient measures the difference from
`Macrophages`.
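The sketch below illustrates this kind of coding (often called treatment or dummy coding) in plain Python: each non-reference category gets its own indicator column, and the reference category is absorbed into the intercept. The label values `T_cells` and `B_cells` are hypothetical placeholders, not necessarily the dataset's categories, and `treatment_coding` is an illustrative helper, not part of pyPLNmodels.

```python
def treatment_coding(labels, reference):
    """One-hot encode labels, dropping the reference category, with an intercept column first."""
    categories = sorted(set(labels) - {reference})
    rows = [
        [1.0] + [1.0 if label == category else 0.0 for category in categories]
        for label in labels
    ]
    return ["Intercept"] + categories, rows


# Hypothetical labels; "Macrophages" is the reference, so it has no column of its own
columns, X = treatment_coding(
    ["Macrophages", "T_cells", "B_cells", "Macrophages"], reference="Macrophages"
)
print(columns)  # ['Intercept', 'B_cells', 'T_cells']
for row in X:
    print(row)
```

A `Macrophages` row is encoded as the intercept alone, which is why the other coefficients are read relative to it.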

You can also visualize confidence intervals for regression coefficients using the `plot_regression_forest` method:

@@ -220,12 +228,23 @@ pca = PlnPCA.from_formula('endog ~ 1 + labels', data=high_d_rna, rank=5).fit()

**⚠️ Note:** P-values are not available in the `PlnPCA` model.


```{python}
print(pca)
```

This model is particularly efficient for high-dimensional datasets, offering significantly reduced computation time compared to `Pln`:
A low-dimensional representation of the latent variables, of dimension `rank`, can be obtained using the `project` keyword of the `.transform()` method:

```{python}
Z_low_dim = pca.transform(project=True)
print('Shape of Z_low_dim:', Z_low_dim.shape)
```
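Conceptually, a representation of dimension `rank` maps each $p$-dimensional row of the latent matrix to `rank` coordinates. The sketch below shows this generic idea in plain Python by projecting rows onto `rank` orthonormal directions; the `project_rows` helper and the toy basis are hypothetical, and `PlnPCA` may compute its projection differently.

```python
def project_rows(Z, basis):
    """Project each row of Z (n x p) onto the orthonormal directions in `basis`.

    `basis` holds `rank` vectors of length p; the result is an n x rank matrix.
    """
    return [[sum(z * u for z, u in zip(row, direction)) for direction in basis] for row in Z]


# Toy data: n=3 samples in p=3 dimensions, projected onto rank=2 coordinate axes
Z = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Z_low = project_rows(Z, basis)
print(len(Z_low), len(Z_low[0]))  # 3 2
```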



This model is particularly efficient for high-dimensional datasets, offering
significantly reduced computation time compared to `Pln`. See [this
paper](https://joss.theoj.org/papers/10.21105/joss.06969) for a computational
comparison between `Pln` and `PlnPCA`.

## Selecting the Rank
