Problem
PIDGIN scoring functions currently create a conda environment and download models (~11GB) at runtime. This is a problem in containerized environments:
- Conda environment creation fails due to read-only filesystems or restricted permissions
- Runtime model downloads are problematic for reproducibility and security
- First use is delayed by several minutes by environment setup and downloads
Current Behavior
pidgin = PIDGIN(prefix="test", uniprot="P21918")
# Attempts to:
# 1. Run `conda env create -f environment.yml`
# 2. Download 11GB models from Zenodo
# 3. Launch Flask server
This fails in containers with:
Command 'conda env create -f environment.yml' returned non-zero exit status 1
Proposed Solution
Add a pre-setup utility that container builds can use:
Option 1: CLI utility
molscore-setup pidgin --download-models --create-env
Option 2: Python utility
from molscore.setup import setup_pidgin
setup_pidgin(download_models=True, create_env=True)
Option 3: Container build helper
# In Dockerfile/Singularity definition
RUN python -m molscore.setup pidgin
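As a rough illustration of Option 1, the command line could be parsed along these lines. This is only a sketch: the `molscore-setup` entry point, the `pidgin` subcommand, and both flags are the names proposed in this issue and do not exist in MolScore today, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Build a parser for the proposed (hypothetical) molscore-setup CLI."""
    parser = argparse.ArgumentParser(
        prog="molscore-setup",
        description="Build-time setup for scoring functions (proposal sketch)",
    )
    # One subcommand per scoring function; only 'pidgin' is sketched here.
    subparsers = parser.add_subparsers(dest="target", required=True)
    pidgin = subparsers.add_parser("pidgin", help="Prepare PIDGIN ahead of time")
    pidgin.add_argument(
        "--download-models",
        action="store_true",
        help="Pre-download the PIDGIN models",
    )
    pidgin.add_argument(
        "--create-env",
        action="store_true",
        help="Create the conda environment from environment.yml",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Making both steps opt-in flags would let a container build run them as separate layers, so a change to the environment does not force a fresh model download.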
Implementation Details
The utility should:
- Create conda environment using existing environment.yml:
  conda env create -f /opt/conda/.../molscore/data/models/PIDGINv5/environment.yml
- Pre-download models to ~/.pidgin_data/:
  # Trigger model download without server launch
  # Use existing zenodo-client dependency
- Validate setup:
  # Test that PIDGIN can instantiate without runtime setup
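To make the steps above more concrete, here is a minimal sketch of what such a utility might look like. Everything in it is an assumption rather than existing MolScore code: `build_env_command`, `setup_pidgin`, `validate_setup`, and the `dry_run` parameter are hypothetical, and the validation only checks that the model directory is non-empty, which may not match how PIDGIN actually lays out its data. The model download step is deliberately left out because I don't know which internal call triggers it.

```python
import shutil
import subprocess
from pathlib import Path


def build_env_command(env_file):
    """Return the conda command that creates an environment from env_file."""
    return ["conda", "env", "create", "-f", str(env_file)]


def setup_pidgin(env_file, create_env=True, dry_run=False):
    """Hypothetical build-time setup.

    Returns the list of commands that were run (or would be run when
    dry_run is True). Model download is intentionally not implemented.
    """
    steps = []
    if create_env:
        cmd = build_env_command(env_file)
        steps.append(" ".join(cmd))
        if not dry_run:
            if shutil.which("conda") is None:
                raise RuntimeError("conda not found on PATH")
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(cmd, check=True)
    return steps


def validate_setup(model_dir=Path.home() / ".pidgin_data"):
    """Assumed check: the model directory exists and is not empty."""
    model_dir = Path(model_dir)
    return model_dir.is_dir() and any(model_dir.iterdir())


if __name__ == "__main__":
    # Dry run: show what would be executed without touching conda.
    print(setup_pidgin("environment.yml", dry_run=True))
    print("models present:", validate_setup())
```

A dry-run mode like this would also make it easy to check in CI what a container build is going to execute.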
Benefits
- ✅ Container-friendly: No runtime conda operations needed
- ✅ Reproducible: Models baked into container at build time
- ✅ Faster startup: No first-use delays
- ✅ Offline capable: No runtime network dependencies
- ✅ Security: No runtime downloads from external sources
Current Workaround
Users must manually:
- Create conda environment using the provided environment.yml
- Somehow trigger model downloads (unclear how)
- Hope the server-based architecture works in containers
Use Case
This is critical for HPC/cloud deployments using Singularity/Docker containers where PIDGIN scoring is needed for molecular optimization workflows.
Would appreciate guidance on the preferred approach!