Merged
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -62,7 +62,7 @@ jobs:
shell: bash -l {0}
run: |
echo "Download combinatorial library from zenodo..."
-wget -q -O data/combinatorial_library/combinatorial_library.tar.bz2 https://zenodo.org/record/17368450/files/combinatorial_library.tar.bz2?download=1
+wget -q -O data/combinatorial_library/combinatorial_library.tar.bz2 https://zenodo.org/record/18386001/files/combinatorial_library.tar.bz2?download=1
ls -l data/combinatorial_library/
echo "Decompress selected files..."
tar -xvf data/combinatorial_library/combinatorial_library.tar.bz2 combinatorial_library/combinatorial_library_deduplicated.json combinatorial_library/chembl_standardized_inchi.csv
@@ -74,6 +74,6 @@ jobs:
shell: bash -l {0}
run: |
PYTEST_ARGS="--nbval-lax --nbval-current-env --nbval-cell-timeout=3600"
-PYTEST_IGNORE="--ignore=notebooks/custom_kinfraglib/2_4_custom_filters_paper.ipynb --ignore=notebooks/custom_kinfraglib/1_4_custom_filters_pairwise_retrosynthesizability.ipynb --ignore=notebooks/custom_kinfraglib/2_1_custom_filters_pipeline.ipynb"
+PYTEST_IGNORE="--ignore=notebooks/custom_kinfraglib/2_5_custom_filters_paper.ipynb --ignore=notebooks/custom_kinfraglib/1_4_custom_filters_pairwise_retrosynthesizability.ipynb --ignore=notebooks/custom_kinfraglib/2_1_custom_filters_pipeline.ipynb --ignore=notebooks/custom_kinfraglib/2_4_custom_filters_enumeration_analysis.ipynb"

pytest $PYTEST_ARGS $PYTEST_IGNORE
10 changes: 7 additions & 3 deletions data/combinatorial_library/README.md
@@ -1,18 +1,20 @@
# KinFragLib: Combinatorial library

-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17368450.svg)](https://doi.org/10.5281/zenodo.17368450)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18386001.svg)](https://doi.org/10.5281/zenodo.18386001)

This folder is meant for the metadata and properties of the KinFragLib combinatorial library, which is based on the KinFragLib fragment library at https://github.com/volkamerlab/KinFragLib. This dataset is used for the analysis of the combinatorial library.

-**Note**: Since this dataset contains large files, we provide it outside this repository at https://zenodo.org/records/17368450 (DOI: 10.5281/zenodo.17368450, v2.0.2).
+**Note**: Since this dataset contains large files, we provide it outside this repository at https://zenodo.org/records/18386001 (DOI: 10.5281/zenodo.18386001, v2.0.3).
In order to run the analysis notebooks, please download this dataset to this folder.

## Raw data

- `combinatorial_library.json`: Full combinatorial library, please refer to `notebooks/kinfraglib/4_1_combinatorial_library_data_preparation.ipynb` at https://github.com/volkamerlab/KinFragLib for detailed information about this data format
- `combinatorial_library_deduplicated.json`: Deduplicated combinatorial library (based on InChIs)
-- `chembl_standardized_inchi.csv`: Standardized ChEMBL 36 molecules in the form of InChI strings.
+- `chembl_standardized_inchi.csv`: Standardized ChEMBL36 molecules in the form of InChI strings.
- `KLIFS_download_summary.csv`: PDB codes of all KLIFS structures used to generate the KinFragLib fragmentation library.
+- `combinatorial_library_custom_sampled.sdf`: Combinatorial library created from a subset of CustomKinFragLib fragments.
+- `combinatorial_library_rejected_sampled.sdf`: Combinatorial library created from a subset of fragments rejected by the CustomKinFragLib filtering pipeline

## Processed data

@@ -26,3 +28,5 @@ Data extracted from `combinatorial_library_deduplicated.json`, performed in `not
- `chembl_exact.json`: Ligands with exact matches in ChEMBL
- `chembl_most_similar.json`: Most similar ligand in ChEMBL for each recombined ligand
- `chembl_highly_similar.json`: Most similar ligand in ChEMBL for each recombined ligand with similarity greater than 0.9.
+- `custom_enamine_search_sampled.csv`: Most similar molecule from Enamine REAL Space for each molecule in the CustomKinFragLib combinatorial library.
+- `reference_enamine_search_sampled.csv`: Most similar molecule from Enamine REAL Space for each molecule in the rejected fragments combinatorial library.
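
The CI step earlier in this diff downloads the archive and then unpacks only two named members with `tar -xvf`. A rough Python equivalent of the selective-extraction part might look like the sketch below. The helper name `extract_selected` is hypothetical and not part of KinFragLib; only the standard-library `tarfile` module is used.

```python
import tarfile
from pathlib import Path


def extract_selected(archive_path, members, destination="."):
    """Extract only the named members from a (possibly compressed) tar archive.

    `members` are paths as stored inside the archive. Returns the paths of the
    extracted files below `destination`.
    """
    # "r:*" lets tarfile detect the compression (bz2 in the case of the CI step).
    with tarfile.open(archive_path, mode="r:*") as archive:
        for name in members:
            archive.extract(name, path=destination)
    return [Path(destination) / name for name in members]


# Assumed usage, mirroring the file names in the CI step:
# extract_selected(
#     "data/combinatorial_library/combinatorial_library.tar.bz2",
#     [
#         "combinatorial_library/combinatorial_library_deduplicated.json",
#         "combinatorial_library/chembl_standardized_inchi.csv",
#     ],
# )
```

Extracting by member name avoids unpacking the large files that the tests do not need, which is the same reason the workflow passes explicit paths to `tar`.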
20 changes: 12 additions & 8 deletions kinfraglib/utils.py
@@ -1328,16 +1328,20 @@ def construct_ligand(fragment_ids, bond_ids, fragment_library):
     ed_combo = Chem.EditableMol(combo)
     replaced_dummies = []

-    atoms = combo.GetAtoms()
-
     for bond in bond_ids:

-        dummy_1 = next(
-            atom for atom in combo.GetAtoms() if atom.GetProp("fragment_atom_id") == bond[0]
-        )
-        dummy_2 = next(
-            atom for atom in combo.GetAtoms() if atom.GetProp("fragment_atom_id") == bond[1]
-        )
+        # each should be a one-element list
+        dummy_1_candidates = [atom for atom in combo.GetAtoms() if atom.GetProp("fragment_atom_id") == bond[0]]
+        dummy_2_candidates = [atom for atom in combo.GetAtoms() if atom.GetProp("fragment_atom_id") == bond[1]]
+
+        if len(dummy_1_candidates) == 0 or len(dummy_2_candidates) == 0:
+            raise RuntimeError(f'Dummy atoms for bond {bond} not found')
+        elif len(dummy_1_candidates) > 1 or len(dummy_2_candidates) > 1:
+            raise RuntimeError(f'This should not happen: Dummy atoms found for bond {bond} are ambiguous')
+
+        dummy_1 = dummy_1_candidates[0]
+        dummy_2 = dummy_2_candidates[0]
+
         atom_1 = dummy_1.GetNeighbors()[0]
         atom_2 = dummy_2.GetNeighbors()[0]

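The lookup pattern in this hunk (collect every candidate, then require exactly one) can be sketched in isolation. This is a minimal illustration, not the code in `kinfraglib/utils.py`: the helper name `find_unique` is hypothetical, and plain dicts stand in for RDKit atoms and their `fragment_atom_id` property.

```python
def find_unique(items, predicate, description="item"):
    """Return the single element of `items` for which `predicate` is true.

    Raises RuntimeError if no element matches or if more than one matches.
    """
    candidates = [item for item in items if predicate(item)]
    if not candidates:
        raise RuntimeError(f"No {description} found")
    if len(candidates) > 1:
        raise RuntimeError(
            f"Ambiguous {description}: {len(candidates)} matches found"
        )
    return candidates[0]


# Hypothetical stand-ins for atoms carrying a "fragment_atom_id" property.
atoms = [
    {"fragment_atom_id": "AP_8", "symbol": "*"},
    {"fragment_atom_id": "FP_3", "symbol": "*"},
]

dummy = find_unique(
    atoms,
    lambda atom: atom["fragment_atom_id"] == "AP_8",
    description="dummy atom AP_8",
)
```

By contrast, `next()` over a generator expression returns the first match without noticing duplicates, and raises a bare `StopIteration` when nothing matches, so the explicit check gives clearer errors in both cases.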
412 changes: 357 additions & 55 deletions notebooks/custom_kinfraglib/1_3_custom_filters_synthesizability.ipynb

Large diffs are not rendered by default.
