GitHub - stanuch/BioLogic: Collection of bioinformatics algorithms, data pipelines, and structural analysis tools implemented in Python

This repository contains implementations of algorithms, data pipelines, and analysis tools developed during the Bioinformatics 2 curriculum at the Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University. The focus is on the mathematical implementation of biological algorithms and on efficient data processing.

Overview

A major component of the coursework involved the prediction and mathematical validation of macromolecular structures. This includes an implementation of the Nussinov algorithm, which utilizes dynamic programming for RNA secondary structure prediction based on base-pair maximization.

To validate structural models, the Kabsch algorithm was implemented using Singular Value Decomposition (SVD) to calculate the Root-Mean-Square Deviation (RMSD) between atomic coordinates. Additionally, the projects utilize the PyMOL API to calculate validation metrics such as TM-scores and GDT_TS. Protein threading was performed using the I-TASSER suite.

Project Modules

Module	Topic	Key Techniques
01_Biological_Databases	Data Access & Chemoinformatics	PDB/PubChem/NCBI APIs, TF-IDF, RDKit
02_Sequence_Analysis	Sequence Alignment & Visualization	BLAST, Polars, Matplotlib
03_Genome_Analysis	Restriction Site Analysis	Biopython, E. coli genome
04_Genome_Assembly	De Novo Assembly	De Bruijn graphs, k-mers
05_Protein_Domains	Domain Architecture	PyHMMER, Pfam, HMM profiles
06_Protein_Structure_Prediction	Structure Modeling	I-TASSER, AlphaFold 2/3
07_Structure_Validation	Model Quality Assessment	Kabsch algorithm, RMSD, TM-score
08_RNA_Structure	Secondary Structure Prediction	Nussinov algorithm, Dynamic Programming
09_scRNA_Seq	Single-Cell Transcriptomics	R, Seurat-style analysis
10_Crystallography	Electron Density Analysis	Crystallographic maps

Data Engineering and Chemoinformatics

This section contains scripts for acquiring and processing biological data from public repositories (NCBI E-utils, PDB, PubChem). I used the Polars library to handle larger datasets more efficiently than with pandas.

The work also includes:

NLP techniques: Applying TF-IDF vectorization to scientific literature for MeSH term analysis.
Chemoinformatics: Using RDKit to compute physicochemical properties and visualize molecular structures.
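
To illustrate the TF-IDF step mentioned above, here is a minimal standard-library sketch. It is not the code from the notebooks: the function name `tfidf`, the whitespace tokenizer, the smoothed IDF variant, and the three toy abstracts are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: term frequency normalised by document length,
    multiplied by the smoothed IDF ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Toy abstracts invented for illustration only.
abstracts = [
    "protein folding kinetics",
    "protein structure prediction",
    "rna structure prediction",
]
scores = tfidf(abstracts)
```

In this toy corpus "folding" occurs in one abstract and "protein" in two, so "folding" receives the higher weight in the first document, which is the behaviour that makes TF-IDF useful for picking out distinctive terms.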

Genomics and Sequence Analysis

These projects focus on the logic of genome assembly. I implemented De Bruijn graphs to demonstrate how short sequencing reads are broken into k-mers and assembled into contigs.
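
As a rough sketch of the idea (not the repository's implementation), a De Bruijn graph can be built by splitting every read into k-mers and linking each k-mer's prefix to its suffix. The function names, the toy reads, and the choice of `k = 4` below are assumptions for illustration; real assemblers also handle sequencing errors, reverse complements, and branching paths.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers it points to."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while each node has exactly one distinct successor."""
    contig = start
    node = start
    visited = {start}
    while len(set(graph.get(node, []))) == 1:
        node = graph[node][0]
        if node in visited:  # stop on cycles
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Toy overlapping reads invented for illustration.
reads = ["ATGCG", "GCGTA", "CGTAC"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
print(contig)  # ATGCGTAC
```

The walk stops as soon as a node has zero or several distinct successors, which is where a real assembler would have to decide how to resolve a branch.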

For protein domain analysis, I used PyHMMER to query the Pfam database. This involved applying Hidden Markov Models (HMMs) and statistical filtering (E-values, sequence coverage) to correctly identify domain architectures.
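
The filtering step can be sketched independently of PyHMMER. The `DomainHit` record below is a hypothetical container, not a PyHMMER type, and the thresholds (`1e-5`, `0.5`) are placeholder assumptions. Coverage is taken here as the fraction of the profile spanned by the alignment; measuring it against the query sequence instead would work the same way.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical hit record; field names are assumptions, not PyHMMER's API."""
    domain: str
    evalue: float
    model_from: int    # 1-based, inclusive start on the profile
    model_to: int      # 1-based, inclusive end on the profile
    model_length: int  # total number of match states in the profile

    @property
    def coverage(self):
        # Fraction of the profile covered by the aligned region.
        return (self.model_to - self.model_from + 1) / self.model_length


def filter_hits(hits, max_evalue=1e-5, min_coverage=0.5):
    """Keep hits that pass both the E-value and the coverage threshold."""
    return [
        hit for hit in hits
        if hit.evalue <= max_evalue and hit.coverage >= min_coverage
    ]


# Made-up hits for illustration only.
hits = [
    DomainHit("domain_A", 1e-30, 1, 250, 260),  # significant, well covered
    DomainHit("domain_B", 1e-3, 1, 250, 260),   # fails the E-value cut
    DomainHit("domain_C", 1e-12, 10, 25, 48),   # fails the coverage cut
]
kept = [hit.domain for hit in filter_hits(hits)]
```

Applying both criteria together discards weak matches as well as short fragmentary alignments that have a good E-value but cover only a small part of the domain.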

Structural Bioinformatics

This part covers the prediction and validation of macromolecular structures, including protein threading (I-TASSER) and a custom implementation of the Nussinov algorithm for RNA secondary structure prediction.
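
For reference, the base-pair maximization recurrence behind the Nussinov algorithm can be sketched as follows. This is an illustrative version rather than the code in `08_RNA_Structure`: the function name, the inclusion of G-U wobble pairs, and the `min_loop` parameter are assumptions, and only the optimal pair count is returned (no traceback).

```python
def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs in `seq`.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    # Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = nussinov_max_pairs("GGGAAACCC", min_loop=3)
print(hairpin_pairs)  # 3
```

The table is filled by increasing subsequence length, so every value the recurrence reads has already been computed. The running time is cubic in the sequence length.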

To validate structural models, I implemented the Kabsch algorithm using Singular Value Decomposition (SVD) to calculate RMSD between atomic coordinates. Additionally, I utilized the PyMOL API to compute TM-scores and GDT_TS for comparing predicted models against experimental data.
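
The superposition step can be sketched with NumPy as shown below. This is a minimal illustration under stated assumptions, not the repository's code: the function name is hypothetical, both inputs are assumed to be N x 3 arrays with atoms in matching order, and the test coordinates are invented.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q after optimal rigid superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # 3 x 3 covariance matrix and its singular value decomposition.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P_c @ R.T  # apply the rotation to row-vector coordinates
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


# Invented coordinates: Q is P rotated 90 degrees about z and then translated.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
rmsd = kabsch_rmsd(P, Q)  # close to zero, because Q is a rigid copy of P
```

The determinant check matters in practice: without it, the SVD solution can be a reflection, which would report an artificially low RMSD for mirror-image structures.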

Technical Stack

Category	Tools
Data Analysis	Polars, NumPy, SciPy, Scikit-learn
Bioinformatics	Biopython, PyHMMER, RDKit, ViennaRNA
Structural Tools	PyMOL API, I-TASSER, AlphaFold
Visualization	Matplotlib, Seaborn
Environment	Jupyter Notebooks, R Markdown

Disclaimer and Missing Data Files

Warning

This repository has been reorganized from its original academic structure into a thematic portfolio. As a result, some relative file paths inside the notebooks (.ipynb) and scripts (.py) may reference directories that have been moved or renamed.

While this might prevent some scripts from executing "out of the box" without path adjustments, the logic, implementation details, and code structure remain fully intact. The notebooks were also translated from Polish to English, so some phrasing may sound unnatural or be incorrect. Some leftovers of my beautiful Polish language may still be found in the code.

Some large data files are excluded from this repository. Each module's README contains instructions for obtaining necessary files. Key exclusions:

File	Module	Source
`e_coli.fasta`	03_Genome_Analysis	NCBI
Protein FASTAs	05_Protein_Domains	UniProt batch download
`*.cif` files	06_Protein_Structure_Prediction	AlphaFold and I-TASSER
scRNA data	09_scRNA_Seq	Course materials
`*.map` files	10_Crystallography	PDB electron density server

License

My solutions, code, and documentation are licensed under the MIT License.
Course materials and exercise descriptions (embedded in notebooks) remain the intellectual property of their respective authors at Jagiellonian University and are included here for educational and portfolio purposes with attribution.

Maintained by Aleksander Stanuch — Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University.
