Applications of Artificial Intelligence in Bioinformatics
Agentic Proteomics is a Streamlit application for accession-guided structural proteomics analysis.
The app combines:
- a dropdown-first UI for selecting a protein accession from a dataset
- retrieval-augmented generation (RAG) using accession-specific rows from VNMX_LiP_DA.csv
- a LangGraph ReAct workflow for deciding when to call structure tools
- live structure-source queries against:
  - PDB
  - AlphaFold DB
  - AlphaFill
  - SWISS-MODEL / 3D-Beacons
- deterministic ranking logic to choose the best candidate structure
Given a protein accession from the dataset, the app:
- retrieves accession-specific experimental context from the CSV
- asks an LLM to reason over that context
- allows the LLM to call structure tools when needed
- collects returned structure candidates
- ranks the candidates using explicit code-based scoring
- returns the best available structure source with an explanation
The app uses a LangGraph state machine with the following major steps:
load_accession_context → react_model → tool_node → collect_candidates → validate_and_rank → finalize_answer
The ReAct loop continues until the model no longer requests tools.
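A framework-free sketch of what one pass through tool_node and collect_candidates might do, assuming each requested tool call is a dict with `name` and `args` keys (the shape LangChain uses for `AIMessage.tool_calls`) and that every tool returns a list of candidate dicts. The two tool bodies and the `run_tool_calls` helper are hypothetical; the real tools query live services, and the values below are placeholders.

```python
from typing import Any, Callable


# Hypothetical stub tools; real versions would query PDB, AlphaFold DB, etc.
def query_pdb(accession: str) -> list[dict[str, Any]]:
    # Placeholder values, not real PDB data.
    return [{"source": "PDB", "id": f"{accession}-pdb-stub", "coverage": 0.6, "resolution": 2.1}]


def query_alphafold(accession: str) -> list[dict[str, Any]]:
    # AlphaFold DB IDs look like AF-<accession>-F1; the metrics are placeholders.
    return [{"source": "AlphaFold DB", "id": f"AF-{accession}-F1", "coverage": 1.0, "confidence": 85.0}]


# Registry mapping tool names (as the model emits them) to callables.
TOOLS: dict[str, Callable[..., list[dict[str, Any]]]] = {
    "query_pdb": query_pdb,
    "query_alphafold": query_alphafold,
}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str]]:
    """Execute each requested tool call, collecting candidates and error messages."""
    candidates: list[dict[str, Any]] = []
    errors: list[str] = []
    for call in tool_calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            errors.append(f"unknown tool: {call['name']}")
            continue
        try:
            candidates.extend(tool(**call.get("args", {})))
        except Exception as exc:  # one failing source should not abort the run
            errors.append(f"{call['name']} failed: {exc}")
    return candidates, errors


calls = [
    {"name": "query_pdb", "args": {"accession": "P69905"}},
    {"name": "query_alphafold", "args": {"accession": "P69905"}},
    {"name": "query_unknown", "args": {}},
]
cands, errs = run_tool_calls(calls)
print(len(cands), errs)  # 2 ['unknown tool: query_unknown']
```

Recording failures in an error list instead of raising lets one unavailable source leave the other candidates usable, which fits the errors field kept in the graph state.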
Implementation details:
- Streamlit app
- accession selected from a dropdown built from the PG.ProteinAccessions column
- accession-specific rows retrieved from VNMX_LiP_DA.csv
- selected columns converted to compact markdown and passed to the model
- local Ollama model via ChatOllama
- tools bound to the model:
  - query_pdb
  - query_alphafold
  - query_alphafill
  - query_swiss_model
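The dropdown and RAG-context steps could look roughly like this. The sketch uses the standard-library `csv` module rather than whatever the app actually uses (pandas is likely), and the helper names, the `max_rows` cap, the columns other than PG.ProteinAccessions, and the sample rows and values are all made up for illustration. If a cell can hold several accessions (as protein-group columns sometimes do), the exact-match filter would need adjusting.

```python
import csv
import io

ACCESSION_COL = "PG.ProteinAccessions"


def _accession(row: dict[str, str]) -> str:
    # DictReader fills missing cells with None, so guard before stripping.
    return (row.get(ACCESSION_COL) or "").strip()


def accession_options(rows: list[dict[str, str]]) -> list[str]:
    """Sorted, de-duplicated, non-empty accessions for the dropdown."""
    return sorted({_accession(r) for r in rows if _accession(r)})


def rows_to_markdown(rows: list[dict[str, str]], accession: str,
                     columns: list[str], max_rows: int = 20) -> str:
    """Render the selected columns of one accession's rows as a markdown table."""
    hits = [r for r in rows if _accession(r) == accession][:max_rows]
    if not hits:
        return f"No rows found for {accession}."
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "---|" * len(columns)
    body = ["| " + " | ".join(r.get(c) or "" for c in columns) + " |" for r in hits]
    return "\n".join([header, divider, *body])


# Made-up stand-in for VNMX_LiP_DA.csv; Peptide and log2FC are hypothetical columns.
SAMPLE = """PG.ProteinAccessions,Peptide,log2FC
P69905,VGAHAGEYGAEALER,1.2
P69905,MFLSFPTTK,-0.4
P68871,VHLTPEEK,0.3
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(accession_options(rows))  # ['P68871', 'P69905']
print(rows_to_markdown(rows, "P69905", ["Peptide", "log2FC"]))
```

In the Streamlit UI, the list from `accession_options` would feed a widget such as `st.selectbox`, and the markdown string would be inserted into the model prompt as the accession context.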
Candidate structures are scored deterministically using:
- coverage
- resolution
- confidence
- ligand/cofactor context
- source-specific priority
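One way to turn these criteria into a deterministic score is a weighted sum. The weights, the resolution scaling (full credit at 1 Å, none at 4 Å or worse), the `has_ligand` flag, the priority values, and the function names are all hypothetical; this is an illustrative sketch, not the app's scoring code.

```python
from typing import Any, Optional

# Hypothetical priorities: experimental structures first, then ligand-annotated and predicted models.
SOURCE_PRIORITY = {"PDB": 1.0, "AlphaFill": 0.8, "AlphaFold DB": 0.7, "SWISS-MODEL": 0.6}


def score_candidate(c: dict[str, Any]) -> float:
    """Weighted sum of coverage, resolution, confidence, ligand context and source priority."""
    score = 40.0 * float(c.get("coverage", 0.0))  # coverage as a 0-1 fraction
    resolution = c.get("resolution")  # Å; only experimental structures have one
    if resolution is not None:
        # Full credit at 1 Å, falling linearly to zero at 4 Å or worse.
        score += 20.0 * max(0.0, min(1.0, (4.0 - resolution) / 3.0))
    confidence = c.get("confidence")  # e.g. mean pLDDT on a 0-100 scale
    if confidence is not None:
        score += 20.0 * max(0.0, min(1.0, confidence / 100.0))
    if c.get("has_ligand"):
        score += 10.0
    score += 10.0 * SOURCE_PRIORITY.get(c.get("source", ""), 0.0)
    return round(score, 3)


def pick_best(candidates: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Highest score wins; ties are broken by ID so the result is reproducible."""
    if not candidates:
        return None
    return sorted(candidates, key=lambda c: (-score_candidate(c), str(c.get("id", ""))))[0]


pdb = {"source": "PDB", "id": "pdb-stub", "coverage": 0.6, "resolution": 2.0, "has_ligand": True}
af = {"source": "AlphaFold DB", "id": "AF-P69905-F1", "coverage": 1.0, "confidence": 90.0}
print(score_candidate(pdb), score_candidate(af))  # 57.333 65.0
print(pick_best([pdb, af])["id"])  # AF-P69905-F1
```

With these illustrative weights, a full-length, high-confidence predicted model outranks a partial-coverage experimental structure; shifting weight toward resolution or source priority would reverse that, which is why keeping the weights explicit in code makes the ranking auditable.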
The graph state tracks:
- accession
- rag_context
- messages
- structure_candidates
- best_candidate
- final_answer
- errors
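The state could be declared as a `TypedDict`, which is how LangGraph state schemas are commonly written. The `Annotated[..., operator.add]` reducer is the LangGraph convention for fields whose updates should be appended rather than overwritten (LangGraph also ships an `add_messages` reducer often used for `messages`); whether the app uses reducers this way is an assumption. `ProteomicsState` and `merge_update` are hypothetical names, and `merge_update` is a hand-rolled illustration of the merge semantics, not LangGraph's implementation.

```python
import operator
from typing import Annotated, Any, Optional, TypedDict, get_origin, get_type_hints


class ProteomicsState(TypedDict, total=False):
    accession: str                      # selected from PG.ProteinAccessions
    rag_context: str                    # compact markdown built from VNMX_LiP_DA.csv rows
    messages: list[Any]                 # model messages, including tool calls and results
    structure_candidates: Annotated[list[dict[str, Any]], operator.add]  # appended each tool round
    best_candidate: Optional[dict[str, Any]]
    final_answer: str
    errors: Annotated[list[str], operator.add]


def merge_update(state: ProteomicsState, update: dict[str, Any]) -> ProteomicsState:
    """Apply a node's partial update: reducer fields are combined, others overwritten."""
    hints = get_type_hints(ProteomicsState, include_extras=True)
    merged: dict[str, Any] = dict(state)
    for key, value in update.items():
        hint = hints.get(key)
        if get_origin(hint) is Annotated and key in merged:
            reducer = hint.__metadata__[0]  # operator.add for the list fields above
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return ProteomicsState(**merged)


state = ProteomicsState(accession="P69905", structure_candidates=[], errors=[])
state = merge_update(state, {"structure_candidates": [{"id": "AF-P69905-F1"}]})
state = merge_update(state, {"structure_candidates": [{"id": "pdb-stub"}], "errors": ["example error"]})
print(len(state["structure_candidates"]), state["errors"])  # 2 ['example error']
```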
Graph flow:
START
↓
load_accession_context
↓
react_plan
├── if tools requested → execute_tools → back to react_plan
└── if done / no tools → validate_and_rank
↓
validate_and_rank
↓
finalize_answer
↓
END
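The conditional edge after react_plan and the surrounding loop can be sketched in plain Python, using the node names from the diagram. The `demo_*` nodes are hypothetical stubs, and `max_steps` is an assumed safeguard against a model that never stops requesting tools (LangGraph enforces a similar bound through its `recursion_limit` setting).

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


def route_after_plan(state: State) -> str:
    """Conditional edge: go to execute_tools if the last model message requested tools."""
    last = state["messages"][-1] if state.get("messages") else {}
    return "execute_tools" if last.get("tool_calls") else "validate_and_rank"


def run_graph(state: State, react_plan: Node, execute_tools: Node,
              validate_and_rank: Node, finalize_answer: Node, max_steps: int = 8) -> State:
    for _ in range(max_steps):
        state = react_plan(state)
        if route_after_plan(state) == "validate_and_rank":
            break
        state = execute_tools(state)
    else:  # the loop ran out before the model stopped requesting tools
        state = {**state, "errors": state.get("errors", []) + ["tool loop hit max_steps"]}
    return finalize_answer(validate_and_rank(state))


# Hypothetical stub nodes: the first plan requests a tool, the second answers directly.
def demo_plan(state: State) -> State:
    rounds = state.get("rounds", 0)
    msg = ({"tool_calls": [{"name": "query_alphafold", "args": {}}]} if rounds == 0
           else {"content": "no more tools needed"})
    return {**state, "rounds": rounds + 1, "messages": state["messages"] + [msg]}


def demo_tools(state: State) -> State:
    new = [{"id": "AF-P69905-F1", "score": 65.0}]  # placeholder candidate
    return {**state, "structure_candidates": state["structure_candidates"] + new}


def demo_rank(state: State) -> State:
    best = max(state["structure_candidates"], key=lambda c: c["score"], default=None)
    return {**state, "best_candidate": best}


def demo_finalize(state: State) -> State:
    best = state["best_candidate"]
    answer = f"Best structure: {best['id']}" if best else "No structure found."
    return {**state, "final_answer": answer}


result = run_graph({"accession": "P69905", "messages": [], "structure_candidates": [], "errors": []},
                   demo_plan, demo_tools, demo_rank, demo_finalize)
print(result["final_answer"], result["rounds"])  # Best structure: AF-P69905-F1 2
```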