agentD is an open-source Python package designed to accelerate drug discovery workflows using Large Language Models (LLMs) and AI-driven tools. It provides modular agents and utilities for tasks such as literature extraction, molecular property prediction, molecule generation, and more. agentD integrates with external APIs (e.g., OpenAI, Serper) and cheminformatics libraries, enabling both automated and interactive research pipelines.
- Clone the repository:
    git clone https://github.com/hoon-ock/llm-dd.git
    cd llm-dd
- Create and activate a conda environment (recommended):
    conda create -n agentd python=3.10 -y
    conda activate agentd
- Install dependencies in editable mode:
    pip install -e .
  Or, to install all dependencies directly:
    pip install -r requirements.txt
- Install REINVENT4 (required for some tools):
    git clone https://github.com/MolecularAI/REINVENT4.git
    cd REINVENT4
    python install.py --help
    python install.py cu124  # or rocm6.2.4, cpu, mac, etc.
- API Keys:
  After installation, copy the template file:
    cp configs/secret_keys.py.example configs/secret_keys.py
  Then edit configs/secret_keys.py with your Serper API key and OpenAI API key:
    # configs/secret_keys.py
    serper_api_key = "YOUR_SERPER_API_KEY"
    openai_api_key = "YOUR_OPENAI_API_KEY"
- Global Variables:
  The file configs/tool_globals.py contains global variables used by the tools. You can edit this file to adjust default behaviors and settings.
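Before running any agent, it can help to confirm that the placeholders from the API key step above were actually replaced. The following is a minimal sketch, not part of agentD: the helper name `find_unset_keys` is hypothetical, and it assumes only the two variable names shown in the template (`serper_api_key` and `openai_api_key`).

```python
import importlib.util
from pathlib import Path

# Variable names taken from the configs/secret_keys.py template.
REQUIRED_KEYS = ("serper_api_key", "openai_api_key")


def find_unset_keys(path="configs/secret_keys.py"):
    """Return the names of keys that are missing, empty, or still placeholders.

    Hypothetical helper for illustration; it is not part of agentD.
    """
    path = Path(path)
    if not path.is_file():
        # No config file at all: every key counts as unset.
        return list(REQUIRED_KEYS)

    # Load the file as a throwaway module without touching sys.path.
    spec = importlib.util.spec_from_file_location("secret_keys_check", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    unset = []
    for name in REQUIRED_KEYS:
        value = getattr(module, name, "")
        # Treat non-strings, blanks, and "YOUR_..." template values as unset.
        if not isinstance(value, str) or not value.strip() or value.startswith("YOUR_"):
            unset.append(name)
    return unset


if __name__ == "__main__":
    missing = find_unset_keys()
    if missing:
        print("Set these keys in configs/secret_keys.py:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Run it from the repository root so that the relative path configs/secret_keys.py resolves. The check only looks at the values locally and never sends them anywhere.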
AgentD can be run as an MCP (Model Context Protocol) server for automated end-to-end drug discovery pipelines.
# Run full pipeline with config
conda run -n agentd python run_agentd.py --config pipeline_config.yaml
# Run in Q&A mode (interactive RAG-based research Q&A)
conda run -n agentd python run_agentd.py --qna --config pipeline_config.yaml
Edit pipeline_config.yaml to customize your run:
protein: "BCL-2"
disease: "chronic lymphocytic leukemia"
iterations: 2
num_smiles: 20 # Candidates per model (use 2-5 for testing)
run_boltz: true # Generate 3D structures
boltz_top_k: 10
model: "gpt-4o"
The pipeline will:
- Extract drug information using an LLM (discovers drug name, UniProt ID, FASTA, SMILES)
- Pool candidate molecules using REINVENT (Mol2Mol + Reinvent models)
- Iterate through prediction (affinity + ADMET) and LLM-driven refinement
- Select final candidates based on drug-likeness filters (Oprea, Lipinski, Veber, Ghose, QED, pKd)
- Generate 3D protein-ligand structures with Boltz (if enabled)
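To make the final selection step more concrete, here is a minimal sketch of two of the named filters, Lipinski's rule of five and the Veber rule, applied to precomputed descriptors. It is not agentD's implementation: the function names and dictionary keys are hypothetical, the descriptor values are made up for illustration, and this version requires all four Lipinski criteria to hold, whereas some tools tolerate one violation. In practice the descriptors would be computed by a cheminformatics library.

```python
def passes_lipinski(d):
    """Lipinski's rule of five: MW <= 500, logP <= 5, H-bond donors <= 5, acceptors <= 10."""
    return (
        d["mol_weight"] <= 500
        and d["logp"] <= 5
        and d["h_donors"] <= 5
        and d["h_acceptors"] <= 10
    )


def passes_veber(d):
    """Veber rule: rotatable bonds <= 10 and polar surface area <= 140 square angstroms."""
    return d["rotatable_bonds"] <= 10 and d["tpsa"] <= 140


def select_candidates(candidates):
    """Keep only the candidates that pass both filters."""
    return [c for c in candidates if passes_lipinski(c) and passes_veber(c)]


# Made-up descriptor values, for illustration only.
candidates = [
    {"name": "cand_a", "mol_weight": 342.4, "logp": 2.1, "h_donors": 2,
     "h_acceptors": 5, "rotatable_bonds": 4, "tpsa": 78.0},
    {"name": "cand_b", "mol_weight": 868.4, "logp": 8.2, "h_donors": 3,
     "h_acceptors": 11, "rotatable_bonds": 13, "tpsa": 180.0},
]

print([c["name"] for c in select_candidates(candidates)])  # prints ['cand_a']
```

The other criteria listed above (Oprea, Ghose, QED, pKd) could be added as further predicate functions in the same style.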
Results are saved in runs/<run_id>/ with boltz_candidates.csv containing the final filtered candidates.
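A minimal sketch for loading the final candidates from the most recent run, using only the standard library. It assumes the runs/<run_id>/boltz_candidates.csv layout described above and picks the newest file by modification time. The function name is hypothetical, and no column names are assumed because the CSV schema is not documented here.

```python
import csv
from pathlib import Path


def load_latest_candidates(runs_dir="runs"):
    """Return (run_dir, rows) for the newest run that has a boltz_candidates.csv.

    Hypothetical helper for illustration; it is not part of agentD.
    """
    csv_paths = list(Path(runs_dir).glob("*/boltz_candidates.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No boltz_candidates.csv found under {runs_dir}")

    # Newest file by modification time; run IDs are not assumed to sort by date.
    latest = max(csv_paths, key=lambda p: p.stat().st_mtime)

    with latest.open(newline="") as handle:
        # Each row becomes a dict keyed by the CSV header; values stay as strings.
        rows = list(csv.DictReader(handle))
    return latest.parent, rows
```

Calling `run_dir, rows = load_latest_candidates()` from the repository root returns the run directory and one dictionary per candidate. Inspecting `rows[0].keys()` shows which columns a given run actually produced.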
Note: To reproduce results from the paper, use release v1.0.
Example Jupyter notebooks demonstrating step-by-step workflows are in example/test_case/:
1. extraction.ipynb – Data extraction and retrieval
2. qna.ipynb – Domain-specific question answering
3. pooling.ipynb – Molecule pooling
4. prediction.ipynb – Molecular property prediction
5. refinement.ipynb – SMILES refinement
6. generation.ipynb – Protein-ligand 3D structure generation
This project is licensed under the MIT License.
- Make sure to set up your API keys before running any LLM agent notebooks.
- For any additional dependencies (e.g., REINVENT4), follow the instructions above.
- If you encounter missing package errors, check that all dependencies in requirements.txt are installed.
If you use agentD in your research or project, please cite:
(soon to be updated)
@misc{ock2025agentD,
title={Large Language Model Agent for Modular Task Execution in Drug Discovery},
author={Janghoon Ock and Radheesh Sharma Meda and Srivathsan Badrinarayanan and Neha S. Aluru and Achuth Chandrasekhar and Amir Barati Farimani},
year={2025},
eprint={2507.02925},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2507.02925},
}
For questions, suggestions, or support, please contact:
Email: jock@andrew.cmu.edu
