agentD

agentD is an open-source Python package designed to accelerate drug discovery workflows using Large Language Models (LLMs) and AI-driven tools. It provides modular agents and utilities for tasks such as literature extraction, molecular property prediction, molecule generation, and more. agentD integrates with external APIs (e.g., OpenAI, Serper) and cheminformatics libraries, enabling both automated and interactive research pipelines.

Installation

Clone the repository:

```
git clone https://github.com/hoon-ock/llm-dd.git
cd llm-dd
```

Create and activate a conda environment (recommended):

```
conda create -n agentd python=3.10 -y
conda activate agentd
```

Install the package and its dependencies in editable mode:
```
pip install -e .
```
Or, to install all dependencies directly:
```
pip install -r requirements.txt
```

Install REINVENT4 (required for some tools):

```
git clone https://github.com/MolecularAI/REINVENT4.git
cd REINVENT4
python install.py --help
python install.py cu124  # or rocm6.2.4, cpu, mac, etc.
```

Configuration

API Keys:
After installation, copy the template file and fill in your API keys:

```
cp configs/secret_keys.py.example configs/secret_keys.py
```

Then edit configs/secret_keys.py with your Serper API key and OpenAI API key:

```
# configs/secret_keys.py
serper_api_key = "YOUR_SERPER_API_KEY"
openai_api_key = "YOUR_OPENAI_API_KEY"
```
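
A quick check before launching any agent can catch keys that were never filled in. The sketch below assumes only that the keys file defines the two variables shown above; the helper name `find_unset_keys` and the rule that a value starting with `YOUR_` counts as a placeholder are illustrative assumptions, not part of agentD.

```python
import importlib.util
from pathlib import Path

# Variable names as they appear in the configs/secret_keys.py template above.
REQUIRED_KEYS = ("serper_api_key", "openai_api_key")


def find_unset_keys(path):
    """Return the required key names that are missing, empty, or still placeholders.

    Hypothetical helper (not part of agentD): it loads the file at `path`
    as a Python module and inspects the variables that module defines.
    """
    spec = importlib.util.spec_from_file_location("secret_keys", Path(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    unset = []
    for name in REQUIRED_KEYS:
        value = getattr(module, name, "")
        # Assumption: template values such as "YOUR_OPENAI_API_KEY" mean "not set".
        if not isinstance(value, str) or not value.strip() or value.startswith("YOUR_"):
            unset.append(name)
    return unset


if __name__ == "__main__":
    keys_file = Path("configs/secret_keys.py")
    if not keys_file.exists():
        print(f"{keys_file} not found; copy the .example template first.")
    else:
        missing = find_unset_keys(keys_file)
        print("All keys set." if not missing else f"Unset keys: {missing}")
```

Run it from the repository root so that the relative path `configs/secret_keys.py` resolves.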

Global Variables:
The file configs/tool_globals.py contains global variables used by the tools. You can edit this file to adjust default behaviors and settings.

Quick Start: MCP Server

agentD can be run as an MCP (Model Context Protocol) server for automated end-to-end drug discovery pipelines.

Running the Pipeline

```
# Run the full pipeline with a config file
conda run -n agentd python run_agentd.py --config pipeline_config.yaml

# Run in Q&A mode (interactive RAG-based research Q&A)
conda run -n agentd python run_agentd.py --qna --config pipeline_config.yaml
```

Pipeline Configuration

Edit pipeline_config.yaml to customize your run:

```
protein: "BCL-2"
disease: "chronic lymphocytic leukemia"
iterations: 2
num_smiles: 20       # Candidates per model (use 2-5 for testing)
run_boltz: true      # Generate 3D structures
boltz_top_k: 10
model: "gpt-4o"
```
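
Because a full run is expensive, it can help to sanity-check the config before starting. The sketch below validates a config that has already been parsed into a dictionary. The field names and types are copied from the example above; treating every field as required and requiring the integer fields to be at least 1 are assumptions, and `check_pipeline_config` is a hypothetical helper, not an agentD function.

```python
# Field names and types taken from the example pipeline_config.yaml above.
# Assumption: all fields are required; agentD itself may supply defaults.
EXPECTED_FIELDS = {
    "protein": str,
    "disease": str,
    "iterations": int,
    "num_smiles": int,
    "run_boltz": bool,
    "boltz_top_k": int,
    "model": str,
}


def check_pipeline_config(cfg):
    """Return a list of problems found in `cfg`; an empty list means none found."""
    problems = []
    for key, expected in EXPECTED_FIELDS.items():
        if key not in cfg:
            problems.append(f"missing key: {key}")
            continue
        value = cfg[key]
        # bool is a subclass of int in Python, so reject True/False for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{key} should be {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is int and value < 1:
            problems.append(f"{key} must be >= 1")
    return problems


if __name__ == "__main__":
    example = {
        "protein": "BCL-2",
        "disease": "chronic lymphocytic leukemia",
        "iterations": 2,
        "num_smiles": 20,
        "run_boltz": True,
        "boltz_top_k": 10,
        "model": "gpt-4o",
    }
    print(check_pipeline_config(example))
```

If PyYAML is installed, the dictionary can be obtained from the file with `yaml.safe_load` and passed to the checker unchanged.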

The pipeline will:

1. Extract drug information using an LLM (discovers the drug name, UniProt ID, FASTA sequence, and SMILES)
2. Pool candidate molecules using REINVENT (Mol2Mol + Reinvent models)
3. Iterate through prediction (affinity + ADMET) and LLM-driven refinement
4. Select final candidates based on drug-likeness filters (Oprea, Lipinski, Veber, Ghose, QED, pKd)
5. Generate 3D protein-ligand structures with Boltz (if enabled)
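
To give a feel for the rule-based part of the selection step, here is a minimal sketch of two of the named filters, Lipinski and Veber, applied to precomputed descriptors. It uses the commonly cited textbook thresholds; agentD's own cutoffs, descriptor calculation, and the way it combines filters may differ. The `Descriptors` class, the function names, and the numeric inputs in the usage section are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container, not an agentD type)."""
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area in square angstroms


def lipinski_violations(d):
    """Count violations of Lipinski's rule of five (textbook thresholds)."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(d, max_violations=1):
    """Common convention: at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations


def passes_veber(d):
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    candidate = Descriptors(350.0, 3.2, 2, 5, 6, 90.0)
    print(passes_lipinski(candidate), passes_veber(candidate))
```

In practice the descriptor values would come from a cheminformatics library rather than being typed in by hand.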

Results are saved in runs/<run_id>/ with boltz_candidates.csv containing the final filtered candidates.
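
After a run finishes, the output file can be inspected with the standard library alone. The sketch below locates the most recently modified directory under `runs/` and reports the column names and row count of `boltz_candidates.csv`. It makes no assumption about which columns the file contains; both helper functions are illustrative and not part of agentD.

```python
import csv
from pathlib import Path


def latest_run_dir(runs_root="runs"):
    """Return the most recently modified subdirectory of `runs_root`, or None."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)


def summarize_candidates(csv_path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, len(rows)


if __name__ == "__main__":
    run_dir = latest_run_dir()
    if run_dir is None:
        print("No run directories found under runs/.")
    else:
        candidates = run_dir / "boltz_candidates.csv"
        if candidates.exists():
            columns, count = summarize_candidates(candidates)
            print(f"{candidates}: {count} candidates, columns = {columns}")
        else:
            print(f"{candidates} not found.")
```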

Example Notebooks (v1.0 - Paper Reproduction)

Note: To reproduce the results from the paper, use release v1.0.

Example Jupyter notebooks demonstrating step-by-step workflows are in example/test_case/:

1. extraction.ipynb – Data extraction and retrieval
2. qna.ipynb – Domain-specific question answering
3. pooling.ipynb – Molecule pooling
4. prediction.ipynb – Molecular property prediction
5. refinement.ipynb – SMILES refinement
6. generation.ipynb – Protein-ligand 3D structure generation

License

This project is licensed under the MIT License.

Notes

- Make sure to set up your API keys before running any LLM agent notebooks.
- For any additional dependencies (e.g., REINVENT4), follow the instructions above.
- If you encounter missing package errors, check that all dependencies in requirements.txt are installed.

Citation

If you use agentD in your research or project, please cite:

(soon to be updated)

```
@misc{ock2025agentD,
      title={Large Language Model Agent for Modular Task Execution in Drug Discovery},
      author={Janghoon Ock and Radheesh Sharma Meda and Srivathsan Badrinarayanan and Neha S. Aluru and Achuth Chandrasekhar and Amir Barati Farimani},
      year={2025},
      eprint={2507.02925},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.02925},
}
```

Contact

For questions, suggestions, or support, please contact:
Email: jock@andrew.cmu.edu
