ZhaohanM/SSE-Bio

SSE-Bio

🧠 Structured self-evolving biomedical multi-hop reasoning with adaptive retrieval

SSE-Bio is an agent framework for biomedical multi-hop question answering. It is designed for settings where a model must resolve intermediate entities, retrieve supporting evidence only when needed, and refine its reasoning process without drifting into unconstrained prompt rewriting.

✨ Highlights

  • Structured self-evolution rather than free-form workflow rewriting
  • Adaptive retrieval over knowledge triplets and prior templates
  • Proxy-only training with SFT -> GRPO
  • Biomedical multi-hop QA support for BioHopR, MedHop, and Humanity's Last Exam (HLE): Biomedicine

🧩 Method at a Glance

SSE-Bio is built around four components:

| Component | Responsibility |
| --- | --- |
| Manager | Maintains the structured state summary as short-term memory, reuses template memory as long-term memory, and converts the current state into a query-specific plan |
| Proxy | Explicitly controls retrieval by deciding whether knowledge triplets and/or prior templates should be retrieved at the current step |
| Execution (Dev) | Executes the current plan with the retrieved evidence, and produces the current reasoning trajectory and answer candidate |
| Critic | Assesses whether the trajectory and answer are coherent and sufficiently supported, and returns structured feedback for refinement |
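
As a rough illustration of how the four roles could interact, the sketch below wires them together as plain Python callables. Every name in it (`State`, `RetrievalAction`, `run_episode`, the callable signatures, and `max_rounds`) is hypothetical and is not taken from the `sse_bio` package; the actual loop lives in `sse_bio/system.py` and may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RetrievalAction:
    """Hypothetical Proxy output: which sources to retrieve at this step."""
    use_triplets: bool
    use_templates: bool


@dataclass
class State:
    """Hypothetical structured state kept by the Manager."""
    question: str
    summary: str = ""
    feedback: List[str] = field(default_factory=list)


def run_episode(
    question: str,
    manager: Callable[[State], str],
    proxy: Callable[[State], RetrievalAction],
    executor: Callable[[str, RetrievalAction], str],
    critic: Callable[[State, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], State]:
    """Plan, route retrieval, execute, and critique until accepted."""
    state = State(question=question)
    answer: Optional[str] = None
    for _ in range(max_rounds):
        plan = manager(state)            # query-specific plan from the state
        action = proxy(state)            # retrieval routing decision
        answer = executor(plan, action)  # answer candidate
        accepted, note = critic(state, answer)
        if accepted:
            break
        state.feedback.append(note)      # structured feedback for the next round
    return answer, state
```

Passing the roles in as callables keeps the sketch independent of any particular model backend; in the real system each role wraps a model call.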

Core Commitments

  1. Structural constraints prevent self-evolution from drifting into unconstrained prompt mutation.
  2. Knowledge triplets ground each step in biomedical evidence.
  3. Prior templates store reusable reasoning guidance rather than factual shortcuts.
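
To make the role of knowledge triplets concrete, here is a minimal in-memory sketch of multi-hop lookup over (head, relation, tail) facts. The `Triplet` and `TripletIndex` classes, the `two_hop` helper, and the relation names are illustrative assumptions; they do not describe the interface of `sse_bio/triplet_store.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Triplet(NamedTuple):
    head: str
    relation: str
    tail: str


class TripletIndex:
    """Hypothetical index that groups triplets by lower-cased head entity."""

    def __init__(self, triplets: Iterable[Triplet]) -> None:
        self._by_head: Dict[str, List[Triplet]] = defaultdict(list)
        for triplet in triplets:
            self._by_head[triplet.head.lower()].append(triplet)

    def neighbors(self, entity: str, relation: Optional[str] = None) -> List[Triplet]:
        """Return triplets whose head is `entity`, optionally filtered by relation."""
        hits = self._by_head.get(entity.lower(), [])
        return [t for t in hits if relation is None or t.relation == relation]


def two_hop(index: TripletIndex, start: str, rel1: str, rel2: str) -> List[str]:
    """Resolve the intermediate entity via rel1, then follow rel2 from it."""
    answers: List[str] = []
    for first in index.neighbors(start, rel1):
        for second in index.neighbors(first.tail, rel2):
            answers.append(second.tail)
    return answers


# Toy facts for illustration only; relation names are made up for this sketch.
index = TripletIndex([
    Triplet("metformin", "treats", "type 2 diabetes"),
    Triplet("type 2 diabetes", "has_phenotype", "hyperglycemia"),
])
print(two_hop(index, "Metformin", "treats", "has_phenotype"))
```

Each hop is answered from a stored fact rather than from model recall, which is the sense in which triplets ground a reasoning step.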

🔄 Inference Flow

[Figure: SSE-Bio framework]

The key idea is local repair. Instead of rewriting the whole reasoning scaffold after a failure, SSE-Bio revises only the current state, the routing decision, or a template-level constraint.
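
One way to picture local repair is as an update that touches exactly one field of an otherwise immutable scaffold. The sketch below assumes a hypothetical `Scaffold` record and a `local_repair` function; neither name comes from the repository, and the real update operators in `sse_bio/structure.py` may be richer.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical reasoning scaffold with three repairable parts."""
    state_summary: str
    routing: Tuple[bool, bool]              # (retrieve triplets, retrieve templates)
    template_constraints: Tuple[str, ...]


def local_repair(
    scaffold: Scaffold,
    target: str,
    value: Union[str, Tuple[bool, bool]],
) -> Scaffold:
    """Return a copy of the scaffold with only the targeted part revised."""
    if target == "state":
        return replace(scaffold, state_summary=value)
    if target == "routing":
        return replace(scaffold, routing=value)
    if target == "template":
        # Constraints accumulate instead of overwriting earlier guidance.
        return replace(
            scaffold,
            template_constraints=scaffold.template_constraints + (value,),
        )
    raise ValueError(f"unknown repair target: {target!r}")
```

Because the record is frozen and each repair returns a new copy, a failed round cannot silently rewrite parts of the scaffold that the feedback did not target.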

🧪 Training

Only the Proxy is trained. The Manager, Execution, Critic, retrievers, and reasoning environment remain fixed.

Stage 1: Supervised Fine-Tuning

The Proxy is initialized by supervised fine-tuning on retrieval-decision pseudo-labels. For a given structured state, the system compares alternative retrieval branches and uses the action with the highest downstream composite reward as the supervision target.
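
The pseudo-labeling step can be sketched as an argmax over candidate retrieval actions. The action encoding, the `reward_fn` callback, and the output record shape below are assumptions made for illustration; they are not the format produced by `run_proxy_sft.py build-data`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action space: (retrieve triplets, retrieve templates).
Action = Tuple[bool, bool]
ACTIONS: List[Action] = [(False, False), (True, False), (False, True), (True, True)]


def pseudo_label(state: str, reward_fn: Callable[[str, Action], float]) -> Action:
    """Pick the retrieval action with the highest downstream reward.

    `max` returns the first maximal element, so ties resolve toward the
    action listed earliest in ACTIONS (here, the one that retrieves less).
    """
    return max(ACTIONS, key=lambda action: reward_fn(state, action))


def build_sft_example(
    state: str, reward_fn: Callable[[str, Action], float]
) -> Dict[str, object]:
    """Pair a structured state with its pseudo-labeled retrieval decision."""
    best = pseudo_label(state, reward_fn)
    return {
        "state": state,
        "target": {"retrieve_triplets": best[0], "retrieve_templates": best[1]},
    }
```

In practice `reward_fn` would run the fixed Manager, Execution, and Critic on each branch, so labeling cost grows with the number of candidate actions per state.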

Stage 2: GRPO

The Proxy is then refined with Group Relative Policy Optimization (GRPO) over decision-contrastive trajectory groups. Alternative retrieval actions are expanded from the same structured state, partially pruned by an intermediate answer-grounded reward, and the surviving branches are then optimized relative to one another.
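
A simplified sketch of the two numeric steps in this stage follows: pruning a group of branches by intermediate reward, then computing group-relative advantages by standardizing rewards within the group, as in the usual GRPO formulation. The top-k pruning rule, the `keep` parameter, and both function names are assumptions; the source does not specify how pruning is done.

```python
import statistics
from typing import List, Sequence


def prune(
    branches: Sequence[str], intermediate_rewards: Sequence[float], keep: int
) -> List[str]:
    """Keep the `keep` branches with the highest intermediate reward.

    Surviving branches are returned in their original order.
    """
    ranked = sorted(
        range(len(branches)), key=lambda i: intermediate_rewards[i], reverse=True
    )
    kept = sorted(ranked[:keep])
    return [branches[i] for i in kept]


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because every branch in a group starts from the same structured state, the advantage reflects the retrieval decision itself and not the difficulty of the question. A group whose rewards are all equal yields zero advantages and therefore no gradient signal.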

Reward Signal

The training reward combines two signals:

  • final answer correctness
  • evidence-supported reasoning behavior

This encourages retrieval decisions that are both effective and grounded.
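
As one possible reading of such a combined reward, the sketch below takes a weighted sum of exact-match correctness and the fraction of reasoning steps backed by retrieved evidence. The weighting scheme, the default `weight_answer`, the exact-match criterion, and the per-step support flags are all assumptions; the actual reward definitions are in `sse_bio/training/` and may differ.

```python
from typing import Sequence


def composite_reward(
    predicted: str,
    gold: str,
    steps_supported: Sequence[bool],
    weight_answer: float = 0.5,
) -> float:
    """Blend answer correctness with evidence support (hypothetical weights)."""
    # Case-insensitive exact match stands in for the benchmark's own metric.
    correctness = 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0
    # Share of reasoning steps flagged as supported by retrieved evidence.
    support = (
        sum(steps_supported) / len(steps_supported) if steps_supported else 0.0
    )
    return weight_answer * correctness + (1.0 - weight_answer) * support
```

Under this sketch a correct answer reached with unsupported steps scores lower than the same answer with every step grounded, which matches the stated goal of rewarding decisions that are both effective and grounded.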

📚 Benchmarks

SSE-Bio includes evaluation entrypoints for:

  • BioHopR
  • MedHop
  • Humanity's Last Exam: Biomedicine

🚀 Quick Start

Installation

uv sync
source .venv/bin/activate

Configuration

Two configs are included:

  • config.toml.example — default full configuration
  • config.opensource.toml — open-source runnable configuration

Run One Example

python run_sse_bio.py run \
  "Name all diseases related to a phenotype associated with a given drug." \
  --triplets-path path/to/biomedical_triplets.jsonl \
  --config config.opensource.toml

Evaluate on BioHopR

python run_biohopr_eval.py evaluate data/biohopr_bundle \
  --triplets-path path/to/biomedical_triplets.jsonl \
  --config config.opensource.toml \
  --output-path outputs/biohopr_eval.jsonl

Train the Proxy

Build SFT data:

python run_proxy_sft.py build-data data/biohopr_bundle \
  --split train \
  --output-path data/proxy_train.jsonl

Train SFT:

python run_proxy_sft.py train data/proxy_train.jsonl \
  --model Qwen/Qwen2.5-72B-Instruct \
  --output-dir outputs/proxy_sft

Build GRPO data:

python run_proxy_grpo.py build-data data/biohopr_bundle \
  --split train \
  --output-path data/proxy_grpo.jsonl

Train GRPO:

python run_proxy_grpo.py train data/proxy_grpo.jsonl \
  --model outputs/proxy_sft \
  --output-dir outputs/proxy_grpo

🗂 Repository Layout

| Path | Purpose |
| --- | --- |
| sse_bio/ | Core package |
| sse_bio/system.py | End-to-end inference loop |
| sse_bio/agents.py | Manager, proxy, execution, and critic wrappers |
| sse_bio/structure.py | Structured controller and local update operators |
| sse_bio/experience_manager.py | Prior template retrieval and persistence |
| sse_bio/triplet_store.py | Biomedical triplet ingestion and retrieval |
| sse_bio/training/ | Proxy SFT, GRPO, rewards, and training-data export |
| sse_bio/eval/ | Benchmark runners and metrics |
| scripts/data/ | Dataset download helpers |
| scripts/hpc/ | Generic cluster launch scripts for proxy training |
📎 Citation

If you use SSE-Bio in academic work, please cite the corresponding paper.

About

SSE-Bio: A Structured Self-Evolving Agent with Agentic Retrieval Policy for Multi-Hop Biomedical Reasoning.
