ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
Updated Mar 31, 2026 · Jupyter Notebook
Autoresearch with PhD-level workflows and modular agent skills. Built for the autonomous AI Scientist.
AutoR takes a research goal, runs a fixed 8-stage pipeline with Claude Code, and requires explicit human approval after every stage before the workflow can continue.
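A staged pipeline with mandatory human approval gates, as described for AutoR, can be sketched roughly as follows. This is an illustrative sketch only, not AutoR's actual API: the stage names, `run_stage`, and `run_pipeline` are all hypothetical.

```python
# Hypothetical sketch of a fixed 8-stage research pipeline with a human
# approval gate after every stage. Stage names and functions are invented
# for illustration; they are not AutoR's real interface.

STAGES = [
    "ideation", "literature_review", "proposal", "planning",
    "experiment", "analysis", "writing", "review",
]  # a fixed 8-stage pipeline

def run_stage(stage: str, goal: str) -> str:
    # Placeholder for the per-stage agent call (e.g. to Claude Code).
    return f"[{stage}] output for goal: {goal}"

def run_pipeline(goal: str, approve=input) -> list[str]:
    results = []
    for stage in STAGES:
        results.append(run_stage(stage, goal))
        # Explicit human approval is required before the workflow continues.
        answer = approve(f"Approve '{stage}' and continue? [y/N] ")
        if answer.strip().lower() != "y":
            break  # stop the pipeline if the human does not approve
    return results
```

Passing `approve` as a parameter keeps the gate testable: in production it reads from the terminal, while a test can stub it to simulate approval or rejection.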
Karpathy's autoresearch ported to MLX so you can run it on your Mac.
Lightweight research agents for proposal, experimentation, and review.
Enhance academic workflows by auditing papers, verifying citations, and analyzing experiments with a research integrity plugin for Claude Code.
A LISFLOOD-FP implementation of a model-agnostic self-calibrating loop. Inspired by AutoResearch.
General-purpose autonomous research framework for AI agents. Inspired by Andrej Karpathy's autoresearch.
Run your own research lab that never sleeps
Optimize AI agents autonomously by iterating code changes and evals to improve performance using LangSmith observability and automated experiments.
Measure AI agents’ performance with standardized tests across 314 tasks, 33 domains, and 4 difficulty levels for clear, reproducible comparison.