The Real-Time Dataset Generator is an advanced agent designed to automate the creation of datasets for evaluating web-augmented AI agents. By generating domain-specific queries, collecting real-time web data, and filtering results, it streamlines the evaluation process for LLM-based agents.
This node dynamically generates targeted web search queries for each subject in the provided list to ensure that subsequent retrieval processes focus on the most relevant information.
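As a rough illustration of per-subject query generation, the sketch below fills a few fixed templates for every subject. The templates and the `generate_queries` helper are hypothetical stand-ins; the actual node presumably asks an LLM to write the queries, and nothing here reflects the project's real implementation.

```python
from typing import Dict, List

# Hypothetical templates used only for illustration; the real node
# presumably prompts an LLM rather than filling fixed strings.
QUERY_TEMPLATES = [
    "latest {subject} news",
    "{subject} results this week",
    "{subject} recent statistics",
]


def generate_queries(subjects: List[str], per_subject: int = 3) -> Dict[str, List[str]]:
    """Return up to `per_subject` search queries for each subject."""
    queries: Dict[str, List[str]] = {}
    for subject in subjects:
        filled = [template.format(subject=subject) for template in QUERY_TEMPLATES]
        queries[subject] = filled[:per_subject]
    return queries


if __name__ == "__main__":
    for subject, subject_queries in generate_queries(["Sports", "Stocks"]).items():
        print(subject, subject_queries)
```

Keeping the output keyed by subject makes it straightforward for a later retrieval step to track which subject each search result belongs to.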
Leverages the Tavily Search API, together with optional additional search providers, to retrieve web pages and documents for each subject. Using multiple providers helps reduce bias and improves coverage. This node serves as the foundation of the retrieval process, delivering the context needed to generate high-quality question-answer pairs.
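One way to combine several search providers is to treat each as a callable and merge their results while dropping duplicate URLs. The sketch below assumes a simple provider interface (a function from query string to a list of dicts with a `url` key); that interface, `search_all`, and the two canned providers are illustrative assumptions, not the project's API. A real provider would wrap the Tavily client or another search service.

```python
from typing import Callable, Dict, Iterable, List

# Assumed interface: a provider maps a query to result dicts containing "url".
SearchProvider = Callable[[str], List[Dict[str, str]]]


def search_all(query: str, providers: Iterable[SearchProvider]) -> List[Dict[str, str]]:
    """Query every provider and merge results, keeping the first hit per URL."""
    seen = set()
    merged: List[Dict[str, str]] = []
    for provider in providers:
        for result in provider(query):
            url = result["url"].rstrip("/")  # treat trailing-slash variants as equal
            if url in seen:
                continue
            seen.add(url)
            merged.append(result)
    return merged


# Stand-in providers returning canned results (hypothetical data).
def provider_a(query: str) -> List[Dict[str, str]]:
    return [
        {"url": "https://example.com/a", "content": f"A result for {query}"},
        {"url": "https://example.com/shared/", "content": "Shared page"},
    ]


def provider_b(query: str) -> List[Dict[str, str]]:
    return [
        {"url": "https://example.com/shared", "content": "Shared page again"},
        {"url": "https://example.com/b", "content": f"B result for {query}"},
    ]


if __name__ == "__main__":
    for hit in search_all("NBA Basketball", [provider_a, provider_b]):
        print(hit["url"])
```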
Processes each retrieved web page to generate question-answer pairs. Using a map-reduce paradigm, it extracts key insights from the content and synthesizes comprehensive QA items for each document.
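The map-reduce idea can be sketched as follows: split a document into chunks, apply a "map" function to extract an insight from each chunk, then apply a "reduce" function that turns all insights into QA items. The function names, the character-based chunking, and the toy `map_fn`/`reduce_fn` are assumptions for illustration; in the real node both steps would most likely be LLM calls.

```python
from typing import Callable, Dict, List

QAItem = Dict[str, str]


def chunk_text(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def map_reduce_qa(
    document: str,
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], List[QAItem]],
    chunk_size: int = 1000,
) -> List[QAItem]:
    """Extract one insight per chunk (map), then synthesize QA items (reduce)."""
    insights = [map_fn(chunk) for chunk in chunk_text(document, chunk_size)]
    return reduce_fn(insights)


# Toy stand-ins for the LLM-backed steps (hypothetical).
def first_sentence(chunk: str) -> str:
    return chunk.split(".")[0].strip()


def summarize_to_qa(insights: List[str]) -> List[QAItem]:
    return [{"question": "What are the key points of the page?",
             "answer": "; ".join(insights)}]


if __name__ == "__main__":
    page = "The home team won. The crowd cheered. " * 3
    print(map_reduce_qa(page, first_sentence, summarize_to_qa, chunk_size=40))
```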
Saves the generated question-answer pairs either to LangSmith or locally, depending on user input.
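A minimal sketch of such a save step is shown below. The `save_qa_pairs` helper, the dataset name, the QA item keys (`question`, `answer`), and the JSON file layout are all assumptions made for illustration. The LangSmith branch uses the `langsmith` client's `create_dataset` and `create_examples` calls and requires `LANGSMITH_API_KEY` to be set; it is not necessarily how the project itself uploads data.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_qa_pairs(
    qa_pairs: List[Dict[str, str]],
    save_to_langsmith: bool,
    dataset_name: str = "web-eval-dataset",  # hypothetical default name
    output_dir: str = ".",
) -> str:
    """Save QA pairs to LangSmith or to a local JSON file; return the location."""
    if save_to_langsmith:
        # Imported lazily so the local path works without langsmith installed.
        from langsmith import Client

        client = Client()  # reads LANGSMITH_API_KEY from the environment
        dataset = client.create_dataset(dataset_name=dataset_name)
        client.create_examples(
            inputs=[{"question": qa["question"]} for qa in qa_pairs],
            outputs=[{"answer": qa["answer"]} for qa in qa_pairs],
            dataset_id=dataset.id,
        )
        return f"langsmith:{dataset_name}"

    # Local fallback: one JSON file containing the list of QA items.
    path = Path(output_dir) / f"{dataset_name}.json"
    path.write_text(json.dumps(qa_pairs, indent=2), encoding="utf-8")
    return str(path)


if __name__ == "__main__":
    sample = [{"question": "Who won the game?", "answer": "The home team."}]
    print(save_qa_pairs(sample, save_to_langsmith=False))
```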
- num_qa: Specifies the number of question-answer items to generate. For example, setting this to 100 will produce 100 QA items.
- qa_subjects: A list of subjects that the dataset should focus on (for example: ["Sports", "Stocks", "News"]). The pipeline will generate search queries and QA items for each subject in this list.
- save_to_langsmith: Boolean flag indicating whether to save the dataset to LangSmith; otherwise it is saved locally.
- Tavily API Key: Sign Up for an API Key
- OpenAI API Key: Sign Up for an API Key
- LangSmith API Key: Sign Up for an API Key
git clone https://github.com/Eyalbenba/tavily-web-eval-generator.git
cd tavily-web-eval-generator
To avoid dependency conflicts, create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
Configure your Tavily, OpenAI, and LangSmith API keys by exporting them as environment variables or placing them in a .env file:
export TAVILY_API_KEY={Your Tavily API Key here}
export OPENAI_API_KEY={Your OpenAI API Key here}
export LANGSMITH_API_KEY={Your Langsmith API Key here}
Install the required dependencies for the project:
pip install -e .

import dotenv
dotenv.load_dotenv()
import asyncio
from web_eval_generator.graph import graph
from web_eval_generator.state import GeneratorState

async def main():
    # Initialize GeneratorState with user inputs
    state = GeneratorState(num_qa=100, qa_subjects=["NBA Basketball"])

    # Run the graph workflow
    print("Starting the QA Generation workflow...")
    try:
        result = await graph.ainvoke(state)  # Use `ainvoke` for async execution
        print("\nWorkflow completed successfully.")
        print("Final state:", result)
    except Exception as e:
        print(f"An error occurred during the workflow execution: {e}")

# Run the async main function
if __name__ == "__main__":
    asyncio.run(main())

Learn more about the Real-Time Dataset Generator in our detailed blog: Effortless Web-Based RAG Evaluation Using Tavily and LangGraph.

