This project implements an end-to-end LLMOps pipeline to manage the lifecycle of a large language model (LLM) deployment. It encompasses data collection, data curation, synthetic data generation, model training, evaluation, and production serving using multiple tools and services. The pipeline is designed to automate data processing, model evaluation, and deployment, ensuring a scalable and maintainable infrastructure for large language model applications.
The pipeline consists of the following components:
- Data Collection
Web Scraping: Web data is collected using BeautifulSoup to crawl news and other relevant data.
Synthetics Generation: Gemini is employed to generate synthetic data for conversational use cases.
See crawl
- Data Curation
Data collected from scraping and synthetic generation is aggregated and stored in Google BigQuery for further processing and analysis.
- Monitoring and Orchestration
Apache Airflow orchestrates the entire data collection, processing, and training workflow.
Prometheus and Grafana are used for monitoring metrics and visualizing data.
- Model Training
LLaMA-Factory is used to fine-tune the LLM using curated data and synthetic conversations. Training data is sourced from BigQuery and fed into the training pipeline. MLflow manages model tracking, logging, and storing training weights and metrics. See training_cluster
- Model Evaluation
An evaluation service assesses the model's performance based on specific metrics defined in MLflow.
The results of the evaluation determine whether the new model will be deployed.
See evaluating_cluster
- Production Serving
The trained model is deployed using vLLM, a serving framework optimized for LLM inference.
The API endpoints are exposed for interaction through a Chat SDK implemented using Vercel.
See production_cluster
- User Interaction
End users interact with the deployed model through the Chat SDK, receiving responses generated by the LLM.

