LLMOps Pipeline: NEU_SOLUTION

About The Project

This project implements an end-to-end LLMOps pipeline that manages the full lifecycle of a large language model (LLM) deployment: data collection, data curation, synthetic data generation, model training, evaluation, and production serving, using multiple tools and services. The pipeline automates data processing, model evaluation, and deployment, providing a scalable and maintainable infrastructure for LLM applications.

Architecture

Architecture Diagram

The pipeline consists of the following components:


  1. Data Collection

Web Scraping: News articles and other relevant pages are crawled and parsed with BeautifulSoup.

Synthetic Data Generation: Gemini is employed to generate synthetic data for conversational use cases.

See crawl
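A minimal sketch of the scraping step, assuming BeautifulSoup (`bs4`) is installed. The HTML snippet and the `article h2.title` selector are hypothetical stand-ins for whatever structure the crawled news sites actually use; the real crawler lives in the crawl repository.

```python
from bs4 import BeautifulSoup

# Hypothetical page snippet standing in for a fetched news page.
html = """
<html><body>
  <article><h2 class="title">Tuition update</h2><p>...</p></article>
  <article><h2 class="title">New AI lab opens</h2><p>...</p></article>
</body></html>
"""

def extract_titles(page_html: str) -> list[str]:
    """Parse a news listing page and return its article titles."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.select("article h2.title")]

print(extract_titles(html))  # → ['Tuition update', 'New AI lab opens']
```

In practice the HTML would come from an HTTP fetch of each crawled URL rather than an inline string.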

  2. Data Curation

Data collected from scraping and synthetic generation is aggregated and stored in Google BigQuery for further processing and analysis.
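Before loading, rows from the crawler and from Gemini have to share one schema. A sketch of that normalization, assuming a hypothetical `text`/`source`/`ingested_at` schema (the real BigQuery schema is not documented here); the output is newline-delimited JSON, the format BigQuery load jobs accept.

```python
import json
from datetime import datetime, timezone

def to_row(text: str, source: str) -> dict:
    """Normalize one record into the assumed unified schema."""
    return {
        "text": text,
        "source": source,  # e.g. "crawl" or "gemini"
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def to_ndjson(rows: list[dict]) -> str:
    """Serialize rows as newline-delimited JSON for a BigQuery load job."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)

rows = [to_row("Example article body", "crawl"),
        to_row("Q: ... A: ...", "gemini")]
print(to_ndjson(rows))
```

The resulting NDJSON file would then be handed to a BigQuery load job (for example via the `google-cloud-bigquery` client) for storage and downstream analysis.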

  3. Monitoring and Orchestration

Apache Airflow orchestrates the entire data collection, processing, and training workflow.

Prometheus and Grafana are used for monitoring metrics and visualizing data.
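The actual DAGs live in the Airflow deployment; as a dependency-only sketch, the stage ordering Airflow enforces can be expressed with the standard library's `graphlib` (task names here are assumptions mirroring the pipeline stages above).

```python
from graphlib import TopologicalSorter

# Assumed task dependencies: each key lists the tasks it waits on.
deps = {
    "curate":   {"crawl", "generate_synthetic"},
    "train":    {"curate"},
    "evaluate": {"train"},
    "deploy":   {"evaluate"},
}

# static_order() yields a valid execution order (prerequisites first),
# just as Airflow would schedule the corresponding DAG tasks.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In the real pipeline these would be Airflow operators wired with `>>` dependencies, with Prometheus scraping task metrics for Grafana dashboards.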

  4. Model Training

LLaMA-Factory is used to fine-tune the LLM on curated data and synthetic conversations. Training data is sourced from BigQuery and fed into the training pipeline. MLflow handles experiment tracking, logging, and storage of training weights and metrics.

See training_cluster
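An illustrative LLaMA-Factory fine-tuning config. The key names follow LLaMA-Factory's LoRA SFT examples, but the model id, dataset name, and hyperparameter values below are placeholders, not the project's actual settings; `report_to: mlflow` is one way such a setup could forward Trainer metrics to MLflow.

```yaml
# Illustrative LoRA SFT config (placeholder values).
model_name_or_path: org/base-model        # placeholder base model
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: neu_curated_conversations        # placeholder dataset name
output_dir: ./saves/neu-sft
per_device_train_batch_size: 2
learning_rate: 1.0e-4
num_train_epochs: 3
report_to: mlflow                         # assumed MLflow integration
```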

  5. Model Evaluation

An evaluation service assesses the model's performance against metrics tracked in MLflow.

The evaluation results determine whether the new model is promoted to deployment.

See evaluating_cluster
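A sketch of the promote-or-hold decision the evaluation service makes. The metric names, threshold logic, and values are assumptions for illustration; the real gate lives in evaluating_cluster and reads its metrics from MLflow.

```python
def should_deploy(candidate: dict, production: dict,
                  min_gain: float = 0.0) -> bool:
    """Promote the candidate only if it matches or beats the current
    production model on every tracked metric (higher is better)."""
    return all(candidate.get(m, float("-inf")) >= production[m] + min_gain
               for m in production)

# Hypothetical metric values for the current and candidate models.
prod = {"accuracy": 0.81, "rouge_l": 0.44}
cand = {"accuracy": 0.84, "rouge_l": 0.47}
print(should_deploy(cand, prod))  # → True
```

A regressed candidate (e.g. lower accuracy despite a better ROUGE-L) would return `False` and keep the current production model serving.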

  6. Production Serving

The trained model is deployed using vLLM, a serving framework optimized for LLM inference.

The API endpoints are exposed for interaction through a chat interface built on the Vercel Chat SDK.

See production_cluster
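vLLM serves an OpenAI-compatible HTTP API, so clients talk to it with standard chat-completion requests. A sketch of building such a request; the host, port, and model name are assumptions, and the actual network call is left commented out since it needs a running vLLM server.

```python
import json

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed host/port

def build_chat_request(model: str, user_message: str,
                       temperature: float = 0.7) -> dict:
    """Payload for vLLM's OpenAI-compatible chat-completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

payload = build_chat_request("neu-solution/llm", "Xin chào!")  # placeholder model id
print(json.dumps(payload, ensure_ascii=False))

# To actually query a running vLLM instance:
#   import urllib.request
#   req = urllib.request.Request(
#       VLLM_URL, data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The Chat SDK frontend would issue the same shape of request against the deployed endpoint.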

  7. User Interaction

End users interact with the deployed model through the Chat SDK, receiving responses generated by the LLM.

Technology

Docker, MLflow, Apache Airflow, Google Cloud, Vercel, Next.js, FastAPI, Grafana, Prometheus, Cloudflare, Postgres, MongoDB, Amazon S3, GitHub Actions

Repositories

  1. training_cluster: Training service of LLMOps (Python)

  2. evaluating_cluster: Evaluation service of LLMOps (Python)

  3. production_cluster (Python)

  4. chat-ui (TypeScript)

  5. .github

  6. monitoring_cluster (Python)
