The LLM Evaluation Framework
[NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods with easy feature extensibility.
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Runs a prompt against all, or a selection, of your models running on Ollama and creates web pages with the output, performance statistics, and model info, all in a single Bash shell script.
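For context, a minimal sketch of the kind of call such a script makes, written in Python rather than Bash. It assumes an Ollama server on its default port (localhost:11434) and uses an illustrative model name; it is not the repository's own script.

```python
import json
import urllib.request

# Assumes an Ollama server on its default port; the model name is illustrative.
OLLAMA_URL = "http://localhost:11434/api/generate"

def run_prompt(model: str, prompt: str) -> dict:
    """Send a single non-streaming prompt to Ollama and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    reply = run_prompt("llama3", "Explain what an LLM evaluation metric is in one sentence.")
    # Non-streaming replies include the generated text plus timing counters
    # (total_duration, eval_count, ...) that a report page could summarize.
    print(reply["response"])
```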
Estimates a confidence score that outputs generated by large language models are free of hallucinations.
Create an evaluation framework for your LLM-based app, incorporate it into your test suite, and lay the foundation for monitoring.
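As an illustration of wiring an evaluation into a test suite, here is a minimal pytest-style sketch. `generate_answer` is a hypothetical stand-in for whatever entry point your app exposes, and the keyword check is a deliberately simple placeholder for a real metric.

```python
import pytest

# Hypothetical stand-in for the app under test; replace with your own entry point.
def generate_answer(question: str) -> str:
    raise NotImplementedError("call your LLM-backed app here")

# Tiny evaluation set; real suites would load cases from a file or dataset.
EVAL_CASES = [
    ("What does LLM stand for?", ["large language model"]),
    ("Name one LLM evaluation benchmark.", ["mmlu", "tofu", "muse", "wmdp"]),
]

@pytest.mark.parametrize("question,expected_keywords", EVAL_CASES)
def test_answer_mentions_expected_keywords(question, expected_keywords):
    answer = generate_answer(question).lower()
    # Keyword containment is a crude metric; swap in exact match, similarity,
    # or an LLM-as-judge scorer as the framework matures.
    assert any(kw in answer for kw in expected_keywords)
```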
Evaluates LLM responses and measures their accuracy.
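Accuracy over a set of responses reduces to the fraction judged correct. A minimal sketch, assuming responses and reference answers are parallel lists of strings and exact (case-insensitive) match counts as correct:

```python
def response_accuracy(responses, references):
    """Fraction of responses that exactly match their reference answer (case-insensitive)."""
    correct = sum(
        r.strip().lower() == ref.strip().lower()
        for r, ref in zip(responses, references)
    )
    return correct / len(references) if references else 0.0

# Example: two of three responses match their references -> accuracy ~0.667
print(response_accuracy(["Paris", "4", "blue"], ["Paris", "5", "Blue"]))
```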
Tools for systematic large language model evaluations
VerifyAI is a simple UI application to test GenAI outputs
Official implementation of Spectral Scaling Laws (EMNLP 2025)
This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.