Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
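The bias-compensation idea above can be sketched as follows: after quantizing a weight matrix, compute a per-output-channel bias that cancels the mean output error over a calibration set. This is a minimal NumPy illustration of the general principle, not the paper's implementation; the function name and calling convention are assumptions.

```python
import numpy as np

def bias_compensation(W, W_q, X_calib):
    """Per-output-channel bias cancelling the mean quantization
    output error over calibration inputs (illustrative sketch).

    W, W_q  : (out, in) full-precision and quantized weights
    X_calib : (n, in)   calibration activations
    """
    err = X_calib @ (W - W_q).T   # (n, out) per-sample output error
    return err.mean(axis=0)       # (out,) bias to add after W_q
```

With the returned bias `b`, the quantized layer `x @ W_q.T + b` matches the full-precision layer `x @ W.T` in expectation over the calibration distribution.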
Ternary quantization for LLMs: implements balanced ternary (T3_K) weights for 2.63-bit quantization, the first working solution for modern large language models.
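A minimal sketch of balanced ternary weight quantization, where each weight is mapped to {-1, 0, 1} with a per-group scale (each raw ternary value needs log2(3) ≈ 1.585 bits; effective bit-widths such as 2.63 additionally reflect packing and scale overhead). The group size, threshold heuristic, and function name here are illustrative assumptions, not the repo's T3_K format.

```python
import numpy as np

def ternary_quantize(w, group_size=16):
    """Quantize weights to balanced ternary {-1, 0, 1} with one
    scale per group (sketch; group size and 0.7*mean|w| threshold
    are common heuristics, not the T3_K spec)."""
    w = np.asarray(w, dtype=np.float64)
    flat = w.reshape(-1, group_size)
    # keep a weight only if it is large relative to its group
    thresh = 0.7 * np.abs(flat).mean(axis=1, keepdims=True)
    t = np.where(np.abs(flat) > thresh, np.sign(flat), 0.0)
    # scale minimizing L2 error on the kept weights
    num = (np.abs(flat) * np.abs(t)).sum(axis=1, keepdims=True)
    den = np.abs(t).sum(axis=1, keepdims=True) + 1e-12
    scale = num / den
    dequant = (t * scale).reshape(w.shape)
    return dequant, t.reshape(w.shape), scale
```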
Let me make GGUF files quickly
Implemented and fine-tuned BERT for a custom sequence classification task, leveraging LoRA adapters for efficient parameter updates and 4-bit quantization to optimize performance and resource utilization.
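The LoRA idea used above can be sketched in a few lines: the base weight stays frozen (and can be kept 4-bit quantized) while a low-rank update ΔW = B·A is trained and added at forward time. This NumPy forward-only sketch illustrates the mechanism; the class name, defaults, and zero-initialization convention are assumptions, not the `peft` API.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style linear layer (forward only, sketch).
    The base weight W is frozen; only the low-rank factors A and B
    would receive gradient updates during fine-tuning."""

    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        out_f, in_f = W.shape
        self.W = W                                      # frozen base weight
        self.A = rng.normal(0, 0.01, size=(r, in_f))    # down-projection
        self.B = np.zeros((out_f, r))                   # up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        base = x @ self.W.T
        delta = (x @ self.A.T) @ self.B.T               # low-rank update path
        return base + self.scale * delta
```

Because `B` starts at zero, the adapted layer is initially identical to the frozen base layer, so fine-tuning starts from the pretrained model's behavior.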
Implementation of advanced Natural Language Processing architectures and optimization techniques, built from scratch. The projects focus on understanding the internal mechanics of Transformers, LLM efficiency through quantization, and scaling via Mixture-of-Experts (MoE).
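The Mixture-of-Experts scaling mentioned above boils down to routing each token to a few experts selected by a learned gate. This per-token top-k routing sketch shows the mechanism; the function signature and shapes are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, experts, gate_W, k=2):
    """Top-k MoE routing for a single token (sketch).

    x       : (d,) token representation
    experts : list of callables mapping (d,) -> (d,)
    gate_W  : (n_experts, d) gating weights
    """
    logits = gate_W @ x                        # (n_experts,) gate scores
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[topk] - logits[topk].max())
    w = w / w.sum()                            # softmax over selected experts
    # combine only the selected experts' outputs, weighted by the gate
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))
```

Only `k` of the experts run per token, which is how MoE grows parameter count without a proportional increase in compute.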