This directory contains various examples demonstrating how to use gollama.cpp for different use cases.
A comprehensive example showing how to generate text using a GGUF model.
Features:
- Text generation and completion with configurable parameters
- Real-time token generation with streaming output
- System information display (GPU support, memory mapping, etc.)
- Detailed progress logging and error handling
- Support for various text types: creative writing, technical explanations, conversations
- Performance optimization with configurable threading
Usage:
cd simple-chat
go run main.go -prompt "The future of AI is"
# Creative writing
go run main.go -prompt "Once upon a time" -n-predict 150
# Technical explanation
go run main.go -prompt "How does machine learning work?" -n-predict 100
# Run the interactive demo
./demo.sh
# Use Makefile shortcuts
make creative # Creative writing demo
make technical # Technical explanation demo
make conversation # Conversation starter demoA comprehensive example that combines dynamic library loading with text generation capabilities.
Features:
- Dynamic Library Loading: Automatically extracts and loads embedded llama.cpp libraries
- Library Lifecycle Management: Proper loading and unloading of shared libraries with cleanup
- Cross-platform Support: Works on macOS, Linux, and Windows with platform detection
- Embedded Library Extraction: Pre-compiled libraries embedded in the binary for portability
- Combined Functionality: Full integration of library management and text generation
- Resource Management: Automatic cleanup of temporary files and library handles
- Error Handling: Comprehensive error handling for both library loading and model operations
Usage:
cd simple-chat-with-loader
go run main.go -prompt "The future of AI is"
# Test library loading specifically
go run main.go -prompt "Testing library loader" -n-predict 20
# Creative writing with loader
go run main.go -prompt "Once upon a time" -n-predict 150
# Run the comprehensive demo
./demo.sh
# Use Makefile shortcuts
make test-loader # Test library loading functionality
make creative # Creative writing demo with loader
make technical # Technical explanation demo with loaderDemonstrates how to generate high-dimensional embedding vectors from text.
Features:
- Generate embeddings for single or multiple texts
- Support for different output formats (default, JSON, array)
- Automatic embedding normalization using L2 norm
- Cosine similarity matrix computation for multiple texts
- Configurable context size and thread count
Usage:
cd embedding
go run main.go -prompt "Hello World!"
# Multiple texts with similarity matrix
go run main.go -prompt "dog|cat|animal|car|vehicle"
# JSON output format
go run main.go -prompt "Hello World!" -output-format json
# Run the interactive demo
./demo.shAdvanced example demonstrating speculative decoding for accelerated text generation.
Features:
- Dual-model speculative decoding with separate target and draft models
- Same-model demonstration mode for understanding the algorithm
- Configurable draft length for performance tuning
- Temperature sampling support with detailed statistics
- Verbose mode for observing the draft/verify process
- Performance analysis showing acceptance rates and speedup
Usage:
cd speculative
go run main.go -prompt "The future of AI is"
# With different models for real speedup
go run main.go -model large.gguf -draft-model small.gguf -prompt "Your prompt"
# Demonstration with verbose output
go run main.go -prompt "Machine learning" -n-draft 8 -verbose
# Run the interactive demo
./demo.sh
# Use Makefile shortcuts
make demo # Full demonstration
make draft-comparison # Compare different draft lengths
make temperature-demo # Temperature sampling demoComprehensive document retrieval system using embedding-based semantic search.
Features:
- Multi-document semantic search using embeddings
- Configurable text chunking with custom separators and sizes
- Cosine similarity ranking for relevance scoring
- Interactive query mode for exploratory research
- Cross-domain search across multiple document types
- Verbose mode showing internal processing steps
- Support for various document formats and chunking strategies
Usage:
cd retrieval
go run main.go -context-files "document.txt" -query "search terms"
# Multiple files with interactive mode
go run main.go -context-files "file1.txt,file2.txt"
# Custom chunking and ranking
go run main.go -context-files "doc.txt" -chunk-size 150 -top-k 5
# Run comprehensive demo
./demo.sh
# Use Makefile shortcuts
make ai-demo # AI concepts retrieval
make programming-demo # Programming languages retrieval
make combined-demo # Cross-domain searchSimplified demonstration of batched text generation concepts for multiple sequence generation.
Features:
- Conceptual demonstration of batched generation principles
- Multiple sequence generation from the same prompt
- Configurable parallel sequence count and generation parameters
- Performance statistics and timing analysis
- Educational example showing what full batched processing would entail
- Clear documentation of implementation limitations and future improvements
Usage:
cd batched
go run main.go -prompt "The future of technology" -n-parallel 4
# Creative writing with higher temperature
go run main.go -prompt "Once upon a time" -n-parallel 3 -temperature 1.0
# Quick batch demo
go run main.go -prompt "Hello world" -n-parallel 2 -n-predict 20
# Run the interactive demo
./demo.sh
# Use Makefile shortcuts
make demo # Comprehensive demo
make creative # Creative writing demo
make tech-demo # Technology concepts demoNote: This is a simplified demonstration that generates sequences sequentially to illustrate batched generation concepts. A full implementation would require true parallel batch processing with advanced batch management.
Conceptual demonstration of diffusion-based text generation principles using iterative token refinement.
Features:
- Iterative token refinement through multiple diffusion steps
- Multiple confidence algorithms (confidence-based, entropy-based, margin-based, random)
- Visual mode showing real-time generation progress
- Configurable transfer scheduling and step parameters
- Educational demonstration of diffusion model concepts
- Performance analysis and step-by-step verbose output
Usage:
cd diffusion
go run main.go -prompt "The future of AI" -diffusion-steps 10
# Visual mode with real-time progress
go run main.go -prompt "Machine learning" -diffusion-visual -diffusion-steps 8
# Compare different algorithms
go run main.go -prompt "Technology advances" -diffusion-algorithm 1 -verbose
# Run the interactive demo
./demo.sh
# Use Makefile shortcuts
make demo # Comprehensive demo
make visual-demo # Interactive visual generation
make confidence-demo # Confidence-based algorithm
make entropy-demo # Entropy-based algorithm
make interactive # Interactive prompt inputNote: This is a conceptual demonstration of diffusion principles. A full implementation would require specialized diffusion model architectures and non-causal attention mechanisms not available in standard chat models.
Demonstrates how to configure and manage the cache directory used for downloading and storing llama.cpp library binaries.
Features:
- Get the default cache directory location
- Configure cache directory via environment variable
- Configure cache directory via Config object
- Configure cache directory via JSON configuration file
- Clean cache to force re-download
- Platform-specific cache location defaults
- Security validation against path traversal attacks
Usage:
cd cache-directory-demo
go run main.go
# With custom environment variable
GOLLAMA_CACHE_DIR=/tmp/my_cache go run main.goConfiguration Methods:
Environment Variable:
export GOLLAMA_CACHE_DIR=/path/to/cacheConfig Object:
config := gollama.DefaultConfig()
config.CacheDir = "/path/to/cache"
gollama.SetGlobalConfig(config)Configuration File (config.json):
{
"cache_dir": "/path/to/cache",
"enable_logging": true,
"num_threads": 8
}Default Cache Locations:
- Linux:
~/.cache/gollama/libs/ - macOS:
~/Library/Caches/gollama/libs/ - Windows:
%LOCALAPPDATA%\gollama\libs\
- Go 1.21 or later
- A GGUF model file (included:
tinyllama-1.1b-chat-v1.0.Q2_K.gguf)
Most examples require the llama.cpp library binaries to be available. There are two approaches:
- Automatic Download (Recommended): Examples will automatically download the required llama.cpp binaries using the
gollama-downloadtool when libraries are not found - Embedded Loader: The
simple-chat-with-loaderexample includes embedded libraries and doesn't require external dependencies
Each example can be built and run independently:
# Navigate to any example directory
cd simple-chat # or embedding
# Build the example
go build
# Run with default parameters
./simple-chat # or ./embedding
# Or run directly with go
go run main.go [options]Most examples support these common command-line options:
-model string: Path to the GGUF model file-prompt string: Input text or prompt-threads int: Number of threads to use (default: 4)-ctx int: Context size (default: 2048)-verbose: Enable verbose output
- Any GGUF model that supports text generation
- Models like LLaMA, Mistral, CodeLlama, etc.
- GGUF models that support embedding generation
- Some models are text-generation only and don't provide embeddings
- Verify your model supports embeddings before using these examples
- Popular embedding-capable models: all-MiniLM, sentence-transformers models
- Note: Even some chat models like TinyLlama support embeddings
- Check that the model path is correct
- Ensure the model file is a valid GGUF file
- Make sure you have enough RAM to load the model
- The example will automatically attempt to download required llama.cpp libraries
- If download fails, check your internet connection
- Manual download:
go run ../../cmd/gollama-download/main.go -download - Clean cache if needed:
go run ../../cmd/gollama-download/main.go -clean-cache
- Make sure the example binary is executable:
chmod +x example-name - Or use
go run main.goinstead
- Reduce context size with
-ctx 1024or smaller - Use a smaller model
- Close other applications to free RAM
When adding new examples:
- Create a new directory under
examples/ - Include a comprehensive
README.mdexplaining the example - Add a
Makefilewith common targets (build, run, clean) - Provide example usage commands
- Update this main examples README
- Main README - Project overview and installation
- Build Documentation - Building from source
- Contributing Guidelines - How to contribute