Name	Name	Last commit message	Last commit date
parent directory ..
batched	batched
cache-directory-demo	cache-directory-demo
diffusion	diffusion
embedding	embedding
eval-callback	eval-callback
gen-docs	gen-docs
ggml-info	ggml-info
gritlm	gritlm
parallel-download-demo	parallel-download-demo
retrieval	retrieval
simple-chat-with-loader	simple-chat-with-loader
simple-chat	simple-chat
speculative	speculative
README.md	README.md
go.mod	go.mod
go.sum	go.sum

Gollama.cpp Examples

This directory contains various examples demonstrating how to use gollama.cpp for different use cases.

Available Examples

1. Simple Chat (`simple-chat/`)

A comprehensive example showing how to generate text using a GGUF model.

Features:

Text generation and completion with configurable parameters
Real-time token generation with streaming output
System information display (GPU support, memory mapping, etc.)
Detailed progress logging and error handling
Support for various text types: creative writing, technical explanations, conversations
Performance optimization with configurable threading

Usage:

cd simple-chat
go run main.go -prompt "The future of AI is"

# Creative writing
go run main.go -prompt "Once upon a time" -n-predict 150

# Technical explanation
go run main.go -prompt "How does machine learning work?" -n-predict 100

# Run the interactive demo
./demo.sh

# Use Makefile shortcuts
make creative    # Creative writing demo
make technical   # Technical explanation demo
make conversation # Conversation starter demo

2. Simple Chat with Library Loader (`simple-chat-with-loader/`)

A comprehensive example that combines dynamic library loading with text generation capabilities.

Features:

Dynamic Library Loading: Automatically extracts and loads embedded llama.cpp libraries
Library Lifecycle Management: Proper loading and unloading of shared libraries with cleanup
Cross-platform Support: Works on macOS, Linux, and Windows with platform detection
Embedded Library Extraction: Pre-compiled libraries embedded in the binary for portability
Combined Functionality: Full integration of library management and text generation
Resource Management: Automatic cleanup of temporary files and library handles
Error Handling: Comprehensive error handling for both library loading and model operations

Usage:

cd simple-chat-with-loader
go run main.go -prompt "The future of AI is"

# Test library loading specifically
go run main.go -prompt "Testing library loader" -n-predict 20

# Creative writing with loader
go run main.go -prompt "Once upon a time" -n-predict 150

# Run the comprehensive demo
./demo.sh

# Use Makefile shortcuts
make test-loader  # Test library loading functionality
make creative     # Creative writing demo with loader
make technical    # Technical explanation demo with loader

3. Embedding Generation (`embedding/`)

Demonstrates how to generate high-dimensional embedding vectors from text.

Features:

Generate embeddings for single or multiple texts
Support for different output formats (default, JSON, array)
Automatic embedding normalization using L2 norm
Cosine similarity matrix computation for multiple texts
Configurable context size and thread count

Usage:

cd embedding
go run main.go -prompt "Hello World!"

# Multiple texts with similarity matrix
go run main.go -prompt "dog|cat|animal|car|vehicle"

# JSON output format
go run main.go -prompt "Hello World!" -output-format json

# Run the interactive demo
./demo.sh

4. Speculative Decoding (`speculative/`)

Advanced example demonstrating speculative decoding for accelerated text generation.

Features:

Dual-model speculative decoding with separate target and draft models
Same-model demonstration mode for understanding the algorithm
Configurable draft length for performance tuning
Temperature sampling support with detailed statistics
Verbose mode for observing the draft/verify process
Performance analysis showing acceptance rates and speedup

Usage:

cd speculative
go run main.go -prompt "The future of AI is"

# With different models for real speedup
go run main.go -model large.gguf -draft-model small.gguf -prompt "Your prompt"

# Demonstration with verbose output
go run main.go -prompt "Machine learning" -n-draft 8 -verbose

# Run the interactive demo
./demo.sh

# Use Makefile shortcuts
make demo              # Full demonstration
make draft-comparison  # Compare different draft lengths
make temperature-demo  # Temperature sampling demo

5. Document Retrieval (`retrieval/`)

Comprehensive document retrieval system using embedding-based semantic search.

Features:

Multi-document semantic search using embeddings
Configurable text chunking with custom separators and sizes
Cosine similarity ranking for relevance scoring
Interactive query mode for exploratory research
Cross-domain search across multiple document types
Verbose mode showing internal processing steps
Support for various document formats and chunking strategies

Usage:

cd retrieval
go run main.go -context-files "document.txt" -query "search terms"

# Multiple files with interactive mode
go run main.go -context-files "file1.txt,file2.txt"

# Custom chunking and ranking
go run main.go -context-files "doc.txt" -chunk-size 150 -top-k 5

# Run comprehensive demo
./demo.sh

# Use Makefile shortcuts
make ai-demo          # AI concepts retrieval
make programming-demo # Programming languages retrieval
make combined-demo    # Cross-domain search

6. Batched Generation (`batched/`)

Simplified demonstration of batched text generation concepts for multiple sequence generation.

Features:

Conceptual demonstration of batched generation principles
Multiple sequence generation from the same prompt
Configurable parallel sequence count and generation parameters
Performance statistics and timing analysis
Educational example showing what full batched processing would entail
Clear documentation of implementation limitations and future improvements

Usage:

cd batched
go run main.go -prompt "The future of technology" -n-parallel 4

# Creative writing with higher temperature
go run main.go -prompt "Once upon a time" -n-parallel 3 -temperature 1.0

# Quick batch demo
go run main.go -prompt "Hello world" -n-parallel 2 -n-predict 20

# Run the interactive demo
./demo.sh

# Use Makefile shortcuts
make demo         # Comprehensive demo
make creative     # Creative writing demo
make tech-demo    # Technology concepts demo

Note: This is a simplified demonstration that generates sequences sequentially to illustrate batched generation concepts. A full implementation would require true parallel batch processing with advanced batch management.

7. Diffusion Generation (`diffusion/`)

Conceptual demonstration of diffusion-based text generation principles using iterative token refinement.

Features:

Iterative token refinement through multiple diffusion steps
Multiple confidence algorithms (confidence-based, entropy-based, margin-based, random)
Visual mode showing real-time generation progress
Configurable transfer scheduling and step parameters
Educational demonstration of diffusion model concepts
Performance analysis and step-by-step verbose output

Usage:

cd diffusion
go run main.go -prompt "The future of AI" -diffusion-steps 10

# Visual mode with real-time progress
go run main.go -prompt "Machine learning" -diffusion-visual -diffusion-steps 8

# Compare different algorithms
go run main.go -prompt "Technology advances" -diffusion-algorithm 1 -verbose

# Run the interactive demo
./demo.sh

# Use Makefile shortcuts
make demo              # Comprehensive demo
make visual-demo       # Interactive visual generation
make confidence-demo   # Confidence-based algorithm
make entropy-demo      # Entropy-based algorithm
make interactive       # Interactive prompt input

Note: This is a conceptual demonstration of diffusion principles. A full implementation would require specialized diffusion model architectures and non-causal attention mechanisms not available in standard chat models.

8. Cache Directory Configuration (`cache-directory-demo/`)

Demonstrates how to configure and manage the cache directory used for downloading and storing llama.cpp library binaries.

Features:

Get the default cache directory location
Configure cache directory via environment variable
Configure cache directory via Config object
Configure cache directory via JSON configuration file
Clean cache to force re-download
Platform-specific cache location defaults
Security validation against path traversal attacks

Usage:

cd cache-directory-demo
go run main.go

# With custom environment variable
GOLLAMA_CACHE_DIR=/tmp/my_cache go run main.go

Configuration Methods:

Environment Variable:

export GOLLAMA_CACHE_DIR=/path/to/cache

Config Object:

config := gollama.DefaultConfig()
config.CacheDir = "/path/to/cache"
gollama.SetGlobalConfig(config)

Configuration File (config.json):

{
  "cache_dir": "/path/to/cache",
  "enable_logging": true,
  "num_threads": 8
}

Default Cache Locations:

Linux: ~/.cache/gollama/libs/
macOS: ~/Library/Caches/gollama/libs/
Windows: %LOCALAPPDATA%\gollama\libs\

Getting Started

Prerequisites

Go 1.21 or later
A GGUF model file (included: tinyllama-1.1b-chat-v1.0.Q2_K.gguf)

Library Dependencies

Most examples require the llama.cpp library binaries to be available. There are two approaches:

Automatic Download (Recommended): Examples will automatically download the required llama.cpp binaries using the gollama-download tool when libraries are not found
Embedded Loader: The simple-chat-with-loader example includes embedded libraries and doesn't require external dependencies

Building and Running Examples

Each example can be built and run independently:

# Navigate to any example directory
cd simple-chat  # or embedding

# Build the example
go build

# Run with default parameters
./simple-chat   # or ./embedding

# Or run directly with go
go run main.go [options]

Common Options

Most examples support these common command-line options:

-model string: Path to the GGUF model file
-prompt string: Input text or prompt
-threads int: Number of threads to use (default: 4)
-ctx int: Context size (default: 2048)
-verbose: Enable verbose output

Model Requirements

Text Generation Examples (simple-chat, speculative)

Any GGUF model that supports text generation
Models like LLaMA, Mistral, CodeLlama, etc.

Embedding Examples (embedding, retrieval)

GGUF models that support embedding generation
Some models are text-generation only and don't provide embeddings
Verify your model supports embeddings before using these examples
Popular embedding-capable models: all-MiniLM, sentence-transformers models
Note: Even some chat models like TinyLlama support embeddings

Troubleshooting

"Failed to load model"

Check that the model path is correct
Ensure the model file is a valid GGUF file
Make sure you have enough RAM to load the model

"Failed to initialize backend" or library loading errors

The example will automatically attempt to download required llama.cpp libraries
If download fails, check your internet connection
Manual download: go run ../../cmd/gollama-download/main.go -download
Clean cache if needed: go run ../../cmd/gollama-download/main.go -clean-cache

"Permission denied" when running examples

Make sure the example binary is executable: chmod +x example-name
Or use go run main.go instead

Out of memory errors

Reduce context size with -ctx 1024 or smaller
Use a smaller model
Close other applications to free RAM

Contributing

When adding new examples:

Create a new directory under examples/
Include a comprehensive README.md explaining the example
Add a Makefile with common targets (build, run, clean)
Provide example usage commands
Update this main examples README

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

Gollama.cpp Examples

Available Examples

1. Simple Chat (`simple-chat/`)

2. Simple Chat with Library Loader (`simple-chat-with-loader/`)

3. Embedding Generation (`embedding/`)

4. Speculative Decoding (`speculative/`)

5. Document Retrieval (`retrieval/`)

6. Batched Generation (`batched/`)

7. Diffusion Generation (`diffusion/`)

8. Cache Directory Configuration (`cache-directory-demo/`)

Getting Started

Prerequisites

Library Dependencies

Building and Running Examples

Common Options

Model Requirements

Text Generation Examples (simple-chat, speculative)

Embedding Examples (embedding, retrieval)

Troubleshooting

"Failed to load model"

"Failed to initialize backend" or library loading errors

"Permission denied" when running examples

Out of memory errors

Contributing

Related Documentation

Uh oh!

FilesExpand file tree

examples

Directory actions

More options

Directory actions

More options

Latest commit

History

examples

Folders and files

parent directory

README.md

Gollama.cpp Examples

Available Examples

1. Simple Chat (simple-chat/)

2. Simple Chat with Library Loader (simple-chat-with-loader/)

3. Embedding Generation (embedding/)

4. Speculative Decoding (speculative/)

5. Document Retrieval (retrieval/)

6. Batched Generation (batched/)

7. Diffusion Generation (diffusion/)

8. Cache Directory Configuration (cache-directory-demo/)

Getting Started

Prerequisites

Library Dependencies

Building and Running Examples

Common Options

Model Requirements

Text Generation Examples (simple-chat, speculative)

Embedding Examples (embedding, retrieval)

Troubleshooting

"Failed to load model"

"Failed to initialize backend" or library loading errors

"Permission denied" when running examples

Out of memory errors

Contributing

Related Documentation

1. Simple Chat (`simple-chat/`)

2. Simple Chat with Library Loader (`simple-chat-with-loader/`)

3. Embedding Generation (`embedding/`)

4. Speculative Decoding (`speculative/`)

5. Document Retrieval (`retrieval/`)

6. Batched Generation (`batched/`)

7. Diffusion Generation (`diffusion/`)

8. Cache Directory Configuration (`cache-directory-demo/`)