This directory contains practical examples demonstrating VectorSmuggle's capabilities for educational and research purposes.
The quickstart_demo.py script walks through the complete VectorSmuggle workflow, from environment setup to integrity verification.
- Python Environment:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- API Configuration:
cp .env.example .env
# Edit .env with your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
- Optional - Ollama Fallback:
# Install Ollama for local embedding fallback
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull nomic-embed-text:latest
ollama serve
Run the complete demonstration:
cd examples
python quickstart_demo.py
# Run with deterministic seed for reproducible results
python quickstart_demo.py --seed 42
# Test specific steganographic techniques
python quickstart_demo.py --techniques noise rotation fragmentation
# Run without steganography (baseline comparison)
python quickstart_demo.py --disable-steganography
# Save detailed results to file
python quickstart_demo.py --output results.json
- Configuration validation with cross-dependency checks
- API connectivity testing
- Evasion component initialization
- Deterministic seeding for reproducible results
- Processes all sample documents from sample_docs/
- Supports 15+ document formats (PDF, Office, CSV, JSON, etc.)
- Content chunking and preprocessing
- Format distribution analysis
- Noise Injection: Adds statistical noise to embeddings
- Rotation: Applies geometric transformations
- Scaling: Modifies embedding magnitudes
- Fragmentation: Distributes data across multiple models
- Detection Avoidance: Content transformation and obfuscation
- FAISS index creation with steganographic embeddings
- Metadata preservation for reconstruction
- Temporary storage and cleanup
- Semantic similarity search testing
- Multi-query validation across document types
- Fragment reconstruction verification
- Data integrity validation
- Step completion tracking
- Performance metrics collection
- Integrity verification
- Error analysis and reporting
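The format distribution analysis listed above can be illustrated with a short sketch. It assumes the distribution is a count of files per extension, which matches the shape of the dictionary in the sample output; the function name `format_distribution` and the example file names are hypothetical and are not taken from quickstart_demo.py.

```python
from collections import Counter
from pathlib import Path


def format_distribution(paths):
    """Count files per lowercase extension (without the leading dot)."""
    counts = Counter()
    for path in paths:
        suffix = Path(path).suffix.lower().lstrip(".")
        if suffix:  # skip files that have no extension
            counts[suffix] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    sample = ["report.csv", "config.yaml", "notes.md", "export.CSV"]
    print(format_distribution(sample))
```

Lowercasing the suffix means `export.CSV` and `report.csv` land in the same bucket; whether the demo normalizes case this way is an assumption.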
Starting VectorSmuggle Quickstart Demo
==================================================
=== Step 1: Environment Setup ===
✓ Configuration validation passed
✓ Embedding API connectivity verified
✓ Behavioral camouflage initialized
✓ Detection avoidance initialized
=== Step 2: Document Loading ===
Found 6 supported documents
✓ Loaded 6 document objects
✓ Created 45 text chunks
Document format distribution: {'csv': 1, 'json': 1, 'yaml': 1, 'md': 1, 'eml': 1, 'html': 1}
=== Step 3: Steganographic Processing ===
Applying techniques: ['noise', 'rotation', 'scaling']
Processing chunk 1/45
Processing chunk 11/45
...
✓ Generated 45 embeddings
✓ Applied obfuscation techniques: ['noise', 'rotation', 'scaling']
=== Step 4: Vector Store Creation ===
✓ Created FAISS vector store with steganographic embeddings
✓ Saved vector store to: temp_quickstart_index
✓ Saved steganography metadata
=== Step 5: Query Testing and Reconstruction ===
✓ Query 'financial data': 3 results
✓ Query 'employee information': 3 results
✓ Query 'API documentation': 3 results
✓ Query 'database schema': 3 results
✓ Query 'budget analysis': 3 results
✓ Semantic search returned 5 results
=== Step 6: Integrity Verification ===
✓ VectorSmuggle quickstart demo completed successfully!
✓ Success rate: 100.0%
✓ Total duration: 12.34 seconds
✓ Cleaned up temporary files
==================================================
QUICKSTART DEMO RESULTS
==================================================
🎉 Demo completed successfully!
Steps completed: 6/6
Success rate: 100.0%
Duration: 12.34 seconds
Key Metrics:
Documents loaded: 6
Text chunks: 45
Embeddings processed: 45
Vector store size: 45
- API Key Issues:
Error: OPENAI_API_KEY is required
Solution: Set your OpenAI API key in the .env file or as an environment variable.
- Missing Dependencies:
ImportError: No module named 'langchain'
Solution: Install requirements: pip install -r requirements.txt
- Sample Documents Not Found:
FileNotFoundError: Sample docs directory not found
Solution: Run from the project root or ensure the sample_docs/ directory exists.
- Fragmentation Requires Multiple Models:
ValueError: Fragmentation technique requires at least 2 embedding models
Solution: Configure multiple models in OPENAI_FALLBACK_MODELS or disable fragmentation.
- Ollama Connection Issues:
Failed to initialize Ollama embeddings: Connection refused
Solution: Start the Ollama service: ollama serve
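Several of the issues above can be caught before the demo starts. The following is a minimal preflight sketch, not part of quickstart_demo.py: the function name `preflight` is hypothetical, and it assumes that OPENAI_FALLBACK_MODELS holds a comma-separated list and that fragmentation counts only the models in that list. Both assumptions should be checked against the project's configuration code.

```python
import os
from pathlib import Path


def preflight(env, techniques, docs_dir="sample_docs"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if not Path(docs_dir).is_dir():
        problems.append(f"{docs_dir}/ directory not found")
    if "fragmentation" in techniques:
        # Assumption: OPENAI_FALLBACK_MODELS is a comma-separated list.
        raw = env.get("OPENAI_FALLBACK_MODELS", "")
        models = [m.strip() for m in raw.split(",") if m.strip()]
        if len(models) < 2:
            problems.append("fragmentation needs at least 2 embedding models")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ, ["noise", "rotation", "scaling"]):
        print(f"Problem: {problem}")
```

Passing the environment as a plain mapping keeps the check easy to exercise with a dictionary instead of the real process environment.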
For detailed debugging, set environment variables:
export LOG_LEVEL=DEBUG
export OPENAI_FALLBACK_ENABLED=true
python quickstart_demo.py
If the demo fails, check these components:
- Configuration: Ensure all required environment variables are set
- API Access: Test OpenAI API connectivity manually
- File Permissions: Verify read access to the sample_docs/ directory
- Dependencies: Confirm all Python packages are installed
- System Resources: Ensure sufficient memory for embedding operations
- Success Rate: Percentage of steps completed successfully
- Documents Loaded: Number of sample documents processed
- Text Chunks: Number of text segments created for embedding
- Embeddings Processed: Number of vector embeddings generated
- Vector Store Size: Number of documents stored in the vector database
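As a worked example of the success rate metric, the sketch below computes the percentage of completed steps. It assumes the rate is simply completed steps divided by total steps, reported to one decimal place as in the sample output; the helper name `success_rate` is hypothetical.

```python
def success_rate(steps_completed, total_steps):
    """Return the percentage of completed steps, rounded to one decimal."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return round(100.0 * steps_completed / total_steps, 1)


if __name__ == "__main__":
    # 6 of 6 steps, matching the sample output.
    print(f"Success rate: {success_rate(6, 6)}%")  # Success rate: 100.0%
```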
The demo validates that:
- Embeddings can be successfully obfuscated using multiple techniques
- Data can be fragmented across different embedding models
- Original information remains retrievable through semantic search
- Integrity checks pass for reconstructed data
- Duration: Total execution time (typically 10-30 seconds)
- Query Success: All test queries return relevant results
- Reconstruction: Fragment reconstruction maintains data integrity
- No Critical Errors: All major operations complete successfully
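One common way to check that reconstructed data matches the original is to compare cryptographic digests chunk by chunk. The sketch below illustrates that idea only; how quickstart_demo.py actually verifies integrity is not shown in this document, and the function names `digest` and `verify_integrity` are hypothetical.

```python
import hashlib


def digest(text):
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_integrity(original_chunks, reconstructed_chunks):
    """Return the indices of chunks whose reconstructed text differs."""
    if len(original_chunks) != len(reconstructed_chunks):
        raise ValueError("chunk counts differ")
    return [
        i
        for i, (a, b) in enumerate(zip(original_chunks, reconstructed_chunks))
        if digest(a) != digest(b)
    ]


if __name__ == "__main__":
    # Illustrative strings only; index 1 differs.
    print(verify_integrity(["alpha", "beta"], ["alpha", "bet4"]))
```

An empty result means every chunk matched; a non-empty result names the positions to investigate.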
After running the quickstart demo:
- Explore Advanced Features: Try the full embedding and query scripts in scripts/
- Custom Documents: Test with your own document sets
- Production Deployment: Use Docker and Kubernetes configurations
- Security Analysis: Run the risk assessment and forensic tools
- Research Applications: Adapt techniques for your specific use case
This demo illustrates:
- Attack Vectors: How sensitive data can be exfiltrated through embeddings
- Detection Challenges: Why traditional DLP tools miss semantic data leaks
- Steganographic Techniques: Methods for hiding data in vector spaces
- Defensive Strategies: What security teams should monitor and detect
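On the defensive side, one simple signal is embedding magnitude. The sketch below assumes the embedding model in use returns unit-length vectors (OpenAI documents this for its embedding models), so a vector whose L2 norm drifts from 1.0 may have been rescaled. The function names and the tolerance value are hypothetical, and the check is deliberately narrow: a rotation preserves the norm and small noise barely changes it, so it would not catch those techniques.

```python
import math


def l2_norm(vector):
    """Return the Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def flag_norm_outliers(vectors, expected_norm=1.0, tolerance=0.01):
    """Return indices of vectors whose norm is outside the expected band."""
    return [
        i
        for i, v in enumerate(vectors)
        if abs(l2_norm(v) - expected_norm) > tolerance
    ]


if __name__ == "__main__":
    baseline = [0.6, 0.8]  # norm 1.0
    scaled = [0.9, 1.2]    # same direction, norm 1.5
    print(flag_norm_outliers([baseline, scaled]))  # [1]
```

In practice a monitor would combine several signals; this example only shows that magnitude changes are cheap to detect when the baseline is known.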
Remember: This tool is for educational and authorized security testing only. Always obtain proper authorization before testing on any systems you don't own.