AI-powered platform that creates complete, culturally relevant educational books for children across Latin America — with voice narration, AI illustrations, voice cloning, and multi-format export.
Quick Start • Features • Architecture • Demo • Why This Matters • Copilot Story • 📺 YouTube Demo • Full Docs
Millions of children in Latin America deserve educational content that reflects their culture, language, and reality. Yet most AI-generated educational material is US/Euro-centric and English-only.
LATAM Book Generator solves this by orchestrating 16 specialized AI agents into a production pipeline that produces complete, publication-ready educational books — tailored to specific countries, age groups, and pedagogical methods — in Spanish, Portuguese, or English — with AI illustrations, audiobook narration, and even voice cloning so a parent or teacher can narrate in their own voice.
| Dimension | What We Built |
|---|---|
| Multi-Agent Orchestration | 16 specialized agents collaborating sequentially — curriculum design, chapter writing, image generation, TTS, fact-checking — each with a focused role |
| Cultural Intelligence | Country-specific content for Mexico, Colombia, Argentina, Chile, Peru, and Brazil with local references and i18n support for 11 languages |
| Multimodal Output | Text → HTML → PDF → Markdown → Audiobook → Images → QR-embedded videos — all from a single natural-language prompt |
| Voice Cloning | Record your voice, clone it, and generate full audiobooks narrated in your own voice using Qwen3 TTS-VC |
| Conversational UX | Chat naturally with the AI assistant to design your book — no forms required |
| 4 Model Providers | GitHub Models, Qwen/DashScope (3 regions), Anthropic Claude, and Azure OpenAI — switchable at runtime |
| Production-Ready | HTTP server mode, AI Toolkit tracing, retry logic with exponential backoff, typed Pydantic schemas, and session persistence |
From a single topic description, the system orchestrates multiple AI agents to produce:
- Structured Curriculum — Age-appropriate chapter outlines following Scandinavian, Montessori, or Project-Based pedagogy
- Rich Chapter Content — Educational text with activities, reflection questions, experiments, and cultural references
- AI Illustrations — Generated via Qwen-Image-Plus/Max with intelligent prompt engineering (auto-selects art style, camera angle, lighting, and resolution)
- Web Image Search — DuckDuckGo SafeSearch fallback for royalty-free educational images
- YouTube Video Embeds — Relevant educational videos with auto-generated QR codes for print books
- LaTeX Math Support — Rich mathematical notation rendered via KaTeX for STEM content
- Fact-Checking — Web search-powered verification of educational claims with confidence scoring
| Capability | Detail |
|---|---|
| TTS Models | qwen3-tts-flash (fast/cheap), qwen3-tts-instruct-flash (emotion/character control), qwen3-tts-vd (custom voice from text description) |
| Voice Clone | qwen3-tts-vc — replicates any enrolled voice with high fidelity |
| 10 Built-in Voices | Male, female, and child voices in multiple styles (warm, lively, deep, humorous, educational) |
| Audio Script Optimization | AI transforms chapter markdown into narration-ready scripts with pause markers |
| Audio-Only Mode | Skip visual output and produce pure audiobooks with voice-first curriculum design |
| Speech Rate Control | 0.5× to 2.0× speed adjustment optimized for educational content |
| Audio ZIP Download | All chapter narrations bundled as a single downloadable archive |
Curated CSS/JS book templates with automatic AI selection:
| Template | Style | Best For |
|---|---|---|
storybook |
Warm borders, soft colors | Fiction, fairy tales |
stem |
Clean diagrams, scientific layout | Science, technology |
adventure |
Bold colors, explorer themed | Geography, history |
math |
KaTeX-ready, equation-focused | Mathematics, physics |
low_budget |
B&W coloring pages | Printable worksheets |
nature |
Earthy tones, leaf ornaments | Biology, ecology |
culture |
Vibrant folk-art borders | Social studies, culture |
space |
Dark theme, star backgrounds | Astronomy |
ocean |
Wave patterns, blue palette | Marine science |
auto |
AI picks best match | Any topic |
| Format | Description |
|---|---|
| HTML | Interactive, self-contained book with embedded images, videos, and styled templates |
| Print-ready with Unicode support, embedded images, and proper pagination | |
| Markdown | Clean markdown with base64-embedded images for portability |
| JSON | Structured data for programmatic consumption or re-rendering |
| Audio ZIP | All chapter narrations bundled as a downloadable archive |
A chat-based AI assistant that:
- Collects all book parameters through natural dialogue
- Maintains memory across 20+ conversation exchanges
- Emits a structured
book_request_jsonwhen all parameters are confirmed - Supports switching between model providers mid-conversation
- Automatically responds in the user's language
┌──────────────────────────────────────────────────────────────────────┐
│ Streamlit UI (app.py) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │Chat Mode │ │Form Mode │ │Voice Mode│ │ Settings / Sidebar │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────────────────────┘ │
│ └─────────────┴──────────────┘ │
│ │ │
├───────────────────────────▼──────────────────────────────────────────┤
│ Agent Orchestration Layer (16 Agents) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ Chat Agent │ │ Curriculum │ │ Chapter │ │ Voice Agents │ │
│ │ │ │ Agent │ │ Agent │ │ (5 modules) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ Qwen Image │ │ DDG Search │ │YouTube + QR │ │ Fact Check │ │
│ │ Generator │ │ │ │ Agent │ │ Agent │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────────┘ │
│ │
├──────────────────────────────────────────────────────────────────────┤
│ Export & Rendering Layer │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌───────────────────────┐ │
│ │ HTML │ │ PDF │ │ MD │ │ JSON │ │ Audio ZIP │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └───────────────────────┘ │
│ │
├──────────────────────────────────────────────────────────────────────┤
│ Flexible Model Providers (config.py) │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │
│ │ GitHub Models │ │ Qwen/DashScope │ │ Claude / Azure OAI │ │
│ │ gpt-4o-mini │ │ Text·Image·TTS │ │ (configurable) │ │
│ │ Free dev tier │ │ 3 global regions │ │ │ │
│ └──────────────────┘ └──────────────────┘ └────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
| Agent | File | Role |
|---|---|---|
| Chat Agent | agents/chat_agent.py |
Conversational requirement elicitation with memory |
| Curriculum Agent | agents/curriculum_agent.py |
Designs structured educational outlines |
| Chapter Agent | agents/chapter_agent.py |
Writes rich educational chapters in markdown |
| Voice Curriculum | agents/voice_curriculum_agent.py |
Audio-optimized curriculum design |
| Voice Chapter | agents/voice_chapter_agent.py |
Chapters optimized for spoken delivery |
| Audio Script | agents/audio_book_script_agent.py |
Transforms chapters into narration-ready scripts |
| Voice Agent | agents/voice_agent.py |
Standard TTS via Qwen3 (10 built-in voices) |
| Voice Clone | agents/voice_clone_agent.py |
Voice enrollment + cloned-voice synthesis |
| Image Generator | agents/qwen_image_agent.py |
AI illustrations via Qwen-Image models |
| DDG Image Search | agents/ddg_image_search_agent.py |
Web image search with SafeSearch |
| YouTube + QR | agents/youtube_search_agent.py |
Educational video discovery + QR code generation |
| Fact Checker | agents/fact_check_agent.py |
Web search verification of educational claims |
| LaTeX Math | agents/latex_math_agent.py |
Mathematical content with KaTeX rendering |
| HTML Renderer | agents/html_css_agent.py |
Template-based HTML book rendering engine |
| PDF Generator | agents/html_to_pdf_converter.py |
Unicode PDF with embedded images |
| Markdown Export | agents/markdown_agent.py |
Self-contained markdown with base64 images |
All agents communicate through typed Pydantic schemas defined in models/book_spec.py:
BookRequest → Curriculum → [ChapterContent] → BookOutput
├── ImagePlaceholder[]
├── VideoPlaceholder[]
└── AudioNarration
- Python 3.12+
- At least one API key (GitHub Token or DashScope API key)
git clone https://github.com/crissins/Agent-Framework.git
cd Agent-Framework
# Create virtual environment
python -m venv .venv
# Activate (Windows)
.venv\Scripts\activate
# Activate (macOS/Linux)
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txtCreate a .env file in the project root:
# At minimum, provide one of these:
GITHUB_TOKEN=ghp_your_github_token_here
DASHSCOPE_API_KEY=sk-your_dashscope_key_here
# Optional providers
ANTHROPIC_API_KEY=sk-ant-your_key_here
AZURE_OPENAI_API_KEY=your_azure_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# Optional: Enable AI Toolkit tracing
AITK_TRACING_ENABLED=0| Key | Source | Used For |
|---|---|---|
GITHUB_TOKEN |
github.com/settings/tokens | Text generation via GitHub Models (free tier) |
DASHSCOPE_API_KEY |
dashscope.aliyun.com | Text, images, TTS, and voice cloning via Qwen |
ANTHROPIC_API_KEY |
console.anthropic.com | Text generation via Claude models |
AZURE_OPENAI_API_KEY |
portal.azure.com | Text generation via Azure OpenAI |
# Streamlit UI (recommended)
python -m streamlit run app.py
# CLI mode
python main.py
# HTTP server for AI Toolkit Agent Inspector
python server.py- Open the Streamlit app at
http://localhost:8501 - In the chat, type: "Quiero un libro sobre los animales del océano para niños de 9 años en México"
- The assistant collects remaining parameters, then generates curriculum, chapters, images, and audio
- Download the result as HTML, PDF, Markdown, or listen to the audiobook directly in the browser
Click the thumbnail to watch the complete walkthrough on YouTube — live book generation, voice narration, AI illustrations, and export in action.
You: "I want a book about space exploration for 10-year-olds in Colombia"
Agent: Clarifying questions about learning method, number of chapters, voice preferences
Agent: Confirms all parameters and emits book_request_json
→ Pipeline generates a complete book: curriculum + chapters + images + audio narration
1. Record a 10-second voice sample using the sidebar microphone
2. Click "Clone & Save" to enroll a persistent voice profile
3. Generate audiobook — narrated entirely in YOUR voice
4. Download as ZIP with all chapter audio files
A generated book includes:
- 📄 HTML — Interactive styled book viewable in any browser
- 📕 PDF — Print-ready with proper pagination and Unicode
- 📝 Markdown — Portable format with embedded images
- 🎧 Audio — Chapter-by-chapter WAV narration files
- 📊 JSON — Structured data for programmatic access
Agent-Framework/
├── app.py # Streamlit UI — main application
├── main.py # CLI workflow runner
├── server.py # HTTP server for AI Toolkit Agent Inspector
├── config.py # Multi-provider model configuration
├── requirements.txt # Python dependencies
├── .env # API keys (not committed)
│
├── agents/ # 16 specialized AI agents
│ ├── chat_agent.py # Conversational book design
│ ├── curriculum_agent.py # Curriculum structure generation
│ ├── chapter_agent.py # Chapter content writing
│ ├── voice_curriculum_agent.py # Audio-first curriculum
│ ├── voice_chapter_agent.py # Voice-optimized chapters
│ ├── audio_book_script_agent.py # Chapter → narration script
│ ├── voice_agent.py # TTS synthesis (10 voices)
│ ├── voice_clone_agent.py # Voice cloning enrollment & synthesis
│ ├── qwen_image_agent.py # AI image generation
│ ├── ddg_image_search_agent.py # Web image search
│ ├── youtube_search_agent.py # YouTube video search + QR
│ ├── fact_check_agent.py # Web-search fact verification
│ ├── latex_math_agent.py # LaTeX/KaTeX math content
│ ├── html_css_agent.py # HTML book renderer
│ ├── html_to_pdf_converter.py # PDF generator (fpdf2)
│ └── markdown_agent.py # Markdown exporter
│
├── models/ # Data schemas & i18n
│ ├── book_spec.py # Pydantic models (BookRequest, Curriculum, etc.)
│ ├── template_registry.py # 10 visual book templates
│ └── i18n.py # 11-language string tables
│
├── templates/ # HTML/CSS book templates
│ ├── master_book.html
│ ├── storybook-template.html
│ ├── math.html
│ └── ...
│
├── utils/ # Shared utilities
│ ├── retry.py # Async/sync retry with exponential backoff
│ └── math_latex.py # LaTeX rendering helpers
│
├── tests/ # Test suite
│ ├── test_chat_agent_contracts.py
│ ├── test_math_book.py
│ └── test_retry.py
│
├── books/ # Generated output (gitignored)
│ ├── json/ # Structured book data
│ ├── html/ # Interactive HTML books
│ ├── md/ # Markdown exports
│ ├── pdf/ # Print-ready PDFs
│ ├── audio/ # Chapter narration WAV files
│ ├── images/ # AI-generated illustrations
│ └── voice_clones/ # Persistent cloned-voice profiles
│
└── docs/ # Extended documentation
├── INDEX.md
├── ARCHITECTURE.md
├── AGENTS_REFERENCE.md
├── SETUP.md
└── USER_GUIDE.md
| Practice | Implementation |
|---|---|
| Retry Logic | Exponential backoff with jitter for all API calls (utils/retry.py) |
| Type Safety | Pydantic v2 models for all data flowing between agents |
| API Key Validation | Preflight checks before any long-running generation |
| Session Persistence | Streamlit session state survives page reruns |
| Error Isolation | Each agent fails gracefully without crashing the pipeline |
| Safe Search | DuckDuckGo SafeSearch for child-appropriate image results |
| Content Verification | Optional fact-checking agent validates educational claims |
| No Hardcoded Secrets | All credentials via .env and environment variables |
| Diagnostic Logging | Structured logging for TTS, image generation, and API calls |
| Tracing | OpenTelemetry + AI Toolkit integration for distributed debugging |
- Agent-as-Server Pattern —
server.pywraps the book planner agent as an HTTP server compatible with Microsoft AI Toolkit Agent Inspector for visual debugging and tracing - 4 Provider Abstraction —
config.pyresolves the correct endpoint, API key, and model ID at runtime for GitHub Models, Qwen/DashScope (Singapore · Beijing · US-Virginia), Anthropic Claude, and Azure OpenAI — eliminating per-agent provider logic - Smart Template Selection — The LLM auto-selects the best visual template based on topic analysis rather than defaulting to a single style
- Intelligent Image Prompting — The text LLM generates structured image prompts (camera angle, lighting, art style, resolution) rather than plain descriptions, improving generation quality
- WAV Audio Concatenation — Properly extracts PCM frames from chunked TTS responses and re-wraps with correct WAV headers — not naive byte concatenation
- Voice Profile Registry — Persistent JSON registry stores cloned voice IDs and enrollment audio for reuse across sessions
- Chat Memory — 20-exchange rolling conversation history enables coherent multi-turn book design
- i18n Engine — String tables covering 11 languages with RTL support detection
# Run the full test suite
python -m pytest tests/ -v
# Test agent data contracts
python -m pytest tests/test_chat_agent_contracts.py -v
# Test retry / backoff logic
python -m pytest tests/test_retry.py -v
# Test math book generation end-to-end
python -m pytest tests/test_math_book.py -v| Issue | Solution |
|---|---|
st.audio_input not found |
Streamlit is too old: pip install --upgrade streamlit>=1.54.0 |
| Chat completes but form values are used | Check that the generation source indicator says "chat" — the chat JSON overrides form fields |
| Voice cloning fails silently | Confirm DASHSCOPE_API_KEY is set and the DashScope account has TTS-VC quota |
| Images not generating | Set DASHSCOPE_API_KEY and select a Qwen image model in the sidebar |
| PDF missing characters | Expected for CJK / Arabic scripts — use HTML export for full Unicode fidelity |
| Tracing errors in the console | Normal when no OTLP collector is running — set AITK_TRACING_ENABLED=0 to suppress |
agent-framework package not found |
Ensure you're using Python 3.12+ and the virtual environment is activated |
| DashScope 429 / rate limit | Switch Qwen region in sidebar (Singapore → Beijing → US-Virginia) or add backoff |
These are active limitations that are known and being tracked:
| Area | Issue | Status |
|---|---|---|
| DDG Image Search | ddg_image_search_agent.py is functional but does not meet production quality standards — DuckDuckGo returns inconsistent results, has no relevance ranking, and the API is undocumented/unofficial. Used as a fallback only. |
|
| DDG Video Search | Same constraints as image search — YouTube results via DuckDuckGo are unreliable and miss the most relevant educational content. | |
| PDF Blocking | PDF generation (html_to_pdf_converter.py) runs synchronously and blocks the Streamlit UI thread during export for large books. |
🔄 In progress |
app.py Size |
The Streamlit UI file is 2,600+ lines — it handles all tabs (chat, form, voice, batch, settings) in a single module, making it harder to maintain as features grow. | 🔄 Planned refactor |
| Test Coverage | Integration tests for the full book generation pipeline and golden-file tests for HTML templates are missing. Unit tests exist only for retry logic and chat agent contracts. | 🔄 Expanding |
| Voice Clone Quota | qwen3-tts-vc voice cloning requires a DashScope account with TTS-VC quota enabled — it silently degrades to standard TTS when quota is absent. |
📋 Known behavior |
| CJK / RTL in PDF | Unicode characters outside Latin script (Arabic, Chinese, Japanese) render as blank boxes in the PDF output. Use HTML export for full Unicode fidelity. | 📋 Known limitation |
Planned enhancements ordered by impact:
The current DuckDuckGo implementation for both image and YouTube search is a temporary stand-in. The planned replacement is Bing Search via Azure AI Agents, which provides:
- Ranked, high-confidence image results with licensing metadata
- Proper YouTube video relevance scoring with educational topic matching
- Azure-grade reliability, SLA, and rate limits
- Seamless integration with the existing Azure AI provider already in
config.py
# Planned: agents/bing_image_search_agent.py
# Planned: agents/bing_video_search_agent.py
# Using: azure-ai-projects + BingGroundingToolThis is the highest-priority search improvement — DDG quality is the weakest link in the current pipeline.
Streamlit is excellent for prototyping but has real limits at production scale: no fine-grained component control, limited real-time streaming, no code splitting, and a Python-locked rendering model. A React frontend would unlock:
| Capability | Streamlit Today | React Target |
|---|---|---|
| Real-time chapter streaming | st.write_stream() (limited) |
Server-Sent Events / WebSocket |
| Book preview | iFrame embed | Full interactive renderer |
| Audio playback UI | Basic st.audio |
Custom waveform + chapter timeline |
| Mobile responsiveness | Limited | Full responsive layouts |
| Routing / multi-page | st.navigation() |
React Router |
| Component reuse | Copy-paste | Composable component library |
The existing server.py HTTP backend is already structured to serve as a headless API — a React frontend would consume it directly.
A docker-compose.yml with a Python service (app + server) and optional OTLP collector for tracing would make the project runnable without Python/venv setup.
- Golden-file tests for HTML template rendering
- End-to-end generation pipeline integration tests with mocked LLM responses
- Snapshot tests for Pydantic schema evolution
Split the 2,600-line app.py into per-tab Streamlit page modules:
pages/
01_chat.py ← Chat-driven book design
02_form.py ← Manual form mode
03_voice.py ← Voice cloning & audiobook
04_batch.py ← Batch generator
05_settings.py ← Provider config & API keys
Extend i18n beyond the current 11 languages — particularly Quechua, Guaraní, and Haitian Creole for broader LATAM coverage.
Track: Creative Apps
Key Technologies:
| Technology | Role |
|---|---|
| Microsoft Agent Framework | Multi-agent orchestration backbone |
| GitHub Models (gpt-4o-mini) | Text generation — free tier development |
| Qwen/DashScope | Text, image generation, TTS Flash/Instruct, voice cloning |
| Anthropic Claude | Optional high-quality text generation provider |
| Azure OpenAI | Optional enterprise text generation provider |
| Streamlit 1.54+ | Interactive UI with real-time audio recording |
| OpenTelemetry + AI Toolkit | Distributed tracing and visual debugging |
| Pydantic v2 | Typed data contracts across all agents |
| fpdf2 | Pure-Python PDF generation with Unicode support |
| KaTeX | Client-side LaTeX math rendering |
Curious how this project was built? Read the Copilot Story — an honest account of how GitHub Copilot acted as a co-author at every stage: architecture design, Pydantic schema generation, WAV audio handling, multi-provider abstraction, and more.
MIT License — see LICENSE for details.
Built with ❤️ for LATAM education
