rvz16/pptx-agent


PPTX Agent

DeckSpec JSON is the source of truth. The .pptx file is a rendered artifact that can be regenerated after edits.

Overview

This project implements a Python-based LangGraph agent that:

  • creates a presentation from a natural-language request;
  • stores an editable deck.spec.json;
  • accepts follow-up edit requests;
  • plans visual layouts, callouts, cards, and decorative accents;
  • searches licensed image sources through a provider abstraction;
  • downloads and caches local image assets for image-based slides;
  • re-renders a new .pptx version without overwriting previous ones.

The core principle is a strict split of responsibilities: the LLM plans content and visuals, and deterministic Python code assembles the .pptx.
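A minimal sketch of the "spec is the source of truth" idea, assuming a far simpler schema than the project's real Pydantic models. `Slide`, `DeckSpec`, `save_spec`, and `load_spec` are hypothetical names, and stdlib dataclasses stand in for Pydantic. It only shows that the JSON round-trips losslessly, which is what lets a renderer rebuild the .pptx at any time.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Slide:
    # Hypothetical fields; the real schema lives in app/deck/models.py.
    title: str
    bullets: list[str] = field(default_factory=list)


@dataclass
class DeckSpec:
    deck_id: str
    slides: list[Slide] = field(default_factory=list)


def save_spec(spec: DeckSpec, path: Path) -> None:
    """Write the spec as JSON; this file, not the .pptx, is what gets edited."""
    text = json.dumps(asdict(spec), ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")


def load_spec(path: Path) -> DeckSpec:
    """Rebuild the spec from JSON so a renderer can regenerate the deck."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    slides = [Slide(**slide) for slide in raw["slides"]]
    return DeckSpec(deck_id=raw["deck_id"], slides=slides)
```

Because the loaded spec compares equal to the saved one, any renderer that is a pure function of the spec will produce the same deck from the same file.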

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Environment

Copy .env.example to .env and set:

OPENROUTER_API_KEY=
MODEL_NAME=openai/gpt-oss-120b
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
IMAGE_PROVIDER=wikimedia
PEXELS_API_KEY=
UNSPLASH_ACCESS_KEY=
PIXABAY_API_KEY=
RESEARCH_PROVIDER=wikipedia
TAVILY_API_KEY=
PPTX_TEMPLATE_PATH=templates/default_template.pptx

If OPENROUTER_API_KEY is missing, the agent still works with deterministic heuristic fallbacks for create/edit flows.

The scriptwriter sub-agent researches the topic on the web to ground slide content in real facts. RESEARCH_PROVIDER=wikipedia works with no key; set RESEARCH_PROVIDER=tavily and TAVILY_API_KEY=... for full web search (it falls back to Wikipedia if the key is missing).
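The fallback rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the project's actual configuration code: `llm_enabled` and `resolve_research_provider` are hypothetical names, and only the environment variable names come from `.env.example`.

```python
from typing import Mapping


def llm_enabled(env: Mapping[str, str]) -> bool:
    """True when an OpenRouter key is set; otherwise heuristic fallbacks apply."""
    return bool(env.get("OPENROUTER_API_KEY", "").strip())


def resolve_research_provider(env: Mapping[str, str]) -> str:
    """Pick the research backend, falling back to Wikipedia without a Tavily key."""
    requested = env.get("RESEARCH_PROVIDER", "").strip().lower() or "wikipedia"
    if requested == "tavily" and not env.get("TAVILY_API_KEY", "").strip():
        # Tavily was requested but cannot be used, so degrade gracefully.
        return "wikipedia"
    return requested
```

Passing `os.environ` as `env` would apply the same rules to the live process environment.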

CLI Usage

Interactive stage-by-stage mode (the agent walks you through script → style → visuals → packaging → refinement, pausing for review at each stage):

python -m app.cli chat
# or seed the topic up front:
python -m app.cli chat "An 8-slide presentation about the AI agent market for investors"

At each stage you can edit the brief fields (by number or name, e.g. audience: investors), rename slides and rewrite bullets, pick a theme, and toggle images. After the first render you can keep requesting free-form edits ("add a slide about risks", "make slide 3 shorter"); each edit produces a new version without overwriting the previous .pptx.
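One way to get "new version, never overwrite" behavior is to allocate a fresh numbered directory per render, matching the `output/<deck_id>/v001/` layout that the edit command reads from. The sketch below is an assumption about how that could work; `next_version_dir` is a hypothetical helper, not necessarily what app/storage/files.py does.

```python
import re
from pathlib import Path

# Matches directory names such as v001, v002, v123.
_VERSION_RE = re.compile(r"^v(\d{3,})$")


def next_version_dir(deck_root: Path) -> Path:
    """Create and return the next vNNN directory under output/<deck_id>/."""
    deck_root.mkdir(parents=True, exist_ok=True)
    existing = [
        int(match.group(1))
        for child in deck_root.iterdir()
        if child.is_dir() and (match := _VERSION_RE.match(child.name))
    ]
    new_dir = deck_root / f"v{max(existing, default=0) + 1:03d}"
    # mkdir() without exist_ok raises if the directory already exists,
    # so an earlier version can never be silently reused.
    new_dir.mkdir()
    return new_dir
```

Each render would write its `deck.spec.json` and .pptx into the returned directory, leaving earlier versions untouched.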

To improve an existing presentation instead of starting from scratch, import a .pptx:

python -m app.cli chat --pptx path/to/existing.pptx

The deck is parsed into an editable DeckSpec (titles, bullets, tables, embedded images, and the source theme/colors), then you go straight to the style → visuals → refine stages.

Create a deck:

python -m app.cli create \
  "Make an 8-slide presentation about the AI agent market for investors" \
  --visual-rich \
  --theme investor_modern \
  --image-provider wikimedia

Edit an existing deck:

python -m app.cli edit \
  --spec output/<deck_id>/v001/deck.spec.json \
  "Make the presentation visually richer, add images and cards"

Useful flags:

  • --visual-rich: prefer richer visual layouts
  • --no-images: disable remote image fetch and rely on decorative fallbacks
  • --image-provider wikimedia|pexels: choose the image source
  • --theme investor_modern|startup_dark|consulting_clean|minimal_blue|corporate_light|tech_gradient
  • --template templates/default_template.pptx: use an optional PowerPoint template

Architecture

  • app/deck/models.py and app/deck/visual_models.py: Pydantic models for DeckSpec, themes, visual layouts, and edits.
  • app/deck/validator.py: schema-adjacent business validation.
  • app/deck/visual_validator.py: visual quality rules and repair triggers.
  • app/deck/renderer.py: layout-based PptxAssembler with cards, decorations, and image slots.
  • app/deck/patcher.py: safe structured edit application.
  • app/assets/*: image provider abstraction, search, download, and cache metadata.
  • app/design/*: layout constants, themes, decorations, and visual rules.
  • app/storage/files.py: versioned output directories and spec persistence.
  • app/agents/*: LangGraph state, prompts, nodes, and graph assembly.
  • app/cli.py: local create/edit commands.
  • app/api/*: optional FastAPI wrapper.
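To illustrate what "safe structured edit application" can mean, here is a small sketch that applies one edit operation to a copy of a spec and rejects anything it does not recognize. The operation names (`add_slide`, `set_title`, `remove_slide`) and the dict-based spec shape are assumptions made for this example; the real patcher works on the project's own models.

```python
import copy
from typing import Any


def _check_index(index: int, upper: int) -> None:
    """Raise instead of silently clamping an out-of-range slide index."""
    if not 0 <= index < upper:
        raise ValueError(f"slide index {index} out of range")


def apply_edit(spec: dict[str, Any], op: dict[str, Any]) -> dict[str, Any]:
    """Return a patched copy of the spec; the input is never mutated."""
    patched = copy.deepcopy(spec)
    slides = patched["slides"]
    kind = op.get("op")
    if kind == "add_slide":
        index = op.get("index", len(slides))  # default: append at the end
        _check_index(index, len(slides) + 1)
        slides.insert(index, {"title": op["title"], "bullets": list(op.get("bullets", []))})
    elif kind == "set_title":
        _check_index(op["index"], len(slides))
        slides[op["index"]]["title"] = op["title"]
    elif kind == "remove_slide":
        _check_index(op["index"], len(slides))
        del slides[op["index"]]
    else:
        # Unknown operations are rejected rather than guessed at.
        raise ValueError(f"unsupported edit operation: {kind!r}")
    return patched
```

Working on a deep copy means a failed or rejected edit leaves the previous spec intact, which fits the versioned-output model.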

Current Limitations

  • Layouts are deterministic by design; this keeps rendering reliable but not free-form.
  • External .pptx inspection recovers titles (from title placeholders), bullets, tables, embedded images, and the source theme colors/fonts; it does not reconstruct original per-shape positioning or animations.
  • Edit interpretation is strongest for concise Russian requests and can fall back to safe minimal operations.
  • Wikimedia image search works without an API key, but query quality still affects relevance and runtime.
  • Remote image fetching is sequential right now, so image-rich deck generation can be noticeably slower.
  • If no API key is configured, the LLM steps use heuristics instead of model generation.

Future Improvements

  • richer layout engine;
  • PDF and image preview export;
  • CSV/Excel-to-chart pipeline;
  • stronger edit planning for compound requests;
  • approval workflow for destructive edits;
  • corporate templates and brand kits.

About

AI agent that generates presentations in .pptx format
