This project builds a full AI agent that takes:
- course slides (.pdf, .pptx, .md, .txt)
- class transcript (.txt/.md)
and generates a comprehensive DOCX lecture note that:
- covers slide content + teacher speech
- keeps a clean, structured format
- highlights special mentions from instructor
- embeds extracted slide images directly in the DOCX file
- applies slide-image cropping metadata (for PPTX images) before embedding
- extracts and preserves formulas
- validates coverage and repairs missing points automatically
- Multi-source ingestion for slides and transcripts
- Model-only OCR for PDFs (automatic whole-file + page fallback)
- Coverage checklist generation (atomic items with IDs)
- Lecture-note drafting with references like [S3], [T17]
- DOCX export with attached images embedded in the final file
- Strict validation pass with JSON audit
- Auto-repair loop to fill dropped/missing content
- Artifacts export (checklist.md, audit.json, source_bundle.json)
- Light Streamlit UI for easy upload/run/download flow
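The [S3], [T17] reference style in the list above can be checked mechanically. The following is a minimal sketch, assuming S-prefixed IDs point at slides and T-prefixed IDs at transcript segments (the README does not spell this out). `extract_refs` is a hypothetical helper for illustration, not part of the project's API.

```python
import re
from typing import Dict, List, Set

# Matches references such as [S3] or [T17]. Reading S as "slide" and
# T as "transcript" is an assumption made for this sketch.
REF_PATTERN = re.compile(r"\[([ST])(\d+)\]")


def extract_refs(text: str) -> Dict[str, List[int]]:
    """Collect the distinct S and T reference numbers found in text."""
    found: Dict[str, Set[int]] = {"S": set(), "T": set()}
    for kind, number in REF_PATTERN.findall(text):
        found[kind].add(int(number))
    return {kind: sorted(numbers) for kind, numbers in found.items()}


sample = "A stack is LIFO [S3]. The instructor stressed push/pop costs [T17], [S3]."
print(extract_refs(sample))  # {'S': [3], 'T': [17]}
```

A check like this makes it easy to spot draft paragraphs that cite nothing, or cite an ID that is not in the source bundle.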
- src/lecture_note_agent/io_utils.py: parsing slides/transcript and source payload build
- src/lecture_note_agent/prompts.py: generation/audit/repair prompts
- src/lecture_note_agent/agent.py: orchestration pipeline + iterative validation
- src/lecture_note_agent/cli.py: command line interface
- src/lecture_note_agent/ui.py: lightweight Streamlit web UI
- Dockerfile + docker-compose.yml: one-command containerized run
- Install dependencies:
  pip install -r requirements.txt
- Configure .env:
  OPENAI_API_KEY=your_openai_api_key_here
  OPENAI_BASE_URL=https://openrouter.ai/api/v1
  OPENAI_MODEL=openai/gpt-5.4 (fallback)
  OPENAI_MODEL_OCR=openai/gpt-5.4
  OPENAI_MODEL_CHECKLIST=openai/gpt-5.4
  OPENAI_MODEL_DRAFT=openai/gpt-5.4
  OPENAI_MODEL_AUDIT=openai/gpt-5.4
  OPENAI_MODEL_REPAIR=openai/gpt-5.4
  MAX_REPAIR_LOOPS=3
  MAX_MODEL_CALLS=6
  MAX_OUTPUT_TOKENS=3500
  FAST_MODE=false (set true for faster runs)
  PDF_OCR_MODE=auto (whole is usually faster than auto)
The app now uses an OpenAI-compatible client only. Keep credentials in .env only.
Run from project root:
python -m lecture_note_agent --course-name "Data Structures" --slides ./input/week1.pdf --transcript ./input/week1_transcript.txt --output ./output/week1_lecture_notes.docx --artifacts-dir ./artifacts/week1
Or after editable install (pip install -e .):
slideagent --course-name "Data Structures" --slides ./input/week1.pdf --transcript ./input/week1_transcript.txt --output ./output/week1_lecture_notes.docx --artifacts-dir ./artifacts/week1
- whole: uploads the full PDF once and asks the model to return per-page JSON text.
- page: uploads one-page PDFs and extracts each page separately.
- auto: tries whole first, then falls back to page for weak/missing pages.
This strategy is always used for PDFs; no OCR toggles are required in UI/CLI/env.
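The three OCR modes can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the project's implementation: `whole_ocr` and `page_ocr` are hypothetical injected callables standing in for the model calls, and the `min_chars` threshold for a "weak" page is an invented heuristic.

```python
from typing import Callable, Dict

# Hypothetical callable shapes; the real project sends PDFs to a model instead.
WholeOcr = Callable[[str], Dict[int, str]]  # path -> {page_number: text}
PageOcr = Callable[[str, int], str]         # (path, page_number) -> text


def ocr_pdf(path: str, page_count: int, mode: str,
            whole_ocr: WholeOcr, page_ocr: PageOcr,
            min_chars: int = 20) -> Dict[int, str]:
    """Return text per page using the whole, page, or auto strategy."""
    if mode not in ("whole", "page", "auto"):
        raise ValueError(f"unknown PDF_OCR_MODE: {mode!r}")

    pages: Dict[int, str] = {}
    if mode in ("whole", "auto"):
        pages = dict(whole_ocr(path))  # one upload, per-page text back
    if mode == "whole":
        return {n: pages.get(n, "") for n in range(1, page_count + 1)}

    for n in range(1, page_count + 1):
        text = pages.get(n, "")
        # page mode re-extracts everything; auto only weak or missing pages.
        if mode == "page" or len(text.strip()) < min_chars:
            pages[n] = page_ocr(path, n)
    return {n: pages[n] for n in range(1, page_count + 1)}


# Stub callables so the sketch runs without any model access.
def fake_whole(path: str) -> Dict[int, str]:
    return {1: "Page one has plenty of extracted text.", 2: ""}


def fake_page(path: str, n: int) -> str:
    return f"page {n} text recovered by single-page OCR"


result = ocr_pdf("week1.pdf", 3, "auto", fake_whole, fake_page)
print(result)
```

In the stubbed run, page 1 keeps its whole-file text, while the empty page 2 and the missing page 3 go through the per-page fallback.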
If runs feel slow, use one or more of these:
- Enable FAST_MODE=true (skips audit/repair loop, disables continuation calls, uses whole-PDF OCR)
- Set PDF_OCR_MODE=whole for faster OCR on large PDFs
- Reduce MAX_REPAIR_LOOPS and MAX_OUTPUT_TOKENS
- Use a faster model for draft/checklist phases
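How FAST_MODE could override the other settings is sketched below. The `RunSettings` fields and the `load_settings` helper are hypothetical names chosen for illustration, and the defaults simply mirror the sample .env values; the project's actual configuration code may differ.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the effective run configuration."""
    run_audit_repair: bool
    allow_continuation: bool
    pdf_ocr_mode: str
    max_repair_loops: int


def load_settings(env: Mapping[str, str]) -> RunSettings:
    """Derive effective settings; FAST_MODE=true overrides the rest."""
    fast = env.get("FAST_MODE", "false").strip().lower() == "true"
    if fast:
        # Fast runs skip audit/repair and continuations and force whole-PDF OCR.
        return RunSettings(False, False, "whole", 0)
    return RunSettings(
        run_audit_repair=True,
        allow_continuation=True,
        pdf_ocr_mode=env.get("PDF_OCR_MODE", "auto").strip().lower(),
        max_repair_loops=int(env.get("MAX_REPAIR_LOOPS", "3")),
    )


print(load_settings({"FAST_MODE": "true", "PDF_OCR_MODE": "page"}))
print(load_settings({"MAX_REPAIR_LOOPS": "1"}))
```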
SlideAGENT can route each phase to a different model:
- OCR phase → OPENAI_MODEL_OCR
- Checklist phase → OPENAI_MODEL_CHECKLIST
- Draft phase → OPENAI_MODEL_DRAFT
- Audit phase → OPENAI_MODEL_AUDIT
- Repair phase → OPENAI_MODEL_REPAIR
If a phase model is not provided, OPENAI_MODEL is used as fallback.
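The fallback rule can be expressed in a few lines. This is a minimal sketch: `resolve_model` is a hypothetical helper, and only the environment variable names come from the configuration described here. The model name `example/fast-model` is a placeholder.

```python
import os
from typing import Mapping, Optional

PHASES = ("OCR", "CHECKLIST", "DRAFT", "AUDIT", "REPAIR")


def resolve_model(phase: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the phase-specific model, falling back to OPENAI_MODEL."""
    env = os.environ if env is None else env
    key = phase.upper()
    if key not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    # An unset or empty phase variable falls through to the shared default.
    model = env.get(f"OPENAI_MODEL_{key}") or env.get("OPENAI_MODEL")
    if not model:
        raise RuntimeError("Set OPENAI_MODEL or OPENAI_MODEL_<PHASE>")
    return model


example_env = {
    "OPENAI_MODEL": "openai/gpt-5.4",
    "OPENAI_MODEL_DRAFT": "example/fast-model",  # placeholder model name
}
print(resolve_model("draft", example_env))  # example/fast-model
print(resolve_model("audit", example_env))  # openai/gpt-5.4
```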
Run locally:
streamlit run src/lecture_note_agent/ui.py
Or with script (after pip install -e .):
slideagent-ui
Build and run UI with Docker Compose:
docker compose up --build
Then open http://localhost:8501.
The generated DOCX is designed to include:
- Full lecture structure (headings/subheadings)
- All concepts from slides and transcript
- Special instructor instructions/reminders
- Embedded slide images with exact refs from source
- Formula sheet with exact formula text
- Inline source references for traceability
Validation checks that coverage is high; the repair loop then attempts to fix any missing items before the final DOCX output is written.
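The validate-then-repair flow can be sketched as a bounded loop. This is an illustration only: the real audit is a model call that produces a JSON audit, whereas here it is replaced by a simple check that each checklist ID appears as a bracketed reference in the draft. `missing_items`, `repair_until_covered`, and the `repair` callable are hypothetical names.

```python
from typing import Callable, List, Tuple

Repair = Callable[[str, List[str]], str]  # (draft, missing_ids) -> new draft


def missing_items(checklist_ids: List[str], draft: str) -> List[str]:
    """Stand-in audit: an item counts as covered if its [ID] appears in the draft."""
    return [item_id for item_id in checklist_ids if f"[{item_id}]" not in draft]


def repair_until_covered(draft: str, checklist_ids: List[str],
                         repair: Repair, max_loops: int = 3) -> Tuple[str, List[str]]:
    """Run at most max_loops repair passes; return the draft and what is still missing."""
    for _ in range(max_loops):
        missing = missing_items(checklist_ids, draft)
        if not missing:
            break
        draft = repair(draft, missing)
    return draft, missing_items(checklist_ids, draft)


def stub_repair(draft: str, missing: List[str]) -> str:
    # Stub that recovers only one item per pass, to exercise the loop bound.
    return draft + "\n" + f"Recovered point [{missing[0]}]"


start = "Stacks are LIFO [S3]."
final_draft, still_missing = repair_until_covered(start, ["S3", "T17", "S5"], stub_repair)
print(still_missing)  # []
```

Bounding the loop with a MAX_REPAIR_LOOPS-style limit keeps the number of model calls predictable even when an item can never be recovered.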
- For best results, provide clean transcript text (timestamps/speaker names are supported).
- PDF image extraction depends on available image metadata in the PDF.
- PPTX image references use shape names from slides.