Upload your own textbooks, set a real deadline, and let Scholar build a personalized study plan, with AI notes, grounded chat, quizzes, and adaptive learning drawn only from your books.
Scholar turns a pile of PDFs into a structured, deadline-aware course. Every note, every chat answer, and every quiz question is grounded in — and limited to — the specific sources you provide. No hallucinated facts from the open internet.
A student uploads their course material, says "I want to master this by the 30th," and Scholar:
- Indexes the books into a dual retrieval engine (structural + semantic).
- Builds a study plan — a sequence of sessions sized to the deadline.
- Teaches each session with streamed, cited notes and a grounded Q&A chat.
- Tests understanding with quizzes, and adapts when the student struggles.
- Certifies mastery with a cumulative final test that marks the goal complete.
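This README does not specify how a plan is sized to the deadline. As a rough illustration only, the sketch below assumes one session per remaining day (minus a day held back for the final test) and splits the book's sections, in reading order, into that many groups of roughly equal page count. `Section`, `plan_sessions`, and `reserve_days` are hypothetical names, not Scholar's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Section:
    title: str
    pages: int


def plan_sessions(sections: list[Section], today: date, deadline: date,
                  reserve_days: int = 1) -> list[list[Section]]:
    """Split sections, in reading order, into one session per available day.

    Hypothetical sketch: a session is closed once it reaches the average page
    load, while always leaving at least one section for every session still to come.
    """
    days = (deadline - today).days - reserve_days  # hold days back for the final test
    if days < 1 or not sections:
        raise ValueError("no study days before the deadline, or no sections")
    n = min(days, len(sections))  # never more sessions than sections
    target = sum(s.pages for s in sections) / n
    sessions: list[list[Section]] = []
    current: list[Section] = []
    load = 0
    for i, s in enumerate(sections):
        current.append(s)
        load += s.pages
        sections_left = len(sections) - i - 1
        sessions_left = n - len(sessions) - 1  # sessions still to open after this one
        if sessions_left > 0 and (load >= target or sections_left == sessions_left):
            sessions.append(current)
            current, load = [], 0
    sessions.append(current)  # the last session takes whatever remains
    return sessions


book = [Section("Ch1", 10), Section("Ch2", 10), Section("Ch3", 10), Section("Ch4", 10)]
plan = plan_sessions(book, date(2025, 1, 1), date(2025, 1, 4))
print([[s.title for s in group] for group in plan])
```

A real planner would likely also weigh topic difficulty and the student's level; this version only balances page counts.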
| Capability | What it means for the student |
|---|---|
| Bring your own books | Upload PDFs or paste a URL — extraction and indexing are automatic. |
| Goal-driven plans | Set a topic, level, and deadline; Scholar generates a session-by-session plan. |
| AI study notes | Each session opens with concise, streamed markdown notes — every claim cited to a page. |
| Grounded chat | Ask anything; answers come only from your books, with source citations. |
| Quizzes + adaptive learning | Auto-generated quizzes; a low score inserts a targeted remedial session automatically. |
| Final cumulative test | A cross-session exam that, when passed, marks the whole goal complete. |
| Super Agent | One chat that reasons across all your indexed books at once. |
| Multimodal (optional) | Opt-in understanding of diagrams and figures via a vision model. |
| Notion export | Push your plan and notes to a Notion page in one click. |
| Model-agnostic | Runs on OpenAI, Anthropic, Google, or any OpenAI-compatible model — all via .env. |
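The adaptive rule in the table (a low quiz score inserts a targeted remedial session) could look roughly like the following. The 0.6 pass mark, the per-topic answer format, and the names `Session` and `record_quiz` are assumptions for illustration; Scholar's real threshold and session model are not documented in this README.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 0.6  # hypothetical cut-off, not Scholar's documented value


@dataclass
class Session:
    title: str
    topics: list[str]
    remedial: bool = False


def record_quiz(plan: list[Session], index: int, answers: list[tuple[str, bool]]) -> float:
    """Score the quiz for plan[index]; answers are (topic, correct) pairs.

    On a low score, insert a remedial session right after the current one,
    focused only on the topics the student got wrong.
    """
    if not answers:
        raise ValueError("quiz has no answers")
    score = sum(ok for _, ok in answers) / len(answers)
    if score < PASS_THRESHOLD:
        missed = sorted({topic for topic, ok in answers if not ok})
        plan.insert(index + 1, Session(f"Review: {plan[index].title}", missed, remedial=True))
    return score


plan = [Session("Energy", ["solar", "wind"]), Session("Water", ["aquifers"])]
score = record_quiz(plan, 0, [("solar", True), ("wind", False), ("wind", False)])
print(score, [s.title for s in plan])
```

Inserting the review directly after the failed session keeps the rest of the plan intact, so later sessions simply shift by one.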
```mermaid
flowchart LR
A[📄 Upload books<br/>PDF / URL] --> B[Dual indexing<br/>PageIndex tree + vector embeddings]
B --> C[🎯 Set a goal<br/>topic · level · deadline]
C --> D[🗂️ Study plan<br/>N sessions]
D --> E[📝 Notes + 💬 Chat + ❓ Quiz<br/>grounded & cited]
E -->|low score| F[➕ Adaptive<br/>remedial session]
E --> G[🏁 Final test<br/>→ goal complete]
B --> H[🤖 Super Agent<br/>chat across all books]
```
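The "grounded & cited" step can be illustrated with a prompt builder that gives the model only the retrieved passages, numbered with book and page, and tells it to refuse otherwise. This is a hypothetical sketch: the `Chunk` fields, the prompt wording, and the refusal string are assumptions, not Scholar's actual prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    book: str
    page: int


REFUSAL = "I couldn't find this in your uploaded books."


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> Optional[str]:
    """Return a prompt restricted to the retrieved chunks, or None if retrieval
    found nothing (the caller should then answer with REFUSAL without an LLM call)."""
    if not chunks:
        return None
    # Number each source so the model can cite it as [n]; keep book and page for the UI.
    context = "\n\n".join(
        f"[{i}] ({c.book}, p. {c.page})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered sources below. Cite every claim as [n]. "
        f"If the sources do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


prompt = build_grounded_prompt(
    "What is an aquifer?",
    [Chunk("An aquifer is a body of rock that holds groundwater.", "EnvSci", 42)],
)
print(prompt)
```

Short-circuiting on an empty retrieval result avoids giving the model any opportunity to answer from its own training data.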
At the core is a dual retrieval engine with two complementary strategies, selectable per query or pinned via config:
- PageIndex (structural) — an LLM navigates a hierarchical tree of the document and returns whole sections.
- Vector (semantic) — cosine similarity over embeddings returns the most relevant chunks.
- Hybrid — runs both and merges them.
→ Design details in docs/retrieval.md.
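A toy version of the three strategies, under clearly stated assumptions: the LLM that walks the PageIndex tree is replaced by a keyword-overlap `keyword_chooser` so the code runs offline, embeddings are plain Python lists, and the hybrid merge uses reciprocal rank fusion (RRF) as one common choice. The real router and merge rule are described in docs/retrieval.md and may differ; every name here is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def keyword_chooser(query: str, children: list[Node]) -> Node:
    # Stand-in for the LLM step: pick the child whose title shares most words with the query.
    q = set(query.lower().split())
    return max(children, key=lambda n: len(q & set(n.title.lower().split())))


def pageindex_search(query: str, root: Node, choose: Callable = keyword_chooser) -> Node:
    """Structural retrieval: descend one level at a time, return the whole section reached."""
    node = root
    while node.children:
        node = choose(query, node.children)
    return node


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def vector_search(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Semantic retrieval: return the IDs of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [cid for cid, _ in ranked[:k]]


def rrf_merge(*rankings: list[str], k: int = 60) -> list[str]:
    """Hybrid merge via reciprocal rank fusion (an assumed merge rule)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


book = Node("Book", children=[
    Node("Water resources", children=[
        Node("Aquifers and groundwater", "text A"),
        Node("Surface water", "text S"),
    ]),
    Node("Energy", children=[Node("Solar power", "text P")]),
])
print(pageindex_search("groundwater aquifers water", book).title)
print(vector_search([1.0, 0.0], [("x", [1.0, 0.0]), ("y", [0.0, 1.0]), ("z", [1.0, 1.0])], k=2))
print(rrf_merge(["a", "b", "c"], ["b", "d"]))
```

For the hybrid path, both rankings have to use the same IDs (for example, mapping the PageIndex section to the chunk IDs it contains) before fusing them. The `k=60` constant is the value commonly used with RRF, not a tuned setting.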
The retrieval strategies are benchmarked head-to-head with RAGAS on a 30-question golden set, using a bundled Environmental Science textbook as the evaluation book.
| Strategy | Faithfulness | Answer Relevancy | Context Precision |
|---|---|---|---|
| PageIndex | 0.94 | 0.81 | 0.77 |
| Vector | 0.85 | 0.91 | 0.92 |
| Hybrid | 0.94 | 0.90 | 0.90 |
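The results show a real trade-off. PageIndex is the most faithful (0.94) but the least precise (context precision 0.77); Vector is the most precise (0.92) but the least faithful (0.85). Hybrid keeps PageIndex's faithfulness (0.94) while coming within 0.02 of Vector's context precision (0.90) and within 0.01 of its answer relevancy (0.90). Full methodology, the evaluation book, and the complete comparison: docs/evaluation.md.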
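To make the Context Precision column concrete: RAGAS asks an LLM judge whether each retrieved chunk is relevant, then averages precision@k over the ranks that hold a relevant chunk. The sketch below assumes those 0/1 verdicts are already available and computes only the final arithmetic; it illustrates the metric and is not the RAGAS implementation.

```python
def context_precision(relevance: list[int]) -> float:
    """relevance: 0/1 verdicts for the retrieved chunks, in rank order."""
    hits, total = 0, 0.0
    for k, v in enumerate(relevance, start=1):
        if v:
            hits += 1
            total += hits / k  # precision@k, counted only at ranks holding a relevant chunk
    return total / hits if hits else 0.0


print(context_precision([1, 1, 0]), context_precision([0, 1, 1]))
```

With the same two relevant chunks, `[1, 1, 0]` scores 1.0 while `[0, 1, 1]` scores about 0.58, so the metric rewards putting relevant context first, not just retrieving it.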
Start here → docs/README.md — the documentation hub.
| Doc | Contents |
|---|---|
| Architecture | Components, agent layer, storage model, request map |
| Flows | Diagrams: ingestion, retrieval, sessions, adaptive, final test, super agent |
| Retrieval | The dual engine — PageIndex vs vector, router, hybrid merge |
| Evaluation | RAGAS methodology, the eval book, results, reproduction |
| Configuration | Full environment reference — running model-agnostically + selecting a strategy |
| Layer | Stack |
|---|---|
| Backend | FastAPI · LangChain/LangGraph · PostgreSQL + pgvector · PageIndex (open-source, local) · RAGAS |
| Frontend | Next.js 14 · TypeScript · Tailwind · native SSE streaming |
| Models | Provider-agnostic via a model factory: OpenAI / Anthropic / Google / any OpenAI-compatible endpoint |
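The provider-agnostic model factory is configured through `.env` (see the Configuration doc). Below is a minimal sketch of the env-parsing half. The variable names `LLM_PROVIDER`, `LLM_MODEL`, and `LLM_BASE_URL`, the `openai` default, and `ModelSpec` are hypothetical placeholders; the real names are in the configuration reference.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED = {"openai", "anthropic", "google", "openai_compatible"}


@dataclass(frozen=True)
class ModelSpec:
    provider: str
    model: str
    base_url: Optional[str] = None


def model_spec_from_env(env: Optional[Mapping[str, str]] = None) -> ModelSpec:
    """Read a hypothetical set of .env variables into a provider-neutral spec."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in SUPPORTED:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider}")
    model = env.get("LLM_MODEL")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    base_url = env.get("LLM_BASE_URL")
    # A self-hosted or third-party OpenAI-compatible server needs an explicit endpoint.
    if provider == "openai_compatible" and not base_url:
        raise ValueError("LLM_BASE_URL is required for an OpenAI-compatible endpoint")
    return ModelSpec(provider, model, base_url)


print(model_spec_from_env({"LLM_PROVIDER": "Anthropic", "LLM_MODEL": "my-model"}))
```

The resulting `ModelSpec` would then be mapped to a concrete client. With LangChain, for example, `init_chat_model` takes a model name and a `model_provider`, and an OpenAI-compatible endpoint is typically the `openai` provider pointed at a custom base URL.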
A goal-driven study system with adaptive learning, multimodal ingestion, a cumulative final test, a cross-book Super Agent, and Notion export — backed by a benchmarked dual retrieval engine.