GraphLex AI is a local document assistant for PDF ingestion, retrieval, and question answering. It uses Qdrant for vector storage, Ollama for local model inference, and LangGraph for the agent workflow.
- PDF ingestion with page-level chunking and metadata.
- Qdrant-backed similarity search with optional file filtering.
- LangGraph agent flow with retrieval, grading, rewriting, and generation.
- Streamlit interface for uploading documents and asking questions.
- Local model configuration with qwen2.5:3b.
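As an illustration of page-level chunking with metadata, here is a minimal sketch in plain Python. It is not the code in src/database.py: the `Chunk` class, the `chunk_pages` function, the `source` and `page` metadata keys, and the default sizes are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of page text plus the metadata needed to trace it back."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_pages(pages, source, chunk_size=500, overlap=50):
    """Split each page's text into overlapping chunks.

    `pages` is a list of page strings (hypothetical input; a real loader
    would extract these from the PDF). Chunks never span two pages, so
    each one can carry a single page number.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size]
            if piece.strip():
                # Metadata keys are assumptions for this sketch.
                chunks.append(Chunk(piece, {"source": source, "page": page_number}))
            if start + chunk_size >= len(text):
                break  # the page's tail is already covered
    return chunks
```

Keeping the file name and page number on every chunk is what makes per-file filtering and page citations possible later in the pipeline.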
- Python 3.11
- Poetry
- LangChain and LangGraph
- Qdrant
- Ollama
- Streamlit
GraphLex/
├── docker-compose.yml
├── pyproject.toml
├── README.md
├── src/
│   ├── database.py
│   ├── engine.py
│   └── ui.py
└── tests/
    ├── test_agent_logic.py
    ├── test_agent_traces.py
    ├── test_final_answer.py
    └── test_multiple_files.py
Install dependencies with Poetry:
poetry install
Activate the environment if needed:
poetry shell
Start Qdrant with Docker:
docker compose up -d qdrant
Make sure Ollama is running locally and that the qwen2.5:3b model is available.
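One way to check the Ollama requirement from Python is to ask the local Ollama server which models it has. The sketch below assumes Ollama listens on its default address, http://localhost:11434, and uses its /api/tags endpoint; the helper names `fetch_tags`, `model_names`, and `has_model` are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Return the model names found in a parsed /api/tags response."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if `wanted` (for example "qwen2.5:3b") is listed as available."""
    return wanted in model_names(tags_payload)


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Query the running Ollama server for its local models.

    Raises OSError (including urllib.error.URLError) if the server
    cannot be reached.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags(), "qwen2.5:3b")` should return True. If it returns False, `ollama pull qwen2.5:3b` downloads the model.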
Run the Streamlit UI:
poetry run streamlit run src/ui.py
The UI lets you upload a PDF, index it, and ask questions about the document.
- src/database.py loads PDFs, splits them into chunks, and stores them in Qdrant.
- src/engine.py runs the LangGraph agent and handles retrieval, grading, query rewriting, and answer generation.
- src/ui.py provides the Streamlit interface for upload and chat.
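The retrieval, grading, rewriting, and generation steps can be pictured as a small loop. The sketch below is plain Python, not LangGraph code and not the contents of src/engine.py: `run_agent`, its callable parameters, the `max_rewrites` limit, and the exact ordering of steps are assumptions made for illustration.

```python
def run_agent(question, retrieve, grade, rewrite, generate, max_rewrites=2):
    """Retrieve documents, keep the relevant ones, and answer.

    If nothing relevant is found, rewrite the query and try again,
    up to `max_rewrites` times. All four callables are hypothetical
    stand-ins for the real retrieval and model calls.
    """
    query = question
    trace = []  # ordered record of the steps taken, useful for debugging
    for attempt in range(max_rewrites + 1):
        docs = retrieve(query)
        trace.append(("retrieve", query))

        relevant = [d for d in docs if grade(question, d)]
        trace.append(("grade", len(relevant)))

        if relevant:
            answer = generate(question, relevant)
            trace.append(("generate", answer))
            return {"question": question, "query": query,
                    "answer": answer, "trace": trace}

        if attempt < max_rewrites:
            query = rewrite(query)
            trace.append(("rewrite", query))

    # No relevant context was found within the rewrite budget.
    return {"question": question, "query": query, "answer": None, "trace": trace}
```

Capping the number of rewrites keeps the loop from cycling forever on questions the indexed documents cannot answer. In that case the sketch returns None as the answer and leaves the caller to decide what to show.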
The repository includes script-style checks under tests/:
- tests/test_final_answer.py runs a single end-to-end question.
- tests/test_agent_logic.py checks a positive and negative query.
- tests/test_agent_traces.py prints the final graph state.
- tests/test_multiple_files.py checks file-specific filtering.
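To make file-specific filtering concrete, here is a small in-memory sketch of similarity search with an optional file filter. It does not use Qdrant or reflect the repository's actual query code: the `search` and `cosine` functions, the point layout, and the `source` metadata key are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(points, query_vector, top_k=3, source=None):
    """Rank stored points by similarity to the query.

    Each point is a dict with "vector" and "metadata" keys (hypothetical
    layout). When `source` is given, only points from that file are
    considered; otherwise every file is searched.
    """
    candidates = [
        p for p in points
        if source is None or p["metadata"].get("source") == source
    ]
    ranked = sorted(
        candidates,
        key=lambda p: cosine(p["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]
```

Applying the filter before ranking means a question scoped to one file can never be answered from another file's chunks, which is the kind of behavior a multi-file check would exercise.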
Run them with Poetry:
poetry run python tests/test_final_answer.py
poetry run python tests/test_agent_logic.py
poetry run python tests/test_agent_traces.py
poetry run python tests/test_multiple_files.py
The current default embedding and chat model is qwen2.5:3b via Ollama. If you change the model, update the comments and README so they match the code.
- Generated PDFs and Qdrant storage are intentionally ignored in Git.
- The repository uses dev for active implementation work.