ARPe

Structured Question Answering Agent

Modular RAG + LLM framework for document Q&A in Portuguese

Overview

ARPe automates structured question answering - forms, checklists, assessments - by combining Retrieval Augmented Generation (RAG) with local language models. It extracts relevant context from PDF documents, builds vector indexes, and generates accurate, justified answers.

Features


🧾 Document reading	PDF support with page extraction and intelligent chunking
🔍 Context retrieval	Embeddings (`all-MiniLM-L6-v2`) + FAISS index for similarity search
🧠 LLM generation	Local models via Hugging Face (`transformers`) with Mistral prompt template
⚙️ Flexible chunking	By size, by page, or by regex pattern
🌐 Web interface	Static UI with question form, PDF upload, and chunk visualization
🧪 Experiment pipeline	Batch execution with 13 configurations for comparative evaluation
📊 Reports	Accuracy metrics via cosine similarity between expected and generated answers
🇧🇷 Native Portuguese	Prompt, interface, and documentation in Portuguese

Architecture

Web Interface

Quick Start

Prerequisites

Python 3.11+
CUDA (optional, for GPU inference)

1. Virtual environment

python -m venv env
.\env\Scripts\Activate

2. Dependencies

pip install -r backend/requirements.txt

3. Base document

Download the PDF from the RISG repository and save it as bases/RISG.pdf.

4. Servers

# Terminal 1 - Backend (Flask)
.\runners\start-backend.ps1

# Terminal 2 - Frontend (optional)
.\runners\start-frontend.ps1

5. Tests

# Python tests (backend + integration)
.\runners\run-tests.ps1 -v

# E2E tests (Playwright)
.\runners\run-e2e.ps1

Runners

Helper scripts in runners/ to speed up development:

Command	Description
`start-backend.ps1`	Starts the Flask API on `localhost:8000`
`start-frontend.ps1`	Serves the static frontend on `localhost:3000`
`run-tests.ps1`	Runs `pytest` for backend and integration tests
`run-e2e.ps1`	Runs Playwright tests (headed mode by default)

Project structure

ARPe/
├── backend/
│   ├── app.py               # Flask API (endpoints /health, /options, /answer)
│   ├── arpe.py              # RAG core: chunking, embeddings, LLM, pipeline
│   ├── options.json         # API options (models, chunking, etc.)
│   ├── experiments.json     # 13 experiment configurations
│   ├── run_experiments.py   # Experiment runner
│   └── tests/               # Backend unit tests
├── frontend/
│   ├── index.html           # Web UI
│   ├── styles.css           # UI styles
│   └── app.js               # JS logic (form, selects, results)
├── e2e/
│   └── tests/
│       ├── api.spec.js      # API E2E tests
│       └── frontend.spec.js # Frontend E2E tests
├── testes/                  # Integration and coverage tests
├── runners/                 # Helper scripts (start, run)
├── docs/                    # Images and assets
├── bases/                   # Reference PDFs (not versioned)
├── output/                  # Experiment results
├── env/                     # Python virtual environment
│   └── paper.pdf            # Academic paper (TCC)
├── README.md
└── README.pt-BR.md

Paper

This project is based on the academic work "A Utilização de Large Language Models Locais para Consulta a Documentos Militares" by Matheus Vanzan Pimentel de Oliveira (EsAO, 2025). See docs/paper.pdf for the full text.

Data source

The evaluation examples use the Internal Regulation and General Services (RISG) manual, publicly available at the Brazilian Army Digital Library.

The original PDF is not versioned in this repository. Download it from the official source and place it at bases/RISG.pdf.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ARPe

Overview

Features

Architecture

Web Interface

Quick Start

Prerequisites

1. Virtual environment

2. Dependencies

3. Base document

4. Servers

5. Tests

Runners

Project structure

Paper

Data source

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
backend		backend
docs		docs
e2e		e2e
frontend		frontend
runners		runners
testes		testes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
README.pt-BR.md		README.pt-BR.md

Folders and files

Latest commit

History

Repository files navigation

ARPe

Overview

Features

Architecture

Web Interface

Quick Start

Prerequisites

1. Virtual environment

2. Dependencies

3. Base document

4. Servers

5. Tests

Runners

Project structure

Paper

Data source

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages