StayChat AI — RAG-Based Hotel Q&A System

Retrieval-Augmented Generation pipeline for answering natural-language questions about hotels from a curated document corpus (40 synthetic documents).

Submission guide: see SUBMISSION.md — add your name before submitting.

Quick start (clone from GitHub)

git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
cd YOUR_REPO
python -m venv .venv
.venv\Scripts\activate          # Windows
source .venv/bin/activate       # macOS/Linux (use instead of the line above)
pip install -r requirements.txt
python main.py --all

The FAISS index is not in the repo (see .gitignore); main.py --all rebuilds it automatically.

Architecture

flowchart LR
  A[hotel_documents.json] --> B[Preprocess and Chunk]
  B --> C[MiniLM Embeddings]
  C --> D[FAISS Index]
  B --> E[BM25 Index]
  F[User Query] --> G[Hybrid Retrieve]
  D --> G
  E --> G
  G --> H[Rerank]
  H --> I[Top-k Chunks]
  I --> J[LLM OpenAI Ollama or Mock]
  I --> K[Hallucination Controls]
  K --> J
  J --> L[Answer with Citations]
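
The Hybrid Retrieve step in the diagram combines dense (FAISS) and sparse (BM25) scores. How `src/hybrid_retrieval.py` fuses them is not stated here, so the following is only a minimal sketch of one common approach: min-max normalise each score set, then take a weighted sum. The function names, the `alpha` weight, and the chunk ids are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 0.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def hybrid_rank(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,  # assumed weight on the dense score, not from the repo
    k: int = 7,          # default top-k listed in this README
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse scores per chunk id and return the top-k."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        cid: alpha * d.get(cid, 0.0) + (1 - alpha) * s.get(cid, 0.0)
        for cid in set(d) | set(s)
    }
    # Sort by fused score (descending), breaking ties by chunk id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}     # hypothetical cosine scores
    sparse_scores = {"b": 12.0, "c": 3.0, "d": 6.0}   # hypothetical BM25 scores
    for chunk_id, score in hybrid_rank(dense_scores, sparse_scores, k=3):
        print(chunk_id, round(score, 3))
```

Normalising first matters because BM25 scores are unbounded while inner-product scores on normalised embeddings lie in a fixed range; without it, one signal would dominate the sum.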

| Component | Choice | Rationale |
| --- | --- | --- |
| Chunking | Sentence-aware, 280 chars, 60 overlap | Splits longer amenity/policy passages; overlap preserves WiFi/breakfast context across boundaries |
| Embeddings | `all-MiniLM-L6-v2` (384-d) | Strong semantic search, local, no API cost |
| Vector DB | FAISS `IndexFlatIP` + BM25 hybrid | Dense recall + sparse keyword match (e.g. “complimentary breakfast”) |
| Top-k | 7 (10 for list queries) | Better multi-hotel recall than k=5 |
| LLM | OpenAI / Ollama / generic mock | Mock is query-agnostic (no hardcoded demo branches) |
| Hallucination | Threshold + term grounding + prompt + verification | See `outputs/hallucination_ablation.md` |
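
To make the chunking row concrete, here is a rough sketch of sentence-aware chunking with a 280-character budget and a 60-character overlap. It is not the code in `src/preprocessing.py`; the function name `chunk_text`, the regex sentence splitter, and the word-boundary trimming of the overlap are assumptions.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 280, overlap: int = 60) -> List[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    Each new chunk starts with the last `overlap` characters of the previous
    one (trimmed to a word boundary). The limit is soft: a single sentence
    longer than max_chars is kept whole rather than cut.
    """
    if overlap < 0 or overlap >= max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            tail = current[-overlap:] if overlap > 0 else ""
            if " " in tail:
                # Drop the leading partial word so the overlap starts cleanly.
                tail = tail.split(" ", 1)[1]
            current = f"{tail} {sentence}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = " ".join(
        f"Sentence number {i} talks about WiFi and breakfast." for i in range(12)
    )
    for chunk in chunk_text(sample):
        print(len(chunk), chunk[:50])
```

Carrying the tail of one chunk into the next is what lets a query about breakfast still match when the hotel name and the amenity fall on opposite sides of a boundary.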

Dataset (40 documents)

| Category | Count |
| --- | --- |
| Hotel descriptions | 9 |
| Amenities | 7 |
| Guest reviews | 10 |
| Policies | 7 |
| Location | 7 |

Source: synthetic (MIT). File: data/hotel_documents.json.

Project layout

Task/
├── SUBMISSION.md
├── config.py
├── main.py
├── requirements.txt
├── data/
├── src/
│   ├── preprocessing.py
│   ├── embed_store.py
│   ├── hybrid_retrieval.py
│   ├── retrieval.py
│   ├── generation.py
│   ├── evaluation.py
│   └── hallucination.py
├── tests/test_rag.py
├── index/
└── outputs/

Setup

cd Task
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Optional, to select an LLM backend (Windows cmd syntax; use export on macOS/Linux):

set OPENAI_API_KEY=sk-...
set USE_OLLAMA=1

Run

python main.py --all
python -m unittest tests.test_rag -v

Example queries

| ID | Query |
| --- | --- |
| Q1 | Which hotels have free WiFi and complimentary breakfast? |
| Q2 | What is the cancellation policy of Hotel X? |
| Q3 | Suggest a hotel with excellent reviews near the beach. |

Results: outputs/sample_outputs.md

Known limitations

The mock LLM relies on lexical overlap; enable OpenAI or Ollama for fluent paraphrasing.
Relevance labels were assigned manually for only three queries, so the metrics are indicative rather than conclusive.
A production system would add cross-encoder reranking and NLI-based verification.
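
As a lightweight stand-in for NLI-based verification, the term grounding control listed in the architecture table can be approximated by checking how many of the answer's content words also appear in the retrieved chunks. This is a sketch under assumptions, not the logic in `src/hallucination.py`: the stopword list, the `min_ratio` threshold of 0.8, and the hotel name in the demo are all invented for illustration.

```python
import re
from typing import Iterable, Set

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "with", "for", "on", "at", "it", "has", "have",
}


def content_terms(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def grounding_ratio(answer: str, chunks: Iterable[str]) -> float:
    """Fraction of the answer's content terms that occur in the context."""
    answer_terms = content_terms(answer)
    if not answer_terms:
        return 1.0  # nothing to verify
    context_terms: Set[str] = set()
    for chunk in chunks:
        context_terms |= content_terms(chunk)
    return len(answer_terms & context_terms) / len(answer_terms)


def is_grounded(answer: str, chunks: Iterable[str], min_ratio: float = 0.8) -> bool:
    """True when enough of the answer is lexically supported by the chunks."""
    return grounding_ratio(answer, chunks) >= min_ratio


if __name__ == "__main__":
    context = ["Seaside Inn offers free WiFi and complimentary breakfast."]
    print(is_grounded("Seaside Inn has free WiFi.", context))
    print(is_grounded("Seaside Inn has a rooftop pool.", context))
```

A purely lexical check like this cannot catch a fluent answer that reuses context words to assert something false, which is the gap an NLI model would close.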

Task checklist

Task 1: Cleaning, chunking (280/60) with justification
Task 2: Hybrid embeddings + FAISS/BM25 + top-k
Task 3: Context-only prompt + Q1–Q3
Task 4: Metrics with workings + detailed qualitative analysis
Task 5: Hallucination control + ablation
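
This README does not say which retrieval metrics Task 4 reports. Assuming precision@k and recall@k are among them, the sketch below shows how they could be worked out from manual relevance labels. The chunk ids and the labels are made up for the example.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved chunk ids that are labelled relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for cid in retrieved[:k] if cid in relevant)
    return hits / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant chunk ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for cid in retrieved[:k] if cid in relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    retrieved_ids = ["c1", "c4", "c2", "c9", "c3"]   # hypothetical ranking
    relevant_ids = {"c1", "c2", "c3", "c7"}          # hypothetical manual labels
    print(precision_at_k(retrieved_ids, relevant_ids, k=5))  # 3 hits / 5 = 0.6
    print(recall_at_k(retrieved_ids, relevant_ids, k=5))     # 3 hits / 4 = 0.75
```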
