CodeAtlas AI is a full-stack AI repository analysis platform. Paste a public GitHub repository URL, let the backend index it asynchronously, then explore the codebase through file metadata, a dependency graph, an architecture summary, semantic search, and repository-aware chat with cited files.
The goal of this project is to demonstrate production-style full-stack engineering: background jobs, database-backed indexing, vector search, AI integration, and a polished developer-facing dashboard.
- Submit and index public GitHub repositories.
- Run indexing asynchronously with Redis + RQ.
- Store repositories, files, chunks, summaries, dependencies, and chat messages in PostgreSQL.
- Use pgvector for embedding-backed code retrieval.
- Generate architecture summaries from indexed repository context.
- Ask repository-aware questions and receive answers with cited source files.
- Browse an indexed file tree and preview file contents.
- Generate AI explanations for individual files.
- View a basic file-level dependency graph built from local imports.
- Re-index existing repositories from the UI.
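
To make "embedding-backed code retrieval" concrete, here is a minimal sketch of similarity ranking in plain Python. In CodeAtlas this kind of nearest-neighbor ranking is delegated to pgvector inside PostgreSQL. The function names, the toy three-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=3):
    """Rank (path, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored[:k]]


# Toy vectors for illustration; real embeddings have far more dimensions.
chunks = [
    ("auth/login.py", [0.9, 0.1, 0.0]),
    ("db/session.py", [0.0, 0.2, 0.9]),
    ("auth/tokens.py", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]
results = top_k_chunks(query, chunks, k=2)
```

The chunks whose vectors point in nearly the same direction as the query vector rank first, which is why a question about authentication would surface the two `auth/` files here.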
| Area | Technology |
|---|---|
| Frontend | Next.js, React, TypeScript, Tailwind CSS |
| Backend | FastAPI, SQLAlchemy, Alembic |
| Database | PostgreSQL, pgvector |
| Background jobs | Redis, RQ |
| AI | OpenAI embeddings and chat models |
| Graph UI | React Flow |
| Local development | Docker Compose |

    Browser
      |
      v
    Next.js frontend
      |
      v
    FastAPI backend ---> PostgreSQL + pgvector
      |
      v
    Redis queue
      |
      v
    RQ worker ---> clone repo, scan files, chunk code, embed chunks, build graph, summarize
      |
      v
    OpenAI API

Indexing flow:
- The user submits a public GitHub repository URL.
- FastAPI validates the URL, creates a `repositories` row, and enqueues an RQ job.
- The worker clones the repository into a temporary directory.
- The worker ignores generated folders, binaries, oversized files, lock files, and common build artifacts.
- Source files are scanned, language-tagged, counted, and stored.
- File contents are chunked and embedded.
- Local import relationships are parsed into a dependency graph.
- An architecture summary is generated and stored.
- The frontend polls status until the repository is ready.
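
Three of the steps above (URL validation, file filtering, and chunking) can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the ignore lists, the size cap, the line-based chunking strategy, and every function name are hypothetical and may differ from what the worker actually does.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical ignore rules; the project's real lists may differ.
IGNORED_DIRS = {"node_modules", ".git", "dist", "build", "__pycache__", ".next"}
IGNORED_NAMES = {"package-lock.json", "yarn.lock", "poetry.lock"}
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".pdf", ".zip", ".exe", ".so"}
MAX_FILE_BYTES = 200_000  # assumed size cap, not taken from the project


def parse_github_url(url):
    """Return (owner, repo) for an https://github.com/owner/repo URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.netloc.lower() != "github.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2:
        return None
    owner, repo = parts
    if repo.endswith(".git"):
        repo = repo[:-4]
    return (owner, repo) if repo else None


def should_index(path, size_bytes):
    """Skip generated folders, lock files, binaries, and oversized files."""
    p = PurePosixPath(path)
    if any(part in IGNORED_DIRS for part in p.parts[:-1]):
        return False
    if p.name in IGNORED_NAMES or p.suffix.lower() in BINARY_SUFFIXES:
        return False
    return size_bytes <= MAX_FILE_BYTES


def chunk_lines(text, max_lines=40, overlap=5):
    """Split text into overlapping windows of at most max_lines lines."""
    if overlap >= max_lines:
        raise ValueError("overlap must be smaller than max_lines")
    lines = text.splitlines()
    chunks = []
    step = max_lines - overlap
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + max_lines]))
        if start + max_lines >= len(lines):
            break  # the last window already reached the end of the file
    return chunks
```

Overlapping windows are one simple way to keep a function that straddles a chunk boundary visible in at least one chunk; splitting on syntax boundaries would be a more precise alternative.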
Install:
- Docker Desktop
- Git
- An OpenAI API key
You do not need to install PostgreSQL, Redis, Python packages, or Node packages manually for the default setup. Docker Compose runs those services for you.

    git clone https://github.com/YOUR_USERNAME/codeAtlas.git
    cd codeAtlas
    cp .env.example .env

Open `.env` and add your own OpenAI API key:

    OPENAI_API_KEY=your_openai_api_key_here

Important: `.env` is intentionally ignored by Git. Do not commit it. Commit `.env.example`, not `.env`.

    docker compose up --build

Open the app:
- Frontend: http://localhost:3000
- Backend health check: http://localhost:8000/health
FastAPI also exposes developer API docs at http://localhost:8000/docs when the backend is running.
Try a small public repository first:
https://github.com/pallets/markupsafe
Then try a larger full-stack repository:
https://github.com/fastapi/full-stack-fastapi-template
Ask questions like:
- Where is authentication handled?
- Which files should I read first?
- How does the backend connect to the database?
- What are the main entry points?
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | SQLAlchemy database URL used by backend and worker |
| `REDIS_URL` | Yes | Redis connection URL for RQ jobs |
| `OPENAI_API_KEY` | Yes for AI features | Your OpenAI API key for embeddings, summaries, chat, and file explanations |
| `GITHUB_TOKEN` | No | Optional token for higher GitHub rate limits |
| `REPOSITORY_TMP_DIR` | Yes | Temporary clone directory inside the backend/worker container |
| `FRONTEND_ORIGIN` | Yes | Allowed frontend origin for backend CORS |
| `NEXT_PUBLIC_API_URL` | Yes | Browser-visible backend API URL |
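
As an illustration of how the backend-side variables could be checked at startup, here is a small sketch that fails fast when a required value is missing. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and `NEXT_PUBLIC_API_URL` is left out on the assumption that only the Next.js frontend reads it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables marked "Yes" in the table that a backend or worker process would need.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "REPOSITORY_TMP_DIR", "FRONTEND_ORIGIN")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    repository_tmp_dir: str
    frontend_origin: str
    openai_api_key: Optional[str] = None  # needed only for AI features
    github_token: Optional[str] = None    # optional, raises GitHub rate limits


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping and report every missing required name at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        repository_tmp_dir=env["REPOSITORY_TMP_DIR"],
        frontend_origin=env["FRONTEND_ORIGIN"],
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        github_token=env.get("GITHUB_TOKEN") or None,
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary instead of the real process environment.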
| Method | Route | Description |
|---|---|---|
| `POST` | `/repositories` | Submit a public GitHub repo for indexing |
| `GET` | `/repositories` | List recent repositories |
| `GET` | `/repositories/{repo_id}` | Get repository status and metadata |
| `POST` | `/repositories/{repo_id}/reindex` | Re-index an existing repository |
| `GET` | `/repositories/{repo_id}/files` | List indexed files |
| `GET` | `/repositories/{repo_id}/files/{file_id}` | Get one file with content |
| `GET` | `/repositories/{repo_id}/summary` | Get generated architecture summary |
| `GET` | `/repositories/{repo_id}/graph` | Get dependency graph nodes and edges |
| `POST` | `/repositories/{repo_id}/search` | Search relevant code chunks |
| `POST` | `/repositories/{repo_id}/chat` | Ask a repository-aware question |
| `POST` | `/repositories/{repo_id}/explain-file` | Generate an explanation for one file |
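
A script could drive these routes with nothing but the standard library. The sketch below is hedged: the JSON field name `url` in the request body and the status strings `ready` and `failed` are assumptions, so check the live schema at http://localhost:8000/docs before relying on them. The helper names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "http://localhost:8000"


def build_submit_request(repo_url, api_base=API_BASE):
    """Build a POST /repositories request. The "url" field name is an assumption."""
    body = json.dumps({"url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/repositories",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(fetch_status, interval=2.0, max_attempts=150, sleep=time.sleep):
    """Poll until a terminal status appears. "ready" and "failed" are assumed names."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        sleep(interval)
    raise TimeoutError("repository was not ready in time")
```

In practice `fetch_status` would wrap a `GET /repositories/{repo_id}` call and return whichever field of the response carries the indexing state. Taking it as a callable keeps the polling loop testable without a running backend.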
Backend tests:

    docker compose exec backend pytest -q

Frontend typecheck:

    docker compose exec frontend npm run typecheck

Before pushing:
- Confirm `.env` is not staged.
- Confirm `frontend/node_modules/` is not staged.
- Confirm `frontend/.next/` is not staged.
- Confirm `frontend/tsconfig.tsbuildinfo` is not staged.
- Confirm `.DS_Store` is not staged.
- Commit `.env.example` so other people know what variables they need.

Useful check:

    git status --short

If you ever accidentally commit an API key, revoke that key immediately in the OpenAI dashboard and create a new one.
Current limitations:
- Public GitHub repositories only.
- No user accounts or saved private workspaces.
- No private GitHub repository support.
- Import parsing is regex-based and intentionally lightweight.
- Large repositories can take longer and may use more OpenAI API credits.
- Dependency graph quality depends on language and import style.
- The app is designed for local development, not production deployment yet.
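
To show what lightweight, regex-based import parsing can look like, and why its quality depends on language and import style, here is a minimal sketch that builds file-level edges from relative JavaScript/TypeScript imports. The pattern, the extension list, and the function names are illustrative assumptions, not the project's actual parser.

```python
import posixpath
import re

# Matches relative specifiers after `from`, a bare `import`, or `require(`.
LOCAL_IMPORT = re.compile(
    r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)['"](\.{1,2}/[^'"]+)['"]"""
)
EXTENSIONS = (".ts", ".tsx", ".js", ".jsx")  # assumed resolution order


def resolve(importer, spec, known_files):
    """Map a relative import specifier to an indexed file path, or None."""
    base = posixpath.normpath(posixpath.join(posixpath.dirname(importer), spec))
    candidates = [base]
    candidates += [base + ext for ext in EXTENSIONS]
    candidates += [posixpath.join(base, "index" + ext) for ext in EXTENSIONS]
    for candidate in candidates:
        if candidate in known_files:
            return candidate
    return None


def build_edges(files):
    """files maps path -> source text. Returns sorted (importer, imported) pairs."""
    known = set(files)
    edges = set()
    for path, source in files.items():
        for spec in LOCAL_IMPORT.findall(source):
            target = resolve(path, spec, known)
            if target and target != path:
                edges.add((path, target))
    return sorted(edges)


files = {
    "src/app.ts": 'import { api } from "./lib/api";\nimport React from "react";',
    "src/lib/api.ts": "const http = require('../http');",
    "src/http/index.ts": "export {};",
}
edges = build_edges(files)
```

A regex like this ignores package imports such as `react`, and it misses dynamic imports, path aliases, and any language with a different import syntax, which is the trade-off the limitation above describes.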
Possible future improvements:
- User authentication and saved workspaces.
- Private GitHub repository support.
- Better dependency parsing with tree-sitter or language servers.
- Streaming chat responses.
- More granular indexing progress per file/chunk.
- Hosted deployment with managed Postgres, Redis, and object storage.
- Shareable repository reports.





