Cortex is a lightweight, local, multi-model AI reasoning engine designed to run on consumer hardware (e.g., an NVIDIA RTX 3060). It uses Ollama for local LLM inference and builds on it with reasoning strategies such as self-reflection and Tree of Thought, private document analysis, and real-time streamed responses.
- Advanced Reasoning: Multi-model self-reflective reasoning with built-in critique and refinement loops.
- Tree of Thought (ToT): Solves complex problems by dynamically exploring multiple reasoning paths.
- RAG (Retrieval-Augmented Generation): Upload your own documents and query them, with relevant context retrieved locally. Text is extracted from documents with PyMuPDF.
- LLM Arena Mode: Pit two different models (e.g., DeepSeek-R1 vs. Llama 3) against each other on the same prompt in real time and compare their outputs.
- Persistent Memory: Tracks session history and enables similarity-based context recall across sessions.
- Live Internet Search: Built-in web search (via ddgs) lets the AI pull in live data when required.
- Real-time Streaming: Tokens are streamed to the web UI over WebSockets and Server-Sent Events (SSE) for low-latency output.
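The critique-and-refinement loop from the feature list can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Cortex's actual code: reflect_and_refine, the prompt wording, the APPROVED stop word, and max_rounds are all hypothetical, and generate and critic stand in for calls to local models (for example, two different Ollama models, so one can review the other's answer).

```python
from typing import Callable, Optional

# Hypothetical type: any function mapping a prompt to a model completion.
Generate = Callable[[str], str]


def reflect_and_refine(
    question: str,
    generate: Generate,
    critic: Optional[Generate] = None,
    max_rounds: int = 3,
) -> str:
    """Draft an answer, then alternate critique and revision.

    Stops early when the critic replies APPROVED (an assumed convention)
    or after max_rounds revisions.
    """
    critic = critic or generate  # a second model can act as the critic
    answer = generate(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        critique = critic(
            "Critique the answer below. Reply APPROVED if it needs no changes.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        if critique.strip().upper().startswith("APPROVED"):
            break  # the critic is satisfied; stop refining
        answer = generate(
            "Revise the answer using the critique.\n"
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}"
        )
    return answer
```

Capping the number of rounds keeps the worst case at 1 + 2 × max_rounds model calls, which bounds latency on a single GPU even when the critic never approves.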
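Tree of Thought can be illustrated as a beam search over partial reasoning paths. The sketch below is a hypothetical example, not the engine's implementation: tree_of_thought, propose (suggests next thoughts for a path), and score (rates a path) are assumed names, and in practice both callables would be backed by LLM prompts rather than plain functions.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # a reasoning path: the root followed by each thought


def tree_of_thought(
    root: str,
    propose: Callable[[Path], List[str]],
    score: Callable[[Path], float],
    depth: int = 3,
    beam_width: int = 2,
) -> Path:
    """Breadth-first ToT: expand every kept path, keep the best beam_width."""
    frontier: List[Path] = [(root,)]
    for _ in range(depth):
        candidates = [path + (thought,) for path in frontier for thought in propose(path)]
        if not candidates:
            break  # no path can be extended further
        # Keep only the highest-scoring partial paths (the "beam").
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)
```

For instance, with propose always offering the thoughts "1", "2", and "3" and score summing them, a depth-3 search with a beam width of 2 returns ('start', '3', '3', '3'). Keeping a fixed-size beam limits how many paths are expanded per level, which helps keep the number of model calls predictable.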
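A retrieval step of the kind described under RAG might look like the sketch below: extract text with PyMuPDF, split it into chunks, and rank chunks by cosine similarity to the query. It is an assumption-laden illustration, not Cortex's code. In particular, embed is a toy bag-of-words stand-in for whatever local embedding model the engine actually uses, and the chunk size is arbitrary.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def pdf_text(path: str) -> str:
    """Extract plain text from a PDF with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF's long-standing import name

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def chunk(text: str, size: int = 50) -> List[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real local embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]
```

The same ranking could plausibly serve similarity-based memory recall as well, by embedding past messages instead of document chunks.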
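Arena mode boils down to sending one prompt to several models and collecting their answers side by side. The sketch below assumes Ollama's default address and its non-streaming POST /api/generate endpoint, whose JSON reply carries the text in a "response" field. The arena and ollama_generate helpers are hypothetical, not the project's API.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def ollama_generate(model: str, prompt: str, base_url: str = "http://localhost:11434") -> str:
    """Non-streaming completion from Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def arena(
    prompt: str,
    models: Sequence[str],
    generate: Callable[[str, str], str] = ollama_generate,
) -> Dict[str, str]:
    """Send the same prompt to every model concurrently; map model -> answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {m: pool.submit(generate, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}
```

Usage: arena("Explain recursion in one sentence.", ["deepseek-r1:7b", "llama3:8b"]). The requests are issued concurrently, but with limited VRAM Ollama may still load and run the models one after another, so "real time" here means both answers arrive in one call rather than truly parallel inference.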
- Backend: FastAPI, Python, Uvicorn, WebSockets
- AI Engine: Ollama, DeepSeek-R1, Llama 3
- RAG / Memory: Local embeddings, PyMuPDF for document extraction
- Frontend: HTML5, CSS3, Vanilla JavaScript (served directly by FastAPI)
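Since the backend is FastAPI, SSE token streaming can be sketched with a generator that frames each token in the SSE wire format: "event:" and "data:" lines followed by a blank line. The event names and the JSON payload shape below are assumptions, not Cortex's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str], event: str = "token") -> Iterator[str]:
    """Frame each token as a Server-Sent Events message.

    JSON-encoding the data keeps newlines inside a token from
    breaking the frame, since a blank line ends an SSE message.
    """
    for tok in tokens:
        yield f"event: {event}\ndata: {json.dumps({'text': tok})}\n\n"
    yield "event: done\ndata: {}\n\n"  # hypothetical end-of-stream marker
```

In FastAPI, such a generator can be returned as StreamingResponse(sse_events(tokens), media_type="text/event-stream"), and the browser can consume it with EventSource.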
Before you start, ensure you have the following:
- Python 3.10+
- Ollama installed and running on your local machine.
- The models you plan to use, pulled into Ollama. At minimum, we recommend these two:
  ollama pull deepseek-r1:7b
  ollama pull llama3:8b
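To confirm these prerequisites before installing, a small check along these lines may help. It is a hypothetical helper, not part of the repository. It assumes Ollama listens on its default address (http://localhost:11434) and queries GET /api/tags, which lists the locally available models.

```python
import json
import sys
import urllib.request
from typing import Iterable, List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)
RECOMMENDED = ["deepseek-r1:7b", "llama3:8b"]


def missing_models(tags: dict, required: Iterable[str]) -> List[str]:
    """Return the required model names absent from an /api/tags response."""
    installed = {m.get("name") for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def check_prerequisites() -> None:
    """Verify the Python version, the Ollama daemon, and the recommended models."""
    if sys.version_info < (3, 10):
        raise SystemExit("Python 3.10+ is required")
    # GET /api/tags lists pulled models; it also fails fast if Ollama is not running.
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
        tags = json.load(resp)
    missing = missing_models(tags, RECOMMENDED)
    if missing:
        raise SystemExit("Missing models, run: " + "; ".join(f"ollama pull {m}" for m in missing))
    print("All prerequisites found.")
```

Call check_prerequisites() from a Python shell. If the daemon is not running, urlopen raises a URLError instead of returning.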
- Clone the repository:
  git clone https://github.com/subhakantrout/local-ai-engine.git
  cd local-ai-engine
- Set up a virtual environment (recommended):
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
- Install the dependencies:
  pip install -r requirements.txt
- Start the Ollama daemon if it is not already running in the background (for example, with ollama serve).
- Start the Cortex server:
  python run.py
- Open the web UI by navigating to http://localhost:8000 in your browser.
Press Ctrl+C in your terminal to safely shut down the server.
local-ai-engine/
├── reasoning_engine/ # Core AI logic (Arena, RAG, Memory, Plugins, ToT)
├── server/ # FastAPI application, routes, and endpoints
├── static/ # Web Interface (HTML, CSS, JS)
├── requirements.txt # Python dependencies
└── run.py # Application entry point
This project is open-source and available under the MIT License.