subhakantrout/local-ai-engine

Cortex Reasoning Engine

Cortex is a local, multi-model AI reasoning engine designed to run on consumer hardware (e.g., an NVIDIA RTX 3060). It uses local LLM inference via Ollama to provide reasoning strategies such as self-reflection and Tree of Thought, private analysis of your own documents, and real-time streamed responses.

🚀 Features

  • Advanced Reasoning: Multi-model self-reflective reasoning with built-in critique and refinement loops.
  • Tree of Thought (ToT): Solves complex problems by dynamically exploring multiple reasoning paths.
  • RAG (Retrieval-Augmented Generation): Upload files to the engine and query your own documents. Context is retrieved locally, with document text extracted by PyMuPDF.
  • LLM Arena Mode: Pit two different models (e.g., DeepSeek vs. Llama 3) against each other in real-time to compare their outputs.
  • Persistent Memory: Tracks session history and enables similarity-based context recall across sessions.
  • Live Internet Search: Built-in web search capabilities (ddgs) allowing the AI to pull live data when required.
  • Real-time Streaming: Built on WebSockets and Server-Sent Events (SSE) for low-latency token streaming to the web UI.
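
As an illustration of the critique and refinement loop listed under Advanced Reasoning, here is a minimal sketch. It is not taken from the Cortex source: the function names `ollama_generate` and `reflect_and_refine`, the prompts, and the choice of which model drafts and which critiques are all assumptions. The only external call is Ollama's `/api/generate` REST endpoint on its default port, 11434.

```python
import json
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def reflect_and_refine(
    question: str,
    generate: Callable[[str, str], str] = ollama_generate,
    drafter: str = "llama3:8b",       # assumed role assignment
    critic: str = "deepseek-r1:7b",   # assumed role assignment
    rounds: int = 2,
) -> str:
    """Draft an answer, then alternate critique and revision up to `rounds` times."""
    answer = generate(drafter, question)
    for _ in range(rounds):
        critique = generate(
            critic,
            f"Question: {question}\nAnswer: {answer}\n"
            "List any errors or gaps. Reply with only OK if there are none.",
        )
        if critique.strip().upper() == "OK":
            break  # the critic found nothing to fix
        answer = generate(
            drafter,
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer.",
        )
    return answer
```

Passing the `generate` callable as a parameter keeps the loop independent of the transport, so it can be exercised with a stub before any model is downloaded.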

🛠 Tech Stack

  • Backend: FastAPI, Python, Uvicorn, WebSockets
  • AI Engine: Ollama, DeepSeek-R1, Llama 3
  • RAG / Memory: Local embeddings, PyMuPDF for document extraction
  • Frontend: HTML5, CSS3, Vanilla JavaScript (served directly by FastAPI)
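
To make the "local embeddings" and similarity-based recall idea concrete, the sketch below ranks stored snippets against a query by cosine similarity. It is an illustration only: `embed` is a toy word-count stand-in for a real embedding model, and `embed`, `cosine`, and `recall` are hypothetical names rather than Cortex's own API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` stored snippets most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(query_vector, embed(m)), reverse=True)
    return ranked[:top_k]
```

A real deployment would swap `embed` for a dense embedding model and cache the vectors instead of recomputing them per query; the ranking step stays the same.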

📋 Prerequisites

Before you start, ensure you have the following:

  1. Python 3.10+
  2. Ollama installed and running on your local machine.
  3. Relevant models downloaded in Ollama. At minimum, we recommend pulling these:
    ollama pull deepseek-r1:7b
    ollama pull llama3:8b
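
To confirm the prerequisites before starting the server, a short check like the one below can help. It is a sketch, not part of the repository: it assumes Ollama is listening on its default address and uses Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `installed_models` and `missing_models` are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # default Ollama address


def installed_models(url: str = OLLAMA_TAGS_URL) -> set[str]:
    """Ask the local Ollama server which models are already downloaded."""
    with urllib.request.urlopen(url, timeout=5) as response:
        payload = json.load(response)
    return {entry["name"] for entry in payload.get("models", [])}


def missing_models(required: list[str], installed: set[str]) -> list[str]:
    """Return the required model tags that are not installed, in the given order."""
    return [name for name in required if name not in installed]
```

Calling `missing_models(["deepseek-r1:7b", "llama3:8b"], installed_models())` should return an empty list once both models are present. If Ollama is not running, `installed_models` raises an `OSError` subclass, which is a useful signal in itself.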

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/subhakantrout/local-ai-engine.git
    cd local-ai-engine
  2. Set up a virtual environment (recommended):

    python -m venv venv
    # On Windows:
    venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
  3. Install the dependencies:

    pip install -r requirements.txt

🏃‍♂️ Usage

  1. Start the Ollama daemon (if it isn't running in the background already).
  2. Start the Cortex server:
    python run.py
  3. Open the Web UI: Navigate to http://localhost:8000 in your browser.

Press Ctrl+C in your terminal to safely shut down the server.
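
If you script the startup, a small readiness probe can wait for the web UI before opening a browser or running tests. The sketch below assumes only what is stated above, namely that the server answers at http://localhost:8000. The names `http_ok` and `wait_until_ready` are hypothetical, and the retry counts are arbitrary defaults.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str = "http://localhost:8000") -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:  # covers connection refused, timeouts, and HTTP error statuses
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_ok, attempts: int = 10, delay: float = 0.5
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the defaults, `wait_until_ready()` gives the server roughly five seconds of retry delay to come up and returns `False` if it never responds.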

📂 Project Structure

├── reasoning_engine/     # Core AI logic (Arena, RAG, Memory, Plugins, ToT)
├── server/               # FastAPI application, routes, and endpoints
├── static/               # Web Interface (HTML, CSS, JS)
├── requirements.txt      # Python dependencies
└── run.py                # Application entry point

📜 License

This project is open-source and available under the MIT License.
