
surrealier/LLMImageFinder


🔎 LLMImageFinder

Find an image in your folder: just describe it in plain language.

Type a sentence like "a motorcycle parked at night" or "fire and smoke" (Korean or English) and get the matching images back, ranked. LLMImageFinder is a local, offline-first desktop app that combines multilingual CLIP vector search with BM25 keyword search, an embedded object-graph DB, a multi-agent pipeline, and RAG summaries. It runs with zero ML dependencies out of the box.

Python 3.12 · PySide6 · ChromaDB · jina-clip-v2 · GraphDB · vLLM · License: MIT

LLMImageFinder: natural-language search over an image dataset

😩 The problem

You have thousands of images in nested folders. You remember what's in one β€” "the night shot with the parked motorcycle" β€” but not where it is. File names don't help. Folder browsing is hopeless. And cloud photo search won't touch your local, private, or labeled datasets.

✨ The fix

LLMImageFinder indexes your images locally and lets you search by describing them. It combines semantic search (what the image means) with keyword search (exact object/label terms), can reason over an object co-occurrence graph ("images with both a person and a motorcycle"), and writes a short grounded summary of the results, all on your machine. No data leaves your computer. With no ML packages installed it still runs end-to-end in a deterministic mock mode (great for trying it instantly); flip a switch in Settings to use real models.


🧭 Do you need…?

| Feature | What it does |
| --- | --- |
| 🗣️ Natural-language search? | Describe a scene in Korean/English → get ranked matching images (multilingual CLIP, no translation step) |
| 🔀 Hybrid retrieval? | Vector (meaning) + BM25 (exact terms) fused by Reciprocal Rank Fusion; switch Vector / Keyword / Hybrid in the header |
| 🕸️ Structured object queries? | An embedded GraphDB answers "images containing all of {person, motorcycle}" and "what co-occurs with X" |
| 🤖 Agentic search? | A plan → hybrid-search → graph-filter → summarize pipeline, with every step traced in the chat |
| 🧠 RAG summaries? | A small LLM refines your query and writes a grounded Korean summary over the retrieved set |
| 🏷️ Labeled (YOLO) datasets? | Per-image indexing, captions from labels, a class-name editor, and a bounding-box overlay in the viewer |
| 🖼️ Fast inspection? | Thumbnail gallery, zoom/pan viewer, ranked ←/→ navigation, "find similar", CSV/clipboard export |
| 🔌 Zero-setup demo? | Mock-first: the whole app (incl. hybrid/graph/agent) runs deterministically with no ML deps |
| 🔒 Privacy? | 100% local. Nothing is written into your dataset folder; nothing leaves your machine |
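
The labeled-dataset row mentions captions built from labels and a bounding-box overlay. As a rough illustration only, and assuming the usual YOLO text format (`class_id cx cy w h`, with coordinates normalized to 0..1), one label line could be converted to a pixel box and a set of lines to a short caption as below. The function names, the class-name mapping, and the caption wording are hypothetical and are not taken from the app's code.

```python
from collections import Counter


def parse_yolo_line(line: str, img_w: int, img_h: int) -> tuple[int, tuple[int, int, int, int]]:
    """Convert one YOLO label line to (class_id, (left, top, right, bottom)) in pixels."""
    class_id, cx, cy, w, h = line.split()[:5]
    # YOLO stores the box centre and size as fractions of the image size.
    cx_px, cy_px = float(cx) * img_w, float(cy) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    left, top = round(cx_px - w_px / 2), round(cy_px - h_px / 2)
    right, bottom = round(cx_px + w_px / 2), round(cy_px + h_px / 2)
    return int(class_id), (left, top, right, bottom)


def caption_from_labels(lines: list[str], class_names: dict[int, str]) -> str:
    """Build a simple caption such as '2 person, 1 motorcycle' (hypothetical wording)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the label file
        class_id = int(line.split()[0])
        counts[class_names.get(class_id, f"class_{class_id}")] += 1
    return ", ".join(f"{n} {name}" for name, n in counts.most_common())
```

A caption like this is the kind of text a keyword index can match on, while the pixel box is what a viewer overlay would draw.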

🖼️ Screenshots

  • Agentic search: plan, hybrid search, graph filter, summarize, traced live in chat
  • Object graph: class counts and co-occurrence, with an AND filter into the gallery
  • Image viewer with YOLO bounding-box overlay and ranked-result metadata
  • Hybrid search results with provenance badges (V / K / V+K) and score badges

Screenshots use the built-in mock backend + a generated sample dataset, so they reproduce with uv run python scripts/make_screenshots.py.


🚀 Quick start (mock mode, no models needed)

Requires uv (it auto-installs Python 3.12).

uv sync            # base deps: PySide6, chromadb, pillow, numpy, httpx, rank-bm25
uv run imgsearch   # launch (or: uv run python -m imgsearch)

The app starts in mock mode (a yellow banner says so). Indexing → search → zoom → open-folder all work; results are deterministic. Try the bundled sample data:

uv run imgsearch --make-sample .\sample_dataset

…or click 샘플 데이터셋 생성 ("Create sample dataset") in the toolbar, then type a query like 불과 연기가 있는 이미지 ("images with fire and smoke") in the chat.
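
Mock mode gives the same results on every run because the mock embeddings are deterministic hash vectors. The sketch below shows one way such a vector could be derived from text using only the standard library. The function names, the SHA-256 choice, and the dimension are assumptions for illustration and may differ from what the app does.

```python
import hashlib
import math


def mock_embed(text: str, dim: int = 8) -> list[float]:
    """Map text to a unit-length vector that depends only on the text."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each digest byte becomes a value in [-1, 1).
        values.extend((b - 128) / 128 for b in digest)
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are unit length, so this is a plain dot product."""
    return sum(x * y for x, y in zip(a, b))
```

Vectors produced this way carry no semantic meaning in this sketch; they only make indexing and search repeatable without any model installed.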

🧠 Turn on real models

uv sync --extra embed   # local CLIP retrieval: torch (CUDA) + sentence-transformers
uv sync --extra graph   # embedded kùzu GraphDB (optional; memory graph is the default)
  • Embeddings: Settings → Embedding → backend jina-clip, model jinaai/jina-clip-v2, device auto.
  • Captions + chat (RAG + planner): run a vLLM OpenAI-compatible server (WSL2 / Docker / remote) hosting e.g. Qwen/Qwen2.5-VL-3B-Instruct, then point Settings → VLM / Chat at its base URL. The app falls back to mock and shows the reason if the endpoint is unreachable. Verify it first:
uv run python scripts/check_vllm.py --base-url http://localhost:8000/v1 --model Qwen/Qwen2.5-VL-3B-Instruct
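
The fallback behaviour described above can be pictured with a small sketch. It is not the contents of scripts/check_vllm.py; it only assumes that an OpenAI-compatible server lists its models at `<base_url>/models` with a `data` array of objects carrying an `id`. The names `http_get_json` and `choose_backend` are hypothetical.

```python
import json
import urllib.request
from typing import Callable


def http_get_json(url: str, timeout: float = 3.0) -> dict:
    """Fetch a URL and decode the JSON body (raises on network or HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def choose_backend(base_url: str, model: str,
                   fetch: Callable[[str], dict] = http_get_json) -> tuple[str, str]:
    """Return ("vllm", "") if the server lists the model, else ("mock", reason)."""
    url = base_url.rstrip("/") + "/models"
    try:
        payload = fetch(url)
    except Exception as exc:  # unreachable host, timeout, invalid JSON, ...
        return "mock", f"endpoint unreachable: {exc}"
    served = [m.get("id") for m in payload.get("data", [])]
    if model not in served:
        return "mock", f"model {model!r} not served (found: {served})"
    return "vllm", ""
```

Passing `fetch` as a parameter keeps the decision logic testable without a running server; the returned reason string is the kind of message a UI could show next to the mock banner.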

🧩 How it works

        describe a scene (KO/EN)
                  │
   ┌──────────────▼───────────────┐     ┌────────────────────────────────┐
   │ planner (rule-based / vLLM)  │ ──▶ │ semantic text + required/      │
   └──────────────┬───────────────┘     │ excluded objects               │
                  │                     └────────────────────────────────┘
   ┌──────────────▼───────────────┐
   │ HYBRID retrieve              │  vector (jina-clip cosine, ChromaDB)
   │   RRF( vector , BM25 )       │  +  keyword (BM25 over captions/labels)
   └──────────────┬───────────────┘
   ┌──────────────▼───────────────┐
   │ GRAPH filter (object graph)  │  keep images containing ALL required objects
   └──────────────┬───────────────┘
   ┌──────────────▼───────────────┐
   │ RAG summarize (vLLM / mock)  │  grounded answer over the retrieved set
   └──────────────┬───────────────┘
        ranked gallery + chat trace

Everything above runs deterministically on mock backends with zero ML deps; real models (jina-clip, vLLM, kùzu) slot in behind the same interfaces. Full design rationale and a file map: docs/ARCHITECTURE.md.
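
To make the graph-filter step concrete, here is a minimal in-memory sketch of an object graph that supports the AND filter ("keep images containing ALL required objects") and a co-occurrence lookup. The class and method names are hypothetical; the app's memory and kùzu backends may be structured quite differently.

```python
from collections import Counter, defaultdict


class ObjectGraph:
    """Tiny in-memory object graph: image -> set of object classes."""

    def __init__(self) -> None:
        self.objects_by_image: dict[str, set[str]] = {}
        self.images_by_object: dict[str, set[str]] = defaultdict(set)

    def add_image(self, image_id: str, objects: list[str]) -> None:
        self.objects_by_image[image_id] = set(objects)
        for obj in objects:
            self.images_by_object[obj].add(image_id)

    def images_with_all(self, required: list[str]) -> set[str]:
        """Images that contain every class in `required` (AND filter)."""
        if not required:
            return set(self.objects_by_image)
        sets = [self.images_by_object.get(obj, set()) for obj in required]
        return set.intersection(*sets)

    def co_occurring(self, obj: str) -> list[tuple[str, int]]:
        """Other classes seen in the same images as `obj`, most frequent first."""
        counts: Counter = Counter()
        for image_id in self.images_by_object.get(obj, set()):
            counts.update(self.objects_by_image[image_id] - {obj})
        return counts.most_common()


def graph_filter(ranked_ids: list[str], graph: ObjectGraph, required: list[str]) -> list[str]:
    """Keep the retrieval order but drop images missing a required object."""
    allowed = graph.images_with_all(required)
    return [image_id for image_id in ranked_ids if image_id in allowed]
```

Filtering after retrieval, as sketched here, preserves the ranking produced by the hybrid step and only removes candidates.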

🛠️ Tech stack

| Layer | Choice | Notes |
| --- | --- | --- |
| GUI | PySide6 + qtawesome | dark theme, non-blocking workers, background model load |
| Vector DB | ChromaDB (cosine) | one record per leaf folder or per image |
| Embeddings | jina-clip-v2 | multilingual CLIP, KO text → image directly; mock = deterministic hash vectors |
| Keyword | rank-bm25 | BM25 over captions/labels, Korean-aware tokenizer (pure-python, base dep) |
| Fusion | Reciprocal Rank Fusion | score stays cosine; fused rank for ordering; provenance badge V / K / V+K |
| Graph DB | kùzu (or in-memory) | object co-occurrence; embedded Cypher, serverless; memory backend is the default |
| LLM / VLM | vLLM (OpenAI-compatible) | Qwen2.5-VL for captions, query-refine, RAG summary, and the agent's JSON plan |
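
The Fusion row can be illustrated with a short sketch of Reciprocal Rank Fusion, where each list contributes `1 / (k + rank)` per item and the sum decides the order. The function name, the result dictionary layout, and `k = 60` (a commonly used constant) are assumptions here, not values read from the app. The displayed score is kept as the cosine value and the badge records which retriever found the image, as the table describes.

```python
def rrf_fuse(vector_hits: list[tuple[str, float]], keyword_ids: list[str], k: int = 60) -> list[dict]:
    """Fuse two ranked lists with Reciprocal Rank Fusion.

    vector_hits: (image_id, cosine) pairs, best first.
    keyword_ids: image ids from the keyword search, best first.
    """
    fused: dict[str, float] = {}
    for rank, (image_id, _) in enumerate(vector_hits, start=1):
        fused[image_id] = fused.get(image_id, 0.0) + 1.0 / (k + rank)
    for rank, image_id in enumerate(keyword_ids, start=1):
        fused[image_id] = fused.get(image_id, 0.0) + 1.0 / (k + rank)

    cosine = dict(vector_hits)
    keyword_set = set(keyword_ids)
    results = []
    # Order by fused score (descending); break ties by id for determinism.
    for image_id in sorted(fused, key=lambda i: (-fused[i], i)):
        in_v, in_k = image_id in cosine, image_id in keyword_set
        results.append({
            "id": image_id,
            "score": cosine.get(image_id),  # stays the cosine; None if keyword-only
            "badge": "V+K" if in_v and in_k else ("V" if in_v else "K"),
        })
    return results
```

Because only ranks enter the fusion, cosine similarities and BM25 scores never need to be put on a common scale.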

🧪 Tests

uv run --extra dev pytest                 # 84 tests, mock backends, no models needed
uv run --extra dev --extra graph pytest   # + kùzu GraphDB parity tests
uv run python scripts/ui_smoke.py         # offscreen GUI smoke

📚 Docs

  • 📐 Architecture & design philosophy (RAG · VectorDB · Hybrid · GraphDB · multi-agent · vLLM): docs/ARCHITECTURE.md
  • 🇰🇷 Detailed user guide in Korean (한국어 상세 사용 설명서): README.ko.md
  • 📝 Changelog: CHANGELOG.md

The source code is heavily commented (in Korean) throughout.

📄 License

MIT.

Local-first · privacy-respecting · mock-first. Built for searching real, labeled image datasets by describing them.
