Type a sentence like "a motorcycle parked at night" or "fire and smoke" (Korean or English) and get the matching images back, ranked. A local, offline-first desktop app that fuses multilingual CLIP vector search + BM25 keyword search, an embedded object-graph DB, a multi-agent pipeline, and RAG summaries, and runs with zero ML dependencies out of the box.

You have thousands of images in nested folders. You remember what's in one ("the night shot with the parked motorcycle") but not where it is. File names don't help. Folder browsing is hopeless. And cloud photo search won't touch your local, private, or labeled datasets.

LLMImageFinder indexes your images locally and lets you search by describing them. It combines semantic search (what the image means) with keyword search (exact object/label terms), can reason over an object co-occurrence graph ("images with both a person and a motorcycle"), and writes a short grounded summary of the results, all on your machine. No data leaves your computer. With no ML packages installed it still runs end-to-end in a deterministic mock mode (great for trying it instantly); flip a switch in Settings to use real models.

| Feature | What it does |
|---|---|
| Natural-language search? | Describe a scene in Korean/English and get ranked matching images (multilingual CLIP, no translation step) |
| Hybrid retrieval? | Vector (meaning) + BM25 (exact terms) fused by Reciprocal Rank Fusion; switch Vector / Keyword / Hybrid in the header |
| Structured object queries? | An embedded GraphDB answers "images containing all of {person, motorcycle}" and "what co-occurs with X" |
| Agentic search? | A plan → hybrid-search → graph-filter → summarize pipeline, with every step traced in the chat |
| RAG summaries? | A small LLM refines your query and writes a grounded Korean summary over the retrieved set |
| Labeled (YOLO) datasets? | Per-image indexing, captions from labels, a class-name editor, and a bounding-box overlay in the viewer |
| Fast inspection? | Thumbnail gallery, zoom/pan viewer, ranked previous/next navigation, "find similar", CSV/clipboard export |
| Zero-setup demo? | Mock-first: the whole app (incl. hybrid/graph/agent) runs deterministically with no ML deps |
| Privacy? | 100% local. Nothing is written into your dataset folder; nothing leaves your machine |

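To make the hybrid retrieval row concrete: Reciprocal Rank Fusion scores each image as the sum, over rankers, of `1 / (k + rank)`, so an image that ranks well in both the vector and the keyword list rises to the top. The sketch below is illustrative and not taken from the project's code. The function name `rrf_fuse`, the constant `k = 60` (a commonly used default), and the sample file names are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of image ids with Reciprocal Rank Fusion.

    rankings: dict mapping a ranker name (e.g. "V", "K") to a list of ids,
    best first. Returns [(image_id, fused_score, provenance)], best first.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for name, ranked_ids in rankings.items():
        for rank, image_id in enumerate(ranked_ids, start=1):
            scores[image_id] += 1.0 / (k + rank)
            sources[image_id].append(name)
    # Sort by fused score (descending); break ties by id for determinism.
    ordered = sorted(scores, key=lambda i: (-scores[i], i))
    return [(i, scores[i], "+".join(sources[i])) for i in ordered]


# Hypothetical result lists from the two retrievers, best match first.
vector_hits = ["moto_night.jpg", "street.jpg", "campfire.jpg"]
keyword_hits = ["campfire.jpg", "moto_night.jpg"]

fused = rrf_fuse({"V": vector_hits, "K": keyword_hits})
for image_id, score, badge in fused:
    print(f"{image_id:16s} {score:.5f} {badge}")
```

Here `moto_night.jpg` comes first because it sits near the top of both lists, while `street.jpg`, found only by the vector ranker, comes last with a `V` badge.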
Screenshots use the built-in mock backend and a generated sample dataset, so they can be reproduced with `uv run python scripts/make_screenshots.py`.

Requires uv (it auto-installs Python 3.12).

    uv sync             # base deps: PySide6, chromadb, pillow, numpy, httpx, rank-bm25
    uv run imgsearch    # launch (or: uv run python -m imgsearch)

The app starts in mock mode (a yellow banner says so). Indexing → search → zoom → open-folder all work; results are deterministic. Try the bundled sample data:

    uv run imgsearch --make-sample .\sample_dataset

…or click **샘플 데이터셋 생성** ("Generate sample dataset") in the toolbar, then type a query like `불과 연기가 있는 이미지` ("images with fire and smoke") in the chat.

    uv sync --extra embed   # local CLIP retrieval: torch (CUDA) + sentence-transformers
    uv sync --extra graph   # embedded kùzu GraphDB (optional; memory graph is the default)

- Embeddings: Settings → Embedding → backend `jina-clip`, model `jinaai/jina-clip-v2`, device `auto`.
- Captions + chat (RAG + planner): run a vLLM OpenAI-compatible server (WSL2 / Docker / remote) hosting e.g. `Qwen/Qwen2.5-VL-3B-Instruct`, then point Settings → VLM / Chat at its base URL. The app falls back to mock and shows the reason if the endpoint is unreachable.

Verify the endpoint first:
`uv run python scripts/check_vllm.py --base-url http://localhost:8000/v1 --model Qwen/Qwen2.5-VL-3B-Instruct`

       describe a scene (KO/EN)
                   │
    ┌──────────────▼──────────────┐     ┌───────────────────────────────┐
    │ planner (rule-based / vLLM) │ ──▶ │ semantic text + required/     │
    └──────────────┬──────────────┘     │ excluded objects              │
                   │                    └───────────────────────────────┘
    ┌──────────────▼──────────────┐
    │ HYBRID retrieve             │  vector (jina-clip cosine, ChromaDB)
    │ RRF( vector , BM25 )        │  + keyword (BM25 over captions/labels)
    └──────────────┬──────────────┘
    ┌──────────────▼──────────────┐
    │ GRAPH filter (object graph) │  keep images containing ALL required objects
    └──────────────┬──────────────┘
    ┌──────────────▼──────────────┐
    │ RAG summarize (vLLM / mock) │  grounded answer over the retrieved set
    └──────────────┬──────────────┘
      ranked gallery + chat trace

Everything above runs deterministically on mock backends with zero ML deps; real models (jina-clip, vLLM, kùzu) slot in behind the same interfaces. Full design rationale and a file map: docs/ARCHITECTURE.md.
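
The graph-filter step in the diagram can be pictured as set operations over per-image object labels. The following is a minimal in-memory sketch under stated assumptions, not the project's GraphDB interface: the class `ObjectGraph`, its method names, and the sample labels are hypothetical.

```python
from collections import Counter


class ObjectGraph:
    """Toy in-memory object graph: image id -> set of object labels."""

    def __init__(self):
        self.objects_by_image = {}

    def add_image(self, image_id, labels):
        self.objects_by_image[image_id] = set(labels)

    def filter(self, ranked_ids, required=(), excluded=()):
        """Keep ranked ids that contain ALL required and NO excluded labels."""
        required, excluded = set(required), set(excluded)
        kept = []
        for image_id in ranked_ids:
            labels = self.objects_by_image.get(image_id, set())
            if required <= labels and not (excluded & labels):
                kept.append(image_id)  # the incoming ranking order is preserved
        return kept

    def co_occurs_with(self, label):
        """Count the labels that appear in the same images as `label`."""
        counts = Counter()
        for labels in self.objects_by_image.values():
            if label in labels:
                counts.update(labels - {label})
        return counts


graph = ObjectGraph()
graph.add_image("a.jpg", ["person", "motorcycle"])
graph.add_image("b.jpg", ["motorcycle"])
graph.add_image("c.jpg", ["person", "motorcycle", "car"])

ranked = ["b.jpg", "c.jpg", "a.jpg"]  # e.g. the output of hybrid retrieval
both = graph.filter(ranked, required=["person", "motorcycle"])
no_car = graph.filter(ranked, required=["person"], excluded=["car"])
neighbours = graph.co_occurs_with("person")
print(both, no_car, dict(neighbours))
```

Filtering after retrieval keeps the fused ranking intact and only removes images that fail the object constraints, which is why the surviving ids stay in their original order.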
| Layer | Choice | Notes |
|---|---|---|
| GUI | PySide6 + qtawesome | dark theme, non-blocking workers, background model load |
| Vector DB | ChromaDB (cosine) | one record per leaf folder or per image |
| Embeddings | jina-clip-v2 | multilingual CLIP, KO text → image directly; mock = deterministic hash vectors |
| Keyword | rank-bm25 | BM25 over captions/labels, Korean-aware tokenizer (pure-python, base dep) |
| Fusion | Reciprocal Rank Fusion | score stays cosine; fused rank for ordering; provenance badge V / K / V+K |
| Graph DB | kùzu (or in-memory) | object co-occurrence; embedded Cypher, serverless; memory backend is the default |
| LLM / VLM | vLLM (OpenAI-compatible) | Qwen2.5-VL for captions, query-refine, RAG summary, and the agent's JSON plan |
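
The "deterministic hash vectors" of the mock embedding backend can be imagined along these lines: derive a fixed unit vector from a hash of the input, so identical inputs always give identical cosine scores. This is a sketch of the general idea only, not the project's code. The name `mock_embed`, the choice of SHA-256, and the dimension of 8 are illustrative assumptions.

```python
import hashlib
import math


def mock_embed(text, dim=8):
    """Map text to a deterministic unit vector derived from SHA-256."""
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Turn each byte (0..255) into a value in [-1, 1).
        values.extend(b / 128.0 - 1.0 for b in digest)
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def cosine(a, b):
    """Cosine similarity; inputs are already unit length, so a dot product."""
    return sum(x * y for x, y in zip(a, b))


query = mock_embed("fire and smoke")
same = mock_embed("fire and smoke")
other = mock_embed("a motorcycle parked at night")
print(cosine(query, same), cosine(query, other))
```

Vectors produced this way are reproducible across runs and machines, which suits tests and demos, but in this sketch the similarity between two different texts carries no semantic meaning.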

    uv run --extra dev pytest                  # 84 tests, mock backends, no models needed
    uv run --extra dev --extra graph pytest    # + kùzu GraphDB parity tests
    uv run python scripts/ui_smoke.py          # offscreen GUI smoke

- Architecture & design philosophy (RAG · VectorDB · Hybrid · GraphDB · multi-agent · vLLM): docs/ARCHITECTURE.md
- Detailed user guide in Korean: README.ko.md
- Changelog: CHANGELOG.md

The source code is heavily commented (in Korean) throughout.
MIT.


