docs/tagger-cli-api.md (254 additions, 0 deletions)

# Tagger CLI and OpenAI-Compatible API Captioning

This document covers the first command-line dataset tagging path for issue #40.
It does not change the WebUI tagger page and does not integrate the separate
dataset tag editor.

## Modes

### Local TAG mode

Local mode uses the existing WD/CL ONNX taggers in `mikazuki/tagger/` and writes
Danbooru-style TAG captions beside each image:

```powershell
python -m mikazuki.tagger.cli local --path .\input --model wd14-convnextv2-v2
```

```bash
python -m mikazuki.tagger.cli local --path ./input --model wd14-convnextv2-v2
```

Useful options:

- `--threshold 0.35`: general tag threshold.
- `--character-threshold 0.6`: character tag threshold.
- `--recursive`: scan child folders.
- `--additional-tags "masterpiece, best quality"`: always append tags.
- `--exclude-tags "lowres, bad anatomy"`: remove exact tags.
- `--use-cn-mirror`: download from `https://hf-mirror.com` when no local model files are found.
- `--hf-endpoint https://...`: use a custom Hugging Face-compatible endpoint.
- `--on-conflict ignore|copy|prepend`: skip existing `.txt`, replace it, or
prepend new tags to existing text.
- `--no-replace-underscore`: keep underscores in tags.
- `--no-escape-tag`: do not escape parentheses and backslashes.
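The threshold and formatting options above can be pictured with the minimal sketch below. It assumes that confidences are compared with `>=`, that character tags are listed before general tags, and that `--exclude-tags` is matched against the already-formatted tag text. The function names are hypothetical and are not taken from `mikazuki/tagger/`.

```python
# Hypothetical sketch of local TAG post-processing; the real mikazuki.tagger code may differ.
from typing import Dict, Iterable, List, Optional, Set


def select_tags(
    general: Dict[str, float],
    character: Dict[str, float],
    threshold: float = 0.35,
    character_threshold: float = 0.6,
) -> List[str]:
    """Keep tags at or above their threshold: character tags first, then general tags,
    each group ordered by descending confidence (ordering is an assumption)."""
    chars = sorted((t for t, s in character.items() if s >= character_threshold),
                   key=lambda t: character[t], reverse=True)
    gens = sorted((t for t, s in general.items() if s >= threshold),
                  key=lambda t: general[t], reverse=True)
    return chars + gens


def format_tag(
    tag: str,
    replace_underscore: bool = True,
    escape_tag: bool = True,
    underscore_excludes: Optional[Set[str]] = None,
) -> str:
    """Default formatting; --no-replace-underscore / --no-escape-tag turn each step off."""
    if replace_underscore and tag not in (underscore_excludes or set()):
        tag = tag.replace("_", " ")
    if escape_tag:
        # Escape backslashes first so the backslashes added before parentheses are not doubled.
        tag = tag.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return tag


def build_caption(tags: Iterable[str], additional: str = "", exclude: str = "") -> str:
    """Drop exact --exclude-tags matches, then append --additional-tags."""
    excluded = {t.strip() for t in exclude.split(",") if t.strip()}
    kept = [t for t in tags if t not in excluded]
    kept += [t.strip() for t in additional.split(",") if t.strip()]
    return ", ".join(kept)


raw = select_tags({"1girl": 0.92, "solo": 0.81, "lowres": 0.4, "smile": 0.2},
                  {"hatsune_miku": 0.88})
print(build_caption([format_tag(t) for t in raw],
                    additional="masterpiece, best quality", exclude="lowres"))
# prints: hatsune miku, 1girl, solo, masterpiece, best quality
```

Escaping turns `ganyu_(genshin_impact)` into `ganyu \(genshin impact\)`, so prompt parsers that use parentheses for emphasis do not misread the tag. The `underscore_excludes` parameter mirrors the `replace_underscore_excludes` field that the WebUI request passes through, and it keeps emoticon tags such as `^_^` intact.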

Wrapper scripts are available:

```powershell
.\scripts\cli\tagger.ps1 local --path .\input
```

```bash
bash scripts/cli/tagger.sh local --path ./input
```

The wrappers are thin: they set `HF_HOME=huggingface`, prefer the local `venv`
Python when present, and forward all arguments to `python -m mikazuki.tagger.cli`.
The Python service also defaults `HF_HOME` to the project `huggingface/` folder
when the environment variable is not already set, so first-run model downloads do
not go to the user's global Hugging Face cache.
Set `USE_CN_MIRROR=1` before running a wrapper to have it set
`HF_ENDPOINT=https://hf-mirror.com`. An `HF_ENDPOINT` that is already set is left
unchanged:

```powershell
$env:USE_CN_MIRROR = "1"
.\scripts\cli\tagger.ps1 local --path .\input
```

```bash
USE_CN_MIRROR=1 bash scripts/cli/tagger.sh local --path ./input
```
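The environment defaults described above come down to "fill in only what the user has not set". The sketch below models the Python service's `HF_HOME` default and the wrapper's `USE_CN_MIRROR` rule together in one hypothetical function, `apply_env_defaults`. It is not the project's actual code.

```python
# Hypothetical sketch of the HF_HOME / USE_CN_MIRROR defaults; not the project's actual code.
from pathlib import Path
from typing import MutableMapping

HF_MIRROR = "https://hf-mirror.com"


def apply_env_defaults(env: MutableMapping[str, str], project_root: Path) -> None:
    """Fill HF_HOME / HF_ENDPOINT only when the user has not already set them."""
    # Keep first-run downloads in the project folder instead of the global HF cache.
    env.setdefault("HF_HOME", str(project_root / "huggingface"))
    # USE_CN_MIRROR=1 selects the mirror, but an explicit HF_ENDPOINT still wins.
    if env.get("USE_CN_MIRROR") == "1":
        env.setdefault("HF_ENDPOINT", HF_MIRROR)
```

In a real entry point, the function would be called with `os.environ` before `huggingface_hub` or `transformers` is imported. Current `huggingface_hub` releases read `HF_HOME` and `HF_ENDPOINT` into module-level constants at import time, so later changes are not picked up.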

Mirror note: the code path honors `HF_ENDPOINT`, but the mirror itself must be
compatible with the installed `huggingface_hub` version and the target model's
large-file hosting. If mirror metadata resolution fails, unset `HF_ENDPOINT` or
prefetch the model into `huggingface/` by another network path.

Local model priority:

1. If `MIKAZUKI_TAGGER_DIR` is set, the loader checks that directory first.
2. Then it checks project-local built-in locations:
   - `taggers/<model-key>/`
   - `models/taggers/<model-key>/`
   - `huggingface/taggers/<model-key>/`
3. If required files are present, they are used directly and no network download
   is attempted.
4. If no local files are found, Hugging Face download is attempted through
   `HF_ENDPOINT`, `--hf-endpoint`, `--use-cn-mirror`, or direct HF in that order.
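Step 4 can be read as a precedence list: the first configured source wins. The sketch below encodes that reading. The function name is hypothetical, and so is the choice of `https://huggingface.co` as the "direct HF" value.

```python
# Hypothetical sketch of the download-source precedence in step 4 (read as "first match wins").
from typing import Mapping, Optional

HF_DEFAULT = "https://huggingface.co"
HF_MIRROR = "https://hf-mirror.com"


def resolve_hf_endpoint(env: Mapping[str, str], hf_endpoint: Optional[str] = None,
                        use_cn_mirror: bool = False) -> str:
    """HF_ENDPOINT, then --hf-endpoint, then --use-cn-mirror, then direct Hugging Face."""
    if env.get("HF_ENDPOINT"):
        return env["HF_ENDPOINT"]
    if hf_endpoint:
        return hf_endpoint
    if use_cn_mirror:
        return HF_MIRROR
    return HF_DEFAULT
```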

For WD taggers, place `model.onnx` and `selected_tags.csv` together, for example:

```text
taggers/
  wd14-convnextv2-v2/
    model.onnx
    selected_tags.csv
```

For `cl_tagger_1_01`, place `model.onnx` and `tag_mapping.json` together:

```text
taggers/
  cl_tagger_1_01/
    model.onnx
    tag_mapping.json
```
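The local-first lookup in the priority list, combined with the two file sets above, can be sketched as a directory search. The folder names and file sets come from this document. The function names, the "`cl_tagger` prefix means CL tagger" rule, and the assumption that `MIKAZUKI_TAGGER_DIR` holds one `<model-key>/` folder per model are all hypothetical.

```python
# Hypothetical sketch of the local-first lookup. Folder names and file sets come from this
# document; the function names and the MIKAZUKI_TAGGER_DIR layout are assumptions.
import os
from pathlib import Path
from typing import List, Mapping, Optional, Tuple

WD_FILES = ("model.onnx", "selected_tags.csv")
CL_FILES = ("model.onnx", "tag_mapping.json")


def required_files(model_key: str) -> Tuple[str, ...]:
    # Assumption: keys starting with "cl_tagger" are CL taggers; everything else is a WD tagger.
    return CL_FILES if model_key.startswith("cl_tagger") else WD_FILES


def candidate_dirs(model_key: str, project_root: Path, env: Mapping[str, str]) -> List[Path]:
    dirs: List[Path] = []
    override = env.get("MIKAZUKI_TAGGER_DIR")
    if override:
        # Assumed layout: one <model-key>/ folder per model, like the built-in locations.
        dirs.append(Path(override) / model_key)
    for base in ("taggers", "models/taggers", "huggingface/taggers"):
        dirs.append(project_root / base / model_key)
    return dirs


def find_local_model(model_key: str, project_root: Path,
                     env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first candidate folder holding every required file, else None."""
    needed = required_files(model_key)
    for folder in candidate_dirs(model_key, project_root, env):
        if all((folder / name).is_file() for name in needed):
            return folder
    return None  # caller would fall back to a Hugging Face download
```

A folder with only `model.onnx` does not count as a hit. A half-copied model therefore falls through to the next location instead of failing later at load time.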

### Local NL caption mode

Caption mode uses a local Hugging Face BLIP-compatible caption model and writes
natural-language captions beside each image:

```powershell
python -m mikazuki.tagger.cli caption --path .\input
```

```bash
python -m mikazuki.tagger.cli caption --path ./input
```

Defaults:

- Model: `Salesforce/blip-image-captioning-base`
- Output: natural-language `.txt` captions
- Cache: project `huggingface/` folder unless `HF_HOME` is already set

Useful options:

- `--model Salesforce/blip-image-captioning-base`: Hugging Face model id.
- `--prompt "a photo of"`: optional conditional caption prompt.
- `--device auto|cpu|cuda`: torch device selection.
- `--max-new-tokens 64`: generated caption length cap.
- `--use-cn-mirror` / `--hf-endpoint`: download source for missing caption
model files.
- `--recursive`, `--additional-tags`, `--exclude-tags`, and `--on-conflict`
behave like API NL mode.
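A BLIP caption call of the kind this mode performs can be sketched with the public Transformers classes `BlipProcessor` and `BlipForConditionalGeneration`. The helper names `resolve_device` and `blip_caption` are hypothetical. `LocalBlipCaptionClient` may structure loading, batching, and device handling differently.

```python
# Hypothetical sketch of BLIP captioning with Hugging Face Transformers; not LocalBlipCaptionClient.
from typing import Optional


def resolve_device(device: str, cuda_available: bool) -> str:
    """Map --device auto|cpu|cuda to a concrete torch device string."""
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    if device not in ("cpu", "cuda"):
        raise ValueError(f"unsupported device: {device!r}")
    return device


def blip_caption(image_path: str, model_id: str = "Salesforce/blip-image-captioning-base",
                 prompt: Optional[str] = None, device: str = "auto",
                 max_new_tokens: int = 64) -> str:
    # Heavy imports stay inside the function so resolve_device works without torch installed.
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    dev = resolve_device(device, torch.cuda.is_available())
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id).to(dev)
    image = Image.open(image_path).convert("RGB")
    # With a prompt such as "a photo of", BLIP continues that text (conditional captioning).
    if prompt:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")
    inputs = inputs.to(dev)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

This sketch reloads the model on every call to keep it short. A batch runner would load the processor and model once and reuse them for every image in the folder.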

First-run model downloads print a stage message and then rely on Hugging Face /
Transformers console progress for file downloads. This satisfies the command-line
progress requirement; WebUI SSE/WebSocket progress remains a later UI task.

### API NL mode

API mode calls an OpenAI-compatible Chat Completions vision endpoint and writes a
natural-language caption beside each image:

```powershell
$env:OPENAI_API_KEY = "sk-..."
python -m mikazuki.tagger.cli api --path .\input --model gpt-4o-mini `
--prompt "Describe this image for LoRA training. Return one concise caption."
```

```bash
export OPENAI_API_KEY="sk-..."
python -m mikazuki.tagger.cli api --path ./input --model gpt-4o-mini \
--prompt "Describe this image for LoRA training. Return one concise caption."
```

Useful options:

- `--endpoint https://api.openai.com/v1`: base endpoint. The CLI posts to
`{endpoint}/chat/completions`.
- `--api-key sk-...`: explicit key. Takes precedence over `--api-key-env`.
- `--api-key-env OPENAI_API_KEY`: environment variable used when `--api-key` is
not provided.
- `--timeout 60`: request timeout per image.
- `--retries 2`: retry count per image.
- `--recursive`, `--additional-tags`, `--exclude-tags`, and `--on-conflict`
  behave like local mode. For API captions, `--additional-tags` and
  `--exclude-tags` values are treated as comma- or newline-separated text
  fragments rather than confidence-scored tags.
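The key lookup and per-image retry options above can be sketched as two small helpers. Both names are hypothetical. The sketch assumes `--retries 2` means one initial attempt plus up to two retries, and the linear backoff between attempts is also an assumption.

```python
# Hypothetical sketch of --api-key / --api-key-env resolution and per-image retries.
import time
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def resolve_api_key(api_key: Optional[str], api_key_env: str, env: Mapping[str, str]) -> str:
    """An explicit --api-key wins; otherwise read the variable named by --api-key-env."""
    key = api_key or env.get(api_key_env)
    if not key:
        raise RuntimeError(f"No API key: pass --api-key or set {api_key_env}")
    return key


def with_retries(call: Callable[[], T], retries: int = 2, backoff: float = 1.0) -> T:
    """Run call() once, then retry up to `retries` more times before re-raising."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception:  # a real client would catch only network/HTTP/timeout errors
            if attempt >= retries:
                raise
            attempt += 1
            time.sleep(backoff * attempt)
```

`call` would wrap a single HTTP request that already carries the `--timeout` value. This way, the timeout limits each attempt and `--retries` limits how many attempts one image may use.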

API mode sends image bytes to the configured endpoint. Users should confirm
privacy, safety, and billing terms for their provider before running it on a
dataset.

## Output Rules

- Supported images are detected through Pillow's registered image extensions.
- A sidecar caption is written as `image_name.txt` in the same directory as the
image.
- `local` writes Danbooru-style TAG captions; `caption` and `api` write
natural-language captions.
- `--on-conflict ignore` skips images that already have a sidecar `.txt`.
- `--on-conflict copy` replaces existing text.
- `--on-conflict prepend` writes the new caption before the existing text.
- Duplicate comma-separated fragments are removed while preserving first
occurrence order.
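The output rules above can be sketched as a few pure functions. The function names are hypothetical. Joining the new and existing text with `", "` for `prepend` is an assumption, and so is de-duplicating after the merge. `Image.registered_extensions()` is Pillow's public registry of file extensions.

```python
# Hypothetical sketch of the sidecar and --on-conflict rules; separator details are assumptions.
from pathlib import Path
from typing import Optional


def is_supported_image(path: Path) -> bool:
    from PIL import Image  # imported lazily; keys look like ".png", ".jpg", ...
    return path.suffix.lower() in Image.registered_extensions()


def sidecar_path(image: Path) -> Path:
    """cat.png -> cat.txt in the same directory."""
    return image.with_suffix(".txt")


def dedupe_fragments(text: str) -> str:
    """Drop repeated comma-separated fragments, keeping each first occurrence in place."""
    parts = [p.strip() for p in text.split(",")]
    return ", ".join(dict.fromkeys(p for p in parts if p))


def merge_caption(new: str, existing: Optional[str], on_conflict: str) -> Optional[str]:
    """Return the text to write, or None to leave the existing sidecar untouched."""
    if existing is None:
        return dedupe_fragments(new)
    if on_conflict == "ignore":
        return None
    if on_conflict == "copy":
        return dedupe_fragments(new)
    if on_conflict == "prepend":
        # Assumption: new and old text are joined with ", " before de-duplication.
        return dedupe_fragments(f"{new}, {existing}")
    raise ValueError(f"unknown on_conflict mode: {on_conflict!r}")
```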

## OpenAI-Compatible Request Shape

The CLI sends a `POST` request to `{endpoint}/chat/completions`:

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image for image model training."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ]
}
```
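The request body above can be built with the standard library alone. The sketch below is a minimal, hypothetical version. It assumes the endpoint accepts OpenAI-style `Authorization: Bearer` authentication and falls back to `image/png` when the MIME type cannot be guessed from the filename.

```python
# Hypothetical sketch that builds the Chat Completions vision request shown above.
import base64
import json
import mimetypes
from typing import Any, Dict, Tuple


def chat_completions_url(endpoint: str) -> str:
    """'{endpoint}/chat/completions', tolerating a trailing slash on --endpoint."""
    return endpoint.rstrip("/") + "/chat/completions"


def image_data_url(data: bytes, filename: str) -> str:
    mime = mimetypes.guess_type(filename)[0] or "image/png"  # fallback is an assumption
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_request(endpoint: str, api_key: str, model: str, prompt: str,
                  image_bytes: bytes, filename: str) -> Tuple[str, Dict[str, str], bytes]:
    body: Dict[str, Any] = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_data_url(image_bytes, filename)}},
            ],
        }],
    }
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    return chat_completions_url(endpoint), headers, json.dumps(body).encode("utf-8")
```

The resulting triple can be sent with `urllib.request.Request(url, data=body, headers=headers, method="POST")` and `urllib.request.urlopen(request, timeout=60)`.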

The response parser expects:

```json
{
  "choices": [
    {
      "message": {
        "content": "A natural-language caption."
      }
    }
  ]
}
```

If the response does not contain `choices[0].message.content`, the CLI fails with
a clear error and does not silently write an empty caption.
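A parser that follows this rule fails loudly instead of returning an empty string. In the minimal sketch below, the exception class is hypothetical. Treating whitespace-only content as an error is an added assumption. Providers that return `content` as a list of parts are not handled.

```python
# Hypothetical sketch of the response check described above.
from typing import Any, Mapping


class CaptionResponseError(RuntimeError):
    """Raised instead of writing an empty caption."""


def extract_caption(response: Mapping[str, Any]) -> str:
    try:
        content = response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        # Truncate the repr so a large error body does not flood the console.
        raise CaptionResponseError(
            f"missing choices[0].message.content: {response!r:.200}") from exc
    if not isinstance(content, str) or not content.strip():
        raise CaptionResponseError("choices[0].message.content is empty")
    return content.strip()
```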

## Interface Reference

Public command-line interfaces:

- `python -m mikazuki.tagger.cli local`: local TAG tagging with WD/CL ONNX
interrogators.
- `python -m mikazuki.tagger.cli caption`: local NL captioning with a
BLIP-compatible Hugging Face model.
- `python -m mikazuki.tagger.cli api`: OpenAI-compatible NL captioning through
Chat Completions vision.
- `scripts/cli/tagger.ps1` and `scripts/cli/tagger.sh`: thin wrappers that set
project-local cache defaults and forward arguments.

Programmatic interfaces reserved for WebUI or future tooling reuse:

- `run_local_tagger(...)`: local TAG batch runner.
- `run_caption_tagger(...)`: local NL batch runner.
- `run_api_tagger(...)`: API NL batch runner.
- `OpenAICompatibleCaptionClient`: API client using `/chat/completions`.
- `LocalBlipCaptionClient`: local BLIP-compatible caption client.
- Existing WebUI endpoint `POST /interrogate`: still accepts the current
`TaggerInterrogateRequest` fields and now delegates to `run_local_tagger`.

No dataset tag editor API is introduced in this phase.
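The `POST /interrogate` delegation renames several request fields on the way to `run_local_tagger`. The mapping below is read directly from the `api.py` change in this PR. The dict stand-in for `TaggerInterrogateRequest` and the helper name `interrogate_kwargs` are hypothetical.

```python
# Hypothetical helper: TaggerInterrogateRequest field -> run_local_tagger keyword, per this PR's api.py.
from typing import Any, Dict, Mapping

FIELD_MAP = {
    "path": "input_path",
    "interrogator_model": "model",
    "threshold": "threshold",
    "character_threshold": "character_threshold",
    "add_rating_tag": "add_rating_tag",
    "add_model_tag": "add_model_tag",
    "batch_input_recursive": "recursive",
    "additional_tags": "additional_tags",
    "exclude_tags": "exclude_tags",
    "batch_output_action_on_conflict": "on_conflict",
    "replace_underscore": "replace_underscore",
    "replace_underscore_excludes": "replace_underscore_excludes",
    "escape_tag": "escape_tag",
}


def interrogate_kwargs(req: Mapping[str, Any]) -> Dict[str, Any]:
    kwargs = {dst: req[src] for src, dst in FIELD_MAP.items()}
    kwargs["unload_model_after_running"] = True  # the endpoint always unloads the model afterwards
    return kwargs
```

The actual endpoint spells these keywords out inline in `background_tasks.add_task(...)`. The table form only makes the renames (`path`, `interrogator_model`, `batch_input_recursive`, `batch_output_action_on_conflict`) easy to scan.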

## Later Model Candidate

PixAI Tagger v0.9 is a strong future local TAG candidate because its model card
describes a newer Danbooru snapshot through 2025-01 and about 13.5k
Danbooru-style tags. It is intentionally not added in this first CLI pass because
it would introduce new dependency and model-size decisions beyond the existing
WD/CL ONNX path.

mikazuki/app/api.py (8 additions, 17 deletions)
@@ -22,8 +22,7 @@
 from mikazuki.app.models import (APIResponse, APIResponseFail,
                                  APIResponseSuccess, TaggerInterrogateRequest)
 from mikazuki.log import log
-from mikazuki.tagger.interrogator import (available_interrogators,
-                                          on_interrogate)
+from mikazuki.tagger.service import run_local_tagger
 from mikazuki.tasks import tm
 from mikazuki.train_log_hub import hub as train_log_hub
 from mikazuki.utils import train_utils
@@ -402,29 +401,21 @@ async def run_script(request: Request, background_tasks: BackgroundTasks):
 
 @router.post("/interrogate")
 async def run_interrogate(req: TaggerInterrogateRequest, background_tasks: BackgroundTasks):
-    interrogator = available_interrogators.get(req.interrogator_model, available_interrogators["wd14-convnextv2-v2"])
     background_tasks.add_task(
-        on_interrogate,
-        image=None,
-        batch_input_glob=req.path,
-        batch_input_recursive=req.batch_input_recursive,
-        batch_output_dir="",
-        batch_output_filename_format="[name].[output_extension]",
-        batch_output_action_on_conflict=req.batch_output_action_on_conflict,
-        batch_remove_duplicated_tag=True,
-        batch_output_save_json=False,
-        interrogator=interrogator,
+        run_local_tagger,
+        input_path=req.path,
+        model=req.interrogator_model,
         threshold=req.threshold,
         character_threshold=req.character_threshold,
+        add_rating_tag=req.add_rating_tag,
+        add_model_tag=req.add_model_tag,
+        recursive=req.batch_input_recursive,
         additional_tags=req.additional_tags,
         exclude_tags=req.exclude_tags,
-        sort_by_alphabetical_order=False,
-        add_confident_as_weight=False,
+        on_conflict=req.batch_output_action_on_conflict,
         replace_underscore=req.replace_underscore,
         replace_underscore_excludes=req.replace_underscore_excludes,
         escape_tag=req.escape_tag,
-        add_rating_tag=req.add_rating_tag,
-        add_model_tag=req.add_model_tag,
         unload_model_after_running=True
     )
     return APIResponseSuccess()