8 changes: 7 additions & 1 deletion .gitignore
@@ -1 +1,7 @@
__pycache__
__pycache__
*.pt
.cladue/
.ruff_cache/
.venv/

mobileclip2_*.ts
65 changes: 59 additions & 6 deletions README.md
@@ -12,6 +12,7 @@ This plugin allows you to perform zero-shot prediction on your dataset for the f
Given a list of label classes, which you can input either manually, separated by commas, or by uploading a text file, the plugin will perform zero-shot prediction on your dataset for the specified task and add the results to the dataset under a new field, which you can specify.
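
As a rough illustration of the comma-separated input form, the sketch below turns a label string into a list. The helper name `parse_labels` is hypothetical and not part of the plugin; it assumes that surrounding whitespace should be stripped and empty entries dropped, which may differ from how the plugin parses its input.

```python
def parse_labels(text):
    """Split a comma-separated label string into a clean list.

    Hypothetical helper: strips whitespace around each label and
    drops empty entries (for example, from a trailing comma).
    """
    return [label.strip() for label in text.split(",") if label.strip()]


labels = parse_labels("cat, dog ,bird,")
print(labels)
```

The resulting list can be passed as the `labels` argument shown in the Python SDK examples.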

### Updates
- 🆕 **2025-03-13**: Added [YOLOE](https://docs.ultralytics.com/models/yoloe/) support: text-prompt zero-shot detection and instance segmentation, plus a visual prompting operator (`visual_prompt_detect`). Also updated the README to point to this fork for installation.
- 🆕 **2024-12-03**: Added support for Apple AIMv2 Zero Shot Model (courtesy of [@harpreetsahota204](https://github.com/harpreetsahota204))
- 🆕 **2024-12-16**: Added MPS and GPU support for ALIGN, AltCLIP, Apple AIMv2 (courtesy of [@harpreetsahota204](https://github.com/harpreetsahota204))
- **2024-06-22**: Updated interface for Python operator execution
@@ -28,7 +29,8 @@ Given a list of label classes, which you can input either manually, separated by

### Requirements

- To use YOLO-World models, you must have `"ultalytics>=8.1.42"`.
- To use YOLO-World models, you must have `ultralytics>=8.1.42`.
- To use YOLOE models, you must have `ultralytics>=8.3.0`.
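
One way to check an environment against these minimums is sketched below. The helper names (`parse_version`, `meets_minimum`, `check_ultralytics`) are hypothetical and not part of the plugin, and the version parsing is deliberately simple: it only compares the leading numeric components.

```python
from importlib import metadata


def parse_version(text):
    """Turn '8.3.0' into (8, 3, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '8.3' and '8.3.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_ultralytics(minimum):
    """Return True if ultralytics is installed at `minimum` or newer."""
    try:
        installed = metadata.version("ultralytics")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("YOLO-World ready:", check_ultralytics("8.1.42"))
    print("YOLOE ready:", check_ultralytics("8.3.0"))
```

For pre-release or otherwise unusual version strings, a dedicated version parser would be more robust than this simplified comparison.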

## Models

@@ -51,14 +53,17 @@ As a starting point, this plugin comes with at least one zero-shot model per tas
#### Object Detection

- [YOLO-World](https://docs.ultralytics.com/models/yolo-world/)
- [YOLOE](https://docs.ultralytics.com/models/yoloe/)
- [Owl-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)
- [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino)

#### Instance Segmentation

- [Owl-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
- [YOLO-World](https://docs.ultralytics.com/models/yolo-world/) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
- [YOLOE](https://docs.ultralytics.com/models/yoloe/) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
- [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
- [YOLOE Visual Prompting](https://docs.ultralytics.com/models/yoloe/) — native instance segmentation via the `visual_prompt_detect` operator (no SAM required)

#### Semantic Segmentation

@@ -129,7 +134,7 @@ CLASSIFICATION_MODELS = {
## Installation

```shell
fiftyone plugins download https://github.com/jacobmarks/zero-shot-prediction-plugin
fiftyone plugins download https://github.com/Burhan-Q/fo-zshot
```

If you want to use AltCLIP, Align, Owl-ViT, CLIPSeg, or GroupViT, you will also need to install the `transformers` library:
@@ -156,7 +161,7 @@ Or from source:
pip install git+https://github.com/mlfoundations/open_clip.git
```

If you want to use YOLO-World, you will also need to install the `ultralytics` library:
If you want to use YOLO-World or YOLOE, you will also need to install the `ultralytics` library:

```shell
pip install -U ultralytics
@@ -211,6 +216,12 @@ fiftyone delegated cleanup -s COMPLETED

- Perform zero-shot semantic segmentation on your dataset

### `visual_prompt_detect`

- Perform visual-prompt-based detection using YOLOE. Instead of text class names, provide reference bounding boxes on an exemplar image. The model finds similar objects across your dataset.
- Supports detection-only, instance-segmentation-only, or both output types.
- The exemplar can be specified via a selected sample, a sample ID, or a saved view.

## Python SDK

You can also use the compute operators from the Python SDK!
@@ -223,7 +234,7 @@ import fiftyone.zoo as foz
dataset = fo.load_dataset("quickstart")

## Access the operator via its URI (plugin name + operator name)
zsc = foo.get_operator("@jacobmarks/zero_shot_prediction/zero_shot_classify")
zsc = foo.get_operator("@Burhan-Q/zero_shot_prediction/zero_shot_classify")

## Run zero-shot classification on all images in the dataset, specifying the labels with the `labels` argument
zsc(dataset, labels=["cat", "dog", "bird"])
@@ -235,7 +246,7 @@ zsc(dataset, labels_file="/path/to/labels.txt")
zsc(dataset, labels=["cat", "dog", "bird"], model_name="CLIP", label_field="predictions")

## Run zero-shot detection on a view
zsd = foo.get_operator("@jacobmarks/zero_shot_prediction/zero_shot_detect")
zsd = foo.get_operator("@Burhan-Q/zero_shot_prediction/zero_shot_detect")
view = dataset.take(10)
await zsd(
view,
@@ -249,14 +260,56 @@ All four of the task-specific zero-shot prediction operators also expose a `list

```python
zsss = foo.get_operator(
"@jacobmarks/zero_shot_prediction/zero_shot_semantic_segment"
"@Burhan-Q/zero_shot_prediction/zero_shot_semantic_segment"
)

zsss.list_models()

## ['CLIPSeg', 'GroupViT']
```

### Visual Prompting

The `visual_prompt_detect` operator uses reference bounding boxes instead of text labels. In the SDK, specify the exemplar via `exemplar_sample_id` or `exemplar_saved_view` (exactly one required; "Selected sample" is UI-only).

```python
vpd = foo.get_operator("@Burhan-Q/zero_shot_prediction/visual_prompt_detect")

vpd(dataset, exemplar_sample_id="<sample_id>", prompt_field="ground_truth")
```
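
Because exactly one of `exemplar_sample_id` or `exemplar_saved_view` is required, it can help to validate the choice before calling the operator. The sketch below is a hypothetical helper (`exemplar_kwargs` is not part of the plugin) that builds the keyword arguments and raises if zero or both are given.

```python
def exemplar_kwargs(exemplar_sample_id=None, exemplar_saved_view=None):
    """Return the single exemplar keyword argument to pass to the operator.

    Hypothetical helper: raises ValueError unless exactly one of the
    two exemplar options is provided.
    """
    given = {
        "exemplar_sample_id": exemplar_sample_id,
        "exemplar_saved_view": exemplar_saved_view,
    }
    chosen = {name: value for name, value in given.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError(
            "Pass exactly one of exemplar_sample_id or exemplar_saved_view"
        )
    return chosen


kwargs = exemplar_kwargs(exemplar_saved_view="my_exemplar")
print(kwargs)

# With the operator loaded as `vpd`, the result could be unpacked like so:
# vpd(dataset, prompt_field="ground_truth", **kwargs)
```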

<details>
<summary>Example: detection with model and confidence options</summary>

```python
vpd(
dataset,
exemplar_sample_id="<sample_id>",
prompt_field="ground_truth",
output_type="detection",
label_field="vp_predictions",
model_name="yoloe-11l-seg",
confidence=0.3,
)
```
</details>

<details>
<summary>Example: saved view exemplar with both detection and instance segmentation output</summary>

```python
vpd(
dataset,
exemplar_saved_view="my_exemplar",
prompt_field="ground_truth",
inference_saved_view="my_target_view",
output_type="both",
label_field="vp_detections",
seg_label_field="vp_instances",
)
```
</details>

**Note**: The `zero_shot_predict` operator is not yet supported in the Python SDK.

**Note**: With earlier versions of FiftyOne, you may have trouble running these