jacobmarks · Burhan-Q · Mar 12, 2026 · Mar 13, 2026 · Mar 13, 2026 · Mar 13, 2026
diff --git a/.gitignore b/.gitignore
@@ -1 +1,7 @@
-__pycache__
+__pycache__
+*.pt
+.claude/
+.ruff_cache/
+.venv/
+
+mobileclip2_*.ts
diff --git a/README.md b/README.md
@@ -12,6 +12,7 @@ This plugin allows you to perform zero-shot prediction on your dataset for the f
 Given a list of label classes, which you can input either manually, separated by commas, or by uploading a text file, the plugin will perform zero-shot prediction on your dataset for the specified task and add the results to the dataset under a new field, which you can specify.
 
 ### Updates
+- 🆕 **2025-03-13**: Added [YOLOE](https://docs.ultralytics.com/models/yoloe/) support: text-prompt zero-shot detection and instance segmentation, plus a visual prompting operator (`visual_prompt_detect`). Updated the README to point to this fork for installation.
 - 🆕 **2024-12-03**: Added support for Apple AIMv2 Zero Shot Model (courtesy of [@harpreetsahota204](https://github.com/harpreetsahota204))
 - 🆕 **2024-12-16**: Added MPS and GPU support for ALIGN, AltCLIP, Apple AIMv2 (courtesy of [@harpreetsahota204](https://github.com/harpreetsahota204))
 - **2024-06-22**: Updated interface for Python operator execution
@@ -28,7 +29,8 @@ Given a list of label classes, which you can input either manually, separated by
 
 ### Requirements
 
-- To use YOLO-World models, you must have `"ultalytics>=8.1.42"`.
+- To use YOLO-World models, you must have `ultralytics>=8.1.42`.
+- To use YOLOE models, you must have `ultralytics>=8.3.0`.
 
 ## Models
 
@@ -51,14 +53,17 @@ As a starting point, this plugin comes with at least one zero-shot model per tas
 #### Object Detection
 
 - [YOLO-World](https://docs.ultralytics.com/models/yolo-world/)
+- [YOLOE](https://docs.ultralytics.com/models/yoloe/)
 - [Owl-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)
 - [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino)
 
 #### Instance Segmentation
 
 - [Owl-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
 - [YOLO-World](https://docs.ultralytics.com/models/yolo-world/) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
+- [YOLOE](https://docs.ultralytics.com/models/yoloe/) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
 - [Grounding DINO](https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino) + [Segment Anything (SAM)](https://github.com/facebookresearch/segment-anything)
+- [YOLOE Visual Prompting](https://docs.ultralytics.com/models/yoloe/) — native instance segmentation via the `visual_prompt_detect` operator (no SAM required)
 
 #### Semantic Segmentation
 
@@ -129,7 +134,7 @@ CLASSIFICATION_MODELS = {
 ## Installation
 
 ```shell
-fiftyone plugins download https://github.com/jacobmarks/zero-shot-prediction-plugin
+fiftyone plugins download https://github.com/Burhan-Q/fo-zshot
 ```
 
 If you want to use AltCLIP, Align, Owl-ViT, CLIPSeg, or GroupViT, you will also need to install the `transformers` library:
@@ -156,7 +161,7 @@ Or from source:
 pip install git+https://github.com/mlfoundations/open_clip.git
 ```
 
-If you want to use YOLO-World, you will also need to install the `ultralytics` library:
+If you want to use YOLO-World or YOLOE, you will also need to install the `ultralytics` library:
 
 ```shell
 pip install -U ultralytics
@@ -211,6 +216,12 @@ fiftyone delegated cleanup -s COMPLETED
 
 - Perform zero-shot semantic segmentation on your dataset
 
+### `visual_prompt_detect`
+
+- Perform visual-prompt-based detection using YOLOE. Instead of text class names, provide reference bounding boxes on an exemplar image. The model finds similar objects across your dataset.
+- Supports detection-only, instance-segmentation-only, or both output types.
+- The exemplar can be specified via a selected sample, a sample ID, or a saved view.
+
 ## Python SDK
 
 You can also use the compute operators from the Python SDK!
@@ -223,7 +234,7 @@ import fiftyone.zoo as foz
 dataset = fo.load_dataset("quickstart")
 
 ## Access the operator via its URI (plugin name + operator name)
-zsc = foo.get_operator("@jacobmarks/zero_shot_prediction/zero_shot_classify")
+zsc = foo.get_operator("@Burhan-Q/zero_shot_prediction/zero_shot_classify")
 
 ## Run zero-shot classification on all images in the dataset, specifying the labels with the `labels` argument
 zsc(dataset, labels=["cat", "dog", "bird"])
@@ -235,7 +246,7 @@ zsc(dataset, labels_file="/path/to/labels.txt")
 zsc(dataset, labels=["cat", "dog", "bird"], model_name="CLIP", label_field="predictions")
 
 ## Run zero-shot detection on a view
-zsd = foo.get_operator("@jacobmarks/zero_shot_prediction/zero_shot_detect")
+zsd = foo.get_operator("@Burhan-Q/zero_shot_prediction/zero_shot_detect")
 view = dataset.take(10)
 await zsd(
     view,
@@ -249,14 +260,56 @@ All four of the task-specific zero-shot prediction operators also expose a `list
 
 ```python
 zsss = foo.get_operator(
-    "@jacobmarks/zero_shot_prediction/zero_shot_semantic_segment"
+    "@Burhan-Q/zero_shot_prediction/zero_shot_semantic_segment"
 )
 
 zsss.list_models()
 
 ## ['CLIPSeg', 'GroupViT']
 ```
 
+### Visual Prompting
+
+The `visual_prompt_detect` operator uses reference bounding boxes instead of text labels. In the SDK, specify the exemplar via either `exemplar_sample_id` or `exemplar_saved_view`; exactly one of the two is required. The "Selected sample" option is available only in the UI.
+
+```python
+vpd = foo.get_operator("@Burhan-Q/zero_shot_prediction/visual_prompt_detect")
+
+vpd(dataset, exemplar_sample_id="<sample_id>", prompt_field="ground_truth")
+```
+
+<details>
+<summary>Example: detection with model and confidence options</summary>
+
+```python
+vpd(
+    dataset,
+    exemplar_sample_id="<sample_id>",
+    prompt_field="ground_truth",
+    output_type="detection",
+    label_field="vp_predictions",
+    model_name="yoloe-11l-seg",
+    confidence=0.3,
+)
+```
+</details>
+
+<details>
+<summary>Example: saved view exemplar with both detection and instance segmentation output</summary>
+
+```python
+vpd(
+    dataset,
+    exemplar_saved_view="my_exemplar",
+    prompt_field="ground_truth",
+    inference_saved_view="my_target_view",
+    output_type="both",
+    label_field="vp_detections",
+    seg_label_field="vp_instances",
+)
+```
+</details>
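The "exactly one required" rule for `exemplar_sample_id` and `exemplar_saved_view` can be checked on the caller's side before invoking the operator. The following is a minimal sketch, assuming a hypothetical helper named `resolve_exemplar_source` that is not part of the plugin; how the operator itself validates these arguments is not shown in this diff.

```python
from typing import Optional, Tuple


def resolve_exemplar_source(
    exemplar_sample_id: Optional[str] = None,
    exemplar_saved_view: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (kind, value) for whichever exemplar argument was given.

    Hypothetical helper for illustration only: it mirrors the README's
    "exactly one required" rule and does not call the plugin.
    """
    provided = {
        "sample_id": exemplar_sample_id,
        "saved_view": exemplar_saved_view,
    }
    # Keep only the arguments that were actually supplied (non-empty).
    chosen = [(kind, value) for kind, value in provided.items() if value]
    if len(chosen) != 1:
        raise ValueError(
            "Provide exactly one of exemplar_sample_id or exemplar_saved_view"
        )
    return chosen[0]
```

Calling `resolve_exemplar_source(exemplar_sample_id="<sample_id>")` returns the pair `("sample_id", "<sample_id>")`, while passing both arguments or neither raises `ValueError`, so a mistake surfaces before any model is loaded.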
+
 **Note**: The `zero_shot_predict` operator is not yet supported in the Python SDK.
 
 **Note**: With earlier versions of FiftyOne, you may have trouble running these