Reruns of example notebooks by hlapp · Pull Request #194 · Imageomics/pybioclip

hlapp · 2026-04-22T22:27:22Z

There is one code change here, separated out into 015f7de, and it only changes which device (GPU or CPU) is used, so it should be wholly inconsequential for the results.
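
For readers unfamiliar with this kind of change, here is a minimal sketch of what auto-determining the device could look like, assuming PyTorch is the backend. It is not necessarily how 015f7de does it, and `pick_device` is a hypothetical helper name used only for illustration.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        # Imported lazily so this sketch also runs where PyTorch is absent.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The resulting string would then be handed to whatever code loads the model. Only the speed of the run should depend on it, not the predictions.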

The changes in the results are, however, considerable, and presumably reflect the current model having changed from BioCLIP1 to BioCLIP2.

Specifically, the baseline predictive accuracy of BioCLIP2 is considerably better, leaving much less room for improvement for the few-shot classifiers trained on top of BioCLIP embeddings.
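
To illustrate what "few-shot classifiers trained on top of embeddings" means, here is a rough sketch of one simple variant, a nearest-centroid classifier. This is a stand-in and not necessarily the classifier the notebooks use. The embeddings are synthetic (random clusters), and every name in it (`fit_centroids`, `predict`, `synthetic_embeddings`, the `species_*` labels) is hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fit_centroids(embeddings, labels):
    """Average the L2-normalized support embeddings of each class."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        emb = normalize(emb)
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: normalize([x / counts[label] for x in acc])
        for label, acc in sums.items()
    }


def predict(centroids, embedding):
    """Return the class whose centroid has the highest cosine similarity."""
    emb = normalize(embedding)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(centroids[label], emb)),
    )


def synthetic_embeddings(rng, n_per_class, dim=8, noise=0.5):
    """Stand-in for real image embeddings: one noisy cluster per class."""
    data, labels = [], []
    for class_idx, name in enumerate(["species_a", "species_b", "species_c"]):
        for _ in range(n_per_class):
            vec = [rng.gauss(0.0, noise) for _ in range(dim)]
            vec[class_idx] += 5.0  # shift each class along its own axis
            data.append(vec)
            labels.append(name)
    return data, labels


rng = random.Random(0)
support_x, support_y = synthetic_embeddings(rng, n_per_class=5)   # the "few shots"
query_x, query_y = synthetic_embeddings(rng, n_per_class=20)      # evaluation set

centroids = fit_centroids(support_x, support_y)
predictions = [predict(centroids, x) for x in query_x]
accuracy = sum(p == y for p, y in zip(predictions, query_y)) / len(query_y)
print(f"few-shot accuracy on synthetic data: {accuracy:.2f}")
```

With real data, the better the zero-shot baseline already separates the classes, the less such a classifier can add, which is the effect described above.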

We should consider changing the dataset to a more challenging one. For example, we could use a Peromyscus dataset (as in the Imageomics Conference Tool Workshop tutorial), or some Carabids.

The changes, which are considerable, reflect the current model having changed from BioCLIP1 to BioCLIP2.

hlapp · 2026-04-22T22:29:41Z

@egrace479 and @thompsonmj there isn't much to review here in terms of code (there are hardly any code changes to speak of). Instead, it's more about reviewing and considering the changes in the results, and what actions (regarding example data etc) to take or not as a result.

egrace479 · 2026-04-22T22:45:25Z

I'm not sure of the original source for the bird images (that HF dataset doesn't have any context). The 2 of the 4 iNat images that are research-grade shouldn't be in TreeOfLife-200M, since they were ID'd in May 2024 (I believe the snapshot was taken at the beginning of the month).

It might make more sense to use examples from the testing data for which we could hopefully show meaningful improvement.

hlapp · 2026-04-23T01:53:53Z

I should probably tag @vimar-gu here too for good choices of data where BioCLIP2 has some challenges.

vimar-gu · 2026-04-23T01:57:11Z

What is the size of the data that you are looking for? If it doesn't need to be large scale, zebras are one option: BioCLIP 2 cannot tell the zebra species apart very accurately.

hlapp · 2026-04-23T02:01:22Z

> What is the size of the data that you are looking for?

Not large. Small enough for embedding the images and training the classifier on the embedding features to each complete in under 10 minutes on a CPU (and thus in ~1-3 minutes on a GPU), and large enough for enough samples per class to meaningfully train and test.
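
One way to check a candidate dataset against this budget is to time the two steps separately. The sketch below assumes nothing about the notebooks: `run_timed`, `embed_images` and `train_classifier` are hypothetical names, and the two steps are trivial placeholders to be swapped for the real embedding and training calls.

```python
import time

BUDGET_SECONDS = 10 * 60  # "under 10 minutes on a CPU", per step


def run_timed(label, step, budget_seconds=BUDGET_SECONDS):
    """Run step(), report its wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = step()
    elapsed = time.perf_counter() - start
    status = "within" if elapsed <= budget_seconds else "OVER"
    print(f"{label}: {elapsed:.1f}s ({status} the {budget_seconds}s budget)")
    return result, elapsed


def embed_images():
    """Placeholder: pretend to embed 100 images into 4-dimensional vectors."""
    return [[0.0] * 4 for _ in range(100)]


def train_classifier(features):
    """Placeholder: pretend to train and return the number of samples seen."""
    return len(features)


features, embed_seconds = run_timed("embedding", embed_images)
model, train_seconds = run_timed("training", lambda: train_classifier(features))
```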

hlapp · 2026-04-23T02:02:20Z

I don't fully remember how large the Peromyscus dataset was that @thompsonmj used for the tutorial, but that seemed to be a good size.

thompsonmj · 2026-04-24T21:10:04Z

I found >10k but filtered down to 1.2k Peromyscus images, ~455 MB on disk. I can put these somewhere for retrieval into the notebook.

thompsonmj · 2026-04-24T21:17:49Z

If we're looking for challenges, maybe we can do a single HF dataset and retrieve from it in a notebook like:

from datasets import load_dataset

ds = load_dataset("Imageomics/bioclip-trip-ups", "deer-mice")
ds = load_dataset("Imageomics/bioclip-trip-ups", "zebras")
# etc.

as configurations.

egrace479 · 2026-04-24T21:25:52Z

> If we're looking for challenges, maybe we can do a single HF dataset and retrieve from it in a notebook like:
>
> from datasets import load_dataset
>
> ds = load_dataset("Imageomics/bioclip-trip-ups", "deer-mice")
> ds = load_dataset("Imageomics/bioclip-trip-ups", "zebras")
> # etc.
>
> as configurations.

I like this idea. We should also just remember to specify which data are from the training set, e.g., the mice.

hlapp · 2026-04-24T23:47:37Z

Yes indeed, I like this idea too, and it makes things super straightforward for a tutorial.

thompsonmj · 2026-04-25T00:24:14Z

Any better ideas on what to call the dataset repo?

egrace479 · 2026-04-25T00:32:28Z

> Any better ideas on what to call the dataset repo?

1. pybioclip-examples
2. pybioclip-tutorials
3. bioclip-example-tuning
4. bioclip-tuning-tutorials

hlapp · 2026-04-25T00:51:02Z

My vote would be on (3), but in a permutation: bioclip-tuning-examples.

thompsonmj · 2026-04-28T01:28:23Z

Such datasets could also be useful for tutorials on other tooling, though, not just BioCLIP stuff. Maybe something like fine-grained-challenges? Something that generalizes to SAEs or Finer-CAM or other tools as well?

vimar-gu · 2026-04-28T01:35:25Z

I agree. All our tools can be applied haha

hlapp · 2026-05-22T15:59:52Z

@egrace479 @thompsonmj I think we need to decide how to move forward here. Specifically, we can either merge these notebook reruns as they are, which for the time being leaves us with examples that support what the documentation says they show much less well, or make replacing the dataset part of this change set now.

hlapp · 2026-05-22T16:01:10Z

I'm in favor of the latter (replace dataset now), but I'll need either of you to assist with putting up the dataset in the appropriate place.

egrace479 · 2026-05-22T16:49:45Z

> I'm in favor of the latter (replace dataset now), but I'll need either of you to assist with putting up the dataset in the appropriate place.

I agree that we should make replacing the dataset part of this change.

If we want to pull from the test data, meta-album could be a good choice (it wouldn't require us to set something up now); Med Leaf is a reasonable size.

Regardless, I can make the dataset repo; did we want fine-grained-challenges, since it can be used more broadly? Or did we want to stick to the BioCLIP focus with bioclip-tuning-examples?

We can add the 1.2k Peromyscus images (clarifying they're part of the training data). @vimar-gu, did you have a zebra set?

vimar-gu · 2026-05-22T17:23:46Z

Dan only sent me fewer than 100 images. But if you mean the subset from the training data, then yes, I was using those to train DINO linear probing.

thompsonmj · 2026-06-11T13:29:22Z

Quick note: I'm trying out Lance (docs) for the cryptic species groups challenge set. I'll also add a role column that indicates whether an image was part of the model training data or was held out.

To live in https://huggingface.co/datasets/imageomics/fine-grained-challenges, starting with TOL Peromyscus images.
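
As a rough sketch of how a notebook might use such a role column: the column name `role`, its values `train` and `held_out`, and the file names below are all assumptions for illustration, not the dataset's actual schema. Plain dictionaries stand in for dataset rows so the example runs without downloading anything.

```python
from collections import defaultdict

# Hypothetical rows; a real notebook would read these from the dataset.
rows = [
    {"image": "img_0001.jpg", "species": "Peromyscus maniculatus", "role": "train"},
    {"image": "img_0002.jpg", "species": "Peromyscus leucopus", "role": "train"},
    {"image": "img_0003.jpg", "species": "Peromyscus maniculatus", "role": "held_out"},
]


def split_by_role(rows, role_column="role"):
    """Group rows by the value in their role column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[role_column]].append(row)
    return dict(groups)


by_role = split_by_role(rows)
for role, members in sorted(by_role.items()):
    print(f"{role}: {len(members)} image(s)")

# Evaluate only on images the model did not see during training.
evaluation_rows = by_role.get("held_out", [])
```

Keeping the split explicit makes it harder to accidentally report accuracy on images that were part of the model's training data.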

hlapp added 2 commits April 22, 2026 18:19

Auto-determines GPU or CPU for iNaturalist notebook

015f7de

Reruns of example notebooks

5616db4

hlapp requested review from egrace479 and thompsonmj April 22, 2026 22:27

4 participants