Reruns of example notebooks #194

Open: hlapp wants to merge 2 commits into main from nb-rerun


@hlapp

@hlapp hlapp commented Apr 22, 2026

Member

There is one code change here, separated out into 015f7de. It only changes which device (GPU or CPU) is used, so it should be inconsequential for the results.
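For readers unfamiliar with this kind of change, device selection in a notebook often looks roughly like the sketch below. This is not the content of 015f7de: the helper name `pick_device` and its `prefer_gpu` parameter are hypothetical, and the sketch assumes a PyTorch-based setup in which a missing `torch` install simply means falling back to CPU.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu".

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # optional here: without it we fall back to CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Embedding on: {device}")
```

Either branch should produce the same embeddings up to floating-point noise, which is why a change like this is not expected to affect the results.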

The changes in the results are, however, considerable, and presumably reflect the current model having changed from BioCLIP1 to BioCLIP2.

Specifically, the baseline predictive accuracy of BioCLIP2 is considerably better, leaving much less room for improvement for the few-shot classifiers trained on top of BioCLIP embeddings.

We should consider changing the dataset to a more challenging one. For example, we could use a Peromyscus dataset (as in the Imageomics Conference Tool Workshop tutorial), or some Carabids.

hlapp added 2 commits April 22, 2026 18:19
The changes, which are considerable, reflect the current model
having changed from BioCLIP1 to BioCLIP2.
@hlapp hlapp requested review from egrace479 and thompsonmj April 22, 2026 22:27
@hlapp

hlapp commented Apr 22, 2026

Member Author

@egrace479 and @thompsonmj, there isn't much to review here in terms of code, because there are hardly any code changes. It's more about reviewing the changes in the results and deciding what actions (regarding example data, etc.) to take, if any.

@egrace479

Member

I'm not sure of the original source for the bird images (that HF dataset doesn't have any context). The 2 of the 4 iNat images that are research-grade shouldn't be in TreeOfLife-200M, since they were ID'd in May 2024 (I believe the snapshot was taken at the beginning of the month).

It might make more sense to use examples from the testing data for which we could hopefully show meaningful improvement.

@hlapp

hlapp commented Apr 23, 2026

Member Author

I should probably tag @vimar-gu here too for good choices of data where BioCLIP2 has some challenges.

@vimar-gu

What is the size of the data that you are looking for? If it doesn't need to be large-scale, zebras are an option: BioCLIP 2 cannot tell the zebra species apart very accurately.

@hlapp

hlapp commented Apr 23, 2026

Member Author

> What is the size of the data that you are looking for?

Not large. Small enough that embedding the images and training the classifier on the embedding features each complete in under 10 minutes on a CPU (and thus in ~1-3 minutes on a GPU), and large enough to have enough samples per class to meaningfully train and test.
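One way to hit a size target like this is to cap the number of images per class before embedding. The following is only a minimal sketch, assuming the labels are available as a plain list; the helper `subsample_per_class`, the toy labels, and the cap of 10 are hypothetical and not taken from the notebooks.

```python
import random
from collections import defaultdict


def subsample_per_class(labels, per_class, seed=0):
    """Return sorted indices keeping at most `per_class` items per label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so reruns pick the same images
    keep = []
    for label in sorted(by_label):
        idxs = by_label[label]
        if len(idxs) > per_class:
            idxs = rng.sample(idxs, per_class)
        keep.extend(idxs)
    return sorted(keep)


# Toy labels: one large class, one small class, one medium class.
labels = ["a"] * 50 + ["b"] * 5 + ["c"] * 20
keep = subsample_per_class(labels, per_class=10)
print(len(keep))  # 10 + 5 + 10 = 25
```

Classes with fewer images than the cap are kept whole, so rare classes are not thinned out further; whether that leaves enough samples to train and test on would still need checking per dataset.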

@hlapp

hlapp commented Apr 23, 2026

Member Author

I don't fully remember how large the Peromyscus dataset was that @thompsonmj used for the tutorial, but that seemed to be a good size.

@thompsonmj

thompsonmj commented Apr 24, 2026

Contributor

I found >10k Peromyscus images but filtered them down to 1.2k, ~455 MB on disk. I can put these somewhere the notebook can retrieve them from.

@thompsonmj

Contributor

If we're looking for challenges, maybe we can do a single HF dataset and retrieve from it in a notebook like:

from datasets import load_dataset

# Proposed repo and configuration names; nothing is published under them yet.
deer_mice = load_dataset("imageomics/bioclip-trip-ups", "deer-mice")
zebras = load_dataset("imageomics/bioclip-trip-ups", "zebras")
# etc.

as configurations.

@egrace479

Member

> If we're looking for challenges, maybe we can do a single HF dataset and retrieve from it in a notebook like:
>
> from datasets import load_dataset
>
> # Proposed repo and configuration names; nothing is published under them yet.
> deer_mice = load_dataset("imageomics/bioclip-trip-ups", "deer-mice")
> zebras = load_dataset("imageomics/bioclip-trip-ups", "zebras")
> # etc.
>
> as configurations.

I like this idea. We should also just remember to specify which data are from the training set, e.g., the mice.

@hlapp

hlapp commented Apr 24, 2026

Member Author

Yes indeed, I like this idea too, and it makes things super straightforward for a tutorial.

@thompsonmj

Contributor

Any better ideas on what to call the dataset repo?

@egrace479

Member

> Any better ideas on what to call the dataset repo?

  1. pybioclip-examples
  2. pybioclip-tutorials
  3. bioclip-example-tuning
  4. bioclip-tuning-tutorials

@hlapp

hlapp commented Apr 25, 2026

Member Author

My vote would be on (3), but in a permutation: bioclip-tuning-examples.

@thompsonmj

Contributor

Such datasets could also be useful for tutorials on other tooling, though, beyond BioCLIP. Maybe something like fine-grained-challenges? Something that generalizes to SAEs or Finer-CAM or other tools as well?

@vimar-gu

I agree. All our tools can be applied haha

@hlapp

hlapp commented May 22, 2026

Member Author

@egrace479 @thompsonmj I think we need to decide how to move forward here. The options are to merge these notebook reruns as they are, which for the time being leaves us with examples that support what the documentation says about them much less well, or to make replacing the dataset part of this change set now.

@hlapp

hlapp commented May 22, 2026

Member Author

I'm in favor of the latter (replace dataset now), but I'll need either of you to assist with putting up the dataset in the appropriate place.

@egrace479

Member

> I'm in favor of the latter (replace dataset now), but I'll need either of you to assist with putting up the dataset in the appropriate place.

I agree that we should make replacing the dataset part of this change.

If we want to pull from the test data, meta-album could be a good choice (it wouldn't require us to set something up now); Med Leaf is a reasonable size.

Regardless, I can make the dataset repo; did we want fine-grained-challenges, since it can be used more broadly? Or did we want to stick to the BioCLIP focus with bioclip-tuning-examples?

We can add the 1.2k Peromyscus images (clarifying that they're part of the training data). @vimar-gu, did you have a zebra set?

@vimar-gu

Dan sent me fewer than 100 images. But if you mean the subset from the training data, then yes, I was using those to train a linear probe on DINO.

@thompsonmj

Contributor

Quick note: I'm trying out Lance (docs) for the cryptic species groups challenge set. I'll also add a role column that indicates whether each image was part of the model training data or was held out.

It will live at https://huggingface.co/datasets/imageomics/fine-grained-challenges, starting with TOL Peromyscus images.
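A rough sketch of how a notebook might use such a role column to keep images the model has already seen apart from held-out ones. Only the column name `role` comes from the comment above; the values `train` and `held_out`, the file names, the helper `split_by_role`, and the use of plain dicts in place of real dataset rows are all assumptions for illustration.

```python
from collections import Counter

# Made-up rows standing in for dataset records; the "role" values are assumed.
rows = [
    {"image": "a.jpg", "species": "Peromyscus maniculatus", "role": "train"},
    {"image": "b.jpg", "species": "Peromyscus leucopus", "role": "train"},
    {"image": "c.jpg", "species": "Peromyscus maniculatus", "role": "held_out"},
    {"image": "d.jpg", "species": "Peromyscus leucopus", "role": "held_out"},
]


def split_by_role(rows, held_out_value="held_out"):
    """Separate rows the model saw in training from rows that were held out."""
    seen = [r for r in rows if r["role"] != held_out_value]
    held_out = [r for r in rows if r["role"] == held_out_value]
    return seen, held_out


seen, held_out = split_by_role(rows)
print(Counter(r["role"] for r in rows))  # how many rows carry each role
print(len(seen), len(held_out))  # 2 2
```

Reporting accuracy separately for the two groups would make it explicit when an example draws on training data, as with the Peromyscus images.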

4 participants