# 2.7 Embedding {.unnumbered}
This is a minimal example script showing how to compute image
embeddings for a set of images that are stored on
the same machine where we are running the models. To start,
we will load a few modules that will be needed for the
task.
```{python}
from os import listdir
from os.path import splitext, join
from transformers import ViTImageProcessor, ViTModel
from PIL import Image
import torch
import polars as pl
import numpy as np
```
Next, we load the model that we are interested in using.
Hugging Face hosts a large number of image embedding models;
most can be used in exactly the same way by simply
changing the model name in the function calls below.
```{python}
image_processor = ViTImageProcessor.from_pretrained(
    'google/vit-base-patch16-224-in21k'
)
model = ViTModel.from_pretrained(
    'google/vit-base-patch16-224-in21k'
)
```
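For instance, the same two calls will load a different checkpoint if we
swap in its name. The sketch below is not evaluated here and uses
`google/vit-base-patch16-384` purely as an example checkpoint name; the
rest of the pipeline stays unchanged.
```{python}
#| eval: false
# Example of swapping in a different checkpoint; only the model name changes.
alt_processor = ViTImageProcessor.from_pretrained(
    'google/vit-base-patch16-384'
)
alt_model = ViTModel.from_pretrained(
    'google/vit-base-patch16-384'
)
```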
With the processor and model loaded, the next step is to build a set of
paths to the images that we are interested in using. Here,
we will create a list of all of the image files in the
directory containing the FSA-OWI images. For this task, it
will be instructive to work with the entire set of images
in the directory.
```{python}
collection = 'fsaowi'
paths = sorted(listdir(join('img', collection)))
paths = [x for x in paths if splitext(x)[1] == '.jpg']
```
Now, we will load each of the images, run the model over it, and store
an L2-normalized version of the pooled embedding.
```{python}
output = {}
for path in paths:
    image = Image.open(join('img', collection, path))
    image = image.convert('RGB')
    input = image_processor(image, return_tensors="pt")
    outputs = model(**input)
    pooler_output = outputs.pooler_output
    output[path] = pooler_output[0].detach().numpy()
    output[path] = output[path] / np.linalg.norm(output[path], 2)
```
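As a quick sanity check (a small addition, not part of the original
pipeline), we can confirm that we have one embedding per image and that
each embedding is a unit vector; for this ViT-base checkpoint the pooled
embedding has 768 dimensions.
```{python}
# Each image should map to a 768-dimensional embedding with unit L2 norm.
print(len(output), output[paths[0]].shape)
print(np.linalg.norm(output[paths[0]], 2))
```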
Then, for any reference image, we can compute similarity scores
relative to all of the other images. Because each embedding has unit
length, the dot product of two embeddings is their cosine similarity.
Here we have an example computing the scores relative to the first image.
```{python}
ref_path = paths[0]
similarity = []
for path in paths:
    similarity += [np.dot(output[ref_path], output[path])]
```
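Equivalently, a vectorized sketch (assuming the `output` dictionary built
above) stacks the normalized embeddings into a single matrix so that NumPy
can compute every score in one matrix-vector product.
```{python}
# Stack the unit-length embeddings into an (n_images, n_features) matrix and
# compute all cosine similarities to the reference image at once; the values
# match the similarity list built in the loop above.
emb = np.stack([output[path] for path in paths])
scores = emb @ emb[0]
```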
Then, we can see which images are the closest matches (the first image
shown will be the reference itself, since any image is always most
similar to itself).
```{python}
similarity = np.array(similarity)
similarity_rank = np.argsort(similarity * -1)
for idx in similarity_rank[:5]:
    path = paths[idx]
    image = Image.open(join('img', collection, path))
    display(image)
```
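Since polars was imported above, one convenient way to inspect the results
(a sketch, not part of the original example, assuming a recent polars
version) is to collect the file names and scores into a data frame sorted
by similarity.
```{python}
# Collect the similarity scores into a data frame, most similar images first.
df = pl.DataFrame({'path': paths, 'similarity': similarity})
df = df.sort('similarity', descending=True)
df.head(5)
```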