# 2.7 Embedding {.unnumbered}
This is a minimal example script showing how to compute image
embeddings for a set of images that are stored on
the same machine where we are running the models. To start,
we will load a few modules that will be needed for the
task.
```{python}
from os import listdir
from os.path import splitext, join
from transformers import ViTImageProcessor, ViTModel
from PIL import Image
import torch
import polars as pl
import numpy as np
```
Next, we load the model that we are interested in using.
Hugging Face hosts a large number of image embedding models;
most can be used in exactly the same way by simply
changing the model name in the function calls below.
```{python}
image_processor = ViTImageProcessor.from_pretrained(
    'google/vit-base-patch16-224-in21k'
)
model = ViTModel.from_pretrained(
    'google/vit-base-patch16-224-in21k'
)
```
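For instance, the same two calls will load a different checkpoint if we
swap in its name. The sketch below is not evaluated here and uses
`google/vit-base-patch16-384` purely as an example checkpoint name; the
rest of the pipeline stays unchanged.
```{python}
#| eval: false
# Example of swapping in a different checkpoint; only the model name changes.
alt_processor = ViTImageProcessor.from_pretrained(
    'google/vit-base-patch16-384'
)
alt_model = ViTModel.from_pretrained(
    'google/vit-base-patch16-384'
)
```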
With the processor and model loaded, the next step is to build a set of
paths to the images that we are interested in using. Here,
we will create a list of all of the image files in the
directory containing the FSA-OWI images. For this task, it
will be instructive to work with the entire set of images
in the directory.
```{python}
collection = 'fsaowi'
paths = sorted(listdir(join('img', collection)))
paths = [x for x in paths if splitext(x)[1] == '.jpg']
```
Now, we will load each of the images, run the model over it, and store
an L2-normalized version of the pooled embedding.
```{python}
output = {}
for path in paths:
    image = Image.open(join('img', collection, path))
    image = image.convert('RGB')
    input = image_processor(image, return_tensors="pt")
    outputs = model(**input)
    pooler_output = outputs.pooler_output
    output[path] = pooler_output[0].detach().numpy()
    output[path] = output[path] / np.linalg.norm(output[path], 2)
```
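As a quick sanity check (a small addition, not part of the original
pipeline), we can confirm that we have one embedding per image and that
each embedding is a unit vector; for this ViT-base checkpoint the pooled
embedding has 768 dimensions.
```{python}
# Each image should map to a 768-dimensional embedding with unit L2 norm.
print(len(output), output[paths[0]].shape)
print(np.linalg.norm(output[paths[0]], 2))
```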
Then, for any reference image, we can compute similarity scores
relative to all of the other images. Because each embedding has unit
length, the dot product of two embeddings is their cosine similarity.
Here we have an example computing the scores relative to the first image.
```{python}
ref_path = paths[0]
similarity = []
for path in paths:
    similarity += [np.dot(output[ref_path], output[path])]
```
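Equivalently, a vectorized sketch (assuming the `output` dictionary built
above) stacks the normalized embeddings into a single matrix so that NumPy
can compute every score in one matrix-vector product.
```{python}
# Stack the unit-length embeddings into an (n_images, n_features) matrix and
# compute all cosine similarities to the reference image at once; the values
# match the similarity list built in the loop above.
emb = np.stack([output[path] for path in paths])
scores = emb @ emb[0]
```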
Then, we can see which images are the closest matches (the first image
shown will be the reference itself, since any image is always most
similar to itself).
```{python}
similarity = np.array(similarity)
similarity_rank = np.argsort(similarity * -1)
for idx in similarity_rank[:5]:
    path = paths[idx]
    image = Image.open(join('img', collection, path))
    display(image)
```
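Since polars was imported above, one convenient way to inspect the results
(a sketch, not part of the original example, assuming a recent polars
version) is to collect the file names and scores into a data frame sorted
by similarity.
```{python}
# Collect the similarity scores into a data frame, most similar images first.
df = pl.DataFrame({'path': paths, 'similarity': similarity})
df = df.sort('similarity', descending=True)
df.head(5)
```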