21ICCV # Emerging Properties in Self-Supervised Vision Transformers (DINO)

[Paper](https://arxiv.org/abs/2104.14294)  
[Code](https://github.com/facebookresearch/dino)  

Authors:  
Mathilde Caron, Hugo Touvron, et al.  
Facebook AI Research (FAIR).

![](https://raw.githubusercontent.com/facebookresearch/dino/main/.github/dino.gif)   

**Highlights:**   
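- DINO is a self-supervised learning method framed as knowledge distillation: self-**di**stillation with **no** labels. It avoids the collapsed (trivial) solution differently from earlier methods: the teacher is a momentum encoder (an exponential moving average of the student), and the paper additionally centers and sharpens the teacher outputs.
- It encourages "local-to-global" correspondences by feeding views of different sizes to the two encoders: the student sees all crops, both global and small local ones, while the teacher sees only the global views.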
- SSL ViT features explicitly contain the scene layout and, in particular, object boundaries, as shown in the next figure.  

![](https://raw.githubusercontent.com/facebookresearch/dino/main/.github/attention_maps.png)
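
The distillation objective described in the highlights can be sketched roughly as follows. This is a minimal NumPy illustration, not the official implementation: the function names (`dino_loss`, `ema_update`), the random stand-in outputs, and the default temperatures and momentum are assumptions chosen for illustration, and a real training loop would backpropagate only through the student.

```python
import numpy as np

def softmax(logits, temperature):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    """Average cross-entropy between teacher and student distributions.

    student_out: list of (batch, dim) arrays, global views first, then local views.
    teacher_out: list of (batch, dim) arrays, global views only.
    center:      (1, dim) running mean of teacher outputs (assumed name).
    """
    total, n_terms = 0.0, 0
    for i, t in enumerate(teacher_out):
        # Centering (subtract the running mean) and sharpening (low temperature).
        p_teacher = softmax(t - center, t_teacher)
        for j, s in enumerate(student_out):
            if i == j:
                continue  # skip pairs where both networks see the same view
            log_p_student = log_softmax(s, t_student)
            total += -(p_teacher * log_p_student).sum(axis=-1).mean()
            n_terms += 1
    return total / n_terms

def ema_update(old, new, momentum):
    """Exponential moving average: applies to teacher weights and to the center."""
    return momentum * old + (1.0 - momentum) * new

# Stand-in network outputs: 2 global views for the teacher,
# 2 global + 4 local views for the student (hypothetical sizes).
rng = np.random.default_rng(0)
batch, dim = 4, 16
teacher_out = [rng.normal(size=(batch, dim)) for _ in range(2)]
student_out = [rng.normal(size=(batch, dim)) for _ in range(6)]
center = np.zeros((1, dim))

loss = dino_loss(student_out, teacher_out, center)
batch_mean = np.concatenate(teacher_out).mean(axis=0, keepdims=True)
center = ema_update(center, batch_mean, momentum=0.9)
```

The same `ema_update` rule, applied to the network parameters instead of the center, is what makes the teacher a momentum encoder: its weights track a slow average of the student's weights rather than receiving gradients.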


