[20230413] Weekly VLM2 - Flamingo

**Paper**
[Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) (a.k.a. Flamingo)

**Speaker**
@SoongE

**Summary**
![CleanShot 2023-04-13 at 16 31 25](https://user-images.githubusercontent.com/53206234/232372502-52af3275-ba41-43dc-975a-e7c275125dc2.png)

### Key Point

- Bridges powerful pre-trained vision and language models
- Uses interleaved visual and textual data
- Accepts any visual input via a Perceiver-based model
- Performs well on several tasks

### Methods

- Freeze the vision and language models

  - Vision Encoder:
    - Pre-trained with contrastive learning, using BERT as the text encoder
    - Trained on ALIGN + LTIP, combining the datasets by gradient accumulation
  - Fine-tuning or training from scratch instead of freezing results in a very large performance drop. The authors attribute this to catastrophic forgetting, which occurs as the model is trained on the new objective.

- Perceiver Resampler
![CleanShot 2023-04-13 at 17 40 50](https://user-images.githubusercontent.com/53206234/232372562-55f975cd-739a-46fc-bb1a-059d3cb55264.png)
  - Returns a fixed-size output regardless of the size of the vision input
  - Uses a fixed number of latent queries
  - Empirically works better than conventional attention
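
The fixed-output-size property of the Perceiver Resampler can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single attention head, no learned projections, and no feed-forward layer, and the names `resample`, `random_tokens`, `DIM`, and `NUM_LATENTS` are hypothetical. The point is only that the number of output vectors follows the number of latent queries, not the number of input features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(features, latents):
    """Cross-attend fixed latent queries to a variable-length feature list.

    Simplified on purpose: single head, no learned projections, no feed-forward.
    """
    dim = len(latents[0])
    output = []
    for query in latents:
        # Scaled dot-product score between this latent and every feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of the features (used as both keys and values).
        attended = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ]
        # Residual connection keeps the latent's own information.
        output.append([q + a for q, a in zip(query, attended)])
    return output


rng = random.Random(0)
DIM, NUM_LATENTS = 4, 3  # toy sizes, chosen only for illustration


def random_tokens(count):
    """Stand-in for real features: `count` random vectors of size DIM."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(count)]


latents = random_tokens(NUM_LATENTS)  # these would be learned in a real model
few_frames = resample(random_tokens(5), latents)
many_frames = resample(random_tokens(50), latents)

print(len(few_frames), len(few_frames[0]))
print(len(many_frames), len(many_frames[0]))
```

Both calls return `NUM_LATENTS` vectors of size `DIM`, whether the input has 5 or 50 feature vectors, so the downstream language model always sees the same number of visual tokens.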

- Gated Cross-Attention
![CleanShot 2023-04-13 at 17 37 23](https://user-images.githubusercontent.com/53206234/232372543-48f3040c-c59c-407b-81c7-3c1a195b65e2.png)

  - Tanh gate: similar to the gating used in long short-term memory (LSTM)
    - Has a normalizing effect
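
The tanh gate can be sketched as a gated residual connection. In the paper, the gate parameter is initialized to zero, so the new cross-attention branch contributes nothing at the start of training and the frozen language model's output is preserved. The function name `gated_residual` and the toy values below are hypothetical, and the cross-attention output is passed in as a plain list rather than computed.

```python
import math


def gated_residual(x, branch_output, alpha):
    """Add a tanh-gated branch to the residual stream: x + tanh(alpha) * branch."""
    gate = math.tanh(alpha)  # always bounded to (-1, 1)
    return [xi + gate * bi for xi, bi in zip(x, branch_output)]


text_hidden = [0.5, -1.0, 2.0]       # hypothetical frozen-LM activations
cross_attn_out = [10.0, 10.0, 10.0]  # hypothetical cross-attention output

# alpha = 0 gives tanh(0) = 0, so the visual branch is switched off.
at_init = gated_residual(text_hidden, cross_attn_out, alpha=0.0)

# As alpha moves away from 0 during training, visual information is mixed in.
later = gated_residual(text_hidden, cross_attn_out, alpha=0.1)

print(at_init)
print(later)
```

Because `tanh` is bounded, the gate also limits how strongly the new branch can scale its contribution, which is one way to read the normalizing effect noted above.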

- Train on mixture of datasets

  - Each dataset is given a different weight according to its size and quality (M3W, ALIGN, LTIP and VTP with weights λ_m of 1.0, 0.2, 0.2 and 0.03 respectively).
  - M3W: interleaved image-text
    - 43M webpages (HTML)
  - ALIGN and LTIP: image-text pair
    - ALIGN: large and low quality
    - LTIP: small and high quality
  - VTP: video-text pair
    - 27M short videos, about 22 seconds each on average
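
The dataset weighting can be sketched as a weighted sum of per-dataset losses. The weights are the λ_m values listed above; the loss values and the names `LAMBDA` and `mixture_loss` are made up for illustration. Since the gradient of a weighted sum is the weighted sum of the gradients, this is equivalent to accumulating weighted gradients over one batch from each dataset before a single optimizer step.

```python
# Per-dataset weights lambda_m as listed in the notes above.
LAMBDA = {"M3W": 1.0, "ALIGN": 0.2, "LTIP": 0.2, "VTP": 0.03}


def mixture_loss(per_dataset_loss):
    """Weighted sum of per-dataset losses, one batch per dataset per step."""
    return sum(LAMBDA[name] * loss for name, loss in per_dataset_loss.items())


# Made-up loss values for a single training step.
step_losses = {"M3W": 2.0, "ALIGN": 3.0, "LTIP": 1.5, "VTP": 4.0}
total = mixture_loss(step_losses)
print(total)
```

With these weights, a batch from M3W influences the update about 33 times more than a batch from VTP with the same loss value.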

### Strengths and Weaknesses

- Strengths
  - Shows strong performance on many downstream tasks
- Weaknesses
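  - Inherits all the side effects of the underlying LM.
  - Classification performance is worse than CLIP.
  - Outside the few-shot setting, task-specific models may perform better.
  - The training dataset and the model itself are both very large, which makes a fair comparison difficult.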
