[A] Add 24.06 MAR paper by nbswords · Pull Request #19 · DocsaidLab/website

nbswords · 2025-05-17T06:25:39Z

No description provided.

Copilot

Pull Request Overview

Adds a full summary of the new paper “Autoregressive Image Generation without Vector Quantization”, covering background, methodology, implementation details, experiments, and references.

Introduces paper metadata, author link, and Chinese translation
Details vector quantization background and the proposed diffusion-based autoregressive method
Provides experiments on loss functions, tokenizers, MLP ablations, and system comparisons

Comments suppressed due to low confidence (1)

papers/image-generation/2406-mar/index.md:35

[nitpick] List indentation is inconsistent here and in subsequent bullet points. Use uniform indent levels for nested lists to improve readability.

+- Take [VQ-VAE, 2017] as an example

Copilot · 2025-05-17T06:33:56Z

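+- Diffusion Loss: cosine-shaped noise schedule; DDPM uses 1000 steps during training but only 100 steps at inference
+- Denoising MLP (small MLP): 3 blocks of 1024 channels; each block contains LayerNorm, a linear layer, and a SiLU activation, joined by a residual connection. In practice, AdaLN is used to inject the transformer output z into the LayerNorm layers
+- Tokenizer: uses the public tokenizers provided by LDM, namely VQ-16 and KL-16. VQ-16 is a quantized model based on VQ-GAN, trained with a GAN loss and a perceptual loss; KL-16 is regularized with KL divergence and does not rely on VQ
+- Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels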


Typo in 'sequene'; it should be 'sequence'.

Suggested change

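- Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels

- Transformer: uses a ViT to take the token sequence produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels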
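
For readers of this thread, the "cosine-shaped noise schedule" with 1000 training steps and 100 inference steps described in the quoted diff can be sketched roughly as follows. This is a minimal illustration, not the paper's or the PR author's code: it assumes the squared-cosine form of the cumulative signal level with a small offset `s = 0.008`, and it assumes the 100 inference steps are taken as an evenly spaced subset of the 1000 training steps. The function names `cosine_alpha_bar`, `cosine_betas`, and `respace` are hypothetical.

```python
import math


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level at normalized time t in [0, 1] (assumed form)."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2


def cosine_betas(num_steps: int, max_beta: float = 0.999) -> list[float]:
    """Per-step noise variances derived from ratios of consecutive alpha_bar values."""
    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        beta = 1 - cosine_alpha_bar(t2) / cosine_alpha_bar(t1)
        # Clip so the final step does not reach a variance of exactly 1.
        betas.append(min(beta, max_beta))
    return betas


def respace(num_train_steps: int, num_infer_steps: int) -> list[int]:
    """Pick an evenly spaced subset of training timesteps for inference (assumption)."""
    stride = num_train_steps / num_infer_steps
    return [round(i * stride) for i in range(num_infer_steps)]


train_betas = cosine_betas(1000)       # 1000 steps, as stated for training
infer_timesteps = respace(1000, 100)   # 100 steps, as stated for inference
```

The variances grow from near zero at the first step to the clipping value at the last, which is what gives the schedule its cosine shape in terms of the cumulative signal level.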

Copilot · 2025-05-17T06:33:56Z

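+- Tokenizer: uses the public tokenizers provided by LDM, namely VQ-16 and KL-16. VQ-16 is a quantized model based on VQ-GAN, trained with a GAN loss and a perceptual loss; KL-16 is regularized with KL divergence and does not rely on VQ
+- Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
+- Masked autoregressive models: during training, a masking ratio in [0.7, 1.0] is used, where 0.7 means 70% of the tokens are randomly masked. To keep the sampled sequence from being too short, they always pad 64 [cls] tokens into it. At inference, the masking ratio is gradually lowered from 1.0 to 0, with a cosine schedule controlling the steps; the default is 64 steps
+- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] is appended to the input, and it uses a kv cache and a temperature parameter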


Typo in 'casual attention'; it should be 'causal attention'.

Suggested change

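- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] is appended to the input, and it uses a kv cache and a temperature parameter

- Baseline Autoregressive Model: a GPT model with causal attention; a [cls] is appended to the input, and it uses a kv cache and a temperature parameter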
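
As a side note on the masked autoregressive bullet in the quoted diff, the inference-time behaviour (masking ratio lowered from 1.0 to 0 over 64 steps on a cosine schedule) could look something like the sketch below. It is only an illustration under stated assumptions: the sequence length of 256 tokens is assumed, the rule that each step reveals at least one token is an assumption, and `masked_counts` is a hypothetical helper name rather than anything taken from the PR.

```python
import math


def masked_counts(seq_len: int = 256, num_steps: int = 64) -> list[int]:
    """Number of tokens still masked after each decoding step (cosine schedule)."""
    counts = []
    remaining = seq_len  # masking ratio 1.0: every token starts masked
    for step in range(num_steps):
        # Cosine schedule: the ratio falls from just under 1.0 to 0 at the last step.
        ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        target = math.floor(seq_len * ratio)
        # Assumption: reveal at least one token per step and never go below zero.
        target = max(0, min(target, remaining - 1))
        counts.append(target)
        remaining = target
    return counts


counts = masked_counts()
newly_revealed = [256 - counts[0]] + [a - b for a, b in zip(counts, counts[1:])]
```

With this shape, early steps reveal only a few tokens and later steps reveal many more at once, so all 256 tokens are generated after the 64th step.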

Copilot

Pull Request Overview

This PR introduces a new markdown document detailing a paper on autoregressive image generation without using vector quantization.

Added a new markdown file with paper details, experimental setups, and comparison figures.
Provides background, methodology, and implementation details for the proposed approach.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[A] Add 24.06 MAR paper

7d2233c

nbswords self-assigned this May 17, 2025

[R] Remove a deprecated reference

a7aec29

nbswords requested review from Copilot and zephyr-sh May 17, 2025 06:32

Copilot AI reviewed May 17, 2025

[F] Fix img2 filename

16a54bd

nbswords requested a review from Copilot May 17, 2025 06:36

Copilot AI reviewed May 17, 2025

nbswords and others added 12 commits May 17, 2025 14:41

Update papers/image-generation/2406-mar/index.md

e8f6962

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update papers/image-generation/2406-mar/index.md

b4d75e7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update papers/image-generation/2406-mar/index.md

7bd3872

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update papers/image-generation/2406-mar/index.md

fe0216a

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update papers/image-generation/2406-mar/index.md

31a1778

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[C] Change some image size

8d80afb

[F] Fix phrase

a1a37d8

[F] Fix img alt and add steps experiments

9bba119

[F] Fix steps experiments img size

3f98728

[R] Remove SOTA phrase

e0727a6

[A] Add article MAR

e4a4943

[C] Update info

c6bfac8

zephyr-sh approved these changes May 17, 2025

zephyr-sh merged commit f59f2e0 into main May 17, 2025
1 check passed

zephyr-sh deleted the feat/add_mar_paper branch May 17, 2025 11:21

zephyr-sh merged 15 commits into main from feat/add_mar_paper

3 participants
