[A] Add 24.06 MAR paper #19
Merged
Contributor
Pull Request Overview
Adds a full summary of the new paper “Autoregressive Image Generation without Vector Quantization”, covering background, methodology, implementation details, experiments, and references.
- Introduces paper metadata, author link, and Chinese translation
- Details vector quantization background and the proposed diffusion-based autoregressive method
- Provides experiments on loss functions, tokenizers, MLP ablations, and system comparisons
Comments suppressed due to low confidence (1)
papers/image-generation/2406-mar/index.md:35
- [nitpick] List indentation is inconsistent here and in subsequent bullet points. Use uniform indent levels for nested lists to improve readability.
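+- Taking [VQ-VAE, 2017] as an example
| - Diffusion Loss: cosine-shaped noise schedule; DDPM uses 1000 steps during training but only 100 steps at inference | ||
| - Denoising MLP (small MLP): 3 blocks of 1024 channels; each block contains LayerNorm, a linear layer, and a SiLU activation, joined by a residual connection. In practice, AdaLN is used to inject the transformer output z into the LayerNorm layers | ||
| - Tokenizer: uses the public tokenizers provided by LDM, including VQ-16 and KL-16. VQ-16 is a quantized model based on VQ-GAN, trained with GAN loss and perceptual loss; KL-16 is regularized with KL divergence and does not rely on VQ | ||
| - Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels |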
Typo in 'sequene'; it should be 'sequence'.
Suggested change
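| - Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels | |
| - Transformer: uses a ViT to take the token sequence produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels |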
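For readers of the Diffusion Loss line in the hunk above, here is a minimal sketch of what a cosine noise schedule with 1000 training steps and 100 inference steps could look like. The summary only says "cosine-shaped"; the exact formula (the improved-DDPM cosine form with offset `s`), the evenly strided choice of inference timesteps, and the function names `cosine_alpha_bar` and `strided_timesteps` are all assumptions made for illustration, not taken from the paper's code.

```python
import math


def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar[t] for t = 0..num_steps.

    Assumed form: alpha_bar(t) = f(t) / f(0),
    with f(t) = cos((t / T + s) / (1 + s) * pi / 2) ** 2.
    """
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    f0 = f(0)
    return [f(t) / f0 for t in range(num_steps + 1)]


def strided_timesteps(train_steps, infer_steps):
    """Pick evenly spaced timesteps (descending) for a shorter sampling run."""
    stride = train_steps // infer_steps
    return list(range(train_steps - 1, -1, -stride))


# 1000 steps for training, 100 of them visited at inference.
alpha_bar = cosine_alpha_bar(1000)
steps = strided_timesteps(1000, 100)
```

`alpha_bar` starts at 1 (no noise) and decays to roughly 0 (pure noise); the inference loop would walk `steps` from the noisiest timestep down.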
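| - Tokenizer: uses the public tokenizers provided by LDM, including VQ-16 and KL-16. VQ-16 is a quantized model based on VQ-GAN, trained with GAN loss and perceptual loss; KL-16 is regularized with KL divergence and does not rely on VQ | ||
| - Transformer: uses a ViT to take the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels | ||
| - Masked autoregressive models: during training, a masking ratio in [0.7, 1.0] is used, where 0.7 means randomly masking 70% of the tokens. To keep the sampled sequence from being too short, they always pad 64 [cls] tokens into it. At inference, the masking ratio is gradually reduced from 1.0 to 0 following a cosine schedule; the default is 64 steps | ||
| - Baseline Autoregressive Model: a GPT model with casual attention; a [cls] is appended to the input, and KV cache and a temperature parameter are used |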
Typo in 'casual attention'; it should be 'causal attention'.
Suggested change
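| - Baseline Autoregressive Model: a GPT model with casual attention; a [cls] is appended to the input, and KV cache and a temperature parameter are used | |
| - Baseline Autoregressive Model: a GPT model with causal attention; a [cls] is appended to the input, and KV cache and a temperature parameter are used |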
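The masking behaviour described in the "Masked autoregressive models" line of this hunk can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the uniform draw of the training ratio, the rule that at least one token is revealed per step, the sequence length of 256, and the helper names `sample_training_mask` and `inference_mask_counts` are hypothetical. The 64 padded [cls] tokens are not modelled.

```python
import math
import random


def sample_training_mask(seq_len, low=0.7, high=1.0, rng=random):
    """Mask a random fraction of tokens, with the fraction drawn from [low, high].

    Drawing the ratio uniformly is an assumption; the summary only gives the range.
    """
    ratio = rng.uniform(low, high)
    num_masked = max(1, math.ceil(ratio * seq_len))
    masked = set(rng.sample(range(seq_len), num_masked))
    return [i in masked for i in range(seq_len)]


def inference_mask_counts(seq_len, num_steps=64):
    """Tokens still masked after each step, decaying from 1.0 to 0 on a cosine curve."""
    counts = []
    remaining = seq_len
    for step in range(num_steps):
        ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        target = math.floor(ratio * seq_len)
        # Assumed rule: reveal at least one token per step and never re-mask.
        target = max(0, min(target, remaining - 1))
        remaining = target
        counts.append(remaining)
    return counts


train_mask = sample_training_mask(256, rng=random.Random(0))
counts = inference_mask_counts(256, num_steps=64)
```

With the cosine curve, few tokens are revealed in the early steps and many in the late ones, and the count reaches 0 at the final step.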
Contributor
Pull Request Overview
This PR introduces a new markdown document detailing a paper on autoregressive image generation without using vector quantization.
- Added a new markdown file with paper details, experimental setups, and comparison figures.
- Provides background, methodology, and implementation details for the proposed approach.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
zephyr-sh
approved these changes
May 17, 2025
No description provided.