Generating Fire Emblem GBA-style tactical maps with masked discrete diffusion.
Instead of placing tiles one by one, the model generates the entire map at once: it starts from a fully masked grid and iteratively unmasks tiles over ~20 denoising steps. This is the same coarse-to-fine paradigm behind image diffusion models, adapted to small discrete grids.
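To make "~20 steps" concrete for a padded 42×43 map (1,806 tiles), the sketch below computes how many tiles remain masked after each step. It assumes a MaskGIT-style cosine schedule, where the fraction still masked after step k of T is cos(π/2 · k/T); the function name `masked_counts` is hypothetical, and the schedule actually used in `src/diffusion.py` may differ.

```python
import math


def masked_counts(num_tiles: int, num_steps: int) -> list[int]:
    """Tiles still masked after each sampling step k = 1..num_steps.

    Assumes a MaskGIT-style cosine schedule: masked fraction = cos(pi/2 * k / T).
    """
    counts = []
    for k in range(1, num_steps + 1):
        frac = math.cos(math.pi / 2 * k / num_steps)
        counts.append(math.floor(num_tiles * frac))
    return counts


num_tiles = 42 * 43
counts = masked_counts(num_tiles, 20)
# Tiles revealed at each step: previous masked count minus current masked count.
unmasked_per_step = [a - b for a, b in zip([num_tiles] + counts, counts)]
print(unmasked_per_step)
```

Under this schedule the first steps commit only a handful of tiles and later steps reveal many more, which is what gives the coarse-to-fine behaviour: early commitments condition everything that follows.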
Masked discrete diffusion (MDLM-style). Terrain tiles are categorical (plains, forest, mountain, water, wall, etc.), so continuous Gaussian noise doesn't apply. Instead:
- Forward process: randomly replace tiles with [MASK] tokens according to a cosine noise schedule.
- Reverse process: a Transformer encoder predicts what each [MASK] should be; unmask the most confident predictions and repeat.
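The forward process and the matching training loss can be sketched in a few lines of PyTorch. This is a minimal sketch under stated assumptions: the vocabulary layout (terrain ids 0 to 28, a padding id 29, `MASK_ID = 30`) and the helper names are hypothetical, the masking probability sin(πt/2) is one common cosine-style form, and the loss is a plain masked cross-entropy (MaskGIT-style), whereas MDLM's ELBO additionally applies a time-dependent weight.

```python
import math

import torch
import torch.nn.functional as F

MASK_ID = 30  # hypothetical: terrain ids 0-28, PAD = 29, MASK = 30


def cosine_mask_prob(t: torch.Tensor) -> torch.Tensor:
    """Probability that a tile is masked at diffusion time t in [0, 1] (1 = fully masked)."""
    return torch.sin(t * math.pi / 2)


def forward_mask(x0: torch.Tensor, t: torch.Tensor, generator=None):
    """Independently replace each tile with MASK_ID with probability cosine_mask_prob(t).

    x0: (B, L) clean token ids; t: (B,) diffusion times.
    Returns the noised tokens and the boolean mask of replaced positions.
    """
    p = cosine_mask_prob(t).unsqueeze(1)  # (B, 1), broadcast over all tiles
    masked = torch.rand(x0.shape, generator=generator, device=x0.device) < p
    xt = torch.where(masked, torch.full_like(x0, MASK_ID), x0)
    return xt, masked


def masked_ce_loss(logits: torch.Tensor, x0: torch.Tensor, masked: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over masked positions only; logits: (B, L, V)."""
    if not masked.any():
        # Nothing was masked (possible for t near 0): return a zero that keeps the graph.
        return logits.sum() * 0.0
    return F.cross_entropy(logits[masked], x0[masked])
```

A training step would sample `t ~ U(0, 1)` per map, call `forward_mask`, run the denoiser on the noised grid, and back-propagate `masked_ce_loss`.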
The denoiser is a standard Transformer encoder (4 layers, model dimension 128, 4 attention heads) operating over the flattened H×W grid (42 × 43 = 1,806 tokens after padding). Each tile's input is the sum of three embeddings: its token ID, a 2D positional encoding (row + column), and the diffusion timestep.
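A minimal PyTorch sketch of such a denoiser follows. The sizes match the description above; the specific choices are assumptions: learned row and column embeddings for the 2D position, a small MLP on a continuous t in [0, 1] for the timestep, a 4× feed-forward width, a vocabulary of 31 ids (29 terrain types plus PAD and MASK), and the class name `TileDenoiser`, which is hypothetical rather than the one in `src/model.py`.

```python
import torch
import torch.nn as nn


class TileDenoiser(nn.Module):
    """Transformer encoder over a row-major flattened H x W tile grid (illustrative sketch)."""

    def __init__(self, vocab_size=31, height=42, width=43, dim=128, heads=4, layers=4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.row = nn.Embedding(height, dim)  # assumed: learned row embedding
        self.col = nn.Embedding(width, dim)  # assumed: learned column embedding
        # Assumed timestep embedding: small MLP on a continuous t in [0, 1].
        self.time = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, vocab_size)
        # Row/column index of each flattened position (index = r * width + c).
        self.register_buffer("rows", torch.arange(height).repeat_interleave(width), persistent=False)
        self.register_buffer("cols", torch.arange(width).repeat(height), persistent=False)

    def forward(self, tokens: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # tokens: (B, H*W) long ids; t: (B,) float noise levels.
        x = self.tok(tokens) + self.row(self.rows) + self.col(self.cols)
        x = x + self.time(t.unsqueeze(-1)).unsqueeze(1)  # broadcast t over all tiles
        return self.head(self.encoder(x))  # (B, H*W, vocab_size) logits
```

Summing the three embeddings keeps the sequence length at H×W, so the attention cost depends only on grid size, not on how many conditioning signals are added.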
Dataset: 78 maps extracted from Fire Emblem GBA ROMs, each a 2D grid of ~29 terrain types, padded to 42×43. No augmentation yet.
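Padding every map to a fixed 42×43 canvas lets maps of different sizes share one batch. The sketch below assumes maps are anchored in the top-left corner and filled with a dedicated padding id (`PAD_ID = 29`, hypothetical), so the model can also learn where a map ends; `pad_map` and `collate` are illustrative names, not the functions in `src/data.py`.

```python
import numpy as np

PAD_ID = 29  # hypothetical padding id (terrain ids assumed to be 0-28)
MAX_H, MAX_W = 42, 43


def pad_map(grid: np.ndarray, max_h: int = MAX_H, max_w: int = MAX_W, pad_id: int = PAD_ID) -> np.ndarray:
    """Place an (h, w) terrain grid in the top-left corner of a (max_h, max_w) canvas."""
    h, w = grid.shape
    if h > max_h or w > max_w:
        raise ValueError(f"map {h}x{w} exceeds canvas {max_h}x{max_w}")
    out = np.full((max_h, max_w), pad_id, dtype=np.int64)
    out[:h, :w] = grid
    return out


def collate(maps: list[np.ndarray]) -> np.ndarray:
    """Stack padded maps into a (B, H*W) batch, flattened row-major."""
    return np.stack([pad_map(m).reshape(-1) for m in maps])
```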
- src/model.py: Transformer denoiser
- src/diffusion.py: cosine noise schedule, forward masking, iterative sampling
- src/train.py: training loop with W&B logging and sample visualization
- src/generate.py: standalone generation with temperature sampling
- src/data.py: dataset loading, padding, batching
- viz/render.py: terrain grid → color-coded image
- configs/: hyperparameters (YAML)
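Iterative sampling with temperature can be sketched as a MaskGIT-style loop: at each step, sample a candidate token for every tile, score it by its probability, and reveal the most confident still-masked tiles until the cosine schedule's target is reached. This is a sketch under assumptions, not necessarily what `src/diffusion.py` or `src/generate.py` do: `MASK_ID = 30` and `sample` are hypothetical, and `model(tokens, t)` is assumed to return logits of shape (batch, tiles, vocab).

```python
import math

import torch

MASK_ID = 30  # hypothetical id of the [MASK] token


@torch.no_grad()
def sample(model, batch_size, num_tiles, num_steps=20, temperature=1.0, generator=None):
    """Confidence-based iterative unmasking (MaskGIT-style); temperature must be > 0."""
    x = torch.full((batch_size, num_tiles), MASK_ID, dtype=torch.long)
    for k in range(1, num_steps + 1):
        # Noise level of the current input: a fraction (k - 1) / T of sampling is done.
        t = torch.full((batch_size,), 1.0 - (k - 1) / num_steps)
        logits = model(x, t).clone()
        logits[..., MASK_ID] = float("-inf")  # never predict [MASK] itself
        probs = torch.softmax(logits / temperature, dim=-1)
        # Temperature sampling: draw one candidate token per tile.
        flat = probs.reshape(-1, probs.shape[-1])
        pred = torch.multinomial(flat, 1, generator=generator).view(batch_size, num_tiles)
        conf = probs.gather(-1, pred.unsqueeze(-1)).squeeze(-1)
        still_masked = x == MASK_ID
        # Already revealed tiles get confidence -1 so they are never re-ranked.
        conf = torch.where(still_masked, conf, torch.full_like(conf, -1.0))
        # Cosine schedule: number of tiles allowed to stay masked after this step.
        keep = math.floor(num_tiles * math.cos(math.pi / 2 * k / num_steps))
        n_unmask = still_masked.sum(dim=1) - keep  # (batch,)
        ranks = conf.argsort(dim=1, descending=True).argsort(dim=1)  # rank of each tile
        reveal = still_masked & (ranks < n_unmask.unsqueeze(1))
        x = torch.where(reveal, pred, x)
    return x
```

Lower temperatures sharpen each tile's distribution (more conservative maps); higher temperatures increase variety at the cost of coherence. Revealed tiles are never re-masked in this sketch.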
Early stage: the pipeline is functional end to end (train → generate → render). The model trains and produces samples, but with only 78 maps and no augmentation, the outputs are not yet diverse. Next steps: data augmentation (rotations, flips, random crops), architecture tuning, and longer training.
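One possible shape for the planned rotation/flip augmentation is sketched below, assuming terrain ids carry no orientation (a forest is a forest in any direction). Because the canvas is non-square (42×43), a 90° rotation of a wide map may no longer fit, so the sketch augments the unpadded map and keeps only the distinct variants that still fit; `dihedral_variants` is a hypothetical name.

```python
import numpy as np


def dihedral_variants(grid: np.ndarray, max_h: int = 42, max_w: int = 43) -> list[np.ndarray]:
    """Distinct rotations/flips of an unpadded map that still fit the padded canvas."""
    variants: list[np.ndarray] = []
    for k in range(4):
        rot = np.rot90(grid, k)  # rotate by k * 90 degrees
        for g in (rot, np.fliplr(rot)):
            fits = g.shape[0] <= max_h and g.shape[1] <= max_w
            # array_equal returns False for differing shapes, so this only drops true duplicates.
            if fits and not any(np.array_equal(g, v) for v in variants):
                variants.append(np.ascontiguousarray(g))
    return variants
```

Each variant would then be padded as usual. For a map that is smaller than 42×42 all eight symmetries survive, while a full-width 43-column map keeps only the four that preserve its orientation.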
PyTorch, Weights & Biases, Matplotlib