- [x] add bf16 support
- [ ] check if training with bf16 weights works fine
- [x] add resuming from ckpt
- [x] add wandb tracking
- [x] complete adafactor option
- [x] figure out how to best utilize profiler for training loop optimization
- [x] add gradient accumulation
- [x] support iterable datasets and max_steps argument
- [x] prefetch generator for dataloader
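
For the last item, a prefetching wrapper around a data iterator might look roughly like the sketch below. It is an illustration under assumptions, not this project's implementation: `prefetch` and `buffer_size` are hypothetical names, and it uses only a background thread and a bounded queue from the Python standard library.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, loading up to `buffer_size` items ahead.

    Hypothetical helper for illustration. A background thread fills a
    bounded queue while the consumer (e.g. a training step) is busy.
    """
    buf = queue.Queue(maxsize=buffer_size)
    errors = []

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        except Exception as exc:
            errors.append(exc)  # re-raised in the consumer below
        finally:
            buf.put(_DONE)

    # Daemon thread: if the consumer stops early, the producer may stay
    # blocked on put(), but it will not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            break
        yield item

    if errors:
        raise errors[0]
```

In a training loop this would wrap the existing batch iterator, along the lines of `for batch in prefetch(batches): ...`, so that the next batch is being prepared while the current step runs. Whether a thread helps in practice depends on how much of the loading work releases the GIL.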