Releases: lof310/transformer

transformer-v0.4.0

30 Mar 18:04

transformer library Release v0.4.0 🔥

Documentation

Documentation is available at This Page

transformer-v0.3.0

13 Mar 18:11

Release v0.3.0

This release introduces Partial Rotary Position Embeddings (PartialRoPE) and ALiBi positional encodings, along with significant improvements to the module APIs and HuggingFace integration. The codebase has been refactored for better maintainability and flexibility.

Key Features

  • Partial Rotary Position Embeddings – New PartialRoPE module that applies rotary embeddings to only a fraction of the head dimension, as used in modern architectures such as Qwen.
  • ALiBi support – Initial implementation of Attention with Linear Biases for length extrapolation (ready for integration).
  • HuggingFace compatibility – Full integration with the HuggingFace ecosystem: models inherit from PreTrainedModel and GenerationMixin, with proper configuration class support. Gradient checkpointing is now supported.
  • Flexible normalization design – Support for pre-norm, post-norm, or both normalization patterns in transformer blocks.
  • Improved type handling – Support for both string identifiers and custom nn.Module classes for attention, feed-forward, and normalization components.
  • Kwargs-based configuration – Modules now accept kwargs dictionaries for more fine-grained customization.
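
The partial-rotary idea above can be sketched in a few lines of NumPy. The function name, the `rotary_fraction` argument, and the pair layout below are illustrative assumptions, not the library's actual PartialRoPE API:

```python
import numpy as np

def partial_rope(x, rotary_fraction=0.5, base=10000.0):
    # Apply rotary embeddings to the leading fraction of the head
    # dimension and pass the remainder through unchanged.
    seq_len, head_dim = x.shape
    rot_dim = int(head_dim * rotary_fraction)
    x_rot, x_pass = x[:, :rot_dim], x[:, rot_dim:]
    half = rot_dim // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))        # per-pair frequencies
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[:, :half], x_rot[:, half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)
```

Position 0 receives the identity rotation, and the non-rotary half of each head is untouched, which is what lets the rotary fraction be tuned without affecting the rest of the representation.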
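
ALiBi, also listed above, replaces positional embeddings with a per-head linear penalty added directly to attention scores. A minimal NumPy sketch of the bias matrix, assuming a power-of-two head count for the standard geometric slopes (not the library's implementation):

```python
import numpy as np

def alibi_bias(n_heads, seq_len):
    # Geometric per-head slopes, 2^(-8i/n) for head i (exact for
    # power-of-two head counts).
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    pos = np.arange(seq_len)
    # bias[h, i, j] = slope_h * (j - i): zero on the diagonal, more
    # negative the further key j lies behind query i. Future positions
    # (j > i) are assumed to be removed by a separate causal mask.
    rel = pos[None, :] - pos[:, None]            # (seq_len, seq_len)
    return slopes[:, None, None] * rel[None, :, :]
```

Because the penalty grows linearly with distance, scores for far-away tokens decay smoothly, which is what gives ALiBi its length-extrapolation behavior.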

Documentation

Full documentation is available at: transformer 0.3.0 documentation

transformer-v0.3.0-beta

13 Mar 03:14

Pre-release

Pre-Release Version 0.3.0 -- Beta

transformer-v0.2.0

08 Mar 19:15

Release v0.2.0

The transformer now supports Flash Attention via PyTorch’s native implementation, dramatically accelerating training and inference. Configuration options have been expanded, including dropout control for attention mechanisms, and the documentation has been thoroughly updated for clarity.

Key Features

  • Flash Attention support – Leverage PyTorch's accelerated Flash Attention implementation for faster attention computation and improved memory efficiency.
  • Attention dropout – Configure dropout rates in attention layers to enhance regularization.
  • Expanded configuration – Additional parameters for finer control over model architecture.
  • Improved documentation – Updated documentation for easier usage and development.
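
The Flash Attention path described above rests on PyTorch's built-in torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused Flash kernel when the backend supports one. A minimal usage sketch (tensor shapes are illustrative; how the library wires this internally is not shown here):

```python
import torch
import torch.nn.functional as F

# PyTorch's fused attention entry point. dropout_p controls attention
# dropout (set to 0.0 here for a deterministic result); is_causal
# applies the standard autoregressive mask.
q = torch.randn(1, 4, 16, 32)   # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 4, 16, 32)
v = torch.randn(1, 4, 16, 32)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
```

The call is numerically equivalent to the usual softmax(QKᵀ/√d)V with a causal mask, but avoids materializing the full attention matrix when a fused backend is selected.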

Documentation
Full documentation is available at: transformer 0.2.0 documentation

transformer library

06 Mar 22:06

Release v0.1.0 - Initial Release

This is the first release of my library transformer -- a polished PyTorch implementation of the current state-of-the-art (SOTA) Transformer architecture, designed to serve as a robust baseline for both research and engineering.

Key Features

  • Fully configurable architecture – Easily adjust layers, heads, dimensions, and more via a clean TransformerConfig.
  • HuggingFace-compatible API – Seamlessly integrates with PreTrainedModel and GenerationMixin for text generation and interoperability.
  • Attention mechanisms – Supports Multi-Head Attention (MHA), Grouped-Query Attention (GQA), and others.
  • Modern components – Includes Rotary Position Embeddings (RoPE), SwiGLU feed-forward, QK normalization, and bias control.
  • Weight tying support – Optionally tie embeddings with language modeling head for parameter efficiency.
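
For a sense of what Grouped-Query Attention changes relative to MHA, here is a small NumPy sketch of its core trick: each key/value head is shared by a group of query heads. The function name and shapes are illustrative, not the library's API:

```python
import numpy as np

def expand_kv_heads(kv, n_query_heads):
    # In GQA, consecutive query heads share one KV head; here we simply
    # materialize that sharing by repeating each KV head group_size times.
    n_kv_heads, seq_len, head_dim = kv.shape
    assert n_query_heads % n_kv_heads == 0
    group_size = n_query_heads // n_kv_heads
    return np.repeat(kv, group_size, axis=0)   # (n_query_heads, seq, dim)
```

With n_kv_heads < n_query_heads this shrinks the KV cache by the group factor, which is the main practical benefit of GQA at inference time.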

What's Included?

  • Core Transformer model with configurable layers
  • Attention modules (MHA, GQA, etc.)
  • Positional embeddings (only RoPE for now)
  • Feed-forward modules (SwiGLU and MLP)
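
The SwiGLU feed-forward listed above computes a SiLU-gated projection, FFN(x) = (SiLU(x·W_gate) ⊙ (x·W_up))·W_down. A minimal NumPy sketch under assumed weight shapes (not the library's actual module):

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SiLU(z) = z * sigmoid(z); the gate branch modulates the up branch
    # elementwise before the down projection.
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

Since SiLU(0) = 0, a zero input yields a zero output, and the multiplicative gate is what distinguishes SwiGLU from a plain two-layer MLP.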

Documentation
Full documentation is available at: transformer 0.1.0 documentation