`.claude/settings.json` (new file, 10 additions):

```json
{
  "extraKnownMarketplaces": {
    "amd-claude-marketplace": {
      "source": {
        "source": "github",
        "repo": "ROCm/amd-claude-marketplace"
      }
    }
  }
}
```

`CLAUDE.md` (new file, 112 additions):

# Agent instructions for TransformerEngine (ROCm fork)

## Docker containers
- We work in Docker containers for reproducibility.
- Run build/test commands **only** inside the designated container (not on host).
- If the container is unspecified, ask for the exact image/tag and launch command **before** running anything expensive.
- Prefer editable installs (`pip install -e .`).
- Before debugging, record: container image/tag, ROCm version, GPU arch, TE commit, submodule state.
- If results are suspicious, first verify you are in the expected container and that GPU devices/libs are exposed correctly.
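The snapshot above can be collected with a short Python script run inside the container. This is a minimal sketch, not a repo tool: it assumes ROCm records its version in `/opt/rocm/.info/version` and that `rocminfo` prints GPU agents as `Name: gfxNNN` lines, and it reads the image tag from a hypothetical `CONTAINER_IMAGE` variable, because the tag is generally not visible from inside a container. Anything it cannot detect is reported as `unknown` rather than failing.

```python
import json
import os
import subprocess
from pathlib import Path


def run(cmd):
    """Return the stripped stdout of `cmd`, or "unknown" if it cannot be run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    return result.stdout.strip() or "unknown"


def rocm_version(rocm_root="/opt/rocm"):
    # Assumption: ROCm installs record their version in <rocm_root>/.info/version.
    try:
        return (Path(rocm_root) / ".info" / "version").read_text().strip() or "unknown"
    except OSError:
        return "unknown"


def parse_gfx_archs(rocminfo_output):
    """Collect unique gfx targets from rocminfo-style "Name: gfxNNN" lines."""
    archs = set()
    for line in rocminfo_output.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key.strip() == "Name" and value.startswith("gfx"):
            archs.add(value)
    return sorted(archs)


def debug_snapshot(repo_dir="."):
    return {
        # Hypothetical variable: export it yourself when launching the container.
        "container_image": os.environ.get("CONTAINER_IMAGE", "unknown"),
        "rocm_version": rocm_version(),
        "gpu_arch": parse_gfx_archs(run(["rocminfo"])) or ["unknown"],
        "te_commit": run(["git", "-C", repo_dir, "rev-parse", "HEAD"]),
        "submodules": run(["git", "-C", repo_dir, "submodule", "status"]),
    }


if __name__ == "__main__":
    print(json.dumps(debug_snapshot(), indent=2))
```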

## Architecture
- One core C++/HIP library + optional framework bindings:
  - core: `transformer_engine/common` → `libtransformer_engine.so`
  - PyTorch: `transformer_engine/pytorch` + `transformer_engine/pytorch/csrc`
  - JAX: `transformer_engine/jax` + `transformer_engine/jax/csrc/extensions`
- Python import flow:
  - framework selection: `transformer_engine/__init__.py` (`NVTE_FRAMEWORK` = `pytorch|jax|all|none`)
  - `.so` loading: `transformer_engine/common/__init__.py` (`load_framework_extension`)
- Build orchestration: `setup.py` + `build_tools/*.py` + CMake.
  - `build_tools/utils.py::rocm_build()` auto-detects ROCm first, then CUDA, unless `NVTE_USE_ROCM` is set.
- 3rdparty submodules: `aiter`, `aotriton`, `cudnn-frontend`, `cutlass`, `googletest`, `hipify_torch`.
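The two selection rules above can be summarized in Python. This is an illustrative sketch with hypothetical function names, not the code in `transformer_engine/__init__.py` or `build_tools/utils.py`; it handles only the four documented `NVTE_FRAMEWORK` values and assumes `NVTE_USE_ROCM=1` means ROCm while any other value means CUDA.

```python
FRAMEWORK_CHOICES = {
    "pytorch": ("pytorch",),
    "jax": ("jax",),
    "all": ("pytorch", "jax"),
    "none": (),
}


def frameworks_to_load(value):
    """Map one of the documented NVTE_FRAMEWORK values to the bindings to load."""
    if value not in FRAMEWORK_CHOICES:
        raise ValueError(f"expected one of {sorted(FRAMEWORK_CHOICES)}, got {value!r}")
    return FRAMEWORK_CHOICES[value]


def use_rocm(env, rocm_found, cuda_found):
    """Documented precedence: an explicit NVTE_USE_ROCM wins, then ROCm, then CUDA."""
    flag = env.get("NVTE_USE_ROCM")
    if flag is not None:
        # Assumption: "1" selects ROCm and any other value selects CUDA.
        return flag == "1"
    if rocm_found:
        return True
    if cuda_found:
        return False
    # Hypothetical failure mode: the real build may report this differently.
    raise RuntimeError("neither ROCm nor CUDA was detected")
```

For example, `use_rocm(os.environ, rocm_found=True, cuda_found=True)` returns `True` unless `NVTE_USE_ROCM` is set to something other than `1`.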

## Hipify convention
The build auto-generates HIP files from CUDA sources via `hipify_torch`. Generated files are marked with `// !!! This is a file automatically generated by hipify!!!` at line 1. **Never edit generated files directly** — edit the CUDA source instead.

**Review comment (Collaborator):** If a file contains only HIP code (no CUDA) and does not include any headers that contain CUDA code, it can be excluded from hipification in one of two ways: (1) explicitly add it to the ignore list in `do_hipify()` in `build_tools/hipify/hipify.py`, which is useful for subdirectories containing HIP-only code; or (2) rely on HIPIFY to detect that no modification is needed. For option (2), the file should contain `#include "hip/hip_runtime.h"`, either as a real include or commented out if the header is not actually needed.


File extension mapping:

| CUDA source | Generated HIP file |
|---|---|
| `.cu` | `.hip` |
| `.cuh` | `_hip.cuh` |
| `.cpp` | `_hip.cpp` |
| `.h` | `_hip.h` |
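The mapping in the table, together with the line-1 marker, can be written as a small Python helper, for example to find which CUDA source to edit when a generated file shows up in a diff. This is a sketch, not part of `build_tools`, and it assumes the generated file keeps its source's directory.

```python
from pathlib import PurePosixPath

# CUDA source suffix -> suffix appended to the stem of the generated HIP file.
HIPIFY_SUFFIX = {
    ".cu": ".hip",
    ".cuh": "_hip.cuh",
    ".cpp": "_hip.cpp",
    ".h": "_hip.h",
}

HIPIFY_MARKER = "// !!! This is a file automatically generated by hipify!!!"


def hipified_name(cuda_path):
    """Return the generated HIP path for a CUDA source, following the table above."""
    p = PurePosixPath(cuda_path)
    if p.suffix not in HIPIFY_SUFFIX:
        raise ValueError(f"no hipify mapping for {p.name}")
    return str(p.with_name(p.stem + HIPIFY_SUFFIX[p.suffix]))


def cuda_source_name(hip_path):
    """Inverse mapping: the CUDA source to edit instead of this generated file."""
    p = PurePosixPath(hip_path)
    for cuda_suffix, hip_suffix in HIPIFY_SUFFIX.items():
        if p.name.endswith(hip_suffix):
            return str(p.with_name(p.name[: -len(hip_suffix)] + cuda_suffix))
    return None  # not a name hipify would produce


def is_generated(text):
    """True if line 1 carries the hipify marker (so the file must not be edited)."""
    lines = text.splitlines()
    return bool(lines) and lines[0].strip().startswith(HIPIFY_MARKER)
```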

The following directories are **excluded** from hipify (native ROCm code — edit directly):
- `transformer_engine/common/ck_fused_attn/` — CK kernel wrappers
- `transformer_engine/common/amd_detail/` — AMD-specific utilities
- `transformer_engine/common/rocshmem_api/` — ROCshmem wrappers
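A rough way to decide whether a file should be edited directly, rather than through its CUDA source, is sketched below in Python. It combines the three excluded directories listed above with the convention from the review comment on this section (a real or commented-out `#include "hip/hip_runtime.h"`). It is a heuristic for reading the tree, not the logic in `build_tools/hipify/hipify.py`, and the regex is an assumption about how the include is spelled.

```python
import re
from pathlib import PurePosixPath

# Native ROCm directories listed above; files here are edited directly.
EXCLUDED_DIRS = (
    "transformer_engine/common/ck_fused_attn",
    "transformer_engine/common/amd_detail",
    "transformer_engine/common/rocshmem_api",
)

HIPIFY_MARKER = "// !!! This is a file automatically generated by hipify!!!"

# Matches `#include "hip/hip_runtime.h"`, optionally commented out with `//`.
HIP_RUNTIME_INCLUDE = re.compile(r'^\s*(?://\s*)?#include\s+"hip/hip_runtime\.h"', re.M)


def in_excluded_dir(path):
    parts = PurePosixPath(path).parts
    return any(parts[: len(PurePosixPath(d).parts)] == PurePosixPath(d).parts
               for d in EXCLUDED_DIRS)


def edit_directly(path, text):
    """Heuristic: True if `path` looks like native HIP code that hipify leaves alone."""
    lines = text.splitlines()
    if lines and lines[0].strip().startswith(HIPIFY_MARKER):
        return False  # generated output: edit the CUDA source instead
    return in_excluded_dir(path) or bool(HIP_RUNTIME_INCLUDE.search(text))
```

The marker check comes first because generated files typically include the HIP runtime header too, so the include alone does not prove a file is hand-written.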

Framework bindings (`pytorch/csrc`, `jax/csrc`) are hipified separately via `build_tools/pytorch.py` and `build_tools/jax.py`.

## Fused attention backends
Backends are gated by env vars (set to `0` to disable, unset or `1` to enable):

| Env var | Controls | Default |
|---|---|---|
| `NVTE_FUSED_ATTN` | Master toggle for all fused attention | `1` |
| `NVTE_FUSED_ATTN_CK` | CK backend | inherits `NVTE_FUSED_ATTN` |
| `NVTE_FUSED_ATTN_AOTRITON` | AOTriton backend | inherits `NVTE_FUSED_ATTN` |
| `NVTE_FLASH_ATTN` | Flash attention | `1` |

CI backend configs (`ci/_utils.sh::configure_fused_attn_env`): `auto`, `ck`, `aotriton`, `flash`, `unfused`.

### ROCm fused-attn file layout
- **Runtime backend selection/dispatch**: `transformer_engine/common/fused_attn_rocm/fused_attn.cpp` (hipified)
- **CK dispatch glue**: `transformer_engine/common/fused_attn_rocm/fused_attn_ck.cpp` (hipified)
- **AOTriton dispatch glue**: `transformer_engine/common/fused_attn_rocm/fused_attn_aotriton.cpp` (hipified)
- **CK kernel wrappers** (native, not hipified):
  - `transformer_engine/common/ck_fused_attn/src/ck_fused_attn_{fwd,bwd,utils}.cpp`
  - `transformer_engine/common/ck_fused_attn/include/ck_fused_attn/ck_fused_attn.hpp`

### Debug logging env vars
- `NVTE_DEBUG=1` + `NVTE_DEBUG_LEVEL={0,1,2}` — Python-level attention debug output
- `NVTE_LOG_FUSED_ATTN_CONFIG=1` — C++ backend selection logging
- `NVTE_LOG_CK_CONFIG=1` — CK-specific config logging
- `NVTE_LOG_AOTRITON_CONFIG=1` — AOTriton-specific config logging
- `CK_FUSED_ATTN_LOG_CONFIG=1` — CK kernel wrapper logging
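To capture all of these logs for one failing case, the variables can be set in the environment of a child process so the C++ library sees them from startup. A minimal Python sketch with hypothetical helper names; the test path in the usage line is a placeholder, and level `2` is assumed to be the most verbose `NVTE_DEBUG_LEVEL`.

```python
import os
import subprocess
import sys

ATTN_DEBUG_VARS = {
    "NVTE_DEBUG": "1",
    "NVTE_DEBUG_LEVEL": "2",  # assumption: 2 is the most verbose of {0,1,2}
    "NVTE_LOG_FUSED_ATTN_CONFIG": "1",
    "NVTE_LOG_CK_CONFIG": "1",
    "NVTE_LOG_AOTRITON_CONFIG": "1",
    "CK_FUSED_ATTN_LOG_CONFIG": "1",
}


def debug_env(base=None):
    """Copy of `base` (default: os.environ) with all attention debug logging turned on."""
    env = dict(os.environ if base is None else base)
    env.update(ATTN_DEBUG_VARS)
    return env


def run_with_attention_logs(args):
    """Run `python <args>` in a child process with debug logging; return its exit code."""
    return subprocess.run([sys.executable, *args], env=debug_env(), check=False).returncode


# Usage (placeholder test path and case name):
#   run_with_attention_logs(["-m", "pytest", "-x", "tests/pytorch/<test_file>.py", "-k", "<case>"])
```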

## Developer workflows
- Always init submodules first: `git submodule update --init --recursive`.
- Source install: `pip install . --no-build-isolation`.
- C++ tests: `ci/core.sh`.
- Framework CI tests (shell scripts, not bare pytest):
  - PyTorch: `ci/pytorch.sh` | JAX: `ci/jax.sh`
  - Control via `TEST_LEVEL`, `TEST_SGPU`, `TEST_MGPU`, `TEST_FILTER` (from `ci/_utils.sh`).
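A small Python wrapper can invoke the CI scripts with those controls. It only forwards the variables you pass; what each value means (for example which `TEST_LEVEL` values exist) is defined in `ci/_utils.sh`, so any values you use with it should come from there. The wrapper and its keyword names are hypothetical.

```python
import os
import subprocess

CI_SCRIPTS = {"core": "ci/core.sh", "pytorch": "ci/pytorch.sh", "jax": "ci/jax.sh"}
CONTROLS = {"level", "sgpu", "mgpu", "filter"}  # -> TEST_LEVEL, TEST_SGPU, ...


def ci_env(base=None, **controls):
    """Return a copy of `base` (default: os.environ) with the given TEST_* controls set."""
    unknown = set(controls) - CONTROLS
    if unknown:
        raise ValueError(f"unknown CI controls: {sorted(unknown)}")
    env = dict(os.environ if base is None else base)
    for name, value in controls.items():
        if value is not None:  # None means "leave unset"
            env[f"TEST_{name.upper()}"] = str(value)
    return env


def run_ci(suite, **controls):
    """Run one CI script from the repository root (inside the container); return its exit code."""
    return subprocess.run(["bash", CI_SCRIPTS[suite]], env=ci_env(**controls), check=False).returncode
```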

## Code conventions
- Edit `transformer_engine/*`, `build_tools/*`, `tests/*`, `ci/*`; avoid `3rdparty/*` unless explicitly required.
- Keep env-var behavior stable; tests toggle flags intentionally.
- Python: Black, line length 100. C/C++: cpplint + `.clang-format`.

**Review comment (Collaborator):** Also mention `pylintrc`.

- **Preserve the existing style of each file you edit.** Much of the codebase originates from upstream, and style can vary file-to-file (naming conventions, comment style, control flow patterns, etc.). Before writing new code in a file, read enough of it to understand how similar logic is already written, and follow that style. Consistency within a file matters more than imposing a uniform style across the project.

## Copyright headers
When you modify a file, update its copyright header so the end-year reflects the current year.

This repo carries **two** copyright lines — AMD and NVIDIA. Follow these rules:

1. **Files with an existing AMD copyright line** — update the AMD end-year to the current year (e.g. `2025` → `2026`). Leave the NVIDIA line untouched.
2. **Files with only an NVIDIA copyright line** — add an AMD line **above** the NVIDIA line:
   - Python: `# Copyright (c) <YEAR>, Advanced Micro Devices, Inc. All rights reserved.`
   - C/C++/HIP: `/* Copyright (c) <YEAR>, Advanced Micro Devices, Inc. All rights reserved. */` (or use the `*`-block style matching the file).
   - `<YEAR>` is the current year (single year) for newly-added lines, e.g. `2026`.
3. **New files you create** — include both AMD and NVIDIA headers with the current year, followed by a blank comment line and `See LICENSE for license information.`
4. **Never change the NVIDIA copyright year range** — those dates are updated during IFU (integrate from upstream) merges.

AMD headers are our addition and should stay consistent with the patterns already in the codebase.
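Rules 1 and 2 for Python files can be sketched as a small text transform. Assumptions: headers use `#` comments with the `Copyright (c) <years>, <holder>` spelling shown in rule 2, a single-year AMD line such as `2025` becomes the range `2025-2026`, and the NVIDIA line in the test data follows upstream's usual wording. C/C++ block comments and rule 3 are out of scope, and the function name is hypothetical.

```python
import re

AMD_LINE = re.compile(
    r"^(#\s*Copyright \(c\) )(\d{4})(?:-\d{4})?(, Advanced Micro Devices, Inc\.)", re.M
)
NVIDIA_LINE = re.compile(r"^#\s*Copyright \(c\) [\d-]+, NVIDIA", re.M)


def update_python_header(text, year):
    """Apply rules 1-2 to a Python file's text; the NVIDIA line is never modified."""
    amd = AMD_LINE.search(text)
    if amd:
        # Rule 1: keep the start year, move the end year to `year`.
        start = int(amd.group(2))
        years = str(start) if start >= year else f"{start}-{year}"
        return text[: amd.start()] + amd.group(1) + years + amd.group(3) + text[amd.end():]
    nvidia = NVIDIA_LINE.search(text)
    if nvidia:
        # Rule 2: add a single-year AMD line directly above the NVIDIA line.
        new_line = f"# Copyright (c) {year}, Advanced Micro Devices, Inc. All rights reserved.\n"
        return text[: nvidia.start()] + new_line + text[nvidia.start():]
    return text  # no recognized header: leave the file alone
```

In practice `year` would come from `datetime.date.today().year`.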

## Memory management
When writing or updating memories in the project memory directory, follow these guidelines:

- **Scope**: only save information that will be useful in future conversations. Do not save ephemeral task details, debugging breadcrumbs, or things derivable from the code/git history.
- **Check before writing**: read `MEMORY.md` and check for an existing memory on the same topic before creating a new file. Update the existing memory instead of duplicating.

**Review comment (Collaborator):** Where is `MEMORY.md`?

**Reply (Contributor Author):** It is a user-local file that indexes the memory files Claude writes; it is not stored in the project.

- **File naming**: use short, descriptive, snake_case names (e.g. `aiter_build.md`, `container_setup.md`). Group by topic, not by date.
- **Frontmatter**: every memory file must have the standard `name`, `description`, and `type` frontmatter fields.
- **Index maintenance**: after creating or removing a memory file, update `MEMORY.md` to keep the index in sync. Each entry should be a single line under 150 characters.
- **Staleness**: memories are point-in-time observations. When recalling a memory, verify it against current code/state before acting on it. Update or delete memories that are no longer accurate.
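The naming, frontmatter, and index rules above are mechanical enough to check with a few lines of Python. A minimal sketch with hypothetical helper names; it assumes frontmatter is a block of `key: value` lines between `---` delimiters at the top of the file and parses it naively instead of using a YAML library.

```python
import re

REQUIRED_FIELDS = {"name", "description", "type"}
SNAKE_CASE_MD = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*\.md$")


def frontmatter_keys(text):
    """Keys from a leading `---` ... `---` block; empty set if absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return set()


def check_memory_file(filename, text):
    """Return a list of rule violations for one memory file (empty means OK)."""
    problems = []
    if not SNAKE_CASE_MD.match(filename):
        problems.append(f"file name is not snake_case: {filename}")
    missing = REQUIRED_FIELDS - frontmatter_keys(text)
    if missing:
        problems.append(f"missing frontmatter fields: {sorted(missing)}")
    return problems


def long_index_lines(index_text, limit=150):
    """1-based numbers of MEMORY.md lines that are not under `limit` characters."""
    return [n for n, line in enumerate(index_text.splitlines(), 1) if len(line) >= limit]
```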

## Troubleshooting pointers
- **Missing `.so` on import**: check path resolution in `transformer_engine/common/__init__.py`.
- **Framework extension won't build on ROCm**: check `build_tools/utils.py::get_frameworks()`.
- **Fused-attn regression**: reproduce under multiple backend configs (`auto`, `ck`, `aotriton`, `unfused`).
- **CK/AITER kernel failures**: use the `ck-debugging` skill for structured triage and isolation.
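For the missing-`.so` case, a quick first check is whether the shared library exists near the installed package at all, without importing it (the import is what fails). A minimal Python sketch with hypothetical helper names; where the build places `libtransformer_engine.so` relative to the package is an assumption, so it searches the package directory recursively and its parent directory non-recursively.

```python
import importlib.util
from pathlib import Path


def find_shared_libs(root, pattern="libtransformer_engine*.so", recursive=True):
    """Sorted paths under `root` matching `pattern`; empty if `root` is not a directory."""
    root = Path(root)
    if not root.is_dir():
        return []
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(str(p) for p in matches)


def locate_te_library(package="transformer_engine"):
    """Candidate .so files for `package`, found without executing its __init__.py."""
    spec = importlib.util.find_spec(package)  # top-level lookup does not import the package
    if spec is None or not spec.submodule_search_locations:
        return []
    hits = []
    for location in spec.submodule_search_locations:
        pkg_dir = Path(location)
        hits += find_shared_libs(pkg_dir)
        hits += find_shared_libs(pkg_dir.parent, recursive=False)
    return sorted(set(hits))
```

An empty result points at the build or install step; a non-empty one points back at path resolution in `transformer_engine/common/__init__.py`.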