Added initial AI Agent instructions and skills #448
base: dev
New file — marketplace settings JSON:

```json
{
  "extraKnownMarketplaces": {
    "amd-claude-marketplace": {
      "source": {
        "source": "github",
        "repo": "ROCm/amd-claude-marketplace"
      }
    }
  }
}
```
New file — the agent instructions document:

# Agent instructions for TransformerEngine (ROCm fork)
## Docker containers
- We work in Docker containers for reproducibility.
- Run build/test commands **only** inside the designated container (not on the host).
- If the container is unspecified, ask for the exact image/tag and launch command **before** running anything expensive.
- Prefer editable installs (`pip install -e .`).
- Before debugging, record: container image/tag, ROCm version, GPU arch, TE commit, submodule state.
- If results are suspicious, first verify you are in the expected container and that GPU devices/libs are exposed correctly.
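The pre-debug record above can be captured with a small helper. This is a sketch under assumptions: the `/opt/rocm/.info/version` path and `rocminfo` tool are common in ROCm containers but vary by image, and each probe falls back to `unknown` rather than failing.

```shell
# Sketch: capture the debug context listed above (paths/tools vary by image;
# each probe falls back to "unknown" if unavailable).
record_env() {
  echo "ROCm version: $(cat /opt/rocm/.info/version 2>/dev/null || echo unknown)"
  echo "GPU arch:     $(rocminfo 2>/dev/null | grep -m1 -o 'gfx[0-9a-f]*' || echo unknown)"
  echo "TE commit:    $(git rev-parse HEAD 2>/dev/null || echo unknown)"
  echo "Submodules:   $(git submodule status 2>/dev/null | wc -l) entries"
}
record_env
```

Paste the output of `record_env` into the debugging notes before changing anything.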
## Architecture
- One core C++/HIP library plus optional framework bindings:
  - core: `transformer_engine/common` → `libtransformer_engine.so`
  - PyTorch: `transformer_engine/pytorch` + `transformer_engine/pytorch/csrc`
  - JAX: `transformer_engine/jax` + `transformer_engine/jax/csrc/extensions`
- Python import flow:
  - framework selection: `transformer_engine/__init__.py` (`NVTE_FRAMEWORK` = `pytorch|jax|all|none`)
  - `.so` loading: `transformer_engine/common/__init__.py` (`load_framework_extension`)
- Build orchestration: `setup.py` + `build_tools/*.py` + CMake.
- `build_tools/utils.py::rocm_build()` auto-detects ROCm first, then CUDA, unless `NVTE_USE_ROCM` is set.
- 3rdparty submodules: `aiter`, `aotriton`, `cudnn-frontend`, `cutlass`, `googletest`, `hipify_torch`.
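A minimal sketch of the framework-selection flow described above; the install step is left commented out because it is expensive and container-dependent.

```shell
# Pin the framework selection read by transformer_engine/__init__.py.
export NVTE_FRAMEWORK=pytorch   # one of: pytorch | jax | all | none
echo "Selected framework: ${NVTE_FRAMEWORK}"
# pip install . --no-build-isolation   # then build only the selected bindings
```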
## Hipify convention
The build auto-generates HIP files from CUDA sources via `hipify_torch`. Generated files are marked with `// !!! This is a file automatically generated by hipify!!!` on line 1. **Never edit generated files directly** — edit the CUDA source instead.
File extension mapping:

| CUDA source | Generated HIP file |
|---|---|
| `.cu` | `.hip` |
| `.cuh` | `_hip.cuh` |
| `.cpp` | `_hip.cpp` |
| `.h` | `_hip.h` |
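Before editing a file, the generated-file marker can be checked mechanically. A sketch; the path below is hypothetical:

```shell
# Sketch: check whether a file is hipify-generated before editing it.
is_generated() { head -n1 "$1" 2>/dev/null | grep -q 'generated by hipify'; }

f="transformer_engine/common/gemm_hip.cpp"   # hypothetical path
if is_generated "$f"; then
  echo "generated -- edit the CUDA source instead"
else
  echo "native or missing -- check before editing"
fi
```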
The following directories are **excluded** from hipify (native ROCm code — edit directly):
- `transformer_engine/common/ck_fused_attn/` — CK kernel wrappers
- `transformer_engine/common/amd_detail/` — AMD-specific utilities
- `transformer_engine/common/rocshmem_api/` — ROCshmem wrappers

Framework bindings (`pytorch/csrc`, `jax/csrc`) are hipified separately via `build_tools/pytorch.py` and `build_tools/jax.py`.
## Fused attention backends
Backends are gated by env vars (set to `0` to disable; unset or `1` to enable):

| Env var | Controls | Default |
|---|---|---|
| `NVTE_FUSED_ATTN` | Master toggle for all fused attention | `1` |
| `NVTE_FUSED_ATTN_CK` | CK backend | inherits `NVTE_FUSED_ATTN` |
| `NVTE_FUSED_ATTN_AOTRITON` | AOTriton backend | inherits `NVTE_FUSED_ATTN` |
| `NVTE_FLASH_ATTN` | Flash attention | `1` |

CI backend configs (`ci/_utils.sh::configure_fused_attn_env`): `auto`, `ck`, `aotriton`, `flash`, `unfused`.
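For example, to pin a single backend for a reproduction run (a sketch of the toggles in the table above):

```shell
# Force the CK fused-attention backend and disable AOTriton
# (unset or 1 enables; 0 disables).
export NVTE_FUSED_ATTN=1           # master toggle on
export NVTE_FUSED_ATTN_CK=1        # keep CK enabled
export NVTE_FUSED_ATTN_AOTRITON=0  # turn AOTriton off
echo "CK=${NVTE_FUSED_ATTN_CK} AOTRITON=${NVTE_FUSED_ATTN_AOTRITON}"
```

Repeating the run under each CI config (`auto`, `ck`, `aotriton`, `unfused`) narrows a regression to one backend.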
### ROCm fused-attn file layout
- **Runtime backend selection/dispatch**: `transformer_engine/common/fused_attn_rocm/fused_attn.cpp` (hipified)
- **CK dispatch glue**: `transformer_engine/common/fused_attn_rocm/fused_attn_ck.cpp` (hipified)
- **AOTriton dispatch glue**: `transformer_engine/common/fused_attn_rocm/fused_attn_aotriton.cpp` (hipified)
- **CK kernel wrappers** (native, not hipified):
  - `transformer_engine/common/ck_fused_attn/src/ck_fused_attn_{fwd,bwd,utils}.cpp`
  - `transformer_engine/common/ck_fused_attn/include/ck_fused_attn/ck_fused_attn.hpp`
### Debug logging env vars
- `NVTE_DEBUG=1` + `NVTE_DEBUG_LEVEL={0,1,2}` — Python-level attention debug output
- `NVTE_LOG_FUSED_ATTN_CONFIG=1` — C++ backend selection logging
- `NVTE_LOG_CK_CONFIG=1` — CK-specific config logging
- `NVTE_LOG_AOTRITON_CONFIG=1` — AOTriton-specific config logging
- `CK_FUSED_ATTN_LOG_CONFIG=1` — CK kernel wrapper logging
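A sketch of enabling these flags for one run; the pytest target shown in the comment is hypothetical:

```shell
# Enable verbose attention debug output plus backend-selection logging.
export NVTE_DEBUG=1 NVTE_DEBUG_LEVEL=2 NVTE_LOG_FUSED_ATTN_CONFIG=1
echo "debug=${NVTE_DEBUG} level=${NVTE_DEBUG_LEVEL}"
# python -m pytest tests/pytorch -k attention   # hypothetical target, run under the flags above
```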
## Developer workflows
- Always init submodules first: `git submodule update --init --recursive`.
- Source install: `pip install . --no-build-isolation`.
- C++ tests: `ci/core.sh`.
- Framework CI tests (shell scripts, not bare pytest):
  - PyTorch: `ci/pytorch.sh` | JAX: `ci/jax.sh`
  - Control via `TEST_LEVEL`, `TEST_SGPU`, `TEST_MGPU`, `TEST_FILTER` (from `ci/_utils.sh`).
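The steps above can be sketched as one sequence. `DRY_RUN=1` (the default here) only prints the commands instead of executing them, and `TEST_LEVEL=1` is a hypothetical value; run with `DRY_RUN=0` inside the designated container.

```shell
# Sketch: the developer workflow end to end; defaults to printing commands.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run git submodule update --init --recursive    # 1. init submodules
run pip install . --no-build-isolation         # 2. source install
run env TEST_LEVEL=1 bash ci/pytorch.sh        # 3. PyTorch CI tests (TEST_LEVEL value hypothetical)
```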
## Code conventions
- Edit `transformer_engine/*`, `build_tools/*`, `tests/*`, `ci/*`; avoid `3rdparty/*` unless explicitly required.
- Keep env-var behavior stable; tests toggle flags intentionally.
- Python: Black, line length 100. C/C++: cpplint + `.clang-format`.
  - *Collaborator comment:* and `pylintrc`.
- **Preserve the existing style of each file you edit.** Much of the codebase originates from upstream, and style can vary file to file (naming conventions, comment style, control-flow patterns, etc.). Before writing new code in a file, read enough of it to understand how similar logic is already written, and follow that style. Consistency within a file matters more than imposing a uniform style across the project.
## Copyright headers
When you modify a file, update its copyright header so the end-year reflects the current year.

This repo carries **two** copyright lines — AMD and NVIDIA. Follow these rules:

1. **Files with an existing AMD copyright line** — update the AMD end-year to the current year (e.g. `2025` → `2026`). Leave the NVIDIA line untouched.
2. **Files with only an NVIDIA copyright line** — add an AMD line **above** the NVIDIA line:
   - Python: `# Copyright (c) <YEAR>, Advanced Micro Devices, Inc. All rights reserved.`
   - C/C++/HIP: `/* Copyright (c) <YEAR>, Advanced Micro Devices, Inc. All rights reserved. */` (or use the `*`-block style matching the file).
   - `<YEAR>` is the current year (a single year) for newly added lines, e.g. `2026`.
3. **New files you create** — include both AMD and NVIDIA headers with the current year, followed by a blank comment line and `See LICENSE for license information.`
4. **Never change the NVIDIA copyright year range** — those dates are updated during IFU (integrate-from-upstream) merges.

AMD headers are our addition and should stay consistent with the patterns already in the codebase.
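A sketch of rule 3 for a brand-new C++ file, assuming the current year is 2026; the exact NVIDIA line wording and the block-comment style are assumptions, so match them against existing files in the codebase.

```shell
# Sketch: print the header for a newly created C++ file (rule 3).
# Year and NVIDIA wording are assumptions; verify against existing files.
header=$(cat <<'EOF'
/*************************************************************************
 * Copyright (c) 2026, Advanced Micro Devices, Inc. All rights reserved.
 * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 *
 * See LICENSE for license information.
 ************************************************************************/
EOF
)
echo "$header"
```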
## Memory management
When writing or updating memories in the project memory directory, follow these guidelines:

- **Scope**: only save information that will be useful in future conversations. Do not save ephemeral task details, debugging breadcrumbs, or things derivable from the code/git history.
- **Check before writing**: read `MEMORY.md` and check for an existing memory on the same topic before creating a new file. Update the existing memory instead of duplicating.
  - *Collaborator:* Where is the `MEMORY.md`? *Author:* It is a user-local file that serves as an index to memory files left by Claude; it is not stored in the project.
- **File naming**: use short, descriptive, snake_case names (e.g. `aiter_build.md`, `container_setup.md`). Group by topic, not by date.
- **Frontmatter**: every memory file must have the standard `name`, `description`, and `type` frontmatter fields.
- **Index maintenance**: after creating or removing a memory file, update `MEMORY.md` to keep the index in sync. Each entry should be a single line under 150 characters.
- **Staleness**: memories are point-in-time observations. When recalling a memory, verify it against current code/state before acting on it. Update or delete memories that are no longer accurate.
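A sketch of a memory file following the naming and frontmatter rules above; the field values, body text, and destination path are all hypothetical:

```shell
# Sketch: write a memory file with the required frontmatter fields.
# Filename follows the snake_case guidance; contents are hypothetical.
memory=$(cat <<'EOF'
---
name: aiter_build
description: How the aiter submodule is built, plus common build pitfalls
type: memory
---

Point-in-time notes about the aiter build go here.
EOF
)
printf '%s\n' "$memory" > /tmp/aiter_build.md   # hypothetical destination
```

After writing it, add a one-line entry (under 150 characters) to `MEMORY.md`.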
## Troubleshooting pointers
- **Missing `.so` on import**: check path resolution in `transformer_engine/common/__init__.py`.
- **Framework extension won't build on ROCm**: check `build_tools/utils.py::get_frameworks()`.
- **Fused-attn regression**: reproduce under multiple backend configs (`auto`, `ck`, `aotriton`, `unfused`).
- **CK/AITER kernel failures**: use the `ck-debugging` skill for structured triage and isolation.
*Collaborator comment:* If a file contains only HIP code (no CUDA) and does not include headers containing CUDA code, it can be excluded from hipification in two ways: either explicitly add it to the ignores list in `do_hipify()` in `build_tools/hipify/hipify.py`, which is useful for subdirectories containing HIP-only code, or rely on hipify to detect that no modification is needed. In the latter case the file should contain `#include "hip/hip_runtime.h"`, real or commented out if the header is not actually needed.