14 changes: 7 additions & 7 deletions docs/codegen.md
@@ -1,9 +1,9 @@
Codegen for TL1 and TL2
------------------------

codegen_tl1.py and codegen_tl2.py are using params to generate kernel codes in different devices to achieve fastest performance for TL1 and TL2.
codegen_tl1.py and codegen_tl2.py use parameters to generate kernel code for different devices to achieve the fastest performance for TL1 and TL2.

We cutting weight into multiple compute blocks to best utilize hardware capabilities.
Weights are split into multiple compute blocks to best utilize hardware capabilities.

### Example
bitnet_b1_58-large:
@@ -31,19 +31,19 @@ python utils/codegen_tl2.py --model bitnet_b1_58-large --BM 256,128,256 --BK 96,

For TL1, we cut weight into M / BM weights, each weight shape is (BM, K). Then we cut weight into K / BK weights, each weight shape is (BM, BK). As for (BM, BK) weight, we cut it the same way into (bm, compute_num / bm) compute blocks, and finish computing in it.

Thus, we need to make sure
Thus, we need to make sure
- M % BM == 0
- K % BK == 0
- BM % bm == 0
- bm choose in [32, 64]
- bm is chosen from [32, 64]

### TL2:
![TL2](../assets/tl2.png)

For TL2, things got a little more complicated. Due to TL2 needs BK % 6 == 0, we need to split K into threeK and twoK, in which compute in TL2 for (M, threeK), compute in TL1 for (M, two_K).
For TL2, things get a little more complicated. Because TL2 needs BK % 6 == 0, we split K into threeK and twoK, compute (M, threeK) in TL2, and compute (M, two_K) in TL1.

Thus, we needs to make sure
Thus, we need to make sure
- M % BM == 0
- K % BK % 32 == 0
- BM % bm == 0
- bm choose in \[32\]
- bm is chosen from \[32\]
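
The TL1 and TL2 constraint lists in this diff can be checked mechanically before running the codegen scripts. The following is a minimal sketch, not code from `codegen_tl1.py` or `codegen_tl2.py`: the function names `tl1_violations` and `tl2_violations` are hypothetical, the sizes in the usage note are illustrative rather than taken from a real model, and `K % BK % 32 == 0` is read literally as `(K % BK) % 32 == 0`, which is an assumption about the intended meaning.

```python
def tl1_violations(M, K, BM, BK, bm):
    """Return the TL1 constraints (as written in the doc) that these sizes violate."""
    problems = []
    if M % BM != 0:
        problems.append("M % BM == 0")
    if K % BK != 0:
        problems.append("K % BK == 0")
    if BM % bm != 0:
        problems.append("BM % bm == 0")
    if bm not in (32, 64):
        problems.append("bm in [32, 64]")
    return problems


def tl2_violations(M, K, BM, BK, bm):
    """Return the TL2 constraints (as written in the doc) that these sizes violate."""
    problems = []
    if M % BM != 0:
        problems.append("M % BM == 0")
    # Literal reading of "K % BK % 32 == 0": the remainder of K / BK
    # must itself be a multiple of 32 (assumption, see lead-in).
    if (K % BK) % 32 != 0:
        problems.append("K % BK % 32 == 0")
    if BM % bm != 0:
        problems.append("BM % bm == 0")
    if bm != 32:
        problems.append("bm in [32]")
    return problems
```

An empty list means every listed constraint holds. For example, `tl1_violations(512, 256, 256, 128, 32)` returns an empty list, while `tl2_violations(512, 256, 256, 96, 64)` reports only the `bm` constraint.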