From 93505a7dff3a180b848c188ab02f676f845ceb3f Mon Sep 17 00:00:00 2001
From: baby brr
Date: Mon, 11 May 2026 00:59:28 -0400
Subject: [PATCH] Fix codegen docs grammar

---
 docs/codegen.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/codegen.md b/docs/codegen.md
index ff21e3cfb..e632ee3b7 100644
--- a/docs/codegen.md
+++ b/docs/codegen.md
@@ -1,9 +1,9 @@
 Codegen for TL1 and TL2
 ------------------------
 
-codegen_tl1.py and codegen_tl2.py are using params to generate kernel codes in different devices to achieve fastest performance for TL1 and TL2.
+codegen_tl1.py and codegen_tl2.py use parameters to generate kernel code for different devices to achieve the fastest performance for TL1 and TL2.
 
-We cutting weight into multiple compute blocks to best utilize hardware capabilities.
+Weights are split into multiple compute blocks to best utilize hardware capabilities.
 
 ### Example
 bitnet_b1_58-large:
@@ -31,19 +31,19 @@ python utils/codegen_tl2.py --model bitnet_b1_58-large --BM 256,128,256 --BK 96,
 
 For TL1, we cut weight into M / BM weights, each weight shape is (BM, K). Then we cut weight into K / BK weights, each weight shape is (BM, BK). As for (BM, BK) weight, we cut it the same way into (bm, compute_num / bm) compute blocks, and finish computing in it.
 
-Thus, we need to make sure
+Thus, we need to make sure
 - M % BM == 0
 - K % BK == 0
 - BM % bm == 0
-- bm choose in [32, 64]
+- bm is chosen from [32, 64]
 
 ### TL2:
 ![TL2](../assets/tl2.png)
 
-For TL2, things got a little more complicated. Due to TL2 needs BK % 6 == 0, we need to split K into threeK and twoK, in which compute in TL2 for (M, threeK), compute in TL1 for (M, two_K).
+For TL2, things get a little more complicated. Because TL2 needs BK % 6 == 0, we split K into threeK and twoK, compute (M, threeK) in TL2, and compute (M, two_K) in TL1.
 
-Thus, we needs to make sure
+Thus, we need to make sure
 - M % BM == 0
 - K % BK % 32 == 0
 - BM % bm == 0
-- bm choose in \[32\]
\ No newline at end of file
+- bm is chosen from \[32\]
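
Reviewer note: the two constraint lists in the patched doc can be checked mechanically before running codegen. The following is a minimal sketch, not code from `codegen_tl1.py` or `codegen_tl2.py`. The helper names `check_tl1` and `check_tl2` are hypothetical, the shapes in the usage lines are illustrative values only, and `K % BK % 32 == 0` is read with Python's left-to-right evaluation, i.e. `(K % BK) % 32 == 0`, which is an assumption about the doc's intent.

```python
def check_tl1(M, K, BM, BK, bm):
    """Return the TL1 constraints from the doc that (M, K, BM, BK, bm) violates."""
    problems = []
    if M % BM != 0:
        problems.append("M % BM != 0")
    if K % BK != 0:
        problems.append("K % BK != 0")
    if BM % bm != 0:
        problems.append("BM % bm != 0")
    if bm not in (32, 64):
        problems.append("bm not in [32, 64]")
    return problems


def check_tl2(M, K, BM, BK, bm):
    """Return the TL2 constraints from the doc that (M, K, BM, BK, bm) violates."""
    problems = []
    if BK % 6 != 0:
        problems.append("BK % 6 != 0")
    if M % BM != 0:
        problems.append("M % BM != 0")
    # Assumption: the doc's "K % BK % 32 == 0" means (K % BK) % 32 == 0.
    if (K % BK) % 32 != 0:
        problems.append("K % BK % 32 != 0")
    if BM % bm != 0:
        problems.append("BM % bm != 0")
    if bm != 32:
        problems.append("bm not in [32]")
    return problems


# Illustrative shape (M=1536, K=4096); not taken from the patch.
print(check_tl1(1536, 4096, 256, 128, 64))
print(check_tl2(1536, 4096, 256, 96, 32))
print(check_tl1(1536, 4096, 256, 96, 32))
```

An empty list means every listed constraint holds. Returning the violated constraints rather than asserting makes it easier to report all problems for a parameter set at once.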