
Conversation

@HobbitQia (Collaborator):

This pass implements FU-level fusion after DFG-level fusion (PR 194); it aims to find the minimum-cost set of FUs that covers all patterns extracted in previous passes (wrapped in `fused_op`).

The algorithm proceeds as follows:

  1. Pattern Extraction: Extracts fused operation patterns from the module and linearizes them via topological sort.
  2. Standalone Operation Extraction: Collects standalone operations not inside fused patterns for hardware coverage.
  3. Template Creation: Greedily merges patterns into shared hardware templates using cost-based accommodation with DFS mapping search.
  4. Connection Generation: Generates optimized slot connections based on pattern dependencies with bypass support.
  5. Execution Plan Generation: Creates parallel execution stages by grouping operations at the same topological level.
  6. JSON Output: Writes hardware configuration including templates, connections, and execution plans to JSON file.
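Steps 1 and 5 above can be sketched in a few lines of Python (an illustrative model, not the actual MLIR pass, which is implemented in C++): a pattern DAG is linearized by topological sort, and ops at the same topological depth are grouped into one parallel execution stage.

```python
# Sketch: compute topological levels of a pattern DAG and group ops at the
# same level into parallel execution stages (simplified model of steps 1 & 5).
from collections import defaultdict, deque

def topological_levels(num_ops, edges):
    """edges: list of (src, dst) dependencies. Returns ops grouped by level."""
    indegree = [0] * num_ops
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        indegree[dst] += 1
    level = [0] * num_ops
    queue = deque(op for op in range(num_ops) if indegree[op] == 0)
    while queue:
        op = queue.popleft()
        for nxt in succs[op]:
            # An op's level is one past its deepest predecessor.
            level[nxt] = max(level[nxt], level[op] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    stages = defaultdict(list)
    for op, lvl in enumerate(level):
        stages[lvl].append(op)
    return [stages[lvl] for lvl in sorted(stages)]

# icmp (op 0) feeding two grant_predicate ops (1, 2): both land in stage 1.
print(topological_levels(3, [(0, 1), (0, 2)]))  # [[0], [1, 2]]
```

Ops sharing a stage have no dependency between them, which is what allows the pass to schedule them on different slots in the same cycle.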

@HobbitQia HobbitQia requested a review from tancheng January 22, 2026 13:34
HardwarePattern(int64_t i, const std::string& n, int64_t f);
};

struct HardwareSlot {
Contributor:

Please define/comment the HW slot with an example. FUs in the same slot cannot be executed at the same time. Which cases are slots good for?

Collaborator (Author):

Added. You can check it.
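For illustration, the slot semantics discussed here can be modeled in a minimal Python sketch (a hypothetical helper, not part of the pass): FUs in the same slot share hardware and cannot fire in the same cycle, so a stage is only valid if each of its ops maps to a distinct slot.

```python
# Sketch of the slot constraint: ops mapped to the same slot conflict;
# ops mapped to different slots may execute in the same cycle.
def has_slot_conflict(stage_slots):
    """stage_slots: slot IDs used by the ops scheduled in one stage.
    The stage is valid only if every op uses a distinct slot."""
    return len(stage_slots) != len(set(stage_slots))

print(has_slot_conflict([1, 2]))  # False: different slots, parallel OK
print(has_slot_conflict([1, 1]))  # True: same slot, ops must serialize
```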


// Execution plan for a pattern on a hardware template.
struct PatternExecutionPlan {
int64_t patternId;
Contributor:

Please refactor all the variable naming: patternId -> pattern_id.

Collaborator (Author):

Updated.

// RUN: --fold-constant \
// RUN: --transform-ctrl-to-data-flow \
// RUN: --fold-constant \
// RUN: --iter-merge-pattern="min-support=3 max-iter=4" \
Contributor:

Please remind me: after merging, would the II be improved?

Collaborator (Author):

Yes! In test.mlir, rec_mii decreases from 9 to 8 and res_mii decreases from 5 to 3.

If we use the same mapping strategy (customize), the mapping II will decrease from 12 to 9.
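The ResMII trend can be reproduced with back-of-envelope arithmetic (a simplified model; the op and tile counts below are hypothetical, chosen only to illustrate the 5 -> 3 direction reported above): fusing several ops into one `fused_op` reduces the number of ops the mapper must place, which lowers ResMII = ceil(num_ops / num_tiles).

```python
# Back-of-envelope ResMII model. The counts are illustrative, not taken
# from test.mlir: fusion shrinks the schedulable op count, so the
# resource-constrained lower bound on II drops.
from math import ceil

def res_mii(num_ops, num_tiles):
    return ceil(num_ops / num_tiles)

print(res_mii(20, 4))  # before fusion: 20 standalone ops
print(res_mii(12, 4))  # after fusion: fused ops count once each
```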

Comment on lines +139 to +199
// CHECK-HARDWARE-MERGE: "template_id": 0,
// CHECK-HARDWARE-MERGE: "instance_count": 2,
// CHECK-HARDWARE-MERGE: "supported_single_ops": ["neura.gep", "neura.load", "neura.phi_start", "neura.store"],
// CHECK-HARDWARE-MERGE: "supported_composite_ops": [
// CHECK-HARDWARE-MERGE: {"pattern_id": 10, "name": "phi_start->fused_op:gep->load"},
// CHECK-HARDWARE-MERGE: {"pattern_id": 0, "name": "gep->load"}
// CHECK-HARDWARE-MERGE: ],
// CHECK-HARDWARE-MERGE: "slots": [
// CHECK-HARDWARE-MERGE: {"slot_id": 0, "supported_ops": ["neura.phi_start"]},
// CHECK-HARDWARE-MERGE: {"slot_id": 1, "supported_ops": ["neura.gep"]},
// CHECK-HARDWARE-MERGE: {"slot_id": 2, "supported_ops": ["neura.load"]}
// CHECK-HARDWARE-MERGE: ],
// CHECK-HARDWARE-MERGE: "slot_connections": {
// CHECK-HARDWARE-MERGE: "connections": [{"from": 0, "to": 1}, {"from": 1, "to": 2}]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: "pattern_execution_plans": [
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "pattern_id": 10,
// CHECK-HARDWARE-MERGE: "pattern_name": "phi_start->fused_op:gep->load",
// CHECK-HARDWARE-MERGE: "slot_mapping": [0, 1, 2],
// CHECK-HARDWARE-MERGE: "execution_stages": [
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "stage": 0,
// CHECK-HARDWARE-MERGE: "parallel_slots": [0],
// CHECK-HARDWARE-MERGE: "parallel_ops": ["neura.phi_start"]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "stage": 1,
// CHECK-HARDWARE-MERGE: "parallel_slots": [1],
// CHECK-HARDWARE-MERGE: "parallel_ops": ["neura.gep"]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "stage": 2,
// CHECK-HARDWARE-MERGE: "parallel_slots": [2],
// CHECK-HARDWARE-MERGE: "parallel_ops": ["neura.load"]
// CHECK-HARDWARE-MERGE: }
// CHECK-HARDWARE-MERGE: ]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "pattern_id": 0,
// CHECK-HARDWARE-MERGE: "pattern_name": "gep->load",
// CHECK-HARDWARE-MERGE: "slot_mapping": [1, 2],
// CHECK-HARDWARE-MERGE: "execution_stages": [
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "stage": 0,
// CHECK-HARDWARE-MERGE: "parallel_slots": [1],
// CHECK-HARDWARE-MERGE: "parallel_ops": ["neura.gep"]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "stage": 1,
// CHECK-HARDWARE-MERGE: "parallel_slots": [2],
// CHECK-HARDWARE-MERGE: "parallel_ops": ["neura.load"]
// CHECK-HARDWARE-MERGE: }
// CHECK-HARDWARE-MERGE: ]
// CHECK-HARDWARE-MERGE: }
// CHECK-HARDWARE-MERGE: ]
// CHECK-HARDWARE-MERGE: },
// CHECK-HARDWARE-MERGE: {
// CHECK-HARDWARE-MERGE: "template_id": 1,
// CHECK-HARDWARE-MERGE: "instance_count": 3,
// CHECK-HARDWARE-MERGE: "supported_single_ops": ["neura.grant_once", "neura.grant_predicate", "neura.icmp"],
Contributor:

Can we use this example to show what each field means (maybe draw it), and put it into the PR's description?

Collaborator (Author):

Field Explanations

| Field | Description |
| --- | --- |
| `template_id` | Unique identifier for this template |
| `instance_count` | Number of instances of this template |
| `supported_single_ops` | Individual operations this template can execute standalone |
| `supported_composite_ops` | Fused patterns this template can execute |
| `slots` | Array of slot definitions with their supported operations |
| `slot_connections` | Data routing paths between slots |
| `pattern_execution_plans` | Detailed execution schedule for each pattern |

Execution Plan Fields

| Field | Description |
| --- | --- |
| `pattern_id` | ID of the pattern being executed |
| `pattern_name` | Human-readable name of the pattern |
| `slot_mapping` | Maps operation index to slot ID: `[op0→slot0, op1→slot1, ...]`; e.g. `[1, 2]` means slot 1 and slot 2 execute op0 and op1 of this pattern |
| `execution_stages` | Ordered stages of execution |
| `parallel_slots` | Slots executing in this stage (can be multiple for parallel ops) |
| `parallel_ops` | Operations executing in this stage |

Note that `parallel_ops` corresponds element-wise to `parallel_slots`. For example:

Example

Template 1 (instance_count: 3)
══════════════════════════════════════════════════════════════════

Pipeline Structure:
┌────────────────┐      ┌─────────────────────┐      ┌─────────────────────┐
│    Slot 0      │ ──── │       Slot 1        │ ──── │       Slot 2        │
│     icmp       │      │  grant_predicate    │      │  grant_predicate    │
└────────────────┘      └─────────────────────┘      └─────────────────────┘

Supported Patterns:
  - Pattern 1: icmp->grant_predicate->grant_predicate (full pipeline)
  - Pattern 3: icmp->grant_predicate (bypass slot 1)
  - Pattern 2: grant_predicate->grant_predicate (bypass slot 0)

Pattern 1 (icmp->grant_predicate->grant_predicate) shows parallel execution:

Stage 0: Slot 0 executes icmp
Stage 1: Slot 1 AND Slot 2 execute grant_predicate IN PARALLEL
         (both depend only on icmp output)
{
  "pattern_id": 1,
  "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
  "slot_mapping": [0, 1, 2],
  "execution_stages": [
    {
      "stage": 0,
      "parallel_slots": [0],
      "parallel_ops": ["neura.icmp"]
    },
    {
      "stage": 1,
      "parallel_slots": [1, 2],
      "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]
    }
  ]
}

Note: In Stage 1, slots 1 and 2 execute simultaneously because both grant_predicate operations have the same topological level (both depend on icmp).
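A consumer of this JSON can check those invariants directly. Below is a sketch (with the example plan inlined; field names follow the output shown above, and `check_plan` is a hypothetical helper, not part of the pass):

```python
# Sketch: walk a pattern_execution_plans entry and verify the stage/slot
# invariants -- slots within a stage are distinct (same-slot ops cannot run
# in the same cycle), parallel_ops pairs one-to-one with parallel_slots, and
# every referenced slot comes from slot_mapping.
import json

plan = json.loads("""
{
  "pattern_id": 1,
  "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
  "slot_mapping": [0, 1, 2],
  "execution_stages": [
    {"stage": 0, "parallel_slots": [0], "parallel_ops": ["neura.icmp"]},
    {"stage": 1, "parallel_slots": [1, 2],
     "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]}
  ]
}
""")

def check_plan(plan):
    for stage in plan["execution_stages"]:
        slots, ops = stage["parallel_slots"], stage["parallel_ops"]
        assert len(slots) == len(set(slots)), "slot reused within a stage"
        assert len(slots) == len(ops), "slots/ops must pair one-to-one"
    used = {s for st in plan["execution_stages"] for s in st["parallel_slots"]}
    assert used <= set(plan["slot_mapping"]), "stage uses an unmapped slot"
    return True

print(check_plan(plan))  # True
```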
