onert would like to run the Llama model.
Llama has an attention block, which is defined as LlamaAttention in modeling_llama.py from HF.
I would like to merge all the opcodes (including RoPE) from LlamaAttention into one opcode (attention).
(All I need is attention.)
(I am going to add an attention op to the circle schema.)
It looks similar to #90.
But it is different:
What would be the best way to do this?
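
For reference, here is a rough NumPy sketch of the computation that a single fused attention op would have to cover. It is only an illustration, not the HF code or a proposed onert kernel. The function names (`fused_attention`, `rope_tables`) are made up for this sketch, and it assumes a single sequence, no KV cache, the same number of key/value heads as query heads, the default RoPE base of 10000, and weights stored as (in, out) matrices.

```python
import numpy as np

def rotate_half(x):
    # Swap the two halves of the last axis and negate the second half.
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def rope_tables(seq_len, head_dim, base=10000.0):
    # cos/sin tables of shape (seq_len, head_dim); base is an assumed default.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    freqs = np.outer(np.arange(seq_len), inv_freq)
    emb = np.concatenate([freqs, freqs], axis=-1)
    return np.cos(emb), np.sin(emb)

def fused_attention(x, wq, wk, wv, wo, num_heads):
    # x: (seq_len, hidden); wq/wk/wv/wo: (hidden, hidden), applied as x @ w.
    seq_len, hidden = x.shape
    head_dim = hidden // num_heads

    def split(t):
        # (seq_len, hidden) -> (num_heads, seq_len, head_dim)
        return t.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)

    # RoPE on queries and keys.
    cos, sin = rope_tables(seq_len, head_dim)
    q = q * cos + rotate_half(q) * sin
    k = k * cos + rotate_half(k) * sin

    # Scaled dot-product scores with a causal mask.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Merge heads and apply the output projection.
    out = (probs @ v).transpose(1, 0, 2).reshape(seq_len, hidden)
    return out @ wo
```

Everything inside `fused_attention` (projections, RoPE, mask, softmax, output projection) is what I would like to see behind the one attention opcode.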