Copilot AI commented Feb 5, 2026

Description

Updates OpenVINO Execution Provider operator support configuration for GroupQueryAttention:

  • Expands device support from GPU-only to both CPU and GPU (line 80)
  • Adds the operator to the no_dimension_supported_ list so that it is accepted without dimension restrictions (line 385)

Changes align with PR #625 implementation pattern for operator support declarations.
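
To illustrate the device-support change described above, here is a minimal, self-contained sketch. It is not the actual OpenVINO EP source: the `OpSupport` struct, the `IsOpSupportedOn` helper, and the `SupportedOpsBefore`/`SupportedOpsAfter` functions are hypothetical names invented for this example, and the real table layout in the ovep-develop branch may differ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator support entry (assumption, not the real OVEP type).
struct OpSupport {
  std::string op_type;               // operator name, e.g. "GroupQueryAttention"
  std::vector<std::string> devices;  // device names the operator is declared for
};

// Returns true if `op_type` is declared as supported on `device` in `table`.
bool IsOpSupportedOn(const std::vector<OpSupport>& table,
                     const std::string& op_type,
                     const std::string& device) {
  for (const auto& entry : table) {
    if (entry.op_type != op_type) continue;
    if (std::find(entry.devices.begin(), entry.devices.end(), device) !=
        entry.devices.end()) {
      return true;
    }
  }
  return false;
}

// Assumed state before the change: GroupQueryAttention declared for GPU only.
std::vector<OpSupport> SupportedOpsBefore() {
  return {OpSupport{"GroupQueryAttention", {"GPU"}}};
}

// Assumed state after the change: GroupQueryAttention declared for CPU and GPU.
std::vector<OpSupport> SupportedOpsAfter() {
  return {OpSupport{"GroupQueryAttention", {"CPU", "GPU"}}};
}
```

With this model, a lookup for GroupQueryAttention on CPU fails against the "before" table and succeeds against the "after" table, while GPU support is unchanged.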

Motivation and Context

GroupQueryAttention was previously restricted to GPU devices in the ovep-develop branch. This change enables CPU execution and removes dimension constraints, matching the support level established in the reference implementation.

Original prompt

Use PR 625 as an example to mark GroupQueryAttention as supported on the ovep-develop branch


Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update group query attention support on ovep branch" to "Enable CPU support for GroupQueryAttention in OpenVINO EP" on Feb 5, 2026
Copilot AI requested a review from MayureshV1 on February 5, 2026 22:49
@MayureshV1

We should not merge this PR until the direction for enabling GQA in OV (on CPU or NPU) is clear.

MayureshV1 changed the title from "Enable CPU support for GroupQueryAttention in OpenVINO EP" to "CVS-180796: Enable CPU support for GroupQueryAttention in OpenVINO EP" on Feb 12, 2026