feat: GEPA integration for sensei skill + quality score CI workflow#1498

Merged
kvenkatrajan merged 7 commits into microsoft:main from spboyer:feat/gepa-sensei-integration
Apr 2, 2026

Conversation

@spboyer
Member

@spboyer spboyer commented Mar 25, 2026

Summary

Adds GEPA (Genetic-Pareto) evolutionary optimization to the sensei skill, plus a CI workflow that scores SKILL.md quality on every PR.

What this PR adds

| File | Purpose |
|------|---------|
| .github/skills/sensei/SKILL.md | Added --gepa flag, GEPA mode docs, Step 5-GEPA in Ralph loop |
| .github/skills/sensei/scripts/gepa/auto_evaluator.py | Auto-discovers test harness, builds GEPA evaluators, scores/optimizes skills |
| pipelines/gepa-quality-score.yml | PR quality gate: scores all skills, uploads results as artifact |
| pipelines/gepa-quality-score-comment.yml | workflow_run-triggered commenter: posts score results as PR comment |

How it works

  1. Auto-discovers each skill's test harness (triggers.test.ts, unit.test.ts) at runtime
  2. Builds an evaluator that scores on content quality + trigger accuracy
  3. Proposes improvements via LLM (GitHub Models), keeping only better versions
  4. Surfaces scoring feedback as ASI (Actionable Side Information) so the LLM knows why a candidate scored low — based on heuristic content checks and trigger accuracy (Jest test integration is planned for a future iteration)

Existing tests are NOT replaced or modified. GEPA wraps them as its fitness function.
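
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in auto_evaluator.py: the section list, the 50/50 weighting, and the word-overlap trigger check are all assumptions made for the example. Only the `callable(candidate, example) -> (score, asi_dict)` shape comes from the evaluator's docstring.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical criteria and weights, chosen only for illustration.
REQUIRED_SECTIONS = ["## Rules", "## Steps"]
QUALITY_WEIGHT = 0.5
MIN_SHARED_WORDS = 2


def _words(text: str) -> set:
    """Lowercased alphabetic tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_evaluator(should: List[str], should_not: List[str]) -> Callable:
    """Return callable(candidate, example) -> (score, asi_dict)."""

    def evaluate(candidate: str, example=None) -> Tuple[float, Dict[str, object]]:
        # Content quality: fraction of required sections present.
        missing = [s for s in REQUIRED_SECTIONS if s not in candidate]
        quality = 1.0 - len(missing) / len(REQUIRED_SECTIONS)

        # Trigger accuracy: a prompt "fires" if it shares enough words
        # with the candidate text (a crude stand-in for real matching).
        vocab = _words(candidate)

        def fires(prompt: str) -> bool:
            return len(vocab & _words(prompt)) >= MIN_SHARED_WORDS

        wrong = [p for p in should if not fires(p)]
        wrong += [p for p in should_not if fires(p)]
        total = len(should) + len(should_not)
        accuracy = 1.0 - len(wrong) / total if total else 1.0

        score = QUALITY_WEIGHT * quality + (1 - QUALITY_WEIGHT) * accuracy
        # The side information tells the proposer why the score is low.
        asi = {"missing_sections": missing, "misclassified_prompts": wrong}
        return score, asi

    return evaluate
```

Returning the reasons alongside the number is the point of the design: an optimizer that only sees a scalar has to guess which part of the candidate to change.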

Current baseline: 0/23 skills pass quality threshold

Skill                           Quality  Triggers  Tests
────────────────────────────────────────────────────────
✗ azure-compliance                 0.14      0.67    TIU
✗ azure-messaging                  0.14      1.00    TIU
✗ azure-ai                         0.16       N/A    TIU
✗ azure-storage                    0.16       N/A    -I-
✗ appinsights-instrumentation      0.24      1.00    TIU
✗ azure-aigateway                  0.25      0.92    TIU
✗ azure-rbac                       0.25      0.91    TIU
✗ azure-quotas                     0.34      0.96    TIU
✗ azure-kusto                      0.37       N/A    -I-
✗ azure-compute                    0.38       N/A    TIU
✗ azure-cost-optimization          0.38      0.89    TIU
✗ azure-resource-lookup            0.38      0.54    TIU
✗ azure-resource-visualizer        0.38      0.92    TIU
✗ entra-app-registration           0.38      1.00    TIU
✗ azure-hosted-copilot-sdk         0.39      0.94    TIU
✗ azure-cloud-migrate              0.45      1.00    TIU
✗ azure-diagnostics                0.50      0.90    TIU
✗ azure-prepare                    0.50      1.00    TIU
✗ microsoft-foundry                0.50      0.91    TIU
✗ azure-deploy                     0.62      1.00    TIU
✗ azure-upgrade                    0.62      1.00    TIU
✗ azure-validate                   0.62      1.00    TIU

  0/23 skills at quality >= 0.80

GEPA optimization results (sample run on 4 skills)

| Skill | Quality Before | Quality After | What GEPA added |
|-------|----------------|---------------|-----------------|
| azure-storage | 0.16 | 1.00 | Triggers, Rules, Steps, USE FOR, WHEN, DO NOT USE FOR (all were missing) |
| entra-app-registration | 0.38 | 1.00 | Triggers, Rules, USE FOR, WHEN, DO NOT USE FOR |
| microsoft-foundry | 0.50 | 1.00 | Triggers, Rules, WHEN, DO NOT USE FOR |
| azure-deploy | 0.62 | 1.00 | USE FOR, WHEN, DO NOT USE FOR |

Before / After example: azure-storage

BEFORE (quality: 0.16) — flat reference doc, no agent routing signals
# Azure Storage Services

## Services

| Service | Use When | MCP Tools | CLI |
|---------|----------|-----------|-----|
| Blob Storage | Objects, files, backups, static content | azure__storage | az storage blob |
| File Shares | SMB file shares, lift-and-shift | - | az storage file |

## MCP Server (Preferred)
## CLI Fallback
## Storage Account Tiers

Missing: ## Triggers, ## Rules, ## Steps, USE FOR, WHEN, DO NOT USE FOR

AFTER (quality: 1.00) — structured agent instructions with routing
# Azure Storage Services Skill

Azure Storage Services skill facilitates efficient data management,
storage, and access operations in Azure environments.

## Triggers
- WHEN a user inquires about storing, managing, or retrieving data in Azure Storage.
- WHEN a user mentions blob storage, file shares, queue storage, table storage, data lake.

## Rules
- USE THIS SKILL WHEN:
  - Query involves Azure Storage services (blobs, queues, tables, files, data lakes)
- DO NOT USE FOR:
  - General Azure resource provisioning (use azure-prepare)
  - Databases (SQL, Cosmos DB, MySQL)

## Steps
1. Analyze the user query to identify Azure Storage service and intent
2. Verify the required functionality (create, manage, retrieve)
3. Use Azure MCP commands if enabled
4. Fallback to Azure CLI if MCP unavailable
5. Provide best practices (tiers, redundancy, SDK usage)
6. Link relevant documentation

Usage

# Score all skills (instant, no LLM calls)
python .github/skills/sensei/scripts/gepa/auto_evaluator.py score-all \
  --skills-dir plugin/skills --tests-dir tests

# Optimize a specific skill
python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize \
  --skill azure-storage --skills-dir plugin/skills --tests-dir tests

# Via sensei
Run sensei on azure-storage --gepa

Copilot AI review requested due to automatic review settings March 25, 2026 17:05
Contributor

Copilot AI left a comment

Pull request overview

Adds GEPA-based evaluation/optimization tooling for skills and introduces a PR-time workflow that scores SKILL.md quality and reports results.

Changes:

  • Added a GitHub Actions workflow to score skills on PRs / manual runs and comment results.
  • Introduced a Python auto-evaluator that discovers existing TS test harnesses and computes quality/trigger scores (and can run GEPA optimization).
  • Updated sensei skill documentation to describe --gepa mode and a GEPA step in the Ralph loop.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| pipelines/gepa-quality-score.yml | New CI workflow to compute GEPA quality scores, upload JSON results, and comment on PRs. |
| .github/skills/sensei/scripts/gepa/auto_evaluator.py | New GEPA auto-evaluator CLI: harness discovery, scoring, and optimization entrypoint. |
| .github/skills/sensei/SKILL.md | Docs: adds --gepa usage and describes the GEPA optimization step. |

Comment thread pipelines/gepa-quality-score.yml
Comment thread pipelines/gepa-quality-score.yml Outdated
Comment thread pipelines/gepa-quality-score.yml Outdated
Comment thread pipelines/gepa-quality-score.yml Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Fixed
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Fixed
Copilot AI review requested due to automatic review settings March 25, 2026 17:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

.github/skills/sensei/scripts/gepa/auto_evaluator.py:347

  • trigger_prompt_count currently counts only should_trigger prompts and ignores should_not_trigger. That makes the field name misleading and can confuse downstream consumers of the JSON output. Either include both arrays in the count or rename the field to reflect what it measures.
        "has_integration_test": harness["has_integration"],
        "has_unit_test": harness["has_unit"],
        "trigger_prompt_count": len(harness["trigger_prompts"]["should_trigger"]),
    }
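
One way to make the field match its name is to count both arrays. The sketch below assumes the harness dictionary carries a `should_not_trigger` key next to `should_trigger`, as the comment implies; the helper name is hypothetical.

```python
from typing import Dict, List


def trigger_prompt_count(trigger_prompts: Dict[str, List[str]]) -> int:
    """Count prompts across both arrays; a missing key counts as empty."""
    keys = ("should_trigger", "should_not_trigger")
    return sum(len(trigger_prompts.get(key, [])) for key in keys)
```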

.github/skills/sensei/scripts/gepa/auto_evaluator.py:247

  • There are unused parameters that add noise and make the CLI harder to maintain: score_skill(..., as_json=...) never uses as_json, and build_evaluator(..., fast=...) never uses fast. Remove these parameters or wire them up (e.g., use fast to disable slower checks) to avoid misleading callers.
def build_evaluator(skill_name: str, tests_dir: Path, fast: bool = True):
    """Auto-build a GEPA evaluator for a skill from its test harness.

pipelines/gepa-quality-score.yml:7

  • The workflow trigger paths: doesn’t include the GEPA scoring implementation under .github/skills/sensei/scripts/gepa/**. Changes to the evaluator logic won’t re-run this quality-score workflow, which can lead to PRs merging with an unvalidated scoring change. Consider adding that path (and any other inputs like requirements files) to the trigger list.
  pull_request:
    paths:
      - 'plugin/skills/**/SKILL.md'
      - 'tests/**'

pipelines/gepa-quality-score.yml:30

  • Most workflows in this repo use actions/checkout@v6 (e.g., .github/workflows/pr.yml). This new workflow uses actions/checkout@v4, which is inconsistent and may behave differently than expected in this repo’s CI environment. Align the checkout action version with the rest of the workflows.
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:

pipelines/gepa-quality-score.yml:35

  • pip install gepa installs the latest GEPA release, which makes CI runs non-reproducible and can cause unexpected breakages when GEPA publishes changes. Pin the dependency to a known-good version (or install from a locked requirements file) so the scoring behavior is stable across PRs.
      - name: Install GEPA evaluator deps
        run: pip install gepa

Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py
Comment thread .github/skills/sensei/SKILL.md Outdated
@JasonYeMSFT
Member

In the before/after example of azure-storage, why do we need to describe when to use the skill after the skill description? The agent only presents the skill's description to the LLM when deciding whether to load the skill.

@spboyer
Member Author

spboyer commented Mar 25, 2026

Good point @JasonYeMSFT — the before/after example in the PR description shows what GEPA generates for the SKILL.md body, not the frontmatter description. You're right that routing decisions only use the frontmatter description field. The body ## Triggers / WHEN: sections are read after the skill is loaded to guide the LLM during execution, but they're redundant for routing.

I'll tune the GEPA optimization objective to focus body content on execution instructions (Steps, Rules, MCP Tools) rather than duplicating routing signals that belong in the frontmatter description. The AFTER example will be updated once that's done.

Copilot AI review requested due to automatic review settings March 25, 2026 20:46
@spboyer spboyer force-pushed the feat/gepa-sensei-integration branch from aa042e7 to 0cdeb3a Compare March 25, 2026 20:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

pipelines/gepa-quality-score.yml:103

  • This workflow posts PR comments from the PR’s workflow run (pull-requests: write). Elsewhere in this repo, commenting is intentionally done via a separate workflow triggered by workflow_run so the commenting code always executes from main (see note in .github/workflows/pr.yml). To match that security model, consider emitting an artifact here and moving the PR-commenting step into the existing pr-comment.yml (or a new workflow_run-based commenter) instead of running github-script directly in the PR context.
      - name: Add PR comment with scores
        if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('score-results.json', 'utf8'));

pipelines/gepa-quality-score.yml:32

  • This workflow uses floating action tags (e.g., actions/checkout@v4, actions/setup-python@v5). In this repo’s existing GitHub Actions workflows, actions are pinned to full commit SHAs for supply-chain integrity (see .github/workflows/pr.yml). When moving this workflow under .github/workflows/, please pin each uses: to a commit SHA (and keep the version comment).
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

.github/skills/sensei/scripts/gepa/auto_evaluator.py:49

  • The header says this keyword matching “mirrors trigger-matcher.ts”, but the implementation diverges (stop-word filtering, stemming, per-word matching, extra keywords). Since trigger accuracy in this evaluator is compared against prompts coming from triggers.test.ts (which uses tests/utils/trigger-matcher.ts), this mismatch can produce misleading trigger_accuracy/fitness. Either reimplement the exact TriggerMatcher logic here (substring includes + same keyword set) or rename/reword to clarify it’s only an approximation and not comparable to the Jest trigger tests.
# ── Keyword matching (mirrors trigger-matcher.ts) ──────────────────────────

AZURE_KEYWORDS = [
    "azure", "storage", "cosmos", "sql", "redis", "keyvault", "key vault",
    "function", "app service", "container", "aks", "kubernetes", "bicep",
    "terraform", "deploy", "monitor", "diagnostic", "security", "rbac",
    "identity", "entra", "authentication", "cli", "mcp", "validation",
    "networking", "observability", "foundry", "agent", "model",
]

STOP_WORDS = {
    "the", "and", "for", "with", "this", "that", "from", "have", "has",
    "are", "was", "were", "been", "being", "will", "would", "could",
    "should", "may", "might", "can", "shall", "not", "use", "when",
    "what", "how", "why", "who", "which", "where", "does", "don",
    "your", "its", "our", "their", "these", "those", "some", "any",
    "all", "each", "every", "both", "such", "than", "also", "only",
}
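
For comparison, the "substring includes" behaviour the comment attributes to TriggerMatcher could look like the sketch below. It is written from that one-line description only, not from tests/utils/trigger-matcher.ts; the function names and the `min_hits` threshold are assumptions.

```python
from typing import Iterable, List


def matched_keywords(prompt: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in the prompt as case-insensitive substrings."""
    lowered = prompt.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def should_trigger(prompt: str, keywords: Iterable[str], min_hits: int = 2) -> bool:
    # min_hits is a hypothetical threshold, not taken from the TypeScript matcher.
    return len(matched_keywords(prompt, keywords)) >= min_hits
```

Plain substring matching has no stop-word filtering or stemming, so a short keyword such as "cli" also matches inside "click". Any Python port that filters or stems will therefore classify some prompts differently from the Jest tests.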

.github/skills/sensei/scripts/gepa/auto_evaluator.py:255

  • build_evaluator(..., fast: bool = True) takes a fast flag but it’s never used. This makes it unclear whether “fast vs full” evaluation is supported. Either remove the parameter or implement the intended behavior (e.g., toggling whether to run Jest tests vs. heuristic-only scoring).
def build_evaluator(skill_name: str, tests_dir: Path, fast: bool = True):
    """Auto-build a GEPA evaluator for a skill from its test harness.

    Returns a callable(candidate, example) -> (score, asi_dict).
    """
    harness = discover_test_harness(tests_dir, skill_name)

Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/SKILL.md
spboyer added a commit to spboyer/sensei that referenced this pull request Mar 25, 2026
Feedback from microsoft/GitHub-Copilot-for-Azure#1498:

- Focus body on execution (Rules, Steps, MCP Tools) not routing
  signals — routing belongs in frontmatter description (JasonYeMSFT)
- Strip comments before extracting trigger arrays to avoid
  commented-out prompts polluting test data (Copilot reviewer)
- Clarify --gepa replaces Step 5 only, not Steps 5-6 (Copilot reviewer)
- Remove unused imports: dataclass, field (CodeQL)
- Remove unused params: as_json in score_skill, fast in build_evaluator
- Fix trigger_prompt_count to include both should/should_not arrays
- Update optimizer objective to distinguish routing vs execution content

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the feat/gepa-sensei-integration branch from 133105d to 0b3945a Compare March 26, 2026 04:23
Copilot AI review requested due to automatic review settings March 26, 2026 04:23
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread pipelines/gepa-quality-score.yml Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
Comment thread .github/skills/sensei/SKILL.md Outdated
Comment thread pipelines/gepa-quality-score.yml Outdated
Comment thread pipelines/gepa-quality-score.yml Outdated
spboyer and others added 6 commits March 26, 2026 10:48

feat: add GEPA integration to sensei skill + quality score workflow

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop for automated SKILL.md improvement.

Changes:
- .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs,
  Step 5-GEPA in the Ralph loop
- .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers
  test harness at runtime, builds GEPA evaluators, scores/optimizes skills
- pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md
  quality and posts results as PR comment

The auto-evaluator requires zero manual configuration. It reads
triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays
and builds a composite evaluator (content quality + trigger accuracy).

Existing tests are NOT replaced or modified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address PR review feedback for GEPA integration

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI)
- Remove unused imports: sys, dataclass, field (fixes CodeQL warnings)
- Extract strip_frontmatter() helper to replace fragile content.index()
  parsing that could raise ValueError on malformed frontmatter
- Deduplicate frontmatter stripping logic between score_skill/optimize_skill
- Add explicit permissions block (contents: read, pull-requests: write)
- Use sticky comment pattern (<!-- gepa-quality-score --> marker) to avoid
  PR comment spam on re-runs
- Fix display results to match workflow_dispatch single-skill input
- Rename quality gate step to '(advisory)' to clarify non-blocking behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
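
A frontmatter helper that cannot raise on malformed input might look like this sketch. The name `strip_frontmatter` comes from the commit message; the body is an assumption about one reasonable implementation, not the code that was merged.

```python
def strip_frontmatter(content: str) -> str:
    """Return the markdown body, dropping a leading '---' delimited block if present."""
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        # No frontmatter at all: the whole file is the body.
        return content
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[i + 1:])
    # Opening delimiter without a closing one: treat as malformed and
    # return the input unchanged instead of raising.
    return content
```

Scanning line by line avoids the `content.index()` failure mode, where a missing closing delimiter surfaces as a ValueError in the middle of scoring.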

fix: skip PR comment step for forked PRs

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause
the comment step to fail. Only post comments when the PR originates
from the same repository.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: strip comments in trigger parsing + clarify GEPA step scope

- Strip single-line (//) and multi-line (/* */) comments from trigger
  test arrays before extracting strings, preventing commented-out
  example prompts from polluting trigger accuracy scoring
- Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE
  FRONTMATTER), not step 6 (IMPROVE TESTS)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
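
The comment-stripping step can be illustrated with a small regex-based sketch. It is an approximation under stated assumptions: the array name `shouldTriggerPrompts` and the function name are hypothetical, and the naive patterns would misread a prompt that itself contains `//` or `]`.

```python
import re
from typing import List

# Block comments first, then line comments up to the end of the line.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# A single-, double-, or backtick-quoted string, allowing escaped characters.
STRING_RE = re.compile(r"""(['"`])((?:\\.|(?!\1).)*)\1""", re.DOTALL)


def extract_prompts(ts_source: str, array_name: str) -> List[str]:
    """Extract string literals from `array_name = [...]`, ignoring comments."""
    without_comments = COMMENT_RE.sub("", ts_source)
    pattern = rf"{re.escape(array_name)}[^=]*=\s*\[(.*?)\]"
    match = re.search(pattern, without_comments, re.DOTALL)
    if not match:
        return []
    return [m.group(2) for m in STRING_RE.finditer(match.group(1))]
```

Stripping comments before extracting strings matters because a disabled prompt is still a quoted string. Without that first pass it would be counted as live test data.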

fix: correct docstring and SKILL.md to reflect actual evaluator behavior

The evaluator parses trigger prompt arrays and uses content heuristics
for scoring — it does not execute Jest tests or incorporate test
pass/fail results. Updated docs to accurately describe this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address round 3 review feedback

- Remove unused params: as_json from score_skill, fast from build_evaluator
- Pin all actions to commit SHAs matching repo convention (checkout v6,
  setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0)
- Pin gepa dependency to v0.7.0 for reproducible CI
- Remove DO NOT USE FOR from scoring criteria (conflicts with repo
  guidance that discourages it due to keyword contamination risk)
- Add quality_score_raw field for full-precision threshold comparisons
- Enhance parse_trigger_arrays to resolve ...varName spread patterns
  by extracting strings from referenced arrays in the same file
- Clarify SKILL.md step 5b: GEPA uses trigger definitions as config,
  does not execute Jest tests
- Add NOTE about future workflow_run commenting pattern migration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 17:48
@spboyer spboyer force-pushed the feat/gepa-sensei-integration branch from 25b60cd to 8061237 Compare March 26, 2026 17:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (4)

pipelines/gepa-quality-score.yml:79

  • The quality-gate script compares against r.get('quality_score', 0), but auto_evaluator.py rounds quality_score to 2 decimals and also emits an unrounded quality_score_raw. Using the rounded value can incorrectly pass/fail near the threshold (e.g., 0.799 rounds to 0.80). Prefer comparing quality_score_raw (or avoid rounding in the JSON written to score-results.json).
          MIN_SCORE="${{ github.event.inputs.min_score || '0.5' }}"
          python -c "
          import json, sys
          with open('score-results.json') as f:
              results = json.load(f)
          if not isinstance(results, list):
              results = [results]

          failed = []
          for r in results:
              if 'error' in r:
                  continue
              if r.get('quality_score', 0) < float('${MIN_SCORE}'):
                  failed.append(f\"{r['skill']}: {r['quality_score']:.2f} (need >= ${MIN_SCORE})\")
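
The rounding problem is easy to reproduce. In the sketch below the field names `quality_score` and `quality_score_raw` come from the review comment; the helper name and the sample record are made up for illustration.

```python
def below_threshold(result: dict, min_score: float) -> bool:
    """Compare using the unrounded score when present, else the rounded one."""
    score = result.get("quality_score_raw", result.get("quality_score", 0.0))
    return score < min_score


# A hypothetical record whose raw score sits just under a 0.80 threshold.
result = {
    "skill": "example-skill",
    "quality_score": round(0.799, 2),  # rounds up to 0.8
    "quality_score_raw": 0.799,
}
```

With this record, a gate that reads only `quality_score` sees 0.8 and passes the skill, while the raw value correctly reports it as below 0.80.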

pipelines/gepa-quality-score.yml:89

  • MIN_SCORE is interpolated directly into an inline python -c string. For workflow_dispatch, min_score is user-provided input, so this pattern can lead to accidental quoting issues or code injection if the input contains unexpected characters. Safer approach: pass MIN_SCORE via environment (or argv) and parse it inside Python without string interpolation.
          MIN_SCORE="${{ github.event.inputs.min_score || '0.5' }}"
          python -c "
          import json, sys
          with open('score-results.json') as f:
              results = json.load(f)
          if not isinstance(results, list):
              results = [results]

          failed = []
          for r in results:
              if 'error' in r:
                  continue
              if r.get('quality_score', 0) < float('${MIN_SCORE}'):
                  failed.append(f\"{r['skill']}: {r['quality_score']:.2f} (need >= ${MIN_SCORE})\")

          if failed:
              print('⚠️ Skills below quality threshold (advisory — not blocking):')
              for f in failed:
                  print(f'  {f}')
              print()
              print('💡 Run: python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize --skill <name> --skills-dir plugin/skills --tests-dir tests')
          else:
              print('✅ All skills meet quality threshold')
          "

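A safer shape, sketched under assumptions, is to export the input as an environment variable in the step's `env:` block and parse it inside Python. The variable name `MIN_SCORE` and the default of 0.5 mirror the snippet above; the function name and the 0 to 1 range check are additions for illustration.

```python
import os


def read_min_score(default: float = 0.5) -> float:
    """Read MIN_SCORE from the environment and validate it as a number in [0, 1]."""
    raw = os.environ.get("MIN_SCORE", "").strip()
    if not raw:
        return default
    value = float(raw)  # raises ValueError on anything that is not a number
    if not 0.0 <= value <= 1.0:
        # The range is an assumed constraint; it also rejects NaN.
        raise ValueError(f"MIN_SCORE must be between 0 and 1, got {raw!r}")
    return value
```

Because the value never becomes part of the Python source text, unexpected characters in the input can at worst raise a ValueError rather than change what the script executes.
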
pipelines/gepa-quality-score.yml:103

  • This workflow posts PR comments from the pull_request workflow itself. The repo already uses a workflow_run-based commenter (.github/workflows/pr-comment.yml) specifically so commenting code always runs from main and can’t be modified by a PR. To match that security posture, consider uploading score-results.json (and/or a rendered markdown report) as an artifact here and adding a separate workflow_run commenter on main to download and post/update the sticky comment.
      # NOTE: Ideally PR commenting should use a workflow_run-based pattern
      # (score workflow uploads artifact, separate commenter on main downloads
      # and posts) for better security. See .github/workflows/pr-comment.yml.
      - name: Add PR comment with scores
        if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
        with:

pipelines/gepa-quality-score.yml:34

  • This workflow only runs score/score-all, and those code paths don’t import gepa (the gepa.optimize_anything import is only reached in optimize). Installing gepa==0.7.0 here adds time and an external dependency for a job that, as written, can score without it. Either drop this install for the scoring workflow, or make the scorer depend on GEPA explicitly so the dependency is justified.
      - name: Install GEPA evaluator deps
        run: pip install "gepa==0.7.0"

Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py
Comment thread .github/skills/sensei/scripts/gepa/auto_evaluator.py Outdated
- Split gepa-quality-score.yml into read-only scoring workflow +
  workflow_run-triggered commenter (gepa-quality-score-comment.yml),
  matching the repo's existing pr.yml / pr-comment.yml pattern
- Fix API key regex to also match 'api key:' with whitespace separator
- Update PR description to clarify ASI uses heuristic scoring
  (Jest integration is planned for future iteration)
- Remove pull-requests:write from scoring workflow permissions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
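
The regex fix is described but not shown in this conversation. As a purely hypothetical sketch, a pattern that accepts a space as well as `_` or `-` (or nothing) between "api" and "key" could look like this; the names `API_KEY_LABEL_RE` and `contains_api_key_label` are invented here.

```python
import re

# Matches labels such as "api_key=", "apikey:", "API-KEY =", and "api key:".
API_KEY_LABEL_RE = re.compile(r"api[\s_-]?key\s*[:=]", re.IGNORECASE)


def contains_api_key_label(text: str) -> bool:
    """Return True if the text contains something that looks like an API key label."""
    return API_KEY_LABEL_RE.search(text) is not None
```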
@spboyer
Member Author

spboyer commented Apr 1, 2026

@-

@kvenkatrajan kvenkatrajan merged commit dc149f6 into microsoft:main Apr 2, 2026
9 checks passed
tmeschter pushed a commit to tmeschter/GitHub-Copilot-for-Azure that referenced this pull request Apr 2, 2026
…icrosoft#1498)

* feat: add GEPA integration to sensei skill + quality score workflow

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop for automated SKILL.md improvement.

Changes:
- .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs,
  Step 5-GEPA in the Ralph loop
- .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers
  test harness at runtime, builds GEPA evaluators, scores/optimizes skills
- pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md
  quality and posts results as PR comment

The auto-evaluator requires zero manual configuration. It reads
triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays
and builds a composite evaluator (content quality + trigger accuracy).

Existing tests are NOT replaced or modified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback for GEPA integration

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI)
- Remove unused imports: sys, dataclass, field (fixes CodeQL warnings)
- Extract strip_frontmatter() helper to replace fragile content.index()
  parsing that could raise ValueError on malformed frontmatter
- Deduplicate frontmatter stripping logic between score_skill/optimize_skill
- Add explicit permissions block (contents: read, pull-requests: write)
- Use sticky comment pattern (<!-- gepa-quality-score --> marker) to avoid
  PR comment spam on re-runs
- Fix display results to match workflow_dispatch single-skill input
- Rename quality gate step to '(advisory)' to clarify non-blocking behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: skip PR comment step for forked PRs

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause
the comment step to fail. Only post comments when the PR originates
from the same repository.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: strip comments in trigger parsing + clarify GEPA step scope

- Strip single-line (//) and multi-line (/* */) comments from trigger
  test arrays before extracting strings, preventing commented-out
  example prompts from polluting trigger accuracy scoring
- Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE
  FRONTMATTER), not step 6 (IMPROVE TESTS)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct docstring and SKILL.md to reflect actual evaluator behavior

The evaluator parses trigger prompt arrays and uses content heuristics
for scoring — it does not execute Jest tests or incorporate test
pass/fail results. Updated docs to accurately describe this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address round 3 review feedback

- Remove unused params: as_json from score_skill, fast from build_evaluator
- Pin all actions to commit SHAs matching repo convention (checkout v6,
  setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0)
- Pin gepa dependency to v0.7.0 for reproducible CI
- Remove DO NOT USE FOR from scoring criteria (conflicts with repo
  guidance that discourages it due to keyword contamination risk)
- Add quality_score_raw field for full-precision threshold comparisons
- Enhance parse_trigger_arrays to resolve ...varName spread patterns
  by extracting strings from referenced arrays in the same file
- Clarify SKILL.md step 5b: GEPA uses trigger definitions as config,
  does not execute Jest tests
- Add NOTE about future workflow_run commenting pattern migration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback — split workflow, fix regex, update docs

- Split gepa-quality-score.yml into read-only scoring workflow +
  workflow_run-triggered commenter (gepa-quality-score-comment.yml),
  matching the repo's existing pr.yml / pr-comment.yml pattern
- Fix API key regex to also match 'api key:' with whitespace separator
- Update PR description to clarify ASI uses heuristic scoring
  (Jest integration is planned for future iteration)
- Remove pull-requests:write from scoring workflow permissions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tmeschter pushed a commit to tmeschter/GitHub-Copilot-for-Azure that referenced this pull request Apr 2, 2026
…icrosoft#1498)

* feat: add GEPA integration to sensei skill + quality score workflow

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop for automated SKILL.md improvement.

Changes:
- .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs,
  Step 5-GEPA in the Ralph loop
- .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers
  test harness at runtime, builds GEPA evaluators, scores/optimizes skills
- pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md
  quality and posts results as PR comment

The auto-evaluator requires zero manual configuration. It reads
triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays
and builds a composite evaluator (content quality + trigger accuracy).

Existing tests are NOT replaced or modified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback for GEPA integration

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI)
- Remove unused imports: sys, dataclass, field (fixes CodeQL warnings)
- Extract strip_frontmatter() helper to replace fragile content.index()
  parsing that could raise ValueError on malformed frontmatter
- Deduplicate frontmatter stripping logic between score_skill/optimize_skill
- Add explicit permissions block (contents: read, pull-requests: write)
- Use sticky comment pattern (<!-- gepa-quality-score --> marker) to avoid
  PR comment spam on re-runs
- Fix display results to match workflow_dispatch single-skill input
- Rename quality gate step to '(advisory)' to clarify non-blocking behavior
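
For context on the strip_frontmatter() item above: `str.index()` raises `ValueError` when the closing `---` is missing, so a line-based scan is one way to fail softly. The sketch below is an assumption about how such a helper might behave, not the code from the PR.

```python
def strip_frontmatter(content: str) -> str:
    """Return the Markdown body without a leading YAML frontmatter block.

    Content with no frontmatter, or with an unterminated block, is
    returned unchanged instead of raising.
    """
    lines = content.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return content
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[i + 1:])
    # Opening delimiter without a matching close: treat as malformed.
    return content
```

Sharing one helper between scoring and optimization would also keep the two code paths from drifting apart.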

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: skip PR comment step for forked PRs

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause
the comment step to fail. Only post comments when the PR originates
from the same repository.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: strip comments in trigger parsing + clarify GEPA step scope

- Strip single-line (//) and multi-line (/* */) comments from trigger
  test arrays before extracting strings, preventing commented-out
  example prompts from polluting trigger accuracy scoring
- Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE
  FRONTMATTER), not step 6 (IMPROVE TESTS)
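
The comment-stripping step could be sketched roughly as follows. The function names and the regex-based approach are assumptions for illustration, and the naive `//` pattern would also cut a prompt that contains a URL, so a real parser may need to be more careful.

```python
import re
from typing import List

# Matches a single-, double-, or backtick-quoted string literal.
_STRING = re.compile(r"""(['"`])((?:\\.|(?!\1).)*)\1""")


def strip_ts_comments(source: str) -> str:
    """Remove /* ... */ and // ... comments (naive: ignores string context)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def extract_prompts(source: str, array_name: str) -> List[str]:
    """Return the string literals of the named array, skipping commented-out ones."""
    cleaned = strip_ts_comments(source)
    pattern = r"\b" + re.escape(array_name) + r"\b[^=;]*=\s*\[(.*?)\]"
    match = re.search(pattern, cleaned, flags=re.DOTALL)
    if match is None:
        return []
    return [m.group(2) for m in _STRING.finditer(match.group(1))]
```

Stripping comments before matching means a disabled prompt never reaches the scoring step.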

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct docstring and SKILL.md to reflect actual evaluator behavior

The evaluator parses trigger prompt arrays and uses content heuristics
for scoring — it does not execute Jest tests or incorporate test
pass/fail results. Updated docs to accurately describe this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address round 3 review feedback

- Remove unused params: as_json from score_skill, fast from build_evaluator
- Pin all actions to commit SHAs matching repo convention (checkout v6,
  setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0)
- Pin gepa dependency to v0.7.0 for reproducible CI
- Remove DO NOT USE FOR from scoring criteria (conflicts with repo
  guidance that discourages it due to keyword contamination risk)
- Add quality_score_raw field for full-precision threshold comparisons
- Enhance parse_trigger_arrays to resolve ...varName spread patterns
  by extracting strings from referenced arrays in the same file
- Clarify SKILL.md step 5b: GEPA uses trigger definitions as config,
  does not execute Jest tests
- Add NOTE about future workflow_run commenting pattern migration
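
One way to resolve `...varName` spreads within a single file is to look up the referenced array and inline its strings. This is a minimal sketch under that assumption; `resolve_array` is a hypothetical name, and the regexes assume simple `const name = [ ... ];` declarations terminated by semicolons.

```python
import re
from typing import FrozenSet, List

# Either a spread reference (...name) or a quoted string literal.
_ITEM = re.compile(r"""\.\.\.(\w+)|(['"`])((?:\\.|(?!\2).)*)\2""")


def resolve_array(source: str, name: str, _seen: FrozenSet[str] = frozenset()) -> List[str]:
    """Collect string literals from array `name`, expanding same-file spreads."""
    if name in _seen:  # guard against circular spreads
        return []
    pattern = r"\b" + re.escape(name) + r"\b[^=;]*=\s*\[(.*?)\]"
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        return []
    prompts: List[str] = []
    for item in _ITEM.finditer(match.group(1)):
        if item.group(1):
            # Spread: inline the strings of the referenced array.
            prompts.extend(resolve_array(source, item.group(1), _seen | {name}))
        else:
            prompts.append(item.group(3))
    return prompts
```

Arrays imported from other files are out of scope here and simply resolve to an empty list.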

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback — split workflow, fix regex, update docs

- Split gepa-quality-score.yml into read-only scoring workflow +
  workflow_run-triggered commenter (gepa-quality-score-comment.yml),
  matching the repo's existing pr.yml / pr-comment.yml pattern
- Fix API key regex to also match 'api key:' with whitespace separator
- Update PR description to clarify ASI uses heuristic scoring
  (Jest integration is planned for future iteration)
- Remove pull-requests:write from scoring workflow permissions
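
The commit does not show the regex itself. As a hedged illustration of a pattern that accepts a whitespace separator between "api" and "key", something like the following would do; `API_KEY_PATTERN` and `mentions_api_key` are hypothetical names, not identifiers from the PR.

```python
import re

# Accepts "api key:", "api_key=", "api-key:", and "apikey=" (case-insensitive).
API_KEY_PATTERN = re.compile(r"api[\s_-]?key\s*[:=]", re.IGNORECASE)


def mentions_api_key(text: str) -> bool:
    """True when the text contains something that looks like an API key assignment."""
    return API_KEY_PATTERN.search(text) is not None
```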

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tmeschter added a commit that referenced this pull request Apr 14, 2026
…1585)

* fix: prevent azd+Terraform template variable interpolation failures (#1558)

Address azd template variable interpolation gap that caused deployment
timeouts in terraform-azure-container-apps-deploy integration tests.

azure-prepare (v1.0.13):
- Add warning against using Go-style template variables in .tfvars.json
- Document correct variable passing: azd auto-mapping, TF_VAR_* env vars
- Remove incorrect env() function usage in variable example
- Add troubleshooting entries for template interpolation errors

azure-validate (v1.0.3):
- Add Step 10: Template Variable Resolution Check for azd+Terraform
- Detect unresolved {{ .Env.* }} patterns and .tfvars.json files
- Provide remediation steps to fix before deployment
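
The detection described for azure-validate could be approximated with a small scan like the one below. This is a sketch only: the function name is hypothetical, and limiting the scan to `*.tf` and `*.tfvars.json` files outside `.terraform/` is an assumption made here to avoid false positives.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Go-style template reference such as {{ .Env.AZURE_LOCATION }}
GO_TEMPLATE = re.compile(r"\{\{\s*\.Env\.\w+\s*\}\}")


def find_unresolved_templates(root: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for each unresolved template reference."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".terraform" in path.parts:
            continue
        if not (path.name.endswith(".tf") or path.name.endswith(".tfvars.json")):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if GO_TEMPLATE.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

An empty result means no Go-style references were found in the Terraform inputs.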

azure-deploy (v1.0.9):
- Add Unresolved Terraform Template Variables error section with solution
- Add pre-deploy Step 9: Verify Terraform Variable Resolution
- Add Terraform state management error entries
- Document azd state clearing behavior and remote backend recommendation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: bump azure-prepare skill version to 1.0.14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-deploy/references/pre-deploy-checklist.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Restore current plugin version before checking for skill content changes (#1595)

* fix: rename .azure/plan.md to .azure/deployment-plan.md to prevent agent confusion (#1584)

* fix: rename .azure/plan.md to .azure/deployment-plan.md to prevent confusion with session-state plan.md

The agent was confusing the workspace deployment plan (.azure/plan.md) with the
session-state plan.md file, causing the 'creates correct files for AZD with Bicep
recipe' integration test to fail (issue #1562).

Renaming to deployment-plan.md eliminates the name collision and makes the file's
purpose self-documenting. Updated all references across azure-prepare,
azure-validate, and azure-deploy skills, their reference docs, and all tests.

Closes #1562

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update azure-deploy trigger keyword snapshots for deployment-plan rename

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* build(deps): bump github/codeql-action from 4.34.1 to 4.35.1 (#1570)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.34.1 to 4.35.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@3869755...c10b806)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove strong verbiage from Enterprise Infra Planner skill (#1533)

* feat: remove strong verbiage

* feat: update phrasing

* fix: undo tool name change

* chore: bump version to 1.0.1

---------

Co-authored-by: Michael Ren <mren@microsoft.com>

* fix: Add Docker build context validation to azure-validate skill (#1586)

* Add Docker build context validation to azure-validate skill

Pre-validate Docker build context during azure-validate by checking
for package-lock.json when npm ci is specified in a Dockerfile. This
prevents Docker build failures during azd package/up that waste time
and can push deployments past test timeouts.
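
A pre-validation of this kind might look like the sketch below. The function name, the message wording, and the default of using the Dockerfile's own directory as the build context are assumptions for illustration; the skill itself describes the check in prose.

```python
import re
from pathlib import Path
from typing import List, Optional

NPM_CI = re.compile(r"\bnpm\s+ci\b")


def check_npm_ci_context(dockerfile: Path, context: Optional[Path] = None) -> List[str]:
    """Report a problem when the Dockerfile runs `npm ci` without a lockfile.

    `context` defaults to the Dockerfile's directory (an assumption; the
    real build context may be configured elsewhere).
    """
    context = context or dockerfile.parent
    uses_npm_ci = any(
        NPM_CI.search(line)
        for line in dockerfile.read_text(encoding="utf-8").splitlines()
        if not line.lstrip().startswith("#")  # ignore Dockerfile comments
    )
    if not uses_npm_ci or (context / "package-lock.json").is_file():
        return []
    return [f"{dockerfile}: 'npm ci' needs package-lock.json in {context}"]
```

Running this before `docker build` turns a slow build failure into an immediate validation message.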

Changes:
- AZD recipe: Add step 9 (Docker Build Context Validation) between
  Build Verification and Package Validation
- AZCLI recipe: Enhance Docker Build step with build context
  pre-validation before attempting docker build
- AZD/AZCLI errors: Add npm ci / package-lock.json missing error entry
- Bump azure-validate version to 1.0.3

Fixes #1557

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-validate/references/recipes/azd/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-validate/references/recipes/azd/README.md

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Copy .claude-plugin/plugin.json to repo root for Claude marketplace support (#1605)

Add .claude-plugin/plugin.json to the sync-to-microsoft-azure-skills job
so the Azure plugin is discoverable in the Claude marketplace. Updates
the copy, URL replacement, version restore, and version bump steps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Copy hooks to top-level folder in azure-skills in addition to skills (#1606)

* Copy .claude-plugin/plugin.json to repo root for Claude marketplace support

Add .claude-plugin/plugin.json to the sync-to-microsoft-azure-skills job
so the Azure plugin is discoverable in the Claude marketplace. Updates
the copy, URL replacement, version restore, and version bump steps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Copy hooks to top-level folder in azure-skills in addition to skills

Update the sync-to-microsoft-azure-skills job in the publish pipeline
to also copy hooks/ and copilot-hooks.json to the repo root, matching
how skills/ is already copied. Also add the new paths to the URL
replacement step.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Azure skills update for verification of functionality and role assignments (#1220)

* update azure-prepare skill to check subscription policies

* update azure-prepare skill to check functionality before deploying

* Add role assignment verification step to azure-prepare skill

Add new Phase 2 step 5 (Verify Role Assignments) between security
hardening and functional verification. Includes reference doc with
service-to-role mapping table, MCP tool usage, and common RBAC
mistakes (e.g., generic Contributor lacking data-plane access).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/role-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/functional-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add live role verification step to azure-validate skill

Add step 4 (Live Role Verification) to query Azure for provisioned
RBAC assignments and cross-check against expected roles. Complements
the static role check in azure-prepare: prepare checks generated
Bicep/Terraform, validate checks live Azure state.

Includes reference doc with MCP tool usage, CLI commands, common
issues table, and decision tree for pass/fail criteria.

Bumps azure-validate version 1.0.0 -> 1.0.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Clarify azure-prepare role check as static only

Replace MCP live-query section with static code review guidance.
Live role verification is the responsibility of azure-validate
step 4 (live-role-verification.md). This removes the overlap
between the two skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: move role verification across prepare/validate/deploy skills

- Remove static role check (step 5) from azure-prepare — prepare just generates
- Add static role check as step 4 in azure-validate (pre-deployment)
- Move live role check from azure-validate step 4 to azure-deploy step 8 (post-deployment)
- Move role-verification.md from azure-prepare to azure-validate references
- Move live-role-verification.md from azure-validate to azure-deploy references
- Update all step number cross-references in functional-verification.md
- Bump versions: prepare 1.0.6->1.0.7, validate 1.0.1->1.0.2, deploy 1.0.5->1.0.6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add azure__role to azure-deploy MCP Tools table

Step 8 (Live Role Verification) references azure__role for RBAC
assignment listing, but the tool was missing from the MCP Tools
table. Agents could incorrectly assume only the three listed tools
are available. Bump version 1.0.6 -> 1.0.7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* update test snapshots

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update the versions and added integration/ unit tests

* Update plugin/skills/azure-validate/references/role-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update skills and the test runs

* update snapshots

* update to correct versions

* update reference name

* update versioning

* update steps

* fix: update azure-deploy trigger test snapshots

Keywords were removed from the SKILL.md description in a previous PR
but the trigger test snapshots were not regenerated, causing 2 snapshot
failures in the pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update azure-deploy trigger test snapshots

Keywords were removed from the SKILL.md description in a previous PR
but the trigger test snapshots were not regenerated, causing 2 snapshot
failures in the pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* update snapshot

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix azure-prepare: add DTS bicep.md to workflow routing (#1540) (#1604)

The workflow routing entries loaded durable.md and the DTS README but not
bicep.md — so the agent had overview docs but not the Bicep patterns
needed to generate .bicep files. Also adds 'order processing' keyword.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* add skill invocation rate dashboard (#1630)

* Add unified azure-cost skill that combines azure-cost-query, azure-cost-forecast, and azure-cost-optimization skills (#1221)

* initial implementation

* update the guardrails for query and forecast

* reduce token limit of reference files

* update unit tests

* fix breaking PR checks

* fix pr check errors and code review comments

* refactor to azure-cost (#1)

* update tests and references to combined azure cost skill

* Remove unused test fixture files

Delete cost-query-sample.json and cost-forecast-sample.json from
tests/azure-cost/fixtures/ as they are not referenced by any test
files. No other skills in the repo use fixture files either, so
these add maintenance overhead without value.

Addresses PR review comment #14 and #15 (fixtures removed entirely
rather than fixing hard-coded dates, since they were unused).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updates tests

* Consolidate azure-cost tests to standard 3-file layout, fix CI gates

- Consolidate 12 test files into standard 3-file structure (unit/triggers/integration)
- Rewrite integration tests using canonical withTestResult pattern
- Move all positive trigger prompts into triggers.test.ts
- Move all sub-area unit assertions into unit.test.ts
- Delete 9 redundant sub-area test files
- Regenerate snapshot with Jest 30 header format
- Bump sensei version 1.0.1 -> 1.0.2
- Bump azure-prepare version 1.0.10 -> 1.0.11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix PR review comments: canonical azqr tool name, table formatting

- Change azure__extension_azqr to mcp_azure_mcp_extension_azqr in SKILL.md
- Fix missing space in 429 table row in cost-forecast/error-handling.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments: MCP tools table, code block languages, remove phantom skill

- Add azure__extension_azqr and azure__aks to MCP Tools table for consistency
- Add yaml language to azqr code block in SKILL.md
- Add text language to portal link code block in report-template.md
- Remove non-existent azure-create-app row from tests/README.md coverage grid

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract azure-cost workflows into separate reference files

Move each workflow (query, optimization, forecast) into dedicated
reference files under references/ for progressive disclosure. This
reduces SKILL.md from 575 lines (23KB) to 139 lines (7KB), so the
agent only loads the workflow it needs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: move workflow files into their respective folders

Move cost-query-workflow.md, cost-optimization-workflow.md, and
cost-forecast-workflow.md from references/ into cost-query/,
cost-optimization/, and cost-forecast/ as workflow.md. Update all
links in SKILL.md and cross-references. Bump version to 1.0.2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fixes skill quality issues

* fix: update azure-cost tests for refactored skill structure

- Update snapshot to match new description with DO NOT USE FOR clause
- Update unit tests to load workflow files directly (content moved from
  SKILL.md to cost-query/, cost-forecast/, cost-optimization/ folders)
- Fix heading level assertions (## not ### in standalone workflow files)
- Remove 3 shouldNotTrigger prompts that contain cost keywords and
  correctly trigger the keyword-based matcher

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reset azure-cost version to 1.0.0 and fix YAML comment syntax

- Reset version to 1.0.0 for new skill directory (was incorrectly 1.0.3)
- Change // optional to # optional in YAML code block

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sai Koumudi Kaluvakolanu <saikoumudi@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Document principal type mismatch error in AZD errors reference (#1649)

* Document principal type mismatch error in AZD errors reference

AZD base templates (e.g. functions-quickstart-python-http-azd) create RBAC
role assignments with hardcoded principalType 'User' for the deploying
identity. In CI/CD where a service principal is used, ARM rejects this
with a PrincipalType mismatch error. The agent had no guidance for this
failure and spent multiple retries before finding the fix.

Adding this to the AZD errors reference gives the agent a direct path to
the solution: set allowUserIdentityPrincipal to false in main.bicep. It
also warns against the ineffective workaround of clearing
AZURE_PRINCIPAL_ID.

Fixes #1624

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-deploy/references/recipes/azd/errors.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Clarify main.parameters.json format to prevent .bicepparam confusion (#1648)

* Clarify main.parameters.json format to prevent .bicepparam confusion

Add explicit warnings and complete examples to the three skill reference
files used by azure-prepare, azure-validate, and azure-deploy:

- patterns.md: Replace hard-coded values with azd ${VAR} substitution
  syntax and add a warning against .bicepparam syntax
- iac-rules.md: Add a new Parameter File Format section with a full
  ARM JSON example and format warning
- troubleshooting.md: Add $schema/contentVersion to the incomplete JSON
  example and add a format warning callout

Addresses the root cause of issue #1623 where the agent created
main.parameters.json with .bicepparam syntax (readEnvironmentVariable),
causing 6 failed azd provision --preview attempts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump azure-prepare to 1.1.2 and azure-deploy to 1.0.11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/recipes/bicep/patterns.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor quota tests (#1489)

* refactor quota tests

* add more specification to test

---------

Co-authored-by: Christopher Earley <cearley@microsoft.com>

* update AKS cost spike prompts with specific time window (#1650)

Co-authored-by: Harsha Nair <hnair@microsoft.com>

* feat: GEPA integration for sensei skill + quality score CI workflow (#1498)

* build(deps-dev): bump the minor-and-patch group (#1572)

Bumps the minor-and-patch group in /scripts with 4 updates: [@vitest/coverage-v8](https://github.com/vitest-dev/vitest/tree/HEAD/packages/coverage-v8), [fast-xml-parser](https://github.com/NaturalIntelligence/fast-xml-parser), [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/typescript-eslint) and [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest).


Updates `@vitest/coverage-v8` from 4.1.0 to 4.1.2
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.2/packages/coverage-v8)

Updates `fast-xml-parser` from 5.5.8 to 5.5.9
- [Release notes](https://github.com/NaturalIntelligence/fast-xml-parser/releases)
- [Changelog](https://github.com/NaturalIntelligence/fast-xml-parser/blob/master/CHANGELOG.md)
- [Commits](NaturalIntelligence/fast-xml-parser@v5.5.8...v5.5.9)

Updates `typescript-eslint` from 8.57.1 to 8.57.2
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/typescript-eslint/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.57.2/packages/typescript-eslint)

Updates `vitest` from 4.1.0 to 4.1.2
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.2/packages/vitest)

---
updated-dependencies:
- dependency-name: "@vitest/coverage-v8"
  dependency-version: 4.1.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: fast-xml-parser
  dependency-version: 5.5.9
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: typescript-eslint
  dependency-version: 8.57.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: vitest
  dependency-version: 4.1.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add kvenkatrajan as codeowner for entra-app-registration (#1667)

* Initial plan

* Add kvenkatrajan as codeowner for entra-app-registration

Agent-Logs-Url: https://github.com/microsoft/GitHub-Copilot-for-Azure/sessions/0062a31a-0103-4dbf-a191-8264b9deea81

Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>

* adding the azure skills gif (#1651)

* Address wbreza PR review: fix main.tfvars.json guidance and auto-mapping claims

Key changes based on wbreza's review of PR #1585:

- Replace 'Do NOT generate main.tfvars.json' with correct guidance:
  use ${VAR} syntax (azd envsubst), not Go-style {{ .Env.* }}
- Remove incorrect 'azd auto-mapping' claims — variables flow via
  main.tfvars.json substitution or explicit TF_VAR_* env vars
- Fix pre-deploy check: validate syntax in main.tfvars.json instead
  of rejecting the file's existence
- Scope grep patterns with --include='*.tf' --include='*.tfvars.json'
  to avoid false positives from .terraform/ and READMEs
- Align grep patterns consistently across all files
- Update remediation steps to fix syntax rather than delete files
- Add main.tfvars.json example with correct ${VAR} syntax

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump skill versions: azure-prepare 1.0.15, azure-deploy 1.0.12, azure-validate 1.0.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump version

* Remove extraneous file

* Fix comments in code blocks

Move comments above code blocks. This is especially important for JSON,
as it does not actually support code comments and we wouldn't want the
LM to copy the code block as-is.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Fan Yang <52458914+fanyang-mono@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael <37400755+micha31r@users.noreply.github.com>
Co-authored-by: Michael Ren <mren@microsoft.com>
Co-authored-by: KarishmaGhiya <kghiya8@gmail.com>
Co-authored-by: greenie-msft <56556602+greenie-msft@users.noreply.github.com>
Co-authored-by: msalaman <Marcossalamanca97@hotmail.com>
Co-authored-by: taylorak <taykenned@gmail.com>
Co-authored-by: Sai Koumudi Kaluvakolanu <saikoumudi@gmail.com>
Co-authored-by: Christopher T Earley <jrekct@gmail.com>
Co-authored-by: Christopher Earley <cearley@microsoft.com>
Co-authored-by: Harsha Nair <hjjn26@gmail.com>
Co-authored-by: Harsha Nair <hnair@microsoft.com>
Co-authored-by: Shayne Boyer <spboyer@live.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>
Co-authored-by: Yun Jung Choi <49920477+yunjchoi@users.noreply.github.com>
Ba4bes pushed a commit to Ba4bes/GitHub-Copilot-for-Azure that referenced this pull request Apr 24, 2026
…icrosoft#1498)

* feat: add GEPA integration to sensei skill + quality score workflow

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop for automated SKILL.md improvement.

Changes:
- .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs,
  Step 5-GEPA in the Ralph loop
- .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers
  test harness at runtime, builds GEPA evaluators, scores/optimizes skills
- pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md
  quality and posts results as PR comment

The auto-evaluator requires zero manual configuration. It reads
triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays
and builds a composite evaluator (content quality + trigger accuracy).

Existing tests are NOT replaced or modified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback for GEPA integration

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI)
- Remove unused imports: sys, dataclass, field (fixes CodeQL warnings)
- Extract strip_frontmatter() helper to replace fragile content.index()
  parsing that could raise ValueError on malformed frontmatter
- Deduplicate frontmatter stripping logic between score_skill/optimize_skill
- Add explicit permissions block (contents: read, pull-requests: write)
- Use sticky comment pattern (<!-- gepa-quality-score --> marker) to avoid
  PR comment spam on re-runs
- Fix display results to match workflow_dispatch single-skill input
- Rename quality gate step to '(advisory)' to clarify non-blocking behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: skip PR comment step for forked PRs

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause
the comment step to fail. Only post comments when the PR originates
from the same repository.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: strip comments in trigger parsing + clarify GEPA step scope

- Strip single-line (//) and multi-line (/* */) comments from trigger
  test arrays before extracting strings, preventing commented-out
  example prompts from polluting trigger accuracy scoring
- Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE
  FRONTMATTER), not step 6 (IMPROVE TESTS)
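
The comment handling can be sketched with a single tokenizing regex that matches either a string literal or a comment and keeps only the strings. This is an illustration of the technique, not the code in auto_evaluator.py; `extract_strings` is a hypothetical name, and escape sequences are left as written.

```python
import re

# Try string literals first so that "//" inside a string (e.g. a URL)
# is never mistaken for the start of a comment.
_TOKEN = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
    """,
    re.VERBOSE | re.DOTALL,
)


def extract_strings(array_source: str) -> list[str]:
    """Return string literals from a TS array body, skipping commented-out ones."""
    prompts = []
    for match in _TOKEN.finditer(array_source):
        literal = match.group("string")
        if literal is not None:
            prompts.append(literal[1:-1])  # drop the surrounding quotes
    return prompts
```

Because a comment is consumed as one token, any quoted prompt inside it is skipped rather than scored.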

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct docstring and SKILL.md to reflect actual evaluator behavior

The evaluator parses trigger prompt arrays and uses content heuristics
for scoring — it does not execute Jest tests or incorporate test
pass/fail results. Updated docs to accurately describe this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address round 3 review feedback

- Remove unused params: as_json from score_skill, fast from build_evaluator
- Pin all actions to commit SHAs matching repo convention (checkout v6,
  setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0)
- Pin gepa dependency to v0.7.0 for reproducible CI
- Remove DO NOT USE FOR from scoring criteria (conflicts with repo
  guidance that discourages it due to keyword contamination risk)
- Add quality_score_raw field for full-precision threshold comparisons
- Enhance parse_trigger_arrays to resolve ...varName spread patterns
  by extracting strings from referenced arrays in the same file
- Clarify SKILL.md step 5b: GEPA uses trigger definitions as config,
  does not execute Jest tests
- Add NOTE about future workflow_run commenting pattern migration
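
The spread resolution can be sketched as follows. `resolve_array` and the regexes are hypothetical stand-ins for `parse_trigger_arrays`, and the sketch assumes flat arrays whose strings contain no `]`.

```python
import re

# `const name = [ ... ]`, optionally with a type annotation such as `: string[]`.
_ARRAY_DECL = re.compile(
    r"(?:const|let|var)\s+(\w+)\s*(?::[^=]+)?=\s*\[(.*?)\]", re.DOTALL
)
# Either a spread reference (...name) or a quoted string literal.
_ITEM = re.compile(r"""\.\.\.(\w+)|"((?:\\.|[^"\\])*)"|'((?:\\.|[^'\\])*)'""")


def resolve_array(name: str, source: str) -> list[str]:
    """Expand `name` into its strings, following ...spread references in the same file."""
    bodies = {m.group(1): m.group(2) for m in _ARRAY_DECL.finditer(source)}

    def expand(var: str, seen: frozenset) -> list[str]:
        if var in seen or var not in bodies:
            return []  # unknown or cyclic reference contributes nothing
        items = []
        for m in _ITEM.finditer(bodies[var]):
            spread, double_quoted, single_quoted = m.groups()
            if spread:
                items.extend(expand(spread, seen | {var}))
            else:
                items.append(double_quoted if double_quoted is not None else single_quoted)
        return items

    return expand(name, frozenset())
```

The `seen` set guards against arrays that spread each other, which would otherwise recurse forever.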

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback — split workflow, fix regex, update docs

- Split gepa-quality-score.yml into read-only scoring workflow +
  workflow_run-triggered commenter (gepa-quality-score-comment.yml),
  matching the repo's existing pr.yml / pr-comment.yml pattern
- Fix API key regex to also match 'api key:' with whitespace separator
- Update PR description to clarify ASI uses heuristic scoring
  (Jest integration is planned for future iteration)
- Remove pull-requests:write from scoring workflow permissions
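
For illustration, a pattern that accepts whitespace as well as `_` or `-` between "api" and "key" might look like the sketch below. The exact regex in the PR is not shown here, so `API_KEY_PATTERN` is an assumption.

```python
import re

# Matches api_key=, API-KEY:, apikey =, and "api key:" (whitespace separator).
API_KEY_PATTERN = re.compile(r"\bapi[\s_-]?key\s*[:=]", re.IGNORECASE)

for sample in ("api key: abc123", "API_KEY=abc123", "the api keyboard"):
    print(sample, "->", bool(API_KEY_PATTERN.search(sample)))
```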

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ba4bes pushed a commit to Ba4bes/GitHub-Copilot-for-Azure that referenced this pull request Apr 24, 2026
…icrosoft#1585)

* fix: prevent azd+Terraform template variable interpolation failures (microsoft#1558)

Address azd template variable interpolation gap that caused deployment
timeouts in terraform-azure-container-apps-deploy integration tests.

azure-prepare (v1.0.13):
- Add warning against using Go-style template variables in .tfvars.json
- Document correct variable passing: azd auto-mapping, TF_VAR_* env vars
- Remove incorrect env() function usage in variable example
- Add troubleshooting entries for template interpolation errors

azure-validate (v1.0.3):
- Add Step 10: Template Variable Resolution Check for azd+Terraform
- Detect unresolved {{ .Env.* }} patterns and .tfvars.json files
- Provide remediation steps to fix before deployment

azure-deploy (v1.0.9):
- Add Unresolved Terraform Template Variables error section with solution
- Add pre-deploy Step 9: Verify Terraform Variable Resolution
- Add Terraform state management error entries
- Document azd state clearing behavior and remote backend recommendation
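
The unresolved-template check can be sketched as a file scan. `find_unresolved_templates` is a hypothetical helper, not part of any skill, and it assumes only `.tf` and `.tfvars.json` files outside `.terraform/` should be inspected.

```python
import re
from pathlib import Path

# Go-style template reference such as {{ .Env.AZURE_LOCATION }}.
GO_TEMPLATE = re.compile(r"\{\{\s*\.Env\.\w+\s*\}\}")


def find_unresolved_templates(root):
    """Return (relative_path, line_number, line) for each Go-style template reference."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".terraform" in rel.parts or not path.is_file():
            continue  # skip provider caches and directories
        if not (path.name.endswith(".tf") or path.name.endswith(".tfvars.json")):
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if GO_TEMPLATE.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits
```

An empty result means no Go-style references were found, and each hit names the file and line to fix before deployment.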

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: bump azure-prepare skill version to 1.0.14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-deploy/references/pre-deploy-checklist.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Restore current plugin version before checking for skill content changes (microsoft#1595)

* fix: rename .azure/plan.md to .azure/deployment-plan.md to prevent agent confusion (microsoft#1584)

* fix: rename .azure/plan.md to .azure/deployment-plan.md to prevent confusion with session-state plan.md

The agent was confusing the workspace deployment plan (.azure/plan.md) with the
session-state plan.md file, causing the 'creates correct files for AZD with Bicep
recipe' integration test to fail (issue microsoft#1562).

Renaming to deployment-plan.md eliminates the name collision and makes the file's
purpose self-documenting. Updated all references across azure-prepare,
azure-validate, and azure-deploy skills, their reference docs, and all tests.

Closes microsoft#1562

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update azure-deploy trigger keyword snapshots for deployment-plan rename

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* build(deps): bump github/codeql-action from 4.34.1 to 4.35.1 (microsoft#1570)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.34.1 to 4.35.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@3869755...c10b806)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove strong verbiage from Enterprise Infra Planner skill (microsoft#1533)

* feat: remove strong verbiage

* feat: update phrasing

* fix: undo tool name change

* chore: bump version to 1.0.1

---------

Co-authored-by: Michael Ren <mren@microsoft.com>

* fix: Add Docker build context validation to azure-validate skill (microsoft#1586)

* Add Docker build context validation to azure-validate skill

Pre-validate Docker build context during azure-validate by checking
for package-lock.json when npm ci is specified in a Dockerfile. This
prevents Docker build failures during azd package/up that waste time
and can push deployments past test timeouts.

Changes:
- AZD recipe: Add step 9 (Docker Build Context Validation) between
  Build Verification and Package Validation
- AZCLI recipe: Enhance Docker Build step with build context
  pre-validation before attempting docker build
- AZD/AZCLI errors: Add npm ci / package-lock.json missing error entry
- Bump azure-validate version to 1.0.3
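
The build-context check can be sketched as below. `check_npm_ci_context` is a hypothetical helper that assumes the build context is the Dockerfile's own directory; it does not follow `RUN` instructions continued across lines with a backslash.

```python
import re
from pathlib import Path

# A RUN instruction that invokes `npm ci` somewhere on the same line.
NPM_CI = re.compile(r"^\s*RUN\b.*\bnpm\s+ci\b", re.IGNORECASE | re.MULTILINE)


def check_npm_ci_context(dockerfile: Path) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    text = dockerfile.read_text(encoding="utf-8")
    if not NPM_CI.search(text):
        return []  # Dockerfile does not use npm ci, nothing to validate
    context = dockerfile.parent  # assumption: context is the Dockerfile's directory
    if (context / "package-lock.json").is_file():
        return []
    return [
        f"{dockerfile} runs 'npm ci' but {context / 'package-lock.json'} is missing"
    ]
```

Running this before `azd package` or `docker build` reports the missing lockfile immediately instead of partway through an image build.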

Fixes microsoft#1557

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-validate/references/recipes/azd/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-validate/references/recipes/azd/README.md

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Copy .claude-plugin/plugin.json to repo root for Claude marketplace support (microsoft#1605)

Add .claude-plugin/plugin.json to the sync-to-microsoft-azure-skills job
so the Azure plugin is discoverable in the Claude marketplace. Updates
the copy, URL replacement, version restore, and version bump steps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Copy hooks to top-level folder in azure-skills in addition to skills (microsoft#1606)

* Copy .claude-plugin/plugin.json to repo root for Claude marketplace support

Add .claude-plugin/plugin.json to the sync-to-microsoft-azure-skills job
so the Azure plugin is discoverable in the Claude marketplace. Updates
the copy, URL replacement, version restore, and version bump steps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Copy hooks to top-level folder in azure-skills in addition to skills

Update the sync-to-microsoft-azure-skills job in the publish pipeline
to also copy hooks/ and copilot-hooks.json to the repo root, matching
how skills/ is already copied. Also add the new paths to the URL
replacement step.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Azure skills update for verification of functionality and role assignments (microsoft#1220)

* update azure-prepare skill to check subscription policies

* update azure-prepare skill to check functionality before deploying

* Add role assignment verification step to azure-prepare skill

Add new Phase 2 step 5 (Verify Role Assignments) between security
hardening and functional verification. Includes reference doc with
service-to-role mapping table, MCP tool usage, and common RBAC
mistakes (e.g., generic Contributor lacking data-plane access).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/role-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/functional-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add live role verification step to azure-validate skill

Add step 4 (Live Role Verification) to query Azure for provisioned
RBAC assignments and cross-check against expected roles. Complements
the static role check in azure-prepare: prepare checks generated
Bicep/Terraform, validate checks live Azure state.

Includes reference doc with MCP tool usage, CLI commands, common
issues table, and decision tree for pass/fail criteria.

Bumps azure-validate version 1.0.0 -> 1.0.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Clarify azure-prepare role check as static only

Replace MCP live-query section with static code review guidance.
Live role verification is the responsibility of azure-validate
step 4 (live-role-verification.md). This removes the overlap
between the two skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: move role verification across prepare/validate/deploy skills

- Remove static role check (step 5) from azure-prepare — prepare just generates
- Add static role check as step 4 in azure-validate (pre-deployment)
- Move live role check from azure-validate step 4 to azure-deploy step 8 (post-deployment)
- Move role-verification.md from azure-prepare to azure-validate references
- Move live-role-verification.md from azure-validate to azure-deploy references
- Update all step number cross-references in functional-verification.md
- Bump versions: prepare 1.0.6->1.0.7, validate 1.0.1->1.0.2, deploy 1.0.5->1.0.6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add azure__role to azure-deploy MCP Tools table

Step 8 (Live Role Verification) references azure__role for RBAC
assignment listing, but the tool was missing from the MCP Tools
table. Agents could incorrectly assume only the three listed tools
are available. Bump version 1.0.6 -> 1.0.7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* update test snapshots

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update the versions and add integration/unit tests

* Update plugin/skills/azure-validate/references/role-verification.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update skills and the test runs

* update snapshots

* update to correct versions

* update reference name

* update versioning

* update steps

* fix: update azure-deploy trigger test snapshots

Keywords were removed from the SKILL.md description in a previous PR
but the trigger test snapshots were not regenerated, causing 2 snapshot
failures in the pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: update azure-deploy trigger test snapshots

Keywords were removed from the SKILL.md description in a previous PR
but the trigger test snapshots were not regenerated, causing 2 snapshot
failures in the pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* update snapshot

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix azure-prepare: add DTS bicep.md to workflow routing (microsoft#1540) (microsoft#1604)

The workflow routing entries loaded durable.md and the DTS README but not
bicep.md — so the agent had overview docs but not the Bicep patterns
needed to generate .bicep files. Also adds 'order processing' keyword.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* add skill invocation rate dashboard (microsoft#1630)

* Add unified azure-cost skill that combines the azure-cost-query, azure-cost-forecast, and azure-cost-optimization skills (microsoft#1221)

* initial implementation

* update the guardrails for query and forecast

* reduce token limit of reference files

* update unit tests

* fix breaking PR checks

* fix pr check errors and code review comments

* refactor to azure-cost (#1)

* update tests and references to combined azure cost skill

* Remove unused test fixture files

Delete cost-query-sample.json and cost-forecast-sample.json from
tests/azure-cost/fixtures/ as they are not referenced by any test
files. No other skills in the repo use fixture files either, so
these add maintenance overhead without value.

Addresses PR review comment microsoft#14 and microsoft#15 (fixtures removed entirely
rather than fixing hard-coded dates, since they were unused).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updates tests

* Consolidate azure-cost tests to standard 3-file layout, fix CI gates

- Consolidate 12 test files into standard 3-file structure (unit/triggers/integration)
- Rewrite integration tests using canonical withTestResult pattern
- Move all positive trigger prompts into triggers.test.ts
- Move all sub-area unit assertions into unit.test.ts
- Delete 9 redundant sub-area test files
- Regenerate snapshot with Jest 30 header format
- Bump sensei version 1.0.1 -> 1.0.2
- Bump azure-prepare version 1.0.10 -> 1.0.11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix PR review comments: canonical azqr tool name, table formatting

- Change azure__extension_azqr to mcp_azure_mcp_extension_azqr in SKILL.md
- Fix missing space in 429 table row in cost-forecast/error-handling.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments: MCP tools table, code block languages, remove phantom skill

- Add azure__extension_azqr and azure__aks to MCP Tools table for consistency
- Add yaml language to azqr code block in SKILL.md
- Add text language to portal link code block in report-template.md
- Remove non-existent azure-create-app row from tests/README.md coverage grid

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract azure-cost workflows into separate reference files

Move each workflow (query, optimization, forecast) into dedicated
reference files under references/ for progressive disclosure. This
reduces SKILL.md from 575 lines (23KB) to 139 lines (7KB), so the
agent only loads the workflow it needs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: move workflow files into their respective folders

Move cost-query-workflow.md, cost-optimization-workflow.md, and
cost-forecast-workflow.md from references/ into cost-query/,
cost-optimization/, and cost-forecast/ as workflow.md. Update all
links in SKILL.md and cross-references. Bump version to 1.0.2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fixes skill quality issues

* fix: update azure-cost tests for refactored skill structure

- Update snapshot to match new description with DO NOT USE FOR clause
- Update unit tests to load workflow files directly (content moved from
  SKILL.md to cost-query/, cost-forecast/, cost-optimization/ folders)
- Fix heading level assertions (## not ### in standalone workflow files)
- Remove 3 shouldNotTrigger prompts that contain cost keywords and
  correctly trigger the keyword-based matcher

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reset azure-cost version to 1.0.0 and fix YAML comment syntax

- Reset version to 1.0.0 for new skill directory (was incorrectly 1.0.3)
- Change // optional to # optional in YAML code block

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sai Koumudi Kaluvakolanu <saikoumudi@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Document principal type mismatch error in AZD errors reference (microsoft#1649)

* Document principal type mismatch error in AZD errors reference

AZD base templates (e.g. functions-quickstart-python-http-azd) create RBAC
role assignments with hardcoded principalType 'User' for the deploying
identity. In CI/CD where a service principal is used, ARM rejects this
with a PrincipalType mismatch error. The agent had no guidance for this
failure and spent multiple retries before finding the fix.

Adding this to the AZD errors reference gives the agent a direct path to
the solution: set allowUserIdentityPrincipal to false in main.bicep. It
also warns against the ineffective workaround of clearing
AZURE_PRINCIPAL_ID.

Fixes microsoft#1624

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-deploy/references/recipes/azd/errors.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Clarify main.parameters.json format to prevent .bicepparam confusion (microsoft#1648)

* Clarify main.parameters.json format to prevent .bicepparam confusion

Add explicit warnings and complete examples to the three skill reference
files used by azure-prepare, azure-validate, and azure-deploy:

- patterns.md: Replace hard-coded values with azd \ substitution
  syntax and add a warning against .bicepparam syntax
- iac-rules.md: Add a new Parameter File Format section with a full
  ARM JSON example and format warning
- troubleshooting.md: Add \/contentVersion to the incomplete JSON
  example and add a format warning callout

Addresses the root cause of issue microsoft#1623 where the agent created
main.parameters.json with .bicepparam syntax (readEnvironmentVariable),
causing 6 failed azd provision --preview attempts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump azure-prepare to 1.1.2 and azure-deploy to 1.0.11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update plugin/skills/azure-prepare/references/recipes/bicep/patterns.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor quota tests (microsoft#1489)

* refactor quota tests

* add more specification to test

---------

Co-authored-by: Christopher Earley <cearley@microsoft.com>

* update AKS cost spike prompts with specific time window (microsoft#1650)

Co-authored-by: Harsha Nair <hnair@microsoft.com>

* feat: GEPA integration for sensei skill + quality score CI workflow (microsoft#1498)

* feat: add GEPA integration to sensei skill + quality score workflow

Add GEPA (Genetic-Pareto) evolutionary optimization as an optional
enhancement to sensei's Ralph loop for automated SKILL.md improvement.

Changes:
- .github/skills/sensei/SKILL.md: Added --gepa flag, GEPA mode docs,
  Step 5-GEPA in the Ralph loop
- .github/skills/sensei/scripts/gepa/auto_evaluator.py: Auto-discovers
  test harness at runtime, builds GEPA evaluators, scores/optimizes skills
- pipelines/gepa-quality-score.yml: PR quality gate that scores SKILL.md
  quality and posts results as PR comment

The auto-evaluator requires zero manual configuration. It reads
triggers.test.ts to extract shouldTrigger/shouldNotTrigger arrays
and builds a composite evaluator (content quality + trigger accuracy).

Existing tests are NOT replaced or modified.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback for GEPA integration

- Bump sensei SKILL.md version 1.0.0 → 1.0.2 (fixes Skill Structure CI)
- Remove unused imports: sys, dataclass, field (fixes CodeQL warnings)
- Extract strip_frontmatter() helper to replace fragile content.index()
  parsing that could raise ValueError on malformed frontmatter
- Deduplicate frontmatter stripping logic between score_skill/optimize_skill
- Add explicit permissions block (contents: read, pull-requests: write)
- Use sticky comment pattern (<!-- gepa-quality-score --> marker) to avoid
  PR comment spam on re-runs
- Fix display results to match workflow_dispatch single-skill input
- Rename quality gate step to '(advisory)' to clarify non-blocking behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: skip PR comment step for forked PRs

Forked PRs have reduced GITHUB_TOKEN permissions, which would cause
the comment step to fail. Only post comments when the PR originates
from the same repository.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: strip comments in trigger parsing + clarify GEPA step scope

- Strip single-line (//) and multi-line (/* */) comments from trigger
  test arrays before extracting strings, preventing commented-out
  example prompts from polluting trigger accuracy scoring
- Fix SKILL.md step 5b to clarify GEPA only replaces step 5 (IMPROVE
  FRONTMATTER), not step 6 (IMPROVE TESTS)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct docstring and SKILL.md to reflect actual evaluator behavior

The evaluator parses trigger prompt arrays and uses content heuristics
for scoring — it does not execute Jest tests or incorporate test
pass/fail results. Updated docs to accurately describe this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address round 3 review feedback

- Remove unused params: as_json from score_skill, fast from build_evaluator
- Pin all actions to commit SHAs matching repo convention (checkout v6,
  setup-python v6.2.0, upload-artifact v7.0.0, github-script v8.0.0)
- Pin gepa dependency to v0.7.0 for reproducible CI
- Remove DO NOT USE FOR from scoring criteria (conflicts with repo
  guidance that discourages it due to keyword contamination risk)
- Add quality_score_raw field for full-precision threshold comparisons
- Enhance parse_trigger_arrays to resolve ...varName spread patterns
  by extracting strings from referenced arrays in the same file
- Clarify SKILL.md step 5b: GEPA uses trigger definitions as config,
  does not execute Jest tests
- Add NOTE about future workflow_run commenting pattern migration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback — split workflow, fix regex, update docs

- Split gepa-quality-score.yml into read-only scoring workflow +
  workflow_run-triggered commenter (gepa-quality-score-comment.yml),
  matching the repo's existing pr.yml / pr-comment.yml pattern
- Fix API key regex to also match 'api key:' with whitespace separator
- Update PR description to clarify ASI uses heuristic scoring
  (Jest integration is planned for future iteration)
- Remove pull-requests:write from scoring workflow permissions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* build(deps-dev): bump the minor-and-patch group (microsoft#1572)

Bumps the minor-and-patch group in /scripts with 4 updates: [@vitest/coverage-v8](https://github.com/vitest-dev/vitest/tree/HEAD/packages/coverage-v8), [fast-xml-parser](https://github.com/NaturalIntelligence/fast-xml-parser), [typescript-eslint](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/typescript-eslint) and [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest).


Updates `@vitest/coverage-v8` from 4.1.0 to 4.1.2
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.2/packages/coverage-v8)

Updates `fast-xml-parser` from 5.5.8 to 5.5.9
- [Release notes](https://github.com/NaturalIntelligence/fast-xml-parser/releases)
- [Changelog](https://github.com/NaturalIntelligence/fast-xml-parser/blob/master/CHANGELOG.md)
- [Commits](NaturalIntelligence/fast-xml-parser@v5.5.8...v5.5.9)

Updates `typescript-eslint` from 8.57.1 to 8.57.2
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/typescript-eslint/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.57.2/packages/typescript-eslint)

Updates `vitest` from 4.1.0 to 4.1.2
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v4.1.2/packages/vitest)

---
updated-dependencies:
- dependency-name: "@vitest/coverage-v8"
  dependency-version: 4.1.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: fast-xml-parser
  dependency-version: 5.5.9
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: typescript-eslint
  dependency-version: 8.57.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
- dependency-name: vitest
  dependency-version: 4.1.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: minor-and-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add kvenkatrajan as codeowner for entra-app-registration (microsoft#1667)

* Initial plan

* Add kvenkatrajan as codeowner for entra-app-registration

Agent-Logs-Url: https://github.com/microsoft/GitHub-Copilot-for-Azure/sessions/0062a31a-0103-4dbf-a191-8264b9deea81

Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>

* adding the azure skills gif (microsoft#1651)

* Address wbreza PR review: fix main.tfvars.json guidance and auto-mapping claims

Key changes based on wbreza's review of PR microsoft#1585:

- Replace 'Do NOT generate main.tfvars.json' with correct guidance:
  use \\\ syntax (azd envsubst), not Go-style {{ .Env.* }}
- Remove incorrect 'azd auto-mapping' claims — variables flow via
  main.tfvars.json substitution or explicit TF_VAR_* env vars
- Fix pre-deploy check: validate syntax in main.tfvars.json instead
  of rejecting the file's existence
- Scope grep patterns with --include='*.tf' --include='*.tfvars.json'
  to avoid false positives from .terraform/ and READMEs
- Align grep patterns consistently across all files
- Update remediation steps to fix syntax rather than delete files
- Add main.tfvars.json example with correct \ syntax

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump skill versions: azure-prepare 1.0.15, azure-deploy 1.0.12, azure-validate 1.0.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump version

* Remove extraneous file

* Fix comments in code blocks

Move comments above code blocks. This is especially important for JSON,
as it does not actually support code comments and we wouldn't want the
LM to copy the code block as-is.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Fan Yang <52458914+fanyang-mono@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael <37400755+micha31r@users.noreply.github.com>
Co-authored-by: Michael Ren <mren@microsoft.com>
Co-authored-by: KarishmaGhiya <kghiya8@gmail.com>
Co-authored-by: greenie-msft <56556602+greenie-msft@users.noreply.github.com>
Co-authored-by: msalaman <Marcossalamanca97@hotmail.com>
Co-authored-by: taylorak <taykenned@gmail.com>
Co-authored-by: Sai Koumudi Kaluvakolanu <saikoumudi@gmail.com>
Co-authored-by: Christopher T Earley <jrekct@gmail.com>
Co-authored-by: Christopher Earley <cearley@microsoft.com>
Co-authored-by: Harsha Nair <hjjn26@gmail.com>
Co-authored-by: Harsha Nair <hnair@microsoft.com>
Co-authored-by: Shayne Boyer <spboyer@live.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kvenkatrajan <102772054+kvenkatrajan@users.noreply.github.com>
Co-authored-by: Yun Jung Choi <49920477+yunjchoi@users.noreply.github.com>
5 participants