Merged
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -10,12 +10,18 @@ This project follows [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.

### Added

- **`ext.dusk.find` now surfaces a `matchCount` field and an ambiguity `diagnostic` in its success response when `--semanticsLabel` or `--text` matches more than one Semantics node.** Previously the handler silently returned the first match, so `--semanticsLabel "Password"` over-matched the email field on forms where both `<TextField semanticsLabel="Password"/>` nodes shared the same label. The response now includes `matchCount: N` on every match; when `N > 1` a `diagnostic` key carries a human-readable hint (`label 'X' matched N nodes; refine with --key, --text, or --contains`). Single-match and no-match behaviour is unchanged (backward-compatible). Touches `lib/src/extensions/ext_find.dart`; covered by `test/src/extensions/ext_find_test.dart`.

- **`ext.dusk.snap` now surfaces captured non-fatal render/build FlutterErrors in a `renderErrors` block, and `dusk:snap` prints a `⚠ N render error(s)` banner to stderr while stdout stays the pure snapshot.** A widget that throws at build time (a `ParentDataWidget` misuse such as `flex-1`/`Expanded` placed under a `Semantics`/`WAnchor` instead of directly inside a Flex, or an overflow) can render partially and stay invisible in the semantics snapshot, so an action against it silently no-ops with no signal to the agent. The snapshot payload now carries `renderErrors: {count, recent: [{type, message}], hint}` (populated from the existing `FlutterError.onError` capture buffer, omitted entirely when clean), so a broken screen can no longer go unnoticed, even without a separate call to `ext.dusk.exceptions`. Touches `lib/src/extensions/ext_snapshot.dart`, `lib/src/commands/dusk_snap_command.dart`; covered by `test/src/extensions/ext_snapshot_render_errors_test.dart`.

### Changed

- **`ext.dusk.navigate` now tries the consumer navigate adapter (`DuskPlugin.navigateAdapter`, e.g. `MagicRoute.to`) BEFORE `Navigator.pushNamed`.** On a Router-only stack (go_router / auto_route) `Navigator.onGenerateRoute` is null, so `Navigator.pushNamed` raised an asynchronous "no corresponding route" `FlutterError` on every navigate. Because the failure was async, the handler's try/catch could not suppress it, and it landed in the FlutterError buffer, where the new `renderErrors` snapshot block would now surface it a second time as a false positive. Adapter-first dispatch routes through the app's own public router API (the correct path for these apps) and skips the throwing `Navigator.pushNamed` entirely; `Navigator.pushNamed` remains the fallback for apps with no registered adapter. Touches `lib/src/extensions/ext_navigation.dart`.

### Fixed

- **`dusk:doctor` check 3 (snapshot enrichers) now emits INFO when no enrichers are registered, instead of WARN.** Enrichers are opt-in; zero is a valid state, not a problem. The WARN, read alongside "integration wired" (check 5), created a false contradiction. Touches `lib/src/commands/dusk_doctor_command.dart`; test case updated in `test/src/commands/dusk_doctor_command_test.dart`.

---

## [0.0.8] - 2026-06-17
88 changes: 82 additions & 6 deletions doc/commands/dusk-find.md
@@ -62,21 +62,36 @@ The CLI guards an empty params map (`Provide at least one of --text / --contains

**Success envelope (illustrative):**

Single match:

```json
{
"ref": "q1",
"matchCount": 1,
"rect": [120, 400, 240, 48],
"role": "button",
"label": "Sign in"
"matched": true,
"matchCount": 1
}
```

`matchCount > 1` indicates the predicate is ambiguous: the handle still resolves to the first match, but the agent should narrow with an extra predicate (typically `--key`) before acting.
Multi-match (ambiguous predicate):

```json
{
"ref": "q1",
"matched": true,
"matchCount": 2,
"diagnostic": "label 'Password' matched 2 nodes; refine with --key, --text, or --contains"
}
```

`matchCount > 1` means the predicate is ambiguous: the handle still resolves to the FIRST match (backward-compatible), but the agent should narrow with an additional predicate before acting. Common disambiguation strategies:

- Add `--key=<widget-key>` when the widget carries a `ValueKey`.
- Add `--text=<visible-label>` when the accessibility label and the visible text differ.
- Use `--contains=<unique-substring>` when only part of the label is unique.
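
A caller can branch on these fields before acting. The following is a minimal sketch in plain Dart, assuming the response has already been decoded from JSON; `FindOutcome` and `classifyFindResponse` are hypothetical names used only for illustration and are not part of dusk. It relies solely on the `ref`, `matched`, `matchCount`, and `diagnostic` fields shown above.

```dart
import 'dart:convert';

/// Hypothetical outcome of inspecting an `ext.dusk.find` success response.
enum FindOutcome { noMatch, unique, ambiguous }

/// Classifies a decoded find response using only the documented fields.
FindOutcome classifyFindResponse(Map<String, dynamic> response) {
  if (response['matched'] != true || response['ref'] == null) {
    return FindOutcome.noMatch;
  }
  // Assumption: a missing matchCount is treated as 1, so a response
  // without the field still classifies as unique.
  final int matchCount = (response['matchCount'] as int?) ?? 1;
  return matchCount > 1 ? FindOutcome.ambiguous : FindOutcome.unique;
}

void main() {
  const String raw = '{"ref":"q1","matched":true,"matchCount":2,'
      '"diagnostic":"label \'Password\' matched 2 nodes; '
      'refine with --key, --text, or --contains"}';
  final Map<String, dynamic> response =
      jsonDecode(raw) as Map<String, dynamic>;

  switch (classifyFindResponse(response)) {
    case FindOutcome.noMatch:
      print('no match; no handle was minted');
    case FindOutcome.unique:
      print('safe to act on ${response['ref']}');
    case FindOutcome.ambiguous:
      // Narrow the predicate before acting; the handle points at the
      // first match only.
      print('ambiguous: ${response['diagnostic']}');
  }
}
```

Treating the ambiguous case as "do not act yet" is a design choice of this sketch; the handle itself remains valid and still resolves to the first match.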

**Error envelope:**

The VM Service handler propagates errors as `ServiceExtensionResponse.error(extensionError, message)`. The CLI surfaces them via `ArtisanContext.callExtension` and exits non-zero. Common messages include `No widget matched predicates: {...}`.
The VM Service handler propagates errors as `ServiceExtensionResponse.error(extensionError, message)`. The CLI surfaces them via `ArtisanContext.callExtension` and exits non-zero. Common messages include `No widget matched predicates: {}`.

---

@@ -154,6 +169,67 @@ The two predicates AND together; useful when the screen has multiple "Save" butt

---

<a name="ref-staleness"></a>
## e-ref staleness and when to prefer q-handles

`e<N>` tokens minted by `dusk:snap` are frozen to the Semantics node that was
live at snap time. They become defunct the moment the node leaves the tree, which
happens on any route push, list rebuild, or conditional widget swap. The
`RefRegistry` that backs `e<N>` tokens does NOT re-resolve; calling an action
with a stale `e<N>` returns a `defunct (element no longer mounted)` failure.

`q<N>` handles minted by `dusk:find` store the predicate set instead of the
node, and re-walk the live tree on every action call. They survive navigations,
hot-reloads, and full widget rebuilds as long as the predicate still matches
something in the tree.

**When to reach for `dusk:find` / `q<N>` instead of using the `e<N>` from a
snap:**

- The page might rebuild between snap and action (e.g. Settings pages with
dynamic sections, lists driven by async data).
- The agent will retry an action (gate failure, transient loading state).
- The flow spans more than one navigation hop; an `e<N>` from the previous
screen is always stale after the route change.
- The agent holds a ref across a hot-reload.

The `RefRegistry` is intentionally frozen for `e<N>` (it is a FIFO token store,
not a live observer). There is no mechanism to refresh a stale `e<N>` in place;
the design intent is that `dusk:snap` re-mints the ref after every page change.
For rebuild-prone pages, prefer `dusk:find` / `dusk:observe` from the start.
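
The difference between the two token kinds can be illustrated with a toy model. This is a sketch in plain Dart, not the real `RefRegistry`: `ToyNode`, `FrozenRef`, and `QueryHandle` are hypothetical stand-ins, and the "tree" is a flat list. It only demonstrates why a stored node goes stale after a rebuild while a stored predicate does not.

```dart
/// Toy stand-in for a Semantics node; not the Flutter class.
class ToyNode {
  ToyNode(this.label);
  final String label;
  bool mounted = true;
}

/// Models an `e<N>` token: remembers one node and never looks again.
class FrozenRef {
  FrozenRef(this.node);
  final ToyNode node;

  String act() => node.mounted
      ? 'acted on "${node.label}"'
      : 'defunct (element no longer mounted)';
}

/// Models a `q<N>` handle: remembers the predicate and re-walks the
/// live tree on every action.
class QueryHandle {
  QueryHandle(this.label);
  final String label;

  String act(List<ToyNode> liveTree) {
    for (final ToyNode node in liveTree) {
      if (node.mounted && node.label == label) {
        return 'acted on "${node.label}"';
      }
    }
    return 'no match';
  }
}

void main() {
  List<ToyNode> tree = <ToyNode>[ToyNode('Save')];
  final FrozenRef eRef = FrozenRef(tree.first);
  final QueryHandle qHandle = QueryHandle('Save');

  // Simulate a rebuild: the old node leaves the tree and an
  // equivalent new node takes its place.
  tree.first.mounted = false;
  tree = <ToyNode>[ToyNode('Save')];

  print(eRef.act()); // defunct (element no longer mounted)
  print(qHandle.act(tree)); // acted on "Save"
}
```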

---

<a name="semantics-label-over-match"></a>
## Avoiding `--semanticsLabel` over-match

`--semanticsLabel` performs an exact, case-sensitive match against
`SemanticsNode.label`. When two or more nodes carry the same label (e.g. two
`TextField` widgets both labelled `Password` on a sign-up form, or a list of
repeated row controls), the handle resolves to the first node in tree order,
which may not be the intended target.

The `matchCount` field in the response tells the agent how many nodes matched.
A `diagnostic` key appears when `matchCount > 1`, e.g.:

```
label 'Password' matched 2 nodes; refine with --key, --text, or --contains
```
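
The counting behaviour can be sketched on a toy tree. This is a simplified illustration in plain Dart, assuming a pre-order walk; `ToyNode` and `findWithCount` are hypothetical names, and the real handler walks Flutter's Semantics tree rather than this structure. The point is that the walk keeps the first match but does not stop early, so the count covers every matching node.

```dart
/// Toy tree node; a stand-in for a Semantics node, not the Flutter class.
class ToyNode {
  ToyNode(this.label, [this.children = const <ToyNode>[]]);
  final String label;
  final List<ToyNode> children;
}

/// Returns the first node (pre-order) whose label equals [needle], the
/// total number of matching nodes, and a diagnostic when ambiguous.
(ToyNode?, int, String?) findWithCount(ToyNode root, String needle) {
  ToyNode? first;
  int count = 0;

  void visit(ToyNode node) {
    if (node.label == needle) {
      count += 1;
      first ??= node; // keep only the first match
    }
    // No early exit: the whole tree is walked so the count is complete.
    node.children.forEach(visit);
  }

  visit(root);
  final String? diagnostic = count > 1
      ? "label '$needle' matched $count nodes; "
          'refine with --key, --text, or --contains'
      : null;
  return (first, count, diagnostic);
}

void main() {
  final ToyNode form = ToyNode('Sign up', <ToyNode>[
    ToyNode('Email'),
    ToyNode('Password'),
    ToyNode('Password'),
  ]);
  final (ToyNode? node, int count, String? diagnostic) =
      findWithCount(form, 'Password');
  print(node?.label); // Password
  print(count); // 2
  print(diagnostic);
}
```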

**Disambiguation strategies (most to least precise):**

1. Add `--key=<widget-key>` when the widget carries a `ValueKey`. This is the
most precise predicate and survives label changes.
2. Combine `--semanticsLabel=Password --text=Confirm` when the second node has
distinct visible text (some widgets expose both a label and a text value).
3. Use `--contains=<unique-substring>` when only part of the label is unique
across the matching nodes.
4. Use `dusk:observe` with a narrow `intent` and inspect the returned candidate
list; each candidate includes role, bounds, and enricher fields that let the
agent identify the correct target before minting the handle.

---

<a name="see-also"></a>
## See also

5 changes: 1 addition & 4 deletions lib/src/commands/dusk_doctor_command.dart
@@ -234,10 +234,7 @@ class DuskDoctorCommand extends ArtisanCommand {
const String label = 'snapshot enrichers';
final int count = enrichersProbe();
if (count == 0) {
ctx.output.warning(
'$label: no enrichers registered; install Magic + Wind integrations '
'for richer snapshots',
);
ctx.output.info('$label: enrichers are opt-in; none registered');
return;
}
ctx.output.success('$label: enrichers registered: $count');
119 changes: 79 additions & 40 deletions lib/src/extensions/ext_find.dart
@@ -34,8 +34,11 @@ void registerFindExtension() {
/// live Semantics + Element tree once to verify the predicates resolve to
/// a node, then mints a `q<N>` handle backed by the stored predicate set.
///
/// On first match returns `{"ref": "q<N>", "matched": true}`. When no node
/// matches, returns `{"ref": null, "matched": false}` — no handle is minted.
/// On first match returns `{"ref": "q<N>", "matched": true, "matchCount": N}`.
/// When `matchCount > 1`, an additional `diagnostic` key carries a
/// human-readable hint so agents know to disambiguate with `--text`,
/// `--contains`, or a widget `--key`. When no node matches, returns
/// `{"ref": null, "matched": false}` — no handle is minted.
///
/// The handle is opaque from the agent's perspective: passing it back to
/// `ext.dusk.tap` etc. triggers a fresh tree walk at that moment, so a
@@ -78,7 +81,8 @@ Future<developer.ServiceExtensionResponse> extDuskFindHandler(
// NOT store the resolved RefEntry — the handle re-executes the
// walk on every action call so the agent gets the latest rect /
// element after intermediate rebuilds.
final RefEntry? entry = resolveQuery(query);
final (RefEntry? entry, int matchCount, String? diagnostic) =
resolveQueryWithCount(query);
if (entry == null) {
return developer.ServiceExtensionResponse.result(
jsonEncode(<String, dynamic>{
@@ -92,12 +96,17 @@ Future<developer.ServiceExtensionResponse> extDuskFindHandler(
// (no groupId scope; action handlers rebuild RefEntry on call).
final String token = RefRegistry.registerQuery(query);

return developer.ServiceExtensionResponse.result(
jsonEncode(<String, dynamic>{
'ref': token,
'matched': true,
}),
);
final Map<String, dynamic> payload = <String, dynamic>{
'ref': token,
'matched': true,
'matchCount': matchCount,
};
// 4. Surface ambiguity diagnostic when more than one node matched.
if (diagnostic != null) {
payload['diagnostic'] = diagnostic;
}

return developer.ServiceExtensionResponse.result(jsonEncode(payload));
} catch (e, stackTrace) {
developer.log(
'[fluttersdk_dusk] ext.dusk.find error: $e\n$stackTrace',
@@ -134,35 +143,65 @@ Future<developer.ServiceExtensionResponse> extDuskFindHandler(
/// When multiple predicates are set they all must match the same node /
/// element (intersection).
RefEntry? resolveQuery(DuskQuery query) {
return resolveQueryWithCount(query).$1;
}

/// Variant of [resolveQuery] that also returns the total number of Semantics
/// nodes that matched the label predicate and an optional ambiguity
/// diagnostic.
///
/// Returns a record `(entry, matchCount, diagnostic)`:
/// - `entry` — first match, or `null` when nothing matched.
/// - `matchCount` — number of nodes that matched (1 on a single match, 0
/// when nothing matches; only meaningful for the `semanticsLabel` / `text`
/// Semantics-walk paths; key-based and text-only Element paths return 1 on
/// a match, 0 on no match).
/// - `diagnostic` — non-null only when `matchCount > 1`; a message suitable
/// for surfacing to an agent, e.g. `label 'Password' matched 2 nodes;
/// refine with --key, --text, or --contains`.
///
/// Single-match and no-match behaviour is identical to [resolveQuery];
/// callers that do not need ambiguity detection may use [resolveQuery]
/// directly.
(RefEntry?, int, String?) resolveQueryWithCount(DuskQuery query) {
// 1. Key-based match: Element tree walk. Cheapest, most specific.
if (query.keyValue != null) {
final Element? element = _findElementByKey(query.keyValue!);
if (element == null) return null;
if (!_elementMatchesOtherPredicates(element, query)) return null;
return _entryFromElement(element);
if (element == null) return (null, 0, null);
if (!_elementMatchesOtherPredicates(element, query)) return (null, 0, null);
return (_entryFromElement(element), 1, null);
}

// 2. Semantics-label match: walk the Semantics tree first because it
// surfaces merged accessibility labels (Button "Submit" with no
// Text descendant still resolves).
if (query.semanticsLabel != null) {
final SemanticsNode? node =
_findSemanticsNodeByLabel(query.semanticsLabel!);
if (node == null) return null;
return _entryFromSemanticsNode(node);
final (SemanticsNode? node, int count) =
_findSemanticsNodeByLabelWithCount(query.semanticsLabel!);
if (node == null) return (null, 0, null);
final String? diagnostic = count > 1
? "label '${query.semanticsLabel}' matched $count nodes; "
'refine with --key, --text, or --contains'
: null;
return (_entryFromSemanticsNode(node), count, diagnostic);
}

// 3. text-only match: Semantics-label first (covers labelled widgets
// where the visible text is the accessibility label), then Element-
// tree Text widget fallback.
if (query.text != null) {
final SemanticsNode? node = _findSemanticsNodeByLabel(query.text!);
final (SemanticsNode? node, int count) =
_findSemanticsNodeByLabelWithCount(query.text!);
if (node != null) {
return _entryFromSemanticsNode(node);
final String? diagnostic = count > 1
? "label '${query.text}' matched $count nodes; "
'refine with --key, --text, or --contains'
: null;
return (_entryFromSemanticsNode(node), count, diagnostic);
}
final Element? element = _findElementByTextData(query.text!);
if (element == null) return null;
return _entryFromElement(element);
if (element == null) return (null, 0, null);
return (_entryFromElement(element), 1, null);
}

// 4. containsText match: substring search across Semantics labels then
@@ -172,14 +211,14 @@ RefEntry? resolveQuery(DuskQuery query) {
final SemanticsNode? node =
_findSemanticsNodeByLabelContains(query.containsText!);
if (node != null) {
return _entryFromSemanticsNode(node);
return (_entryFromSemanticsNode(node), 1, null);
}
final Element? element = _findElementByTextContains(query.containsText!);
if (element == null) return null;
return _entryFromElement(element);
if (element == null) return (null, 0, null);
return (_entryFromElement(element), 1, null);
}

return null;
return (null, 0, null);
}

// ---------------------------------------------------------------------------
@@ -289,41 +328,41 @@ SemanticsNode? _findSemanticsNodeByLabelContains(String needle) {
return found;
}

/// Walks the live Semantics tree and returns the first node whose [label]
/// equals [needle].
/// Walks the live Semantics tree and counts ALL nodes whose [label] equals
/// [needle], returning the first match alongside the total count.
///
/// Production-bound widget trees expose their semantics owner via
/// `RendererBinding.instance.rootPipelineOwner.semanticsOwner`. The Flutter
/// test harness, however, mounts the widget tree under a CHILD pipeline
/// owner attached to the test view (see `ext_snapshot_dispatcher_test.dart`
/// docs for the rationale). We walk the root owner first, then every child
/// owner registered under it, so this helper works in BOTH environments.
SemanticsNode? _findSemanticsNodeByLabel(String needle) {
/// test harness mounts the widget tree under a CHILD pipeline owner attached
/// to the test view, so the walk covers the root owner and all child owners.
///
/// The walk never stops early after finding the first node, so the returned
/// count reflects ALL matches in the tree. When `count > 1` the caller
/// should surface an ambiguity diagnostic to the agent.
(SemanticsNode?, int) _findSemanticsNodeByLabelWithCount(String needle) {
SemanticsNode? found;
int count = 0;

void visit(SemanticsNode node) {
if (found != null) return;
if (node.label == needle) {
found = node;
return;
count += 1;
found ??= node;
}
node.visitChildren((SemanticsNode child) {
visit(child);
return found == null;
// Always continue walking to collect the full count.
return true;
});
}

void visitOwner(PipelineOwner owner) {
if (found != null) return;
final SemanticsNode? root = owner.semanticsOwner?.rootSemanticsNode;
if (root != null) visit(root);
owner.visitChildren((PipelineOwner child) {
if (found == null) visitOwner(child);
});
owner.visitChildren(visitOwner);
}

visitOwner(RendererBinding.instance.rootPipelineOwner);
return found;
return (found, count);
}

/// Cross-checks an Element-tree match against the supplied query's
20 changes: 18 additions & 2 deletions skills/fluttersdk-dusk/SKILL.md
@@ -1,11 +1,11 @@
---
name: fluttersdk-dusk
description: "fluttersdk_dusk: E2E driver for Flutter apps that lets an LLM agent see (snap, observe, screenshot) and act (tap, type, drag, scroll, navigate) on a running Flutter app via 33 MCP tools (`dusk_*`) and 34 matching CLI commands (`./bin/fsa dusk:*`). Snapshots emit a YAML Semantics tree with stable `[ref=eN]` tokens; `dusk_find` and `dusk_observe` mint re-resolvable `q<N>` query handles. Every gesture passes a 6-step actionability gate with substring-parseable failure reasons (`not enabled`, `zero rect`, `off-viewport`, `not stable`, `obscured by`, `defunct`). TRIGGER when: any `dusk_*` MCP tool call, any `dusk:*` CLI command, `./bin/fsa` invocation, the user asks the agent to drive / inspect / test / debug a running Flutter app, the user mentions snap / observe / actionability / ref / eN / qN, or the conversation touches end-to-end testing of a Flutter UI. DO NOT TRIGGER when: only authoring `flutter_test` widget tests, only reading telescope ring buffers without driving the UI (use fluttersdk-telescope), or only modifying Dart source without running it."
version: 0.0.8
version: 0.0.9
when_to_use: "Any task where the agent drives or inspects a running Flutter app via dusk: calling `dusk_*` MCP tools in a loop (snap, tap, type, screenshot, hot_reload_and_snap), invoking `./bin/fsa dusk:<verb>` from a shell, recovering from an actionability failure, choosing between `e<N>` and `q<N>` ref tokens, waiting for text or network idle, navigating routes, or filling a form."
---

<!-- fluttersdk_dusk v0.0.8 | Skill updated: 2026-06-17 -->
<!-- fluttersdk_dusk v0.0.9 | Skill updated: 2026-06-25 -->

# fluttersdk_dusk

@@ -189,6 +189,22 @@ Default: snap returns `e<N>`; use them inline. Switch to `dusk_find` /
`dusk_observe` and `q<N>` the moment the agent enters a retry or
multi-step flow against the same target.

**e-ref staleness on rebuild-prone pages.** `e<N>` tokens are frozen to
the Semantics node at snap time. The `RefRegistry` backing them does NOT
re-resolve; on pages that rebuild (Settings, lists driven by async data,
any page with conditional sections), use `dusk_find` from the start
instead of snapping and then regretting the stale `defunct` failure.

**`--semanticsLabel` over-match.** `dusk_find { semanticsLabel: "X" }`
exact-matches against the accessibility label and resolves to the FIRST node;
ambiguity is now surfaced via `matchCount` and `diagnostic` in the response.
On forms with repeated labels ("Password" and "Confirm Password" both labelled
"Password"), the handle points at the first match. Check the `matchCount`
field in the response; when `> 1`, read the `diagnostic` key and add a second
predicate (`--key`, `--text`, or `--contains`) before acting. Full
disambiguation table: `references/actionability-and-refs.md` section
"semanticsLabel exact-match and over-match".

## 5. Quick install + doctor (when dusk is missing)

If `./bin/fsa dusk:snap` returns "VM Service URI absent" (or any