Commit 2ead356

author
SentienceDEV
committed
porting agent to TS
1 parent a236fe7 commit 2ead356

26 files changed

Lines changed: 7060 additions & 25 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -46,4 +46,5 @@ playground
4646

4747
# Temporary directories from sync workflows
4848
extension-temp/
49-
49+
playground
50+
.env

package-lock.json

Lines changed: 16 additions & 2 deletions

package.json

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@
4848
"@types/uuid": "^9.0.0",
4949
"@typescript-eslint/eslint-plugin": "^8.51.0",
5050
"@typescript-eslint/parser": "^8.51.0",
51+
"dotenv": "^17.4.2",
5152
"eslint": "^9.39.2",
5253
"eslint-config-prettier": "^10.1.8",
5354
"eslint-plugin-prettier": "^5.5.4",
Lines changed: 333 additions & 0 deletions
@@ -0,0 +1,333 @@
# PlannerExecutorAgent: Deferred Features

**Date:** 2026-04-13
**Status:** Documentation for post-MVP implementation

## Overview

This document outlines features from the Python `PlannerExecutorAgent` that were deferred from the TypeScript MVP port. These features add reliability and flexibility but are not required for basic browser automation tasks.

## MVP Implementation Summary

The TypeScript MVP includes:

1. **Core Agent (~600 lines)**
   - Stepwise (ReAct-style) planning loop
   - Action parsing (CLICK, TYPE, SCROLL, PRESS, DONE)
   - Compact context formatting for small models
   - Token usage tracking by role and model

2. **Reliability Features (~200 lines)**
   - Snapshot escalation (progressive limit increase)
   - Pre-action authorization hook (for sidecar policy)
   - Basic error handling and retry

3. **Configuration (~250 lines)**
   - `PlannerExecutorConfig` with presets
   - `SnapshotEscalationConfig`, `RetryConfig`, `StepwisePlanningConfig`
   - Factory helpers for provider creation
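To make "progressive limit increase" concrete, here is a minimal sketch of how an escalation schedule could be computed. The field names `initialLimit`, `limitStep`, and `maxLimit` are hypothetical and are not taken from the actual `SnapshotEscalationConfig`, whose fields this document does not list.

```typescript
// Hypothetical config shape for illustration only; the real
// SnapshotEscalationConfig may use different fields and defaults.
interface EscalationSketchConfig {
  initialLimit: number; // element limit for the first snapshot attempt
  limitStep: number; // how much to raise the limit per retry
  maxLimit: number; // upper bound, always used as the final attempt
}

// Return the sequence of element limits to try, smallest first.
function escalationLimits(cfg: EscalationSketchConfig): number[] {
  if (cfg.limitStep <= 0) {
    throw new Error('limitStep must be positive');
  }
  const limits: number[] = [];
  for (let limit = cfg.initialLimit; limit < cfg.maxLimit; limit += cfg.limitStep) {
    limits.push(limit);
  }
  limits.push(cfg.maxLimit);
  return limits;
}
```

A caller would take a snapshot at each limit in turn and stop at the first one that contains the element it needs, so small models see the compact snapshot in the common case.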

## Deferred Features

### 1. Modal/Overlay Dismissal

**Python Reference:** `ModalDismissalConfig`, `_attempt_modal_dismissal()`

**Description:** Automatically dismiss blocking overlays after DOM changes:

- Product protection/warranty upsells
- Cookie consent banners
- Newsletter signup popups
- Promotional overlays
- Cart upsell drawers

**Implementation Effort:** ~150 lines

**Config Interface:**

```typescript
interface ModalDismissalConfig {
  enabled: boolean;
  dismissPatterns: string[]; // e.g., ['close', 'no thanks', 'skip']
  dismissIcons: string[]; // e.g., ['×', '✕', 'x']
  roleFilter: string[]; // e.g., ['button', 'link']
  maxAttempts: number;
  minNewElements: number; // Minimum DOM changes to trigger
}
```

**Key Logic:**

- Detect DOM changes after CLICK actions
- Find buttons matching dismissal patterns (word-boundary matching)
- Click dismissal button and verify modal closed
- Skip if checkout-related buttons are present
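The matching step above could be sketched as follows. This is an illustration under stated assumptions, not the Python logic: the `SnapshotElement` shape, the `findDismissTarget` name, and the checkout keyword list are all hypothetical.

```typescript
// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  role: string;
  text?: string;
}

// Subset of ModalDismissalConfig needed for matching.
interface DismissalMatchOptions {
  dismissPatterns: string[];
  dismissIcons: string[];
  roleFilter: string[];
}

// Escape regex metacharacters so patterns are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the id of the first element that looks like a dismissal control, or null.
function findDismissTarget(
  elements: SnapshotElement[],
  opts: DismissalMatchOptions
): number | null {
  // Assumed keyword list: if a checkout-style button is visible, leave the overlay alone.
  const checkout = /\b(checkout|place order|pay now)\b/i;
  if (elements.some(el => checkout.test(el.text ?? ''))) {
    return null;
  }

  // Word-boundary matching: 'close' matches "Close" but not "Disclose".
  const patterns = opts.dismissPatterns.map(
    p => new RegExp(`\\b${escapeRegExp(p)}\\b`, 'i')
  );

  for (const el of elements) {
    if (!opts.roleFilter.includes(el.role)) continue;
    const text = (el.text ?? '').trim();
    if (text === '') continue;
    // Icons are compared as whole labels because \b does not apply to symbols like '×'.
    const isIcon = opts.dismissIcons.some(
      icon => icon.toLowerCase() === text.toLowerCase()
    );
    if (isIcon || patterns.some(re => re.test(text))) {
      return el.id;
    }
  }
  return null;
}
```

Comparing icons against the whole label rather than as substrings keeps a bare `x` icon from matching every label that happens to contain the letter.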

---

### 2. Captcha Handling

**Python Reference:** `CaptchaConfig`, `_detect_captcha()`, `_handle_captcha()`

**Description:** Detect and handle CAPTCHAs during automation:

- Policy options: `abort`, `callback`, `pause`
- Support for external solving services
- Detection via element text and patterns

**Implementation Effort:** ~100 lines

**Config Interface:**

```typescript
interface CaptchaConfig {
  enabled: boolean;
  policy: 'abort' | 'callback' | 'pause';
  detectionPatterns: string[];
  solverCallback?: (imageBase64: string) => Promise<string>;
  maxWaitMs: number;
}
```

**Key Logic:**

- Check snapshot elements for CAPTCHA indicators
- Based on policy: abort task, call external solver, or pause for human
- Resume automation after CAPTCHA solved
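A rough sketch of the detection and policy dispatch described above. The `CaptchaOutcome` type, the `withTimeout` helper, and the use of `maxWaitMs` as a solver timeout are assumptions made for illustration; the Python reference may behave differently.

```typescript
// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  text?: string;
}

interface CaptchaConfig {
  enabled: boolean;
  policy: 'abort' | 'callback' | 'pause';
  detectionPatterns: string[];
  solverCallback?: (imageBase64: string) => Promise<string>;
  maxWaitMs: number;
}

// Illustrative result type, not part of the Python reference.
type CaptchaOutcome =
  | { kind: 'none' }
  | { kind: 'aborted'; reason: string }
  | { kind: 'solved'; answer: string }
  | { kind: 'paused' };

// Case-insensitive substring match of any pattern against element text.
function detectCaptcha(elements: SnapshotElement[], patterns: string[]): boolean {
  const needles = patterns.map(p => p.toLowerCase());
  return elements.some(el => {
    const text = (el.text ?? '').toLowerCase();
    return needles.some(n => text.includes(n));
  });
}

// Resolve with the promise's value, or null if it takes longer than ms.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    promise.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); }
    );
  });
}

async function handleCaptcha(
  elements: SnapshotElement[],
  config: CaptchaConfig,
  screenshotBase64: string
): Promise<CaptchaOutcome> {
  if (!config.enabled || !detectCaptcha(elements, config.detectionPatterns)) {
    return { kind: 'none' };
  }
  switch (config.policy) {
    case 'abort':
      return { kind: 'aborted', reason: 'captcha detected' };
    case 'pause':
      // The caller is expected to wait for a human, then resume.
      return { kind: 'paused' };
    case 'callback': {
      if (!config.solverCallback) {
        return { kind: 'aborted', reason: 'no solverCallback configured' };
      }
      const answer = await withTimeout(
        config.solverCallback(screenshotBase64),
        config.maxWaitMs
      );
      return answer === null
        ? { kind: 'aborted', reason: 'solver timed out' }
        : { kind: 'solved', answer };
    }
  }
}
```

Returning a tagged outcome instead of throwing lets the agent loop decide how to resume, which keeps the three policies on one code path.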

---

### 3. Vision Fallback

**Python Reference:** `VisionFallbackConfig`, vision_executor, vision_verifier

**Description:** Use vision-capable models when DOM-based automation fails:

- Canvas pages with no accessible elements
- Low element confidence scores
- Complex visual layouts

**Implementation Effort:** ~200 lines

**Config Interface:**

```typescript
interface VisionFallbackConfig {
  enabled: boolean;
  maxVisionCalls: number;
  triggerRequiresVision: boolean;
  triggerCanvasOrLowActionables: boolean;
  canvasDetectionThreshold: number;
  lowActionablesThreshold: number;
}
```

**Key Logic:**

- Detect snapshot failures (low elements, canvas pages)
- Switch to vision executor with screenshot input
- Use vision verifier for state verification
- Fall back gracefully to DOM mode when possible
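The trigger decision could look roughly like this. The threshold semantics are assumptions (canvas count at or above `canvasDetectionThreshold`, actionable count below `lowActionablesThreshold`), as are the `tag` field, the `VisionRunState` type, and the list of actionable roles.

```typescript
// Hypothetical element shape; `tag` is assumed to carry the HTML tag name.
interface SnapshotElement {
  id: number;
  role: string;
  tag?: string;
}

interface VisionFallbackConfig {
  enabled: boolean;
  maxVisionCalls: number;
  triggerRequiresVision: boolean;
  triggerCanvasOrLowActionables: boolean;
  canvasDetectionThreshold: number;
  lowActionablesThreshold: number;
}

// Illustrative per-run state, not part of the Python reference.
interface VisionRunState {
  visionCallsUsed: number;
  plannerRequestedVision: boolean;
}

// Assumed set of roles that count as actionable.
const ACTIONABLE_ROLES = ['button', 'link', 'textbox', 'checkbox', 'combobox'];

function shouldUseVision(
  elements: SnapshotElement[],
  config: VisionFallbackConfig,
  state: VisionRunState
): boolean {
  if (!config.enabled) return false;
  // Respect the call budget before evaluating any trigger.
  if (state.visionCallsUsed >= config.maxVisionCalls) return false;

  if (config.triggerRequiresVision && state.plannerRequestedVision) return true;

  if (config.triggerCanvasOrLowActionables) {
    const canvasCount = elements.filter(el => el.tag === 'canvas').length;
    const actionableCount = elements.filter(el =>
      ACTIONABLE_ROLES.includes(el.role)
    ).length;
    if (canvasCount >= config.canvasDetectionThreshold) return true;
    if (actionableCount < config.lowActionablesThreshold) return true;
  }
  return false;
}
```

Because the check is evaluated per step, the agent returns to DOM mode on its own once a later snapshot has enough actionable elements.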

---

### 4. Intent Heuristics

**Python Reference:** `IntentHeuristics` protocol, `_try_intent_heuristics()`

**Description:** Pluggable domain-specific element selection without an LLM call:

- E-commerce: "Add to Cart", "Checkout" buttons
- Authentication: login forms, password fields
- Search: search boxes, result links

**Implementation Effort:** ~100 lines

**Interface:**

```typescript
interface IntentHeuristics {
  findElementForIntent(
    intent: string,
    elements: SnapshotElement[],
    url: string,
    goal: string
  ): number | null;

  priorityOrder(): string[];
}

// Example implementation
class EcommerceHeuristics implements IntentHeuristics {
  // Parameter types are spelled out because TypeScript does not infer
  // them from the `implements` clause.
  findElementForIntent(
    intent: string,
    elements: SnapshotElement[],
    url: string,
    goal: string
  ): number | null {
    if (intent.toLowerCase().includes('add to cart')) {
      const btn = elements.find(el => el.text?.toLowerCase().includes('add to cart'));
      return btn?.id ?? null;
    }
    return null; // Fall back to LLM
  }

  priorityOrder(): string[] {
    return ['add_to_cart', 'checkout', 'search'];
  }
}
```

**Key Logic:**

- Check heuristics before calling executor LLM
- Reduces token usage for common patterns
- Improves reliability for known sites
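The "check heuristics first" step could be wired in as below. This is a sketch only: `tryIntentHeuristics` borrows its name from the Python reference but is not a port of it, and the `SnapshotElement` shape and the id validation are assumptions.

```typescript
// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  text?: string;
}

interface IntentHeuristics {
  findElementForIntent(
    intent: string,
    elements: SnapshotElement[],
    url: string,
    goal: string
  ): number | null;
  priorityOrder(): string[];
}

// Ask each heuristic in turn; return the first id that exists in the
// current snapshot, or null so the caller can fall back to the executor LLM.
function tryIntentHeuristics(
  heuristics: IntentHeuristics[],
  intent: string,
  elements: SnapshotElement[],
  url: string,
  goal: string
): number | null {
  const knownIds = new Set(elements.map(el => el.id));
  for (const h of heuristics) {
    const id = h.findElementForIntent(intent, elements, url, goal);
    // Guard against a heuristic returning an id that is not in this snapshot.
    if (id !== null && knownIds.has(id)) {
      return id;
    }
  }
  return null;
}

// Example heuristic written as an object literal.
const cartHeuristic: IntentHeuristics = {
  findElementForIntent(intent, elements) {
    if (!intent.toLowerCase().includes('add to cart')) return null;
    const btn = elements.find(el =>
      (el.text ?? '').toLowerCase().includes('add to cart')
    );
    return btn ? btn.id : null;
  },
  priorityOrder() {
    return ['add_to_cart'];
  },
};
```

A `null` result means "no opinion", so the executor LLM is only called when no heuristic matches; that is where the token savings would come from.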

---

### 5. Recovery Navigation

**Python Reference:** `RecoveryNavigationConfig`, `_last_known_good_url`

**Description:** Track and recover from off-track navigation:

- Remember last URL where verification passed
- Navigate back when subsequent steps fail
- Detect when agent is lost

**Implementation Effort:** ~80 lines

**Config Interface:**

```typescript
interface RecoveryNavigationConfig {
  enabled: boolean;
  maxRecoveryAttempts: number;
  trackSuccessfulUrls: boolean;
}
```

**Key Logic:**

- Store URL after successful verification
- On repeated failures, navigate back to last good URL
- Replan from recovered state
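The bookkeeping above could be sketched as a small tracker. The `RecoveryTracker` class and its `failureThreshold` parameter (how many consecutive failures count as "repeated") are hypothetical; the document does not say how the Python agent defines that threshold.

```typescript
interface RecoveryNavigationConfig {
  enabled: boolean;
  maxRecoveryAttempts: number;
  trackSuccessfulUrls: boolean;
}

// Illustrative tracker, not a port of the Python implementation.
class RecoveryTracker {
  private readonly config: RecoveryNavigationConfig;
  private readonly failureThreshold: number; // assumed knob
  private lastKnownGoodUrl: string | null = null;
  private consecutiveFailures = 0;
  private recoveryAttempts = 0;

  constructor(config: RecoveryNavigationConfig, failureThreshold = 2) {
    this.config = config;
    this.failureThreshold = failureThreshold;
  }

  // Call after a step's verification passes.
  recordSuccess(url: string): void {
    if (this.config.trackSuccessfulUrls) {
      this.lastKnownGoodUrl = url;
    }
    this.consecutiveFailures = 0;
  }

  // Call after a step fails. Returns the URL to navigate back to,
  // or null if no recovery should be attempted yet (or anymore).
  recordFailure(): string | null {
    this.consecutiveFailures += 1;
    if (
      !this.config.enabled ||
      this.lastKnownGoodUrl === null ||
      this.consecutiveFailures < this.failureThreshold ||
      this.recoveryAttempts >= this.config.maxRecoveryAttempts
    ) {
      return null;
    }
    this.recoveryAttempts += 1;
    this.consecutiveFailures = 0;
    return this.lastKnownGoodUrl;
  }
}
```

When `recordFailure()` returns a URL, the agent would navigate there and replan; a `null` after the attempt budget is spent is the signal to give up rather than loop.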

---

### 6. Checkout/Auth Boundary Detection

**Python Reference:** `CheckoutDetectionConfig`, `AuthBoundaryConfig`

**Description:** Detect when the agent reaches a boundary that requires human intervention:

- Checkout pages requiring payment info
- Login/signup pages requiring credentials
- Age verification gates

**Implementation Effort:** ~60 lines

**Config Interface:**

```typescript
interface CheckoutDetectionConfig {
  enabled: boolean;
  urlPatterns: string[]; // e.g., ['/checkout', '/payment']
  elementPatterns: string[]; // e.g., ['credit card', 'payment']
  stopOnDetection: boolean;
}

interface AuthBoundaryConfig {
  enabled: boolean;
  urlPatterns: string[]; // e.g., ['/login', '/signin']
  elementPatterns: string[]; // e.g., ['sign in', 'log in']
  stopOnDetection: boolean;
}
```
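Since both configs share the same fields, one matcher can serve both. The sketch below assumes that a URL match or an element-text match is enough to flag a boundary and that checkout is checked before auth; neither rule is confirmed by the Python reference.

```typescript
// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  text?: string;
}

// Common shape of CheckoutDetectionConfig and AuthBoundaryConfig.
interface BoundaryConfig {
  enabled: boolean;
  urlPatterns: string[];
  elementPatterns: string[];
  stopOnDetection: boolean;
}

type BoundaryKind = 'checkout' | 'auth';

// Case-insensitive substring match against the URL, then element text.
function matchesBoundary(
  url: string,
  elements: SnapshotElement[],
  config: BoundaryConfig
): boolean {
  if (!config.enabled) return false;
  const lowerUrl = url.toLowerCase();
  if (config.urlPatterns.some(p => lowerUrl.includes(p.toLowerCase()))) {
    return true;
  }
  return elements.some(el => {
    const text = (el.text ?? '').toLowerCase();
    return config.elementPatterns.some(p => text.includes(p.toLowerCase()));
  });
}

// Return the first boundary hit (checkout before auth), or null.
function detectBoundary(
  url: string,
  elements: SnapshotElement[],
  checkout: BoundaryConfig,
  auth: BoundaryConfig
): { kind: BoundaryKind; stop: boolean } | null {
  if (matchesBoundary(url, elements, checkout)) {
    return { kind: 'checkout', stop: checkout.stopOnDetection };
  }
  if (matchesBoundary(url, elements, auth)) {
    return { kind: 'auth', stop: auth.stopOnDetection };
  }
  return null;
}
```

Element patterns such as `'sign in'` also match a header link on ordinary pages, so a real implementation may want to require both a URL and an element match, or restrict element matching to form controls.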
243+
244+
---
245+
246+
### 7. Executor Override
247+
248+
**Python Reference:** `ExecutorOverride` protocol
249+
250+
**Description:** Validate or override executor's element choices before action:
251+
252+
- Safety checks (block delete buttons)
253+
- Domain-specific corrections
254+
- Audit logging
255+
256+
**Implementation Effort:** ~50 lines
257+
258+
**Interface:**
259+
260+
```typescript
261+
interface ExecutorOverride {
262+
validateChoice(
263+
elementId: number,
264+
action: string,
265+
elements: SnapshotElement[],
266+
goal: string
267+
): {
268+
valid: boolean;
269+
overrideElementId?: number;
270+
rejectionReason?: string;
271+
};
272+
}
273+
```
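As an example of the safety-check use case, an override might reject clicks on destructive controls unless the goal itself asks for them. The `DestructiveActionGuard` class, the keyword list, and the `SnapshotElement` shape are hypothetical; `OverrideResult` simply names the inline return type from the interface above.

```typescript
// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  role: string;
  text?: string;
}

// Named alias for the interface's inline return type.
interface OverrideResult {
  valid: boolean;
  overrideElementId?: number;
  rejectionReason?: string;
}

interface ExecutorOverride {
  validateChoice(
    elementId: number,
    action: string,
    elements: SnapshotElement[],
    goal: string
  ): OverrideResult;
}

// Illustrative guard: blocks CLICKs on destructive buttons unless the
// goal text itself mentions a destructive verb.
class DestructiveActionGuard implements ExecutorOverride {
  // Assumed keyword list; a real deployment would tune this per domain.
  private readonly destructive = /\b(delete|remove)\b/i;

  validateChoice(
    elementId: number,
    action: string,
    elements: SnapshotElement[],
    goal: string
  ): OverrideResult {
    const target = elements.find(el => el.id === elementId);
    if (!target) {
      return {
        valid: false,
        rejectionReason: `element ${elementId} is not in the snapshot`,
      };
    }
    const label = target.text ?? '';
    if (
      action === 'CLICK' &&
      this.destructive.test(label) &&
      !this.destructive.test(goal)
    ) {
      return {
        valid: false,
        rejectionReason: `blocked destructive control "${label}"`,
      };
    }
    return { valid: true };
  }
}
```

The guard runs after the executor has picked an element and before the action fires, so a rejection can be fed back to the executor as a reason to choose again.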

---

### 8. Upfront Planning Mode

**Python Reference:** `plan()`, `replan()` methods

**Description:** Generate full execution plan upfront (alternative to stepwise):

- Better for known workflows
- Supports plan patching on failure
- More efficient for simple tasks

**Implementation Effort:** ~200 lines

**Key Functions:**

- `plan(task, startUrl)` - Generate full plan
- `replan(task, failedStep, reason)` - Patch plan after failure
- `run(runtime, task)` - Execute with upfront planning
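Plan patching could be sketched as below. The `Plan` and `PlanStep` shapes and the `patchPlan` helper are hypothetical, and the sketch assumes that the failed step and everything after it are replaced while completed steps are kept. In the real agent the replacement steps would come from the planner LLM inside `replan()`; here they are passed in directly.

```typescript
// Hypothetical plan shapes for illustration only.
interface PlanStep {
  id: number; // 1-based position in the plan
  description: string;
}

interface Plan {
  task: string;
  steps: PlanStep[];
}

// Keep the steps before the failed one, append the replacement steps,
// and renumber. The input plan is not mutated.
function patchPlan(
  plan: Plan,
  failedStepId: number,
  replacementDescriptions: string[]
): Plan {
  const failedIndex = plan.steps.findIndex(s => s.id === failedStepId);
  if (failedIndex === -1) {
    throw new Error(`step ${failedStepId} is not part of the plan`);
  }
  const keptDescriptions = plan.steps
    .slice(0, failedIndex)
    .map(s => s.description);
  const steps = [...keptDescriptions, ...replacementDescriptions].map(
    (description, i) => ({ id: i + 1, description })
  );
  return { task: plan.task, steps };
}
```

Keeping the completed prefix is what separates a patch from a full replan: the agent resumes from the failed position instead of repeating work that already succeeded.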

---

### 9. Task Category Pruning

**Python Reference:** `PruningTaskCategory`, `prune_with_recovery()`

**Description:** Category-specific element filtering to reduce context size:

- Shopping: prioritize product/cart elements
- Search: prioritize search box/results
- Auth: prioritize form fields

**Implementation Effort:** ~150 lines

**Categories:**

- `shopping`, `checkout`, `search`, `auth`, `form_filling`, `extraction`, `navigation`
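One simple way to realize category-specific filtering is keyword scoring. The sketch below is not the Python `prune_with_recovery()` logic: the keyword table, the `pruneElements` name, and the `SnapshotElement` shape are assumptions chosen to keep the example short.

```typescript
type PruningTaskCategory =
  | 'shopping'
  | 'checkout'
  | 'search'
  | 'auth'
  | 'form_filling'
  | 'extraction'
  | 'navigation';

// Hypothetical minimal element shape; the SDK's SnapshotElement may differ.
interface SnapshotElement {
  id: number;
  role: string;
  text?: string;
}

// Assumed keyword table; categories without an entry get no boost.
const CATEGORY_KEYWORDS: Partial<Record<PruningTaskCategory, string[]>> = {
  shopping: ['add to cart', 'cart', 'buy', 'price'],
  search: ['search', 'results'],
  auth: ['email', 'password', 'sign in', 'log in'],
};

// Keep at most maxElements elements, preferring those whose text matches
// more category keywords. Survivors are returned in original page order.
function pruneElements(
  elements: SnapshotElement[],
  category: PruningTaskCategory,
  maxElements: number
): SnapshotElement[] {
  const keywords = CATEGORY_KEYWORDS[category] ?? [];
  const scored = elements.map((el, index) => {
    const text = (el.text ?? '').toLowerCase();
    const score = keywords.filter(k => text.includes(k)).length;
    return { el, index, score };
  });
  // Higher score first; ties keep page order.
  scored.sort((a, b) => b.score - a.score || a.index - b.index);
  return scored
    .slice(0, maxElements)
    .sort((a, b) => a.index - b.index)
    .map(s => s.el);
}
```

Restoring page order after the cut keeps the compact context readable for the model, since neighbouring elements stay next to each other.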

---

## Implementation Priority

Recommended order based on impact and complexity:

1. **Intent Heuristics** - High impact, low complexity, reduces token usage
2. **Modal Dismissal** - Common pain point, medium complexity
3. **Vision Fallback** - Required for canvas/complex pages
4. **Captcha Handling** - Needed for production use
5. **Recovery Navigation** - Improves reliability
6. **Upfront Planning** - Alternative mode for simple tasks
7. **Boundary Detection** - Nice to have for graceful stops
8. **Executor Override** - Nice to have for safety
9. **Task Category Pruning** - Optimization for large pages

## References

- Python implementation: `sdk-python/predicate/agents/planner_executor_agent.py`
- Design doc: `docs/sdk-ts-doc/2026-03-28_planner_executor_agent_port.md`
- Chrome extension feasibility: `docs/sdk-python-doc/2026-04-13_predicate_chrome_extension_agent_feasibility.md`
