predicate-sdk-playground/planner_executor_local at main · PredicateSystems/predicate-sdk-playground


Planner + Executor (Local)

This folder is a starter for a planner + executor Amazon flow, modeled after amazon_shopping_with_assertions (snapshots + AgentRuntime assertions per step).

Planner: Qwen 3.5 9B (MLX 4-bit quantized) — produces a step plan
Executor: Qwen 3.5 4B (MLX 4-bit quantized) — executes each step deterministically

Upgrade note: We now use Qwen 3.5 models (9B planner, 4B executor) which provide better JSON output and reasoning compared to the older Qwen 2.5 models (7B/3B). The MLX 4-bit quantization runs efficiently on Apple Silicon.

Model Recommendation (Apple Silicon)

Default: MLX with 4-bit quantized Qwen 3.5 models.

Apple Silicon (recommended): Uses MLX backend with mlx-community/Qwen3.5-9B-MLX-4bit (planner) and mlx-community/Qwen3.5-4B-MLX-4bit (executor). Fast and memory-efficient.
CUDA GPU available: You can use HuggingFace Transformers with 4-bit quantization (load_in_4bit=True).
CPU-only: Not recommended for 9B models; expect very slow runtimes.
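The recommendation above amounts to a small decision rule. The sketch below is one way to express it, not the logic in main.py: the function names and the returned note strings are hypothetical, and only the provider names `mlx` and `hf` come from this README.

```python
import platform


def pick_provider(system: str, machine: str, cuda_available: bool) -> tuple:
    """Map platform facts to (provider, note), following the list above."""
    if system == "Darwin" and machine == "arm64":
        # Apple Silicon: MLX backend with 4-bit Qwen 3.5 models.
        return "mlx", "recommended"
    if cuda_available:
        # CUDA GPU: HuggingFace Transformers with 4-bit quantization.
        return "hf", "4-bit via bitsandbytes"
    # CPU-only: still the hf provider, but expect very slow runtimes.
    return "hf", "cpu-only, very slow"


def detect_cuda() -> bool:
    """Return True if PyTorch is installed and reports a CUDA device."""
    try:
        import torch  # optional dependency; absent on many machines
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print(pick_provider(platform.system(), platform.machine(), detect_cuda()))
```

Keeping `pick_provider` free of side effects makes the rule easy to check without the hardware in question.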

Prerequisites (MLX - Apple Silicon)

Install the MLX language model library:

pip install "mlx-lm>=0.31.1"

Important: Version 0.31.1+ is required for Qwen 3.5 model support.
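To confirm the requirement before a long model download, the installed version can be checked from Python. This is a minimal sketch using only the standard library; the helper names `parse_version` and `mlx_lm_supported` are illustrative and not part of mlx-lm or this repo.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_MLX_LM = (0, 31, 1)  # minimum version stated in this README


def parse_version(text: str) -> tuple:
    """Turn '0.31.1' into (0, 31, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.0'
    return tuple(parts)


def mlx_lm_supported() -> bool:
    """Return True if mlx-lm is installed and meets the minimum version."""
    try:
        installed = version("mlx-lm")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_MLX_LM


if __name__ == "__main__":
    print("mlx-lm OK" if mlx_lm_supported() else "install mlx-lm>=0.31.1")
```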

Running on Other Platforms (Linux/Windows)

MLX is Apple Silicon only. For Linux or Windows, use one of the following approaches:

Option 1: NVIDIA CUDA GPU (Recommended for non-Mac)

Use HuggingFace Transformers with 4-bit quantization via bitsandbytes:

Prerequisites:

# Install PyTorch with CUDA support (adjust cuda version as needed)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Install transformers and quantization dependencies
pip install "transformers>=4.40.0" accelerate "bitsandbytes>=0.43.0"

Run with HuggingFace provider:

export PLANNER_PROVIDER=hf \
PLANNER_MODEL=Qwen/Qwen2.5-7B-Instruct \
EXECUTOR_PROVIDER=hf \
EXECUTOR_MODEL=Qwen/Qwen2.5-3B-Instruct
python main.py

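Note: The HuggingFace provider automatically enables 4-bit quantization (load_in_4bit=True) when bitsandbytes is installed. The mlx-community Qwen 3.5 checkpoints are MLX-specific quantizations and will not load in Transformers; to run Qwen 3.5 through this provider, first check that a Transformers-compatible checkpoint is available on HuggingFace.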
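For readers who want to see what "4-bit when bitsandbytes is installed" can look like, here is a hedged sketch built on the public Transformers API. It is not the provider code from this repo: `bitsandbytes_installed` and `load_model` are hypothetical helpers, and the float16 compute dtype is an assumption.

```python
import importlib.util


def bitsandbytes_installed() -> bool:
    """Return True if the bitsandbytes package can be imported."""
    return importlib.util.find_spec("bitsandbytes") is not None


def load_model(model_id: str):
    """Load a causal LM, quantized to 4-bit when bitsandbytes is present."""
    # Heavy imports stay local so the module imports without torch installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    kwargs = {"device_map": "auto"}  # requires the accelerate package
    if bitsandbytes_installed():
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,  # assumption, not from the repo
        )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return model, tokenizer
```

Calling `load_model("Qwen/Qwen2.5-3B-Instruct")` downloads the weights on first use, so it is best tried on the smaller executor model first.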

VRAM requirements (approximate):

Qwen 2.5 7B (4-bit): ~6GB VRAM
Qwen 2.5 3B (4-bit): ~3GB VRAM
Combined: ~10GB VRAM minimum
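As a rough sanity check on these figures, the weights alone take about half a byte per parameter at 4 bits. The sketch below computes that lower bound from nominal parameter counts (7B and 3B, which are approximations). The README's larger numbers presumably also cover the KV cache, activations, and layers kept at higher precision; that breakdown is an assumption, not something measured here.

```python
def weight_gib(params_billion: float, bits: int = 4) -> float:
    """Lower-bound memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for name, size in [("planner (7B)", 7.0), ("executor (3B)", 3.0)]:
        print(f"{name}: at least {weight_gib(size):.1f} GiB for weights")
```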

Option 2: Windows with WSL2

If running on Windows, we recommend using WSL2 (Windows Subsystem for Linux) with Ubuntu:

Install WSL2: wsl --install
Install CUDA drivers for WSL2 from NVIDIA
Follow the Option 1 (NVIDIA CUDA GPU) instructions above inside WSL2

Option 3: CPU-Only (Not Recommended)

Running 7B+ models on CPU is extremely slow (expect 10-30+ seconds per inference).

export PLANNER_PROVIDER=hf \
PLANNER_MODEL=Qwen/Qwen2.5-3B-Instruct \
EXECUTOR_PROVIDER=hf \
EXECUTOR_MODEL=Qwen/Qwen2.5-1.5B-Instruct
python main.py

Warning: CPU inference for the planner will be very slow. Consider using smaller models (1.5B-3B) or a cloud API for the planner while running the executor locally.

Option 4: Cloud API for Planner, Local Executor

For best results on limited hardware, use a cloud API for the planner (reasoning-heavy) and run only the executor locally:

export PLANNER_PROVIDER=openai \
PLANNER_MODEL=gpt-4o-mini \
EXECUTOR_PROVIDER=hf \
EXECUTOR_MODEL=Qwen/Qwen2.5-3B-Instruct
python main.py

This hybrid approach gives you cloud-quality planning with local execution (no screenshots sent to cloud).

Next Step

Run the scaffold:

python main.py

This scaffold includes:

Planner feedback loop: executor failures are summarized back to the planner for a revised plan.
JSON schema validation: plan output is validated against the advanced plan format.
Planner feedback JSONL: per-run file in planner_feedback/<run_id>.jsonl.
Summary JSON: compact summary at planner_feedback/<run_id>.summary.json.
Vision fallback (optional): set ENABLE_VISION_FALLBACK=1.
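The validation and feedback-file items can be pictured with a small standard-library sketch. The real "advanced plan format" is not shown in this README, so the step keys below (`id`, `action`, `verify`) and both function names are hypothetical; only the planner_feedback/<run_id>.jsonl path comes from the list above.

```python
import json
from pathlib import Path

# Hypothetical required keys; the actual plan schema may differ.
REQUIRED_STEP_KEYS = {"id", "action", "verify"}


def validate_plan(plan: dict) -> list:
    """Return a list of error strings; an empty list means the plan passed."""
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["plan must contain a non-empty 'steps' list"]
    errors = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            errors.append(f"step {i}: not an object")
            continue
        missing = REQUIRED_STEP_KEYS - step.keys()
        if missing:
            errors.append(f"step {i}: missing {sorted(missing)}")
    return errors


def append_feedback(run_id: str, event: dict,
                    root: Path = Path("planner_feedback")) -> Path:
    """Append one event as a JSON line to <root>/<run_id>.jsonl."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{run_id}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

In a feedback loop of this shape, the errors from `validate_plan` or a failed step would be written with `append_feedback` and summarized in the next planner prompt.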

Default Models (Qwen 3.5)

The script defaults to Qwen 3.5 MLX models on Apple Silicon:

# Default (no env vars needed on Apple Silicon)
python main.py

This uses:

Planner: mlx-community/Qwen3.5-9B-MLX-4bit
Executor: mlx-community/Qwen3.5-4B-MLX-4bit

Custom Model Configuration

To override models or use different providers:

export PLANNER_PROVIDER=mlx \
PLANNER_MODEL=mlx-community/Qwen3.5-9B-MLX-4bit \
EXECUTOR_PROVIDER=mlx \
EXECUTOR_MODEL=mlx-community/Qwen3.5-4B-MLX-4bit
python main.py

For HuggingFace Transformers (CUDA):

export PLANNER_PROVIDER=hf \
PLANNER_MODEL=Qwen/Qwen2.5-7B-Instruct \
EXECUTOR_PROVIDER=hf \
EXECUTOR_MODEL=Qwen/Qwen2.5-3B-Instruct
python main.py
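The environment variables above can be read with a few lines of Python. This is a sketch of one plausible approach, not the code in main.py: `ModelConfig` and `load_config` are hypothetical names, and treating `mlx` as the default provider is an assumption based on the defaults described in this README.

```python
import os
from dataclasses import dataclass

# Defaults taken from the "Default Models (Qwen 3.5)" section.
DEFAULTS = {
    "PLANNER_PROVIDER": "mlx",
    "PLANNER_MODEL": "mlx-community/Qwen3.5-9B-MLX-4bit",
    "EXECUTOR_PROVIDER": "mlx",
    "EXECUTOR_MODEL": "mlx-community/Qwen3.5-4B-MLX-4bit",
}


@dataclass(frozen=True)
class ModelConfig:
    planner_provider: str
    planner_model: str
    executor_provider: str
    executor_model: str
    vision_fallback: bool


def load_config(env=None) -> ModelConfig:
    """Build a config from env vars, falling back to the defaults above."""
    env = os.environ if env is None else env

    def get(key: str) -> str:
        # Empty strings are treated the same as unset variables.
        return env.get(key) or DEFAULTS[key]

    return ModelConfig(
        planner_provider=get("PLANNER_PROVIDER"),
        planner_model=get("PLANNER_MODEL"),
        executor_provider=get("EXECUTOR_PROVIDER"),
        executor_model=get("EXECUTOR_MODEL"),
        vision_fallback=env.get("ENABLE_VISION_FALLBACK") == "1",
    )
```

Passing a plain dict as `env` makes the override behavior easy to test without touching the real environment.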

Vision Fallback (optional)

By default, the executor is text-only. To enable vision fallback:

ENABLE_VISION_FALLBACK=1 \
VISION_PROVIDER=local \
VISION_MODEL=Qwen/Qwen3-VL-8B-Instruct \
python main.py

On Apple Silicon, you can use MLX-VLM:

ENABLE_VISION_FALLBACK=1 \
VISION_PROVIDER=mlx \
VISION_MODEL=mlx-community/Qwen3-VL-8B-Instruct-3bit \
python main.py

Vision fallback behavior:

If the executor cannot produce a CLICK(<id>) action, vision selects an element ID from the snapshot list.
If required verification fails after a click, vision can re-select a better element ID and retry.
Vision responses are logged as vision_select events in the JSONL feedback.
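The first fallback rule can be sketched as a parse-then-fallback helper. Everything here is illustrative: element IDs are assumed to be integers, `vision_select` stands in for whatever call the vision model exposes, and the function names are not from the repo.

```python
import re
from typing import Callable, Optional, Sequence

# Assumes numeric element IDs, e.g. "CLICK(42)".
CLICK_RE = re.compile(r"CLICK\((\d+)\)")


def parse_click(text: str) -> Optional[int]:
    """Extract the element ID from executor output, or None if absent."""
    match = CLICK_RE.search(text)
    return int(match.group(1)) if match else None


def choose_element(
    executor_output: str,
    candidate_ids: Sequence[int],
    vision_select: Optional[Callable[[Sequence[int]], int]] = None,
) -> tuple:
    """Return (element_id, source), preferring the executor's choice."""
    element_id = parse_click(executor_output)
    if element_id in candidate_ids:
        return element_id, "executor"
    if vision_select is not None:
        # Hypothetical hook: the vision model picks from the snapshot list.
        picked = vision_select(candidate_ids)
        if picked in candidate_ids:
            return picked, "vision"
    return None, "none"
```

The same helper could be called again after a failed verification, with the rejected ID removed from `candidate_ids`, to model the retry described in the second rule.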

If you want, I can add:

A planner/executor scaffold script
JSON step schema + validator
Executor loop that maps plan steps to AgentRuntime assertions
Planner feedback channel (executor writes assertion outcomes back to planner)
