Experiment: BetterEdit with [upto] — disappointing results #254

@lpdink

I followed up on @antirez's [upto] edit design. Here's what happened when I tried to bring it to my agent (powered by a cloud-based LLM).

Setup

  • I run agents via a cloud API (DashScope's Qwen 3.7 Max), not local inference, so I can't interrupt generation on a mismatch or force the sampler as antirez does. All I could do was implement the tool with [upto] support and describe it in the schema.
  • The task was a real refactoring project: the agent rewriting its own session, agent template, and context management code. It ended up at ~1300 changed lines (~800 added, ~500 deleted), with plenty of places where [upto] would have saved the model from retyping huge old blocks.
  • Tool set: Bash, Read, Write, BetterEdit, Glob, Grep, AskUserQuestion, TodoWrite. I deliberately aligned tool naming with Claude Code's conventions, hoping models SFT'd on Claude Code patterns would generalize better.
  • BetterEdit was described with the standard OpenAI function-calling schema: "use [upto] for large replacements: write the first lines, then [upto], then the final lines."
  • Audit: I subscribed to the tool-call event bus, logged every BetterEdit call, and recorded whether old_block contained [upto] and whether the call succeeded.
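
For readers who want to reproduce the audit, here is a minimal sketch of the counting logic. It is not the actual code from my agent: the EditAudit class, the on_tool_call callback, and its arguments are hypothetical names, and it assumes a failed call that used [upto] is counted in both error_times and upto_error_times.

```python
from collections import defaultdict

UPTO_MARKER = "[upto]"


def new_stats():
    # Same counter names as in the results JSON.
    return {"times": 0, "with_upto_times": 0, "upto_error_times": 0, "error_times": 0}


class EditAudit:
    """Counts BetterEdit calls per model (hypothetical event shape)."""

    def __init__(self):
        self.stats = defaultdict(new_stats)

    def on_tool_call(self, model, tool, args, ok):
        # Assumed callback: invoked once per finished tool call.
        if tool != "BetterEdit":
            return
        entry = self.stats[f"{model}:{tool}"]
        used_upto = UPTO_MARKER in args.get("old_block", "")
        entry["times"] += 1
        if used_upto:
            entry["with_upto_times"] += 1
        if not ok:
            entry["error_times"] += 1
            if used_upto:
                # Assumption: anchored failures are a subset of all failures.
                entry["upto_error_times"] += 1
```

Wiring this to a real event bus only requires calling on_tool_call with the model name, tool name, parsed arguments, and a success flag.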

Results

{
  "tool_calls": {
    "ds-v4-flash:BetterEdit": {
      "times": 10,
      "with_upto_times": 2,
      "upto_error_times": 0,
      "error_times": 1
    },
    "qwen3.7-max:BetterEdit": {
      "times": 88,
      "with_upto_times": 0,
      "upto_error_times": 0,
      "error_times": 0
    }
  }
}
  • The 2 ds-v4-flash uses happened only after I explicitly said in the user message, "we just implemented a better edit tool, please try it." Without that nudge, zero.
  • Qwen 3.7 Max had a real refactoring task with an OpenSpec checklist, exactly the kind of workflow where [upto] should shine. It never used it once: 88 calls, all standard old/new block matches.

Conclusion

SFT is hard to overcome. If the model wasn't trained to use anchored edits, prompting alone won't make it do so, at least not with the models I have access to. Even the 2 anchored calls out of 10 on ds-v4-flash required explicitly asking it to try the tool.
Maybe this changes with models that have [upto]-like patterns in their training data, or when local inference allows the kind of forced sampling antirez demonstrated. For cloud-based agents today, the standard old/new block edit remains the practical default.

Appendix

All Tool Call Metrics

id                                 call_times  error_times  detail
2026-05-24:ds-v4-flash:Read        72          0            null
2026-05-24:ds-v4-flash:Bash        27          0            null
2026-05-24:ds-v4-flash:Grep        38          0            null
2026-05-24:ds-v4-flash:Edit        2           0            null
2026-05-24:ds-v4-flash:BetterEdit  10          1            Error executing tool 'BetterEdit': old_block not found in 16 lines
2026-05-24:qwen3.7-max:BetterEdit  13          0            null
2026-05-24:ds-v4-flash:TodoWrite   4           0            null
2026-05-24:ds-v4-flash:Write       2           0            null
2026-05-25:qwen3.7-max:Read        80          0            null
2026-05-25:qwen3.7-max:Grep        7           0            null
2026-05-25:qwen3.7-max:Glob        7           0            null
2026-05-25:qwen3.7-max:TodoWrite   5           0            null
2026-05-25:qwen3.7-max:Bash        13          0            null
2026-05-25:qwen3.7-max:Write       6           0            null
2026-05-25:qwen3.7-max:BetterEdit  17          0            null
2026-05-26:qwen3.7-max:Write       5           0            null
2026-05-26:qwen3.7-max:BetterEdit  58          0            null
2026-05-26:qwen3.7-max:Read        51          0            null
2026-05-26:qwen3.7-max:Grep        12          0            null
2026-05-26:qwen3.7-max:TodoWrite   10          0            null
2026-05-26:qwen3.7-max:Bash        33          0            null
2026-05-26:qwen3.7-max:Glob        2           0            null
2026-05-26:ds-v4-flash:Read        2           0            null
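
As a cross-check, the per-day BetterEdit rows above can be summed per model and compared with the totals in the Results section. The snippet below is only a sketch: totals_by_model_tool is a hypothetical helper, and it assumes every row id has the form date:model:tool followed by the two counters.

```python
from collections import Counter

# BetterEdit rows copied from the table above (detail column omitted).
ROWS = """\
2026-05-24:ds-v4-flash:BetterEdit 10 1
2026-05-24:qwen3.7-max:BetterEdit 13 0
2026-05-25:qwen3.7-max:BetterEdit 17 0
2026-05-26:qwen3.7-max:BetterEdit 58 0
"""


def totals_by_model_tool(rows):
    """Sum call_times and error_times across days for each (model, tool)."""
    calls, errors = Counter(), Counter()
    for line in rows.splitlines():
        if not line.strip():
            continue
        row_id, call_times, error_times = line.split()[:3]
        _date, model, tool = row_id.split(":")
        calls[(model, tool)] += int(call_times)
        errors[(model, tool)] += int(error_times)
    return calls, errors


calls, errors = totals_by_model_tool(ROWS)
print(calls[("qwen3.7-max", "BetterEdit")])   # 88
print(calls[("ds-v4-flash", "BetterEdit")])   # 10
print(errors[("ds-v4-flash", "BetterEdit")])  # 1
```

The sums match the "times" and "error_times" fields reported in Results.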

Tool Schema

{
    "model": "qwen3.7-max",
    "reasoning_effort": "xhigh",
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "AskUserQuestion",
                "description": "Ask user a question and wait for their response.\n\nUse this tool when you need user input during execution:\n- Uncertain technical decisions\n- Conflicts with previous instructions\n- Need for requirement clarification\n- Presenting options for user to choose\n\nArgs:\n    question: The question to ask the user.\n    agent: The agent instance (injected automatically).\n    choices: Optional list of suggested choices. User can still provide\n        their own answer freely.\n\nReturns:\n    User's response as a string.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "question": {
                            "type": "string"
                        },
                        "choices": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            }
                        }
                    },
                    "required": [
                        "question",
                        "choices"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "Bash",
                "description": "Execute a shell command in a specific directory.\n\nArgs:\n    command: The command to execute.\n    timeout: Maximum wait time in seconds.\n\nReturns:\n    Command output with exit code.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {
                            "type": "string"
                        },
                        "timeout": {
                            "type": "integer",
                            "default": 30
                        },
                        "purpose": {
                            "type": "string",
                            "description": "Briefly state the purpose of this tool call, in no more than 20 characters",
                            "default": ""
                        }
                    },
                    "required": [
                        "command",
                        "purpose"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "BetterEdit",
                "description": "Edit a file using path, old_block, and new_block. The old text must match exactly once in the file; otherwise the edit fails for safety.\n\nFor large replacements, prefer anchored old_block: write the first lines, then [upto], then the final lines.\nThe tool replaces everything from the head through the tail. If the head or tail is ambiguous, the edit fails.\n\nAfter [upto], always write unique final lines before closing old_block; never close old_block immediately after [upto].\nDo not use a generic tail anchor like:\n\n    some_function() {\n        ...\n[upto]\n    }\n\nbecause the closing brace may match many functions. Instead include final lines that are unique near that function,\nfor example its last calculation and return line before the brace.\n\nExample anchored edit:\n\n    old_block: \"static int parse(void) {\n        int ok = 0;\n[upto]\n        return ok;\n    }\"\n    new_block: \"static int parse(void) {\n        return parse_impl();\n    }\"\n\nTo insert text, use old_block set to an exact unique anchor and new_block set to that anchor plus the added text.\n\nWithout [upto], old_block must match exactly once.\n\nArgs:\n    path: Target file path.\n    old_block: Exact text to find. Use [upto] marker for anchored edit.\n    new_block: Replacement text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string"
                        },
                        "old_block": {
                            "type": "string"
                        },
                        "new_block": {
                            "type": "string"
                        }
                    },
                    "required": [
                        "path",
                        "old_block",
                        "new_block"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "Glob",
                "description": "Find files matching glob pattern.\n\nSupports glob patterns like \"**/*.py\" or \"src/**/*.ts\".\nReturns matching file paths sorted by modification time (newest first).\n\nArgs:\n    pattern: The glob pattern to match files against (e.g., \"**/*.py\").\n    path: The directory to search in. Defaults to current directory.\n    respect_gitignore: Whether to respect ignore rules (.gitignore, .ignore, etc).\n        Only effective in git repositories. Defaults to True.\n\nReturns:\n    List of matching file paths, one per line.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "pattern": {
                            "type": "string"
                        },
                        "path": {
                            "type": "string",
                            "default": "."
                        },
                        "respect_gitignore": {
                            "type": "boolean",
                            "default": true
                        }
                    },
                    "required": [
                        "pattern"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "Grep",
                "description": "Search file contents with regex pattern.\n\nArgs:\n    pattern: The regular expression pattern to search for.\n    path: Directory or file to search in. Defaults to current directory.\n    glob: Glob pattern to filter files (e.g., \"*.py\", \"*.ts\"). Ignored if path is a file.\n    output_mode: \"files_with_matches\" (default), \"content\", or \"count\".\n    i: Case insensitive search.\n    head_limit: Max results to return (default 100).\n    respect_gitignore: Whether to respect ignore rules (.gitignore, .ignore, etc).\n        Only effective in git repositories. Defaults to True.\n    context: Number of lines to show before and after each match.\n        Only works with output_mode=\"content\". Default is 0.\n\nReturns:\n    - files_with_matches: file paths containing the pattern\n    - content: file:line content for each match (with context if specified)\n    - count: file path and match count",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "pattern": {
                            "type": "string"
                        },
                        "path": {
                            "type": "string",
                            "default": "."
                        },
                        "glob": {
                            "type": "string",
                            "default": "*"
                        },
                        "output_mode": {
                            "type": "string",
                            "default": "files_with_matches"
                        },
                        "i": {
                            "type": "boolean",
                            "default": false
                        },
                        "head_limit": {
                            "type": "integer",
                            "default": 100
                        },
                        "respect_gitignore": {
                            "type": "boolean",
                            "default": true
                        },
                        "context": {
                            "type": "integer",
                            "default": 0
                        }
                    },
                    "required": [
                        "pattern",
                        "i",
                        "context"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "Read",
                "description": "Read file segment. offset=0 is first line, negative counts from end.\n\nArgs:\n    path: File to read.\n    offset: Starting line (0-based). -1 = last line.\n    limit: Max lines to return (capped at 1000).\n\nReturns:\n    [file: PATH | lines START-END/TOTAL | ENCODING]\n    Content...\n    [... N more lines]  # if truncated\n\nErrors: \"read_file: PATH: No such file|Is a directory|Permission denied|Binary file\"",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string"
                        },
                        "offset": {
                            "type": "integer",
                            "default": 0
                        },
                        "limit": {
                            "type": "integer",
                            "default": 200
                        }
                    },
                    "required": [
                        "path",
                        "offset"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "TodoWrite",
                "description": "Update the todo list for the current session. Use proactively for complex multi-step tasks (3+ steps). Rules: keep exactly ONE task as in_progress at a time; mark completed IMMEDIATELY after finishing; remove irrelevant tasks; provide content (imperative) and optionally activeForm (present continuous).",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "todos": {
                            "type": "array",
                            "description": "The updated todo list (replaces the entire previous list)",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "content": {
                                        "type": "string",
                                        "description": "Task description in imperative form (e.g. 'Run tests')"
                                    },
                                    "status": {
                                        "type": "string",
                                        "enum": [
                                            "pending",
                                            "in_progress",
                                            "completed"
                                        ],
                                        "description": "Task state: pending = not started, in_progress = currently working, completed = done"
                                    },
                                    "activeForm": {
                                        "type": "string",
                                        "description": "Present continuous form (e.g. 'Running tests'). Optional — defaults to content if omitted."
                                    }
                                },
                                "required": [
                                    "content",
                                    "status"
                                ]
                            }
                        }
                    },
                    "required": [
                        "todos"
                    ]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "Write",
                "description": "Write content to file (overwrite).\n\nArgs:\n    path: Target file path.\n    content: Content to write.\n\nReturns:\n    Success with stats, or error message.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string"
                        },
                        "content": {
                            "type": "string"
                        }
                    },
                    "required": [
                        "path",
                        "content"
                    ]
                }
            }
        }
    ],
    "enable_thinking": true
}
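
To make the BetterEdit description above concrete, here is a rough sketch of how an anchored edit could be resolved on the tool side. It is neither antirez's implementation nor the exact code behind my tool. apply_edit and EditError are hypothetical names, and two behaviors are assumptions: newlines directly adjacent to [upto] are ignored, and "ambiguous" means the head does not occur exactly once in the file or the tail does not occur exactly once after the head.

```python
UPTO = "[upto]"


class EditError(ValueError):
    """Raised when an edit cannot be applied safely."""


def apply_edit(text, old_block, new_block):
    """Return text with old_block replaced by new_block, or raise EditError."""
    if UPTO not in old_block:
        # Plain mode: old_block must match exactly once.
        if text.count(old_block) != 1:
            raise EditError("old_block must match exactly once")
        return text.replace(old_block, new_block, 1)

    if old_block.count(UPTO) > 1:
        raise EditError("only one [upto] marker is supported in this sketch")

    head, _, tail = old_block.partition(UPTO)
    # Assumption: newlines directly around the marker are not part of the anchors.
    head, tail = head.rstrip("\n"), tail.lstrip("\n")
    if not head.strip() or not tail.strip():
        raise EditError("both a head and a tail are required around [upto]")

    if text.count(head) != 1:
        raise EditError("head anchor is missing or ambiguous")
    start = text.index(head)
    search_from = start + len(head)

    # Assumption: the tail must be unique in the text after the head.
    if text.count(tail, search_from) != 1:
        raise EditError("tail anchor is missing or ambiguous")
    end = text.index(tail, search_from) + len(tail)

    # Replace everything from the head through the tail.
    return text[:start] + new_block + text[end:]
```

With this rule, a tail consisting of a lone closing brace fails whenever any later function also ends with a brace, which is the failure mode the description warns about.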
