Conversation
Evals expanded from 2 to 25 covering all skill capabilities: reactive install, proactive audit, preferred tool recommendations, binary name mapping, PATH troubleshooting, project type detection, batch updates, permission workarounds, and tool integration pipelines. SKILL.md improvements: added Advisory trigger category, inline preferred-tools table, troubleshooting quick reference, tighter workflow steps with hash -r guidance. 413 words (under 500 limit).
There was a problem hiding this comment.
Code Review
This pull request significantly expands the evaluation test suite in evals/evals.json by adding numerous test cases for CLI tool detection, installation, and modern tool recommendations. Additionally, the SKILL.md documentation has been refactored to improve clarity regarding workflows, preferred tools, and troubleshooting. The review feedback suggests using more specific assertion values in the evaluation file, such as 'ripgrep' and 'fd-find' instead of 'rg' and 'fd', to prevent false positives during testing, and refining the troubleshooting instructions for Debian-specific tool aliases.
| }, | ||
| { | ||
| "type": "content_contains", | ||
| "value": "fd", |
There was a problem hiding this comment.
The assertion value "fd" is very short and likely to cause false positives in evaluation results, as it can appear as a substring in many common words or paths. Consider using a more specific string like "fd-find" or "fdfind", which are the package and binary names mentioned in the documentation.
| "value": "fd", | |
| "value": "fd-find", |
| "assertions": [ | ||
| { | ||
| "type": "content_contains", | ||
| "value": "rg", |
There was a problem hiding this comment.
The assertion value "rg" is prone to false positives as it frequently appears in common words (e.g., "large", "target", "merge"). Since the description explicitly mentions "ripgrep", using "ripgrep" as the assertion value would be much more reliable and consistent with other tests in this file.
| "value": "rg", | |
| "value": "ripgrep", |
| "assertions": [ | ||
| { | ||
| "type": "content_contains", | ||
| "value": "fd", |
There was a problem hiding this comment.
| |---------|-----| | ||
| | Installed but not found | `hash -r` or add dir to PATH | | ||
| | No sudo | `cargo install`, `pip install --user`, manual binary | | ||
| | Debian `bat`=`batcat`, `fd`=`fdfind` | Symlink to `~/.local/bin/` | |
There was a problem hiding this comment.
The troubleshooting fix for Debian aliases is a bit vague. It would be more helpful to explicitly state that the symlink should be created using the standard tool name (e.g., bat or fd) to point to the Debian-specific binary.
| | Debian `bat`=`batcat`, `fd`=`fdfind` | Symlink to `~/.local/bin/` | | |
| | Debian bat=batcat, fd=fdfind | Symlink to standard name in ~/.local/bin/ | |
| "assertions": [ | ||
| { | ||
| "type": "content_contains", | ||
| "value": "fd", |
| }, | ||
| { | ||
| "type": "content_contains", | ||
| "value": "rg", |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
Summary
Eval categories added
SKILL.md improvements
Test plan