Expand evals to 25 and improve SKILL.md by CybotTM · Pull Request #22 · netresearch/cli-tools-skill

CybotTM · 2026-04-01T08:28:34Z

Summary

Expanded eval suite from 2 to 25 tests covering all skill capabilities
Improved SKILL.md with better trigger categorization, an inline preferred-tools table, a troubleshooting quick reference, and tighter workflow steps (413 words, under the 500-word limit)

Eval categories added

Reactive install (4): command-not-found for rg, batcat, jq, ripgrep+fd
Preferred tool recommendations (7): grep->rg, find->fd, JSON->jq, YAML->yq, diff->difft, benchmark->hyperfine, CSV->qsv
Project type detection (3): Python, Node.js, Docker
Troubleshooting (3): PATH issues, hash cache, permission-blocked installs
Audit/Update (3): audit dependencies, batch update, environment PATH check
Catalog/Mapping (3): catalog lookup, binary name mapping, install via script
Integration (2): security tool suggestion, fd+rg pipeline

SKILL.md improvements

Added "Advisory" trigger category for modern tool recommendations
Compact 2-column preferred tools table (saves ~40% space vs old format)
Inline troubleshooting table with Debian alias fixes
Explicit hash -r guidance in resolution workflow
All content stays under the 500-word limit

Test plan

Verify evals/evals.json is valid JSON with 25 entries
Verify SKILL.md renders correctly and stays under 500 words
Spot-check that eval prompts match actual skill capabilities
Confirm all referenced scripts and reference files exist
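The first two test-plan items can be automated. Below is a minimal Python sketch, assuming evals.json holds either a top-level list of evals or an object with the list under an "evals" key (the PR does not show the top-level schema), and assuming a plain whitespace word count for SKILL.md. It also confirms that the category counts listed above add up to 25. The function names are hypothetical, not scripts from the repository.

```python
import json

# Eval categories and counts as listed in the PR description.
EVAL_CATEGORIES = {
    "Reactive install": 4,
    "Preferred tool recommendations": 7,
    "Project type detection": 3,
    "Troubleshooting": 3,
    "Audit/Update": 3,
    "Catalog/Mapping": 3,
    "Integration": 2,
}


def count_evals(json_text: str) -> int:
    """Parse evals.json content and return the number of eval entries.

    Assumption: the file holds either a top-level list of evals or an
    object with the list under an "evals" key; the real schema may differ.
    Raises json.JSONDecodeError if the text is not valid JSON.
    """
    data = json.loads(json_text)
    if isinstance(data, dict):
        data = data["evals"]
    return len(data)


def count_words(markdown_text: str) -> int:
    """Rough word count: whitespace-separated tokens, Markdown syntax included.

    The PR does not say how the 413-word figure was measured, so this may
    differ from counters that strip table pipes or code markers first.
    """
    return len(markdown_text.split())


def check_test_plan(evals_json: str, skill_md: str,
                    expected_evals: int = 25, word_limit: int = 500) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    n = count_evals(evals_json)
    if n != expected_evals:
        problems.append(f"expected {expected_evals} evals, found {n}")
    words = count_words(skill_md)
    if words >= word_limit:  # "under 500" read as strictly fewer than 500
        problems.append(f"SKILL.md has {words} words, limit is {word_limit}")
    return problems
```

Run against the repository with something like `check_test_plan(Path("evals/evals.json").read_text(), Path("skills/cli-tools/SKILL.md").read_text())` (after `from pathlib import Path`). The last two test-plan items still need a manual look.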

Evals expanded from 2 to 25, covering all skill capabilities: reactive install, proactive audit, preferred tool recommendations, binary name mapping, PATH troubleshooting, project type detection, batch updates, permission workarounds, and tool integration pipelines. SKILL.md improvements: added Advisory trigger category, inline preferred-tools table, troubleshooting quick reference, and tighter workflow steps with hash -r guidance. 413 words (under the 500-word limit).

gemini-code-assist

Code Review

This pull request significantly expands the evaluation test suite in evals/evals.json by adding numerous test cases for CLI tool detection, installation, and modern tool recommendations. Additionally, the SKILL.md documentation has been refactored to improve clarity regarding workflows, preferred tools, and troubleshooting. The review feedback suggests using more specific assertion values in the evaluation file, such as 'ripgrep' and 'fd-find' instead of 'rg' and 'fd', to prevent false positives during testing, and refining the troubleshooting instructions for Debian-specific tool aliases.

gemini-code-assist · 2026-04-01T08:35:24Z

evals/evals.json

+      },
+      {
+        "type": "content_contains",
+        "value": "fd",


The assertion value "fd" is very short and likely to cause false positives in evaluation results, as it can appear as a substring in many common words or paths. Consider using a more specific string like "fd-find" or "fdfind", which are the package and binary names mentioned in the documentation.

Suggested change

"value": "fd",

"value": "fd-find",

gemini-code-assist · 2026-04-01T08:35:24Z

evals/evals.json

+    "assertions": [
+      {
+        "type": "content_contains",
+        "value": "rg",


The assertion value "rg" is prone to false positives as it frequently appears in common words (e.g., "large", "target", "merge"). Since the description explicitly mentions "ripgrep", using "ripgrep" as the assertion value would be much more reliable and consistent with other tests in this file.

Suggested change

"value": "rg",

"value": "ripgrep",
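The false-positive concern in the "fd" and "rg" comments above can be reproduced directly. This Python sketch assumes a content_contains assertion is a case-sensitive plain substring match (the eval runner itself is not shown in the PR). It also tries a word-boundary regex as a stricter alternative; that variant is an illustration only and is not something the PR or the reviewer proposes.

```python
import re


def content_contains(output: str, value: str) -> bool:
    """Assumed semantics of a "content_contains" assertion:
    a case-sensitive plain substring match."""
    return value in output


def contains_token(output: str, value: str) -> bool:
    """Stricter alternative (not part of the PR): match `value` only at
    regex word boundaries, so "rg" inside "merge" no longer counts."""
    return re.search(rf"\b{re.escape(value)}\b", output) is not None


prose = "Merge the large target branch"
print(content_contains(prose, "rg"))  # True: "rg" occurs inside all three words
print(contains_token(prose, "rg"))    # False

paths = "Check /dev/fd before running fdisk"
print(content_contains(paths, "fd"))  # True
print(contains_token(paths, "fd"))    # True: "/dev/fd" still contains "fd" as a token

print(content_contains("sudo apt install fd-find", "fd-find"))  # True
```

Word boundaries would remove the "rg" false positives the reviewer cites, but not path-like matches such as /dev/fd, which is consistent with the suggestion to assert on specific names like "fd-find" or "fdfind". The trade-off is that a more specific value also fails when the output names the tool only by its short form.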

gemini-code-assist · 2026-04-01T08:35:24Z

evals/evals.json

+    "assertions": [
+      {
+        "type": "content_contains",
+        "value": "fd",


Using "fd" as a content assertion is risky due to potential false positives. Recommending the full name "fd-find" or checking for the specific binary name "fdfind" would improve the robustness of this evaluation case.

Suggested change

"value": "fd",

"value": "fd-find",

gemini-code-assist · 2026-04-01T08:35:24Z

skills/cli-tools/SKILL.md

+|---------|-----|
+| Installed but not found | `hash -r` or add dir to PATH |
+| No sudo | `cargo install`, `pip install --user`, manual binary |
+| Debian `bat`=`batcat`, `fd`=`fdfind` | Symlink to `~/.local/bin/` |


The troubleshooting fix for Debian aliases is a bit vague. It would be more helpful to explicitly state that the symlink should be created using the standard tool name (e.g., bat or fd) to point to the Debian-specific binary.

Suggested change

| Debian `bat`=`batcat`, `fd`=`fdfind` | Symlink to `~/.local/bin/` |

| Debian bat=batcat, fd=fdfind | Symlink to standard name in ~/.local/bin/ |
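The suggested wording makes clear the symlink takes the standard name. Below is a minimal Python sketch of that fix, assuming ~/.local/bin as the link directory (as in the table) and using shutil.which to locate binaries. The helpers `link_debian_aliases` and `dir_on_path` are hypothetical and not part of the skill's scripts; the sketch only shows the shape of the fix.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional

# Debian/Ubuntu ship these tools under different binary names (per SKILL.md).
DEBIAN_ALIASES = {"bat": "batcat", "fd": "fdfind"}


def dir_on_path(directory: Path, path_env: Optional[str] = None) -> bool:
    """True if `directory` appears as an entry in PATH (or in `path_env`)."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    wanted = directory.expanduser().resolve()
    return any(
        entry and Path(entry).expanduser().resolve() == wanted
        for entry in path_env.split(os.pathsep)
    )


def link_debian_aliases(
    bin_dir: Path = Path.home() / ".local" / "bin",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[Path]:
    """Symlink the standard names (bat, fd) to the Debian-named binaries.

    A tool is skipped if its standard name already resolves, if the
    Debian-named binary is not installed, or if something already
    exists at the link path. Returns the links that were created.
    """
    bin_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for standard, debian in DEBIAN_ALIASES.items():
        if which(standard):
            continue  # standard name already works
        target = which(debian)
        if target is None:
            continue  # neither name is installed
        link = bin_dir / standard
        if link.exists() or link.is_symlink():
            continue  # never overwrite an existing file or dangling link
        os.symlink(target, link)
        created.append(link)
    return created
```

The other rows of the table still apply afterwards: if `dir_on_path(Path.home() / ".local" / "bin")` is false, that directory needs adding to PATH, and an already-open shell may need `hash -r` before the new names resolve.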

gemini-code-assist · 2026-04-01T08:35:24Z

evals/evals.json

+    "assertions": [
+      {
+        "type": "content_contains",
+        "value": "fd",


As noted in previous cases, "fd" is too generic for a reliable content assertion. Using "fd-find" or "fdfind" is preferred to ensure the evaluation correctly identifies the tool recommendation.

Suggested change

"value": "fd",

"value": "fd-find",

gemini-code-assist · 2026-04-01T08:35:24Z

evals/evals.json

+      },
+      {
+        "type": "content_contains",
+        "value": "rg",


Using "rg" as an assertion value here is likely to trigger false positives. Switching to "ripgrep" provides a safer and more accurate verification of the tool's presence in the output.

Suggested change

"value": "rg",

"value": "ripgrep",

github-actions · 2026-04-01T08:43:43Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA d433f22.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

gemini-code-assist bot reviewed Apr 1, 2026


CybotTM merged commit 9e9d48f into main Apr 1, 2026
7 of 8 checks passed

CybotTM deleted the feature/evals-and-improvements branch April 1, 2026 09:05

CybotTM merged 1 commit into main from feature/evals-and-improvements