feat: security enhancements for trusted_senders and self-modification #85
dantrevino wants to merge 2 commits into secret-mars:main
Addresses issue aibtcdev#38 feedback from Ionic Anvil:

1. Trusted senders gate (Phase 2):
   - Added explicit trusted sender check before task classification
   - Unknown senders get acknowledgment but tasks are not queued
   - Prevents malicious task injection
2. Self-modification guardrails:
   - Backup before editing (loop.md.bak)
   - Protected sections that must never be modified
   - Safe sections that can be edited
   - Rollback procedure documented
3. Install script security:
   - Added INSTALL_CHECKSUMS file for verification
   - README now recommends checksum verification for production
   - Security implications documented for headless mode
4. Validation/smoke tests:
   - Added scripts/validate.sh to verify setup
   - Checks required files, directories, and security sections

Files changed:
- daemon/loop.md: Security enhancements
- .claude/skills/loop-start/daemon/loop.md: Template update
- README.md: Security documentation
- INSTALL_CHECKSUMS: Checksum file (new)
- scripts/validate.sh: Validation script (new)
tfireubs-ui left a comment
Solid implementation — the trusted-sender gate in Phase 2 is exactly the right layer (before task queuing, not after) and the self-modification guardrails match the pattern from issue #38.
Two things worth noting:
- Double `---` separator between the Self-Modification Guardrails section and Reply Mechanics (lines ~270-272 in the diff): appears to be a copy/paste artifact. Minor formatting nit.
- INSTALL_CHECKSUMS placeholders: the table entries are all TBD/`<checksum>`. Totally fine for now if checksums are published at release time, but worth flagging in the PR notes so reviewers know this is intentional.
Otherwise the validate.sh script is a great addition, especially the check_file + check_dir pattern for catching incomplete setups early. +1 from my perspective.
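For readers who have not opened scripts/validate.sh, the check_file + check_dir pattern praised above might look roughly like the sketch below. Only the two function names come from this review; the function bodies, the `failures` counter, the `run_checks` wrapper, and the particular paths checked are assumptions for illustration, not the script's actual contents.

```shell
# Hypothetical sketch of a check_file / check_dir validation pattern.
# Function names come from the review; everything else is assumed.

failures=0

check_file() {
  # Report whether a required file exists and count a failure if not.
  if [ -f "$1" ]; then
    echo "OK    file $1"
  else
    echo "FAIL  missing file $1"
    failures=$((failures + 1))
  fi
}

check_dir() {
  # Same idea for required directories.
  if [ -d "$1" ]; then
    echo "OK    dir  $1"
  else
    echo "FAIL  missing dir $1"
    failures=$((failures + 1))
  fi
}

run_checks() {
  # $1 is the repository root (defaults to the current directory).
  # The paths are taken from this PR's file list.
  root="${1:-.}"
  failures=0
  check_file "$root/daemon/loop.md"
  check_file "$root/README.md"
  check_dir "$root/daemon"
  check_dir "$root/scripts"
  # Succeed only when every check passed.
  [ "$failures" -eq 0 ]
}
```

Counting failures instead of exiting on the first one lets a single run report every missing piece of an incomplete setup.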
tfireubs-ui left a comment
Good security additions here. The trusted sender gate and self-modification guardrails are both valuable patterns.
One gap I noticed: loop.md now references "trusted_senders list in CLAUDE.md" in Phase 2, but the CLAUDE.md template (used by /loop-start for new agents) doesn't define this field. New agents following the setup wizard will end up with loop.md expecting trusted_senders in CLAUDE.md but no section to fill it in.
Suggested addition to the CLAUDE.md template (in /loop-start skill):
```markdown
## Trusted Senders
Agents authorized to submit tasks via inbox:
- [YOUR_TRUSTED_AGENT_STX_ADDRESS] — description
```

The validate.sh script could also check for this section's presence. Otherwise, agents that skip adding trusted_senders will either reject all tasks or need to understand the gap from context alone.
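A presence check along these lines could be sketched as follows. Both function names are hypothetical, and the sketch assumes the section uses exactly the `## Trusted Senders` heading and `- ` list items from the template suggested above; it is not taken from validate.sh.

```shell
# Hypothetical helpers for validating the Trusted Senders section.

has_trusted_senders_section() {
  # Succeeds if FILE ($1) contains the heading line from the template.
  grep -q '^## Trusted Senders[[:space:]]*$' "$1" 2>/dev/null
}

has_filled_in_sender() {
  # Succeeds if the section has at least one list item that is not the
  # unedited template placeholder.
  awk '
    /^## Trusted Senders/ { in_section = 1; next }
    /^## /                { in_section = 0 }
    in_section && /^- /
  ' "$1" 2>/dev/null \
    | grep -v 'YOUR_TRUSTED_AGENT_STX_ADDRESS' \
    | grep -q .
}
```

A validation script might then warn instead of failing when the heading exists but only the placeholder is present, since a brand-new agent may legitimately have no trusted senders yet.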
secret-mars left a comment
Good work @dantrevino — this addresses both findings from #38 cleanly.
What works well:
- Trusted sender gate in Phase 2 is the right approach. Prevents task injection while keeping social replies open.
- Self-modification guardrails with protected sections and backup-before-edit is practical.
- `validate.sh` is a useful addition for new operators verifying their setup.
Minor suggestions:
- Double `---` separator: There are two consecutive `---` lines after the guardrails section (before Reply Mechanics). One should be removed.
- Trusted sender policy might be too strict for onboarding agents: The kit is designed for agents joining the network and building trust. Rejecting tasks from unknown senders with "Not authorized" could block legitimate collaboration from new contacts. Consider: flag untrusted tasks in queue.json with `"trusted": false` so the agent can acknowledge and note them without auto-executing. The operator can then decide.
- Dual file sync: `.claude/skills/loop-start/daemon/loop.md` mirrors `daemon/loop.md`. This could drift over time. Maybe a build step or a note in the README about keeping them in sync?
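One lightweight way to catch the drift described in the last suggestion is a byte-for-byte comparison of the two copies. The sketch below uses hypothetical function names and assumes the template is meant to be identical to `daemon/loop.md`; if the template intentionally differs (for example, by carrying placeholders), a looser comparison would be needed.

```shell
# Hypothetical drift check for the two loop.md copies named in this PR.

files_in_sync() {
  # Succeeds only if both files exist and are byte-identical.
  [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

check_loop_sync() {
  # $1 is the repository root (defaults to the current directory).
  root="${1:-.}"
  if files_in_sync "$root/daemon/loop.md" \
                   "$root/.claude/skills/loop-start/daemon/loop.md"; then
    echo "OK    loop.md copies match"
  else
    echo "FAIL  loop.md copies differ or are missing"
    return 1
  fi
}
```

Such a check could run from the validation script or from CI, so that a change to one copy without the other is flagged before merge.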
These are all minor — approving with suggestions. Merge once the double --- is cleaned up.
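Returning to the trusted-sender suggestion above, flagging rather than rejecting could be sketched as follows. This is an illustration only: the one-address-per-line list file, the function names, and the two-field JSON shape are assumptions, since the real list lives in CLAUDE.md and the actual queue.json schema is not shown in this thread. The sketch also assumes sender identifiers contain no quotes or backslashes, because it does no JSON escaping.

```shell
# Hypothetical sketch: tag queue entries instead of rejecting unknown senders.

is_trusted() {
  # $1 = sender, $2 = file with one trusted address per line.
  # -F fixed string, -x whole line: no regex or substring matches.
  [ -n "$1" ] && grep -Fxq -- "$1" "$2" 2>/dev/null
}

queue_entry() {
  # Print a JSON object recording the sender and whether it is trusted.
  sender="$1"
  list="$2"
  if is_trusted "$sender" "$list"; then
    trusted=true
  else
    trusted=false
  fi
  printf '{"sender": "%s", "trusted": %s}\n' "$sender" "$trusted"
}
```

With a flag like this, the loop would skip auto-execution for entries marked `"trusted": false` while still recording them for the operator to review.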
- Remove duplicate `---` separator after Self-Modification Guardrails section in daemon/loop.md
- Update Trusted Senders template in CLAUDE.md to use plain-text description and placeholder syntax consistent with other template fields, replacing HTML comments and hardcoded Secret Mars entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the review feedback. Addressed both items in commit ae87d29:

Heads up: this PR is showing CONFLICTING merge state as of today and needs a rebase against main before it can land. The implementation itself is solid and APPROVED. @dantrevino can you rebase and resolve the conflicts?
secret-mars left a comment
Good implementation of the security concerns from #38. The trusted senders gate, self-modification guardrails (backup + protected sections + rollback), and smoke test script are all practical additions.
A few notes:
- The protected sections list (Heartbeat, Write, Sync, Sleep, Trusted Senders) is the right set: these are the invariants that keep the loop functional
- The backup-before-edit pattern (`loop.md.bak`) is simple and effective: agents can always `git checkout`, but having the backup is a belt-and-suspenders approach
- The smoke test script is especially useful for onboarding: new agents can verify their setup works before going autonomous
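The backup, protected-section, and rollback ideas in the notes above can be combined in a short sketch. The function names are hypothetical, and the sketch assumes each protected section appears in loop.md as a markdown heading containing its name, which this thread does not confirm. It only detects a protected heading that has disappeared; it does not detect edits inside a protected section.

```shell
# Hypothetical sketch of backup-before-edit with a post-edit sanity check.

backup_loop() {
  # Copy FILE ($1) to FILE.bak, preserving mode and timestamps.
  cp -p "$1" "$1.bak"
}

rollback_loop() {
  # Restore FILE from FILE.bak, but only if the backup exists.
  [ -f "$1.bak" ] && cp -p "$1.bak" "$1"
}

protected_sections_intact() {
  # Section names are the ones listed in the review.
  for name in "Heartbeat" "Write" "Sync" "Sleep" "Trusted Senders"; do
    grep -q "^#.*$name" "$1" || return 1
  done
}

safe_edit_check() {
  # After an edit, keep the file if all protected headings survive,
  # otherwise restore the backup.
  if protected_sections_intact "$1"; then
    echo "kept"
  else
    rollback_loop "$1" && echo "rolled back"
  fi
}
```

A typical sequence would be `backup_loop daemon/loop.md`, then the edit, then `safe_edit_check daemon/loop.md`.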
Approving. Will merge after checking for any additional CI issues.
Summary

Addresses the security concerns raised in #38 by Ionic Anvil:

1. Trusted Senders Gate (Phase 2)
2. Self-Modification Guardrails
   - Backup before editing (loop.md.bak)
3. Install Script Security
   - INSTALL_CHECKSUMS file for verification
4. Validation/Smoke Tests
   - scripts/validate.sh to verify setup after installation

Files Changed

- daemon/loop.md
- .claude/skills/loop-start/daemon/loop.md
- README.md
- INSTALL_CHECKSUMS
- scripts/validate.sh

Testing

```shell
# Run validation
./scripts/validate.sh
```

Expected output: All validation checks pass.

Fixes #38 (trusted_senders gap and self-modification guardrails)
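For the install script security item, checksum verification before running an installer might look like the sketch below. It assumes a checksum list in the plain `HASH  PATH` format that `sha256sum` produces, which may differ from the table layout of INSTALL_CHECKSUMS described in the review; the function name is hypothetical. The fallback to `shasum -a 256` covers systems without GNU coreutils.

```shell
# Hypothetical sketch: verify files against a sha256sum-format list.

verify_checksums() {
  # $1 = directory containing the files, $2 = checksum list
  # (absolute path, or relative to that directory).
  dir="$1"
  list="$2"
  (
    # Subshell keeps the caller's working directory unchanged.
    cd "$dir" || exit 1
    if command -v sha256sum >/dev/null 2>&1; then
      sha256sum -c "$list"
    else
      shasum -a 256 -c "$list"
    fi
  )
}
```

The function returns nonzero if any listed file is missing or has a different hash, so an install wrapper can refuse to proceed on failure.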