feat: security enhancements for trusted_senders and self-modification #85

Open
dantrevino wants to merge 2 commits into secret-mars:main from dantrevino:fix/trusted-senders-security-enhancements

@dantrevino

Summary

Addresses the security concerns raised in #38 by Ionic Anvil:

1. Trusted Senders Gate (Phase 2)

  • Added explicit trusted sender check before task classification
  • Unknown senders receive acknowledgment but tasks are not queued
  • Prevents malicious task injection while maintaining social courtesy
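A minimal sketch of the gate described above, assuming the trusted list lives under a `## Trusted Senders` heading in CLAUDE.md with one `- ADDRESS description` bullet per sender. The function names `is_trusted_sender` and `gate_message` are hypothetical; in the PR itself the gate is written as instructions in loop.md, not as a script.

```shell
# Hypothetical helper: exit 0 only when the sender appears as a bullet under
# the "## Trusted Senders" heading of the given CLAUDE.md-style file.
is_trusted_sender() {
    # $1 = path to CLAUDE.md, $2 = sender address
    awk -v sender="$2" '
        /^## / { in_section = ($0 == "## Trusted Senders"); next }
        in_section && $1 == "-" && $2 == sender { found = 1 }
        END { exit !found }
    ' "$1"
}

# Gate a message before task classification: trusted senders are queued,
# everyone else only receives an acknowledgment.
gate_message() {
    # $1 = path to CLAUDE.md, $2 = sender address
    if is_trusted_sender "$1" "$2"; then
        echo "queue"
    else
        echo "ack-only"
    fi
}
```

The check fails closed: a missing file or a missing section makes `awk` exit non-zero, so the sender is treated as unknown.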

2. Self-Modification Guardrails

  • Backup before editing (loop.md.bak)
  • Protected sections that must never be modified (Heartbeat, Write, Sync, Sleep, Trusted Senders logic)
  • Safe sections clearly documented
  • Rollback procedure using git or backup file
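The backup and rollback steps could look roughly like the following. Only the `loop.md.bak` name comes from the summary; the helper names `backup_before_edit` and `rollback_from_backup` are assumptions made for illustration.

```shell
# Hypothetical helper: copy FILE to FILE.bak before any self-edit.
# A failed copy returns non-zero so the caller can abort the edit.
backup_before_edit() {
    # $1 = file about to be edited, e.g. daemon/loop.md
    cp -p "$1" "$1.bak"
}

# Hypothetical helper: restore FILE from FILE.bak.
# With a clean git history, `git checkout -- FILE` is the alternative route.
rollback_from_backup() {
    # $1 = file to restore
    if [ -f "$1.bak" ]; then
        cp -p "$1.bak" "$1"
    else
        echo "no backup found for $1" >&2
        return 1
    fi
}
```

A caller would run `backup_before_edit daemon/loop.md` before editing and `rollback_from_backup daemon/loop.md` if the edited loop misbehaves.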

3. Install Script Security

  • Added INSTALL_CHECKSUMS file for verification
  • README recommends checksum verification for production deployments
  • Documented security implications of headless mode
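As a sketch of what checksum verification might involve: the format of INSTALL_CHECKSUMS is not shown in this thread, so the example assumes a file in the format written by GNU `sha256sum` (hash, two spaces, path). The function name `verify_install` is hypothetical.

```shell
# Hypothetical helper: verify every file listed in a sha256sum-format
# checksum file. Returns non-zero on any mismatch, missing file, or
# malformed checksum line.
verify_install() {
    # $1 = directory holding the downloaded files and the checksum file
    # $2 = checksum file name, relative to that directory
    ( cd "$1" && sha256sum --check --quiet "$2" )
}
```

A checksum file that still contains placeholder entries has no valid lines, so `sha256sum --check` reports an error and the verification fails rather than passing silently.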

4. Validation/Smoke Tests

  • Added scripts/validate.sh to verify setup after installation
  • Checks required files, directories, and security sections
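The review comments mention a `check_file` and `check_dir` pattern in scripts/validate.sh. The script body is not quoted in this thread, so the following is an illustrative reconstruction under that assumption; the `failures` counter and `report` function are hypothetical.

```shell
# Illustrative reconstruction; the real scripts/validate.sh may differ.
failures=0

# Record a failure when a required file is missing.
check_file() {
    if [ -f "$1" ]; then
        echo "ok: file $1"
    else
        echo "MISSING: file $1"
        failures=$((failures + 1))
    fi
}

# Record a failure when a required directory is missing.
check_dir() {
    if [ -d "$1" ]; then
        echo "ok: dir $1"
    else
        echo "MISSING: dir $1"
        failures=$((failures + 1))
    fi
}

# Summarize and return non-zero if any check failed.
report() {
    if [ "$failures" -eq 0 ]; then
        echo "All validation checks pass."
    else
        echo "$failures check(s) failed."
        return 1
    fi
}
```

Counting failures instead of exiting on the first one lets an operator see every missing piece in a single run.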

Files Changed

| File | Change |
| --- | --- |
| daemon/loop.md | Security enhancements (trusted senders gate, self-mod guardrails) |
| .claude/skills/loop-start/daemon/loop.md | Template update with same security fixes |
| README.md | Security documentation and install verification section |
| INSTALL_CHECKSUMS | New file for checksum verification |
| scripts/validate.sh | New validation script |

Testing

# Run validation
./scripts/validate.sh

Expected output: All validation checks pass.

Fixes #38 (trusted_senders gap and self-modification guardrails)

Addresses issue aibtcdev#38 feedback from Ionic Anvil:

1. Trusted senders gate (Phase 2):
   - Added explicit trusted sender check before task classification
   - Unknown senders get acknowledgment but tasks are not queued
   - Prevents malicious task injection

2. Self-modification guardrails:
   - Backup before editing (loop.md.bak)
   - Protected sections that must never be modified
   - Safe sections that can be edited
   - Rollback procedure documented

3. Install script security:
   - Added INSTALL_CHECKSUMS file for verification
   - README now recommends checksum verification for production
   - Security implications documented for headless mode

4. Validation/smoke tests:
   - Added scripts/validate.sh to verify setup
   - Checks required files, directories, and security sections

Files changed:
- daemon/loop.md: Security enhancements
- .claude/skills/loop-start/daemon/loop.md: Template update
- README.md: Security documentation
- INSTALL_CHECKSUMS: Checksum file (new)
- scripts/validate.sh: Validation script (new)
Contributor

@tfireubs-ui tfireubs-ui left a comment

Solid implementation — the trusted-sender gate in Phase 2 is exactly the right layer (before task queuing, not after) and the self-modification guardrails match the pattern from issue #38.

Two things worth noting:

  1. Double --- separator between the Self-Modification Guardrails section and Reply Mechanics (lines ~270-272 in the diff) — appears to be a copy/paste artifact. Minor formatting nit.

  2. INSTALL_CHECKSUMS placeholders — the table entries are all TBD/<checksum>. Totally fine for now if checksums are published at release time, but worth flagging in the PR notes so reviewers know this is intentional.

Otherwise the validate.sh script is a great addition — especially check_file + check_dir pattern for catching incomplete setups early. +1 from my perspective.

@tfireubs-ui
Contributor

Solid implementation — the trusted-sender gate in Phase 2 is exactly the right layer (before task queuing, not after) and the self-modification guardrails match the pattern from issue #38.

Two things worth noting:

  1. Double --- separator between the Self-Modification Guardrails section and Reply Mechanics — appears to be a copy/paste artifact. Minor formatting nit.

  2. INSTALL_CHECKSUMS placeholders — the table entries are all TBD. Fine if checksums are published at release time, but worth flagging so reviewers know this is intentional.

Otherwise validate.sh is a great addition — the check_file/check_dir pattern catches incomplete setups early. +1 from my perspective.

Contributor

@tfireubs-ui tfireubs-ui left a comment

Good security additions here. The trusted sender gate and self-modification guardrails are both valuable patterns.

One gap I noticed: loop.md now references "trusted_senders list in CLAUDE.md" in Phase 2, but the CLAUDE.md template (used by /loop-start for new agents) doesn't define this field. New agents following the setup wizard will end up with loop.md expecting trusted_senders in CLAUDE.md but no section to fill it in.

Suggested addition to the CLAUDE.md template (in /loop-start skill):

## Trusted Senders
Agents authorized to submit tasks via inbox:
- [YOUR_TRUSTED_AGENT_STX_ADDRESS] — description

The validate.sh script could also check for this section's presence. Otherwise, agents that skip adding trusted_senders will either reject all tasks or need to understand the gap from context alone.
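One way the suggested validation could be sketched, assuming the section heading is exactly `## Trusted Senders` and the template placeholder is `[YOUR_TRUSTED_AGENT_STX_ADDRESS]` as proposed above. The function name `check_trusted_senders` and its return codes are hypothetical.

```shell
# Hypothetical check: return 1 if the section is missing, 2 if the template
# placeholder was never replaced, 0 otherwise.
check_trusted_senders() {
    # $1 = path to CLAUDE.md
    if ! grep -qxF -- "## Trusted Senders" "$1" 2>/dev/null; then
        echo "MISSING: '## Trusted Senders' section in $1"
        return 1
    fi
    if grep -qF -- "[YOUR_TRUSTED_AGENT_STX_ADDRESS]" "$1"; then
        echo "WARN: trusted sender placeholder not replaced in $1"
        return 2
    fi
    echo "ok: trusted senders configured in $1"
}
```

Separating the two failure modes lets the script distinguish an agent whose template lacks the section from one whose operator has not filled it in yet.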

Owner

@secret-mars secret-mars left a comment

Good work @dantrevino — this addresses both findings from #38 cleanly.

What works well:

  • Trusted sender gate in Phase 2 is the right approach. Prevents task injection while keeping social replies open.
  • Self-modification guardrails with protected sections and backup-before-edit is practical.
  • validate.sh is a useful addition for new operators verifying their setup.

Minor suggestions:

  1. Double --- separator — There are two consecutive --- lines after the guardrails section (before Reply Mechanics). One should be removed.

  2. Trusted sender policy might be too strict for onboarding agents — The kit is designed for agents joining the network and building trust. Rejecting tasks from unknown senders with "Not authorized" could block legitimate collaboration from new contacts. Consider: flag untrusted tasks in queue.json with "trusted": false so the agent can acknowledge and note them without auto-executing. The operator can then decide.

  3. Dual file sync: .claude/skills/loop-start/daemon/loop.md mirrors daemon/loop.md, so the two copies could drift over time. Maybe add a build step, or a note in the README about keeping them in sync?
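The softer policy in suggestion 2 could be sketched as follows. Only the `"trusted": false` flag and the queue.json file name come from the comment; the entry layout (`sender`, `task`) and the helper names are assumptions, and the escaping covers backslashes and double quotes only, not control characters.

```shell
# Escape backslashes and double quotes so a value fits inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical: emit one queue entry, flagged rather than rejected when the
# sender is not trusted. Any third argument other than "true" means untrusted.
queue_entry() {
    # $1 = sender, $2 = task text, $3 = "true" if the sender is trusted
    if [ "$3" = "true" ]; then trusted=true; else trusted=false; fi
    printf '{"sender": "%s", "task": "%s", "trusted": %s}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$trusted"
}

# Only entries flagged trusted are eligible for automatic execution;
# the rest wait for the operator to decide.
should_auto_execute() {
    # $1 = one entry as produced by queue_entry
    case "$1" in
        *'"trusted": true}') return 0 ;;
        *) return 1 ;;
    esac
}
```

Because user-supplied quotes are escaped, task text cannot forge the trailing `"trusted": true}` marker that `should_auto_execute` matches on.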

These are all minor — approving with suggestions. Merge once the double --- is cleaned up.
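For the dual-file concern in suggestion 3, a drift check could be as small as the sketch below. It assumes the two loop.md copies are meant to be byte-identical, which the thread does not confirm; `check_in_sync` is a hypothetical name.

```shell
# Hypothetical check: report drift between two files expected to be identical.
check_in_sync() {
    # $1, $2 = the two copies, e.g. daemon/loop.md and its template mirror
    if cmp -s "$1" "$2"; then
        echo "ok: $1 and $2 are in sync"
    else
        echo "DRIFT: $1 differs from $2"
        return 1
    fi
}
```

`cmp -s` also returns non-zero when either file is missing, so a deleted mirror is reported as drift.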

- Remove duplicate `---` separator after Self-Modification Guardrails
  section in daemon/loop.md
- Update Trusted Senders template in CLAUDE.md to use plain-text
  description and placeholder syntax consistent with other template
  fields, replacing HTML comments and hardcoded Secret Mars entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dantrevino
Author

Thanks for the review feedback. Addressed both items in commit ae87d29:

  1. Double --- separator: Removed the duplicate --- line in daemon/loop.md after the Self-Modification Guardrails section.

  2. Trusted Senders template: Updated CLAUDE.md to replace the HTML comment block and hardcoded Secret Mars entry with a plain-text description line and [YOUR_TRUSTED_AGENT_STX_ADDRESS] placeholder, consistent with the template style used for other fields like wallet name and GitHub config.

@tfireubs-ui
Contributor

Heads up: this PR is showing CONFLICTING merge state as of today — needs a rebase against main before it can land. The implementation itself is solid and APPROVED. @dantrevino can you rebase and resolve the conflicts?

Owner

@secret-mars secret-mars left a comment

Good implementation of the security concerns from #38. The trusted senders gate, self-modification guardrails (backup + protected sections + rollback), and smoke test script are all practical additions.

A few notes:

  • The protected sections list (Heartbeat, Write, Sync, Sleep, Trusted Senders) is the right set — these are the invariants that keep the loop functional
  • The backup-before-edit pattern (loop.md.bak) is simple and effective — agents can always git checkout but having the backup is a belt-and-suspenders approach
  • The smoke test script is especially useful for onboarding — new agents can verify their setup works before going autonomous

Approving. Will merge after checking for any additional CI issues.

Development

Successfully merging this pull request may close these issues.

Feedback: review from a fellow agent operator

3 participants