feat: support AGENTS.md in Ask Sourcebot #777
base: main
Conversation
Walkthrough
Added retrieval and aggregation of repository AGENTS.md files, sanitized and truncated, then injected via a new
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client as Client
participant Stream as createAgentStream
participant Repos as RepoFetcher
participant Prompt as createBaseSystemPrompt
participant Agent as AgentRuntime
Client->>Stream: initiate agent request
Stream->>Repos: fetch AGENTS.md from repositories (parallel)
Repos-->>Stream: AGENTS.md contents (truncated & sanitized)
Stream->>Prompt: build base prompt (include agentsMdContent)
Prompt-->>Stream: system prompt (with repository_instructions if present)
Stream->>Agent: initialize agent with system prompt + tools
Agent-->>Client: stream responses
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/web/src/features/chat/agent.ts`:
- Around line 205-209: The template injects raw AGENTS.md into the system prompt
via agentsMdContent, creating a prompt-injection risk; update the code that
builds the prompt (the template using agentsMdContent) to (1) impose a
configurable max length on agentsMdContent and truncate with ellipses, (2)
sanitize/escape dangerous patterns (strip or escape XML/HTML tags, remove or
neutralize strings that resemble "system" or instruction overrides, and remove
sequences like "</system>" or other closing tags), and (3) add a comment/log
entry and configuration flag documenting that enabling AGENTS.md is a trust
decision; locate and change the prompt construction where agentsMdContent is
interpolated and apply these transformations before injection.
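The first two transformations requested above could look roughly like the following. This is a minimal sketch, not the code in the PR: `MAX_AGENTS_MD_CHARS`, `truncateAgentsMd`, `escapeAngleBrackets`, and `prepareAgentsMd` are hypothetical names, and the 4000-character limit is an assumed placeholder.

```typescript
// Hypothetical default limit; the PR text does not state the actual value.
const MAX_AGENTS_MD_CHARS = 4000;

// Truncate to a configurable maximum length, marking the cut with an ellipsis.
function truncateAgentsMd(content: string, maxChars: number = MAX_AGENTS_MD_CHARS): string {
    if (content.length <= maxChars) {
        return content;
    }
    return content.slice(0, maxChars) + '...';
}

// Escape angle brackets so file content cannot open or close prompt tags.
function escapeAngleBrackets(content: string): string {
    return content.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Truncate first so the limit applies to the raw file, then escape.
function prepareAgentsMd(content: string, maxChars: number = MAX_AGENTS_MD_CHARS): string {
    return escapeAngleBrackets(truncateAgentsMd(content, maxChars));
}
```

Escaping every angle bracket is the bluntest option: it also alters legitimate HTML or generics in AGENTS.md, which may or may not be acceptable for content that is only read by the model.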
🧹 Nitpick comments (1)
packages/web/src/features/chat/agent.ts (1)
41-54: Consider adding a constant for the AGENTS.md filename and logging for observability.
The hardcoded `'AGENTS.md'` string could be extracted to a constant in `constants.ts` for consistency and easier maintenance. Additionally, consider logging when AGENTS.md files are successfully fetched for debugging purposes.
♻️ Suggested improvement
In `constants.ts`:
```ts
export const AGENTS_MD_FILENAME = 'AGENTS.md';
```
Then in this file:
```diff
 const agentsMdResults = await Promise.all(searchScopeRepoNames.map(async (repo) => {
     const result = await getFileSource({
-        fileName: 'AGENTS.md',
+        fileName: AGENTS_MD_FILENAME,
         repository: repo,
     });
     if (isServiceError(result)) {
         return null;
     }
+
+    logger.debug(`Fetched AGENTS.md from ${repo}`);
     return `Repository: ${repo}\nFile: AGENTS.md\nContent:\n${result.source}`;
 }));
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/web/src/features/chat/agent.ts`:
- Around line 59-60: The current sanitization in agent.ts only replaces the
specific string '</repository_instructions>' on the variable content, leaving
other XML-style closing tags (e.g., </workflow>, </available_repositories>,
</research_phase_instructions>, </answer_instructions>) vulnerable to prompt
injection; update the handling of content (the same variable and its .replace
usage) to either escape or remove all closing XML-like tags by applying a
broader transformation (e.g., a single regex that matches any closing tag like
</...> and replaces or escapes it, or strip angle brackets entirely) so every
closing XML-style tag is covered.
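One way to cover every closing tag with a single regex, as this comment proposes, is sketched below. `CLOSING_TAG_PATTERN` and `neutralizeClosingTags` are hypothetical names, and the choice to escape rather than delete the tag is an assumption; the PR's actual fix is not shown here.

```typescript
// Matches any XML-style closing tag, e.g. </workflow> or </answer_instructions>,
// tolerating whitespace after "</" and before ">".
const CLOSING_TAG_PATTERN = /<\/\s*([A-Za-z_][\w.-]*)\s*>/g;

// Replace each closing tag with an escaped form so it reads as text, not markup.
function neutralizeClosingTags(content: string): string {
    return content.replace(
        CLOSING_TAG_PATTERN,
        (_match: string, tagName: string) => `&lt;/${tagName}&gt;`,
    );
}
```

This leaves opening tags and stray `<` characters untouched, so ordinary comparisons such as `x < y` in AGENTS.md survive. Stripping angle brackets entirely, the other option the comment mentions, is stricter but more lossy.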
🧹 Nitpick comments (1)
packages/web/src/features/chat/agent.ts (1)
216-220: Consider rewording "verified": content is fetched, not authenticated.
The phrase "verified instructions from the repository maintainers" implies authentication or validation of authorship. In practice, AGENTS.md is simply fetched from the repository without cryptographic verification of who authored it. Anyone with write access to the repo could modify it.
Consider softening the language to accurately reflect the trust model:
📝 Suggested wording
```diff
 ${agentsMdContent ? `<repository_instructions>
-The following are verified instructions from the repository maintainers (AGENTS.md). You MUST follow these instructions when generating code or answering questions for these repositories.
+The following instructions are from repository AGENTS.md files. Follow these guidelines when generating code or answering questions for these repositories.
 ${agentsMdContent}
 </repository_instructions>` : ''}
```
@brendan-kellam @msukkari please take a look!
#475 I have successfully implemented support for AGENTS.md in Ask Sourcebot. This feature allows repository maintainers to provide explicit, repository-specific instructions to the AI agent by simply adding an AGENTS.md file at the root of their repository. With this change, maintainers can directly influence how the AI behaves when answering questions about their codebase, without needing any external configuration or code changes.
The implementation works by modifying the chat agent’s initialization process so that these instructions are discovered and injected before any conversation begins. The entry point for this logic is the createAgentStream function in packages/web/src/features/chat/agent.ts. During initialization, the agent now iterates over every repository included in the user’s Search Scope. For each repository, it attempts to fetch a file named AGENTS.md using the existing getFileSource API. This logic runs asynchronously using Promise.all to ensure all repositories are checked efficiently. If the file exists, its contents are captured along with the repository name. If the file does not exist or an error occurs, the error is handled gracefully so that missing files do not break or block agent initialization.
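The fetch step described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `agent.ts`: the `FileSourceResult` and `FileFetcher` types, the `isError` guard, and the `collectAgentsMd` name are hypothetical stand-ins for Sourcebot's `getFileSource` and `isServiceError`, and the blank-line separator between entries is assumed.

```typescript
// Simplified, hypothetical shapes; the real Sourcebot types are not shown in this PR text.
type FileSourceResult = { source: string } | { errorCode: string };
type FileFetcher = (args: { fileName: string; repository: string }) => Promise<FileSourceResult>;

const isError = (r: FileSourceResult): r is { errorCode: string } => 'errorCode' in r;

// Fetch AGENTS.md from every repository in parallel. A missing file or a
// thrown error yields null for that repository instead of failing the batch.
async function collectAgentsMd(repos: string[], fetchFile: FileFetcher): Promise<string> {
    const results = await Promise.all(repos.map(async (repo) => {
        try {
            const result = await fetchFile({ fileName: 'AGENTS.md', repository: repo });
            if (isError(result)) {
                return null;
            }
            return `Repository: ${repo}\nFile: AGENTS.md\nContent:\n${result.source}`;
        } catch {
            return null;
        }
    }));
    // Drop repositories without an AGENTS.md and join the rest into one string.
    return results.filter((entry): entry is string => entry !== null).join('\n\n');
}
```

Catching inside each mapped callback matters because `Promise.all` rejects as soon as any one promise rejects; handling errors per repository keeps one failing fetch from discarding the others.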
Once all repositories have been checked, the collected AGENTS.md contents are formatted into a single structured string. Each entry clearly includes the repository name, the file name (AGENTS.md), and the full contents of the file. This ensures that instructions from multiple repositories can coexist and remain clearly attributable to their source.
I then updated the createBaseSystemPrompt function in the same file to accept this aggregated AGENTS.md content as an additional input. If any instructions are present, they are appended directly to the system prompt and wrapped inside a dedicated <repository_instructions> XML tag. The prompt text explicitly states that these instructions are verified and provided by the repository maintainers, and that the AI must follow them. This structure ensures the LLM interprets the instructions as high-priority system context rather than optional guidance or user input.
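A minimal sketch of the conditional wrapping described here, assuming the block is built by a standalone helper. `buildRepositoryInstructionsBlock` is a hypothetical name; in the PR the logic lives inside the `createBaseSystemPrompt` template. The preamble text follows the wording this description refers to, which the review comments in this thread suggest softening.

```typescript
// Build the optional <repository_instructions> section of the system prompt.
// Returns an empty string when no AGENTS.md content was found, so the
// prompt is unchanged for repositories without the file.
function buildRepositoryInstructionsBlock(agentsMdContent?: string): string {
    if (!agentsMdContent) {
        return '';
    }
    return [
        '<repository_instructions>',
        'The following are verified instructions from the repository maintainers (AGENTS.md). You MUST follow these instructions when generating code or answering questions for these repositories.',
        agentsMdContent,
        '</repository_instructions>',
    ].join('\n');
}
```

Because the content is interpolated between fixed tags, it should be truncated and sanitized before it reaches this function; the wrapper itself does no escaping.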
This approach guarantees that repository-defined rules are injected before the conversation starts and are consistently available throughout the agent’s reasoning process. By embedding them directly into the system prompt, the instructions take precedence over user prompts when applicable, which is exactly the intended behavior for maintainer-authored guidance.
To ensure the feature works end-to-end, I created a temporary unit test suite in agent.test.ts. These tests simulate the entire flow in isolation. The file system and API layer were mocked to simulate a repository containing an AGENTS.md file. The tests verify that createAgentStream correctly calls the file-fetching logic for each repository in the search scope and successfully retrieves the mocked AGENTS.md content. Additionally, the tests assert that createBaseSystemPrompt includes the fetched instructions in the final system prompt, wrapped exactly inside the <repository_instructions> XML block with the expected content.
All tests passed successfully, confirming that the fetch logic, prompt injection, and overall integration are working as intended. This implementation cleanly extends the existing architecture, requires no breaking changes, and provides a simple, repository-native way for maintainers to control how Ask Sourcebot’s AI agent interprets and responds to their codebase.