feat: support AGENTS.md in Ask Sourcebot #777

sudhanshu112233shukla · 2026-01-22T17:45:19Z

#475 This PR adds support for AGENTS.md in Ask Sourcebot. Repository maintainers can give the AI agent explicit, repository-specific instructions by adding an AGENTS.md file at the root of their repository. Maintainers can then influence how the agent answers questions about their codebase without any external configuration or code changes.

The chat agent’s initialization now discovers these instructions and injects them before any conversation begins. The entry point is the createAgentStream function in packages/web/src/features/chat/agent.ts. During initialization, the agent iterates over every repository in the user’s Search Scope and tries to fetch a file named AGENTS.md from each one using the existing getFileSource API. The fetches run in parallel via Promise.all. If the file exists, its contents are captured along with the repository name. If the file does not exist or the fetch returns an error, that repository is skipped, so a missing file does not break or block agent initialization.
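A minimal sketch of the fetch step described above. The `FileSourceResult` and `FetchFile` types and the `collectAgentsMd` name are hypothetical stand-ins so the example runs on its own; the real getFileSource signature and error shape may differ, and the blank-line separator between entries is an assumption.

```typescript
// Hypothetical result shape; the real getFileSource return type may differ.
type FileSourceResult = { source: string } | { error: string };

// Hypothetical fetcher signature, injected so the sketch runs standalone.
type FetchFile = (args: { fileName: string; repository: string }) => Promise<FileSourceResult>;

async function collectAgentsMd(repos: string[], fetchFile: FetchFile): Promise<string> {
    // Start all fetches at once; Promise.all resolves when every one has settled.
    const results = await Promise.all(repos.map(async (repo) => {
        try {
            const result = await fetchFile({ fileName: 'AGENTS.md', repository: repo });
            if ('error' in result) {
                // Missing file or service error: skip this repository.
                return null;
            }
            return `Repository: ${repo}\nFile: AGENTS.md\nContent:\n${result.source}`;
        } catch {
            // A thrown error must not block agent initialization either.
            return null;
        }
    }));

    // Drop skipped repositories and join the rest (separator is an assumption).
    return results.filter((entry): entry is string => entry !== null).join('\n\n');
}
```

Catching inside each mapped callback matters here: Promise.all rejects as soon as any promise rejects, so one failing repository would otherwise discard the results of all the others.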

Once all repositories have been checked, the collected AGENTS.md contents are formatted into a single structured string. Each entry clearly includes the repository name, the file name (AGENTS.md), and the full contents of the file. This ensures that instructions from multiple repositories can coexist and remain clearly attributable to their source.

I then updated the createBaseSystemPrompt function in the same file to accept this aggregated AGENTS.md content as an additional input. If any instructions are present, they are appended to the system prompt inside a dedicated <repository_instructions> XML tag. The prompt text states that these instructions are verified and provided by the repository maintainers, and that the AI must follow them. This structure is intended to make the LLM treat the instructions as high-priority system context rather than optional guidance or user input.
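The conditional wrapping can be sketched roughly as follows. The `buildRepositoryInstructions` helper is hypothetical (in the PR this logic lives inside the createBaseSystemPrompt template), and the preamble wording is the one quoted in the review diff for this PR.

```typescript
// Hypothetical helper; in the PR this is part of the createBaseSystemPrompt template.
function buildRepositoryInstructions(agentsMdContent?: string): string {
    // Nothing to inject when no repository provided an AGENTS.md file.
    if (!agentsMdContent) {
        return '';
    }

    return `<repository_instructions>
The following are verified instructions from the repository maintainers (AGENTS.md). You MUST follow these instructions when generating code or answering questions for these repositories.

${agentsMdContent}
</repository_instructions>`;
}
```

Returning an empty string when there is no content leaves the prompt unchanged for repositories that do not use AGENTS.md.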

This approach injects repository-defined rules before the conversation starts, so they remain available throughout the agent’s reasoning process. Because they are embedded in the system prompt, the instructions are intended to take precedence over user prompts where applicable, which is the desired behavior for maintainer-authored guidance.

To ensure the feature works end-to-end, I created a temporary unit test suite in agent.test.ts. These tests simulate the entire flow in isolation. The file system and API layer were mocked to simulate a repository containing an AGENTS.md file. The tests verify that createAgentStream correctly calls the file-fetching logic for each repository in the search scope and successfully retrieves the mocked AGENTS.md content. Additionally, the tests assert that createBaseSystemPrompt includes the fetched instructions in the final system prompt, wrapped exactly inside the <repository_instructions> XML block with the expected content.

All tests passed successfully, confirming that the fetch logic, prompt injection, and overall integration are working as intended. This implementation cleanly extends the existing architecture, requires no breaking changes, and provides a simple, repository-native way for maintainers to control how Ask Sourcebot’s AI agent interprets and responds to their codebase.

Summary by CodeRabbit

New Features
- Chat agent now collects, sanitizes, and truncates repository AGENTS.md content and embeds it into system prompts when available, improving contextual awareness so responses better reflect repo-specific instructions and help the agent select appropriate tools and guidance during conversations.


coderabbitai · 2026-01-22T17:45:43Z

Walkthrough

Added retrieval and aggregation of repository AGENTS.md files, sanitized and truncated, then injected via a new agentsMdContent option into the base system prompt used by agent creation and streaming.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Agent system prompt & stream `packages/web/src/features/chat/agent.ts` | Added `agentsMdContent?: string` to `BaseSystemPromptOptions`; `createBaseSystemPrompt` now accepts and conditionally embeds `agentsMdContent` as a `repository_instructions` block. `createAgentStream` now fetches AGENTS.md from repositories (parallel), truncates/sanitizes/aggregates results into `agentsMdContent`, and passes it into the base prompt. Minor formatting alignment change. |

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Stream as createAgentStream
  participant Repos as RepoFetcher
  participant Prompt as createBaseSystemPrompt
  participant Agent as AgentRuntime

  Client->>Stream: initiate agent request
  Stream->>Repos: fetch AGENTS.md from repositories (parallel)
  Repos-->>Stream: AGENTS.md contents (truncated & sanitized)
  Stream->>Prompt: build base prompt (include agentsMdContent)
  Prompt-->>Stream: system prompt (with repository_instructions if present)
  Stream->>Agent: initialize agent with system prompt + tools
  Agent-->>Client: stream responses

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: support AGENTS.md in Ask Sourcebot' clearly and concisely summarizes the main change: adding support for AGENTS.md files to the AI agent. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@packages/web/src/features/chat/agent.ts`:
- Around line 205-209: The template injects raw AGENTS.md into the system prompt
via agentsMdContent, creating a prompt-injection risk; update the code that
builds the prompt (the template using agentsMdContent) to (1) impose a
configurable max length on agentsMdContent and truncate with ellipses, (2)
sanitize/escape dangerous patterns (strip or escape XML/HTML tags, remove or
neutralize strings that resemble "system" or instruction overrides, and remove
sequences like "</system>" or other closing tags), and (3) add a comment/log
entry and configuration flag documenting that enabling AGENTS.md is a trust
decision; locate and change the prompt construction where agentsMdContent is
interpolated and apply these transformations before injection.
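Point (1) above could look roughly like this. The `truncateAgentsMd` name and the default limit of 4000 characters are assumptions chosen for illustration, not values taken from the PR.

```typescript
// Hypothetical default; the actual limit should come from configuration.
const DEFAULT_MAX_AGENTS_MD_LENGTH = 4000;

function truncateAgentsMd(content: string, maxLength: number = DEFAULT_MAX_AGENTS_MD_LENGTH): string {
    // Content within the limit passes through unchanged.
    if (content.length <= maxLength) {
        return content;
    }
    // Keep the first maxLength characters and mark the cut with an ellipsis.
    return content.slice(0, maxLength) + '...';
}
```

Applying the limit per file, before aggregation, keeps one oversized AGENTS.md from crowding out instructions from other repositories in the same search scope.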

🧹 Nitpick comments (1)

packages/web/src/features/chat/agent.ts (1)
41-54: Consider adding a constant for the AGENTS.md filename and logging for observability.

The hardcoded 'AGENTS.md' string could be extracted to a constant in constants.ts for consistency and easier maintenance. Additionally, consider logging when AGENTS.md files are successfully fetched for debugging purposes.
♻️ Suggested improvement

In constants.ts:
export const AGENTS_MD_FILENAME = 'AGENTS.md';
Then in this file:
     const agentsMdResults = await Promise.all(searchScopeRepoNames.map(async (repo) => {
         const result = await getFileSource({
-            fileName: 'AGENTS.md',
+            fileName: AGENTS_MD_FILENAME,
             repository: repo,
         });

         if (isServiceError(result)) {
             return null;
         }
+
+        logger.debug(`Fetched AGENTS.md from ${repo}`);
         return `Repository: ${repo}\nFile: AGENTS.md\nContent:\n${result.source}`;
     }));

packages/web/src/features/chat/agent.ts

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@packages/web/src/features/chat/agent.ts`:
- Around line 59-60: The current sanitization in agent.ts only replaces the
specific string '</repository_instructions>' on the variable content, leaving
other XML-style closing tags (e.g., </workflow>, </available_repositories>,
</research_phase_instructions>, </answer_instructions>) vulnerable to prompt
injection; update the handling of content (the same variable and its .replace
usage) to either escape or remove all closing XML-like tags by applying a
broader transformation (e.g., a single regex that matches any closing tag like
</...> and replaces or escapes it, or strip angle brackets entirely) so every
closing XML-style tag is covered.
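One possible shape for the broader transformation, as a sketch: the `escapeClosingTags` name is hypothetical, and escaping to HTML entities is just one of the options mentioned above. Stripping angle brackets entirely would be stricter.

```typescript
// Hypothetical helper: neutralizes any XML-style closing tag, not just
// </repository_instructions>, by escaping its angle brackets.
function escapeClosingTags(content: string): string {
    // Matches "</", a tag name, optional whitespace, then ">".
    return content.replace(/<\/([A-Za-z_][\w-]*)\s*>/g, '&lt;/$1&gt;');
}
```

Opening tags are left untouched in this sketch; without a matching closing tag they cannot terminate one of the prompt's existing sections.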

🧹 Nitpick comments (1)

packages/web/src/features/chat/agent.ts (1)
216-220: Consider rewording "verified" — content is fetched, not authenticated.

The phrase "verified instructions from the repository maintainers" implies authentication or validation of authorship. In practice, AGENTS.md is simply fetched from the repository without cryptographic verification of who authored it. Anyone with write access to the repo could modify it.

Consider softening the language to accurately reflect the trust model:
📝 Suggested wording
 ${agentsMdContent ? `<repository_instructions>
-The following are verified instructions from the repository maintainers (AGENTS.md). You MUST follow these instructions when generating code or answering questions for these repositories.
+The following instructions are from repository AGENTS.md files. Follow these guidelines when generating code or answering questions for these repositories.

 ${agentsMdContent}
 </repository_instructions>` : ''}

packages/web/src/features/chat/agent.ts

sudhanshu112233shukla · 2026-01-22T18:24:00Z

@brendan-kellam @msukkari please take a look!


sudhanshu112233shukla added 3 commits January 22, 2026 23:03

feat: support AGENTS.md in Ask Sourcebot (sourcebot-dev#475)

217ad44

fix: sanitize AGENTS.md content to prevent prompt injection

25017b3

fix: broaden AGENTS.md sanitization to escape all closing tags

0c2195e
