
Doc/update-2 #56

Merged
stewartshea merged 12 commits into main from doc/update-2
Feb 25, 2026

@stewartshea stewartshea commented Feb 23, 2026

  • Modified docker-compose.yml to include new volume mappings for sources.yaml and ensure proper configuration for the backend and worker services.
  • Enhanced README.md to reflect the updated backend capabilities, including embedding generation and vector search functionalities.
  • Updated schedules.yaml to clarify the new indexing tasks and their purposes, emphasizing the integration of vector embeddings.
  • Refined MCP_INDEXING_SCHEDULE.md and MCP_WORKFLOW.md to detail the new data ingestion pipeline, including the embedding generation process and its impact on search capabilities.
  • Deprecated old MCP indexing tasks in favor of the new indexing structure, ensuring a smoother transition to the updated workflow.

Note

High Risk
Adds new embedding generation, web crawling, and pgvector upsert/search flows plus new scheduled jobs; failures/misconfig could impact worker load, external calls, and vector table integrity (though keyword search remains separate).

Overview
Adds a pgvector-backed semantic search and indexing pipeline to the backend: new SQLAlchemy vector-table models plus /api/v1/vector/* endpoints for unified and per-table similarity search, vector table stats, and reindex triggers.
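For context on what a similarity search endpoint like this computes: pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), and a query typically orders rows by that distance and takes the top k. The in-memory sketch below mimics that ranking. It assumes cosine distance is the metric in use (the PR does not say), and the function names and toy 3-dimensional vectors are hypothetical; the real embeddings in this PR are 1536-dimensional.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance, the quantity pgvector's <=> operator returns: 1 - cos(a, b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # This sketch simply rejects zero vectors.
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: List[float], rows: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored vectors by ascending distance, like ORDER BY embedding <=> :q LIMIT k."""
    scored = [(row_id, cosine_distance(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Hypothetical toy data standing in for rows of a vector table.
rows = {
    "codebundle-a": [1.0, 0.0, 0.0],
    "codebundle-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
result = top_k([1.0, 0.1, 0.0], rows, k=2)
print(result)
```

In the database the same ordering is delegated to pgvector, which can also use an index to avoid scanning every row.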

Moves embedding generation and indexing into the backend worker (new indexing_tasks.py) with an Azure OpenAI embedding client, documentation crawling from sources.yaml, and safer upsert logic that avoids truncating tables when embeddings are mostly/entirely empty; the main workflow is extended to run sync → parse → enhance → embed.

Updates infra/docs to support this: enables vector extension on startup/migrations, mounts sources.yaml into backend/worker/scheduler (Docker + K8s ConfigMap), switches schedules from deprecated mcp_tasks to indexing_tasks (and keeps mcp_tasks as redirecting stubs), and refreshes documentation/UI styling to reflect vector search + embedding capabilities.

Written by Cursor Bugbot for commit b56074f. This will update automatically on new commits.

- Updated the ARCHITECTURE.md to clarify production data flows, including the sync-parse-enhance pipeline and search mechanisms.
- Expanded MCP_INDEXING_SCHEDULE.md to detail production schedules for data ingestion and statistics updates, along with development indexing tasks.
- Revised MCP_WORKFLOW.md to differentiate between production and development workflows, emphasizing the role of the backend in data ingestion and the offline indexing pipeline for future vector search capabilities.
…tegration

- Modified `docker-compose.yml` to include new volume mappings for `sources.yaml` and ensure proper configuration for the backend and worker services.
- Enhanced `README.md` to reflect the updated backend capabilities, including embedding generation and vector search functionalities.
- Updated `schedules.yaml` to clarify the new indexing tasks and their purposes, emphasizing the integration of vector embeddings.
- Refined `MCP_INDEXING_SCHEDULE.md` and `MCP_WORKFLOW.md` to detail the new data ingestion pipeline, including the embedding generation process and its impact on search capabilities.
- Deprecated old MCP indexing tasks in favor of the new indexing structure, ensuring a smoother transition to the updated workflow.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-2690f785
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-2690f785 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-2690f785
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-2690f785 |
| cc-registry-v2-frontend | doc-update-256-merge-2690f785 |
| cc-registry-v2-worker | doc-update-256-merge-2690f785 |

@stewartshea stewartshea left a comment

bugbot run

@stewartshea

bugbot run

Comment thread cc-registry-v2/backend/app/routers/vector_search.py
Comment thread cc-registry-v2/backend/app/services/vector_service.py Outdated
Comment thread cc-registry-v2/backend/app/services/vector_service.py
Comment thread cc-registry-v2/backend/app/models/vector_models.py
Comment thread cc-registry-v2/backend/app/tasks/indexing_tasks.py
Comment thread cc-registry-v2/backend/app/tasks/indexing_tasks.py
- Updated `vector_models.py` to use dynamic embedding dimensions from settings.
- Enhanced `vector_search.py` to support additional metadata filters for search queries.
- Modified `embedding_service.py` to include dimensions in embedding requests.
- Improved `vector_service.py` with validation for metadata filter keys and refined upsert logic to prevent data loss on empty embeddings.
- Added utility functions in `indexing_tasks.py` to count valid embeddings and handle cases where all embeddings are empty, ensuring robust error handling during indexing tasks.
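The metadata-filter key validation listed above matters when filter keys are placed into a JSONB expression such as `metadata->>'key'`, because a key cannot be passed as a bind parameter. One common approach, assumed here rather than taken from `vector_service.py`, is to accept only allowlisted identifier-like keys and always bind the values. `ALLOWED_FILTER_KEYS` and `build_metadata_filter` are hypothetical names, and the `:name` placeholders follow SQLAlchemy `text()` style purely for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical allowlist; the real set of filterable keys is not shown in this PR.
ALLOWED_FILTER_KEYS = {"collection_slug", "platform", "source"}
# Defence in depth: even allowlisted keys must be plain identifiers.
_KEY_PATTERN = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_metadata_filter(filters: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return a WHERE fragment plus bind params; reject unknown or unsafe keys."""
    clauses: List[str] = []
    params: Dict[str, str] = {}
    for i, (key, value) in enumerate(sorted(filters.items())):
        if key not in ALLOWED_FILTER_KEYS or not _KEY_PATTERN.match(key):
            raise ValueError(f"unsupported metadata filter key: {key!r}")
        param = f"meta_{i}"
        # The key is interpolated only after validation; the value is always bound.
        clauses.append(f"metadata->>'{key}' = :{param}")
        params[param] = value
    return " AND ".join(clauses), params


sql, params = build_metadata_filter({"platform": "kubernetes", "collection_slug": "example-collection"})
print(sql)
print(params)
```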
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-fc09d271
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-fc09d271 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-fc09d271
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-fc09d271 |
| cc-registry-v2-frontend | doc-update-256-merge-fc09d271 |
| cc-registry-v2-worker | doc-update-256-merge-fc09d271 |

Comment thread cc-registry-v2/backend/app/routers/vector_search.py
- Changed the `semantic_search`, `search_codebundles`, `search_documentation`, `search_libraries`, and `vector_stats` functions from asynchronous to synchronous definitions.
- This refactor aims to simplify the function signatures and improve readability while maintaining existing functionality.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-deed6598
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-deed6598 |

Comment thread cc-registry-v2/backend/app/core/config.py
Comment thread cc-registry-v2/backend/app/tasks/mcp_tasks.py Outdated
@github-actions

Container Images Built

Tag: doc-update-256-merge-deed6598
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-deed6598 |
| cc-registry-v2-frontend | doc-update-256-merge-deed6598 |
| cc-registry-v2-worker | doc-update-256-merge-deed6598 |

- Introduced a new ConfigMap for documentation sources, enabling the backend to utilize `sources.yaml` for vector embedding generation.
- Updated deployment configurations across backend, worker, and scheduler to mount the new documentation sources.
- Enhanced the scheduler tasks to reflect the new indexing structure for vector embeddings, ensuring proper scheduling and execution.
- Added a new PostgreSQL extension for vector support in the database deployment.
- Updated kustomization to include the new documentation sources configuration, improving the overall deployment setup.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-34b1e256
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-34b1e256 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-34b1e256
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-34b1e256 |
| cc-registry-v2-frontend | doc-update-256-merge-34b1e256 |
| cc-registry-v2-worker | doc-update-256-merge-34b1e256 |

Comment thread cc-registry-v2/backend/app/models/vector_models.py Outdated
- Removed the EMBEDDING_DIMENSIONS setting from the configuration file and set it directly in vector_models.py to ensure consistency with the Azure OpenAI model output.
- Updated metadata column defaults in vector models to use a predefined empty JSONB variable for clarity.
- Adjusted the EmbeddingService to reference the new static EMBEDDING_DIMENSIONS directly, simplifying initialization.
- Enhanced documentation to clarify that vector dimensions are fixed at 1536, aligning with the database schema and Azure model.
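Because the column width is fixed at 1536, any embedding of a different length would be rejected by the database at insert time. A guard like the one below fails fast with a clearer message before the write; it is a hypothetical sketch, and `check_embedding` is not a function from this PR.

```python
from typing import List, Sequence

# Matches the fixed dimension described above (Azure OpenAI model output and DB schema).
EMBEDDING_DIMENSIONS = 1536


def check_embedding(vector: Sequence[float], expected: int = EMBEDDING_DIMENSIONS) -> List[float]:
    """Return the vector as a list, or raise if its length does not match the column."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return list(vector)


ok = check_embedding([0.0] * EMBEDDING_DIMENSIONS)
try:
    check_embedding([0.1, 0.2, 0.3])
except ValueError as exc:
    print(exc)  # expected 1536 dimensions, got 3
```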
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-209297dc
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-209297dc |

@github-actions

Container Images Built

Tag: doc-update-256-merge-209297dc
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-209297dc |
| cc-registry-v2-frontend | doc-update-256-merge-209297dc |
| cc-registry-v2-worker | doc-update-256-merge-209297dc |

Comment thread cc-registry-v2/backend/app/tasks/indexing_tasks.py
- Updated the `_rows_to_dicts` function to skip orphaned codebundles without active collection slugs, logging the count of skipped entries.
- Added duplicate vector ID detection in the `index_codebundles_task`, logging an error if duplicates are found and returning a failure status.
- Improved documentation within the `_rows_to_dicts` function to clarify the handling of orphaned rows.
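The orphan-skipping and duplicate-ID checks above can be sketched as follows. This assumes each row is a dict with hypothetical `name` and `collection_slug` fields and that the vector ID combines both; the real `_rows_to_dicts` and ID scheme are not shown in this PR, so the names here are illustrative only.

```python
import logging
from collections import Counter
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


def rows_to_dicts(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Drop orphaned rows (no active collection slug) and log how many were skipped."""
    kept, skipped = [], 0
    for row in rows:
        slug = row.get("collection_slug")
        if not slug:
            skipped += 1
            continue
        # Hypothetical ID scheme: "<collection_slug>/<name>".
        kept.append({"id": f"{slug}/{row['name']}", "name": row["name"]})
    if skipped:
        logger.info("Skipped %d orphaned codebundles without an active collection", skipped)
    return kept


def find_duplicate_ids(items: List[Dict[str, str]]) -> List[str]:
    """Return IDs that occur more than once; a non-empty result should fail the task."""
    counts = Counter(item["id"] for item in items)
    return sorted(vid for vid, n in counts.items() if n > 1)


items = rows_to_dicts([
    {"name": "k8s-health", "collection_slug": "coll-a"},
    {"name": "k8s-health", "collection_slug": "coll-a"},
    {"name": "orphan", "collection_slug": None},
])
dupes = find_duplicate_ids(items)
if dupes:
    logger.error("Duplicate vector IDs: %s", dupes)
    result = {"status": "failed", "duplicates": dupes}
else:
    result = {"status": "ok", "count": len(items)}
print(result)
```

Failing the task on duplicates, rather than silently keeping the last row, surfaces the underlying data problem instead of hiding it in the vector table.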
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-fd953871
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-fd953871 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-fd953871
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-fd953871 |
| cc-registry-v2-frontend | doc-update-256-merge-fd953871 |
| cc-registry-v2-worker | doc-update-256-merge-fd953871 |

Comment thread cc-registry-v2/backend/run_migrations.py
Comment thread cc-registry-v2/backend/app/services/embedding_service.py
Comment thread cc-registry-v2/backend/app/services/vector_service.py
- Introduced threading locks in `EmbeddingService` and `VectorService` to ensure thread-safe singleton instantiation.
- Added validation for input lengths in the `VectorService` to prevent mismatches between IDs, embeddings, documents, and metadata, improving error handling during vector operations.
- Updated imports in `run_migrations.py` to include new vector models, ensuring they are registered with the database.
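Thread-safe singleton instantiation of the kind described for `EmbeddingService` and `VectorService` is commonly implemented with double-checked locking. The sketch below uses a placeholder `_Service` class and a hypothetical `get_service()` accessor; it illustrates the pattern and is not the PR's code.

```python
import threading
from typing import List, Optional


class _Service:
    """Placeholder for an expensive-to-build client, such as an embedding API client."""

    def __init__(self) -> None:
        self.created_by = threading.current_thread().name


_instance: Optional[_Service] = None
_lock = threading.Lock()


def get_service() -> _Service:
    """Return the process-wide instance, creating it at most once across threads."""
    global _instance
    if _instance is None:  # fast path: skip the lock once initialised
        with _lock:
            if _instance is None:  # re-check: another thread may have created it first
                _instance = _Service()
    return _instance


results: List[_Service] = []
threads = [threading.Thread(target=lambda: results.append(get_service())) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len({id(r) for r in results}))
```

Without the inner check, two threads that both saw `None` before acquiring the lock could each construct an instance.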
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-d2efa3f6
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-d2efa3f6 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-d2efa3f6
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-d2efa3f6 |
| cc-registry-v2-frontend | doc-update-256-merge-d2efa3f6 |
| cc-registry-v2-worker | doc-update-256-merge-d2efa3f6 |

Comment thread cc-registry-v2/k8s/database-deployment.yaml
- Changed volume mappings in `docker-compose.yml` to reference `sources.yaml` from the local `cc-registry-v2` directory instead of the deprecated `mcp-server` path.
- Updated comments and documentation in `CONFIGURATION.md`, `MCP_WORKFLOW.md`, and `kustomization.yaml` to reflect the new location of `sources.yaml`.
- Removed the obsolete `mcp-server/sources.yaml` file, streamlining the project structure and enhancing clarity on documentation source management.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-65b0e713
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-65b0e713 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-65b0e713
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-65b0e713 |
| cc-registry-v2-frontend | doc-update-256-merge-65b0e713 |
| cc-registry-v2-worker | doc-update-256-merge-65b0e713 |

- Updated the SearchIcon color to use 'text.secondary' for better visibility.
- Improved input field styles in AllTasks, CodeBundles, and Home components, including background color, text color, and border styling for hover and focus states, ensuring a consistent and modern look throughout the application.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-d945519f
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-d945519f |

@github-actions

Container Images Built

Tag: doc-update-256-merge-d945519f
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-d945519f |
| cc-registry-v2-frontend | doc-update-256-merge-d945519f |
| cc-registry-v2-worker | doc-update-256-merge-d945519f |

Comment thread cc-registry-v2/backend/app/services/embedding_service.py
Comment thread cc-registry-v2/backend/app/services/vector_service.py Outdated
- Updated the `EmbeddingService` to sort response data by index before appending embeddings, ensuring consistent ordering.
- Enhanced validation in the `VectorService` to check the ratio of valid embeddings against the total count, raising errors for low validity scenarios to improve error handling during truncation operations.
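Both changes can be illustrated together. Embedding API responses carry an `index` per input item, so sorting on it restores input order; the ratio guard then refuses a destructive truncate-and-reload when too few embeddings are usable. The dict-shaped response (mirroring the JSON body), the `MIN_VALID_RATIO` value of 0.9, and both function names are assumptions for illustration; the PR does not state the actual threshold.

```python
from typing import Dict, List, Sequence

# Hypothetical threshold; the PR does not state the actual ratio.
MIN_VALID_RATIO = 0.9


def embeddings_in_input_order(response_data: List[Dict]) -> List[List[float]]:
    """Sort response items by their 'index' field before extracting vectors."""
    return [item["embedding"] for item in sorted(response_data, key=lambda item: item["index"])]


def check_safe_to_truncate(embeddings: Sequence[Sequence[float]]) -> int:
    """Raise instead of truncating when too many embeddings are empty; return the valid count."""
    total = len(embeddings)
    valid = sum(1 for e in embeddings if e)  # empty vectors count as failures
    if total == 0 or valid == 0:
        raise ValueError("no valid embeddings; refusing to truncate the table")
    if valid / total < MIN_VALID_RATIO:
        raise ValueError(f"only {valid}/{total} embeddings valid; refusing to truncate")
    return valid


data = [
    {"index": 1, "embedding": [0.2, 0.2]},
    {"index": 0, "embedding": [0.1, 0.1]},
]
ordered = embeddings_in_input_order(data)
print(ordered)  # [[0.1, 0.1], [0.2, 0.2]]
print(check_safe_to_truncate(ordered))  # 2
```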
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-1fdedbb7
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-1fdedbb7 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-1fdedbb7
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-1fdedbb7 |
| cc-registry-v2-frontend | doc-update-256-merge-1fdedbb7 |
| cc-registry-v2-worker | doc-update-256-merge-1fdedbb7 |

Comment thread cc-registry-v2/backend/app/services/web_crawler.py Outdated
- Added exception handling in the `_fetch` method of the `WebCrawler` class to log parsing errors when extracting data from fetched HTML, improving robustness and error reporting during web crawling operations.
@github-actions

MCP Server Image Built

Tag: doc-update-256-merge-13431f51
Build: ✅ Image pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| runwhen-mcp-server | doc-update-256-merge-13431f51 |

@github-actions

Container Images Built

Tag: doc-update-256-merge-13431f51
Build: ✅ All images pushed
Test Deploy: ✅ Triggered

| Image | Tag |
| --- | --- |
| cc-registry-v2-backend | doc-update-256-merge-13431f51 |
| cc-registry-v2-frontend | doc-update-256-merge-13431f51 |
| cc-registry-v2-worker | doc-update-256-merge-13431f51 |

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


```python
    raise
finally:
    if own_session:
        db.close()
```

Unconditional commit on caller-provided database session

Low Severity

The upsert_vectors method unconditionally calls db.commit() on line 136, even when db was provided by the caller (own_session is False). However, the except block only calls db.rollback() when own_session is True. This creates an asymmetric contract: a caller-provided session gets committed (potentially committing unrelated pending changes) on success, but is left in a dirty/error state on failure since rollback is skipped. Current callers all omit db (so own_session is always True), but the method's signature explicitly accepts a db parameter, so a future caller using it would encounter surprising behavior.
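One way to make the contract symmetric, sketched here under the assumption of a SQLAlchemy-like session exposing `add()`, `flush()`, `commit()`, `rollback()` and `close()`, is to commit, roll back and close only when the method created the session, and merely flush a caller-provided one so the caller keeps control of the transaction. The signature below is hypothetical, and `FakeSession` and `session_factory` are stand-ins so the example runs without a database.

```python
from typing import Callable, List, Optional


class FakeSession:
    """Minimal stand-in that records which transaction calls were made."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def add(self, obj: object) -> None:
        self.calls.append("add")

    def flush(self) -> None:
        self.calls.append("flush")

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")

    def close(self) -> None:
        self.calls.append("close")


def upsert_vectors(rows: List[object], db: Optional[FakeSession] = None,
                   session_factory: Callable[[], FakeSession] = FakeSession) -> FakeSession:
    own_session = db is None
    if own_session:
        db = session_factory()
    try:
        for row in rows:
            db.add(row)
        if own_session:
            db.commit()  # we opened the transaction, so we finish it
        else:
            db.flush()  # send SQL, but leave commit/rollback to the caller
        return db  # returned only so this example can inspect the calls
    except Exception:
        if own_session:
            db.rollback()
        raise
    finally:
        if own_session:
            db.close()


owned = upsert_vectors(["v1"])
print(owned.calls)  # ['add', 'commit', 'close']
caller = FakeSession()
upsert_vectors(["v1"], db=caller)
print(caller.calls)  # ['add', 'flush']
```

On failure, a caller-provided session is neither committed nor rolled back here, so the caller decides how to recover.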


@stewartshea stewartshea merged commit 922be43 into main Feb 25, 2026
11 checks passed
@stewartshea stewartshea deleted the doc/update-2 branch February 25, 2026 01:52