- Updated `ARCHITECTURE.md` to clarify production data flows, including the sync-parse-enhance pipeline and search mechanisms.
- Expanded `MCP_INDEXING_SCHEDULE.md` to detail production schedules for data ingestion and statistics updates, along with development indexing tasks.
- Revised `MCP_WORKFLOW.md` to differentiate between production and development workflows, emphasizing the role of the backend in data ingestion and the offline indexing pipeline for future vector search capabilities.
…tegration
- Modified `docker-compose.yml` to include new volume mappings for `sources.yaml` and ensure proper configuration for the backend and worker services.
- Enhanced `README.md` to reflect the updated backend capabilities, including embedding generation and vector search functionalities.
- Updated `schedules.yaml` to clarify the new indexing tasks and their purposes, emphasizing the integration of vector embeddings.
- Refined `MCP_INDEXING_SCHEDULE.md` and `MCP_WORKFLOW.md` to detail the new data ingestion pipeline, including the embedding generation process and its impact on search capabilities.
- Deprecated old MCP indexing tasks in favor of the new indexing structure, ensuring a smoother transition to the updated workflow.
MCP Server Image BuiltTag:
|
Container Images BuiltTag:
|
|
bugbot run
- Updated `vector_models.py` to use dynamic embedding dimensions from settings.
- Enhanced `vector_search.py` to support additional metadata filters for search queries.
- Modified `embedding_service.py` to include dimensions in embedding requests.
- Improved `vector_service.py` with validation for metadata filter keys and refined upsert logic to prevent data loss on empty embeddings.
- Added utility functions in `indexing_tasks.py` to count valid embeddings and handle cases where all embeddings are empty, ensuring robust error handling during indexing tasks.
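The empty-embedding guard described in this commit could look roughly like the sketch below. The function names and the shape of the returned status dictionary are assumptions made for illustration; the actual helpers in `indexing_tasks.py` are not shown in this conversation.

```python
from typing import Optional, Sequence


def count_valid_embeddings(embeddings: Sequence[Optional[Sequence[float]]]) -> int:
    """Count entries that are neither None nor an empty list."""
    return sum(1 for emb in embeddings if emb)


def check_embeddings(embeddings: Sequence[Optional[Sequence[float]]]) -> dict:
    """Return a status dict; report an error when every embedding is empty.

    The dict layout is hypothetical and only illustrates the idea of
    failing the task instead of writing an all-empty batch.
    """
    total = len(embeddings)
    valid = count_valid_embeddings(embeddings)
    if valid == 0:
        return {"status": "error", "reason": "all embeddings empty",
                "valid": 0, "total": total}
    return {"status": "ok", "valid": valid, "total": total}
```

A task would call `check_embeddings` before any upsert and return early on `"error"`, so an outage in the embedding provider does not wipe existing rows.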
- Changed the `semantic_search`, `search_codebundles`, `search_documentation`, `search_libraries`, and `vector_stats` functions from asynchronous to synchronous definitions.
- This refactor aims to simplify the function signatures and improve readability while maintaining existing functionality.
- Introduced a new ConfigMap for documentation sources, enabling the backend to utilize `sources.yaml` for vector embedding generation.
- Updated deployment configurations across backend, worker, and scheduler to mount the new documentation sources.
- Enhanced the scheduler tasks to reflect the new indexing structure for vector embeddings, ensuring proper scheduling and execution.
- Added a new PostgreSQL extension for vector support in the database deployment.
- Updated kustomization to include the new documentation sources configuration, improving the overall deployment setup.
- Removed the `EMBEDDING_DIMENSIONS` setting from the configuration file and set it directly in `vector_models.py` to ensure consistency with the Azure OpenAI model output.
- Updated metadata column defaults in vector models to use a predefined empty JSONB variable for clarity.
- Adjusted the `EmbeddingService` to reference the new static `EMBEDDING_DIMENSIONS` directly, simplifying initialization.
- Enhanced documentation to clarify that vector dimensions are fixed at 1536, aligning with the database schema and Azure model.
- Updated the `_rows_to_dicts` function to skip orphaned codebundles without active collection slugs, logging the count of skipped entries.
- Added duplicate vector ID detection in the `index_codebundles_task`, logging an error if duplicates are found and returning a failure status.
- Improved documentation within the `_rows_to_dicts` function to clarify the handling of orphaned rows.
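As a rough illustration of the two checks in this commit, the sketch below skips rows with no collection slug and then looks for duplicate vector IDs. The row layout, the `collection_slug/slug` ID format, and both function names are hypothetical; the real `_rows_to_dicts` and `index_codebundles_task` are not shown here.

```python
import logging
from collections import Counter
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)


def rows_to_dicts(rows: Iterable[dict]) -> Tuple[List[dict], int]:
    """Convert rows to indexable dicts, skipping orphans (no collection slug)."""
    docs, skipped = [], 0
    for row in rows:
        if not row.get("collection_slug"):
            skipped += 1  # orphaned codebundle: no active collection
            continue
        # Assumed ID scheme, for illustration only.
        docs.append({"id": f"{row['collection_slug']}/{row['slug']}", **row})
    if skipped:
        logger.warning("Skipped %d orphaned codebundles", skipped)
    return docs, skipped


def find_duplicate_ids(ids: Iterable[str]) -> List[str]:
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)
```

An indexing task could call `find_duplicate_ids` on the generated IDs and return a failure status when the result is non-empty, rather than letting later rows silently overwrite earlier ones.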
- Introduced threading locks in `EmbeddingService` and `VectorService` to ensure thread-safe singleton instantiation.
- Added validation for input lengths in the `VectorService` to prevent mismatches between IDs, embeddings, documents, and metadata, improving error handling during vector operations.
- Updated imports in `run_migrations.py` to include new vector models, ensuring they are registered with the database.
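One common way to get the thread-safe singleton and the length validation mentioned above is sketched here. `DemoService`, `get_service`, and `validate_lengths` are placeholder names, not the project's classes; the pattern shown is double-checked locking with `threading.Lock`.

```python
import threading
from typing import Optional, Sequence


class DemoService:
    """Hypothetical stand-in for EmbeddingService / VectorService."""


_instance: Optional[DemoService] = None
_instance_lock = threading.Lock()


def get_service() -> DemoService:
    """Return the shared instance, creating it at most once across threads."""
    global _instance
    if _instance is None:              # fast path, no lock taken
        with _instance_lock:
            if _instance is None:      # re-check while holding the lock
                _instance = DemoService()
    return _instance


def validate_lengths(ids: Sequence, embeddings: Sequence,
                     documents: Sequence, metadatas: Sequence) -> None:
    """Raise ValueError when the parallel input lists differ in length."""
    lengths = {
        "ids": len(ids),
        "embeddings": len(embeddings),
        "documents": len(documents),
        "metadatas": len(metadatas),
    }
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Input length mismatch: {lengths}")
```

Validating lengths up front matters because zipping mismatched lists would silently drop the trailing items instead of failing.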
- Changed volume mappings in `docker-compose.yml` to reference `sources.yaml` from the local `cc-registry-v2` directory instead of the deprecated `mcp-server` path.
- Updated comments and documentation in `CONFIGURATION.md`, `MCP_WORKFLOW.md`, and `kustomization.yaml` to reflect the new location of `sources.yaml`.
- Removed the obsolete `mcp-server/sources.yaml` file, streamlining the project structure and enhancing clarity on documentation source management.
- Updated the SearchIcon color to use 'text.secondary' for better visibility.
- Improved input field styles in AllTasks, CodeBundles, and Home components, including background color, text color, and border styling for hover and focus states, ensuring a consistent and modern look throughout the application.
- Updated the `EmbeddingService` to sort response data by index before appending embeddings, ensuring consistent ordering.
- Enhanced validation in the `VectorService` to check the ratio of valid embeddings against the total count, raising errors for low validity scenarios to improve error handling during truncation operations.
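The two safeguards in this commit can be sketched as follows. Plain dicts with `index` and `embedding` keys stand in for the embedding response items, and the `min_ratio` threshold of 0.5 is an assumed value; neither the real response type nor the real threshold appears in this conversation.

```python
from typing import List, Optional, Sequence


def order_embeddings(items: Sequence[dict]) -> List[List[float]]:
    """Return embeddings sorted by each item's 'index' field.

    Sorting guards against a response whose items arrive in a different
    order than the inputs that were sent.
    """
    return [item["embedding"] for item in sorted(items, key=lambda i: i["index"])]


def valid_ratio(embeddings: Sequence[Optional[Sequence[float]]]) -> float:
    """Fraction of embeddings that are non-empty (0.0 for an empty input)."""
    if not embeddings:
        return 0.0
    return sum(1 for emb in embeddings if emb) / len(embeddings)


def ensure_safe_to_truncate(embeddings: Sequence[Optional[Sequence[float]]],
                            min_ratio: float = 0.5) -> float:
    """Raise before a truncate-and-reload when too few embeddings are valid."""
    ratio = valid_ratio(embeddings)
    if ratio < min_ratio:
        raise ValueError(
            f"Only {ratio:.0%} of embeddings are valid; refusing to truncate"
        )
    return ratio
```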
- Added exception handling in the `_fetch` method of the `WebCrawler` class to log parsing errors when extracting data from fetched HTML, improving robustness and error reporting during web crawling operations.
            raise
        finally:
            if own_session:
                db.close()
Unconditional commit on caller-provided database session
Low Severity
The `upsert_vectors` method unconditionally calls `db.commit()` on line 136, even when `db` was provided by the caller (`own_session` is False). However, the `except` block only calls `db.rollback()` when `own_session` is True. This creates an asymmetric contract: a caller-provided session gets committed (potentially committing unrelated pending changes) on success, but is left in a dirty/error state on failure since rollback is skipped. Current callers all omit `db` (so `own_session` is always True), but the method's signature explicitly accepts a `db` parameter, so a future caller using it would encounter surprising behavior.
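One possible way to make the contract symmetric is sketched below: commit, roll back, and close only when the method created the session itself, and merely flush a caller-provided session so the caller keeps control of the transaction. `FakeSession` is a hypothetical stand-in that records calls (a real SQLAlchemy `Session` offers the same `add_all`, `flush`, `commit`, `rollback`, and `close` methods), and this `upsert_vectors` signature is an assumption, not the PR's actual code.

```python
from typing import Callable, List, Optional


class FakeSession:
    """Hypothetical stand-in for a database session; records method calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def add_all(self, rows) -> None:
        self.calls.append("add_all")

    def flush(self) -> None:
        self.calls.append("flush")

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")

    def close(self) -> None:
        self.calls.append("close")


def upsert_vectors(rows: list, db: Optional[FakeSession] = None,
                   session_factory: Callable[[], FakeSession] = FakeSession) -> int:
    """Write rows; manage the transaction only for a session we created."""
    own_session = db is None
    if own_session:
        db = session_factory()
    try:
        db.add_all(rows)
        if own_session:
            db.commit()   # we own the transaction, so we finish it
        else:
            db.flush()    # caller decides when to commit or roll back
    except Exception:
        if own_session:
            db.rollback()
        raise
    finally:
        if own_session:
            db.close()
    return len(rows)
```

With this shape, a caller that passes its own session never has unrelated pending changes committed on its behalf, and on failure it receives the exception and can roll back itself.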


Note
High Risk
Adds new embedding generation, web crawling, and pgvector upsert/search flows plus new scheduled jobs; failures or misconfiguration could affect worker load, external calls, and vector table integrity (though keyword search remains separate).
Overview
Adds a pgvector-backed semantic search and indexing pipeline to the backend: new SQLAlchemy vector-table models plus `/api/v1/vector/*` endpoints for unified and per-table similarity search, vector table stats, and reindex triggers.

Moves embedding generation and indexing into the backend worker (new `indexing_tasks.py`) with an Azure OpenAI embedding client, documentation crawling from `sources.yaml`, and safer upsert logic that avoids truncating tables when embeddings are mostly/entirely empty; the main workflow is extended to run sync → parse → enhance → embed.

Updates infra/docs to support this: enables the `vector` extension on startup/migrations, mounts `sources.yaml` into backend/worker/scheduler (Docker + K8s ConfigMap), switches schedules from deprecated `mcp_tasks` to `indexing_tasks` (and keeps `mcp_tasks` as redirecting stubs), and refreshes documentation/UI styling to reflect vector search + embedding capabilities.

Written by Cursor Bugbot for commit b56074f. This will update automatically on new commits.