
fix: gerrit repositories filtering (CM-1079)#3977

Open
joanagmaia wants to merge 5 commits into main from fix/gerrit-repositories-filtering


@joanagmaia joanagmaia commented Mar 30, 2026

Problem

Since the bucketing architecture was introduced, the Active Contributors widget (and any widget backed by activities_filtered) showed no data for projects with Gerrit integrations, even when activity data existed.

Root cause: The enrichment/cleaning copy pipes (activityRelations_bucket_clean_enrich_copy_pipe_X) validate git-platform activities by checking:
(channel, segmentId) IN (SELECT r.url, r.segmentId FROM repositories ...)

Gerrit activities store their channel in the /q/project: format (e.g. https://gerrit.example.com/r/q/project:myproject), but repositories.url stores the base URL (https://gerrit.example.com/r/myproject). The two values never match, so all Gerrit activities were silently dropped during the cleaning step and never reached the cleaned bucket datasources that widgets query.

The repos_to_channels pipe already handled this URL expansion correctly at query time — but the cleaning step wasn't using it.
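
The mismatch can be reproduced in isolation. The query below is a minimal sketch in ClickHouse SQL, not code from the pipes: literal rows stand in for the repositories table, the URLs are the example values above, and 'seg-1' is a made-up segment ID.

```sql
-- Illustrative only: constants replace the real activity and repository rows.
WITH
    'https://gerrit.example.com/r/q/project:myproject' AS channel,
    'seg-1' AS segmentId  -- hypothetical segment ID
SELECT
    -- Old check: the activity channel is compared with the stored base URL.
    (channel, segmentId) IN (
        SELECT 'https://gerrit.example.com/r/myproject', 'seg-1'
    ) AS matches_base_url,
    -- Same check against the expanded /q/project: variant of that URL.
    (channel, segmentId) IN (
        SELECT 'https://gerrit.example.com/r/q/project:myproject', 'seg-1'
    ) AS matches_expanded_channel
```

Only the second comparison can succeed, because the tuple match is an exact string comparison on both columns.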

Fix

Extended repos_to_channels.pipe to also output segmentId alongside channel. This is backward-compatible: all 21 existing consumers only SELECT channel and are unaffected.

The 10 cleaning pipes now delegate to repos_to_channels instead of maintaining an inline subquery. This means the Gerrit URL expansion logic lives in exactly one place — any future change to channel formats only needs updating in repos_to_channels.pipe.
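
As a rough sketch of what the delegated check looks like inside a cleaning pipe: `activities_to_clean` is a placeholder name, not the real source node, and the pipes' other conditions (including the restriction of this check to git-platform activities) are left out.

```sql
-- Sketch of the repo-validation predicate only.
-- `activities_to_clean` is a hypothetical stand-in for the pipe's input;
-- the real pipes apply this check to git-platform activities only.
SELECT *
FROM activities_to_clean
WHERE (channel, segmentId) IN (
    -- repos_to_channels returns every channel variant per repository,
    -- including the Gerrit /q/project: form, together with its segmentId.
    SELECT channel, segmentId
    FROM repos_to_channels
)
```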


Note

Medium Risk
Touches the activity cleaning/enrichment COPY pipelines that populate downstream analytics datasets, so mistakes could silently drop or include activity data. Logic change is localized but affects all bucket shards and repo/channel matching semantics.

Overview
Fixes Gerrit repo activities being filtered out during the activityRelations_bucket_clean_enrich_copy_pipe_* cleaning step by replacing the inline repositories/insightsProjects subquery with (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels).

Extends repos_to_channels.pipe to also output segmentId alongside each expanded channel (including Gerrit /q/project: variants) and tightens selection to enabled, non-deleted repos/projects, centralizing repo-to-activity-channel matching in one place.

Written by Cursor Bugbot for commit 0c39d89.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings March 30, 2026 13:12

Copilot AI left a comment

Pull request overview

Fixes Gerrit-backed projects showing no data in widgets that rely on activities_filtered by ensuring the bucket cleaning/enrichment COPY pipes validate git-platform activities against the expanded set of possible repository channel formats (including Gerrit /q/project: variants).

Changes:

  • Extend repos_to_channels.pipe to also return segmentId alongside channel and to filter out repos belonging to deleted insightsProjects.
  • Update the 10 activityRelations_bucket_clean_enrich_copy_pipe_{0..9}.pipe cleaning pipes to validate (channel, segmentId) via repos_to_channels instead of an inline repositories subquery.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/libs/tinybird/pipes/repos_to_channels.pipe | Adds segmentId output and keeps Gerrit channel expansion centralized for reuse by cleaning/validation logic. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe | Switches repo validation subquery to repos_to_channels to include Gerrit channel variants. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe | Same change as bucket 0. |
Comments suppressed due to low confidence (1)

services/libs/tinybird/pipes/repos_to_channels.pipe:44

  • When repos is provided, repos_to_expand hard-codes segmentId to '' (lines 19-24), but gerrit_repos later rehydrates segmentId from repositories (lines 40-44). This makes the output inconsistent: non-Gerrit rows will have empty segmentId, while Gerrit variants can have a real segmentId, which also contradicts the response description. Consider making segmentId consistently empty for all outputs when repos is provided (e.g., carry segmentId from repos_to_expand into Gerrit variants), or update the contract/documentation and ensure all branches follow it.
    {% if defined(repos) %}
        SELECT
            arrayJoin(
                {{ Array(repos, 'String', description="Repository URLs to expand", required=False) }}
            ) AS url,
            '' AS segmentId
    {% else %}
        SELECT r.url, r.segmentId
        FROM repositories r FINAL
        INNER JOIN insightsProjects i FINAL ON r.insightsProjectId = i.id
        WHERE
            isNull (r.deletedAt) AND r.enabled = true AND isNull (i.deletedAt)
            {% if defined(excluded) and excluded %} AND r.excluded = true
            {% end %}
    {% end %}

NODE gerrit_repos
DESCRIPTION >
    Identify Gerrit repositories by joining with integrations table

SQL >
    SELECT r.url, r.segmentId
    FROM repositories r FINAL
    JOIN integrations i FINAL ON r.sourceIntegrationId = i.id
    WHERE i.platform = 'gerrit' AND isNull (r.deletedAt) AND r.url IN (SELECT url FROM repos_to_expand)
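
One way to read the suggestion above is to take segmentId from repos_to_expand rather than from repositories, so Gerrit variants follow the same rule as every other row. The query below is only a sketch of a possible gerrit_repos body under that reading; it is not part of this PR and has not been run against the workspace.

```sql
-- Hypothetical rewrite of the gerrit_repos node body.
-- segmentId comes from repos_to_expand, so it is '' when `repos` is passed
-- and the stored repositories.segmentId otherwise.
SELECT DISTINCT e.url, e.segmentId
FROM repos_to_expand e
WHERE e.url IN (
    -- Same Gerrit detection as the original node: join to integrations.
    SELECT r.url
    FROM repositories r FINAL
    JOIN integrations i FINAL ON r.sourceIntegrationId = i.id
    WHERE i.platform = 'gerrit' AND isNull(r.deletedAt)
)
```

The alternative named in the comment, keeping the current behavior and updating the documented contract instead, would need no SQL change.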



@joanagmaia joanagmaia requested a review from epipav March 30, 2026 13:21

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@joanagmaia joanagmaia requested a review from mbani01 March 30, 2026 13:21
mbani01 previously approved these changes Mar 30, 2026

@mbani01 mbani01 left a comment


LGTM

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
4 participants