fix: gerrit repositories filtering (CM-1079) #3977
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Pull request overview
Fixes Gerrit-backed projects showing no data in widgets that rely on activities_filtered by ensuring the bucket cleaning/enrichment COPY pipes validate git-platform activities against the expanded set of possible repository channel formats (including Gerrit /q/project: variants).
Changes:
- Extend `repos_to_channels.pipe` to also return `segmentId` alongside `channel` and to filter out repos belonging to deleted `insightsProjects`.
- Update the 10 `activityRelations_bucket_clean_enrich_copy_pipe_{0..9}.pipe` cleaning pipes to validate `(channel, segmentId)` via `repos_to_channels` instead of an inline repositories subquery.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/repos_to_channels.pipe | Adds segmentId output and keeps Gerrit channel expansion centralized for reuse by cleaning/validation logic. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe | Switches repo validation subquery to repos_to_channels to include Gerrit channel variants. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe | Same change as bucket 0. |
| services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe | Same change as bucket 0. |
Comments suppressed due to low confidence (1)
services/libs/tinybird/pipes/repos_to_channels.pipe:44
- When `repos` is provided, `repos_to_expand` hard-codes `segmentId` to '' (lines 19-24), but `gerrit_repos` later rehydrates `segmentId` from `repositories` (lines 40-44). This makes the output inconsistent: non-Gerrit rows will have empty `segmentId`, while Gerrit variants can have a real `segmentId`, which also contradicts the response description. Consider making `segmentId` consistently empty for all outputs when `repos` is provided (e.g., carry `segmentId` from `repos_to_expand` into Gerrit variants), or update the contract/documentation and ensure all branches follow it.
    {% if defined(repos) %}
        SELECT
            arrayJoin(
                {{ Array(repos, 'String', description="Repository URLs to expand", required=False) }}
            ) AS url,
            '' AS segmentId
    {% else %}
        SELECT r.url, r.segmentId
        FROM repositories r FINAL
        INNER JOIN insightsProjects i FINAL ON r.insightsProjectId = i.id
        WHERE
            isNull (r.deletedAt) AND r.enabled = true AND isNull (i.deletedAt)
            {% if defined(excluded) and excluded %} AND r.excluded = true
            {% end %}
    {% end %}

NODE gerrit_repos
DESCRIPTION >
    Identify Gerrit repositories by joining with integrations table

SQL >
    SELECT r.url, r.segmentId
    FROM repositories r FINAL
    JOIN integrations i FINAL ON r.sourceIntegrationId = i.id
    WHERE i.platform = 'gerrit' AND isNull (r.deletedAt) AND r.url IN (SELECT url FROM repos_to_expand)
Cursor Bugbot has reviewed your changes and found 1 potential issue.

Problem
Since the bucketing architecture was introduced, the Active Contributors widget (and any widget backed by `activities_filtered`) showed no data for projects with Gerrit integrations, even when activity data existed.

Root cause: The enrichment/cleaning copy pipes (`activityRelations_bucket_clean_enrich_copy_pipe_X`) validate git-platform activities by checking:

`(channel, segmentId) IN (SELECT r.url, r.segmentId FROM repositories ...)`

Gerrit activities store their `channel` in `/q/project:` format (e.g. `https://gerrit.example.com/r/q/project:myproject`), but `repositories.url` stores the base URL (`https://gerrit.example.com/r/myproject`). The match always fails, so all Gerrit activities were silently dropped during the cleaning step and never made it into the cleaned bucket datasources that widgets query.

The `repos_to_channels` pipe already handled this URL expansion correctly at query time, but the cleaning step wasn't using it.

Fix
Extended `repos_to_channels.pipe` to also output `segmentId` alongside `channel`. This is backward-compatible: all 21 existing consumers only `SELECT channel` and are unaffected.

The 10 cleaning pipes now delegate to `repos_to_channels` instead of maintaining an inline subquery. This means the Gerrit URL expansion logic lives in exactly one place: any future change to channel formats only needs updating in `repos_to_channels.pipe`.

Note
Medium Risk
Touches the activity cleaning/enrichment COPY pipelines that populate downstream analytics datasets, so mistakes could silently drop or include activity data. Logic change is localized but affects all bucket shards and repo/channel matching semantics.
Overview
Fixes Gerrit repo activities being filtered out during the `activityRelations_bucket_clean_enrich_copy_pipe_*` cleaning step by replacing the inline `repositories`/`insightsProjects` subquery with `(channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)`.

Extends `repos_to_channels.pipe` to also output `segmentId` alongside each expanded `channel` (including Gerrit `/q/project:` variants) and tightens selection to enabled, non-deleted repos/projects, centralizing repo-to-activity-channel matching in one place.

Written by Cursor Bugbot for commit 0c39d89.
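
The root cause described under Problem can be reproduced in miniature outside Tinybird. The sketch below is illustrative only and is not the pipe's actual SQL. The helper `gerrit_query_channel`, the sample rows, and the segment IDs are all hypothetical, and the rewrite rule is inferred solely from the single example URL pair in the description. It assumes a one-segment project name and that `repos_to_channels` emits the base URL plus a `/q/project:` variant for Gerrit repos.

```python
def gerrit_query_channel(repo_url: str) -> str:
    """Hypothetical helper: rewrite <base>/<project> to <base>/q/project:<project>.

    Assumes the project name is the last path segment (no nested project names).
    """
    base, _, project = repo_url.rstrip("/").rpartition("/")
    return f"{base}/q/project:{project}"


# (url, segmentId) rows standing in for the repositories datasource.
repositories = [
    ("https://gerrit.example.com/r/myproject", "seg-1"),
    ("https://github.com/example/repo", "seg-2"),
]

# Stand-in for the join against integrations where platform = 'gerrit'.
gerrit_urls = {"https://gerrit.example.com/r/myproject"}

# (channel, segmentId) pairs as stored on activities.
activities = [
    ("https://gerrit.example.com/r/q/project:myproject", "seg-1"),
    ("https://github.com/example/repo", "seg-2"),
]


def inline_allowlist(repos):
    """Old behaviour: match activities against raw repository URLs only."""
    return set(repos)


def expanded_allowlist(repos, gerrit):
    """New behaviour (sketch): also allow the Gerrit query-style channel,
    carrying the repository's segmentId onto the expanded variant."""
    allowed = set(repos)
    for url, segment_id in repos:
        if url in gerrit:
            allowed.add((gerrit_query_channel(url), segment_id))
    return allowed


def clean(rows, allowed):
    """Keep only activities whose (channel, segmentId) pair is allowed."""
    return [row for row in rows if row in allowed]


before = clean(activities, inline_allowlist(repositories))
after = clean(activities, expanded_allowlist(repositories, gerrit_urls))

print(before)  # only the GitHub activity survives
print(after)   # both activities survive
```

Matching on the `(channel, segmentId)` pair rather than on `channel` alone is why the expanded variant has to carry a `segmentId`: a Gerrit channel paired with the wrong or an empty segment would still be dropped.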