@@ -2,7 +2,9 @@ DESCRIPTION >
- `project_insights_copy_ds` contains materialized project insights data.
- Populated by `project_insights_copy.pipe` copy pipe.
- Includes project metadata, health score, first commit, and activity metrics for the last 365 days and the previous 365 days.
- - `id` column is the primary key identifier for the project.
+ - `id` column is the primary key identifier for the project or repository.
+ - `type` column indicates the record type: 'project' for project insights or 'repo' for repository insights.
+ - `repoUrl` column is the full repository URL for repo type records (empty string for project type).
- `name` column is the human-readable project name.
- `slug` column is the URL-friendly identifier used in routing and filtering.
- `logoUrl` column is the URL to the project's logo image.
@@ -35,6 +37,8 @@ TAGS "Project insights", "Metrics"

SCHEMA >
`id` String,
+ `type` String,
+ `repoUrl` String,
`name` String,
`slug` String,
`logoUrl` String,
@@ -64,4 +68,4 @@ SCHEMA >
`activeOrganizationsPrevious365Days` UInt64

ENGINE MergeTree
- ENGINE_SORTING_KEY id
+ ENGINE_SORTING_KEY type, id
@@ -0,0 +1,71 @@
DESCRIPTION >
- `repo_health_score_copy_ds` contains comprehensive health score metrics and benchmarks per repository.
- Created via copy pipe with computed health metrics for repository-level analytics.
- Aggregates multiple health dimensions including contributors, popularity, development activity, and security.
- `channel` is the repository URL used as the primary key.
- `activeContributors` is the unique contributor count for the previous quarter.
- `activeContributorsBenchmark` is the benchmark score (0-5) for active contributors.
- `contributorDependencyCount` measures contributor concentration risk (bus factor).
- `contributorDependencyPercentage` is the combined contribution percentage of dependent contributors.
- `contributorDependencyBenchmark` is the benchmark score (0-5) for contributor dependency.
- `organizationDependencyCount` measures organizational concentration risk.
- `organizationDependencyPercentage` is the combined contribution percentage of dependent organizations.
- `organizationDependencyBenchmark` is the benchmark score (0-5) for organization dependency.
- `retentionRate` is the quarter-over-quarter contributor retention percentage.
- `retentionBenchmark` is the benchmark score (0-5) for retention.
- `stars` is the total star count for the repository.
- `starsBenchmark` is the benchmark score (0-5) for stars.
- `forks` is the total fork count for the repository.
- `forksBenchmark` is the benchmark score (0-5) for forks.
- `issueResolution` is the average days to close issues (nullable for repos without issues).
- `issueResolutionBenchmark` is the benchmark score (0-5) for issue resolution.
- `pullRequests` is the PR count in the last 365 days.
- `pullRequestsBenchmark` is the benchmark score (0-5) for pull requests.
- `mergeLeadTime` is the average days to merge PRs (nullable for repos without PRs).
- `mergeLeadTimeBenchmark` is the benchmark score (0-5) for merge lead time.
- `activeDaysCount` is the count of distinct active days in the last 365 days.
- `activeDaysBenchmark` is the benchmark score (0-5) for active days.
- `contributionsOutsideWorkHours` is the percentage of contributions outside work hours.
- `contributionsOutsideWorkHoursBenchmark` is the benchmark score (0-5) for contributions outside work hours.
- `securityPercentage` is the health score percentage for the security category (0-100).
- `contributorPercentage` is the health score percentage for the contributors category (0-100).
- `popularityPercentage` is the health score percentage for the popularity category (0-100).
- `developmentPercentage` is the health score percentage for the development category (0-100).
- `overallScore` is the computed overall health score combining all dimensions.

TAGS "Repository health", "Metrics"

SCHEMA >
`channel` String,
`activeContributors` UInt64,
`activeContributorsBenchmark` UInt64,
`contributorDependencyCount` UInt64,
`contributorDependencyPercentage` Float64,
`contributorDependencyBenchmark` UInt64,
`organizationDependencyCount` UInt64,
`organizationDependencyPercentage` Float64,
`organizationDependencyBenchmark` UInt64,
`retentionRate` Float64,
`retentionBenchmark` UInt64,
`stars` UInt64,
`starsBenchmark` UInt64,
`forks` UInt64,
`forksBenchmark` UInt64,
`issueResolution` Nullable(Float64),
`issueResolutionBenchmark` UInt64,
`pullRequests` UInt64,
`pullRequestsBenchmark` UInt64,
`mergeLeadTime` Nullable(Float64),
`mergeLeadTimeBenchmark` UInt64,
`activeDaysCount` UInt64,
`activeDaysBenchmark` UInt64,
`contributionsOutsideWorkHours` Float64,
`contributionsOutsideWorkHoursBenchmark` UInt64,
`securityPercentage` Float64,
`contributorPercentage` Float64,
`popularityPercentage` Float64,
`developmentPercentage` Float64,
`overallScore` Float64

ENGINE MergeTree
ENGINE_SORTING_KEY channel
@@ -0,0 +1,27 @@
DESCRIPTION >
- `repositories_populated_ds` contains enriched repository data with computed metrics.
- Populated by `repositories_populated_copy.pipe` copy pipe.
- Extends base repository data with contributor counts, software valuation, and first commit timestamp.
- `id` is the primary key identifier for the repository record.
- `url` is the full repository URL.
- `segmentId` links to the segment this repository belongs to.
- `insightsProjectId` links to the insights project this repository is associated with.
- `contributorCount` is the total number of unique contributors for the repository.
- `organizationCount` is the total number of unique organizations for the repository.
- `softwareValue` is the estimated economic value of the repository software.
- `firstCommit` is the timestamp of the first commit in the repository (nullable).

TAGS "Repository metadata", "Analytics enrichment"

SCHEMA >
`id` String,
`url` String,
`segmentId` String,
`insightsProjectId` String,
`contributorCount` UInt64,
`organizationCount` UInt64,
`softwareValue` UInt64,
`firstCommit` Nullable(DateTime64(3))

ENGINE MergeTree
ENGINE_SORTING_KEY id, url
@@ -0,0 +1,16 @@
NODE health_score_active_contributors_benchmark
SQL >
%
SELECT
$GROUP_COL,
activeContributors,
CASE
WHEN activeContributors BETWEEN 0 AND 1 THEN 0
WHEN activeContributors BETWEEN 2 AND 3 THEN 1
WHEN activeContributors BETWEEN 4 AND 6 THEN 2
WHEN activeContributors BETWEEN 7 AND 10 THEN 3
WHEN activeContributors BETWEEN 11 AND 20 THEN 4
WHEN activeContributors > 20 THEN 5
ELSE 0
END AS activeContributorsBenchmark
FROM $SOURCE_NODE
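The CASE ladder above maps a contributor count to a 0-5 score by range. As a worked illustration, here is a minimal Python sketch of the same mapping; the function and constant names are hypothetical and not part of the PR, and the lower bounds are simply read off the CASE branches.

```python
from bisect import bisect_right

# Lower bounds for scores 1..5, read off the CASE expression above
# (2-3 -> 1, 4-6 -> 2, 7-10 -> 3, 11-20 -> 4, more than 20 -> 5).
# Hypothetical name, used only for this illustration.
ACTIVE_CONTRIBUTORS_LOWER_BOUNDS = [2, 4, 7, 11, 21]


def active_contributors_benchmark(active_contributors: int) -> int:
    """Return the 0-5 benchmark for a non-negative contributor count."""
    # bisect_right counts how many lower bounds are <= the value,
    # which is exactly the score for integer inputs.
    return bisect_right(ACTIVE_CONTRIBUTORS_LOWER_BOUNDS, active_contributors)


print(active_contributors_benchmark(6))   # falls in the 4-6 range
print(active_contributors_benchmark(21))  # above 20
```

The same shape would apply to the other "higher is better" includes (active days, forks, pull requests) with their own bounds.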
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_active_days.incl
@@ -0,0 +1,16 @@
NODE health_score_active_days_benchmark
SQL >
%
SELECT
$GROUP_COL,
activeDaysCount,
CASE
WHEN activeDaysCount BETWEEN 0 AND 5 THEN 0
WHEN activeDaysCount BETWEEN 6 AND 10 THEN 1
WHEN activeDaysCount BETWEEN 11 AND 15 THEN 2
WHEN activeDaysCount BETWEEN 16 AND 20 THEN 3
WHEN activeDaysCount BETWEEN 21 AND 26 THEN 4
WHEN activeDaysCount > 26 THEN 5
ELSE 0
END AS activeDaysBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,16 @@
NODE health_score_contributions_outside_work_hours_benchmark
SQL >
%
SELECT
$GROUP_COL,
contributionsOutsideWorkHours,
CASE
WHEN contributionsOutsideWorkHours >= 75 THEN 0
WHEN contributionsOutsideWorkHours >= 50 THEN 1
WHEN contributionsOutsideWorkHours >= 40 THEN 2
WHEN contributionsOutsideWorkHours >= 30 THEN 3
WHEN contributionsOutsideWorkHours >= 20 THEN 4
WHEN contributionsOutsideWorkHours >= 0 THEN 5
ELSE 0
END AS contributionsOutsideWorkHoursBenchmark
FROM $SOURCE_NODE
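This include inverts the scale: a larger share of contributions outside work hours yields a lower score. A minimal Python sketch of the mapping for whole-number percentages follows; the names are hypothetical, and the thresholds are taken from the CASE branches above.

```python
# (lower bound, score) pairs in descending order; the first match wins,
# mirroring how a CASE expression evaluates its branches top to bottom.
# Hypothetical name, used only for this illustration.
OUTSIDE_WORK_HOURS_LADDER = [(75, 0), (50, 1), (40, 2), (30, 3), (20, 4), (0, 5)]


def outside_work_hours_benchmark(percentage: float) -> int:
    """Return the 0-5 benchmark; a lower percentage scores higher."""
    for lower_bound, score in OUTSIDE_WORK_HOURS_LADDER:
        if percentage >= lower_bound:
            return score
    # Anything below 0 falls through, like the ELSE 0 branch.
    return 0


print(outside_work_hours_benchmark(19))  # best bucket (0-19)
print(outside_work_hours_benchmark(80))  # worst bucket (75 and above)
```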
@@ -0,0 +1,53 @@
NODE health_score_contributor_dependency_pct
SQL >
%
SELECT
$GROUP_COL,
memberId,
contributionCount,
ROUND(contributionCount * 100.0 / SUM(contributionCount) OVER (PARTITION BY $GROUP_COL), 2) AS contributionPercentage
FROM $SOURCE_NODE
ORDER BY contributionPercentage DESC

NODE health_score_contributor_dependency_running
SQL >
%
SELECT
$GROUP_COL,
memberId,
contributionPercentage,
SUM(contributionPercentage) OVER (
PARTITION BY $GROUP_COL ORDER BY contributionPercentage DESC, memberId
) AS contributionPercentageRunningTotal
FROM health_score_contributor_dependency_pct

NODE health_score_contributor_dependency_score
SQL >
%
SELECT
$GROUP_COL,
count() AS contributorDependencyCount,
round(sum(contributionPercentage)) AS contributorDependencyPercentage
FROM health_score_contributor_dependency_running
WHERE
contributionPercentageRunningTotal < 51
OR (contributionPercentageRunningTotal - contributionPercentage < 51)
GROUP BY $GROUP_COL

NODE health_score_contributor_dependency_benchmark
SQL >
%
SELECT
$GROUP_COL,
contributorDependencyCount,
contributorDependencyPercentage,
CASE
WHEN contributorDependencyCount BETWEEN 0 AND 1 THEN 0
WHEN contributorDependencyCount = 2 THEN 1
WHEN contributorDependencyCount BETWEEN 3 AND 4 THEN 2
WHEN contributorDependencyCount BETWEEN 5 AND 6 THEN 3
WHEN contributorDependencyCount BETWEEN 7 AND 9 THEN 4
WHEN contributorDependencyCount > 9 THEN 5
ELSE 0
END AS contributorDependencyBenchmark
FROM health_score_contributor_dependency_score
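The three nodes above compute each contributor's share, accumulate the shares from largest to smallest, and keep contributors until the running total reaches 51%. A minimal Python sketch of that logic for a single group follows; the function name and sample data are hypothetical, and it assumes one row per `memberId` within the group.

```python
from typing import Dict, Tuple


def contributor_dependency(contribution_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (dependency count, combined percentage) for one group."""
    total = sum(contribution_counts.values())
    if total == 0:
        # No contributions: the SQL would produce no row for this group.
        return 0, 0

    # Share per member, rounded to 2 decimals as in the *_pct node.
    shares = {m: round(c * 100.0 / total, 2) for m, c in contribution_counts.items()}

    # Largest share first; member id breaks ties, as in the window ORDER BY.
    ordered = sorted(shares.items(), key=lambda item: (-item[1], item[0]))

    running_total = 0.0
    count = 0
    for _member, share in ordered:
        # Keep a member while the total *before* adding them is below 51,
        # i.e. runningTotal - contributionPercentage < 51 in the SQL.
        if running_total >= 51:
            break
        running_total += share
        count += 1

    return count, round(running_total)


print(contributor_dependency({"a": 60, "b": 30, "c": 10}))           # (1, 60)
print(contributor_dependency({"a": 40, "b": 30, "c": 20, "d": 10}))  # (2, 70)
```

In the first case a single contributor already covers more than half of the work, so the dependency count is 1, which the benchmark node scores as 0.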
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_forks.incl
@@ -0,0 +1,16 @@
NODE health_score_forks_benchmark
SQL >
%
SELECT
$GROUP_COL,
forks,
CASE
WHEN forks BETWEEN 0 AND 4 THEN 0
WHEN forks BETWEEN 5 AND 9 THEN 1
WHEN forks BETWEEN 10 AND 19 THEN 2
WHEN forks BETWEEN 20 AND 39 THEN 3
WHEN forks BETWEEN 40 AND 79 THEN 4
WHEN forks >= 80 THEN 5
ELSE 0
END AS forksBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,16 @@
NODE health_score_issues_resolution_benchmark
SQL >
%
SELECT
$GROUP_COL,
issueResolution,
CASE
WHEN issueResolution >= 61 THEN 0
WHEN issueResolution >= 51 THEN 1
WHEN issueResolution >= 36 THEN 2
WHEN issueResolution >= 22 THEN 3
WHEN issueResolution >= 8 THEN 4
WHEN issueResolution >= 0 THEN 5
ELSE 0
END AS issueResolutionBenchmark
FROM $SOURCE_NODE
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_merge_lead_time.incl
@@ -0,0 +1,16 @@
NODE health_score_merge_lead_time_benchmark
SQL >
%
SELECT
$GROUP_COL,
mergeLeadTime,
CASE
WHEN mergeLeadTime >= 30 THEN 0
WHEN mergeLeadTime >= 21 THEN 1
WHEN mergeLeadTime >= 15 THEN 2
WHEN mergeLeadTime >= 7 THEN 3
WHEN mergeLeadTime >= 3 THEN 4
WHEN mergeLeadTime >= 0 THEN 5
ELSE 0
END AS mergeLeadTimeBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,53 @@
NODE health_score_organization_dependency_pct
SQL >
%
SELECT
$GROUP_COL,
organizationId,
contributionCount,
(contributionCount * 100.0 / SUM(contributionCount) OVER (PARTITION BY $GROUP_COL)) AS contributionPercentage
FROM $SOURCE_NODE
ORDER BY contributionPercentage DESC

NODE health_score_organization_dependency_running
SQL >
%
SELECT
$GROUP_COL,
organizationId,
contributionPercentage,
SUM(contributionPercentage) OVER (
PARTITION BY $GROUP_COL ORDER BY contributionPercentage DESC, organizationId
) AS contributionPercentageRunningTotal
FROM health_score_organization_dependency_pct

NODE health_score_organization_dependency_score
SQL >
%
SELECT
$GROUP_COL,
count() AS organizationDependencyCount,
round(sum(contributionPercentage)) AS organizationDependencyPercentage
FROM health_score_organization_dependency_running
WHERE
contributionPercentageRunningTotal < 51
OR (contributionPercentageRunningTotal - contributionPercentage < 51)
GROUP BY $GROUP_COL

NODE health_score_organization_dependency_benchmark
SQL >
%
SELECT
$GROUP_COL,
organizationDependencyCount,
organizationDependencyPercentage,
CASE
WHEN organizationDependencyCount BETWEEN 0 AND 1 THEN 0
WHEN organizationDependencyCount = 2 THEN 1
WHEN organizationDependencyCount = 3 THEN 2
WHEN organizationDependencyCount BETWEEN 4 AND 5 THEN 3
WHEN organizationDependencyCount BETWEEN 6 AND 7 THEN 4
WHEN organizationDependencyCount >= 8 THEN 5
ELSE 0
END AS organizationDependencyBenchmark
FROM health_score_organization_dependency_score
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_pull_requests.incl
@@ -0,0 +1,16 @@
NODE health_score_pull_requests_benchmark
SQL >
%
SELECT
$GROUP_COL,
pullRequests,
CASE
WHEN pullRequests BETWEEN 0 AND 1 THEN 0
WHEN pullRequests BETWEEN 2 AND 3 THEN 1
WHEN pullRequests BETWEEN 4 AND 7 THEN 2
WHEN pullRequests BETWEEN 8 AND 15 THEN 3
WHEN pullRequests BETWEEN 16 AND 30 THEN 4
WHEN pullRequests >= 31 THEN 5
ELSE 0
END AS pullRequestsBenchmark
FROM $SOURCE_NODE
34 changes: 34 additions & 0 deletions services/libs/tinybird/includes/health_score_retention.incl
@@ -0,0 +1,34 @@
NODE health_score_retention_counts
SQL >
%
SELECT
cur.$GROUP_COL AS $GROUP_COL,
if(
length(coalesce(prev.previousQuarterMembers, [])) > 0,
round(
100 * length(arrayIntersect(
coalesce(cur.currentQuarterMembers, []),
coalesce(prev.previousQuarterMembers, [])
)) / length(coalesce(prev.previousQuarterMembers, []))
),
0
) AS retentionRate
FROM $SOURCE_CURRENT AS cur
LEFT JOIN $SOURCE_PREVIOUS AS prev USING ($GROUP_COL)

NODE health_score_retention_benchmark
SQL >
%
SELECT
$GROUP_COL,
retentionRate,
CASE
WHEN retentionRate BETWEEN 0 AND 2 THEN 0
WHEN retentionRate BETWEEN 3 AND 5 THEN 1
WHEN retentionRate BETWEEN 6 AND 9 THEN 2
WHEN retentionRate BETWEEN 10 AND 14 THEN 3
WHEN retentionRate BETWEEN 15 AND 19 THEN 4
WHEN retentionRate >= 20 THEN 5
ELSE 0
END AS retentionBenchmark
FROM health_score_retention_counts
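The retention node above divides the number of members active in both quarters by the number active in the previous quarter, and falls back to 0 when the previous quarter is empty. A minimal Python sketch of that calculation follows; the function name and sample ids are hypothetical, and it assumes each member list holds distinct ids.

```python
from typing import Optional, Sequence


def retention_rate(
    current_members: Optional[Sequence[str]],
    previous_members: Optional[Sequence[str]],
) -> int:
    """Return the share (0-100) of last quarter's members who are still active."""
    previous = list(previous_members or [])  # like coalesce(prev..., [])
    current = list(current_members or [])    # like coalesce(cur..., [])
    if not previous:
        # No previous-quarter members: the SQL returns 0 instead of dividing by zero.
        return 0
    retained = set(current) & set(previous)  # counterpart of arrayIntersect
    return round(100 * len(retained) / len(previous))


# Two of last quarter's four members are still active.
print(retention_rate(["a", "b", "c"], ["b", "c", "d", "e"]))  # 50
```

A rate of 50 lands in the top bucket of the benchmark node, since any value of 20 or more scores 5.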