Skip to content

feat: enrich MP3 metadata via MusicBrainz lookup#44

Open
jzstern wants to merge 20 commits into
mainfrom
feat/musicbrainz-metadata
Open

feat: enrich MP3 metadata via MusicBrainz lookup#44
jzstern wants to merge 20 commits into
mainfrom
feat/musicbrainz-metadata

Conversation

@jzstern
Copy link
Copy Markdown
Owner

@jzstern jzstern commented Jan 24, 2026

Description

Enriches downloaded MP3s with album, year, genre, track number, record label, and cover art by looking up tracks on MusicBrainz. The lookup runs in parallel with the audio download, so there's no added latency. Falls back to YouTube thumbnail for artwork when no MusicBrainz match is found.

New ID3 tags added: album, year, genre, trackNumber, publisher (label), APIC (cover art)

Architecture: As soon as artist+title are parsed from the YouTube metadata, a parallel enrichMetadata() promise fires off. It queries MusicBrainz's recording search API, picks the best album release, fetches cover art from the Cover Art Archive (or YouTube thumbnail as fallback), and applies everything as ID3 tags before the file is sent to the client.

Cover art lookup iterates through all candidate release IDs across matching recordings (albums prioritized first) until artwork is found on the Cover Art Archive, maximizing the chance of getting official artwork.

Future work

  • Rate limiting for MusicBrainz API (1 req/sec per IP) if concurrent downloads become common
  • Show album artwork in the UI (currently only metadata text is displayed)
  • Allow users to skip enrichment or correct mismatched metadata

How to test

  1. Navigate to the app and paste a YouTube URL for a well-known song (e.g., Queen - Bohemian Rhapsody)
  2. Click Download
  • During download, album name, year, genre, and label appear below the video title
  • Downloaded MP3 has enriched ID3 tags (check with a tag editor or music player)
  • Cover art is embedded in the MP3 (official album art, not YouTube thumbnail)
  1. Try a track where the single lacks cover art but the album has it (e.g., Aries & brakence - SLEEPWALKER)
  • Official album artwork is embedded (not YouTube thumbnail)
  1. Try an obscure/non-music video (e.g., a podcast or lecture)
  • Download still succeeds without delay
  • YouTube thumbnail is embedded as artwork (no MusicBrainz match expected)
  1. Run the test suite: bun run test
  • All 209 tests pass

Author checklist

This PR:

  • Satisfies a goal that is specific & clearly motivated
  • Adds value in isolation (whether user-facing or sustainability-related)
  • Contains a concise & easy-to-understand title + description
  • Adheres to SRP by default
  • Presents the best possible implementation to meet its goal, given constraints at hand

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automatic music metadata enrichment during downloads: album, year, genre, track number, label, and artwork may be fetched, emitted as runtime metadata, embedded in ID3 tags, and shown in the UI near title/status; artwork falls back to thumbnails and is cleared on resets/errors.
  • Tests

    • Added comprehensive tests covering metadata lookup, cover-art/thumbnail fallbacks, timeouts, error scenarios, release selection, primary-artist normalization, and integration with the download flow.
  • Chores

    • Broadened test discovery, increased Node memory limit, extended token generation timeout, and added non-blocking background token refresh at startup.

✏️ Tip: You can customize this high-level summary in your review settings.


Note

Adds metadata enrichment during downloads with parallel MusicBrainz lookup and artwork embedding.

  • New src/lib/musicbrainz.ts: search recordings, prefer Album releases, collect candidate release IDs, fetch tags/labels, get cover art; fallback to YouTube thumbnail
  • download-stream/+server.ts: run enrichment in parallel, emit metadata SSE event, write ID3 tags (album, year, genre, trackNumber, publisher, APIC) and embed artwork
  • UI (+page.svelte): display enriched metadata during download; clear on reset/error
  • Tests: comprehensive unit/integration coverage for lookup, cover art fallback, timeouts, errors; update vitest.config.ts to include src/**/*.{test,spec}.ts and coverage thresholds
  • Ops/infra: increase yt-token Node memory, extend token generation timeout, prefetch PO token at startup in non-dev, and set PR-only COBALT_API_URL in Railway workflow

Written by Cursor Bugbot for commit 499ad14. This will update automatically on new commits. Configure here.

@charliecreates charliecreates Bot requested a review from CharlieHelps January 24, 2026 02:32
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 24, 2026 02:32 Destroyed
@railway-app
Copy link
Copy Markdown

railway-app Bot commented Jan 24, 2026

🚅 Deployed to the dub-rip-pr-44 environment in dub-rip

Service Status Web Updated (UTC)
dub-rip ✅ Success (View Logs) Web Jan 30, 2026 at 5:05 am

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 24, 2026

📝 Walkthrough

Walkthrough

Adds MusicBrainz-backed metadata and artwork enrichment (with YouTube-thumbnail fallback), tests for enrichment and fetch logic, integrates enrichment into download-stream (SSE metadata events and ID3 tagging), expands types and frontend UI, adjusts yt-token startup/timeout, and updates CI and test discovery.

Changes

Cohort / File(s) Summary
MusicBrainz library & types
src/lib/musicbrainz.ts, src/lib/types.ts
New MusicBrainz client: lookupTrack, fetchCoverArt, fetchThumbnailArt, TrackMetadata, CoverArtResult; new EnrichedMetadata and StreamEvent additions. Network calls use timeouts/AbortSignal and return null on failure.
Download-stream integration
src/routes/api/download-stream/+server.ts
Starts enrichment promise, emits "metadata" SSE events, awaits enrichment before writing ID3 tags, embeds artwork when present, and replaces direct controller.close() calls with closeStream() in error paths.
Frontend UI
src/routes/+page.svelte
Adds enrichedMetadata state, listens for "metadata" events, conditionally renders album/year/genre/label, and clears metadata on start/errors.
Tests
src/lib/musicbrainz.test.ts, src/lib/musicbrainz-integration.test.ts
New unit and integration tests covering lookup, release selection, tags/labels, cover-art & thumbnail fetching, timeouts/abort handling, network errors, and fallback flows (mocked fetch).
YT token & startup
src/lib/yt-token.ts, src/hooks.server.ts, Dockerfile.yt-token
Increased token generation timeout (30s→60s), call to fetchPoToken() at startup (non-blocking), and raised Node memory limit in Dockerfile.
Test discovery & CI
vitest.config.ts, .github/workflows/railway-pr.yml
Expanded Vitest globs to include src/**/*.{test,spec}.ts; added step to override internal env var (COBALT_API_URL) in PR workflow.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Server as Server (download-stream)
    participant MB as MusicBrainz API
    participant CAA as Cover Art Archive
    participant YT as YouTube
    participant ID3 as ID3 Tagger

    Client->>Server: POST /download (artist, title, videoId)
    Server->>Server: start enrichmentPromise
    par Enrichment Branch
        Server->>MB: lookupTrack(artist, title)
        alt MusicBrainz match
            MB-->>Server: TrackMetadata (incl. releaseId)
            Server->>CAA: fetchCoverArt(releaseId)
            CAA-->>Server: imageBuffer + mime
        else No MB match
            MB-->>Server: null
            Server->>YT: fetchThumbnailArt(videoId)
            YT-->>Server: imageBuffer + mime
        end
    and Download Branch
        Server->>Server: start audio stream / download
    end
    Server->>Server: resolve enrichmentPromise
    Server->>Client: emit metadata event (album, year, genre, trackNumber, label, hasArtwork)
    Server->>ID3: embed metadata + artwork (if present)
    Server->>Client: stream audio response with ID3 tags
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly Related PRs

Poem

🐰 I hopped through MusicBrainz at break of day,

Fetched covers and tags and tucked them away,
When MB is silent, YouTube lends a view,
I stitch art and metadata snug into you,
Now every rip wears a ribbon anew.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: enrich MP3 metadata via MusicBrainz lookup' directly and accurately describes the main change in the PR, which adds MusicBrainz-based metadata enrichment for MP3 files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@charliecreates charliecreates Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enrichment pipeline works end-to-end, but there are behavioral mismatches between intent and implementation: lookupTrack() naively trusts recordings[0] (likely metadata mismatches in the wild), and the server always fetches a YouTube thumbnail when artist/title parsing fails, while also not emitting a metadata event in that case. The UI additionally defines an ad-hoc metadata shape instead of reusing EnrichedMetadata, increasing contract drift risk. Addressing these will improve correctness, performance, and long-term maintainability.

Additional notes (1)
  • Readability | src/lib/musicbrainz.ts:44-49
    pickBestRelease() uses releases[0] as a fallback. If MusicBrainz returns releases in a non-deterministic order (or an order that biases toward Singles/compilations), this can degrade quality significantly.

Since you already have a preference for Albums, it’s worth adding additional tie-breakers (e.g., earliest date or presence of track number/label).

Summary of changes

Overview

This PR adds parallel MusicBrainz-based metadata enrichment for downloaded MP3s and streams the enriched fields to the UI.

Key changes

  • New MusicBrainz client in src/lib/musicbrainz.ts
    • lookupTrack() queries MusicBrainz recording search and extracts album, year, genre, trackNumber, label, releaseId.
    • fetchCoverArt() pulls release artwork from Cover Art Archive.
    • fetchThumbnailArt() fetches YouTube thumbnail as artwork fallback.
  • Streaming contract updates in src/lib/types.ts
    • Adds EnrichedMetadata and a new SSE event type: type: "metadata" with metadata payload.
  • Download pipeline integration in src/routes/api/download-stream/+server.ts
    • Starts enrichMetadata() in parallel once artist/title are known.
    • Sends a metadata SSE event and writes additional ID3 tags (including artwork/APIC) before returning the MP3.
  • UI updates in src/routes/+page.svelte
    • Stores and displays enriched metadata during download.
  • Tests + Vitest config
    • Adds unit/integration tests under src/lib/*.test.ts.
    • Expands Vitest include globs to run tests in src/**.

Comment on lines +219 to +227
let enrichmentPromise: Promise<EnrichmentResult> =
Promise.resolve({ metadata: null, artwork: null });

if (artist && trackTitle) {
enrichmentPromise = enrichMetadata(artist, trackTitle, videoId);
} else if (videoId) {
enrichmentPromise = enrichMetadata("", "", videoId);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrichMetadata("", "", videoId) is effectively a no-op for MusicBrainz (since lookupTrack immediately returns null when artist/title are empty) but still always triggers the YouTube thumbnail fetch. That means for any video where parsing fails (or for non-music videos), you’ll still do an extra network call early, in parallel with the download. If the intent is “fallback artwork only when we’re confident we’re producing an MP3 and/or when enrichment is desired”, this is the wrong place to do it.

Also note: you later only send a metadata SSE event when enrichment.metadata exists, so in this branch the UI never learns hasArtwork: true even though artwork may be embedded.

Suggestion

Consider splitting the concerns:

  • Only perform MusicBrainz lookup when artist && trackTitle.
  • Perform thumbnail fallback later (or conditionally) if you actually need artwork and have a valid videoId.
  • If you do embed thumbnail art without MB metadata, consider still emitting a metadata event with { hasArtwork: true } so UI state matches the output file.

Example:

let enrichmentPromise = artist && trackTitle
  ? enrichMetadata(artist, trackTitle, videoId)
  : Promise.resolve({ metadata: null, artwork: null });

// ...later, before writing tags
let enrichment = await enrichmentPromise;
if (!enrichment.artwork) {
  enrichment = { ...enrichment, artwork: await fetchThumbnailArt(videoId) };
}

send({ type: "metadata", metadata: {
  album: enrichment.metadata?.album,
  year: enrichment.metadata?.year,
  genre: enrichment.metadata?.genre,
  trackNumber: enrichment.metadata?.trackNumber,
  label: enrichment.metadata?.label,
  hasArtwork: enrichment.artwork !== null,
}});

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.

Comment thread src/lib/musicbrainz.ts
Comment on lines +84 to +92
if (!recordings || recordings.length === 0) return null;

const recording = recordings[0];
const releases = recording.releases;

if (!releases || releases.length === 0) return null;

const release = pickBestRelease(releases);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookupTrack() always picks recordings[0] and then picks the “best release” from that recording’s releases. This can be wrong when MusicBrainz returns a first recording that is a close-but-incorrect match (covers, live versions, remasters, karaoke, etc.). In practice, this will cause metadata mismatches that are hard for users to understand.

Given the PR description says “picks the best album release”, the implementation currently only prefers album within a single recording, not best recording overall.

Suggestion

Improve the selection logic by scoring multiple recordings (you’re already limiting to 5) and then selecting the best release from the best recording. Even a simple heuristic helps a lot:

  • Prefer recordings that have at least one Album release.
  • Prefer releases with a date.
  • Prefer releases that include media.track.number.

Sketch:

function scoreRecording(r: MusicBrainzRecording): number {
  const releases = r.releases ?? [];
  const hasAlbum = releases.some(x => x["release-group"]?.["primary-type"] === "Album");
  const hasDate = releases.some(x => x.date);
  const hasTrackNo = releases.some(x => x.media?.[0]?.track?.[0]?.number);
  return (hasAlbum ? 100 : 0) + (hasDate ? 10 : 0) + (hasTrackNo ? 5 : 0);
}

const bestRecording = recordings
  .slice()
  .sort((a,b) => scoreRecording(b) - scoreRecording(a))[0];

Then pick the best release from bestRecording.releases.

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.

Comment thread src/lib/musicbrainz.ts
Comment on lines +60 to +165
export async function lookupTrack(
artist: string,
title: string,
timeout: number = DEFAULT_TIMEOUT,
): Promise<TrackMetadata | null> {
if (!artist || !title) return null;

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeout);

try {
const query = `recording:"${escapeLucene(title)}" AND artist:"${escapeLucene(artist)}"`;
const url = `${API_BASE}?query=${encodeURIComponent(query)}&fmt=json&limit=5`;

const response = await fetch(url, {
signal: controller.signal,
headers: { "User-Agent": USER_AGENT },
});

if (!response.ok) return null;

const data = (await response.json()) as MusicBrainzResponse;
const recordings = data.recordings;

if (!recordings || recordings.length === 0) return null;

const recording = recordings[0];
const releases = recording.releases;

if (!releases || releases.length === 0) return null;

const release = pickBestRelease(releases);

return {
album: release.title ?? "",
year: release.date?.slice(0, 4) ?? "",
genre: extractGenre(recording.tags),
trackNumber: release.media?.[0]?.track?.[0]?.number ?? "",
label: release["label-info"]?.[0]?.label?.name ?? "",
releaseId: release.id,
};
} catch {
return null;
} finally {
clearTimeout(timeoutId);
}
}

export async function fetchCoverArt(
releaseId: string,
timeout: number = COVER_ART_TIMEOUT,
): Promise<CoverArtResult | null> {
if (!releaseId) return null;

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeout);

try {
const url = `${COVER_ART_BASE}/${encodeURIComponent(releaseId)}/front-250`;

const response = await fetch(url, {
signal: controller.signal,
headers: { "User-Agent": USER_AGENT },
});

if (!response.ok) return null;

const arrayBuffer = await response.arrayBuffer();
const mime = response.headers.get("Content-Type") ?? "image/jpeg";

return { imageBuffer: Buffer.from(arrayBuffer), mime };
} catch {
return null;
} finally {
clearTimeout(timeoutId);
}
}

export async function fetchThumbnailArt(
videoId: string,
timeout: number = COVER_ART_TIMEOUT,
): Promise<CoverArtResult | null> {
if (!videoId) return null;

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeout);

try {
const url = `https://i.ytimg.com/vi/${encodeURIComponent(videoId)}/hqdefault.jpg`;

const response = await fetch(url, {
signal: controller.signal,
});

if (!response.ok) return null;

const arrayBuffer = await response.arrayBuffer();
const mime = response.headers.get("Content-Type") ?? "image/jpeg";

return { imageBuffer: Buffer.from(arrayBuffer), mime };
} catch {
return null;
} finally {
clearTimeout(timeoutId);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fetch failures are swallowed via catch { return null; } in lookupTrack(), fetchCoverArt(), and fetchThumbnailArt(). That makes operational debugging difficult—especially since MusicBrainz rate-limits and may intermittently 503/429.

At minimum, distinguishing abort/timeouts from other failures (and optionally logging at debug level) would help you understand when enrichment is failing due to timeouts vs network vs status codes.

Suggestion

Capture and branch on error type:

  • If err is an AbortError, return null silently.
  • Otherwise, log a concise message (or Sentry.captureException) including which endpoint failed.
  • For non-OK responses, consider special-casing 429 to avoid repeated retries and to make it observable.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit that adds minimal error classification + optional Sentry reporting without spamming logs.

Comment thread src/routes/+page.svelte Outdated
Comment on lines +35 to +40
let enrichedMetadata = $state<{
album?: string;
year?: string;
genre?: string;
label?: string;
} | null>(null);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page stores enrichedMetadata as an ad-hoc local object type that doesn’t include trackNumber or hasArtwork, even though the SSE contract (EnrichedMetadata) includes them. This increases drift risk between client/server and encourages partial UI updates to silently break when the server evolves.

Suggestion

Import and reuse the shared type from $lib/types to keep the contract aligned, and store the full EnrichedMetadata | null.

<script lang="ts">
  import type { EnrichedMetadata } from "$lib/types";
  let enrichedMetadata = $state<EnrichedMetadata | null>(null);
</script>

Then you can optionally render trackNumber / hasArtwork later without changing state shape.

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.

@charliecreates charliecreates Bot removed the request for review from CharlieHelps January 24, 2026 02:34
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View issue and 4 additional flags in Devin Review.

Open in Devin Review

Comment thread src/lib/musicbrainz.ts
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/routes/+page.svelte (1)

210-218: Clear metadata on SSE connection errors.

On connection loss, enrichedMetadata can remain stale (Line 210 onward). Consider resetting it in the error handler.

🐛 Suggested fix
eventSource.onerror = () => {
	if (!error) {
		error = "Connection lost";
	}
	errorUrl = url;
	eventSource.close();
	loading = false;
	status = "";
+	enrichedMetadata = null;
};
🤖 Fix all issues with AI agents
In `@src/lib/musicbrainz.ts`:
- Line 3: The USER_AGENT constant currently uses the wrong repo URL; update the
USER_AGENT string (constant USER_AGENT in src/lib/musicbrainz.ts) to reference
the correct GitHub repository/maintainer (e.g., "jzstern/dub-rip") or another
accurate contact per MusicBrainz API guidelines so the User-Agent contains
correct contact info.
- Around line 56-58: The escapeLucene function currently only escapes double
quotes; update it to escape all Lucene reserved characters so queries don't
break for names like "AC/DC" or "P!nk". In the escapeLucene function, replace
the simple double-quote replacement with logic that prepends a backslash to
every Lucene special character (including + - && || ! ( ) { } [ ] ^ ~ * ? : \ /
and "), ensuring the backslash itself is escaped in the pattern; use a single
regex or iteration over a character set and return the fully escaped string from
escapeLucene so callers (e.g., any code building Lucene queries) get a safe,
escaped value.
- Line 97: The current assignment to trackNumber uses
release.media?.[0]?.track?.[0]?.number which always picks the first track;
update the logic in the function that builds the release track info (referencing
the release object and the trackNumber field) to find the track that matches the
searched recording: either request recordings by adding the inc=recordings
parameter to the MusicBrainz query and match by recording id, or iterate over
release.media[*].track[*] to find the track whose recording id equals the
searched recording id and then set trackNumber to that track.number (falling
back to empty string if none found).

In `@src/routes/api/download-stream/`+server.ts:
- Around line 219-221: The CI failed due to formatting mismatch around the
initialization of enrichmentPromise (variable enrichmentPromise:
Promise<EnrichmentResult> = Promise.resolve({ metadata: null, artwork: null }));
run the project's formatter (e.g., Prettier/format script) on
src/routes/api/download-stream/+server.ts or invoke the repository's
pre-commit/format command to reformat this block so it matches the repository
style; ensure the single-line/spacing/indentation around enrichmentPromise and
its object literal follow the project's configured formatter rules and re-run
CI.
- Around line 219-226: enrichmentPromise is eagerly resolved to empty metadata
which prevents a later enrichMetadata(artist, trackTitle, videoId) run when
artist/trackTitle are discovered via yt-dlp; change to a lazy starter: remove
the immediate Promise.resolve and introduce a startEnrichment() helper that sets
enrichmentPromise = enrichMetadata(...) the first time artist/trackTitle (or
videoId-only fallback) are available, call startEnrichment() from the paths that
first determine artist/trackTitle (oEmbed success and the ytDlpEvent handler),
and ensure you keep a final fallback that only uses thumbnail if
startEnrichment() was never invoked; references: enrichmentPromise,
enrichMetadata, and the ytDlpEvent path (also adjust the analogous logic
referenced around lines 419-420).
🧹 Nitpick comments (3)
src/routes/+page.svelte (1)

35-40: Reuse EnrichedMetadata to avoid drift.

This local inline shape can silently diverge from the shared type (e.g., hasArtwork, trackNumber). Consider importing and using the shared type (or a Pick/Omit).

♻️ Suggested refactor
-import type { VideoPreview as VideoPreviewType } from "$lib/types";
+import type { EnrichedMetadata, VideoPreview as VideoPreviewType } from "$lib/types";
@@
-let enrichedMetadata = $state<{
-	album?: string;
-	year?: string;
-	genre?: string;
-	label?: string;
-} | null>(null);
+let enrichedMetadata = $state<EnrichedMetadata | null>(null);
src/lib/musicbrainz.ts (2)

150-152: Consider adding User-Agent header for consistency.

Unlike the other fetch calls in this module, fetchThumbnailArt doesn't include a User-Agent header. While YouTube thumbnails may not require it, adding it maintains consistency and is generally good practice for HTTP requests.

Proposed fix
 		const response = await fetch(url, {
 			signal: controller.signal,
+			headers: { "User-Agent": USER_AGENT },
 		});

60-106: Consider rate limiting for MusicBrainz API compliance.

The PR notes MusicBrainz rate limiting (1 req/sec per IP) as future work. For production use with concurrent users, consider implementing a queue or throttle mechanism to avoid API blocks. Libraries like p-limit or bottleneck can help manage this.

Comment thread src/lib/musicbrainz.ts Outdated
Comment thread src/lib/musicbrainz.ts
Comment thread src/lib/musicbrainz.ts
album: release.title ?? "",
year: release.date?.slice(0, 4) ?? "",
genre: extractGenre(recording.tags),
trackNumber: release.media?.[0]?.track?.[0]?.number ?? "",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/lib/musicbrainz.ts | head -120

Repository: jzstern/dub-rip

Length of output: 4005


🌐 Web query:

MusicBrainz API recording search track position offset documentation

💡 Result:

To filter recording search results by a track’s position/number on a medium, use the tnum field in the Lucene query:

  • tnum = “the position of the track on any medium including this recording (starts at 1, pre-gaps at 0)” [1]

Example (track 6, title “Blow Your Mind (Mwah)”):

GET /ws/2/recording?query=recording:"Blow Your Mind (Mwah)" AND tnum:6&fmt=json

To page through search results, use the offset parameter (this is not track position; it’s result paging):

  • limit = 1–100 results per request
  • offset = “Return search results starting at a given offset.” [2]

Example (second page of 25 results):

GET /ws/2/recording?query=recording:"Blow Your Mind (Mwah)"&limit=25&offset=25&fmt=json

Note: you may also see offset inside <track-list ... offset="..."> in release/recording lookup output examples; that offset is about paging within the returned track list, not search filtering by track position [3].

Sources: MusicBrainz RecordingSearch fields (tnum) [1], MusicBrainz API Search parameters (limit, offset) [2], Recording XML output examples showing track-list offset [3].


Track number extraction returns incorrect value.

Line 97 extracts the number of the first track on the first media disc, not the position of the searched recording within the release. For songs that aren't the first track, this will return "1" or another incorrect number.

The MusicBrainz API response structure doesn't include recording information in track data by default. To correctly identify which track corresponds to the searched recording, either:

  • Include inc=recordings parameter to embed recording IDs in track responses, then match against the searched recording
  • Or iterate through tracks to find the one matching the recording and extract its number
🤖 Prompt for AI Agents
In `@src/lib/musicbrainz.ts` at line 97, The current assignment to trackNumber
uses release.media?.[0]?.track?.[0]?.number which always picks the first track;
update the logic in the function that builds the release track info (referencing
the release object and the trackNumber field) to find the track that matches the
searched recording: either request recordings by adding the inc=recordings
parameter to the MusicBrainz query and match by recording id, or iterate over
release.media[*].track[*] to find the track whose recording id equals the
searched recording id and then set trackNumber to that track.number (falling
back to empty string if none found).

Comment thread src/routes/api/download-stream/+server.ts Outdated
Comment on lines +219 to +226
let enrichmentPromise: Promise<EnrichmentResult> =
Promise.resolve({ metadata: null, artwork: null });

if (artist && trackTitle) {
enrichmentPromise = enrichMetadata(artist, trackTitle, videoId);
} else if (videoId) {
enrichmentPromise = enrichMetadata("", "", videoId);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid locking in empty enrichment before artist/title are known.

If oEmbed fails and artist/trackTitle are derived later via yt‑dlp, the current enrichmentPromise already ran with empty values and won’t re-run, so MusicBrainz metadata is skipped. Consider deferring the promise until artist/trackTitle are available, and only falling back to thumbnail if they never materialize.

🐛 Suggested fix (lazy start + final fallback)
-let enrichmentPromise: Promise<EnrichmentResult> =
-	Promise.resolve({ metadata: null, artwork: null });
-
-if (artist && trackTitle) {
-	enrichmentPromise = enrichMetadata(artist, trackTitle, videoId);
-} else if (videoId) {
-	enrichmentPromise = enrichMetadata("", "", videoId);
-}
+let enrichmentPromise: Promise<EnrichmentResult> | null = null;
+
+const startEnrichment = () => {
+	if (!enrichmentPromise && artist && trackTitle) {
+		enrichmentPromise = enrichMetadata(artist, trackTitle, videoId);
+	}
+};
+
+startEnrichment();
@@
-const enrichment = await enrichmentPromise;
+const enrichment =
+	await (enrichmentPromise ?? enrichMetadata("", "", videoId));

Consider calling startEnrichment() when artist/trackTitle are first derived in the ytDlpEvent path so the lookup still runs in parallel once data becomes available.

Also applies to: 419-420

🧰 Tools
🪛 GitHub Actions: CI

[error] 219-221: Formatter mismatch: code would be reformatted. Run the formatter (e.g., Prettier) to fix formatting.

🤖 Prompt for AI Agents
In `@src/routes/api/download-stream/`+server.ts around lines 219 - 226,
enrichmentPromise is eagerly resolved to empty metadata which prevents a later
enrichMetadata(artist, trackTitle, videoId) run when artist/trackTitle are
discovered via yt-dlp; change to a lazy starter: remove the immediate
Promise.resolve and introduce a startEnrichment() helper that sets
enrichmentPromise = enrichMetadata(...) the first time artist/trackTitle (or
videoId-only fallback) are available, call startEnrichment() from the paths that
first determine artist/trackTitle (oEmbed success and the ytDlpEvent handler),
and ensure you keep a final fallback that only uses thumbnail if
startEnrichment() was never invoked; references: enrichmentPromise,
enrichMetadata, and the ytDlpEvent path (also adjust the analogous logic
referenced around lines 419-420).

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment thread src/lib/musicbrainz.ts
Comment thread src/routes/api/download-stream/+server.ts
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 24, 2026

🚂 Railway Preview Deployment

Environment URL
pr-44 https://dub-rip-pr-44.up.railway.app

This preview deployment will be automatically cleaned up when the PR is closed.


Deployed by RailwayView logs

@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 24, 2026 18:05 Destroyed
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 24, 2026 18:16 Destroyed
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 11 additional flags in Devin Review.

Open in Devin Review

Comment thread src/routes/api/download-stream/+server.ts
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 24, 2026 22:10 Destroyed
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 12 additional flags in Devin Review.

Open in Devin Review

Comment thread src/lib/musicbrainz.ts Outdated
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 24, 2026 22:36 Destroyed
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lib/musicbrainz.ts`:
- Around line 112-144: The current code reuses the same AbortController
(controller) and timeoutId for the initial search and the subsequent detail
lookups (fetchRecordingTags, fetchReleaseLabels), so if the search consumed most
of the timeout the follow-up requests get aborted immediately; fix this by
clearing the original timeout (clearTimeout(timeoutId)) once the search
completes and creating a new AbortController (e.g., detailController = new
AbortController()) with a fresh setTimeout for the remaining/next timeout
window, then pass detailController.signal to fetchRecordingTags and
fetchReleaseLabels (retain the conditional on recordingId) and ensure any
previous controller is cleaned up/aborted as appropriate.
♻️ Duplicate comments (1)
src/lib/musicbrainz.ts (1)

146-150: Track number extraction still uses the first track, not the matched recording.
The code pulls release.media?.[0]?.track?.[0]?.number, which often resolves to track 1 (and the search response may not even include track lists). Typically you need a release lookup with inc=recordings+media and then match the track whose recording id equals recordingId.

MusicBrainz recording search response fields: do recording search results include release media track lists? How to retrieve track positions for a recording (inc=recordings, inc=media) in release lookup?

Comment thread src/lib/musicbrainz.ts
Comment on lines +112 to +144
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeout);

try {
const query = `recording:"${escapeLucene(title)}" AND artist:"${escapeLucene(artist)}"`;
const searchUrl = `${API_BASE}/recording?query=${encodeURIComponent(query)}&fmt=json&limit=5`;

const response = await fetch(searchUrl, {
signal: controller.signal,
headers: { "User-Agent": USER_AGENT },
});

if (!response.ok) return null;

const data = (await response.json()) as MusicBrainzSearchResponse;
const recordings = data.recordings;

if (!recordings || recordings.length === 0) return null;

const recording = recordings[0];
const releases = recording.releases;

if (!releases || releases.length === 0) return null;

const release = pickBestRelease(releases);
const recordingId = recording.id;

const [tags, label] = await Promise.all([
recordingId
? fetchRecordingTags(recordingId, controller.signal)
: Promise.resolve([]),
fetchReleaseLabels(release.id, controller.signal),
]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid sharing one timeout across search + detail lookups.

If the search call consumes most of the timeout, the tag/label lookups can be aborted immediately, yielding empty metadata even though the search succeeded. Consider resetting the timer (or using a new controller) before the follow-up calls.

💡 Possible adjustment
 const controller = new AbortController();
 const timeoutId = setTimeout(() => controller.abort(), timeout);

 try {
   // search...
   const response = await fetch(searchUrl, { signal: controller.signal, headers: { "User-Agent": USER_AGENT } });
   if (!response.ok) return null;
   const data = (await response.json()) as MusicBrainzSearchResponse;

+  // reset timer for follow-up lookups
+  clearTimeout(timeoutId);
+  const detailController = new AbortController();
+  const detailTimeoutId = setTimeout(() => detailController.abort(), timeout);

   const [tags, label] = await Promise.all([
-    recordingId ? fetchRecordingTags(recordingId, controller.signal) : Promise.resolve([]),
-    fetchReleaseLabels(release.id, controller.signal),
+    recordingId ? fetchRecordingTags(recordingId, detailController.signal) : Promise.resolve([]),
+    fetchReleaseLabels(release.id, detailController.signal),
   ]);

   return { /* ... */ };
 } finally {
-  clearTimeout(timeoutId);
+  clearTimeout(timeoutId);
+  clearTimeout(detailTimeoutId);
 }
🤖 Prompt for AI Agents
In `@src/lib/musicbrainz.ts` around lines 112 - 144, The current code reuses the
same AbortController (controller) and timeoutId for the initial search and the
subsequent detail lookups (fetchRecordingTags, fetchReleaseLabels), so if the
search consumed most of the timeout the follow-up requests get aborted
immediately; fix this by clearing the original timeout (clearTimeout(timeoutId))
once the search completes and creating a new AbortController (e.g.,
detailController = new AbortController()) with a fresh setTimeout for the
remaining/next timeout window, then pass detailController.signal to
fetchRecordingTags and fetchReleaseLabels (retain the conditional on
recordingId) and ensure any previous controller is cleaned up/aborted as
appropriate.

jzstern and others added 10 commits January 25, 2026 12:42
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Start parallel MusicBrainz lookup when artist+title are known, await
after download completes, send metadata SSE event, and apply enriched
ID3 tags (album, year, genre, trackNumber, label, artwork) to MP3.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix Biome formatting of enrichmentPromise initialization
- Escape all Lucene special characters (not just quotes) for queries
  like "AC/DC" or "P!nk"
- Use imported EnrichedMetadata type instead of inline shape
- Reset enrichedMetadata on SSE connection error
- Add User-Agent header to fetchThumbnailArt for consistency
- Fix User-Agent URL to correct repository

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When oEmbed fails to parse artist/title, the enrichment promise was
started with empty strings (returning only a YouTube thumbnail). Now
when yt-dlp later successfully parses artist/title from the filename,
the enrichment promise is restarted with the actual values so
MusicBrainz metadata is still fetched in the fallback path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The MusicBrainz search endpoint does not return tags or label-info
fields. Add separate lookup calls to /recording/{id}?inc=tags and
/release/{id}?inc=labels after the initial search to retrieve genre
and label data correctly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jzstern jzstern force-pushed the feat/musicbrainz-metadata branch from 0cd610b to 44dc3f5 Compare January 25, 2026 19:44
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 25, 2026 19:44 Destroyed
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 12 additional flags in Devin Review.

Open in Devin Review

Comment thread src/routes/api/download-stream/+server.ts
Replace direct controller.close() call with the closeStream() helper
at the file-not-found error path for consistency. The helper properly
sets the isClosed flag and wraps the close in try-catch for error
handling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 25, 2026 19:54 Destroyed
…search

YouTube video titles often use "Artist & Other" or "Artist feat. Other"
format for collaborations. MusicBrainz stores these differently (e.g.,
"Aries feat. brakence" in artist credits). Searching for the full
collaborative name returns no results.

Add extractPrimaryArtist() to split on common collaboration patterns
(&, feat., ft., x, and, with, comma) and use only the primary artist
name for the MusicBrainz search query.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 25, 2026 20:11 Destroyed
PR environments copy production variables, but the internal
COBALT_API_URL (cobalt.railway.internal) is unreachable from
isolated PR environments. Clearing it lets the app fall back
to the public Cobalt API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 04:15 Destroyed
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/railway-pr.yml:
- Around line 45-47: The workflow uses an invalid Railway CLI subcommand;
replace the incorrect call with the proper CLI flag for setting/clearing
variables by using the Railway variables command with --set for the
COBALT_API_URL variable (i.e., change the invocation that references
COBALT_API_URL to use the --set flag such as railway variables --set
COBALT_API_URL="" --service dub-rip), and confirm with the deployment
environment whether clearing to an empty string is acceptable or the variable
must be removed via the Railway dashboard.

Comment thread .github/workflows/railway-pr.yml Outdated
Comment on lines +45 to +47
- name: Clear internal-only variables
run: |
railway variables set COBALT_API_URL="" --service dub-rip
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Railway CLI variables delete unset command documentation

💡 Result:

Railway’s official CLI reference currently documents railway variables as a single command to view variables and set variables via --set, and it does not document any delete, unset, or remove subcommand/flag. The documented interface is: railway variables [OPTIONS] with options like --service, --environment, --kv, and --set key=value. [1]

If you’re seeing references to railway variables delete/unset, those appear to come from third‑party CLI completion/docs (e.g., Fig) and don’t match Railway’s current published CLI API reference. [2]

What to do instead (per current docs):

  • Set/update: railway variables --set "KEY=value" [1]
  • For deletion: use the Railway dashboard (the CLI reference does not provide a delete/unset operation). [1]

Sources
[1] Railway Docs — CLI Reference (railway variables options) https://docs.railway.com/reference/cli-api
[2] Fig manual page showing railway variables delete/get/set (likely outdated vs official docs) https://www.fig.io/manual/railway/variables


🏁 Script executed:

# Search for railway-related files in the repo that might show correct usage
rg -i "railway variables" --type yaml --type yml -A 2 -B 2

Repository: jzstern/dub-rip

Length of output: 86


Use correct Railway CLI syntax; delete subcommand does not exist.
railway variables delete is not a valid Railway CLI command. Per official Railway documentation, variable deletion must be performed via the Railway dashboard. For CLI-based clearing, use railway variables --set COBALT_API_URL="" --service dub-rip (note the --set flag). Verify with the deployment environment whether clearing via empty string is sufficient or if dashboard deletion is required.

🤖 Prompt for AI Agents
In @.github/workflows/railway-pr.yml around lines 45 - 47, The workflow uses an
invalid Railway CLI subcommand; replace the incorrect call with the proper CLI
flag for setting/clearing variables by using the Railway variables command with
--set for the COBALT_API_URL variable (i.e., change the invocation that
references COBALT_API_URL to use the --set flag such as railway variables --set
COBALT_API_URL="" --service dub-rip), and confirm with the deployment
environment whether clearing to an empty string is acceptable or the variable
must be removed via the Railway dashboard.

Railway CLI rejects empty variable values. Use `railway variables
delete` to remove the internal-only URL so the app falls back to
the public Cobalt API default.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 04:35 Destroyed
`railway variables delete` silently failed. Use `railway variables set`
with the public Cobalt API URL instead, which is the same approach used
to manually fix the pr-44 environment.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 06:22 Destroyed
Point PR environments to the production Cobalt instance via its
public domain instead of the unreachable internal URL. Same API
key, same instance, just accessible over public network.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 06:50 Destroyed
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 13 additional flags in Devin Review.

Open in Devin Review

Comment thread src/routes/+page.svelte
Add enrichedMetadata = null to the $effect reset block so stale
metadata from a previous download doesn't persist in component
state when the user changes the URL.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 06:59 Destroyed
- Increase yt-token-service heap limit from 512MB to 1024MB to prevent
  OOM crashes during jsdom-based BotGuard challenge solving
- Increase PO token generation timeout from 30s to 60s for cold starts
- Pre-warm PO token cache on server startup so the token is ready
  before the first user download request

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 07:51 Destroyed
When MusicBrainz returns multiple recordings, the cover art was only
checked for the first recording's release. Now collects release IDs from
all matching recordings (albums first) and iterates through them until
cover art is found on the Cover Art Archive.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to dub-rip / dub-rip-pr-44 January 26, 2026 08:02 Destroyed
The fetchPoToken() pre-warm imports youtube-po-token-generator (jsdom,
~500MB) at startup. On CI runners also running Playwright's Chromium,
this OOMs the Vite process silently, causing a 120s webServer timeout.
Pre-warming is only needed in production.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant