
Use MB lookup to resolve ambiguous artist names #3862

Open · OzGav wants to merge 1 commit into dev from use-mb-for-multiartist-lookup


OzGav (Contributor) commented May 10, 2026

Resolving multiple artist names has been a perennial problem. In my most recent adjustment to the logic, I moved to using the MBID count to try to identify when the heuristic split did not match the expected number of artists. That didn't solve the problem, but at least it made it visible. This PR takes the next step: when that count mismatch is detected, use the MBIDs to look up canonical names from MusicBrainz instead of just logging a warning and going with the wrong split.

So this PR adds:

  • When MusicBrainz Artist IDs / Album Artist IDs are present in tags and the parsed artist count does not match the MBID count, resolve canonical names from the MusicBrainz API and use those instead of the heuristic split. Applies to both track artists and album artists.

  • When counts already match, the parsed names are trusted as before, so cleanly-tagged libraries make no network calls.
    MusicbrainzProvider.get_artist_details is cached for 30 days, so MBIDs repeated across tracks are effectively free.

  • Failed individual MBID lookups are dropped rather than being substituted from the tag-parsed names, because matching by position is unsafe when the counts already disagree. If every lookup in a track fails, fall back to the tag-parsed names so the track still gets stored with something.
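A minimal Python sketch of the rules in the list above, assuming a hypothetical resolve_artist_names function and a lookup_name callback that returns the canonical MusicBrainz name for an MBID, or None when the lookup fails. It illustrates the decision logic only; it is not the PR's resolver, which goes through MusicbrainzProvider.

```python
import logging
from typing import Callable, Optional

LOGGER = logging.getLogger("artist_resolver_sketch")


def resolve_artist_names(
    parsed_names: list[str],
    mbids: list[str],
    lookup_name: Callable[[str], Optional[str]],
) -> list[str]:
    """Pick the artist names to store for one track (illustrative only).

    parsed_names: names from the heuristic split of the artist tag.
    mbids: MusicBrainz Artist IDs from the tags (may be empty).
    lookup_name: hypothetical callback; canonical name for an MBID or None.
    """
    # No MBIDs, or the counts agree: trust the tags and make no network calls.
    if not mbids or len(mbids) == len(parsed_names):
        return parsed_names

    LOGGER.warning(
        "Parsed %d artist(s) but found %d MBID(s); resolving via MusicBrainz",
        len(parsed_names),
        len(mbids),
    )
    resolved: list[str] = []
    for mbid in mbids:
        name = lookup_name(mbid)
        if name is None:
            # Drop the failure instead of borrowing parsed_names[i]:
            # positional matching is unsafe once the counts disagree.
            LOGGER.warning("MusicBrainz lookup failed for %s; dropping it", mbid)
            continue
        resolved.append(name)

    if not resolved:
        # Every lookup failed: fall back so the track is still stored.
        LOGGER.warning("All MusicBrainz lookups failed; using tag-parsed names")
        return parsed_names
    return resolved
```

For example, with a stub lookup of {"mbid-sg": "Simon & Garfunkel"}.get (placeholder MBID), a two-way tag split ["Simon", "Garfunkel"] paired with a single MBID resolves to ["Simon & Garfunkel"], while a track whose counts already match never calls the lookup.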

" presents " is also added to FEATURING_SPLITTERS to handle "Above & Beyond presents OceanLab" and similar strings. The current heuristic produces the correct count (2) on that string but splits on the wrong boundary, so the count-mismatch check would not catch it and the MB lookup would never fire. The splitter addition is therefore needed independently.
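To show why the splitter matters on its own, here is a sketch of a two-tier split. It assumes, hypothetically, that featuring-style splitters take precedence over generic separators such as " & "; the splitter tuples and split_artist_string are illustrative and are not the contents of tags.py.

```python
import re

# Hypothetical splitter sets; the real FEATURING_SPLITTERS in tags.py may differ.
FEATURING_SPLITTERS = (" feat. ", " ft. ", " featuring ", " presents ")
GENERIC_SPLITTERS = (" & ", "; ", " / ")

_FEATURING_RE = re.compile(
    "|".join(re.escape(s) for s in FEATURING_SPLITTERS), re.IGNORECASE
)
_GENERIC_RE = re.compile("|".join(re.escape(s) for s in GENERIC_SPLITTERS))


def split_artist_string(value: str) -> list[str]:
    """Split one ARTIST tag string into artist names (illustrative heuristic)."""
    # Featuring-style keywords mark artist boundaries explicitly, so when one
    # is present, split only there and leave any "&" inside each side alone.
    parts = [p.strip() for p in _FEATURING_RE.split(value) if p.strip()]
    if len(parts) > 1:
        return parts
    # Otherwise fall back to generic separators such as " & ".
    return [p.strip() for p in _GENERIC_RE.split(value) if p.strip()]
```

With " presents " in the featuring list, "Above & Beyond presents OceanLab" splits into ["Above & Beyond", "OceanLab"]. Without it, the same string falls through to the " & " split and yields ["Above", "Beyond presents OceanLab"]: two names against two MBIDs, so no mismatch is ever detected.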

marcelveldt (Member)

We should be careful with this:

  1. It's potentially going to make a call to MB for each track in a user's library. That is a lot of calls!
  2. We have always said that users' tags are leading; this change adjusts that a bit.

What is the exact issue you are trying to solve here ?

OzGav (Contributor, Author) commented May 11, 2026

Just had another new edge case: a user has the artist "Above & Beyond presents OceanLab", and because we don't have " presents " in FEATURING_SPLITTERS, the parsing failed.

Whilst I have improved things by comparing the number of MBIDs to the number of parsed artists, it is still fragile. If the number of MBIDs doesn't equal the number of parsed artists, we currently still pull the incorrect artists into the database and log a warning, which is better but not ideal.

I just feel that if we have the MBIDs, we can guarantee that the artist names are right and also resolve any naming, spelling, language or diacritics ambiguities.

I agree that this will increase the number of calls, but only for new additions to people's libraries, and only once, when the track is first added to the database. A further mitigation is that the MBID lookup is cached for 30 days, so a user with 50 Beatles tracks makes 1 lookup, not 50.
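The deduplication effect of a per-MBID cache can be illustrated with a small TTL cache. This is an in-memory stand-in written for illustration, not how MusicbrainzProvider.get_artist_details is actually cached; MbidNameCache and the "beatles-mbid" key are hypothetical.

```python
import time
from typing import Callable, Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


class MbidNameCache:
    """In-memory stand-in for a 30-day cache of MBID -> artist name lookups."""

    def __init__(
        self,
        ttl_seconds: float = THIRTY_DAYS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get_name(
        self, mbid: str, fetch: Callable[[str], Optional[str]]
    ) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(mbid)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no network call
        name = fetch(mbid)
        if name is not None:
            # Only successes are cached, so a failed MBID is retried later.
            self._entries[mbid] = (now, name)
        return name
```

Fifty lookups of the same MBID within the TTL trigger a single fetch. This sketch caches only successful lookups so a failed MBID is retried on the next track; the real cache may behave differently.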

OzGav (Contributor, Author) commented May 11, 2026

Here is a good example from a classical album I have:

Artist: Pyotr Ilyich Tchaikovsky
Album Artist: Tchaikovsky
Album artist sort order: Tchaikovsky, Pyotr Ilyich
Artist sort order: Tchaikovsky, Pyotr Ilyich
The MB ARTIST ID and MB RELEASE ARTIST ID are the same, though, so using these IDs will result in a consistent artist on this track.

marcelveldt (Member)

We should avoid doing a lookup if the MusicBrainz tags are already present.

"I agree that this will increase the number of calls, but only for new additions to people's libraries, and only once, when the track is first added to the database."

And this is exactly what worries me. Local libraries are potentially very large, so this may result in tens of thousands of calls when scanning an initial library. That is a lot of stress for a free service.

What we can potentially do: if the artists tag is already present and its count matches the number of MB IDs, we do not have to do any lookup.
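To put rough numbers on this concern, the sketch below counts how many lookups a scan would make under the count-mismatch rule, with and without per-MBID caching, and converts calls into waiting time. The one-request-per-second figure is an assumption based on MusicBrainz's published rate-limiting guidance; count_lookups and scan_hours are hypothetical helpers.

```python
from typing import Iterable, Sequence, Tuple

# Assumption: MusicBrainz asks API clients to average about one request per second.
MB_REQUESTS_PER_SECOND = 1.0


def count_lookups(tracks: Iterable[Tuple[int, Sequence[str]]]) -> Tuple[int, int]:
    """Return (uncached_calls, cached_calls) for a library scan.

    Each track is (parsed_artist_count, artist_mbids). Only tracks whose parsed
    count disagrees with the MBID count trigger lookups. Without a cache every
    MBID on those tracks costs a call; with one, each distinct MBID costs one.
    """
    uncached = 0
    distinct: set[str] = set()
    for parsed_count, mbids in tracks:
        if mbids and parsed_count != len(mbids):
            uncached += len(mbids)
            distinct.update(mbids)
    return uncached, len(distinct)


def scan_hours(calls: int, rps: float = MB_REQUESTS_PER_SECOND) -> float:
    """Lower bound on time spent waiting for rate-limited calls, in hours."""
    return calls / rps / 3600
```

At one request per second, 10,000 calls take about 2.8 hours. Under the count-match short-circuit plus caching, a library of 9,000 cleanly-tagged tracks and 1,000 mis-split tracks by the same artist costs one call instead of 1,000.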

OzGav force-pushed the use-mb-for-multiartist-lookup branch 5 times, most recently from abf7d6c to 140ea9b, on May 17, 2026 at 03:16
OzGav (Contributor, Author) commented May 17, 2026

Fair. I have switched it to work as you suggested: the lookup now only runs when the number of artist MBIDs doesn't match the number of parsed artist names.

There is still the problem of poor artist name tagging, where the first (potentially incorrect) name that gets persisted is kept as additional tracks are added. One idea: when the user runs UPDATE METADATA or REFRESH ITEM on an artist, do the name lookup then. That gives the user a built-in way to fix this, and the existing 30-day MB cache means repeat clicks won't make repeated API calls. Thoughts on this idea?
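The proposed opt-in refresh could look roughly like the following. It is a hypothetical sketch: StoredArtist and refresh_canonical_name are illustrative names, and lookup_name stands in for a cached MusicBrainz lookup that returns None on failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredArtist:
    """Hypothetical minimal view of an artist row in the library database."""
    name: str
    mbid: Optional[str]


def refresh_canonical_name(
    artist: StoredArtist, lookup_name: Callable[[str], Optional[str]]
) -> bool:
    """User-triggered refresh: adopt the MusicBrainz name if it differs.

    Returns True if the stored name was changed.
    """
    if not artist.mbid:
        return False  # nothing to look up; keep the tag-derived name
    canonical = lookup_name(artist.mbid)
    if canonical is None or canonical == artist.name:
        return False  # lookup failed or name already canonical; leave it
    artist.name = canonical
    return True
```

Because this runs only when the user triggers UPDATE METADATA or REFRESH ITEM, the MB load stays opt-in, and a failed lookup leaves the stored name untouched.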

Two parser improvements for multi-artist resolution:

1. Add " presents " to FEATURING_SPLITTERS so single ARTIST tag strings
   like "Above & Beyond presents OceanLab" split correctly instead of
   silently being mis-split on the inner ampersand.

2. When the parsed artist count doesn't match the MusicBrainz Artist ID
   count, the filesystem_local resolver looks up canonical names via
   the new MusicbrainzProvider.resolve_artists_from_mbids method. Failed
   individual lookups are dropped rather than mapped back to a tag name
   by position (unsafe when counts already disagree); if every lookup
   fails, the resolver falls back to the tag-parsed names so the track
   still gets stored. When counts already match, no lookup runs.

   The mismatch warnings move out of tags.py into the resolver, where
   they can report what actually happened.

Out of scope for this PR:
- First-write-wins persistence of misspellings ("Tchaikovsky" vs
  "Pyotr Ilyich Tchaikovsky"). The count-match short-circuit means the
  mismatch trigger doesn't help here; this needs a separate user-
  triggered "refresh canonical names" action so the MB load is opt-in.
OzGav force-pushed the use-mb-for-multiartist-lookup branch from 140ea9b to 09007cd on May 17, 2026 at 04:09
OzGav changed the title from "Switch to using MB IDs as the truth source for artist names" to "Use MB lookup to resolve ambiguous artist names" on May 17, 2026
OzGav added this to the 2.9.0 milestone on May 18, 2026

2 participants