radieo/ingest/radieo/models.py
Pierre-Olivier Mercier 7e0f08b863 Milestone 5: MusicBrainz MBID canonicalizer
Give tracks a source-agnostic identity so the same song from different
sources no longer replays in a loop.

- Canonicalizer resolves (artist, title) to a MusicBrainz recording MBID
  (no API key; ~1 req/s, descriptive User-Agent, best-effort). Hits and
  confirmed misses are cached in SQLite; transient errors are not.
- Track.key becomes mbid:<id> when resolved, else a normalized
  name:<artist>|<title> fallback — still source-agnostic.
- Scheduler now owns the authoritative anti-repeat on the canonical key,
  canonicalizing the drawn track with a bounded retry; providers keep a
  cheap recent-locator filter to limit retries.
- db: canonical_cache table, history.locator column with migration for
  existing databases, recent_locators().
- Canonicalization can be turned off via RADIEO_CANONICAL_ENABLED=0.
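
The caching rule in the first bullet can be sketched as follows. This is a minimal illustration, not the project's code: only the `canonical_cache` table name and the hit / confirmed-miss / transient-error policy come from the message above. The column layout, the `TransientError` class, the `open_cache` and `resolve` helpers, and the `lookup` callable are all assumptions made for the example, and callers are assumed to pass already-normalized names.

```python
import sqlite3
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical: a timeout, rate limit, or server error from the lookup."""


def open_cache() -> sqlite3.Connection:
    # Assumed schema; the real canonical_cache columns are not shown here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE canonical_cache ("
        " artist TEXT NOT NULL, title TEXT NOT NULL, mbid TEXT,"
        " PRIMARY KEY (artist, title))"
    )
    return conn


def resolve(
    conn: sqlite3.Connection,
    artist: str,
    title: str,
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    row = conn.execute(
        "SELECT mbid FROM canonical_cache WHERE artist = ? AND title = ?",
        (artist, title),
    ).fetchone()
    if row is not None:
        # A cached row is either a hit (an MBID) or a confirmed miss (NULL).
        return row[0]
    try:
        mbid = lookup(artist, title)  # returns None for a confirmed miss
    except TransientError:
        # Best-effort: nothing is cached, so a later call tries again.
        return None
    conn.execute(
        "INSERT OR REPLACE INTO canonical_cache (artist, title, mbid)"
        " VALUES (?, ?, ?)",
        (artist, title, mbid),
    )
    conn.commit()
    return mbid
```

Storing confirmed misses as a row with a NULL `mbid` lets the cache tell "looked up, nothing found" apart from "never looked up", which is what keeps repeated misses from costing a network request each time.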

Verified: MBID hit/cache/miss, cross-source key collapse, scheduler
dodging a recent play, schema migration, and full stack (Navidrome +
yt-dlp) with zero Python tracebacks and a valid 192 kbps MP3 stream.
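
The scheduler's bounded-retry anti-repeat described above might look roughly like this. The message only states that the scheduler canonicalizes the drawn track and retries a bounded number of times; the function name `pick_next`, the `max_attempts` default, the shape of the `draw` and `canonical_key` callables, and the choice to fall back to the last candidate when every attempt repeats are all hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def pick_next(
    draw: Callable[[], Optional[T]],
    canonical_key: Callable[[T], str],
    recent_keys: Iterable[str],
    max_attempts: int = 5,
) -> Optional[T]:
    """Draw up to max_attempts candidates and return the first one whose
    canonical key was not played recently."""
    recent = set(recent_keys)
    fallback: Optional[T] = None
    for _ in range(max_attempts):
        track = draw()
        if track is None:
            break  # the provider has nothing to offer
        if canonical_key(track) not in recent:
            return track
        fallback = track  # a repeat; remember it in case nothing better shows up
    # Assumption: a repeated track is preferable to playing nothing at all.
    return fallback
```

Bounding the loop keeps a small library from stalling the scheduler when every available track has been played recently.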

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 18:46:30 +08:00

"""Shared data model.
A ``Track`` is the uniform object every provider emits: a *resolved* reference
(which backend can download it, and where) plus display metadata. Fetchers turn
it into a local file; the queue and the state database use ``key`` for
de-duplication and anti-repeat.
"""
import re
from dataclasses import dataclass

_WS = re.compile(r"\s+")


def norm_name(value: str) -> str:
    """Normalize an artist/title for stable keying and cache lookups:
    case-fold and collapse whitespace. Kept deliberately light (no accent
    stripping) so distinct titles never collapse together."""
    return _WS.sub(" ", value.strip()).casefold()


@dataclass(frozen=True)
class Track:
    backend: str  # which fetcher handles it: "subsonic" | "ytdlp"
    locator: str  # backend-specific: Subsonic song id, or a media URL
    artist: str
    title: str
    origin: str  # provider that produced it, e.g. "navidrome"
    mbid: str | None = None  # filled by the Canonicalizer (milestone 5)
    source_ext: str | None = None  # filename hint, e.g. "mp3", "flac"
    source_url: str | None = None  # container URL a track was picked from

    @property
    def key(self) -> str:
        """Stable, source-agnostic identity for de-duplication and anti-repeat.

        The Canonicalizer fills ``mbid`` when it can, giving a truly
        cross-source identity. When it can't, we fall back to the normalized
        ``(artist, title)``, which is still source-agnostic, so the same track
        fetched from two backends collapses to one key.
        """
        if self.mbid:
            return f"mbid:{self.mbid}"
        return f"name:{norm_name(self.artist)}|{norm_name(self.title)}"

    def __str__(self) -> str:
        # Separate artist and title so the label stays readable.
        return f"{self.artist} - {self.title} [{self.origin}]"
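
As a quick illustration of the `key` property, the sketch below re-declares a trimmed copy of `Track` (without `source_ext` and `source_url`, and using `Optional[str]` for portability) so that it runs on its own. The artist, title, locators, the `"youtube"` origin, and the all-zero MBID are made-up sample data, not values from the project.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

_WS = re.compile(r"\s+")


def norm_name(value: str) -> str:
    # Same normalization as above: trim, collapse whitespace, case-fold.
    return _WS.sub(" ", value.strip()).casefold()


@dataclass(frozen=True)
class Track:
    backend: str
    locator: str
    artist: str
    title: str
    origin: str
    mbid: Optional[str] = None

    @property
    def key(self) -> str:
        if self.mbid:
            return f"mbid:{self.mbid}"
        return f"name:{norm_name(self.artist)}|{norm_name(self.title)}"


# The same song as two providers might report it, with different
# spacing and capitalization.
local = Track("subsonic", "song-42", "The  Examples", "Sample Song", "navidrome")
remote = Track(
    "ytdlp",
    "https://example.invalid/watch?v=abc",
    " the examples",
    "SAMPLE   SONG ",
    "youtube",
)

# Without an MBID, both fall back to the same normalized name key.
print(local.key == remote.key)

# Once an MBID is set, it takes precedence over the name fallback.
tagged = replace(local, mbid="00000000-0000-0000-0000-000000000000")
print(tagged.key)
```

An MBID-keyed track and a name-keyed track never share a key, so the cross-source collapse depends on both copies resolving the same way.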