Milestone 5: MusicBrainz MBID canonicalizer
Give tracks a source-agnostic identity so the same song from different sources no longer replays in a loop.

- Canonicalizer resolves (artist, title) to a MusicBrainz recording MBID (no API key; ~1 req/s, descriptive User-Agent, best-effort). Hits and confirmed misses are cached in SQLite; transient errors are not.
- Track.key becomes mbid:<id> when resolved, else a normalized name:<artist>|<title> fallback (still source-agnostic).
- Scheduler now owns the authoritative anti-repeat on the canonical key, canonicalizing the drawn track with a bounded retry; providers keep a cheap recent-locator filter to limit retries.
- db: canonical_cache table, history.locator column with migration for existing databases, recent_locators().
- Canonicalization can be turned off via RADIEO_CANONICAL_ENABLED=0.

Verified: MBID hit/cache/miss, cross-source key collapse, scheduler dodging a recent play, schema migration, and full stack (Navidrome + yt-dlp) with zero Python tracebacks and a valid 192 kbps MP3 stream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
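The key scheme in the commit message (mbid:<id> when resolved, otherwise name:<artist>|<title>) could look roughly like the sketch below. The function names and the exact normalization rules (accent stripping, punctuation removal, case folding) are assumptions for illustration; the commit does not say how the project normalizes names.

```python
import re
import unicodedata
from typing import Optional


def _normalize(text: str) -> str:
    """Assumed normalization: strip accents, fold case, drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", without_marks.casefold())
    return " ".join(no_punct.split())  # collapse runs of whitespace


def canonical_key(artist: str, title: str, mbid: Optional[str] = None) -> str:
    """Return mbid:<id> when an MBID is known, else a normalized name key."""
    if mbid:
        return f"mbid:{mbid}"
    return f"name:{_normalize(artist)}|{_normalize(title)}"
```

Because neither branch includes a provider name or locator, two sources that agree on artist and title (or on the MBID) produce the same key.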
parent 8774f5c2a1
commit 7e0f08b863
11 changed files with 292 additions and 33 deletions
README.md: 16 lines changed
@@ -67,12 +67,18 @@ source); the file being absent also disables yt-dlp.

 ## Current status

-**Milestone 4 — yt-dlp provider: done.**
+**Milestone 5 — MBID canonicalizer: done.**

 - Two playback sources feed a weighted scheduler: a Navidrome/OpenSubsonic
 playlist and a hand-maintained list of yt-dlp URLs (`config/urls.txt`).
 Container URLs (playlist/album/label/artist) are expanded and one track is
-drawn at random, honouring the anti-repeat window.
+drawn at random.
+- Each track is canonicalized to a MusicBrainz recording MBID (no API key
+needed; ~1 req/s, best-effort, results cached in SQLite). This gives a
+source-agnostic identity, so the same song from two sources collapses to one;
+when no confident match is found it falls back to a normalized
+`(artist, title)` key. The scheduler uses this canonical key for anti-repeat,
+with the providers applying a cheap locator filter first.
 - Each source has its own fetcher (Subsonic stream / yt-dlp download); files are
 cached ahead of playback (prefetch buffer) and decoded by Liquidsoap.
 - Play history and LRU retention are tracked in a SQLite database under
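The caching rule from the commit message (hits and confirmed misses are cached in SQLite, transient errors are not) might be sketched as follows. Only the table name canonical_cache comes from the commit. The column layout, the empty-string miss sentinel, the TransientLookupError class and the injected lookup callable are assumptions, and the real MusicBrainz request is left out.

```python
import sqlite3
from typing import Callable, Optional

_MISS = ""  # assumed sentinel stored for a confirmed miss


class TransientLookupError(Exception):
    """Hypothetical error for timeouts, 5xx responses or rate limiting."""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS canonical_cache ("
        " name_key TEXT PRIMARY KEY,"
        " mbid TEXT NOT NULL)"
    )
    return conn


def resolve_cached(
    conn: sqlite3.Connection,
    name_key: str,
    lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the MBID for name_key, consulting the cache before lookup()."""
    row = conn.execute(
        "SELECT mbid FROM canonical_cache WHERE name_key = ?", (name_key,)
    ).fetchone()
    if row is not None:
        return row[0] or None  # cached hit, or cached confirmed miss
    try:
        mbid = lookup(name_key)  # None means "confirmed: no match"
    except TransientLookupError:
        return None  # deliberately not cached, so a later call retries
    with conn:  # commits on success
        conn.execute(
            "INSERT OR REPLACE INTO canonical_cache (name_key, mbid) VALUES (?, ?)",
            (name_key, mbid or _MISS),
        )
    return mbid
```

Storing confirmed misses avoids spending the roughly one-request-per-second budget on the same unknown track again, while leaving transient failures uncached lets a later draw retry them.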
@@ -86,7 +92,9 @@ source); the file being absent also disables yt-dlp.
 - HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
 multiple simultaneous listeners supported.

-The ListenBrainz suggestion feed comes next.
+The ListenBrainz suggestion feed comes next. (Known cosmetic quirk: at startup
+the fallback logs a few harmless ffmpeg "Invalid data" warnings while probing
+non-audio files such as `.gitkeep`; to be quieted in the polish milestone.)

 ## Roadmap

@@ -97,7 +105,7 @@ The ListenBrainz suggestion feed comes next.
 LRU retention and play history.
 4. ✅ **yt-dlp provider** — fetch tracks from a maintained URL/artist list;
 weighted mixing between sources.
-5. **Canonicalizer** — ListenBrainz MBID lookup for source-agnostic
+5. ✅ **Canonicalizer** — MusicBrainz MBID lookup for source-agnostic
 de-duplication.
 6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
 one to Navidrome or yt-dlp.
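The scheduler-side anti-repeat described in the updated status section (draw a track, canonicalize it, retry a bounded number of times if its key was played recently) could be sketched as below. The Track stand-in, the pick_track name, the default of five attempts and returning None when retries run out are all assumptions; the commit does not say what the scheduler does once the retry budget is spent.

```python
import random
from collections import deque
from typing import Callable, Deque, Optional, Sequence, Tuple

Track = Tuple[str, str]  # (artist, title); simplified stand-in for a track


def pick_track(
    candidates: Sequence[Track],
    recent_keys: Deque[str],
    canonicalize: Callable[[Track], str],
    max_attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> Optional[Track]:
    """Draw a random track whose canonical key is not in recent_keys.

    candidates must be non-empty. Returns None once max_attempts draws have
    all landed on recently played keys (assumed behaviour).
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        track = rng.choice(candidates)
        key = canonicalize(track)
        if key not in recent_keys:
            recent_keys.append(key)  # a deque with maxlen drops the oldest key
            return track
    return None
```

Using a deque created with a maxlen as recent_keys gives a sliding anti-repeat window, and the retry bound keeps the number of canonicalization calls per draw predictable.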