Milestone 5: MusicBrainz MBID canonicalizer
Give tracks a source-agnostic identity so the same song from different sources no longer replays in a loop. - Canonicalizer resolves (artist, title) to a MusicBrainz recording MBID (no API key; ~1 req/s, descriptive User-Agent, best-effort). Hits and confirmed misses are cached in SQLite; transient errors are not. - Track.key becomes mbid:<id> when resolved, else a normalized name:<artist>|<title> fallback — still source-agnostic. - Scheduler now owns the authoritative anti-repeat on the canonical key, canonicalizing the drawn track with a bounded retry; providers keep a cheap recent-locator filter to limit retries. - db: canonical_cache table, history.locator column with migration for existing databases, recent_locators(). - Canonicalization can be turned off via RADIEO_CANONICAL_ENABLED=0. Verified: MBID hit/cache/miss, cross-source key collapse, scheduler dodging a recent play, schema migration, and full stack (Navidrome + yt-dlp) with zero Python tracebacks and a valid 192 kbps MP3 stream. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
8774f5c2a1
commit
7e0f08b863
11 changed files with 292 additions and 33 deletions
|
|
@ -19,6 +19,13 @@ RADIEO_NAVIDROME_PLAYLIST=Radio
|
||||||
RADIEO_WEIGHT_NAVIDROME=3
|
RADIEO_WEIGHT_NAVIDROME=3
|
||||||
RADIEO_WEIGHT_YTDLP=1
|
RADIEO_WEIGHT_YTDLP=1
|
||||||
|
|
||||||
|
# --- Canonicalizer MBID (optionnel) ---
|
||||||
|
# Résout (artiste, titre) -> MBID MusicBrainz pour dédupliquer entre sources.
|
||||||
|
# Aucune clé requise. Mettre 0 pour désactiver (clé = (artiste, titre)).
|
||||||
|
RADIEO_CANONICAL_ENABLED=1
|
||||||
|
# User-Agent envoyé à MusicBrainz (qui en exige un, descriptif).
|
||||||
|
RADIEO_USER_AGENT=radieo/0.1 (personal music radio)
|
||||||
|
|
||||||
# --- Rétention du cache (optionnel) ---
|
# --- Rétention du cache (optionnel) ---
|
||||||
# Nombre de morceaux joués conservés sur disque avant éviction (LRU).
|
# Nombre de morceaux joués conservés sur disque avant éviction (LRU).
|
||||||
RADIEO_RETENTION_KEEP=20
|
RADIEO_RETENTION_KEEP=20
|
||||||
|
|
|
||||||
16
README.md
16
README.md
|
|
@ -67,12 +67,18 @@ source); the file being absent also disables yt-dlp.
|
||||||
|
|
||||||
## Current status
|
## Current status
|
||||||
|
|
||||||
**Milestone 4 — yt-dlp provider: done.**
|
**Milestone 5 — MBID canonicalizer: done.**
|
||||||
|
|
||||||
- Two playback sources feed a weighted scheduler: a Navidrome/OpenSubsonic
|
- Two playback sources feed a weighted scheduler: a Navidrome/OpenSubsonic
|
||||||
playlist and a hand-maintained list of yt-dlp URLs (`config/urls.txt`).
|
playlist and a hand-maintained list of yt-dlp URLs (`config/urls.txt`).
|
||||||
Container URLs (playlist/album/label/artist) are expanded and one track is
|
Container URLs (playlist/album/label/artist) are expanded and one track is
|
||||||
drawn at random, honouring the anti-repeat window.
|
drawn at random.
|
||||||
|
- Each track is canonicalized to a MusicBrainz recording MBID (no API key
|
||||||
|
needed; ~1 req/s, best-effort, results cached in SQLite). This gives a
|
||||||
|
source-agnostic identity, so the same song from two sources collapses to one;
|
||||||
|
when no confident match is found it falls back to a normalized
|
||||||
|
`(artist, title)` key. The scheduler uses this canonical key for anti-repeat,
|
||||||
|
with the providers applying a cheap locator filter first.
|
||||||
- Each source has its own fetcher (Subsonic stream / yt-dlp download); files are
|
- Each source has its own fetcher (Subsonic stream / yt-dlp download); files are
|
||||||
cached ahead of playback (prefetch buffer) and decoded by Liquidsoap.
|
cached ahead of playback (prefetch buffer) and decoded by Liquidsoap.
|
||||||
- Play history and LRU retention are tracked in a SQLite database under
|
- Play history and LRU retention are tracked in a SQLite database under
|
||||||
|
|
@ -86,7 +92,9 @@ source); the file being absent also disables yt-dlp.
|
||||||
- HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
|
- HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
|
||||||
multiple simultaneous listeners supported.
|
multiple simultaneous listeners supported.
|
||||||
|
|
||||||
The ListenBrainz suggestion feed comes next.
|
The ListenBrainz suggestion feed comes next. (Known cosmetic quirk: at startup
|
||||||
|
the fallback logs a few harmless ffmpeg "Invalid data" warnings while probing
|
||||||
|
non-audio files such as `.gitkeep`; to be quieted in the polish milestone.)
|
||||||
|
|
||||||
## Roadmap
|
## Roadmap
|
||||||
|
|
||||||
|
|
@ -97,7 +105,7 @@ The ListenBrainz suggestion feed comes next.
|
||||||
LRU retention and play history.
|
LRU retention and play history.
|
||||||
4. ✅ **yt-dlp provider** — fetch tracks from a maintained URL/artist list;
|
4. ✅ **yt-dlp provider** — fetch tracks from a maintained URL/artist list;
|
||||||
weighted mixing between sources.
|
weighted mixing between sources.
|
||||||
5. **Canonicalizer** — ListenBrainz MBID lookup for source-agnostic
|
5. ✅ **Canonicalizer** — MusicBrainz MBID lookup for source-agnostic
|
||||||
de-duplication.
|
de-duplication.
|
||||||
6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
|
6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
|
||||||
one to Navidrome or yt-dlp.
|
one to Navidrome or yt-dlp.
|
||||||
|
|
|
||||||
|
|
@ -22,6 +22,9 @@ services:
|
||||||
# Dosage du mix entre les sources (0 désactive).
|
# Dosage du mix entre les sources (0 désactive).
|
||||||
- RADIEO_WEIGHT_NAVIDROME=${RADIEO_WEIGHT_NAVIDROME:-3}
|
- RADIEO_WEIGHT_NAVIDROME=${RADIEO_WEIGHT_NAVIDROME:-3}
|
||||||
- RADIEO_WEIGHT_YTDLP=${RADIEO_WEIGHT_YTDLP:-1}
|
- RADIEO_WEIGHT_YTDLP=${RADIEO_WEIGHT_YTDLP:-1}
|
||||||
|
# Canonicalizer MusicBrainz (identité MBID inter-sources ; sans clé).
|
||||||
|
- RADIEO_CANONICAL_ENABLED=${RADIEO_CANONICAL_ENABLED:-1}
|
||||||
|
- RADIEO_USER_AGENT=${RADIEO_USER_AGENT:-radieo/0.1 (personal music radio)}
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
|
|
||||||
stream:
|
stream:
|
||||||
|
|
|
||||||
|
|
@ -11,8 +11,18 @@ from .scheduler import Scheduler
|
||||||
log = logging.getLogger("radieo")
|
log = logging.getLogger("radieo")
|
||||||
|
|
||||||
|
|
||||||
|
class _NullCanonicalizer:
|
||||||
|
"""Used when canonicalization is disabled: leaves the Track untouched."""
|
||||||
|
|
||||||
|
def canonicalize(self, track):
|
||||||
|
return track
|
||||||
|
|
||||||
|
def close(self):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
def _build_pipeline(db: Database):
|
def _build_pipeline(db: Database):
|
||||||
"""Return (scheduler, fetchers). Assembles whichever sources are enabled;
|
"""Return (providers, fetchers). Assembles whichever sources are enabled;
|
||||||
when none is, the scheduler yields nothing and the stream plays its local
|
when none is, the scheduler yields nothing and the stream plays its local
|
||||||
cache fallback."""
|
cache fallback."""
|
||||||
providers = [] # list[(provider, weight)]
|
providers = [] # list[(provider, weight)]
|
||||||
|
|
@ -58,7 +68,7 @@ def _build_pipeline(db: Database):
|
||||||
|
|
||||||
if not providers:
|
if not providers:
|
||||||
log.warning("no source active: the stream plays its local cache only.")
|
log.warning("no source active: the stream plays its local cache only.")
|
||||||
return Scheduler(providers), fetchers
|
return providers, fetchers
|
||||||
|
|
||||||
|
|
||||||
def _sweep_temp_files() -> None:
|
def _sweep_temp_files() -> None:
|
||||||
|
|
@ -94,7 +104,19 @@ def main() -> None:
|
||||||
_sweep_temp_files()
|
_sweep_temp_files()
|
||||||
|
|
||||||
db = Database(config.STATE_DB)
|
db = Database(config.STATE_DB)
|
||||||
scheduler, fetchers = _build_pipeline(db)
|
|
||||||
|
if config.CANONICAL_ENABLED:
|
||||||
|
from .canonicalizer import Canonicalizer
|
||||||
|
|
||||||
|
canonicalizer = Canonicalizer(db)
|
||||||
|
log.info("Canonicalizer enabled (MusicBrainz, min_score=%d)",
|
||||||
|
config.CANONICAL_MIN_SCORE)
|
||||||
|
else:
|
||||||
|
canonicalizer = _NullCanonicalizer()
|
||||||
|
log.info("Canonicalizer disabled: tracks keyed by (artist, title).")
|
||||||
|
|
||||||
|
providers, fetchers = _build_pipeline(db)
|
||||||
|
scheduler = Scheduler(providers, canonicalizer, db)
|
||||||
queue = TrackQueue(scheduler, fetchers, db)
|
queue = TrackQueue(scheduler, fetchers, db)
|
||||||
queue.start()
|
queue.start()
|
||||||
|
|
||||||
|
|
@ -113,6 +135,7 @@ def main() -> None:
|
||||||
finally:
|
finally:
|
||||||
queue.stop()
|
queue.stop()
|
||||||
server.server_close()
|
server.server_close()
|
||||||
|
canonicalizer.close()
|
||||||
db.close()
|
db.close()
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
94
ingest/radieo/canonicalizer.py
Normal file
94
ingest/radieo/canonicalizer.py
Normal file
|
|
@ -0,0 +1,94 @@
|
||||||
|
"""Canonicalizer: resolve a Track to a MusicBrainz recording MBID.
|
||||||
|
|
||||||
|
Given ``(artist, title)`` it queries the MusicBrainz recording search and keeps
|
||||||
|
the best match above a score threshold. Results — hits *and* confirmed misses —
|
||||||
|
are cached in SQLite so the network is hit at most once per distinct track.
|
||||||
|
Genuine no-matches are cached; transient network errors are not, so they get
|
||||||
|
retried later.
|
||||||
|
|
||||||
|
MusicBrainz asks anonymous clients to stay under ~1 request/second and to send a
|
||||||
|
descriptive User-Agent; both are honoured here. The whole thing is best-effort:
|
||||||
|
any failure just leaves ``mbid`` unset and the Track falls back to its
|
||||||
|
name-based key.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from dataclasses import replace
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from . import config
|
||||||
|
from .models import Track, norm_name
|
||||||
|
|
||||||
|
log = logging.getLogger("radieo.canonicalizer")
|
||||||
|
|
||||||
|
|
||||||
|
class Canonicalizer:
|
||||||
|
def __init__(self, db):
|
||||||
|
self._db = db
|
||||||
|
self._http = httpx.Client(
|
||||||
|
timeout=httpx.Timeout(connect=10.0, read=30.0, write=10.0, pool=10.0),
|
||||||
|
headers={"User-Agent": config.USER_AGENT},
|
||||||
|
follow_redirects=True,
|
||||||
|
)
|
||||||
|
self._rate_lock = threading.Lock()
|
||||||
|
self._last_call = 0.0
|
||||||
|
|
||||||
|
def canonicalize(self, track: Track) -> Track:
|
||||||
|
"""Return the Track with ``mbid`` filled when resolvable, else unchanged."""
|
||||||
|
artist_norm = norm_name(track.artist)
|
||||||
|
title_norm = norm_name(track.title)
|
||||||
|
if not artist_norm or not title_norm:
|
||||||
|
return track
|
||||||
|
|
||||||
|
cached, mbid = self._db.get_canonical(artist_norm, title_norm)
|
||||||
|
if not cached:
|
||||||
|
ok, mbid = self._lookup(track.artist, track.title)
|
||||||
|
if ok: # cache hits and genuine misses; skip transient errors
|
||||||
|
self._db.put_canonical(artist_norm, title_norm, mbid)
|
||||||
|
return replace(track, mbid=mbid) if mbid else track
|
||||||
|
|
||||||
|
# --- MusicBrainz ------------------------------------------------------
|
||||||
|
|
||||||
|
def _lookup(self, artist: str, title: str) -> tuple[bool, str | None]:
|
||||||
|
"""Return (ok, mbid). ``ok`` False on a transient error (do not cache)."""
|
||||||
|
query = f'artist:"{_escape(artist)}" AND recording:"{_escape(title)}"'
|
||||||
|
self._throttle()
|
||||||
|
try:
|
||||||
|
resp = self._http.get(
|
||||||
|
config.MUSICBRAINZ_URL,
|
||||||
|
params={"query": query, "fmt": "json", "limit": 3},
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
except (httpx.HTTPError, ValueError) as exc:
|
||||||
|
log.warning("MusicBrainz lookup failed for %s — %s: %s", artist, title, exc)
|
||||||
|
return False, None
|
||||||
|
|
||||||
|
recordings = data.get("recordings") or []
|
||||||
|
if recordings:
|
||||||
|
best = recordings[0]
|
||||||
|
if int(best.get("score", 0)) >= config.CANONICAL_MIN_SCORE:
|
||||||
|
mbid = best.get("id")
|
||||||
|
log.info("MBID %s for %s — %s", mbid, artist, title)
|
||||||
|
return True, mbid
|
||||||
|
log.info("no confident MBID for %s — %s", artist, title)
|
||||||
|
return True, None # genuine miss: cache it
|
||||||
|
|
||||||
|
def _throttle(self) -> None:
|
||||||
|
with self._rate_lock:
|
||||||
|
wait = config.CANONICAL_RATE_INTERVAL - (time.time() - self._last_call)
|
||||||
|
if wait > 0:
|
||||||
|
time.sleep(wait)
|
||||||
|
self._last_call = time.time()
|
||||||
|
|
||||||
|
def close(self) -> None:
|
||||||
|
self._http.close()
|
||||||
|
|
||||||
|
|
||||||
|
# Lucene special characters we quote around; escape the ones that would break a
|
||||||
|
# quoted phrase.
|
||||||
|
def _escape(value: str) -> str:
|
||||||
|
return value.replace("\\", "\\\\").replace('"', '\\"')
|
||||||
|
|
@ -27,6 +27,23 @@ PREFETCH_INTERVAL = float(os.environ.get("RADIEO_PREFETCH_INTERVAL", "2.0"))
|
||||||
RETENTION_KEEP = int(os.environ.get("RADIEO_RETENTION_KEEP", "20"))
|
RETENTION_KEEP = int(os.environ.get("RADIEO_RETENTION_KEEP", "20"))
|
||||||
# Do not replay a track seen among the last N plays, when avoidable.
|
# Do not replay a track seen among the last N plays, when avoidable.
|
||||||
ANTIREPEAT_WINDOW = int(os.environ.get("RADIEO_ANTIREPEAT_WINDOW", "50"))
|
ANTIREPEAT_WINDOW = int(os.environ.get("RADIEO_ANTIREPEAT_WINDOW", "50"))
|
||||||
|
# How many draws the scheduler tries to dodge a recent repeat before giving up.
|
||||||
|
SCHEDULER_MAX_TRIES = int(os.environ.get("RADIEO_SCHEDULER_MAX_TRIES", "5"))
|
||||||
|
|
||||||
|
# --- Canonicalizer (MusicBrainz MBID lookup) ---
|
||||||
|
# Resolves (artist, title) -> recording MBID for source-agnostic de-dup.
|
||||||
|
CANONICAL_ENABLED = os.environ.get("RADIEO_CANONICAL_ENABLED", "1") != "0"
|
||||||
|
MUSICBRAINZ_URL = os.environ.get(
|
||||||
|
"RADIEO_MUSICBRAINZ_URL", "https://musicbrainz.org/ws/2/recording"
|
||||||
|
)
|
||||||
|
# Minimum MusicBrainz match score (0-100) to accept a recording as canonical.
|
||||||
|
CANONICAL_MIN_SCORE = int(os.environ.get("RADIEO_CANONICAL_MIN_SCORE", "90"))
|
||||||
|
# Minimum seconds between MusicBrainz requests (anonymous limit ~1 req/s).
|
||||||
|
CANONICAL_RATE_INTERVAL = float(os.environ.get("RADIEO_CANONICAL_RATE_INTERVAL", "1.1"))
|
||||||
|
# Sent to MusicBrainz (which requires a descriptive User-Agent) and yt-dlp/others.
|
||||||
|
USER_AGENT = os.environ.get(
|
||||||
|
"RADIEO_USER_AGENT", "radieo/0.1 (personal music radio)"
|
||||||
|
)
|
||||||
|
|
||||||
# --- Navidrome / OpenSubsonic source ---
|
# --- Navidrome / OpenSubsonic source ---
|
||||||
# Left empty means the provider is disabled (the stream then plays its own
|
# Left empty means the provider is disabled (the stream then plays its own
|
||||||
|
|
|
||||||
|
|
@ -1,13 +1,17 @@
|
||||||
"""SQLite state: play history (anti-repeat + stats) and cache-file retention.
|
"""SQLite state: play history, cache-file retention and MBID canonical cache.
|
||||||
|
|
||||||
Two concerns, two tables:
|
Three concerns, three tables:
|
||||||
|
|
||||||
- ``history`` is append-only. It drives anti-repeat (recently played track
|
- ``history`` is append-only. It drives anti-repeat (recently played canonical
|
||||||
keys) and survives cache eviction, so a track can stay "recently played" even
|
keys, and raw locators for the providers' cheap local filter) and survives
|
||||||
after its file is deleted.
|
cache eviction, so a track can stay "recently played" even after its file is
|
||||||
|
deleted.
|
||||||
- ``cache_files`` tracks downloaded files so we can keep only the N most
|
- ``cache_files`` tracks downloaded files so we can keep only the N most
|
||||||
recently *played* ones (LRU retention). Files not yet played are never
|
recently *played* ones (LRU retention). Files not yet played are never
|
||||||
evicted.
|
evicted.
|
||||||
|
- ``canonical_cache`` memoizes ``(artist, title) -> MBID`` lookups (a NULL mbid
|
||||||
|
means a confirmed no-match) so the Canonicalizer hits the network at most once
|
||||||
|
per distinct track.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import sqlite3
|
import sqlite3
|
||||||
|
|
@ -21,6 +25,7 @@ _SCHEMA = """
|
||||||
CREATE TABLE IF NOT EXISTS history (
|
CREATE TABLE IF NOT EXISTS history (
|
||||||
id INTEGER PRIMARY KEY,
|
id INTEGER PRIMARY KEY,
|
||||||
track_key TEXT NOT NULL,
|
track_key TEXT NOT NULL,
|
||||||
|
locator TEXT,
|
||||||
artist TEXT,
|
artist TEXT,
|
||||||
title TEXT,
|
title TEXT,
|
||||||
origin TEXT,
|
origin TEXT,
|
||||||
|
|
@ -33,6 +38,14 @@ CREATE TABLE IF NOT EXISTS cache_files (
|
||||||
track_key TEXT,
|
track_key TEXT,
|
||||||
played_at REAL -- NULL until the file has been played
|
played_at REAL -- NULL until the file has been played
|
||||||
);
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS canonical_cache (
|
||||||
|
artist_norm TEXT NOT NULL,
|
||||||
|
title_norm TEXT NOT NULL,
|
||||||
|
mbid TEXT, -- NULL means a confirmed no-match
|
||||||
|
resolved_at REAL NOT NULL,
|
||||||
|
PRIMARY KEY (artist_norm, title_norm)
|
||||||
|
);
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -45,6 +58,16 @@ class Database:
|
||||||
)
|
)
|
||||||
self._conn.row_factory = sqlite3.Row
|
self._conn.row_factory = sqlite3.Row
|
||||||
self._conn.executescript(_SCHEMA)
|
self._conn.executescript(_SCHEMA)
|
||||||
|
self._migrate()
|
||||||
|
|
||||||
|
def _migrate(self) -> None:
|
||||||
|
# Add columns introduced after a DB may have been created (milestone 5).
|
||||||
|
cols = {
|
||||||
|
r["name"]
|
||||||
|
for r in self._conn.execute("PRAGMA table_info(history)").fetchall()
|
||||||
|
}
|
||||||
|
if "locator" not in cols:
|
||||||
|
self._conn.execute("ALTER TABLE history ADD COLUMN locator TEXT")
|
||||||
|
|
||||||
# --- anti-repeat / history -------------------------------------------
|
# --- anti-repeat / history -------------------------------------------
|
||||||
|
|
||||||
|
|
@ -56,12 +79,56 @@ class Database:
|
||||||
).fetchall()
|
).fetchall()
|
||||||
return {r["track_key"] for r in rows}
|
return {r["track_key"] for r in rows}
|
||||||
|
|
||||||
|
def recent_locators(self, limit: int) -> set[str]:
|
||||||
|
"""Raw backend locators recently played (providers' cheap local filter)."""
|
||||||
|
with self._lock:
|
||||||
|
rows = self._conn.execute(
|
||||||
|
"SELECT locator FROM history WHERE locator IS NOT NULL"
|
||||||
|
" ORDER BY played_at DESC LIMIT ?",
|
||||||
|
(limit,),
|
||||||
|
).fetchall()
|
||||||
|
return {r["locator"] for r in rows}
|
||||||
|
|
||||||
def record_play(self, track: Track) -> None:
|
def record_play(self, track: Track) -> None:
|
||||||
with self._lock:
|
with self._lock:
|
||||||
self._conn.execute(
|
self._conn.execute(
|
||||||
"INSERT INTO history (track_key, artist, title, origin, played_at)"
|
"INSERT INTO history"
|
||||||
" VALUES (?, ?, ?, ?, ?)",
|
" (track_key, locator, artist, title, origin, played_at)"
|
||||||
(track.key, track.artist, track.title, track.origin, time.time()),
|
" VALUES (?, ?, ?, ?, ?, ?)",
|
||||||
|
(
|
||||||
|
track.key,
|
||||||
|
track.locator,
|
||||||
|
track.artist,
|
||||||
|
track.title,
|
||||||
|
track.origin,
|
||||||
|
time.time(),
|
||||||
|
),
|
||||||
|
)
|
||||||
|
|
||||||
|
# --- MBID canonical cache --------------------------------------------
|
||||||
|
|
||||||
|
def get_canonical(self, artist_norm: str, title_norm: str):
|
||||||
|
"""Return (cached: bool, mbid: str | None). ``cached`` False means the
|
||||||
|
pair was never looked up; a cached row with mbid None is a known miss."""
|
||||||
|
with self._lock:
|
||||||
|
row = self._conn.execute(
|
||||||
|
"SELECT mbid FROM canonical_cache"
|
||||||
|
" WHERE artist_norm = ? AND title_norm = ?",
|
||||||
|
(artist_norm, title_norm),
|
||||||
|
).fetchone()
|
||||||
|
if row is None:
|
||||||
|
return False, None
|
||||||
|
return True, row["mbid"]
|
||||||
|
|
||||||
|
def put_canonical(
|
||||||
|
self, artist_norm: str, title_norm: str, mbid: str | None
|
||||||
|
) -> None:
|
||||||
|
with self._lock:
|
||||||
|
self._conn.execute(
|
||||||
|
"INSERT OR REPLACE INTO canonical_cache"
|
||||||
|
" (artist_norm, title_norm, mbid, resolved_at)"
|
||||||
|
" VALUES (?, ?, ?, ?)",
|
||||||
|
(artist_norm, title_norm, mbid, time.time()),
|
||||||
)
|
)
|
||||||
|
|
||||||
# --- cache-file retention --------------------------------------------
|
# --- cache-file retention --------------------------------------------
|
||||||
|
|
|
||||||
|
|
@ -6,8 +6,18 @@ it into a local file; the queue and the state database use ``key`` for
|
||||||
de-duplication and anti-repeat.
|
de-duplication and anti-repeat.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
|
|
||||||
|
_WS = re.compile(r"\s+")
|
||||||
|
|
||||||
|
|
||||||
|
def norm_name(value: str) -> str:
|
||||||
|
"""Normalize an artist/title for stable keying and cache lookups:
|
||||||
|
case-fold and collapse whitespace. Kept deliberately light (no accent
|
||||||
|
stripping) so distinct titles never collapse together."""
|
||||||
|
return _WS.sub(" ", value.strip()).casefold()
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class Track:
|
class Track:
|
||||||
|
|
@ -22,14 +32,16 @@ class Track:
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def key(self) -> str:
|
def key(self) -> str:
|
||||||
"""Stable identity for de-duplication and anti-repeat.
|
"""Stable, source-agnostic identity for de-duplication and anti-repeat.
|
||||||
|
|
||||||
Until the Canonicalizer (milestone 5) fills ``mbid``, we key on the
|
The Canonicalizer fills ``mbid`` when it can, giving a truly
|
||||||
backend locator, which is unique within a source.
|
cross-source identity. When it can't, we fall back to the normalized
|
||||||
|
``(artist, title)`` — still source-agnostic, so the same track fetched
|
||||||
|
from two backends collapses to one key.
|
||||||
"""
|
"""
|
||||||
if self.mbid:
|
if self.mbid:
|
||||||
return f"mbid:{self.mbid}"
|
return f"mbid:{self.mbid}"
|
||||||
return f"{self.backend}:{self.locator}"
|
return f"name:{norm_name(self.artist)}|{norm_name(self.title)}"
|
||||||
|
|
||||||
def __str__(self) -> str:
|
def __str__(self) -> str:
|
||||||
return f"{self.artist} — {self.title} [{self.origin}]"
|
return f"{self.artist} — {self.title} [{self.origin}]"
|
||||||
|
|
|
||||||
|
|
@ -1,9 +1,10 @@
|
||||||
"""NavidromeProvider: picks tracks from an OpenSubsonic playlist.
|
"""NavidromeProvider: picks tracks from an OpenSubsonic playlist.
|
||||||
|
|
||||||
Emits ``subsonic`` tracks (locator = song id). The playlist is cached in
|
Emits ``subsonic`` tracks (locator = song id). The playlist is cached in
|
||||||
memory and refreshed periodically. Anti-repeat is applied by filtering out
|
memory and refreshed periodically. A cheap local anti-repeat filters out songs
|
||||||
tracks whose key is among the recently played ones; if that empties the pool
|
whose id was played recently; if that empties the pool (short playlist), the
|
||||||
(short playlist), the filter is dropped so playback never stalls.
|
filter is dropped so playback never stalls. The authoritative, source-agnostic
|
||||||
|
anti-repeat lives in the Scheduler (on the canonical key).
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
@ -53,9 +54,9 @@ class NavidromeProvider:
|
||||||
if not self._songs:
|
if not self._songs:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
|
recent = self._db.recent_locators(config.ANTIREPEAT_WINDOW)
|
||||||
candidates = [
|
candidates = [
|
||||||
s for s in self._songs if f"subsonic:{s['id']}" not in recent
|
s for s in self._songs if str(s["id"]) not in recent
|
||||||
] or self._songs
|
] or self._songs
|
||||||
song = random.choice(candidates)
|
song = random.choice(candidates)
|
||||||
return Track(
|
return Track(
|
||||||
|
|
|
||||||
|
|
@ -63,7 +63,7 @@ class YtdlpProvider:
|
||||||
self._load_urls()
|
self._load_urls()
|
||||||
if not self._urls:
|
if not self._urls:
|
||||||
return None
|
return None
|
||||||
recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
|
recent = self._db.recent_locators(config.ANTIREPEAT_WINDOW)
|
||||||
# Try source lines in random order until one yields a usable track.
|
# Try source lines in random order until one yields a usable track.
|
||||||
candidates = list(self._urls)
|
candidates = list(self._urls)
|
||||||
random.shuffle(candidates)
|
random.shuffle(candidates)
|
||||||
|
|
@ -81,7 +81,7 @@ class YtdlpProvider:
|
||||||
return None
|
return None
|
||||||
if not entries:
|
if not entries:
|
||||||
return None
|
return None
|
||||||
pool = [e for e in entries if f"ytdlp:{e['url']}" not in recent] or entries
|
pool = [e for e in entries if e["url"] not in recent] or entries
|
||||||
entry = random.choice(pool)
|
entry = random.choice(pool)
|
||||||
return Track(
|
return Track(
|
||||||
backend="ytdlp",
|
backend="ytdlp",
|
||||||
|
|
|
||||||
|
|
@ -1,10 +1,15 @@
|
||||||
"""Weighted scheduler mixing several providers.
|
"""Weighted scheduler mixing several providers, with canonical anti-repeat.
|
||||||
|
|
||||||
Picks a provider at random, weighted by ``SOURCE_WEIGHTS``, and asks it for a
|
Two responsibilities:
|
||||||
track. If the chosen provider has nothing right now (empty list, unreachable
|
|
||||||
source…), the remaining providers are tried in weighted-random order, so a
|
- **Source mix**: pick a provider at random, weighted by ``SOURCE_WEIGHTS``. If
|
||||||
temporarily-idle source never stalls playback. Returns ``None`` only when every
|
the chosen one has nothing right now, try the rest in weighted-random order,
|
||||||
provider is exhausted.
|
so a temporarily-idle source never stalls playback.
|
||||||
|
- **Authoritative anti-repeat**: canonicalize the picked track (fill its MBID,
|
||||||
|
cached/best-effort) and reject it if its canonical key was played recently,
|
||||||
|
retrying a bounded number of times. This is source-agnostic: the same track
|
||||||
|
coming from two backends collapses to one key. Providers still apply a cheap
|
||||||
|
locator-based filter first, which keeps the number of retries low.
|
||||||
|
|
||||||
The weights live in one place (``config.SOURCE_WEIGHTS``) so the mix can later
|
The weights live in one place (``config.SOURCE_WEIGHTS``) so the mix can later
|
||||||
move to a config file without touching this logic.
|
move to a config file without touching this logic.
|
||||||
|
|
@ -13,15 +18,37 @@ move to a config file without touching this logic.
|
||||||
import logging
|
import logging
|
||||||
import random
|
import random
|
||||||
|
|
||||||
|
from . import config
|
||||||
|
|
||||||
log = logging.getLogger("radieo.scheduler")
|
log = logging.getLogger("radieo.scheduler")
|
||||||
|
|
||||||
|
|
||||||
class Scheduler:
|
class Scheduler:
|
||||||
def __init__(self, entries):
|
def __init__(self, entries, canonicalizer, db):
|
||||||
# entries: list[(provider, weight)]; drop non-positive weights.
|
# entries: list[(provider, weight)]; drop non-positive weights.
|
||||||
self._entries = [(p, w) for p, w in entries if w > 0]
|
self._entries = [(p, w) for p, w in entries if w > 0]
|
||||||
|
self._canonicalizer = canonicalizer
|
||||||
|
self._db = db
|
||||||
|
|
||||||
def next(self):
|
def next(self):
|
||||||
|
if not self._entries:
|
||||||
|
return None
|
||||||
|
recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
|
||||||
|
last = None
|
||||||
|
for _ in range(config.SCHEDULER_MAX_TRIES):
|
||||||
|
track = self._pick()
|
||||||
|
if track is None:
|
||||||
|
return None # no source has anything right now
|
||||||
|
track = self._canonicalizer.canonicalize(track)
|
||||||
|
if track.key not in recent:
|
||||||
|
return track
|
||||||
|
last = track # recently played; try another
|
||||||
|
log.debug("skipping recent %s", track)
|
||||||
|
# Everything drawn was recent (e.g. tiny library): play the last anyway.
|
||||||
|
return last
|
||||||
|
|
||||||
|
def _pick(self):
|
||||||
|
"""Weighted provider draw, falling through to the others when empty."""
|
||||||
pool = list(self._entries)
|
pool = list(self._entries)
|
||||||
while pool:
|
while pool:
|
||||||
weights = [w for _, w in pool]
|
weights = [w for _, w in pool]
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue