Milestone 5: MusicBrainz MBID canonicalizer
Give tracks a source-agnostic identity so the same song from different sources no longer replays in a loop.

- Canonicalizer resolves (artist, title) to a MusicBrainz recording MBID (no API key; ~1 req/s, descriptive User-Agent, best-effort). Hits and confirmed misses are cached in SQLite; transient errors are not.
- Track.key becomes mbid:<id> when resolved, else a normalized name:<artist>|<title> fallback, which is still source-agnostic.
- Scheduler now owns the authoritative anti-repeat on the canonical key, canonicalizing the drawn track with a bounded retry; providers keep a cheap recent-locator filter to limit retries.
- db: canonical_cache table, history.locator column with migration for existing databases, recent_locators().
- Canonicalization can be turned off via RADIEO_CANONICAL_ENABLED=0.

Verified: MBID hit/cache/miss, cross-source key collapse, scheduler dodging a recent play, schema migration, and full stack (Navidrome + yt-dlp) with zero Python tracebacks and a valid 192 kbps MP3 stream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 8774f5c2a1
commit 7e0f08b863
11 changed files with 292 additions and 33 deletions
|
|
@ -19,6 +19,13 @@ RADIEO_NAVIDROME_PLAYLIST=Radio
|
|||
RADIEO_WEIGHT_NAVIDROME=3
|
||||
RADIEO_WEIGHT_YTDLP=1
|
||||
|
||||
+# --- MBID canonicalizer (optional) ---
+# Resolves (artist, title) -> MusicBrainz MBID to de-duplicate across sources.
+# No key required. Set to 0 to disable (key = (artist, title)).
+RADIEO_CANONICAL_ENABLED=1
+# User-Agent sent to MusicBrainz (which requires a descriptive one).
+RADIEO_USER_AGENT=radieo/0.1 (personal music radio)
+
 # --- Cache retention (optional) ---
 # Number of played tracks kept on disk before eviction (LRU).
 RADIEO_RETENTION_KEEP=20
|
||||
|
|
|
|||
16
README.md
|
|
@ -67,12 +67,18 @@ source); the file being absent also disables yt-dlp.
|
|||
|
||||
## Current status
|
||||
|
||||
-**Milestone 4 — yt-dlp provider: done.**
+**Milestone 5 — MBID canonicalizer: done.**
|
||||
|
||||
- Two playback sources feed a weighted scheduler: a Navidrome/OpenSubsonic
|
||||
playlist and a hand-maintained list of yt-dlp URLs (`config/urls.txt`).
|
||||
Container URLs (playlist/album/label/artist) are expanded and one track is
|
||||
-  drawn at random, honouring the anti-repeat window.
+  drawn at random.
+- Each track is canonicalized to a MusicBrainz recording MBID (no API key
+  needed; ~1 req/s, best-effort, results cached in SQLite). This gives a
+  source-agnostic identity, so the same song from two sources collapses to one;
+  when no confident match is found it falls back to a normalized
+  `(artist, title)` key. The scheduler uses this canonical key for anti-repeat,
+  with the providers applying a cheap locator filter first.
|
||||
- Each source has its own fetcher (Subsonic stream / yt-dlp download); files are
|
||||
cached ahead of playback (prefetch buffer) and decoded by Liquidsoap.
|
||||
- Play history and LRU retention are tracked in a SQLite database under
|
||||
|
|
@ -86,7 +92,9 @@ source); the file being absent also disables yt-dlp.
|
|||
- HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
|
||||
multiple simultaneous listeners supported.
|
||||
|
||||
-The ListenBrainz suggestion feed comes next.
+The ListenBrainz suggestion feed comes next. (Known cosmetic quirk: at startup
+the fallback logs a few harmless ffmpeg "Invalid data" warnings while probing
+non-audio files such as `.gitkeep`; to be quieted in the polish milestone.)
|
||||
|
||||
## Roadmap
|
||||
|
||||
|
|
@ -97,7 +105,7 @@ The ListenBrainz suggestion feed comes next.
|
|||
LRU retention and play history.
|
||||
4. ✅ **yt-dlp provider** — fetch tracks from a maintained URL/artist list;
|
||||
weighted mixing between sources.
|
||||
-5. **Canonicalizer** — ListenBrainz MBID lookup for source-agnostic
+5. ✅ **Canonicalizer** — MusicBrainz MBID lookup for source-agnostic
|
||||
de-duplication.
|
||||
6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
|
||||
one to Navidrome or yt-dlp.
|
||||
|
|
|
|||
|
|
@ -22,6 +22,9 @@ services:
|
|||
 # Mix ratio between the sources (0 disables a source).
 - RADIEO_WEIGHT_NAVIDROME=${RADIEO_WEIGHT_NAVIDROME:-3}
 - RADIEO_WEIGHT_YTDLP=${RADIEO_WEIGHT_YTDLP:-1}
+# MusicBrainz canonicalizer (cross-source MBID identity; no key needed).
+- RADIEO_CANONICAL_ENABLED=${RADIEO_CANONICAL_ENABLED:-1}
+- RADIEO_USER_AGENT=${RADIEO_USER_AGENT:-radieo/0.1 (personal music radio)}
|
||||
restart: unless-stopped
|
||||
|
||||
stream:
|
||||
|
|
|
|||
|
|
@ -11,8 +11,18 @@ from .scheduler import Scheduler
|
|||
 log = logging.getLogger("radieo")


+class _NullCanonicalizer:
+    """Used when canonicalization is disabled: leaves the Track untouched."""
+
+    def canonicalize(self, track):
+        return track
+
+    def close(self):
+        pass
+
+
 def _build_pipeline(db: Database):
-    """Return (scheduler, fetchers). Assembles whichever sources are enabled;
+    """Return (providers, fetchers). Assembles whichever sources are enabled;
     when none is, the scheduler yields nothing and the stream plays its local
     cache fallback."""
     providers = []  # list[(provider, weight)]
|
||||
|
|
@ -58,7 +68,7 @@ def _build_pipeline(db: Database):
|
|||
|
||||
     if not providers:
         log.warning("no source active: the stream plays its local cache only.")
-    return Scheduler(providers), fetchers
+    return providers, fetchers


 def _sweep_temp_files() -> None:
|
||||
|
|
@ -94,7 +104,19 @@ def main() -> None:
|
|||
     _sweep_temp_files()

     db = Database(config.STATE_DB)
-    scheduler, fetchers = _build_pipeline(db)
+
+    if config.CANONICAL_ENABLED:
+        from .canonicalizer import Canonicalizer
+
+        canonicalizer = Canonicalizer(db)
+        log.info("Canonicalizer enabled (MusicBrainz, min_score=%d)",
+                 config.CANONICAL_MIN_SCORE)
+    else:
+        canonicalizer = _NullCanonicalizer()
+        log.info("Canonicalizer disabled: tracks keyed by (artist, title).")
+
+    providers, fetchers = _build_pipeline(db)
+    scheduler = Scheduler(providers, canonicalizer, db)
     queue = TrackQueue(scheduler, fetchers, db)
     queue.start()
|
||||
|
||||
|
|
@ -113,6 +135,7 @@ def main() -> None:
|
|||
     finally:
         queue.stop()
         server.server_close()
+        canonicalizer.close()
         db.close()
|
||||
|
||||
|
||||
|
|
|
|||
94
ingest/radieo/canonicalizer.py
Normal file
|
|
@ -0,0 +1,94 @@
|
|||
+"""Canonicalizer: resolve a Track to a MusicBrainz recording MBID.
+
+Given ``(artist, title)`` it queries the MusicBrainz recording search and keeps
+the best match above a score threshold. Results — hits *and* confirmed misses —
+are cached in SQLite so the network is hit at most once per distinct track.
+Genuine no-matches are cached; transient network errors are not, so they get
+retried later.
+
+MusicBrainz asks anonymous clients to stay under ~1 request/second and to send a
+descriptive User-Agent; both are honoured here. The whole thing is best-effort:
+any failure just leaves ``mbid`` unset and the Track falls back to its
+name-based key.
+"""
+
+import logging
+import threading
+import time
+from dataclasses import replace
+
+import httpx
+
+from . import config
+from .models import Track, norm_name
+
+log = logging.getLogger("radieo.canonicalizer")
+
+
+class Canonicalizer:
+    def __init__(self, db):
+        self._db = db
+        self._http = httpx.Client(
+            timeout=httpx.Timeout(connect=10.0, read=30.0, write=10.0, pool=10.0),
+            headers={"User-Agent": config.USER_AGENT},
+            follow_redirects=True,
+        )
+        self._rate_lock = threading.Lock()
+        self._last_call = 0.0
+
+    def canonicalize(self, track: Track) -> Track:
+        """Return the Track with ``mbid`` filled when resolvable, else unchanged."""
+        artist_norm = norm_name(track.artist)
+        title_norm = norm_name(track.title)
+        if not artist_norm or not title_norm:
+            return track
+
+        cached, mbid = self._db.get_canonical(artist_norm, title_norm)
+        if not cached:
+            ok, mbid = self._lookup(track.artist, track.title)
+            if ok:  # cache hits and genuine misses; skip transient errors
+                self._db.put_canonical(artist_norm, title_norm, mbid)
+        return replace(track, mbid=mbid) if mbid else track
+
+    # --- MusicBrainz ------------------------------------------------------
+
+    def _lookup(self, artist: str, title: str) -> tuple[bool, str | None]:
+        """Return (ok, mbid). ``ok`` False on a transient error (do not cache)."""
+        query = f'artist:"{_escape(artist)}" AND recording:"{_escape(title)}"'
+        self._throttle()
+        try:
+            resp = self._http.get(
+                config.MUSICBRAINZ_URL,
+                params={"query": query, "fmt": "json", "limit": 3},
+            )
+            resp.raise_for_status()
+            data = resp.json()
+        except (httpx.HTTPError, ValueError) as exc:
+            log.warning("MusicBrainz lookup failed for %s — %s: %s", artist, title, exc)
+            return False, None
+
+        recordings = data.get("recordings") or []
+        if recordings:
+            best = recordings[0]
+            if int(best.get("score", 0)) >= config.CANONICAL_MIN_SCORE:
+                mbid = best.get("id")
+                log.info("MBID %s for %s — %s", mbid, artist, title)
+                return True, mbid
+        log.info("no confident MBID for %s — %s", artist, title)
+        return True, None  # genuine miss: cache it
+
+    def _throttle(self) -> None:
+        with self._rate_lock:
+            wait = config.CANONICAL_RATE_INTERVAL - (time.time() - self._last_call)
+            if wait > 0:
+                time.sleep(wait)
+            self._last_call = time.time()
+
+    def close(self) -> None:
+        self._http.close()
+
+
+# Lucene special characters we quote around; escape the ones that would break a
+# quoted phrase.
+def _escape(value: str) -> str:
+    return value.replace("\\", "\\\\").replace('"', '\\"')
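The caching rule in `canonicalize` (store hits and confirmed misses, never transient failures) can be illustrated in isolation. The following is a minimal sketch, not the project's code: `CachedResolver`, `make_flaky_lookup` and the MBID string are hypothetical stand-ins, and a plain dict replaces the SQLite `canonical_cache` table.

```python
from typing import Callable, Dict, Optional, Tuple

# A lookup returns (ok, mbid): ok=False signals a transient error.
Lookup = Callable[[str, str], Tuple[bool, Optional[str]]]


class CachedResolver:
    """Hypothetical stand-in: a dict plays the role of the SQLite cache."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._cache: Dict[Tuple[str, str], Optional[str]] = {}
        self.calls = 0  # number of (simulated) network lookups

    def resolve(self, artist: str, title: str) -> Optional[str]:
        pair = (artist, title)
        if pair in self._cache:  # a cached None is a known miss
            return self._cache[pair]
        self.calls += 1
        ok, mbid = self._lookup(artist, title)
        if ok:  # cache hits and genuine misses only
            self._cache[pair] = mbid
        return mbid


def make_flaky_lookup() -> Lookup:
    """First call fails transiently, later calls return a made-up MBID."""
    state = {"first": True}

    def lookup(artist: str, title: str) -> Tuple[bool, Optional[str]]:
        if state["first"]:
            state["first"] = False
            return False, None
        return True, "hypothetical-mbid-0001"

    return lookup


flaky = CachedResolver(make_flaky_lookup())
first = flaky.resolve("artist", "title")   # transient error: not cached
second = flaky.resolve("artist", "title")  # retried, resolved, cached
third = flaky.resolve("artist", "title")   # served from the cache

misses = CachedResolver(lambda artist, title: (True, None))
misses.resolve("unknown", "song")
misses.resolve("unknown", "song")          # known miss: no second lookup
```

Caching the miss is what keeps obscure tracks from costing one request per draw, while skipping transient errors lets a later draw retry once the network recovers.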
|
||||
|
|
@ -27,6 +27,23 @@ PREFETCH_INTERVAL = float(os.environ.get("RADIEO_PREFETCH_INTERVAL", "2.0"))
|
|||
RETENTION_KEEP = int(os.environ.get("RADIEO_RETENTION_KEEP", "20"))
|
||||
# Do not replay a track seen among the last N plays, when avoidable.
|
||||
ANTIREPEAT_WINDOW = int(os.environ.get("RADIEO_ANTIREPEAT_WINDOW", "50"))
|
||||
# How many draws the scheduler tries to dodge a recent repeat before giving up.
|
||||
SCHEDULER_MAX_TRIES = int(os.environ.get("RADIEO_SCHEDULER_MAX_TRIES", "5"))
|
||||
|
||||
# --- Canonicalizer (MusicBrainz MBID lookup) ---
|
||||
# Resolves (artist, title) -> recording MBID for source-agnostic de-dup.
|
||||
CANONICAL_ENABLED = os.environ.get("RADIEO_CANONICAL_ENABLED", "1") != "0"
|
||||
MUSICBRAINZ_URL = os.environ.get(
|
||||
"RADIEO_MUSICBRAINZ_URL", "https://musicbrainz.org/ws/2/recording"
|
||||
)
|
||||
# Minimum MusicBrainz match score (0-100) to accept a recording as canonical.
|
||||
CANONICAL_MIN_SCORE = int(os.environ.get("RADIEO_CANONICAL_MIN_SCORE", "90"))
|
||||
# Minimum seconds between MusicBrainz requests (anonymous limit ~1 req/s).
|
||||
CANONICAL_RATE_INTERVAL = float(os.environ.get("RADIEO_CANONICAL_RATE_INTERVAL", "1.1"))
|
||||
# Sent to MusicBrainz (which requires a descriptive User-Agent) and yt-dlp/others.
|
||||
USER_AGENT = os.environ.get(
|
||||
"RADIEO_USER_AGENT", "radieo/0.1 (personal music radio)"
|
||||
)
|
||||
|
||||
# --- Navidrome / OpenSubsonic source ---
|
||||
# Left empty means the provider is disabled (the stream then plays its own
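`CANONICAL_RATE_INTERVAL` in the configuration above sets the minimum spacing between MusicBrainz requests. As a rough sketch of such a spacing gate (an illustration under stated assumptions, not the commit's `_throttle`): `MinIntervalGate` and `FakeTime` are hypothetical names, the clock and sleep functions are injectable so the behaviour can be checked without waiting, and `time.monotonic` is used as the default clock because it is unaffected by wall-clock adjustments.

```python
import threading
import time


class MinIntervalGate:
    """Hypothetical helper: callers of wait() are spaced by >= interval seconds."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        # Holding the lock while sleeping serializes concurrent callers.
        with self._lock:
            now = self._clock()
            if self._last is not None:
                remaining = self._interval - (now - self._last)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last = now


class FakeTime:
    """Deterministic clock for the demo: sleeping just advances the clock."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


fake = FakeTime()
gate = MinIntervalGate(1.0, clock=fake.clock, sleep=fake.sleep)
gate.wait()        # first call passes immediately
fake.now += 0.25
gate.wait()        # too soon: sleeps the remaining 0.75 s
fake.now += 5.0
gate.wait()        # long after the interval: no sleep
```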
|
||||
|
|
|
|||
|
|
@ -1,13 +1,17 @@
|
|||
-"""SQLite state: play history (anti-repeat + stats) and cache-file retention.
+"""SQLite state: play history, cache-file retention and MBID canonical cache.

-Two concerns, two tables:
+Three concerns, three tables:

-- ``history`` is append-only. It drives anti-repeat (recently played track
-  keys) and survives cache eviction, so a track can stay "recently played" even
-  after its file is deleted.
+- ``history`` is append-only. It drives anti-repeat (recently played canonical
+  keys, and raw locators for the providers' cheap local filter) and survives
+  cache eviction, so a track can stay "recently played" even after its file is
+  deleted.
 - ``cache_files`` tracks downloaded files so we can keep only the N most
   recently *played* ones (LRU retention). Files not yet played are never
   evicted.
+- ``canonical_cache`` memoizes ``(artist, title) -> MBID`` lookups (a NULL mbid
+  means a confirmed no-match) so the Canonicalizer hits the network at most once
+  per distinct track.
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
|
|
@ -21,6 +25,7 @@ _SCHEMA = """
|
|||
CREATE TABLE IF NOT EXISTS history (
|
||||
id INTEGER PRIMARY KEY,
|
||||
track_key TEXT NOT NULL,
|
||||
locator TEXT,
|
||||
artist TEXT,
|
||||
title TEXT,
|
||||
origin TEXT,
|
||||
|
|
@ -33,6 +38,14 @@ CREATE TABLE IF NOT EXISTS cache_files (
|
|||
track_key TEXT,
|
||||
played_at REAL -- NULL until the file has been played
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS canonical_cache (
|
||||
artist_norm TEXT NOT NULL,
|
||||
title_norm TEXT NOT NULL,
|
||||
mbid TEXT, -- NULL means a confirmed no-match
|
||||
resolved_at REAL NOT NULL,
|
||||
PRIMARY KEY (artist_norm, title_norm)
|
||||
);
|
||||
"""
|
||||
|
||||
|
||||
|
|
@ -45,6 +58,16 @@ class Database:
|
|||
         )
         self._conn.row_factory = sqlite3.Row
         self._conn.executescript(_SCHEMA)
+        self._migrate()
+
+    def _migrate(self) -> None:
+        # Add columns introduced after a DB may have been created (milestone 5).
+        cols = {
+            r["name"]
+            for r in self._conn.execute("PRAGMA table_info(history)").fetchall()
+        }
+        if "locator" not in cols:
+            self._conn.execute("ALTER TABLE history ADD COLUMN locator TEXT")

     # --- anti-repeat / history -------------------------------------------
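The `_migrate` step above is idempotent: it inspects the existing columns and only alters the table when `locator` is missing. A minimal sketch of the same pattern against an in-memory database follows; `ensure_column` is a hypothetical helper name, and the trimmed-down `history` table only stands in for the real schema.

```python
import sqlite3


def ensure_column(conn, table, column, decl):
    """Add `column` to `table` if absent; return True when it was added.

    Table and column names are interpolated into SQL, so they must come
    from trusted code, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in cols:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
# A pre-milestone-5 style table, without the locator column.
conn.execute(
    "CREATE TABLE history ("
    " id INTEGER PRIMARY KEY, track_key TEXT NOT NULL, played_at REAL)"
)
conn.execute(
    "INSERT INTO history (track_key, played_at) VALUES ('subsonic:42', 1.0)"
)

added_first = ensure_column(conn, "history", "locator", "TEXT")
added_again = ensure_column(conn, "history", "locator", "TEXT")  # no-op
rows = conn.execute("SELECT track_key, locator FROM history").fetchall()
conn.close()
```

Existing rows get a NULL `locator`, which is why `recent_locators()` filters on `locator IS NOT NULL`.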
|
||||
|
||||
|
|
@ -56,12 +79,56 @@ class Database:
|
|||
             ).fetchall()
         return {r["track_key"] for r in rows}

+    def recent_locators(self, limit: int) -> set[str]:
+        """Raw backend locators recently played (providers' cheap local filter)."""
+        with self._lock:
+            rows = self._conn.execute(
+                "SELECT locator FROM history WHERE locator IS NOT NULL"
+                " ORDER BY played_at DESC LIMIT ?",
+                (limit,),
+            ).fetchall()
+        return {r["locator"] for r in rows}
+
     def record_play(self, track: Track) -> None:
         with self._lock:
             self._conn.execute(
-                "INSERT INTO history (track_key, artist, title, origin, played_at)"
-                " VALUES (?, ?, ?, ?, ?)",
-                (track.key, track.artist, track.title, track.origin, time.time()),
+                "INSERT INTO history"
+                " (track_key, locator, artist, title, origin, played_at)"
+                " VALUES (?, ?, ?, ?, ?, ?)",
+                (
+                    track.key,
+                    track.locator,
+                    track.artist,
+                    track.title,
+                    track.origin,
+                    time.time(),
+                ),
             )

+    # --- MBID canonical cache --------------------------------------------
+
+    def get_canonical(self, artist_norm: str, title_norm: str):
+        """Return (cached: bool, mbid: str | None). ``cached`` False means the
+        pair was never looked up; a cached row with mbid None is a known miss."""
+        with self._lock:
+            row = self._conn.execute(
+                "SELECT mbid FROM canonical_cache"
+                " WHERE artist_norm = ? AND title_norm = ?",
+                (artist_norm, title_norm),
+            ).fetchone()
+        if row is None:
+            return False, None
+        return True, row["mbid"]
+
+    def put_canonical(
+        self, artist_norm: str, title_norm: str, mbid: str | None
+    ) -> None:
+        with self._lock:
+            self._conn.execute(
+                "INSERT OR REPLACE INTO canonical_cache"
+                " (artist_norm, title_norm, mbid, resolved_at)"
+                " VALUES (?, ?, ?, ?)",
+                (artist_norm, title_norm, mbid, time.time()),
+            )
+
     # --- cache-file retention --------------------------------------------
|
||||
|
|
|
|||
|
|
@ -6,8 +6,18 @@ it into a local file; the queue and the state database use ``key`` for
|
|||
de-duplication and anti-repeat.
|
||||
"""
|
||||
|
||||
+import re
 from dataclasses import dataclass

+_WS = re.compile(r"\s+")
+
+
+def norm_name(value: str) -> str:
+    """Normalize an artist/title for stable keying and cache lookups:
+    case-fold and collapse whitespace. Kept deliberately light (no accent
+    stripping) so distinct titles never collapse together."""
+    return _WS.sub(" ", value.strip()).casefold()
+
+
 @dataclass(frozen=True)
 class Track:
|
||||
|
|
@ -22,14 +32,16 @@ class Track:
|
|||
|
||||
     @property
     def key(self) -> str:
-        """Stable identity for de-duplication and anti-repeat.
+        """Stable, source-agnostic identity for de-duplication and anti-repeat.

-        Until the Canonicalizer (milestone 5) fills ``mbid``, we key on the
-        backend locator, which is unique within a source.
+        The Canonicalizer fills ``mbid`` when it can, giving a truly
+        cross-source identity. When it can't, we fall back to the normalized
+        ``(artist, title)`` — still source-agnostic, so the same track fetched
+        from two backends collapses to one key.
         """
         if self.mbid:
             return f"mbid:{self.mbid}"
-        return f"{self.backend}:{self.locator}"
+        return f"name:{norm_name(self.artist)}|{norm_name(self.title)}"

     def __str__(self) -> str:
         return f"{self.artist} — {self.title} [{self.origin}]"
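To make the key change concrete, here is a small self-contained sketch of how the new `key` collapses the same song from two backends. The `Track` below is a trimmed-down stand-in (the real dataclass has more fields than the diff shows), and the locators and MBID string are made-up values.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

_WS = re.compile(r"\s+")


def norm_name(value: str) -> str:
    # Same normalization as in the diff: trim, collapse whitespace, case-fold.
    return _WS.sub(" ", value.strip()).casefold()


@dataclass(frozen=True)
class Track:
    """Reduced stand-in for the project's Track (assumed field subset)."""

    backend: str
    locator: str
    artist: str
    title: str
    mbid: Optional[str] = None

    @property
    def key(self) -> str:
        if self.mbid:
            return f"mbid:{self.mbid}"
        return f"name:{norm_name(self.artist)}|{norm_name(self.title)}"


# The same song drawn from two sources, with cosmetic differences.
from_navidrome = Track("subsonic", "42", "Daft  Punk", "Around the World")
from_ytdlp = Track(
    "ytdlp", "https://example.invalid/track", "daft punk", " around the world "
)

# Once an MBID is known (hypothetical value), it takes precedence.
resolved = replace(from_ytdlp, mbid="hypothetical-mbid-0001")
```

Under the previous `backend:locator` key these two tracks would have been `subsonic:42` and `ytdlp:https://...`, so the anti-repeat window could not see them as the same song.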
|
||||
|
|
|
|||
|
|
@ -1,9 +1,10 @@
|
|||
"""NavidromeProvider: picks tracks from an OpenSubsonic playlist.
|
||||
|
||||
Emits ``subsonic`` tracks (locator = song id). The playlist is cached in
|
||||
-memory and refreshed periodically. Anti-repeat is applied by filtering out
-tracks whose key is among the recently played ones; if that empties the pool
-(short playlist), the filter is dropped so playback never stalls.
+memory and refreshed periodically. A cheap local anti-repeat filters out songs
+whose id was played recently; if that empties the pool (short playlist), the
+filter is dropped so playback never stalls. The authoritative, source-agnostic
+anti-repeat lives in the Scheduler (on the canonical key).
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
|
@ -53,9 +54,9 @@ class NavidromeProvider:
|
|||
if not self._songs:
|
||||
return None
|
||||
|
||||
-recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
+recent = self._db.recent_locators(config.ANTIREPEAT_WINDOW)
 candidates = [
-s for s in self._songs if f"subsonic:{s['id']}" not in recent
+s for s in self._songs if str(s["id"]) not in recent
|
||||
] or self._songs
|
||||
song = random.choice(candidates)
|
||||
return Track(
|
||||
|
|
|
|||
|
|
@ -63,7 +63,7 @@ class YtdlpProvider:
|
|||
self._load_urls()
|
||||
if not self._urls:
|
||||
return None
|
||||
-recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
+recent = self._db.recent_locators(config.ANTIREPEAT_WINDOW)
|
||||
# Try source lines in random order until one yields a usable track.
|
||||
candidates = list(self._urls)
|
||||
random.shuffle(candidates)
|
||||
|
|
@ -81,7 +81,7 @@ class YtdlpProvider:
|
|||
return None
|
||||
if not entries:
|
||||
return None
|
||||
-pool = [e for e in entries if f"ytdlp:{e['url']}" not in recent] or entries
+pool = [e for e in entries if e["url"] not in recent] or entries
|
||||
entry = random.choice(pool)
|
||||
return Track(
|
||||
backend="ytdlp",
|
||||
|
|
|
|||
|
|
@ -1,10 +1,15 @@
|
|||
-"""Weighted scheduler mixing several providers.
+"""Weighted scheduler mixing several providers, with canonical anti-repeat.

-Picks a provider at random, weighted by ``SOURCE_WEIGHTS``, and asks it for a
-track. If the chosen provider has nothing right now (empty list, unreachable
-source…), the remaining providers are tried in weighted-random order, so a
-temporarily-idle source never stalls playback. Returns ``None`` only when every
-provider is exhausted.
+Two responsibilities:
+
+- **Source mix**: pick a provider at random, weighted by ``SOURCE_WEIGHTS``. If
+  the chosen one has nothing right now, try the rest in weighted-random order,
+  so a temporarily-idle source never stalls playback.
+- **Authoritative anti-repeat**: canonicalize the picked track (fill its MBID,
+  cached/best-effort) and reject it if its canonical key was played recently,
+  retrying a bounded number of times. This is source-agnostic: the same track
+  coming from two backends collapses to one key. Providers still apply a cheap
+  locator-based filter first, which keeps the number of retries low.
|
||||
|
||||
The weights live in one place (``config.SOURCE_WEIGHTS``) so the mix can later
|
||||
move to a config file without touching this logic.
|
||||
|
|
@ -13,15 +18,37 @@ move to a config file without touching this logic.
|
|||
 import logging
 import random

+from . import config
+
 log = logging.getLogger("radieo.scheduler")


 class Scheduler:
-    def __init__(self, entries):
+    def __init__(self, entries, canonicalizer, db):
         # entries: list[(provider, weight)]; drop non-positive weights.
         self._entries = [(p, w) for p, w in entries if w > 0]
+        self._canonicalizer = canonicalizer
+        self._db = db

     def next(self):
+        if not self._entries:
+            return None
+        recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
+        last = None
+        for _ in range(config.SCHEDULER_MAX_TRIES):
+            track = self._pick()
+            if track is None:
+                return None  # no source has anything right now
+            track = self._canonicalizer.canonicalize(track)
+            if track.key not in recent:
+                return track
+            last = track  # recently played; try another
+            log.debug("skipping recent %s", track)
+        # Everything drawn was recent (e.g. tiny library): play the last anyway.
+        return last
+
+    def _pick(self):
+        """Weighted provider draw, falling through to the others when empty."""
         pool = list(self._entries)
         while pool:
             weights = [w for _, w in pool]
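The bounded-retry logic of `Scheduler.next` can be isolated from providers and the database. The following is a minimal sketch, assuming a hypothetical `pick_non_recent` helper that works on plain key strings instead of `Track` objects; `draw` stands in for `_pick` plus canonicalization.

```python
def pick_non_recent(draw, recent, max_tries=5):
    """Draw up to max_tries keys and return the first one not in `recent`.

    Returns None as soon as a draw yields nothing, and falls back to the
    last drawn key when every attempt was recent (e.g. a tiny library).
    """
    last = None
    for _ in range(max_tries):
        key = draw()
        if key is None:
            return None  # no source has anything right now
        if key not in recent:
            return key
        last = key  # recently played; try another
    return last


# Two recent keys are skipped, the third draw is accepted.
draws = iter(["mbid:a", "mbid:b", "name:x|y"])
accepted = pick_non_recent(lambda: next(draws), recent={"mbid:a", "mbid:b"})

# Every draw is recent: the last one is played anyway so playback never stalls.
forced = pick_non_recent(lambda: "mbid:a", recent={"mbid:a"}, max_tries=3)

# Nothing available at all.
nothing = pick_non_recent(lambda: None, recent=set())
```

Bounding the loop matters because each rejected draw may cost a MusicBrainz lookup; the providers' locator filter keeps most draws from being rejected in the first place.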
|
||||
|
|
|
|||