Milestone 4: yt-dlp provider and weighted source scheduler
Add a second playback source and a weighted scheduler mixing it with Navidrome: - Scheduler picks a provider by SOURCE_WEIGHTS, falling through to the others when one has nothing ready, so no source can stall playback. - YtdlpProvider reads a hand-maintained config/urls.txt; container URLs (playlist/album/label/artist) are flat-extracted and one entry is drawn at random, honouring the anti-repeat window. Adds Track.source_url. - YtdlpFetcher downloads bestaudio via the yt-dlp library, reusing the atomic hidden-temp-then-rename pattern; Liquidsoap decodes the result. - Queue now dispatches to a fetcher registry keyed by backend. - Sweep orphaned download temp files on daemon startup (leftovers from a killed container otherwise pile up and trip the stream fallback). Verified end-to-end: yt-dlp opus decoded and served as 192 kbps MP3, and the 3:1 default mix observed in play history. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
8c27498632
commit
d1db6a11d8
13 changed files with 418 additions and 46 deletions
|
|
@ -10,6 +10,15 @@ RADIEO_NAVIDROME_PASSWORD=monmotdepasse
|
||||||
# Nom OU identifiant de la playlist à diffuser.
|
# Nom OU identifiant de la playlist à diffuser.
|
||||||
RADIEO_NAVIDROME_PLAYLIST=Radio
|
RADIEO_NAVIDROME_PLAYLIST=Radio
|
||||||
|
|
||||||
|
# --- Source yt-dlp ---
|
||||||
|
# La liste d'URL se met dans config/urls.txt (copier config/urls.txt.example).
|
||||||
|
# Rien à mettre ici ; le fichier absent désactive simplement la source.
|
||||||
|
|
||||||
|
# --- Dosage du mix entre sources (optionnel) ---
|
||||||
|
# Poids relatifs de tirage de chaque source (0 désactive la source).
|
||||||
|
RADIEO_WEIGHT_NAVIDROME=3
|
||||||
|
RADIEO_WEIGHT_YTDLP=1
|
||||||
|
|
||||||
# --- Rétention du cache (optionnel) ---
|
# --- Rétention du cache (optionnel) ---
|
||||||
# Nombre de morceaux joués conservés sur disque avant éviction (LRU).
|
# Nombre de morceaux joués conservés sur disque avant éviction (LRU).
|
||||||
RADIEO_RETENTION_KEEP=20
|
RADIEO_RETENTION_KEEP=20
|
||||||
|
|
|
||||||
3
.gitignore
vendored
3
.gitignore
vendored
|
|
@ -5,6 +5,9 @@
|
||||||
# Secrets locaux
|
# Secrets locaux
|
||||||
.env
|
.env
|
||||||
|
|
||||||
|
# Liste de sources yt-dlp personnelle (garder seulement l'exemple)
|
||||||
|
/config/urls.txt
|
||||||
|
|
||||||
# État de l'ingestion (base SQLite persistante)
|
# État de l'ingestion (base SQLite persistante)
|
||||||
/state/
|
/state/
|
||||||
*.db
|
*.db
|
||||||
|
|
|
||||||
27
README.md
27
README.md
|
|
@ -58,16 +58,27 @@ cp .env.example .env
|
||||||
If the Navidrome variables are left empty, the source is simply disabled and
|
If the Navidrome variables are left empty, the source is simply disabled and
|
||||||
the stream plays whatever is already in `cache/` (the milestone-1/2 behaviour).
|
the stream plays whatever is already in `cache/` (the milestone-1/2 behaviour).
|
||||||
|
|
||||||
|
For the yt-dlp source, list the URLs to draw from in `config/urls.txt` (copy
|
||||||
|
`config/urls.txt.example`). Each line is either a direct track URL or a
|
||||||
|
container URL (playlist, album, label, artist page) from which one track is
|
||||||
|
picked at random. The relative mix between sources is set by
|
||||||
|
`RADIEO_WEIGHT_NAVIDROME` / `RADIEO_WEIGHT_YTDLP` (a weight of 0 disables a
|
||||||
|
source); the file being absent also disables yt-dlp.
|
||||||
|
|
||||||
## Current status
|
## Current status
|
||||||
|
|
||||||
**Milestone 3 — Navidrome provider: done.**
|
**Milestone 4 — yt-dlp provider: done.**
|
||||||
|
|
||||||
- `ingest` pulls tracks from an OpenSubsonic playlist (Navidrome), downloading
|
- Two playback sources feed a weighted scheduler: a Navidrome/OpenSubsonic
|
||||||
them into the shared cache ahead of playback (prefetch buffer).
|
playlist and a hand-maintained list of yt-dlp URLs (`config/urls.txt`).
|
||||||
|
Container URLs (playlist/album/label/artist) are expanded and one track is
|
||||||
|
drawn at random, honouring the anti-repeat window.
|
||||||
|
- Each source has its own fetcher (Subsonic stream / yt-dlp download); files are
|
||||||
|
cached ahead of playback (prefetch buffer) and decoded by Liquidsoap.
|
||||||
- Play history and LRU retention are tracked in a SQLite database under
|
- Play history and LRU retention are tracked in a SQLite database under
|
||||||
`state/`: only the N most recently played files are kept on disk
|
`state/`: only the N most recently played files are kept on disk
|
||||||
(`RADIEO_RETENTION_KEEP`, default 20); anti-repeat avoids replaying a track
|
(`RADIEO_RETENTION_KEEP`, default 20). Orphaned download temp files are swept
|
||||||
seen among the last plays.
|
on startup.
|
||||||
- `GET /next` returns the next track as an annotated Liquidsoap URI with real
|
- `GET /next` returns the next track as an annotated Liquidsoap URI with real
|
||||||
title/artist metadata (or an empty body when nothing is ready).
|
title/artist metadata (or an empty body when nothing is ready).
|
||||||
- `stream` (Liquidsoap v2.4.5) pulls via `request.dynamic` and falls back to the
|
- `stream` (Liquidsoap v2.4.5) pulls via `request.dynamic` and falls back to the
|
||||||
|
|
@ -75,7 +86,7 @@ the stream plays whatever is already in `cache/` (the milestone-1/2 behaviour).
|
||||||
- HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
|
- HTTP stream served at `http://localhost:8000/radio.mp3` (MP3, 192 kbps),
|
||||||
multiple simultaneous listeners supported.
|
multiple simultaneous listeners supported.
|
||||||
|
|
||||||
The yt-dlp and ListenBrainz sources come next.
|
The ListenBrainz suggestion feed comes next.
|
||||||
|
|
||||||
## Roadmap
|
## Roadmap
|
||||||
|
|
||||||
|
|
@ -84,8 +95,8 @@ The yt-dlp and ListenBrainz sources come next.
|
||||||
switches to a `request.dynamic` source with the cache as fallback.
|
switches to a `request.dynamic` source with the cache as fallback.
|
||||||
3. ✅ **Navidrome provider** — play from an OpenSubsonic playlist, with caching,
|
3. ✅ **Navidrome provider** — play from an OpenSubsonic playlist, with caching,
|
||||||
LRU retention and play history.
|
LRU retention and play history.
|
||||||
4. **yt-dlp provider** — fetch tracks from a maintained URL/artist list; weighted
|
4. ✅ **yt-dlp provider** — fetch tracks from a maintained URL/artist list;
|
||||||
mixing between sources.
|
weighted mixing between sources.
|
||||||
5. **Canonicalizer** — ListenBrainz MBID lookup for source-agnostic
|
5. **Canonicalizer** — ListenBrainz MBID lookup for source-agnostic
|
||||||
de-duplication.
|
de-duplication.
|
||||||
6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
|
6. **ListenBrainz provider** — parse the RSS suggestions feed and resolve each
|
||||||
|
|
|
||||||
14
config/urls.txt.example
Normal file
14
config/urls.txt.example
Normal file
|
|
@ -0,0 +1,14 @@
|
||||||
|
# radieo — liste de sources yt-dlp. Copier en `config/urls.txt`.
|
||||||
|
# Une URL par ligne. Les lignes vides et celles commençant par '#' sont ignorées.
|
||||||
|
#
|
||||||
|
# Chaque URL peut être :
|
||||||
|
# - un morceau précis (téléchargé tel quel)
|
||||||
|
# - une playlist / album / label / page d'artiste
|
||||||
|
# -> radieo y pioche un morceau au hasard à chaque tour,
|
||||||
|
# en évitant ceux joués récemment.
|
||||||
|
#
|
||||||
|
# Exemples (à remplacer par les tiens) :
|
||||||
|
# https://www.youtube.com/watch?v=dQw4w9WgXcQ
|
||||||
|
# https://soundcloud.com/artiste/un-morceau
|
||||||
|
# https://artiste.bandcamp.com/album/un-album
|
||||||
|
# https://www.youtube.com/playlist?list=PLxxxxxxxx
|
||||||
|
|
@ -5,6 +5,7 @@ services:
|
||||||
volumes:
|
volumes:
|
||||||
- ./cache:/cache # volume partagé avec le stream (rw : téléchargements)
|
- ./cache:/cache # volume partagé avec le stream (rw : téléchargements)
|
||||||
- ./state:/state # état persistant (SQLite) hors du cache éphémère
|
- ./state:/state # état persistant (SQLite) hors du cache éphémère
|
||||||
|
- ./config:/config:ro # config éditable sans rebuild (urls.txt yt-dlp)
|
||||||
environment:
|
environment:
|
||||||
- RADIEO_CACHE_DIR=/cache
|
- RADIEO_CACHE_DIR=/cache
|
||||||
- RADIEO_STATE_DIR=/state
|
- RADIEO_STATE_DIR=/state
|
||||||
|
|
@ -16,6 +17,11 @@ services:
|
||||||
- RADIEO_NAVIDROME_PASSWORD=${RADIEO_NAVIDROME_PASSWORD:-}
|
- RADIEO_NAVIDROME_PASSWORD=${RADIEO_NAVIDROME_PASSWORD:-}
|
||||||
- RADIEO_NAVIDROME_PLAYLIST=${RADIEO_NAVIDROME_PLAYLIST:-}
|
- RADIEO_NAVIDROME_PLAYLIST=${RADIEO_NAVIDROME_PLAYLIST:-}
|
||||||
- RADIEO_RETENTION_KEEP=${RADIEO_RETENTION_KEEP:-20}
|
- RADIEO_RETENTION_KEEP=${RADIEO_RETENTION_KEEP:-20}
|
||||||
|
# Source yt-dlp : liste d'URL dans config/urls.txt (créer depuis l'exemple).
|
||||||
|
- RADIEO_YTDLP_URLS_FILE=/config/urls.txt
|
||||||
|
# Dosage du mix entre les sources (0 désactive).
|
||||||
|
- RADIEO_WEIGHT_NAVIDROME=${RADIEO_WEIGHT_NAVIDROME:-3}
|
||||||
|
- RADIEO_WEIGHT_YTDLP=${RADIEO_WEIGHT_YTDLP:-1}
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
|
|
||||||
stream:
|
stream:
|
||||||
|
|
|
||||||
|
|
@ -6,39 +6,81 @@ from . import config
|
||||||
from .api import IngestServer
|
from .api import IngestServer
|
||||||
from .db import Database
|
from .db import Database
|
||||||
from .queue import TrackQueue
|
from .queue import TrackQueue
|
||||||
|
from .scheduler import Scheduler
|
||||||
|
|
||||||
log = logging.getLogger("radieo")
|
log = logging.getLogger("radieo")
|
||||||
|
|
||||||
|
|
||||||
class _NullProvider:
|
|
||||||
"""Fallback provider when no source is configured: never yields a track."""
|
|
||||||
|
|
||||||
def next(self):
|
|
||||||
return None
|
|
||||||
|
|
||||||
|
|
||||||
def _build_pipeline(db: Database):
|
def _build_pipeline(db: Database):
|
||||||
"""Return (provider, fetcher). Falls back to a no-op provider when the
|
"""Return (scheduler, fetchers). Assembles whichever sources are enabled;
|
||||||
Navidrome source is not configured, so the daemon still runs and the
|
when none is, the scheduler yields nothing and the stream plays its local
|
||||||
stream plays its local cache."""
|
cache fallback."""
|
||||||
if not config.NAVIDROME_ENABLED:
|
providers = [] # list[(provider, weight)]
|
||||||
log.warning(
|
fetchers = {} # backend name -> fetcher
|
||||||
"Navidrome not configured (RADIEO_NAVIDROME_*): no source active, "
|
|
||||||
"the stream will play its local cache fallback."
|
if config.NAVIDROME_ENABLED:
|
||||||
|
from .fetchers.subsonic import SubsonicFetcher
|
||||||
|
from .providers.navidrome import NavidromeProvider
|
||||||
|
from .subsonic import SubsonicClient
|
||||||
|
|
||||||
|
client = SubsonicClient(
|
||||||
|
config.NAVIDROME_URL, config.NAVIDROME_USER, config.NAVIDROME_PASSWORD
|
||||||
)
|
)
|
||||||
return _NullProvider(), None
|
provider = NavidromeProvider(client, config.NAVIDROME_PLAYLIST, db)
|
||||||
|
providers.append((provider, config.SOURCE_WEIGHTS.get("navidrome", 0)))
|
||||||
|
fetchers["subsonic"] = SubsonicFetcher(client, config.CACHE_DIR)
|
||||||
|
log.info(
|
||||||
|
"Navidrome source enabled (playlist=%r, weight=%d)",
|
||||||
|
config.NAVIDROME_PLAYLIST,
|
||||||
|
config.SOURCE_WEIGHTS.get("navidrome", 0),
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
log.warning("Navidrome not configured (RADIEO_NAVIDROME_*): source off.")
|
||||||
|
|
||||||
from .fetchers.subsonic import SubsonicFetcher
|
if config.SOURCE_WEIGHTS.get("ytdlp", 0) > 0 and config.YTDLP_URLS_FILE.exists():
|
||||||
from .providers.navidrome import NavidromeProvider
|
from .fetchers.ytdlp import YtdlpFetcher
|
||||||
from .subsonic import SubsonicClient
|
from .providers.ytdlp import YtdlpProvider
|
||||||
|
|
||||||
client = SubsonicClient(
|
provider = YtdlpProvider(config.YTDLP_URLS_FILE, db)
|
||||||
config.NAVIDROME_URL, config.NAVIDROME_USER, config.NAVIDROME_PASSWORD
|
providers.append((provider, config.SOURCE_WEIGHTS["ytdlp"]))
|
||||||
)
|
fetchers["ytdlp"] = YtdlpFetcher(config.CACHE_DIR)
|
||||||
provider = NavidromeProvider(client, config.NAVIDROME_PLAYLIST, db)
|
log.info(
|
||||||
fetcher = SubsonicFetcher(client, config.CACHE_DIR)
|
"yt-dlp source enabled (urls=%s, weight=%d)",
|
||||||
log.info("Navidrome source enabled (playlist=%r)", config.NAVIDROME_PLAYLIST)
|
config.YTDLP_URLS_FILE,
|
||||||
return provider, fetcher
|
config.SOURCE_WEIGHTS["ytdlp"],
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
log.info(
|
||||||
|
"yt-dlp source off (weight=%d, urls file present=%s).",
|
||||||
|
config.SOURCE_WEIGHTS.get("ytdlp", 0),
|
||||||
|
config.YTDLP_URLS_FILE.exists(),
|
||||||
|
)
|
||||||
|
|
||||||
|
if not providers:
|
||||||
|
log.warning("no source active: the stream plays its local cache only.")
|
||||||
|
return Scheduler(providers), fetchers
|
||||||
|
|
||||||
|
|
||||||
|
def _sweep_temp_files() -> None:
|
||||||
|
"""Delete orphaned download temp files left by a killed container.
|
||||||
|
|
||||||
|
Fetchers write to hidden temp files and rename them into place atomically;
|
||||||
|
the only hidden files we ever create are those temps. If a download is
|
||||||
|
interrupted (SIGKILL on ``compose down``), the temp survives and the stream
|
||||||
|
fallback keeps trying to decode it. Clearing them on startup avoids that
|
||||||
|
noise and reclaims disk.
|
||||||
|
"""
|
||||||
|
removed = 0
|
||||||
|
for path in config.CACHE_DIR.glob(".*"):
|
||||||
|
if path.name == ".gitkeep" or not path.is_file():
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
path.unlink()
|
||||||
|
removed += 1
|
||||||
|
except OSError:
|
||||||
|
log.warning("could not remove stale temp %s", path.name)
|
||||||
|
if removed:
|
||||||
|
log.info("swept %d stale temp file(s) from cache", removed)
|
||||||
|
|
||||||
|
|
||||||
def main() -> None:
|
def main() -> None:
|
||||||
|
|
@ -49,10 +91,11 @@ def main() -> None:
|
||||||
|
|
||||||
config.CACHE_DIR.mkdir(parents=True, exist_ok=True)
|
config.CACHE_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
config.STATE_DIR.mkdir(parents=True, exist_ok=True)
|
config.STATE_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
_sweep_temp_files()
|
||||||
|
|
||||||
db = Database(config.STATE_DB)
|
db = Database(config.STATE_DB)
|
||||||
provider, fetcher = _build_pipeline(db)
|
scheduler, fetchers = _build_pipeline(db)
|
||||||
queue = TrackQueue(provider, fetcher, db)
|
queue = TrackQueue(scheduler, fetchers, db)
|
||||||
queue.start()
|
queue.start()
|
||||||
|
|
||||||
server = IngestServer((config.HTTP_HOST, config.HTTP_PORT), queue)
|
server = IngestServer((config.HTTP_HOST, config.HTTP_PORT), queue)
|
||||||
|
|
|
||||||
|
|
@ -41,3 +41,19 @@ PLAYLIST_REFRESH = float(os.environ.get("RADIEO_PLAYLIST_REFRESH", "300"))
|
||||||
NAVIDROME_ENABLED = bool(
|
NAVIDROME_ENABLED = bool(
|
||||||
NAVIDROME_URL and NAVIDROME_USER and NAVIDROME_PASSWORD and NAVIDROME_PLAYLIST
|
NAVIDROME_URL and NAVIDROME_USER and NAVIDROME_PASSWORD and NAVIDROME_PLAYLIST
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# --- yt-dlp source ---
|
||||||
|
# Plain text file (one URL per line, '#' comments) mounted into the container.
|
||||||
|
# May contain direct track URLs or container URLs (playlist/album/label/artist),
|
||||||
|
# from which one track is picked at random. Left absent disables the source.
|
||||||
|
YTDLP_URLS_FILE = Path(os.environ.get("RADIEO_YTDLP_URLS_FILE", "/config/urls.txt"))
|
||||||
|
# How often to reload the URL list and re-expand container URLs, in seconds.
|
||||||
|
YTDLP_REFRESH = float(os.environ.get("RADIEO_YTDLP_REFRESH", "300"))
|
||||||
|
|
||||||
|
# --- Source mix (weights) ---
|
||||||
|
# Relative odds of drawing from each source. Isolated here on purpose so the
|
||||||
|
# mix can later move to a config file. A weight of 0 disables a source.
|
||||||
|
SOURCE_WEIGHTS = {
|
||||||
|
"navidrome": int(os.environ.get("RADIEO_WEIGHT_NAVIDROME", "3")),
|
||||||
|
"ytdlp": int(os.environ.get("RADIEO_WEIGHT_YTDLP", "1")),
|
||||||
|
}
|
||||||
|
|
|
||||||
71
ingest/radieo/fetchers/ytdlp.py
Normal file
71
ingest/radieo/fetchers/ytdlp.py
Normal file
|
|
@ -0,0 +1,71 @@
|
||||||
|
"""YtdlpFetcher: download a ``ytdlp`` track into the cache directory.
|
||||||
|
|
||||||
|
Mirrors SubsonicFetcher's atomic pattern: yt-dlp writes to a hidden temporary
|
||||||
|
name and the finished file is renamed into place, so the ready file appears
|
||||||
|
atomically on the shared cache mount. Temps use a distinctive hidden prefix so
|
||||||
|
the reuse glob ignores them and the daemon's startup sweep can reclaim any left
|
||||||
|
behind by an interrupted download.
|
||||||
|
|
||||||
|
We grab ``bestaudio`` and keep the original container (webm/opus/m4a/mp3…);
|
||||||
|
Liquidsoap decodes it via ffmpeg, so no re-encoding is needed.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import logging
|
||||||
|
from pathlib import Path
|
||||||
|
from uuid import uuid4
|
||||||
|
|
||||||
|
from ..models import Track
|
||||||
|
|
||||||
|
log = logging.getLogger("radieo.fetcher.ytdlp")
|
||||||
|
|
||||||
|
|
||||||
|
class YtdlpFetcher:
|
||||||
|
backend = "ytdlp"
|
||||||
|
|
||||||
|
def __init__(self, cache_dir: Path):
|
||||||
|
self._cache_dir = cache_dir
|
||||||
|
|
||||||
|
def fetch(self, track: Track) -> Path:
|
||||||
|
h = hashlib.sha1(track.locator.encode()).hexdigest()[:16]
|
||||||
|
stem = f"ytdlp-{h}"
|
||||||
|
# Reuse a still-cached copy rather than downloading again.
|
||||||
|
for existing in self._cache_dir.glob(f"{stem}.*"):
|
||||||
|
if not existing.name.endswith(".part"):
|
||||||
|
return existing
|
||||||
|
|
||||||
|
tmp_base = self._cache_dir / f".{stem}.{uuid4().hex}"
|
||||||
|
opts = {
|
||||||
|
"format": "bestaudio/best",
|
||||||
|
"outtmpl": str(tmp_base) + ".%(ext)s",
|
||||||
|
"quiet": True,
|
||||||
|
"no_warnings": True,
|
||||||
|
"noprogress": True,
|
||||||
|
"noplaylist": True,
|
||||||
|
"socket_timeout": 30,
|
||||||
|
"retries": 2,
|
||||||
|
}
|
||||||
|
try:
|
||||||
|
from yt_dlp import YoutubeDL
|
||||||
|
|
||||||
|
with YoutubeDL(opts) as ydl:
|
||||||
|
info = ydl.extract_info(track.locator, download=True)
|
||||||
|
downloaded = self._downloaded_path(info, tmp_base)
|
||||||
|
dest = self._cache_dir / f"{stem}{downloaded.suffix}"
|
||||||
|
downloaded.replace(dest)
|
||||||
|
except BaseException:
|
||||||
|
for leftover in self._cache_dir.glob(f"{tmp_base.name}*"):
|
||||||
|
leftover.unlink(missing_ok=True)
|
||||||
|
raise
|
||||||
|
log.info("downloaded %s -> %s", track, dest.name)
|
||||||
|
return dest
|
||||||
|
|
||||||
|
def _downloaded_path(self, info: dict, tmp_base: Path) -> Path:
|
||||||
|
reqs = info.get("requested_downloads")
|
||||||
|
if reqs and reqs[0].get("filepath"):
|
||||||
|
return Path(reqs[0]["filepath"])
|
||||||
|
# Fallback: find the produced file (ignore yt-dlp's own .part leftovers).
|
||||||
|
for candidate in self._cache_dir.glob(f"{tmp_base.name}.*"):
|
||||||
|
if not candidate.name.endswith(".part"):
|
||||||
|
return candidate
|
||||||
|
raise FileNotFoundError(f"yt-dlp produced no file for {tmp_base.name}")
|
||||||
|
|
@ -18,6 +18,7 @@ class Track:
|
||||||
origin: str # provider that produced it, e.g. "navidrome"
|
origin: str # provider that produced it, e.g. "navidrome"
|
||||||
mbid: str | None = None # filled by the Canonicalizer (milestone 5)
|
mbid: str | None = None # filled by the Canonicalizer (milestone 5)
|
||||||
source_ext: str | None = None # filename hint, e.g. "mp3", "flac"
|
source_ext: str | None = None # filename hint, e.g. "mp3", "flac"
|
||||||
|
source_url: str | None = None # container URL a track was picked from
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def key(self) -> str:
|
def key(self) -> str:
|
||||||
|
|
|
||||||
154
ingest/radieo/providers/ytdlp.py
Normal file
154
ingest/radieo/providers/ytdlp.py
Normal file
|
|
@ -0,0 +1,154 @@
|
||||||
|
"""YtdlpProvider: picks tracks from a hand-maintained list of URLs.
|
||||||
|
|
||||||
|
The list is a plain text file (one URL per line, ``#`` comments allowed),
|
||||||
|
mounted into the container so it can be edited without a rebuild. Each line may
|
||||||
|
be either a direct track URL or a *container* URL (playlist, album, label,
|
||||||
|
artist page); container URLs are expanded with a flat yt-dlp extraction and one
|
||||||
|
entry is picked at random, honouring the anti-repeat window.
|
||||||
|
|
||||||
|
The provider only *resolves* references — it emits ``ytdlp`` tracks whose
|
||||||
|
locator is the chosen media URL. The actual download happens in the matching
|
||||||
|
fetcher (milestone 4). Flat extraction is cached per source URL to avoid
|
||||||
|
re-hitting the network on every pick.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import random
|
||||||
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from .. import config
|
||||||
|
from ..db import Database
|
||||||
|
from ..models import Track
|
||||||
|
|
||||||
|
log = logging.getLogger("radieo.provider.ytdlp")
|
||||||
|
|
||||||
|
|
||||||
|
class YtdlpProvider:
|
||||||
|
name = "ytdlp"
|
||||||
|
|
||||||
|
def __init__(self, urls_file: Path, db: Database):
|
||||||
|
self._urls_file = urls_file
|
||||||
|
self._db = db
|
||||||
|
self._urls: list[str] = []
|
||||||
|
self._loaded_at = 0.0
|
||||||
|
# source URL -> (entries, loaded_at); entries are normalized dicts.
|
||||||
|
self._expanded: dict[str, tuple[list[dict], float]] = {}
|
||||||
|
|
||||||
|
# --- source list ------------------------------------------------------
|
||||||
|
|
||||||
|
def _load_urls(self) -> None:
|
||||||
|
now = time.time()
|
||||||
|
if self._urls and now - self._loaded_at < config.YTDLP_REFRESH:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
lines = self._urls_file.read_text(encoding="utf-8").splitlines()
|
||||||
|
except OSError:
|
||||||
|
self._urls = []
|
||||||
|
self._loaded_at = now
|
||||||
|
return
|
||||||
|
urls = []
|
||||||
|
for line in lines:
|
||||||
|
line = line.strip()
|
||||||
|
if line and not line.startswith("#"):
|
||||||
|
urls.append(line)
|
||||||
|
if urls != self._urls:
|
||||||
|
log.info("loaded %d yt-dlp source URL(s)", len(urls))
|
||||||
|
self._urls = urls
|
||||||
|
self._loaded_at = now
|
||||||
|
|
||||||
|
# --- resolution -------------------------------------------------------
|
||||||
|
|
||||||
|
def next(self) -> Track | None:
|
||||||
|
self._load_urls()
|
||||||
|
if not self._urls:
|
||||||
|
return None
|
||||||
|
recent = self._db.recent_keys(config.ANTIREPEAT_WINDOW)
|
||||||
|
# Try source lines in random order until one yields a usable track.
|
||||||
|
candidates = list(self._urls)
|
||||||
|
random.shuffle(candidates)
|
||||||
|
for src in candidates:
|
||||||
|
track = self._resolve(src, recent)
|
||||||
|
if track is not None:
|
||||||
|
return track
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _resolve(self, src_url: str, recent: set[str]) -> Track | None:
|
||||||
|
try:
|
||||||
|
entries = self._entries_for(src_url)
|
||||||
|
except Exception as exc: # yt-dlp raises many extractor-specific errors
|
||||||
|
log.warning("yt-dlp could not resolve %s: %s", src_url, exc)
|
||||||
|
return None
|
||||||
|
if not entries:
|
||||||
|
return None
|
||||||
|
pool = [e for e in entries if f"ytdlp:{e['url']}" not in recent] or entries
|
||||||
|
entry = random.choice(pool)
|
||||||
|
return Track(
|
||||||
|
backend="ytdlp",
|
||||||
|
locator=entry["url"],
|
||||||
|
artist=entry.get("artist") or "Unknown artist",
|
||||||
|
title=entry.get("title") or entry["url"],
|
||||||
|
origin=self.name,
|
||||||
|
source_url=src_url if src_url != entry["url"] else None,
|
||||||
|
)
|
||||||
|
|
||||||
|
def _entries_for(self, src_url: str) -> list[dict]:
|
||||||
|
now = time.time()
|
||||||
|
cached = self._expanded.get(src_url)
|
||||||
|
if cached and now - cached[1] < config.YTDLP_REFRESH:
|
||||||
|
return cached[0]
|
||||||
|
entries = self._extract(src_url)
|
||||||
|
self._expanded[src_url] = (entries, now)
|
||||||
|
return entries
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _extract(src_url: str) -> list[dict]:
|
||||||
|
from yt_dlp import YoutubeDL
|
||||||
|
|
||||||
|
opts = {
|
||||||
|
"quiet": True,
|
||||||
|
"no_warnings": True,
|
||||||
|
"extract_flat": "in_playlist",
|
||||||
|
"skip_download": True,
|
||||||
|
"socket_timeout": 30,
|
||||||
|
}
|
||||||
|
with YoutubeDL(opts) as ydl:
|
||||||
|
info = ydl.extract_info(src_url, download=False)
|
||||||
|
if info is None:
|
||||||
|
return []
|
||||||
|
|
||||||
|
raw_entries = info.get("entries")
|
||||||
|
if raw_entries is not None: # container URL -> list of tracks
|
||||||
|
out = []
|
||||||
|
for e in raw_entries:
|
||||||
|
if not e:
|
||||||
|
continue
|
||||||
|
url = e.get("url") or e.get("webpage_url") or e.get("id")
|
||||||
|
if not url:
|
||||||
|
continue
|
||||||
|
out.append(
|
||||||
|
{
|
||||||
|
"url": url,
|
||||||
|
"title": e.get("title"),
|
||||||
|
"artist": (
|
||||||
|
e.get("artist")
|
||||||
|
or e.get("uploader")
|
||||||
|
or e.get("channel")
|
||||||
|
or info.get("uploader")
|
||||||
|
),
|
||||||
|
}
|
||||||
|
)
|
||||||
|
return out
|
||||||
|
|
||||||
|
# direct track URL
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
"url": info.get("webpage_url") or src_url,
|
||||||
|
"title": info.get("title"),
|
||||||
|
"artist": (
|
||||||
|
info.get("artist")
|
||||||
|
or info.get("uploader")
|
||||||
|
or info.get("channel")
|
||||||
|
),
|
||||||
|
}
|
||||||
|
]
|
||||||
|
|
@ -1,13 +1,13 @@
|
||||||
"""Prefetching pipeline feeding ready-to-play tracks to the stream layer.
|
"""Prefetching pipeline feeding ready-to-play tracks to the stream layer.
|
||||||
|
|
||||||
A background thread keeps a small buffer of downloaded tracks: it asks the
|
A background thread keeps a small buffer of downloaded tracks: it asks the
|
||||||
provider what to play next, has the matching fetcher download it into the
|
scheduler what to play next, hands the track to the fetcher registered for its
|
||||||
cache, and enqueues the resulting file. ``pop_next`` hands the oldest ready
|
backend, and enqueues the resulting file. ``pop_next`` hands the oldest ready
|
||||||
track to the HTTP API, records the play and runs LRU retention.
|
track to the HTTP API, records the play and runs LRU retention.
|
||||||
|
|
||||||
If the provider has nothing (e.g. Navidrome not configured, or unreachable),
|
If no source has anything (e.g. nothing configured, or all unreachable), the
|
||||||
the buffer simply stays empty and ``pop_next`` returns ``None`` — the stream
|
buffer simply stays empty and ``pop_next`` returns ``None`` — the stream then
|
||||||
then plays its own local-cache fallback.
|
plays its own local-cache fallback.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
@ -23,9 +23,9 @@ log = logging.getLogger("radieo.queue")
|
||||||
|
|
||||||
|
|
||||||
class TrackQueue:
|
class TrackQueue:
|
||||||
def __init__(self, provider, fetcher, db: Database):
|
def __init__(self, scheduler, fetchers, db: Database):
|
||||||
self._provider = provider
|
self._scheduler = scheduler
|
||||||
self._fetcher = fetcher
|
self._fetchers = fetchers # dict: backend name -> fetcher
|
||||||
self._db = db
|
self._db = db
|
||||||
self._lock = threading.Lock()
|
self._lock = threading.Lock()
|
||||||
self._ready: deque[tuple[Path, Track]] = deque()
|
self._ready: deque[tuple[Path, Track]] = deque()
|
||||||
|
|
@ -56,11 +56,15 @@ class TrackQueue:
|
||||||
for _ in range(max(0, missing)):
|
for _ in range(max(0, missing)):
|
||||||
if self._stop.is_set():
|
if self._stop.is_set():
|
||||||
return
|
return
|
||||||
track = self._provider.next()
|
track = self._scheduler.next()
|
||||||
if track is None:
|
if track is None:
|
||||||
return # nothing to fetch right now
|
return # nothing to fetch right now
|
||||||
|
fetcher = self._fetchers.get(track.backend)
|
||||||
|
if fetcher is None:
|
||||||
|
log.error("no fetcher for backend %r (%s)", track.backend, track)
|
||||||
|
continue
|
||||||
try:
|
try:
|
||||||
path = self._fetcher.fetch(track)
|
path = fetcher.fetch(track)
|
||||||
except Exception:
|
except Exception:
|
||||||
log.exception("fetch failed for %s", track)
|
log.exception("fetch failed for %s", track)
|
||||||
continue
|
continue
|
||||||
|
|
|
||||||
39
ingest/radieo/scheduler.py
Normal file
39
ingest/radieo/scheduler.py
Normal file
|
|
@ -0,0 +1,39 @@
|
||||||
|
"""Weighted scheduler mixing several providers.
|
||||||
|
|
||||||
|
Picks a provider at random, weighted by ``SOURCE_WEIGHTS``, and asks it for a
|
||||||
|
track. If the chosen provider has nothing right now (empty list, unreachable
|
||||||
|
source…), the remaining providers are tried in weighted-random order, so a
|
||||||
|
temporarily-idle source never stalls playback. Returns ``None`` only when every
|
||||||
|
provider is exhausted.
|
||||||
|
|
||||||
|
The weights live in one place (``config.SOURCE_WEIGHTS``) so the mix can later
|
||||||
|
move to a config file without touching this logic.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import random
|
||||||
|
|
||||||
|
log = logging.getLogger("radieo.scheduler")
|
||||||
|
|
||||||
|
|
||||||
|
class Scheduler:
|
||||||
|
def __init__(self, entries):
|
||||||
|
# entries: list[(provider, weight)]; drop non-positive weights.
|
||||||
|
self._entries = [(p, w) for p, w in entries if w > 0]
|
||||||
|
|
||||||
|
def next(self):
|
||||||
|
pool = list(self._entries)
|
||||||
|
while pool:
|
||||||
|
weights = [w for _, w in pool]
|
||||||
|
idx = random.choices(range(len(pool)), weights=weights, k=1)[0]
|
||||||
|
provider, _ = pool.pop(idx)
|
||||||
|
try:
|
||||||
|
track = provider.next()
|
||||||
|
except Exception: # a provider must never crash the loop
|
||||||
|
log.exception(
|
||||||
|
"provider %s failed", getattr(provider, "name", "?")
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
if track is not None:
|
||||||
|
return track
|
||||||
|
return None
|
||||||
|
|
@ -1 +1,2 @@
|
||||||
httpx>=0.27
|
httpx>=0.27
|
||||||
|
yt-dlp>=2024.1
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue