ingest: bias yt-dlp picks towards a discography's newest releases
Some checks are pending
continuous-integration/drone/push Build is running
Some checks are pending
continuous-integration/drone/push Build is running
Bandcamp label/artist pages list releases newest-first, so give the first YTDLP_RECENT_COUNT entries a YTDLP_RECENT_BOOST multiplier when picking, leaning the radio towards fresh music. Flat extraction carries no dates, so list position is used as the recency proxy rather than a real date lookup. The boost is gated on the source URL being a bandcamp discography listing: single /album/ and /track/ pages (whether a URL-file line or one the pick recursed into) list tracks in track order, not by recency, and non-bandcamp sources have unverified ordering, so all of those keep a uniform pick. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
49783218d8
commit
112a4b0c89
3 changed files with 63 additions and 1 deletions
|
|
@ -91,6 +91,14 @@ Add one URL per line: a single track, or a playlist/album/label/artist page to
|
||||||
pick from. The file is mounted read-only, so you can edit it without rebuilding.
|
pick from. The file is mounted read-only, so you can edit it without rebuilding.
|
||||||
A missing file just disables the yt-dlp source.
|
A missing file just disables the yt-dlp source.
|
||||||
|
|
||||||
|
To favour fresh music, set `RADIEO_YTDLP_RECENT_BOOST` (in `.env`) above `1.0`:
|
||||||
|
when picking from a Bandcamp label/artist/discography page — which lists
|
||||||
|
releases newest-first — the newest `RADIEO_YTDLP_RECENT_COUNT` releases (default
|
||||||
|
`5`) get that multiplier on their odds (e.g. `2.0` makes them twice as likely).
|
||||||
|
The default `1.0` disables it. The boost only applies to such discography
|
||||||
|
listings; single `/album/` and `/track/` pages and non-Bandcamp sources keep a
|
||||||
|
uniform pick.
|
||||||
|
|
||||||
### 4. ListenBrainz suggestions
|
### 4. ListenBrainz suggestions
|
||||||
|
|
||||||
Point `RADIEO_LISTENBRAINZ_URL` (in `.env`) at your recommendations syndication
|
Point `RADIEO_LISTENBRAINZ_URL` (in `.env`) at your recommendations syndication
|
||||||
|
|
|
||||||
|
|
@ -105,6 +105,15 @@ SUBSONIC_ENABLED = bool(
|
||||||
YTDLP_URLS_FILE = Path(os.environ.get("RADIEO_YTDLP_URLS_FILE", "/config/urls.txt"))
|
YTDLP_URLS_FILE = Path(os.environ.get("RADIEO_YTDLP_URLS_FILE", "/config/urls.txt"))
|
||||||
# How often to reload the URL list and re-expand container URLs, in seconds.
|
# How often to reload the URL list and re-expand container URLs, in seconds.
|
||||||
YTDLP_REFRESH = float(os.environ.get("RADIEO_YTDLP_REFRESH", "300"))
|
YTDLP_REFRESH = float(os.environ.get("RADIEO_YTDLP_REFRESH", "300"))
|
||||||
|
# Recency boost. Bandcamp label/artist/discography pages list releases
|
||||||
|
# newest-first, so the first entries are the freshest. When picking a release
|
||||||
|
# from such a listing, the newest RADIEO_YTDLP_RECENT_COUNT entries get this
|
||||||
|
# multiplier on their selection weight, biasing the radio towards new music.
|
||||||
|
# 1.0 disables the boost (uniform pick); 2.0 makes fresh releases twice as
|
||||||
|
# likely per slot. Applied only at the top level of a source (a discography):
|
||||||
|
# once drilled into an album, entry order is track order, not recency.
|
||||||
|
YTDLP_RECENT_BOOST = float(os.environ.get("RADIEO_YTDLP_RECENT_BOOST", "1.0"))
|
||||||
|
YTDLP_RECENT_COUNT = int(os.environ.get("RADIEO_YTDLP_RECENT_COUNT", "5"))
|
||||||
|
|
||||||
# --- ListenBrainz suggestions source ---
|
# --- ListenBrainz suggestions source ---
|
||||||
# Atom recommendations feed. May be an http(s) URL (the real syndication feed)
|
# Atom recommendations feed. May be an http(s) URL (the real syndication feed)
|
||||||
|
|
|
||||||
|
|
@ -12,6 +12,11 @@ letter, so a numeric part before the first ``:`` can only be a weight. Lines
|
||||||
without a prefix default to weight ``1``; a weight of ``0`` disables the line.
|
without a prefix default to weight ``1``; a weight of ``0`` disables the line.
|
||||||
Weights bias the *order* in which sources are tried each pick.
|
Weights bias the *order* in which sources are tried each pick.
|
||||||
|
|
||||||
|
Within a discography (a label/artist page), the newest releases can be given a
|
||||||
|
selection boost (``YTDLP_RECENT_BOOST``): bandcamp lists releases newest-first,
|
||||||
|
so the first ``YTDLP_RECENT_COUNT`` entries are favoured, keeping the radio
|
||||||
|
leaning towards fresh music.
|
||||||
|
|
||||||
The provider only *resolves* references — it emits ``ytdlp`` tracks whose
|
The provider only *resolves* references — it emits ``ytdlp`` tracks whose
|
||||||
locator is the chosen media URL. The actual download happens in the matching
|
locator is the chosen media URL. The actual download happens in the matching
|
||||||
fetcher (milestone 4). Flat extraction is cached per source URL to avoid
|
fetcher (milestone 4). Flat extraction is cached per source URL to avoid
|
||||||
|
|
@ -21,6 +26,7 @@ re-hitting the network on every pick.
|
||||||
import logging
|
import logging
|
||||||
import random
|
import random
|
||||||
import time
|
import time
|
||||||
|
import urllib.parse
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from .. import config, tagging
|
from .. import config, tagging
|
||||||
|
|
@ -179,7 +185,7 @@ class YtdlpProvider:
|
||||||
if len(entries) == 1 and entries[0]["url"] == src_url:
|
if len(entries) == 1 and entries[0]["url"] == src_url:
|
||||||
return entries[0] # leaf: a track resolves to itself
|
return entries[0] # leaf: a track resolves to itself
|
||||||
pool = [e for e in entries if e["url"] not in recent] or entries
|
pool = [e for e in entries if e["url"] not in recent] or entries
|
||||||
entry = random.choice(pool)
|
entry = self._choose(src_url, pool, entries)
|
||||||
url = entry["url"]
|
url = entry["url"]
|
||||||
if url == src_url or depth >= self._MAX_EXPAND_DEPTH:
|
if url == src_url or depth >= self._MAX_EXPAND_DEPTH:
|
||||||
return entry
|
return entry
|
||||||
|
|
@ -187,6 +193,45 @@ class YtdlpProvider:
|
||||||
# Fall back to the entry as-is if that turns up nothing usable.
|
# Fall back to the entry as-is if that turns up nothing usable.
|
||||||
return self._pick_entry(url, recent, depth + 1) or entry
|
return self._pick_entry(url, recent, depth + 1) or entry
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _choose(src_url: str, pool: list[dict], entries: list[dict]) -> dict:
|
||||||
|
"""Pick one entry, favouring the newest releases of a discography.
|
||||||
|
|
||||||
|
Bandcamp label/artist/discography pages return releases newest-first, so
|
||||||
|
the first ``YTDLP_RECENT_COUNT`` entries are the freshest; they get
|
||||||
|
``YTDLP_RECENT_BOOST`` times the weight of the rest, biasing selection
|
||||||
|
towards new music without needing a per-release date lookup (flat
|
||||||
|
extraction carries no dates).
|
||||||
|
|
||||||
|
The boost only applies when ``src_url`` is such a discography listing
|
||||||
|
(see ``_boostable``). For a single ``/album/`` or ``/track/`` source —
|
||||||
|
whether it is a line in the URL file or one the pick recursed into — the
|
||||||
|
entries are tracks in track order, where position says nothing about
|
||||||
|
recency, so the pick stays uniform, as it does when the boost is
|
||||||
|
disabled (``== 1.0``).
|
||||||
|
"""
|
||||||
|
boost = config.YTDLP_RECENT_BOOST
|
||||||
|
if boost == 1.0 or not YtdlpProvider._boostable(src_url):
|
||||||
|
return random.choice(pool)
|
||||||
|
newest = {e["url"] for e in entries[: config.YTDLP_RECENT_COUNT]}
|
||||||
|
weights = [boost if e["url"] in newest else 1.0 for e in pool]
|
||||||
|
return random.choices(pool, weights=weights, k=1)[0]
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _boostable(src_url: str) -> bool:
|
||||||
|
"""True when ``src_url`` lists releases newest-first (a discography).
|
||||||
|
|
||||||
|
The recency boost relies on bandcamp ordering a label/artist/music page
|
||||||
|
newest-first. That does not hold for a single ``/album/`` or ``/track/``
|
||||||
|
page (those list tracks in track order) nor for non-bandcamp sources
|
||||||
|
whose ordering we have not verified, so the boost is limited to bandcamp
|
||||||
|
discography URLs.
|
||||||
|
"""
|
||||||
|
if not tagging.is_bandcamp({}, src_url):
|
||||||
|
return False
|
||||||
|
path = urllib.parse.urlsplit(src_url).path
|
||||||
|
return "/album/" not in path and "/track/" not in path
|
||||||
|
|
||||||
def _resolved_tags(self, locator: str):
|
def _resolved_tags(self, locator: str):
|
||||||
"""Corrected identity a previous download wrote into the cached file.
|
"""Corrected identity a previous download wrote into the cached file.
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue