Subscribes to dangling.external-target.v1 entries via AutoFillDiscoveryEntries
and runs RDAP per registrable domain (deduped, parallelised, capped at 8
concurrent), publishing a per-Ref Facts map consumed by checker-dangling.
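The dedup-then-bounded-fan-out shape described above can be sketched as follows. This is only an illustration: lookupAll and the lookup callback are hypothetical names standing in for the real RDAP query, and the inputs are assumed to be already reduced to registrable domains.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupAll calls lookup once per distinct domain, with at most limit calls
// in flight. lookup is a hypothetical stand-in for the RDAP query.
func lookupAll(domains []string, limit int, lookup func(string) string) map[string]string {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}

	// Deduplicate while preserving first-seen order.
	seen := make(map[string]struct{}, len(domains))
	var unique []string
	for _, d := range domains {
		if _, dup := seen[d]; !dup {
			seen[d] = struct{}{}
			unique = append(unique, d)
		}
	}

	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]string, len(unique))
		sem     = make(chan struct{}, limit)
	)
	for _, d := range unique {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit lookups are already running
		go func(d string) {
			defer wg.Done()
			defer func() { <-sem }()
			r := lookup(d)
			mu.Lock()
			results[d] = r
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	return results
}

func main() {
	facts := lookupAll(
		[]string{"example.com", "example.org", "example.com"}, 8,
		func(d string) string { return "facts for " + d },
	)
	fmt.Println(len(facts)) // prints 2
}
```

A buffered channel used as a semaphore keeps the cap explicit without pulling in an extra dependency.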
Domain- and user-scoped consumers were missing every discovery entry
published below their scope. The exact-match prefix dscent-tgt|{u/d/s}|
introduced in 9c6398b1b only returned entries stored at the literal
target string, so a domain-scoped consumer like checker-tls or
checker-caa never received the tls.endpoint.v1 entries that
service-scoped producers (checker-dane, checker-smtp, checker-sip,
checker-srv, checker-stun-turn) publish under the same domain. The
symptom on the consumer side was "No TLS endpoints have been discovered
for this target yet." even when producers had run.
Drop the trailing "|" from the iteration prefix when the target lacks a
ServiceId (and, for user scope, also the DomainId) so the prefix scan matches
narrower scopes too. RawURLEncoded identifiers contain neither "/" nor
"|", so slash boundaries in the encoded "u/d/s" target form remain
unambiguous. Service-scoped lookups stay exact. Each matching key is
parsed back into its actual stored target before fetching the primary
record, so the returned StoredDiscoveryEntry.Target reflects where the
entry was published, not the (broader) target that found it.
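A rough sketch of the prefix selection and key parsing follows. The exact encoding is an assumption made for illustration: targets are taken to be stored as "u/d/s" with empty trailing components for broader scopes, and Target, scanPrefix and parseStoredTarget are hypothetical names, not the storage layer's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Target is a simplified stand-in for a check target (assumed shape).
type Target struct{ User, Domain, Service string }

// scanPrefix returns the iteration prefix for the dscent-tgt index.
// Service-scoped lookups stay exact (trailing "|"); broader scopes stop at a
// "/" boundary so narrower targets match as well.
func scanPrefix(t Target) string {
	switch {
	case t.Service != "":
		return "dscent-tgt|" + t.User + "/" + t.Domain + "/" + t.Service + "|"
	case t.Domain != "":
		return "dscent-tgt|" + t.User + "/" + t.Domain + "/"
	default:
		return "dscent-tgt|" + t.User + "/"
	}
}

// parseStoredTarget recovers the target an entry was actually stored under
// from a dscent-tgt|{target}|{producer}|{type}|{ref} key.
func parseStoredTarget(key string) (Target, bool) {
	parts := strings.SplitN(key, "|", 3)
	if len(parts) != 3 || parts[0] != "dscent-tgt" {
		return Target{}, false
	}
	ids := strings.Split(parts[1], "/")
	if len(ids) != 3 {
		return Target{}, false
	}
	return Target{User: ids[0], Domain: ids[1], Service: ids[2]}, true
}

func main() {
	key := "dscent-tgt|u1/d1/s1|checker-dane|tls.endpoint.v1|ref1"
	prefix := scanPrefix(Target{User: "u1", Domain: "d1"})
	fmt.Println(strings.HasPrefix(key, prefix)) // prints true
	stored, _ := parseStoredTarget(key)
	fmt.Println(stored.Service) // prints s1
}
```

Because the prefix ends on "/", a domain id that merely starts with the same characters (for example "d10" versus "d1") does not match.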
Propagate the persisted CheckEvaluation.States through BuildReportContext
and the HTTP report transport so reporters can render rule-driven
sections (hints, severity) without re-deriving them from raw data. When
no evaluation is available the context carries nil states, matching the
SDK's documented nil-safe fallback to data-only rendering.
Rules now return []CheckState, the engine stamps RuleName from the rule,
and the HTTP rule-result lookup matches on RuleName rather than Code.
domain_contact emits one state per role (Subject) instead of a
concatenated single-state message.
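The rule contract above might look roughly like this. The types are simplified stand-ins rather than the SDK's definitions, and the "contact_ok" code and the messages are invented for the example.

```go
package main

import "fmt"

// CheckState is a reduced, hypothetical version of the SDK type.
type CheckState struct {
	RuleName string
	Code     string
	Subject  string
	Message  string
}

// Rule returns any number of states; it does not set RuleName itself.
type Rule interface {
	Name() string
	Evaluate() []CheckState
}

// contactRule emits one state per contact role, in the spirit of domain_contact.
type contactRule struct{ roles []string }

func (contactRule) Name() string { return "domain_contact" }

func (r contactRule) Evaluate() []CheckState {
	states := make([]CheckState, 0, len(r.roles))
	for _, role := range r.roles {
		states = append(states, CheckState{
			Code:    "contact_ok", // illustrative code, not from the source
			Subject: role,
			Message: role + " contact matches",
		})
	}
	return states
}

// runRules stamps RuleName on every state the rule returned.
func runRules(rules []Rule) []CheckState {
	var all []CheckState
	for _, r := range rules {
		for _, s := range r.Evaluate() {
			s.RuleName = r.Name()
			all = append(all, s)
		}
	}
	return all
}

// statesForRule selects results by RuleName rather than by Code.
func statesForRule(states []CheckState, ruleName string) []CheckState {
	var out []CheckState
	for _, s := range states {
		if s.RuleName == ruleName {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	states := runRules([]Rule{contactRule{roles: []string{"registrant", "admin", "tech"}}})
	fmt.Println(len(statesForRule(states, "domain_contact"))) // prints 3
}
```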
Complete the ReportContext composition path so reporters can fold
downstream observations into their output:
- checker.BuildReportContext wraps a raw payload plus the engine's
RelatedObservationLookup in a lazy ReportContext: Related(key) is
resolved on first access and cached. When no lookup is wired the
context falls back to sdk.StaticReportContext, matching the
pre-existing behaviour.
- GetHTMLReportWithContext / GetMetricsWithContext: new helpers that
accept a pre-built ReportContext, for callers that want to feed
Related into a reporter explicitly.
- The execution controller now builds a ReportContext via the
engine's RelatedLookup method before calling the HTML reporter.
When the engine is wired with discovery storage, the reporter sees
the producer's consumer lineage through ctx.Related(consumerKey).
- HTTPObservationProvider implements CheckerHTMLReporter and
CheckerMetricsReporter: both forward to POST /report with
ExternalReportRequest{Key, Data, Related}. A 501 response is
surfaced as an explicit "does not support /report" error. These
methods are available for callers that want to route reports to
remote checkers; the default in-process reporter dispatch is
unchanged.
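The "resolved on first access and cached" behaviour of Related(key) can be sketched as below. lazyContext, RelatedLookup and the reduced RelatedObservation struct are hypothetical simplifications, and the nil-lookup branch only approximates the fallback to a static context.

```go
package main

import (
	"fmt"
	"sync"
)

// RelatedObservation is a reduced stand-in for the SDK type.
type RelatedObservation struct {
	CheckerID string
	Data      string
}

// RelatedLookup is the assumed shape of the engine-provided lookup.
type RelatedLookup func(key string) []RelatedObservation

// lazyContext resolves related observations on first access and caches them.
type lazyContext struct {
	data   []byte
	lookup RelatedLookup

	mu    sync.Mutex
	cache map[string][]RelatedObservation
}

func (c *lazyContext) Data() []byte { return c.data }

func (c *lazyContext) Related(key string) []RelatedObservation {
	if c.lookup == nil {
		return nil // no lookup wired: behaves like a data-only context
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if cached, ok := c.cache[key]; ok {
		return cached
	}
	resolved := c.lookup(key)
	if c.cache == nil {
		c.cache = make(map[string][]RelatedObservation)
	}
	c.cache[key] = resolved
	return resolved
}

func main() {
	calls := 0
	ctx := &lazyContext{lookup: func(key string) []RelatedObservation {
		calls++
		return []RelatedObservation{{CheckerID: "checker-tls", Data: key}}
	}}
	ctx.Related("tls")
	ctx.Related("tls")
	fmt.Println(calls) // prints 1
}
```

Caching empty results as well avoids repeating a lookup that found nothing.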
Close the discovery loop described in docs/checker-discovery.md: entries
published in commit 3 now feed consumer checkers, and their observations
flow back to the original producer.
Three tightly-coupled changes:
- CheckerOptionsUsecase gains an optional DiscoveryEntryStorage
dependency (WithDiscoveryEntryStore). When a checker declares
AutoFill="discovery_entries" on an option,
BuildMergedCheckerOptionsWithAutoFill populates it with the entries
stored for the target: all producers, no host-side filtering by
Type. The method also returns the concrete list of entries injected
so the engine can persist lineage for them.
- CheckerEngine records a DiscoveryObservationRef per (entry, obs key)
tuple after the snapshot is stored. The ref namespaces back to the
*producer* (ProducerID, Target, Ref) while carrying the consumer's
key and the snapshot pointer, so a later GetRelated from the
producer can reach the consumer's observation in one lookup.
- ObservationContext exposes SetRelatedLookup (called once per run by
the engine) and implements GetRelated on top of the installed
closure. The engine's closure walks the producer's published
entries, resolves each entry's observation refs, loads the snapshots,
and materialises []RelatedObservation. Stale refs (entry gone,
snapshot TTL'd) are skipped silently: implicit GC, as the doc
permits.
Wire the newly-added DiscoveryEntryStorage into the execution pipeline:
- ObservationContext tracks DiscoveryEntry records published by each
provider. After Collect, providers that implement DiscoveryPublisher
are asked for their entries (on the native Go value, no JSON round
trip), and the results are cached by observation key.
- HTTPObservationProvider also implements DiscoveryPublisher: it
records the Entries field of the remote /collect response and
surfaces them through DiscoverEntries. Each override instance is
scoped to a single execution run, so no locking is needed.
- CheckerEngine.runPipeline calls ReplaceDiscoveryEntries after
persisting the snapshot, always replacing the previous set for
(checkerID, target), including when a run produces none, so stale
entries from earlier cycles self-heal.
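The optional-interface check after Collect could be sketched like this. The interfaces are deliberately simplified and are not the SDK's real signatures: Provider, the single-argument DiscoverEntries, and both sample providers are assumptions for illustration.

```go
package main

import "fmt"

// DiscoveryEntry is a reduced stand-in for the SDK type.
type DiscoveryEntry struct{ Type, Ref string }

// Provider and DiscoveryPublisher are simplified, assumed interfaces.
type Provider interface {
	Key() string
	Collect() (any, error)
}

type DiscoveryPublisher interface {
	DiscoverEntries(data any) []DiscoveryEntry
}

// ObservationContext caches published entries by observation key.
type ObservationContext struct {
	entries map[string][]DiscoveryEntry
}

// collect runs the provider and, if it also publishes discovery entries,
// asks for them on the native Go value (no JSON round trip).
func (oc *ObservationContext) collect(p Provider) (any, error) {
	data, err := p.Collect()
	if err != nil {
		return nil, err
	}
	if pub, ok := p.(DiscoveryPublisher); ok {
		if oc.entries == nil {
			oc.entries = make(map[string][]DiscoveryEntry)
		}
		oc.entries[p.Key()] = pub.DiscoverEntries(data)
	}
	return data, nil
}

// srvProvider publishes one TLS endpoint entry per collected host.
type srvProvider struct{}

func (srvProvider) Key() string           { return "srv" }
func (srvProvider) Collect() (any, error) { return []string{"mail.example.com:465"}, nil }
func (srvProvider) DiscoverEntries(data any) []DiscoveryEntry {
	hosts, _ := data.([]string)
	entries := make([]DiscoveryEntry, 0, len(hosts))
	for _, h := range hosts {
		entries = append(entries, DiscoveryEntry{Type: "tls.endpoint.v1", Ref: h})
	}
	return entries
}

// plainProvider collects data but publishes nothing.
type plainProvider struct{}

func (plainProvider) Key() string           { return "plain" }
func (plainProvider) Collect() (any, error) { return "raw", nil }

func main() {
	oc := &ObservationContext{}
	oc.collect(srvProvider{})
	oc.collect(plainProvider{})
	fmt.Println(len(oc.entries["srv"]), len(oc.entries["plain"])) // prints 1 0
}
```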
Introduce the two KV indexes that back the cross-checker discovery
mechanism described in docs/checker-discovery.md:
    dscent|{producer}|{target}|{type}|{ref}            primary record
    dscent-tgt|{target}|{producer}|{type}|{ref}        target lookup (auto-fill)
    dscobs|{producer}|{target}|{ref}|{consumer}|{k}    observation lineage
    dscobs-snap|{snapshotId}|...                       cascade on snapshot delete
ReplaceDiscoveryEntries is the canonical publication path: the whole
set previously stored for (producer, target) is cleared, then the new
set is written. The observation-lineage side uses a single upsert per
(producer, target, ref, consumer, key) tuple, with a snapshot-scoped
reverse index so deleting a snapshot cascades cleanly. Putting a ref
under a new snapshot removes the previous snap-index so a later
cascade on the old snapshot does not wipe the refreshed primary.
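The clear-then-write semantics of ReplaceDiscoveryEntries can be illustrated on an in-memory map. This is a sketch only: Store and Entry are hypothetical, real storage goes through the KV backend, and key components are assumed never to contain "|".

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a reduced stand-in for a stored discovery entry.
type Entry struct{ Type, Ref, Payload string }

// Store is a toy KV store keyed like the dscent / dscent-tgt indexes.
type Store map[string]string

// Replace clears everything previously stored for (producer, target) in both
// indexes, then writes the new set. Passing no entries simply clears.
func (s Store) Replace(producer, target string, entries []Entry) {
	prefix := "dscent|" + producer + "|" + target + "|"
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			rest := strings.TrimPrefix(k, prefix) // "{type}|{ref}"
			delete(s, "dscent-tgt|"+target+"|"+producer+"|"+rest)
			delete(s, k)
		}
	}
	for _, e := range entries {
		suffix := e.Type + "|" + e.Ref
		s["dscent|"+producer+"|"+target+"|"+suffix] = e.Payload
		s["dscent-tgt|"+target+"|"+producer+"|"+suffix] = "" // index key only
	}
}

func main() {
	s := Store{}
	s.Replace("checker-dane", "u1/d1/s1", []Entry{{Type: "tls.endpoint.v1", Ref: "a"}})
	s.Replace("checker-dane", "u1/d1/s1", nil) // a run that produced nothing
	fmt.Println(len(s))                        // prints 0
}
```

Replacing with an empty set is what lets stale entries from earlier cycles disappear.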
Adds StoredDiscoveryEntry and DiscoveryObservationRef to the host-only
model, DiscoveryEntryStorage / DiscoveryObservationStorage to the
checker usecase storage surface, embeds both in storage.Storage, and
regenerates the instrumented wrapper. Unit tests cover round-trip,
atomic replace, multi-producer aggregation, upsert, and cascade
delete.
No pipeline wiring yet.
Update happyDomain to the new checker-sdk-go reporter contract, where
CheckerHTMLReporter.GetHTMLReport and CheckerMetricsReporter.ExtractMetrics
take a ReportContext instead of a raw json.RawMessage. The ReportContext
will later carry cross-checker related observations; for now every call
site wraps the raw payload via sdk.StaticReportContext, so behavior is
unchanged.
Also re-export the new discovery-related SDK types (DiscoveryEntry,
DiscoveryPublisher, RelatedObservation, ReportContext,
AutoFillDiscoveryEntries) as aliases under happydns, and satisfy the
extended ObservationGetter interface on ObservationContext and the
test stub with a no-op GetRelated.
No new behavior: plumbing for the upcoming discovery pipeline.
CheckTarget has no zone identifier, so zone-scoped checkers were
silently dropped by the scheduler and ListCheckerStatuses, leaving
external_whois (the only ApplyToZone checker) never planned or
listed. Surface them at the domain scope, matching the existing
treatment in checker_options_usecase.
The predicate guarding service-checker auto-scheduling was duplicated
across buildQueue and two sites in NotifyDomainChange. Pull it into a
single helper so the rule lives in one place.
Service-level checkers without LimitToServices no longer get enqueued
for every matching service: they must be activated explicitly via a
CheckPlan. Domain checkers and service checkers that declare a
LimitToServices whitelist keep their previous auto-discovery behavior.
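The shared predicate could be as small as the sketch below. Checker, autoSchedulable and shouldEnqueue are hypothetical names, and ServiceScoped is an assumed flag standing in for however the real definition marks service-level checkers.

```go
package main

import "fmt"

// Checker is a reduced, hypothetical view of a checker definition.
type Checker struct {
	ID              string
	ServiceScoped   bool
	LimitToServices []string
}

// autoSchedulable reports whether a checker is enqueued without a plan:
// domain checkers always are, service checkers only with a whitelist.
func autoSchedulable(c Checker) bool {
	return !c.ServiceScoped || len(c.LimitToServices) > 0
}

// shouldEnqueue combines the predicate with explicit activation via a plan.
func shouldEnqueue(c Checker, hasPlan bool) bool {
	return hasPlan || autoSchedulable(c)
}

func main() {
	generic := Checker{ID: "generic-service", ServiceScoped: true}
	fmt.Println(shouldEnqueue(generic, false), shouldEnqueue(generic, true)) // prints false true
}
```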
Extend the admin backup to cover checker configurations, plans,
evaluations and executions — previously these were stored but silently
lost on restore. Add RestoreX storage methods so primary records keep
their original Id and secondary indexes are rebuilt (Create* generates
new IDs, Update* requires an existing record to clean stale indexes).
Thread a dropInvalid bool through every TidyUpUseCase method and
expose it as a drop_invalid query parameter on POST /tidy (default
true). When set, Tidy deletes records that fail to decode — e.g.
legacy executions and evaluations whose CheckState.Status was stored
as a string before the SDK switched it to int — instead of leaving
them stuck in the store to log on every iteration.
Also reset KVIterator.err on exhaustion so a prior decode failure
does not surface as a spurious iteration error.
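The drop-on-decode-failure path can be illustrated with encoding/json, which rejects a string where an int field is expected. The tidy function and the one-field checkState struct are a simplified sketch, not the TidyUpUseCase code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkState mirrors only the field relevant to the legacy decode failure.
type checkState struct {
	Status int `json:"status"`
}

// tidy walks the store and, when dropInvalid is set, deletes records that no
// longer decode. It returns the keys it removed.
func tidy(store map[string][]byte, dropInvalid bool) []string {
	var dropped []string
	for key, raw := range store {
		var st checkState
		if err := json.Unmarshal(raw, &st); err != nil {
			if dropInvalid {
				delete(store, key) // deleting during range is safe in Go
				dropped = append(dropped, key)
			}
			continue
		}
	}
	return dropped
}

func main() {
	store := map[string][]byte{
		"exec-1": []byte(`{"status": 2}`),
		"exec-2": []byte(`{"status": "ok"}`), // legacy string status
	}
	fmt.Println(tidy(store, true), len(store)) // prints [exec-2] 1
}
```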
The whoisparser library does not return ErrNotFoundDomain for Verisign
"No match" responses — it parses them into a result with an empty
Domain field. Add a post-parse check to detect this case and return
ErrDomainDoesNotExist.
Replaces the three REST count calls with a single Prometheus scrape that
auto-refreshes every 15s, surfaces queue/worker/in-flight/RSS/version/uptime
as featured cards, and tucks counters and Go runtime stats under a
"Show more metrics" Collapse.
Metrics endpoints now skip incomplete/planned executions by passing a
`doneExecution` filter so only fully-evaluated runs contribute to the
Prometheus output.
The metrics endpoints now negotiate response format via the Accept
header: application/json returns the JSON array, anything else returns
the Prometheus text exposition format.
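A minimal sketch of the format selection, assuming a plain media-type match with no q-value weighting (responseFormat is a hypothetical helper, not the handler's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// responseFormat returns "json" when the Accept header lists
// application/json and "prometheus" for anything else.
// Assumption: q-values are ignored; any mention of the type is enough.
func responseFormat(accept string) string {
	for _, part := range strings.Split(accept, ",") {
		mediaType := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(mediaType, "application/json") {
			return "json"
		}
	}
	return "prometheus"
}

func main() {
	fmt.Println(responseFormat("application/json")) // prints json
	fmt.Println(responseFormat("text/plain"))       // prints prometheus
}
```

Defaulting to the text exposition format keeps scrapers working even when they send no Accept header.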
Add providerName field to DNSControlAdapterNSProvider and wrap GetZoneRecords,
GetZoneCorrections, CreateDomain, and ListZones with timing and call counters
using happydomain_provider_api_calls_total and happydomain_provider_api_duration_seconds.
Expose four live gauges queried at each scrape via a custom Collector:
- happydomain_registered_users_total
- happydomain_domains_total
- happydomain_zones_total
- happydomain_providers_total
- Add HTTP metrics middleware to public router in setupRouter()
- Wrap storage with InstrumentedStorage after initialization
- Set build info metric from main() with actual version string
- Promote prometheus/client_golang to direct dependency
Add ContactInfo struct to DomainInfo and extract contact data (registrant,
admin, tech) from both RDAP and WHOIS responses. Introduce a new
domain_contact checker that compares actual contact fields against
user-specified expected values, with redaction detection for
privacy-protected domains. The WHOIS observation provider now also exposes
contact data so the new rule can reuse the same lookup as domain_expiry.
Registers the external checker-ns-restrictions plugin, which probes each
authoritative nameserver for AXFR/IXFR acceptance, recursion availability,
RFC 8482 ANY handling and authoritative status.
The /api/domaininfo/:domain route is unauthenticated and proxies
outbound RDAP/WHOIS queries, making it an abuse vector. Add a
per-IP rate limiter (10 req/min) using gin-rate-limit to mitigate
enumeration and proxy abuse while keeping the endpoint public.
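For illustration, the per-IP limit can be approximated with a fixed-window counter. This sketch is not gin-rate-limit and does not reproduce its API; ipLimiter and its methods are hypothetical, and timestamps are passed in as Unix seconds to keep the logic easy to exercise.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window counts hits for one client within the current fixed window.
type window struct {
	start int64
	count int
}

// ipLimiter allows at most limit requests per windowSec seconds per IP.
type ipLimiter struct {
	mu        sync.Mutex
	limit     int
	windowSec int64
	hits      map[string]*window
}

func newIPLimiter(limit int, windowSec int64) *ipLimiter {
	return &ipLimiter{limit: limit, windowSec: windowSec, hits: make(map[string]*window)}
}

// Allow records a request from ip at nowUnix and reports whether it fits
// within the current window's budget.
func (l *ipLimiter) Allow(ip string, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	w, ok := l.hits[ip]
	if !ok || nowUnix-w.start >= l.windowSec {
		l.hits[ip] = &window{start: nowUnix, count: 1}
		return true
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

func main() {
	l := newIPLimiter(10, 60) // 10 requests per minute
	now := time.Now().Unix()
	for i := 1; i <= 11; i++ {
		fmt.Println(i, l.Allow("203.0.113.7", now))
	}
}
```

A production limiter would also evict idle entries so the map does not grow without bound.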
Introduces a domaininfo package with RDAP and WHOIS getters, exposed
through a new DomainInfoUsecase and /api/domaininfo/:domain route (also
mounted under domain scope). Adds a /whois frontend page and a zone
sidebar modal to display registrar, dates, nameservers and status.
Wire the UserQuota.MaxChecksPerDay field into the scheduler via the
UserGater. An in-memory daily counter per user (reset at UTC midnight)
gates scheduled executions, with a two-tier heuristic that skips
short-interval jobs first once the budget is 80% consumed, so
rare/important checks are not starved by frequent
pings. Planned executions returned by ListPlannedExecutions are marked
with a new ExecutionRateLimited status when the user is over
budget. Manual API triggers bypass the quota.
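The two-tier gate might be sketched as follows. Several details are assumptions: the one-hour cut-off for "short-interval" jobs, treating a zero quota as unlimited, and the userGater/Allow names; times are Unix seconds so the UTC-day rollover is a simple division.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	secondsPerDay    = 86400
	shortIntervalSec = 3600 // assumed cut-off for "short-interval" jobs
)

// userGater keeps an in-memory per-user counter for the current UTC day.
type userGater struct {
	mu   sync.Mutex
	day  int64
	used map[string]int
}

// Allow reports whether a scheduled run may proceed and counts it if so.
func (g *userGater) Allow(user string, maxPerDay int, intervalSec, nowUnix int64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()

	// Unix time starts at UTC midnight, so integer division yields the UTC day.
	if day := nowUnix / secondsPerDay; day != g.day || g.used == nil {
		g.day = day
		g.used = make(map[string]int)
	}
	if maxPerDay <= 0 {
		return true // assumption: zero means no quota
	}
	used := g.used[user]
	if used >= maxPerDay {
		return false // budget exhausted
	}
	if used*100 >= maxPerDay*80 && intervalSec < shortIntervalSec {
		return false // past 80%: skip frequent jobs first
	}
	g.used[user] = used + 1
	return true
}

func main() {
	g := &userGater{}
	for i := 0; i < 8; i++ {
		g.Allow("alice", 10, 60, 0)
	}
	fmt.Println(g.Allow("alice", 10, 60, 0))    // prints false
	fmt.Println(g.Allow("alice", 10, 86400, 0)) // prints true
}
```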
Tidy and scheduler now check both the WIP zone ([0]) and the latest
published zone ([1]) so that services being drafted are not cleaned up
or ignored by the scheduler. Auto-fill searches WIP first for the
best user experience when configuring new services.
ZoneHistory is ordered [WIP, newest-published, ..., oldest-published].
The tidy, scheduler, and auto-fill code was using ZoneHistory[len-1]
(the oldest zone) instead of ZoneHistory[1] (the latest published).
This caused the scheduler to enumerate services from the oldest zone
snapshot, tidy to check service existence against outdated data, and
auto-fill to resolve from the wrong zone.
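The indexing rule can be made explicit with two small helpers. Zone ids are plain strings here and both function names are hypothetical.

```go
package main

import "fmt"

// latestPublished returns the newest published zone from a history ordered
// [WIP, newest-published, ..., oldest-published].
func latestPublished(history []string) (string, bool) {
	if len(history) < 2 {
		return "", false // only the WIP zone (or nothing) exists
	}
	return history[1], true
}

// zonesToInspect returns the WIP zone and the latest published zone, the two
// zones tidy and the scheduler need to look at.
func zonesToInspect(history []string) []string {
	if len(history) > 2 {
		return history[:2]
	}
	return history
}

func main() {
	history := []string{"wip", "pub-3", "pub-2", "pub-1"}
	latest, _ := latestPublished(history)
	fmt.Println(latest)                  // prints pub-3
	fmt.Println(zonesToInspect(history)) // prints [wip pub-3]
}
```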