checker: report transient apex-lookup failures as Unknown, not Crit

apexLookupRule mapped every findApex failure to Crit, including transport
and resolver faults like "lookup nemunai.re on 127.0.0.11:53: server
misbehaving", which points to a flaky recursive resolver, not a broken
delegation. That made the check flap into Crit whenever the resolver
hiccuped, the same class of false alarm the chain path already fixed.

Mark apex-lookup failures that stem from a transport/resolver fault
(resolveZoneNSAddrs net errors, recursiveExchange transport errors, and
SERVFAIL/REFUSED seen during the SOA walk) as transient via a typed
error, surface it as ApexLookupTransient, and have apexLookupRule report
Unknown for those. Definitive failures (NXDOMAIN-only walk, no resolvable
NS) still drive Crit.
nemunaire 2026-06-18 10:05:51 +09:00
commit da6def100c
7 changed files with 123 additions and 23 deletions

@@ -47,16 +47,23 @@ type ChainTermination struct {
 	Subject string `json:"subject,omitempty"`
 	Detail string `json:"detail,omitempty"`
 	Rcode string `json:"rcode,omitempty"` // only with TermRcode
+	// Transient is meaningful with TermQueryErr: true when the query could not be
+	// completed because of a transport/resolver fault (could not observe), false
+	// when it stems from definitive evidence such as a target with no locatable apex.
+	Transient bool `json:"transient,omitempty"`
 }
 // AliasData carries raw facts only; judgement is delegated to the rules.
 type AliasData struct {
 	Owner string `json:"owner"`
-	// Apex is empty iff the apex lookup failed; ApexLookupError explains why.
-	Apex string `json:"apex,omitempty"`
-	ApexLookupError string `json:"apex_lookup_error,omitempty"`
-	AuthServers []string `json:"auth_servers,omitempty"`
+	// Apex is empty iff the apex lookup failed; ApexLookupError explains why and
+	// ApexLookupTransient is true when the failure was a transport/resolver fault
+	// (could not observe) rather than definitive evidence the apex is missing.
+	Apex string `json:"apex,omitempty"`
+	ApexLookupError string `json:"apex_lookup_error,omitempty"`
+	ApexLookupTransient bool `json:"apex_lookup_transient,omitempty"`
+	AuthServers []string `json:"auth_servers,omitempty"`
 	Chain []ChainHop `json:"chain,omitempty"`
 	ChainTerminated ChainTermination `json:"chain_terminated"`