queryAtAuth already failed over on transport errors but treated any DNS
response as final, so a SERVFAIL from the first auth server terminated the
chain as Crit even when a sibling server would answer NOERROR. This made
the check flap against a flaky server. Treat SERVFAIL/REFUSED as transient
and try the remaining servers, returning a definitive answer when any
server gives one and only falling back to the transient response (or the
last transport error) when every server fails.
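
The failover order described above can be sketched roughly as follows. This is not the actual queryAtAuth code: the response type, the queryFunc hook, and queryWithFailover are hypothetical names, and preferring a transient response over a transport error when both occurred is an assumption read from the wording above.

```go
package main

import (
	"errors"
	"fmt"
)

// DNS response codes (values from RFC 1035).
const (
	RcodeNoError  = 0
	RcodeServFail = 2
	RcodeRefused  = 5
)

// response is a hypothetical stand-in for a parsed DNS reply.
type response struct {
	Server string
	Rcode  int
}

// queryFunc is a hypothetical hook that sends one query to one server.
// A non-nil error models a transport failure (timeout, connection refused).
type queryFunc func(server string) (*response, error)

// isTransient reports whether an rcode should trigger failover.
func isTransient(rcode int) bool {
	return rcode == RcodeServFail || rcode == RcodeRefused
}

// queryWithFailover tries each server in order and returns the first
// definitive response. If no server gives one, it falls back to the most
// recent transient response, and only then to the last transport error.
func queryWithFailover(servers []string, query queryFunc) (*response, error) {
	var transient *response
	var lastErr error
	for _, s := range servers {
		resp, err := query(s)
		if err != nil {
			lastErr = err // transport failure: remember it, try the next server
			continue
		}
		if isTransient(resp.Rcode) {
			transient = resp // SERVFAIL/REFUSED: remember it, try the next server
			continue
		}
		return resp, nil // definitive answer
	}
	if transient != nil {
		return transient, nil
	}
	if lastErr == nil {
		lastErr = errors.New("no servers to query")
	}
	return nil, lastErr
}

func main() {
	// ns0 is unreachable, ns1 answers SERVFAIL, ns2 answers NOERROR.
	replies := map[string]int{"ns1": RcodeServFail, "ns2": RcodeNoError}
	query := func(s string) (*response, error) {
		rc, ok := replies[s]
		if !ok {
			return nil, fmt.Errorf("dial %s: connection refused", s)
		}
		return &response{Server: s, Rcode: rc}, nil
	}
	resp, err := queryWithFailover([]string{"ns0", "ns1", "ns2"}, query)
	fmt.Println(resp.Server, resp.Rcode, err)
}
```

With this shape, one flaky server no longer decides the outcome as long as any sibling gives a definitive answer.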

A transport-level query failure (connection refused, timeout, network
unreachable) means the alias state could not be observed, not that the
alias is misconfigured. Mapping it to Warn made the check flap whenever a
flaky auth server alternated between refusing connections (Warn) and
answering SERVFAIL (Crit). Report TermQueryErr as Unknown so only
definitive DNS evidence drives Warn/Crit.
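
A minimal sketch of the resulting mapping, for illustration only. TermQueryErr and the Warn/Crit/Unknown statuses come from the text above; TermResolved, TermServFail, the OK status, and statusFor are hypothetical names that stand in for whatever the real code uses.

```go
package main

import "fmt"

// Status is the check result severity.
type Status int

const (
	OK Status = iota
	Warn
	Crit
	Unknown
)

func (s Status) String() string {
	return [...]string{"OK", "Warn", "Crit", "Unknown"}[s]
}

// Termination says why the alias chain walk stopped.
type Termination int

const (
	TermResolved Termination = iota // hypothetical: chain resolved cleanly
	TermServFail                    // hypothetical: chain ended on SERVFAIL
	TermQueryErr                    // named in the text: transport-level failure
)

// statusFor maps a termination to a status. A transport failure says
// nothing about the alias itself, so it maps to Unknown rather than Warn.
func statusFor(t Termination) Status {
	switch t {
	case TermResolved:
		return OK
	case TermServFail:
		return Crit
	case TermQueryErr:
		return Unknown
	default:
		// Anything unrecognised is also treated as "could not observe".
		return Unknown
	}
}

func main() {
	for _, t := range []Termination{TermResolved, TermServFail, TermQueryErr} {
		fmt.Println(t, "->", statusFor(t))
	}
}
```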

A recursive resolver following a CNAME returns the target zone's SOA in
the answer, which made findApex wrongly treat a CNAME owner as an apex.
Only accept a SOA whose owner is the candidate itself.
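
The owner check can be sketched like this. The rr type, canonical, and isApex are hypothetical names rather than the real findApex internals, and the sketch assumes owner names are compared case-insensitively with any trailing dot ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// rr is a hypothetical, minimal view of a record in an answer section.
type rr struct {
	Owner string
	Type  string
}

// canonical lowercases a name and makes it fully qualified so that
// "Example.COM" and "example.com." compare equal.
func canonical(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, ".")) + "."
}

// isApex reports whether the answer holds an SOA owned by the candidate
// itself. An SOA owned by some other name (for example the zone of a
// CNAME target) does not make the candidate an apex.
func isApex(candidate string, answer []rr) bool {
	want := canonical(candidate)
	for _, r := range answer {
		if r.Type == "SOA" && canonical(r.Owner) == want {
			return true
		}
	}
	return false
}

func main() {
	// A resolver followed the CNAME and returned the target zone's SOA.
	answer := []rr{
		{Owner: "www.example.com.", Type: "CNAME"},
		{Owner: "example.net.", Type: "SOA"},
	}
	fmt.Println(isApex("www.example.com", answer))
	fmt.Println(isApex("example.net", answer))
}
```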

Extract querySiblings from observeCoexistence so both CNAME and DNAME
coexistence checks share the same parallel RRset scan. Add
observeDNAMECoexistence (called from Collect) that populates
AliasData.DNAMECoexistence for each DNAME node in DNAMESubstitutions.
Add the dname_coexistence rule (RFC 6672 §2.3) that flags any sibling
RRsets at a DNAME owner as Crit, with matching tests.
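
One possible shape for the shared scan and the new rule, as a sketch under assumptions. lookupFunc, scanSiblings, finding, and dnameCoexistence are hypothetical names, not the real querySiblings or rule code, and lookup errors are simply counted as "not present" here, which the real code may handle differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// lookupFunc is a hypothetical hook: does owner have an RRset of rrtype?
type lookupFunc func(owner, rrtype string) (bool, error)

// scanSiblings queries every type concurrently and returns the types that
// are present, sorted. Each goroutine writes only its own slice slot, so
// no further locking is needed.
func scanSiblings(owner string, types []string, lookup lookupFunc) []string {
	found := make([]bool, len(types))
	var wg sync.WaitGroup
	for i, t := range types {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			ok, err := lookup(owner, t)
			found[i] = err == nil && ok // simplification: errors count as absent
		}(i, t)
	}
	wg.Wait()
	var present []string
	for i, ok := range found {
		if ok {
			present = append(present, types[i])
		}
	}
	sort.Strings(present)
	return present
}

// finding is one rule violation: a DNAME owner that has sibling RRsets.
type finding struct {
	Owner    string
	Siblings []string
}

// dnameCoexistence returns a finding for every DNAME owner with at least
// one sibling RRset, ordered by owner name.
func dnameCoexistence(siblings map[string][]string) []finding {
	owners := make([]string, 0, len(siblings))
	for owner, s := range siblings {
		if len(s) > 0 {
			owners = append(owners, owner)
		}
	}
	sort.Strings(owners)
	out := make([]finding, 0, len(owners))
	for _, owner := range owners {
		out = append(out, finding{Owner: owner, Siblings: siblings[owner]})
	}
	return out
}

func main() {
	// Fake zone data: only old.example. has records next to its DNAME.
	zone := map[string]map[string]bool{"old.example.": {"A": true, "TXT": true}}
	lookup := func(owner, rrtype string) (bool, error) {
		return zone[owner][rrtype], nil
	}
	observed := map[string][]string{}
	for _, owner := range []string{"old.example.", "clean.example."} {
		observed[owner] = scanSiblings(owner, []string{"TXT", "A", "MX"}, lookup)
	}
	for _, f := range dnameCoexistence(observed) {
		fmt.Println("Crit:", f.Owner, f.Siblings)
	}
}
```

Because the scan takes the owner and type list as parameters, the same helper can serve both the CNAME and the DNAME checks.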