checker-smtp

Deep SMTP checker for the MX-based inbound mail service of a happyDomain domain.

For every MX target of the zone, it performs the live probes a human operator would run with swaks or telnet … 25: TCP connect, ESMTP banner & EHLO, STARTTLS negotiation, mail-transaction (null sender, postmaster, open-relay) probes, reverse DNS / FCrDNS, extension inventory, and IPv4/IPv6 coverage. The result is an actionable HTML report whose "What to fix" panel foregrounds the most common real-world failures rather than burying them in endpoint tabs.

Scope

This checker probes the inbound side of the domain's mail service: it connects to each MX target and exercises the SMTP server's protocol-level posture (banner, EHLO, STARTTLS handshake, mail transactions stopped at RCPT, reverse DNS, IPv4/IPv6 coverage…).

It does not test outbound deliverability: SPF/DKIM/DMARC alignment, ARC, BIMI, spam scoring (SpamAssassin/rspamd), blacklist status, header hygiene or message content are not evaluated here. Those require actually emitting a message from the domain and analysing what arrives; that is the job of checker-happydeliver, which drives a happyDeliver instance.

In short: checker-smtp answers "can this domain receive mail correctly?", while checker-happydeliver answers "does mail this domain sends land in the inbox?".

TLS certificate chain / SAN / expiry / cipher posture is also out of scope: a dedicated TLS checker handles that. This checker only confirms STARTTLS completes and records the negotiated TLS version/cipher for context.

We publish each MX target as a DiscoveryEntry of type tls.endpoint.v1 (contract: git.happydns.org/checker-tls/contract) with STARTTLS="smtp" and RequireSTARTTLS=false: TLS on port 25 is opportunistic, and making it mandatory is a matter of publishing MTA-STS or DANE, which dedicated checkers cover. checker-tls picks up those entries and evaluates certificate posture on the same endpoints our probe just validated. The resulting tls_probes observations are folded back into our rule aggregation and HTML report via ObservationGetter.GetRelated / ReportContext.Related, so a bad certificate on an MX shows up on the SMTP service page, not only in a separate TLS view.
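
As an illustration of that hand-off, the entry published for one MX target might look like the sketch below. The struct layout and JSON field names are hypothetical; only the type identifier tls.endpoint.v1 and the STARTTLS / RequireSTARTTLS values come from the paragraph above. The authoritative shape is whatever the checker-tls contract defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tlsEndpoint is a HYPOTHETICAL stand-in for the tls.endpoint.v1 payload.
// The real definition lives in the checker-tls contract package.
type tlsEndpoint struct {
	Host            string `json:"host"`
	Port            int    `json:"port"`
	STARTTLS        string `json:"starttls"`
	RequireSTARTTLS bool   `json:"require_starttls"`
}

// discoveryEntry is likewise an assumed envelope: a type tag plus a payload.
type discoveryEntry struct {
	Type    string      `json:"type"`
	Payload tlsEndpoint `json:"payload"`
}

// entryForMX builds the entry for one MX target on port 25.
func entryForMX(host string) discoveryEntry {
	return discoveryEntry{
		Type: "tls.endpoint.v1",
		Payload: tlsEndpoint{
			Host:            host,
			Port:            25,
			STARTTLS:        "smtp",
			RequireSTARTTLS: false, // opportunistic on port 25
		},
	}
}

func main() {
	out, err := json.MarshalIndent(entryForMX("mx1.example.com"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```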

Rules

| Code | Description | Severity |
| --- | --- | --- |
| smtp.null_mx | Reports whether the domain publishes a null MX (RFC 7505), declaring it does not accept mail. | INFO |
| smtp.mx_present | Verifies the domain publishes at least one MX record (or a null MX). | CRITICAL |
| smtp.mx_sanity | Flags MX targets that violate RFC 5321 § 5.1 (IP literals, CNAME chains, unresolved names). | CRITICAL |
| smtp.endpoint_reachable | Verifies every MX endpoint accepts a TCP connection on port 25. | CRITICAL |
| smtp.banner_sanity | Verifies every reachable endpoint emits a 220 SMTP greeting. | CRITICAL |
| smtp.ehlo_supported | Verifies every endpoint accepts EHLO (required for STARTTLS, PIPELINING, SIZE, …). | CRITICAL |
| smtp.starttls_offered | Verifies every endpoint advertises the STARTTLS extension. | CRITICAL |
| smtp.starttls_handshake | Verifies the STARTTLS handshake succeeds wherever STARTTLS is advertised. | CRITICAL |
| smtp.auth_posture | Flags endpoints that advertise SMTP AUTH before STARTTLS (cleartext credentials). | CRITICAL |
| smtp.reverse_dns | Verifies every endpoint has a matching PTR record (FCrDNS). | WARNING |
| smtp.null_sender | Verifies endpoints accept the null sender MAIL FROM:<> (required for DSNs). | CRITICAL |
| smtp.postmaster | Verifies endpoints accept RCPT TO:<postmaster@domain> (RFC 5321 § 4.5.1). | CRITICAL |
| smtp.open_relay | Flags endpoints that relay mail for recipients outside the tested domain. | CRITICAL |
| smtp.extension_posture | Reports ESMTP extension posture (PIPELINING, 8BITMIME). | INFO |
| smtp.ipv6_reachable | Verifies at least one MX endpoint is reachable over IPv6. | INFO |
| smtp.tls_quality | Folds downstream TLS checker findings (certificate chain, hostname match, expiry) onto SMTP. | CRITICAL |
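
To make the smtp.reverse_dns rule concrete, here is a minimal sketch of a forward-confirmed reverse DNS (FCrDNS) check: look up the PTR names of the endpoint IP, resolve each name forward, and require that one of the resulting addresses is the original IP. The resolver type, the function names and the fake resolver are illustrative assumptions, not the checker's actual code; in a live probe the two lookups could be backed by net.LookupAddr and net.LookupHost. The returned strings reuse the issue codes smtp.ptr.missing and smtp.fcrdns.mismatch that the report uses.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// resolver abstracts the two lookups FCrDNS needs, so the logic can be
// exercised without network access (hypothetical type).
type resolver struct {
	lookupAddr func(ip string) ([]string, error)   // PTR lookup
	lookupHost func(name string) ([]string, error) // A/AAAA lookup
}

// fcrdnsIssue returns "" when the IP is forward-confirmed, or an issue code.
func fcrdnsIssue(r resolver, ip string) (string, error) {
	want, err := netip.ParseAddr(ip)
	if err != nil {
		return "", fmt.Errorf("invalid endpoint IP %q: %w", ip, err)
	}
	names, err := r.lookupAddr(ip)
	if err != nil || len(names) == 0 {
		return "smtp.ptr.missing", nil
	}
	for _, name := range names {
		addrs, err := r.lookupHost(strings.TrimSuffix(name, "."))
		if err != nil {
			continue // try the next PTR name
		}
		for _, a := range addrs {
			// Compare parsed addresses so textual variants of one IP match.
			if got, err := netip.ParseAddr(a); err == nil && got.Unmap() == want.Unmap() {
				return "", nil
			}
		}
	}
	return "smtp.fcrdns.mismatch", nil
}

// fakeResolver serves lookups from in-memory maps, for demos and tests.
func fakeResolver(ptr, fwd map[string][]string) resolver {
	return resolver{
		lookupAddr: func(ip string) ([]string, error) { return ptr[ip], nil },
		lookupHost: func(name string) ([]string, error) { return fwd[name], nil },
	}
}

func main() {
	r := fakeResolver(
		map[string][]string{"192.0.2.10": {"mx1.example.com."}},
		map[string][]string{"mx1.example.com": {"192.0.2.10"}},
	)
	for _, ip := range []string{"192.0.2.10", "192.0.2.99"} {
		issue, err := fcrdnsIssue(r, ip)
		fmt.Printf("%s issue=%q err=%v\n", ip, issue, err)
	}
}
```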

Most common failures and how the report addresses them

| Symptom | Issue code | Report message |
| --- | --- | --- |
| MX target is a CNAME | smtp.mx.cname | CRIT, fix suggests replacing CNAME with A/AAAA |
| No STARTTLS on any endpoint | smtp.all_no_starttls | CRIT, fix mentions Postfix/Exim settings and MTA-STS/DANE next steps |
| AUTH advertised over plaintext port 25 | smtp.auth.plaintext | CRIT, fix suggests smtpd_tls_auth_only=yes / moving auth to 587 |
| postmaster@ rejected | smtp.postmaster.rejected | CRIT, cites RFC 5321 § 4.5.1 |
| Bounces (MAIL FROM:<>) rejected | smtp.null_sender.rejected | CRIT |
| Missing PTR or FCrDNS mismatch | smtp.ptr.missing, smtp.fcrdns.mismatch | WARN, names Gmail/Outlook/Yahoo impact |
| Open relay | smtp.open_relay | CRIT (the endpoint panel also shows a red "OPEN RELAY" badge in the summary) |
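
Two of these failures can be derived directly from the EHLO reply captured before TLS. The sketch below shows one way to do that: parse the verbatim reply lines into an extension map, then raise smtp.all_no_starttls when no endpoint advertises STARTTLS and smtp.auth.plaintext when any endpoint advertises AUTH in cleartext. The function names and the input shape (endpoint name mapped to reply lines) are assumptions made for illustration; only the issue codes come from the table above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseExtensions turns verbatim EHLO reply lines ("250-SIZE 10240000",
// "250 STARTTLS") into a keyword -> parameters map. The first line is the
// server greeting and carries no extension, so it is skipped.
func parseExtensions(lines []string) map[string]string {
	ext := make(map[string]string)
	for i, line := range lines {
		if i == 0 || len(line) < 4 {
			continue
		}
		body := strings.TrimSpace(line[4:]) // drop "250-" or "250 "
		if body == "" {
			continue
		}
		keyword, params, _ := strings.Cut(body, " ")
		ext[strings.ToUpper(keyword)] = params
	}
	return ext
}

// postureIssues inspects the pre-TLS EHLO reply of every endpoint
// (hypothetical input shape: endpoint name -> reply lines).
func postureIssues(preTLS map[string][]string) []string {
	anySTARTTLS, authPlain := false, false
	for _, lines := range preTLS {
		ext := parseExtensions(lines)
		if _, ok := ext["STARTTLS"]; ok {
			anySTARTTLS = true
		}
		if _, ok := ext["AUTH"]; ok {
			authPlain = true // credentials would travel in cleartext
		}
	}
	var issues []string
	if len(preTLS) > 0 && !anySTARTTLS {
		issues = append(issues, "smtp.all_no_starttls")
	}
	if authPlain {
		issues = append(issues, "smtp.auth.plaintext")
	}
	return issues
}

func main() {
	replies := map[string][]string{
		"mx1.example.com": {
			"250-mx1.example.com",
			"250-PIPELINING",
			"250 AUTH PLAIN LOGIN",
		},
	}
	fmt.Println(postureIssues(replies))
}
```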

Usage

Standalone HTTP server

make
./checker-smtp -listen :8080

The standalone binary also exposes a browser-friendly GET /check page (via the SDK's CheckerInteractive interface): enter a domain, submit, and the same Collect → Evaluate → HTML-report pipeline runs without needing a happyDomain instance in front. MX records are looked up live; no zone payload is required.

Docker

make docker
docker run -p 8080:8080 happydomain/checker-smtp

happyDomain plugin

make plugin

Options

| Scope | Id | Default | Description |
| --- | --- | --- | --- |
| Run | domain | (none) | Domain to test (auto-filled from the service). |
| Run | timeout | 12 | Per-endpoint timeout, in seconds. |
| Run | helo_name | mx-checker.happydomain.org | Hostname announced in EHLO/HELO. Pick a name with valid A/AAAA and PTR. |
| Run | test_null_sender | true | Probe MAIL FROM:<> (RFC 5321 DSN acceptance). |
| Run | test_postmaster | true | Probe RCPT TO:<postmaster@domain> (RFC 5321 § 4.5.1). |
| Run | test_open_relay | true | Probe RCPT TO:<recipient-outside-domain> to detect open relays. |
| Run | test_probe_address | postmaster@example.com | Recipient used for the open-relay probe. Automatically overridden when its domain equals the tested domain. |

Applies to services of type svcs.MXs (the DNS-level MX record set).
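
For readers wiring these options into their own tooling, the defaults above can be captured as in the sketch below. The struct and function names are hypothetical and do not reflect the checker's real option handling. The override check encodes one reading of the test_probe_address rule (the address's domain part equals the tested domain); what the address is replaced with is not documented here, so the sketch only detects the condition.

```go
package main

import (
	"fmt"
	"strings"
)

// runOptions mirrors the option Ids of the table (hypothetical struct).
type runOptions struct {
	Domain           string
	TimeoutSeconds   int
	HeloName         string
	TestNullSender   bool
	TestPostmaster   bool
	TestOpenRelay    bool
	TestProbeAddress string
}

// defaultOptions fills in the documented defaults for a given domain.
func defaultOptions(domain string) runOptions {
	return runOptions{
		Domain:           domain,
		TimeoutSeconds:   12,
		HeloName:         "mx-checker.happydomain.org",
		TestNullSender:   true,
		TestPostmaster:   true,
		TestOpenRelay:    true,
		TestProbeAddress: "postmaster@example.com",
	}
}

// probeNeedsOverride reports whether the open-relay recipient lives in the
// tested domain, in which case accepting it would not prove relaying.
// ASSUMPTION: the comparison is case-insensitive and ignores a trailing dot.
func probeNeedsOverride(testedDomain, probeAddr string) bool {
	at := strings.LastIndex(probeAddr, "@")
	if at < 0 {
		return false
	}
	norm := func(s string) string {
		return strings.ToLower(strings.TrimSuffix(s, "."))
	}
	return norm(probeAddr[at+1:]) == norm(testedDomain)
}

func main() {
	opts := defaultOptions("example.com")
	fmt.Printf("%+v\n", opts)
	fmt.Println("override needed:", probeNeedsOverride(opts.Domain, opts.TestProbeAddress))
}
```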

Safety / hosted deployment

The checker connects out to arbitrary SMTP servers on port 25 with the host's IP, and concatenates user-supplied values (domain, helo_name, test_probe_address) into SMTP commands. Two consequences are worth considering before exposing the standalone server (or its GET /check form) to untrusted users:

  • CRLF / SMTP-command injection is mitigated: domain and helo_name are validated as hostnames, and test_probe_address is validated as an addr-spec. Inputs containing CR, LF, <, > or other SMTP metacharacters are rejected before any command is written to the wire.
  • Probe-from-our-IP abuse vector remains: anyone who can reach the service can have it open SMTP connections to any host:25, optionally with an attacker-chosen RCPT (the open-relay probe). This is functionally similar to an SSRF: outbound traffic appears to come from the checker's address and may trigger blocklisting or abuse reports against the operator. When deploying publicly, gate access behind authentication, add per-IP rate limiting, and consider restricting target domains (e.g. only domains owned by the requester) before exposing the form. The happyDomain plugin path is unaffected: targets there are always the MXs of the zone the user already controls.
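
The input validation described in the first point can be sketched as follows. This is a deliberately strict illustration, not the checker's actual validator: hostnames are limited to LDH labels, and the addr-spec check accepts only unquoted local parts, which is narrower than what RFC 5321 allows. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// validHostname accepts dot-separated LDH labels (letters, digits, hyphen),
// which rules out CR, LF, spaces, '<', '>' and every other metacharacter.
func validHostname(s string) bool {
	s = strings.TrimSuffix(s, ".")
	if len(s) == 0 || len(s) > 253 {
		return false
	}
	for _, label := range strings.Split(s, ".") {
		if len(label) == 0 || len(label) > 63 ||
			label[0] == '-' || label[len(label)-1] == '-' {
			return false
		}
		for i := 0; i < len(label); i++ {
			c := label[i]
			isAlnum := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
			if !isAlnum && c != '-' {
				return false
			}
		}
	}
	return true
}

// validAddrSpec accepts local@domain with an unquoted local part made of
// printable ASCII, excluding characters that have meaning in SMTP commands.
func validAddrSpec(s string) bool {
	at := strings.LastIndexByte(s, '@')
	if at < 1 || at > 64 { // empty, missing, or over-long local part
		return false
	}
	local := s[:at]
	for i := 0; i < len(local); i++ {
		c := local[i]
		if c <= ' ' || c >= 0x7f || strings.IndexByte("<>()[]\\,;:\"@", c) >= 0 {
			return false
		}
	}
	return validHostname(s[at+1:])
}

func main() {
	for _, h := range []string{"mx.example.com", "evil.example\r\nRCPT TO:<x@y>"} {
		fmt.Printf("hostname %q valid=%v\n", h, validHostname(h))
	}
	for _, a := range []string{"postmaster@example.com", "a@b.example>\r\nDATA"} {
		fmt.Printf("address %q valid=%v\n", a, validAddrSpec(a))
	}
}
```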

Design notes

  • Why not net/smtp? The standard library's client hides the banner text, muxes multiline responses into a single string, and does not expose the pre- vs post-TLS extension set separately. A bespoke ~200-line SMTP client (see checker/smtp.go) gives us verbatim responses for every step, which is what operators want to see in a diagnostic report.
  • Why stop at RCPT? The open-relay, null-sender and postmaster probes all end at RCPT and emit RSET before the next transaction. We never send DATA, so no mail is actually delivered and no bounces are generated. A receiving server that accepts a spoofed RCPT but would have rejected the message at DATA is still reported as open relay (a sensible choice for a posture check).
  • Certificate posture via checker-tls. MX SMTP on port 25 is opportunistic, so we do not verify the certificate ourselves. Each probed MX target is published as a tls.endpoint.v1 discovery entry with STARTTLS="smtp". checker-tls's resulting observations are folded back into the rule aggregation and the HTML report via the SDK's GetRelated / ReportContext.Related path (same pattern as checker-xmpp).
  • No DANE / MTA-STS checks here. These are policy surfaces, not connection-time behaviours, and deserve their own checkers (checker-dane on TLSA records, checker-mta-sts on the TXT/HTTPS policy artefact). This checker answers the question "does the MX actually work?"; policy enforcement layers on top.
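
The first two design notes can be illustrated together. The sketch below reads an SMTP reply while keeping every line verbatim (multiline replies continue while the fourth character is a hyphen), then runs one transaction up to RCPT and resets it without ever sending DATA. It is a simplified illustration under stated assumptions, not an excerpt of checker/smtp.go: the reply type, the function names and the scripted fake peer are all hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// reply keeps the status code and every line exactly as the server sent it.
type reply struct {
	Code  int
	Lines []string // verbatim, CRLF stripped
}

// readReply reads one (possibly multiline) reply: "250-..." lines continue,
// a line with anything other than '-' after the code ends the reply.
func readReply(r *bufio.Reader) (reply, error) {
	var rep reply
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return rep, err
		}
		line = strings.TrimRight(line, "\r\n")
		if len(line) < 3 {
			return rep, fmt.Errorf("short reply line %q", line)
		}
		code, err := strconv.Atoi(line[:3])
		if err != nil {
			return rep, fmt.Errorf("bad reply code in %q", line)
		}
		rep.Code = code
		rep.Lines = append(rep.Lines, line)
		if len(line) == 3 || line[3] != '-' {
			return rep, nil
		}
	}
}

// probeRcpt sends MAIL FROM and RCPT TO, then RSET. DATA is never sent.
// The reader must be the one used for the whole session, since it buffers.
func probeRcpt(r *bufio.Reader, w io.Writer, from, to string) (mail, rcpt reply, err error) {
	send := func(cmd string) (reply, error) {
		if _, werr := io.WriteString(w, cmd+"\r\n"); werr != nil {
			return reply{}, werr
		}
		return readReply(r)
	}
	if mail, err = send("MAIL FROM:<" + from + ">"); err != nil || mail.Code/100 != 2 {
		return mail, rcpt, err // sender refused: no transaction to reset
	}
	if rcpt, err = send("RCPT TO:<" + to + ">"); err != nil {
		return mail, rcpt, err
	}
	_, err = send("RSET")
	return mail, rcpt, err
}

// runScripted drives probeRcpt against canned replies instead of a socket
// and returns the bytes the client wrote.
func runScripted(replies, from, to string) (mail, rcpt reply, sent string, err error) {
	var out strings.Builder
	r := bufio.NewReader(strings.NewReader(replies))
	mail, rcpt, err = probeRcpt(r, &out, from, to)
	return mail, rcpt, out.String(), err
}

func main() {
	script := "250 2.1.0 Ok\r\n550 5.7.1 Relaying denied\r\n250 2.0.0 Ok\r\n"
	_, rcpt, sent, err := runScripted(script, "", "someone@example.net")
	if err != nil {
		panic(err)
	}
	fmt.Printf("sent:\n%s", sent)
	fmt.Println("RCPT reply:", rcpt.Lines, "accepted:", rcpt.Code/100 == 2)
}
```

In a posture check built this way, a 2xx reply to an RCPT outside the tested domain is what would be reported as an open relay.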

License

MIT (see LICENSE). Third-party attributions in NOTICE.