# checker-smtp

Deep SMTP checker for the MX-based inbound mail service of a
[happyDomain](https://www.happydomain.org/) domain.

For every MX target of the zone, it performs the live probes a human
operator would run with `swaks` or `telnet … 25`: TCP connect, ESMTP
banner & EHLO, STARTTLS negotiation, mail-transaction (null sender,
postmaster, open-relay) probes, reverse DNS / FCrDNS, extension
inventory, and IPv4/IPv6 coverage. The result is an actionable HTML
report whose "What to fix" panel foregrounds the most common real-world
failures rather than burying them in endpoint tabs.

## Scope

This checker probes the **inbound** side of the domain's mail service:
it connects to each MX target and exercises the SMTP server's
protocol-level posture (banner, EHLO, STARTTLS handshake, mail
transactions stopped at RCPT, reverse DNS, IPv4/IPv6 coverage…).
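As an illustration of a mail transaction "stopped at RCPT", the command sequence for a single probe could look like the sketch below. `probePlan` is a hypothetical helper written for this README, not a function taken from the checker's source; the point is only that the plan ends with `RSET` and never contains `DATA`.

```go
package main

import "fmt"

// probePlan returns the SMTP commands for one mail-transaction probe.
// Hypothetical helper for illustration: the transaction stops at RCPT
// and is cancelled with RSET, so DATA is never part of the plan.
func probePlan(heloName, sender, recipient string) []string {
	return []string{
		"EHLO " + heloName,
		"MAIL FROM:<" + sender + ">", // an empty sender yields the null sender "<>"
		"RCPT TO:<" + recipient + ">",
		"RSET",
		"QUIT",
	}
}

func main() {
	// Null-sender probe combined with a postmaster recipient.
	for _, cmd := range probePlan("mx-checker.happydomain.org", "", "postmaster@example.org") {
		fmt.Println(cmd)
	}
}
```

A real probe would read and record the server's reply after each command before sending the next one.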

It does **not** test outbound deliverability: SPF/DKIM/DMARC alignment,
ARC, BIMI, spam scoring (SpamAssassin/rspamd), blacklist status, header
hygiene or message content are not evaluated here. Those require
actually emitting a message from the domain and analysing what arrives;
that is the job of `checker-happydeliver`, which drives a
[happyDeliver](https://git.nemunai.re/happyDomain/happyDeliver) instance.

In short: **`checker-smtp` answers "can this domain *receive* mail
correctly?"**, while **`checker-happydeliver` answers "does mail this
domain *sends* land in the inbox?"**.

TLS certificate chain / SAN / expiry / cipher posture is also **out of scope**:
a dedicated TLS checker handles that. This checker only confirms STARTTLS
completes and records the negotiated TLS version/cipher for context.

We publish each MX target as a `DiscoveryEntry` of type
`tls.endpoint.v1` (contract: `git.happydns.org/checker-tls/contract`)
with `STARTTLS="smtp"` and `RequireSTARTTLS=false`: STARTTLS on port 25
is opportunistic, and making it mandatory is a matter of publishing
MTA-STS or DANE, which dedicated checkers cover. `checker-tls` picks up
those entries and runs certificate posture on the same endpoint our
probe just validated; the resulting `tls_probes` observations are folded
back into our rule aggregation and HTML report via
`ObservationGetter.GetRelated` / `ReportContext.Related`, so a bad
certificate on an MX shows up on the SMTP service page, not only in a
separate TLS view.

## Rules

| Code                       | Description                                                                                       | Severity   |
|----------------------------|---------------------------------------------------------------------------------------------------|------------|
| `smtp.null_mx`             | Reports whether the domain publishes a null MX (RFC 7505), declaring it does not accept mail.     | INFO       |
| `smtp.mx_present`          | Verifies the domain publishes at least one MX record (or a null MX).                              | CRITICAL   |
| `smtp.mx_sanity`           | Flags MX targets that violate RFC 5321 § 5.1 (IP literals, CNAME chains, unresolved names).       | CRITICAL   |
| `smtp.endpoint_reachable`  | Verifies every MX endpoint accepts a TCP connection on port 25.                                   | CRITICAL   |
| `smtp.banner_sanity`       | Verifies every reachable endpoint emits a 220 SMTP greeting.                                      | CRITICAL   |
| `smtp.ehlo_supported`      | Verifies every endpoint accepts EHLO (required for STARTTLS, PIPELINING, SIZE, …).                | CRITICAL   |
| `smtp.starttls_offered`    | Verifies every endpoint advertises the STARTTLS extension.                                        | CRITICAL   |
| `smtp.starttls_handshake`  | Verifies the STARTTLS handshake succeeds wherever STARTTLS is advertised.                         | CRITICAL   |
| `smtp.auth_posture`        | Flags endpoints that advertise SMTP AUTH before STARTTLS (cleartext credentials).                 | CRITICAL   |
| `smtp.reverse_dns`         | Verifies every endpoint has a matching PTR record (FCrDNS).                                       | WARNING    |
| `smtp.null_sender`         | Verifies endpoints accept the null sender MAIL FROM:<> (required for DSNs).                       | CRITICAL   |
| `smtp.postmaster`          | Verifies endpoints accept RCPT TO:<postmaster@domain> (RFC 5321 § 4.5.1).                         | CRITICAL   |
| `smtp.open_relay`          | Flags endpoints that relay mail for recipients outside the tested domain.                         | CRITICAL   |
| `smtp.extension_posture`   | Reports ESMTP extension posture (PIPELINING, 8BITMIME).                                           | INFO       |
| `smtp.ipv6_reachable`      | Verifies at least one MX endpoint is reachable over IPv6.                                         | INFO       |
| `smtp.tls_quality`         | Folds downstream TLS checker findings (certificate chain, hostname match, expiry) onto SMTP.      | CRITICAL   |
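
For readers unfamiliar with FCrDNS (the check behind `smtp.reverse_dns`), here is a minimal sketch of the idea: an IP passes when at least one of its PTR names resolves back to the same IP. The `Resolver` type, `staticResolver` and `fcrdns` are hypothetical names for this illustration and are not taken from the checker's source; the lookups are injected so the logic can run without network access.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"strings"
)

// Resolver holds the two lookups FCrDNS needs (hypothetical type).
type Resolver struct {
	LookupAddr func(ip string) ([]string, error)   // PTR lookup
	LookupHost func(name string) ([]string, error) // A/AAAA lookup
}

// staticResolver builds a Resolver backed by in-memory maps.
func staticResolver(ptr, fwd map[string][]string) Resolver {
	lookup := func(m map[string][]string) func(string) ([]string, error) {
		return func(k string) ([]string, error) {
			v, ok := m[k]
			if !ok {
				return nil, errors.New("no such record: " + k)
			}
			return v, nil
		}
	}
	return Resolver{LookupAddr: lookup(ptr), LookupHost: lookup(fwd)}
}

// fcrdns reports whether at least one PTR name of ip resolves back to ip.
func fcrdns(r Resolver, ip string) (bool, error) {
	want, err := netip.ParseAddr(ip)
	if err != nil {
		return false, err
	}
	want = want.Unmap()
	names, err := r.LookupAddr(ip)
	if err != nil {
		return false, err // no PTR at all
	}
	for _, name := range names {
		addrs, err := r.LookupHost(strings.TrimSuffix(name, "."))
		if err != nil {
			continue // dangling PTR name: try the next one
		}
		for _, a := range addrs {
			if got, err := netip.ParseAddr(a); err == nil && got.Unmap() == want {
				return true, nil
			}
		}
	}
	return false, nil // PTR exists but no name maps back to ip
}

func main() {
	r := staticResolver(
		map[string][]string{"192.0.2.25": {"mx1.example.org."}},
		map[string][]string{"mx1.example.org": {"192.0.2.25"}},
	)
	ok, err := fcrdns(r, "192.0.2.25")
	fmt.Println(ok, err)
}
```

Against live DNS, the standard library's `net.LookupAddr` and `net.LookupHost` have matching signatures and can be assigned to the two fields directly.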

## Most common failures and how the report addresses them

| Symptom                                   | Issue code                  | Severity and report guidance |
|-------------------------------------------|-----------------------------|----------------|
| MX target is a CNAME                      | `smtp.mx.cname`             | CRIT, fix suggests replacing CNAME with A/AAAA |
| No STARTTLS on any endpoint               | `smtp.all_no_starttls`      | CRIT, fix mentions Postfix/Exim settings and MTA-STS/DANE next steps |
| `AUTH` advertised over plaintext port 25  | `smtp.auth.plaintext`       | CRIT, fix suggests `smtpd_tls_auth_only=yes` / moving auth to 587 |
| `postmaster@` rejected                    | `smtp.postmaster.rejected`  | CRIT, cites RFC 5321 § 4.5.1 |
| Bounces (`MAIL FROM:<>`) rejected         | `smtp.null_sender.rejected` | CRIT |
| Missing PTR or FCrDNS mismatch            | `smtp.ptr.missing`, `smtp.fcrdns.mismatch` | WARN, names Gmail/Outlook/Yahoo impact |
| Open relay                                | `smtp.open_relay`           | CRIT (the endpoint panel also shows a red "OPEN RELAY" badge in the summary) |

## Usage

### Standalone HTTP server

```bash
make
./checker-smtp -listen :8080
```

The standalone binary also exposes a browser-friendly `GET /check` page
(via the SDK's `CheckerInteractive` interface): enter a domain, submit,
and the same `Collect` → `Evaluate` → HTML-report pipeline runs without
needing a happyDomain instance in front. MX records are looked up live;
no zone payload is required.

### Docker

```bash
make docker
docker run -p 8080:8080 happydomain/checker-smtp
```

### happyDomain plugin

```bash
make plugin
```

## Options

| Scope | Id                    | Default                        | Description |
|-------|-----------------------|--------------------------------|-------------|
| Run   | `domain`              | (none)                         | Domain to test (auto-filled from the service). |
| Run   | `timeout`             | `12`                           | Per-endpoint timeout, in seconds. |
| Run   | `helo_name`           | `mx-checker.happydomain.org`   | Hostname announced in EHLO/HELO. Pick a name with valid A/AAAA and PTR. |
| Run   | `test_null_sender`    | `true`                         | Probe `MAIL FROM:<>` (RFC 5321 DSN acceptance). |
| Run   | `test_postmaster`     | `true`                         | Probe `RCPT TO:<postmaster@domain>` (RFC 5321 § 4.5.1). |
| Run   | `test_open_relay`     | `true`                         | Probe `RCPT TO:<recipient-outside-domain>` to detect open relays. |
| Run   | `test_probe_address`  | `postmaster@example.com`       | Recipient used for the open-relay probe. Automatically overridden when its domain equals the tested domain. |

Applies to services of type `svcs.MXs` (the DNS-level MX record set).

## Safety / hosted deployment

The checker connects out to arbitrary SMTP servers on port 25 with the
host's IP, and concatenates user-supplied values (`domain`, `helo_name`,
`test_probe_address`) into SMTP commands. Two consequences are worth
considering before exposing the standalone server (or its `GET /check`
form) to untrusted users:

- **CRLF / SMTP-command injection** is mitigated: `domain` and
  `helo_name` are validated as hostnames, and `test_probe_address` is
  validated as an addr-spec. Inputs containing CR, LF, `<`, `>` or other
  SMTP metacharacters are rejected before any command is written to the
  wire.
- **Probe-from-our-IP abuse vector** remains: anyone who can reach the
  service can have it open SMTP connections to any host:25, optionally
  with an attacker-chosen RCPT (the open-relay probe). This is
  functionally similar to an SSRF: outbound traffic appears to come
  from the checker's address and may trigger blocklisting or abuse
  reports against the operator. When deploying publicly, gate access
  behind authentication, add per-IP rate limiting, and consider
  restricting target domains (e.g. only domains owned by the requester)
  before exposing the form. The happyDomain plugin path is unaffected:
  targets there are always the MXs of the zone the user already
  controls.
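
To make the injection mitigation concrete, the sketch below shows one conservative way to validate a hostname and an addr-spec before they are written into an SMTP command. It is an illustration under assumptions, not the checker's actual validation code: `validHostname` and `validAddrSpec` are hypothetical names, and the rules are deliberately stricter than RFC 5321 allows (no quoted local parts, no underscores, ASCII only).

```go
package main

import (
	"fmt"
	"strings"
)

// validHostname accepts only dot-separated letter/digit/hyphen labels.
// CR, LF, spaces, '<' and '>' can therefore never pass.
func validHostname(s string) bool {
	s = strings.TrimSuffix(s, ".")
	if len(s) == 0 || len(s) > 253 {
		return false
	}
	for _, label := range strings.Split(s, ".") {
		if len(label) == 0 || len(label) > 63 {
			return false
		}
		if label[0] == '-' || label[len(label)-1] == '-' {
			return false
		}
		for i := 0; i < len(label); i++ {
			c := label[i]
			ok := c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' ||
				c >= '0' && c <= '9' || c == '-'
			if !ok {
				return false
			}
		}
	}
	return true
}

// validAddrSpec accepts local@domain with a restricted local part.
func validAddrSpec(s string) bool {
	at := strings.LastIndexByte(s, '@')
	if at <= 0 || at > 64 { // missing '@', empty or oversized local part
		return false
	}
	local, domain := s[:at], s[at+1:]
	for i := 0; i < len(local); i++ {
		c := local[i]
		// Reject controls, spaces, non-ASCII and SMTP metacharacters.
		if c <= ' ' || c >= 0x7f || strings.IndexByte("<>()[]\\,;:\"@", c) >= 0 {
			return false
		}
	}
	return validHostname(domain)
}

func main() {
	fmt.Println(validHostname("mx-checker.happydomain.org"))
	fmt.Println(validHostname("example.org\r\nRSET"))
	fmt.Println(validAddrSpec("postmaster@example.com"))
	fmt.Println(validAddrSpec("a>@example.com"))
}
```

Rejecting input outright, rather than escaping it, keeps the check easy to reason about: anything that passes contains no byte that could terminate or restructure an SMTP command line.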

## Design notes

- **Why not `net/smtp`?** The standard library's client hides the banner
  text, merges multiline responses into a single string, and does not
  expose the pre- vs post-TLS extension set separately. A bespoke
  ~200-line SMTP client (see `checker/smtp.go`) gives us verbatim
  responses for every step, which is what operators want to see in a
  diagnostic report.
- **Why stop at RCPT?** The open-relay, null-sender and postmaster
  probes all end at RCPT and emit RSET before the next transaction. We
  never send `DATA`, so no mail is actually delivered and no bounces are
  generated. A receiving server that accepts a spoofed RCPT but would
  have rejected the message at DATA is still reported as open relay (a
  sensible choice for a posture check).
- **Certificate posture via `checker-tls`.** MX SMTP on port 25 is
  opportunistic, so we do not verify the certificate ourselves. Each
  probed MX target is published as a `tls.endpoint.v1` discovery entry
  with `STARTTLS="smtp"`. `checker-tls`'s resulting observations are
  folded back into the rule aggregation and the HTML report via the
  SDK's `GetRelated` / `ReportContext.Related` path (same pattern as
  `checker-xmpp`).
- **No DANE / MTA-STS checks here.** These are policy surfaces, not
  connection-time behaviours, and deserve their own checkers
  (`checker-dane` on TLSA records, `checker-mta-sts` on the TXT/HTTPS
  policy artefact). This checker answers the question "does the MX
  actually work?"; policy enforcement layers on top.
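
To illustrate what "verbatim responses" means in the first note above, here is a minimal sketch of a reply reader that keeps every line of a multiline SMTP reply separate. It is not the code in `checker/smtp.go`; the `Reply` type and the `readReply` / `parseReply` functions are hypothetical names chosen for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Reply keeps the status code and each reply line verbatim (hypothetical type).
type Reply struct {
	Code  int
	Lines []string
}

// readReply reads one SMTP reply. Lines of the form "250-text" continue
// the reply; "250 text" (or a bare "250") ends it.
func readReply(r *bufio.Reader) (Reply, error) {
	var rep Reply
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return rep, err
		}
		line = strings.TrimRight(line, "\r\n")
		if len(line) < 3 {
			return rep, fmt.Errorf("short reply line %q", line)
		}
		code, err := strconv.Atoi(line[:3])
		if err != nil {
			return rep, fmt.Errorf("bad reply code in %q", line)
		}
		rep.Code = code
		text := ""
		if len(line) > 4 {
			text = line[4:]
		}
		rep.Lines = append(rep.Lines, text)
		if len(line) == 3 || line[3] != '-' {
			return rep, nil // final line of the reply
		}
	}
}

// parseReply is a convenience wrapper for in-memory input.
func parseReply(raw string) (Reply, error) {
	return readReply(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	rep, err := parseReply("250-mx.example.org\r\n250-PIPELINING\r\n250 STARTTLS\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rep.Code)
	for _, l := range rep.Lines {
		fmt.Println(l)
	}
}
```

Keeping `Lines` as a slice makes it straightforward to record the EHLO extension list once before STARTTLS and once after, and to compare the two sets in the report.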

## License

MIT (see `LICENSE`). Third-party attributions in `NOTICE`.