# checker-smtp

Deep SMTP checker for the MX-based inbound mail service of a
[happyDomain](https://www.happydomain.org/) domain.

For every MX target of the zone, it performs the live probes a human
operator would run with `swaks` or `telnet … 25`: TCP connect, ESMTP
banner & EHLO, STARTTLS negotiation, mail-transaction (null sender,
postmaster, open-relay) probes, reverse DNS / FCrDNS, extension
inventory, and IPv4/IPv6 coverage. The result is an actionable HTML
report whose "What to fix" panel foregrounds the most common real-world
failures rather than burying them in endpoint tabs.

TLS certificate chain / SAN / expiry / cipher posture is **out of scope**:
a dedicated TLS checker handles that. This checker only confirms that
STARTTLS completes and records the negotiated TLS version/cipher for
context.

We publish each MX target as a `DiscoveryEntry` of type
`tls.endpoint.v1` (contract: `git.happydns.org/checker-tls/contract`)
with `STARTTLS="smtp"` and `RequireSTARTTLS=false`. STARTTLS is
opportunistic on port 25; making it mandatory is a matter of publishing
MTA-STS or DANE, which dedicated checkers cover. `checker-tls` picks up
those entries and runs certificate posture on the same endpoint our
probe just validated. The resulting `tls_probes` observations are folded
back into our rule aggregation and HTML report via
`ObservationGetter.GetRelated` / `ReportContext.Related`, so a bad
certificate on an MX shows up on the SMTP service page, not only in a
separate TLS view.

## What it checks

### DNS posture

1. MX records published? (RFC 7505 null-MX is recognised and reported as INFO.)
2. MX target is a hostname, **not** an IP literal (RFC 5321 § 5.1).
3. MX target is **not** a CNAME (RFC 5321 § 5.1).
4. MX target resolves (A and/or AAAA).
5. Implicit-MX fallback (no MX record, so senders fall back to the domain's own A/AAAA) triggers a warning.

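
The string-only part of the DNS posture checks can be sketched as follows. This is a minimal illustration, not the checker's actual code: the `mxIssue` type and `classifyMXTarget` function are hypothetical names, and the CNAME and resolution checks are left out because they need live DNS lookups.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// mxIssue is a hypothetical label for a DNS-posture finding.
type mxIssue string

const (
	mxOK        mxIssue = "ok"
	mxNull      mxIssue = "null-mx"    // RFC 7505 null MX: target is "."
	mxIPLiteral mxIssue = "ip-literal" // target must be a hostname
)

// classifyMXTarget inspects the MX target string only; it performs no DNS
// lookups. RFC 7505 additionally requires preference 0 and a single MX
// record, which this sketch does not verify.
func classifyMXTarget(target string) mxIssue {
	if target == "." {
		return mxNull
	}
	host := strings.TrimSuffix(target, ".")
	if net.ParseIP(host) != nil {
		return mxIPLiteral
	}
	return mxOK
}

func main() {
	for _, t := range []string{".", "192.0.2.10.", "mx1.example.org."} {
		fmt.Printf("%-20s %s\n", t, classifyMXTarget(t))
	}
}
```

Keeping this step free of network access makes it easy to unit-test separately from the resolver-dependent checks.
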
### Per-endpoint (port 25, for each A/AAAA of each MX)

6. TCP reachability.
7. SMTP 220 banner, captured verbatim; announced hostname parsed.
8. ESMTP EHLO (fallback to HELO detected and flagged).
9. Extension inventory: STARTTLS, PIPELINING, 8BITMIME, SMTPUTF8, CHUNKING, DSN, ENHANCEDSTATUSCODES, SIZE, AUTH.
10. `AUTH` advertised *before* STARTTLS (credentials-over-plaintext risk).
11. STARTTLS negotiation and TLS version/cipher recorded (no cert checks; handed off to `checker-tls`).
12. Post-TLS EHLO: extensions may expand after the upgrade; we take the union of both sets.
13. Reverse DNS (PTR) present for each IP.
14. Forward-confirmed reverse DNS (FCrDNS): the PTR name's forward resolution must include the probed IP (Gmail / Outlook / Yahoo reject without this).
15. Null sender acceptance (`MAIL FROM:<>`; RFC 5321 mandates this for bounces).
16. Postmaster mailbox acceptance (`RCPT TO:<postmaster@domain>`; RFC 5321 § 4.5.1).
17. **Open-relay probe** (`MAIL FROM:<checker@…>` then `RCPT TO:<postmaster@example.com>`; a 2xx indicates an open relay). The probe stops at RCPT; `DATA` is never sent.
18. IPv4 / IPv6 coverage.

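
Steps 9, 10 and 12 come down to parsing two EHLO replies and comparing them. The sketch below shows one way to do that; the function names (`parseEHLO`, `unionExtensions`, `authBeforeTLS`) are illustrative assumptions and not taken from `checker/smtp.go`. The legacy `AUTH=…` keyword form is not handled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseEHLO extracts extension keywords (upper-cased) and their parameters
// from the lines of a 250 EHLO reply. The first line carries the server's
// greeting, not an extension, so it is skipped.
func parseEHLO(lines []string) map[string]string {
	exts := make(map[string]string)
	for i, line := range lines {
		if i == 0 || len(line) < 4 {
			continue
		}
		// Drop the "250-" or "250 " prefix, then split keyword and params.
		fields := strings.Fields(line[4:])
		if len(fields) == 0 {
			continue
		}
		exts[strings.ToUpper(fields[0])] = strings.Join(fields[1:], " ")
	}
	return exts
}

// unionExtensions merges the pre- and post-TLS sets; post-TLS parameters
// win when a keyword appears in both.
func unionExtensions(pre, post map[string]string) map[string]string {
	out := make(map[string]string, len(pre)+len(post))
	for k, v := range pre {
		out[k] = v
	}
	for k, v := range post {
		out[k] = v
	}
	return out
}

// authBeforeTLS reports whether AUTH was advertised on the plaintext session.
func authBeforeTLS(pre map[string]string) bool {
	_, ok := pre["AUTH"]
	return ok
}

func main() {
	pre := parseEHLO([]string{
		"250-mx1.example.org", "250-PIPELINING", "250-SIZE 10240000", "250 STARTTLS",
	})
	post := parseEHLO([]string{
		"250-mx1.example.org", "250-PIPELINING", "250-SIZE 10240000", "250 AUTH PLAIN LOGIN",
	})
	all := unionExtensions(pre, post)
	names := make([]string, 0, len(all))
	for k := range all {
		names = append(names, k)
	}
	sort.Strings(names)
	fmt.Println("extensions:", strings.Join(names, ", "))
	fmt.Println("AUTH before STARTTLS:", authBeforeTLS(pre))
}
```
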
The rule aggregates all of the above into a single `CheckState`
(OK / WARN / CRIT / INFO), with the worst severity winning. The HTML
report renders a domain-level "What to fix" panel (sorted
crit → warn → info) plus one collapsible section per probed endpoint,
open by default when something is wrong.

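
The worst-severity-wins rule and the panel ordering can be illustrated with a small sketch. The `severity` and `issue` types are local stand-ins rather than the SDK's `CheckState`, and the ordering OK < INFO < WARN < CRIT is an assumption inferred from the crit → warn → info sort described above.

```go
package main

import (
	"fmt"
	"sort"
)

// severity is a stand-in for the SDK's CheckState. The numeric ordering
// (OK < INFO < WARN < CRIT) is an assumption of this sketch.
type severity int

const (
	sevOK severity = iota
	sevInfo
	sevWarn
	sevCrit
)

// issue pairs an issue code with its severity.
type issue struct {
	Code string
	Sev  severity
}

// worst returns the highest severity found, or sevOK when there are no issues.
func worst(issues []issue) severity {
	w := sevOK
	for _, is := range issues {
		if is.Sev > w {
			w = is.Sev
		}
	}
	return w
}

// panelOrder returns a copy sorted crit → warn → info, keeping the original
// relative order among issues of equal severity.
func panelOrder(issues []issue) []issue {
	out := append([]issue(nil), issues...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Sev > out[j].Sev })
	return out
}

func main() {
	found := []issue{
		{"smtp.ptr.missing", sevWarn},
		{"smtp.open_relay", sevCrit},
		{"smtp.fcrdns.mismatch", sevWarn},
	}
	fmt.Println("overall severity:", worst(found))
	for _, is := range panelOrder(found) {
		fmt.Println(is.Sev, is.Code)
	}
}
```
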
## Most common failures and how the report addresses them

| Symptom                                  | Issue code                                 | Report message |
|------------------------------------------|--------------------------------------------|----------------|
| MX target is a CNAME                     | `smtp.mx.cname`                            | CRIT, fix suggests replacing CNAME with A/AAAA |
| No STARTTLS on any endpoint              | `smtp.all_no_starttls`                     | CRIT, fix mentions Postfix/Exim settings and MTA-STS/DANE next steps |
| `AUTH` advertised over plaintext port 25 | `smtp.auth.plaintext`                      | CRIT, fix suggests `smtpd_tls_auth_only=yes` / moving auth to 587 |
| `postmaster@` rejected                   | `smtp.postmaster.rejected`                 | CRIT, cites RFC 5321 § 4.5.1 |
| Bounces (`MAIL FROM:<>`) rejected        | `smtp.null_sender.rejected`                | CRIT |
| Missing PTR or FCrDNS mismatch           | `smtp.ptr.missing`, `smtp.fcrdns.mismatch` | WARN, names Gmail/Outlook/Yahoo impact |
| Open relay                               | `smtp.open_relay`                          | CRIT (the endpoint panel also shows a red "OPEN RELAY" badge in the summary) |

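
To make the FCrDNS row concrete, here is a hedged sketch of the forward-confirmation step. `fcrdnsOK`, `forwardLookup` and `staticLookup` are hypothetical names; the lookup is injected so the example runs offline. Against real DNS, the PTR names would come from `net.LookupAddr`, and `net.LookupIP` could be passed as the lookup function since it has the same signature.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// forwardLookup resolves a hostname to its addresses.
type forwardLookup func(host string) ([]net.IP, error)

// fcrdnsOK reports whether at least one PTR name resolves back to ip.
// An unparsable ip, an empty PTR list, or failing lookups all yield false.
func fcrdnsOK(ip string, ptrNames []string, lookup forwardLookup) bool {
	want := net.ParseIP(ip)
	if want == nil {
		return false
	}
	for _, name := range ptrNames {
		addrs, err := lookup(strings.TrimSuffix(name, "."))
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if a.Equal(want) {
				return true
			}
		}
	}
	return false
}

// staticLookup builds an offline resolver from a fixed table (demo only).
func staticLookup(table map[string][]string) forwardLookup {
	return func(host string) ([]net.IP, error) {
		entries, ok := table[host]
		if !ok {
			return nil, fmt.Errorf("no such host: %s", host)
		}
		ips := make([]net.IP, 0, len(entries))
		for _, s := range entries {
			if ip := net.ParseIP(s); ip != nil {
				ips = append(ips, ip)
			}
		}
		return ips, nil
	}
}

func main() {
	lookup := staticLookup(map[string][]string{
		"mx1.example.org":    {"192.0.2.10"},
		"shared.example.net": {"192.0.2.99"},
	})
	fmt.Println("matching PTR:   ", fcrdnsOK("192.0.2.10", []string{"mx1.example.org."}, lookup))
	fmt.Println("mismatching PTR:", fcrdnsOK("192.0.2.10", []string{"shared.example.net."}, lookup))
}
```
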
## Usage

### Standalone HTTP server

```bash
make
./checker-smtp -listen :8080
```

### Docker

```bash
make docker
docker run -p 8080:8080 happydomain/checker-smtp
```

### happyDomain plugin

```bash
make plugin
```

## Options

| Scope | Id                   | Default                      | Description |
|-------|----------------------|------------------------------|-------------|
| Run   | `domain`             | (none)                       | Domain to test (auto-filled from the service). |
| Run   | `timeout`            | `12`                         | Per-endpoint timeout, in seconds. |
| Run   | `helo_name`          | `mx-checker.happydomain.org` | Hostname announced in EHLO/HELO. Pick a name with valid A/AAAA and PTR. |
| Run   | `test_null_sender`   | `true`                       | Probe `MAIL FROM:<>` (RFC 5321 DSN acceptance). |
| Run   | `test_postmaster`    | `true`                       | Probe `RCPT TO:<postmaster@domain>` (RFC 5321 § 4.5.1). |
| Run   | `test_open_relay`    | `true`                       | Probe `RCPT TO:<recipient-outside-domain>` to detect open relays. |
| Run   | `test_probe_address` | `postmaster@example.com`     | Recipient used for the open-relay probe. Automatically overridden when its domain equals the tested domain. |

Applies to services of type `svcs.MXs` (the DNS-level MX record set).

## Design notes

- **Why not `net/smtp`?** The standard library's client hides the banner
  text, muxes multiline responses into a single string, and does not
  expose the pre- vs post-TLS extension set separately. A bespoke
  ~200-line SMTP client (see `checker/smtp.go`) gives us verbatim
  responses for every step, which is what operators want to see in a
  diagnostic report.
- **Why stop at RCPT?** The open-relay, null-sender and postmaster
  probes all end at RCPT and emit RSET before the next transaction. We
  never send `DATA`, so no mail is actually delivered and no bounces are
  generated. A receiving server that accepts a spoofed RCPT but would
  have rejected the message at DATA is still reported as an open relay
  (a sensible choice for a posture check).
- **Certificate posture via `checker-tls`.** MX SMTP on port 25 is
  opportunistic, so we do not verify the certificate ourselves. Each
  probed MX target is published as a `tls.endpoint.v1` discovery entry
  with `STARTTLS="smtp"`. The observations `checker-tls` produces are
  folded back into the rule aggregation and the HTML report via the
  SDK's `GetRelated` / `ReportContext.Related` path (same pattern as
  `checker-xmpp`).
- **No DANE / MTA-STS checks here.** These are policy surfaces, not
  connection-time behaviours, and deserve their own checkers
  (`checker-dane` on TLSA records, `checker-mta-sts` on the TXT/HTTPS
  policy artefact). This checker answers the question "does the MX
  actually work?"; policy enforcement layers on top.

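
The "stop at RCPT" design can be sketched as a command sequence plus a verdict on the RCPT reply. This is an illustration under stated assumptions, not the code in `checker/smtp.go`: the function names are hypothetical, and `checker@probe.example` is a placeholder sender, since the real probe sender is not spelled out here.

```go
package main

import (
	"fmt"
	"strconv"
)

// relayProbeCommands lists the commands of one open-relay probe. The
// transaction ends with RSET and never includes DATA, so nothing is
// delivered.
func relayProbeCommands(sender, outsideRcpt string) []string {
	return []string{
		"MAIL FROM:<" + sender + ">",
		"RCPT TO:<" + outsideRcpt + ">",
		"RSET",
	}
}

// replyCode parses the three-digit status code at the start of a reply line.
func replyCode(line string) (int, error) {
	if len(line) < 3 {
		return 0, fmt.Errorf("short reply: %q", line)
	}
	return strconv.Atoi(line[:3])
}

// looksLikeOpenRelay treats any 2xx answer to the foreign RCPT as a relay
// acceptance; unparsable replies are not counted as acceptance.
func looksLikeOpenRelay(rcptReply string) bool {
	code, err := replyCode(rcptReply)
	return err == nil && code >= 200 && code < 300
}

func main() {
	// The sender below is a placeholder for illustration only.
	for _, c := range relayProbeCommands("checker@probe.example", "postmaster@example.com") {
		fmt.Println("C:", c)
	}
	fmt.Println("250 reply is relay:", looksLikeOpenRelay("250 2.1.5 Ok"))
	fmt.Println("554 reply is relay:", looksLikeOpenRelay("554 5.7.1 Relay access denied"))
}
```
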
## License

MIT (see `LICENSE`). Third-party attributions in `NOTICE`.