# checker-dummy - How to Build a happyDomain Checker
This repository is a **fully working, educational example** of a happyDomain checker. It is intentionally simple: instead of performing real monitoring, it returns a random score and a user-configurable message. This lets you focus on learning the structure without dealing with external dependencies.
Use this as a template when you create your own checker.

---
## Table of Contents
1. [What is a Checker?](#what-is-a-checker)
2. [Architecture Overview](#architecture-overview)
3. [Repository Structure](#repository-structure)
4. [Step-by-Step Walkthrough](#step-by-step-walkthrough)
   - [Step 1: Define Your Data Types](#step-1-define-your-data-types)
   - [Step 2: Create the Provider](#step-2-create-the-provider)
   - [Step 3: Implement Data Collection](#step-3-implement-data-collection)
   - [Step 4: Describe Your Checker (Definition)](#step-4-describe-your-checker-definition)
   - [Step 5: Write Evaluation Rules](#step-5-write-evaluation-rules)
   - [Step 6: Wire It Up (main.go)](#step-6-wire-it-up-maingo)
   - [Step 7: Create the Plugin Entrypoint](#step-7-create-the-plugin-entrypoint)
5. [Running the Checker](#running-the-checker)
6. [Testing with curl](#testing-with-curl)
7. [Deploying to happyDomain](#deploying-to-happydomain)
8. [Going Further](#going-further)
---
## What is a Checker?
A **checker** is a small, self-contained program that monitors one aspect of a domain's DNS infrastructure. happyDomain runs checkers periodically and displays their results in its dashboard.
Every checker does three things:
1. **Collect** — Gather raw observation data (e.g., ping a server, query an API, measure DNS response time).
2. **Evaluate** — Compare the collected data against user-defined thresholds to produce a status: OK, Warning, or Critical.
3. **Report** *(optional)* — Extract time-series metrics or generate HTML reports for the dashboard.
## Architecture Overview
A checker can run in two modes:
### Standalone HTTP Server (External Checker)
The checker runs as its own process and exposes an HTTP API. happyDomain communicates with it over the network. This is the most flexible option — you can write your checker in any language, deploy it independently, and scale it separately.
```
┌─────────────┐ HTTP ┌─────────────────┐
│ happyDomain │ ──────────► │ checker-dummy │
│ server │ ◄────────── │ (this program) │
└─────────────┘ └─────────────────┘
```
### In-Process Plugin
The checker is compiled as a Go plugin (`.so` file) and loaded directly into the happyDomain process. This is simpler to deploy (single binary) but requires the checker to be written in Go.
```
┌──────────────────────────────────────┐
│ happyDomain server │
│ │
│ ┌──────────────────────────────┐ │
│ │ checker-dummy.so (plugin) │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────┘
```
Both modes use the same checker code — only the entry point differs.
## Repository Structure
```
checker-dummy/
├── main.go # Entry point for standalone HTTP server mode
├── checker/
│ ├── types.go # Data structures (what the checker observes)
│ ├── provider.go # The provider: glues everything together
│ ├── collect.go # Collection logic (the actual monitoring)
│ ├── definition.go # Checker metadata (options, rules, intervals)
│ └── rule.go # Evaluation rules (OK / Warning / Critical)
├── plugin/
│ └── plugin.go # Entry point for in-process plugin mode
├── go.mod # Go module definition
├── Makefile # Build targets
├── Dockerfile # Container image
└── .gitignore
```
Each file has a single, clear responsibility. This is the recommended layout for all happyDomain checkers.

---
## Step-by-Step Walkthrough
### Step 1: Define Your Data Types
**File: `checker/types.go`**
Start by defining the data structure that your checker will produce during collection. This struct is serialised to JSON by the SDK, stored by happyDomain, and later deserialised during evaluation.
```go
const ObservationKeyDummy = "dummy"

type DummyData struct {
	Message     string    `json:"message"`
	Score       float64   `json:"score"`
	CollectedAt time.Time `json:"collected_at"`
}
```
Key points:
- **`ObservationKeyDummy`** is a unique string that identifies observations produced by this checker. Every checker needs at least one key.
- **Design for evaluation**: include everything your rules will need to decide OK/Warning/Critical. The evaluation step only sees this struct — it cannot re-collect data.
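
To make the serialise, store, deserialise path concrete, here is a minimal, self-contained sketch that round-trips a `DummyData` value through `encoding/json`. The `roundTrip` and `sample` helpers are illustrative names, not part of the SDK, and the storage step in between is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// DummyData mirrors the struct defined in checker/types.go.
type DummyData struct {
	Message     string    `json:"message"`
	Score       float64   `json:"score"`
	CollectedAt time.Time `json:"collected_at"`
}

// sample returns a fixed observation so the output is reproducible.
func sample() DummyData {
	return DummyData{
		Message:     "test",
		Score:       42.5,
		CollectedAt: time.Date(2026, 1, 15, 10, 30, 0, 0, time.UTC),
	}
}

// roundTrip (hypothetical helper) encodes the observation to JSON and
// decodes it again, imitating what happens between collection and evaluation.
func roundTrip(in DummyData) (DummyData, []byte, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return DummyData{}, nil, err
	}
	var out DummyData
	if err := json.Unmarshal(raw, &out); err != nil {
		return DummyData{}, nil, err
	}
	return out, raw, nil
}

func main() {
	out, raw, err := roundTrip(sample())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
	fmt.Println(out.Score, out.Message)
}
```

Anything a rule needs must survive this round trip, so unexported fields and values without JSON tags are easy to lose by accident.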
### Step 2: Create the Provider
**File: `checker/provider.go`**
The **provider** is the central object of your checker. It must implement the `ObservationProvider` interface:
```go
type ObservationProvider interface {
	Key() ObservationKey
	Collect(ctx context.Context, opts CheckerOptions) (any, error)
}
```
You can also implement optional interfaces to unlock additional features:
| Interface | What it enables |
|-------------------------------|------------------------------------------|
| `CheckerDefinitionProvider` | `/definition` and `/evaluate` endpoints |
| `CheckerMetricsReporter` | `/report` endpoint (JSON metrics) |
| `CheckerHTMLReporter` | `/report` endpoint (HTML) |
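In this example, the provider implements the optional interfaces in addition to the required one. The excerpt below shows the `Key()`, `Definition()`, and `ExtractMetrics()` methods: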
```go
type dummyProvider struct{}
func (p *dummyProvider) Key() ObservationKey { return ObservationKeyDummy }
func (p *dummyProvider) Definition() *CheckerDefinition { return Definition() }
func (p *dummyProvider) ExtractMetrics(raw json.RawMessage, collectedAt time.Time) ([]CheckMetric, error) { ... }
```
The `Key()` method must return the same string as your `ObservationKeyDummy` constant.
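
One way to catch a missing method early is a compile-time interface assertion. The sketch below declares local stand-ins for the SDK types so that it compiles on its own; the `CheckerOptions` map type and the single-field `CheckerDefinition` are assumptions made for illustration, and the real SDK definitions may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins for SDK types (assumed shapes, for illustration only).
type ObservationKey string
type CheckerOptions map[string]any
type CheckerDefinition struct{ ID string }

type ObservationProvider interface {
	Key() ObservationKey
	Collect(ctx context.Context, opts CheckerOptions) (any, error)
}

type CheckerDefinitionProvider interface {
	Definition() *CheckerDefinition
}

const ObservationKeyDummy = "dummy"

type dummyProvider struct{}

func (p *dummyProvider) Key() ObservationKey { return ObservationKeyDummy }

func (p *dummyProvider) Collect(ctx context.Context, opts CheckerOptions) (any, error) {
	return nil, nil // collection is out of scope for this sketch
}

func (p *dummyProvider) Definition() *CheckerDefinition {
	return &CheckerDefinition{ID: "dummy"}
}

// Compile-time checks: the build fails here if a method is missing
// or has the wrong signature.
var (
	_ ObservationProvider       = (*dummyProvider)(nil)
	_ CheckerDefinitionProvider = (*dummyProvider)(nil)
)

func main() {
	var p ObservationProvider = &dummyProvider{}
	_, hasDefinition := p.(CheckerDefinitionProvider)
	fmt.Println(p.Key(), hasDefinition)
}
```

The blank-identifier assignments cost nothing at runtime and document which interfaces the provider is meant to satisfy.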
### Step 3: Implement Data Collection
**File: `checker/collect.go`**
This is where the real work happens. The `Collect` method is called every time happyDomain runs your check.
```go
func (p *dummyProvider) Collect(ctx context.Context, opts CheckerOptions) (any, error) {
	// Read options using SDK helpers
	message := "Hello from the dummy checker!"
	if v, ok := sdk.GetOption[string](opts, "message"); ok && v != "" {
		message = v
	}

	// Do your monitoring work here!
	// In a real checker, you would: ping a server, query an API,
	// measure DNS response time, check TLS certificates, etc.
	score := rand.Float64() * 100

	return &DummyData{
		Message:     message,
		Score:       score,
		CollectedAt: time.Now(),
	}, nil
}
```
Key points:
- **Always honour `ctx`** — happyDomain may cancel long-running checks.
- **Use SDK option helpers** (`sdk.GetOption`, `sdk.GetFloatOption`, `sdk.GetIntOption`, `sdk.GetBoolOption`) to read options. They handle type coercion between in-process (native Go types) and HTTP mode (JSON-decoded types).
- **Return your data struct** — the SDK serialises it to JSON automatically.
- **Return an error** only if collection failed entirely. Partial results are fine.
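
To see why the option helpers matter, consider the same threshold arriving as a Go `int` in plugin mode and as a JSON-decoded `float64` in HTTP mode. The sketch below uses a hypothetical `floatOption` function to illustrate the coercion problem. It is not the SDK's implementation, and the set of types it accepts is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// floatOption is a hypothetical helper (NOT the SDK's code) that reads a
// numeric option regardless of how the value was produced.
func floatOption(opts map[string]any, key string, def float64) float64 {
	switch v := opts[key].(type) {
	case float64: // what encoding/json yields for any JSON number
		return v
	case float32:
		return float64(v)
	case int: // a typical native Go literal
		return float64(v)
	case int64:
		return float64(v)
	case json.Number: // produced when a decoder uses UseNumber()
		if f, err := v.Float64(); err == nil {
			return f
		}
	}
	return def // missing key or unsupported type
}

func main() {
	native := map[string]any{"warningThreshold": 50} // in-process: int

	var decoded map[string]any // HTTP mode: float64 after decoding
	if err := json.Unmarshal([]byte(`{"warningThreshold": 50}`), &decoded); err != nil {
		panic(err)
	}

	fmt.Println(floatOption(native, "warningThreshold", 0))
	fmt.Println(floatOption(decoded, "warningThreshold", 0))
	fmt.Println(floatOption(decoded, "criticalThreshold", 20))
}
```

A raw assertion such as `opts["warningThreshold"].(float64)` would succeed for the decoded map and fail for the native one, which is the mismatch the SDK helpers are described as hiding.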
### Step 4: Describe Your Checker (Definition)
**File: `checker/definition.go`**
The `CheckerDefinition` tells happyDomain everything about your checker:
```go
func Definition() *CheckerDefinition {
	return &CheckerDefinition{
		ID:   "dummy",           // Unique, stable identifier (never change after release)
		Name: "Dummy (example)", // Human-readable label for the UI
		Availability: CheckerAvailability{
			ApplyToDomain: true, // Show in the "Domain checks" section
		},
		ObservationKeys: []ObservationKey{ObservationKeyDummy},
		Options: CheckerOptionsDocumentation{
			UserOpts: []CheckerOptionDocumentation{
				{Id: "message", Type: "string", Label: "Custom message", Default: "Hello!"},
				{Id: "warningThreshold", Type: "number", Label: "Warning threshold", Default: float64(50)},
				...
			},
		},
		Rules: []CheckRule{Rule()},
		Interval: &CheckIntervalSpec{
			Min: 1 * time.Minute, Max: 1 * time.Hour, Default: 5 * time.Minute,
		},
		HasMetrics: true,
	}
}
```
**Availability** — Choose where your checker appears:
| Field | When to use |
|------------------|-----------------------------------------------------|
| `ApplyToDomain` | The check applies to the entire domain |
| `ApplyToZone` | The check applies to a specific DNS zone |
| `ApplyToService` | The check applies to a specific service (e.g., A/AAAA records). Use `LimitToServices` to restrict which service types. |
**Options** — Grouped by audience:
| Group | Who sets it | Example |
|---------------|----------------------|--------------------------------------|
| `AdminOpts` | happyDomain admin | API endpoint URL |
| `UserOpts` | End-user in the UI | Thresholds, count, custom messages |
| `DomainOpts` | Auto-filled per domain | `domain_name` (via `AutoFill`) |
| `ServiceOpts` | Auto-filled per service | The service payload (via `AutoFill`) |
| `RunOpts` | Set at collect-time | Runtime overrides |
**Option types** for the UI widget: `"string"`, `"number"`, `"uint"`, `"bool"`. You can also provide `Choices` for dropdown menus.
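
The `Interval` field in the definition above carries a minimum, a maximum, and a default. How happyDomain applies these bounds is not described here, so the following is only a sketch of one plausible interpretation: a requested interval is clamped into the allowed range, and an unset one falls back to the default. The `effectiveInterval` function and the local `CheckIntervalSpec` copy are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// CheckIntervalSpec is a local stand-in with the three fields shown in the
// definition example.
type CheckIntervalSpec struct {
	Min, Max, Default time.Duration
}

// dummySpec returns the bounds used by the dummy checker's definition.
func dummySpec() CheckIntervalSpec {
	return CheckIntervalSpec{Min: time.Minute, Max: time.Hour, Default: 5 * time.Minute}
}

// effectiveInterval (hypothetical) picks the interval a scheduler might use:
// the default when nothing is requested, otherwise the request clamped
// into [Min, Max].
func effectiveInterval(spec CheckIntervalSpec, requested time.Duration) time.Duration {
	switch {
	case requested <= 0:
		return spec.Default
	case requested < spec.Min:
		return spec.Min
	case requested > spec.Max:
		return spec.Max
	default:
		return requested
	}
}

func main() {
	spec := dummySpec()
	for _, req := range []time.Duration{0, 10 * time.Second, 10 * time.Minute, 24 * time.Hour} {
		fmt.Println(req, "->", effectiveInterval(spec, req))
	}
}
```

Whatever the actual policy is, choosing a sensible `Min` protects the monitored target from being probed too often.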
### Step 5: Write Evaluation Rules
**File: `checker/rule.go`**
A rule implements the `CheckRule` interface:
```go
type CheckRule interface {
	Name() string
	Description() string
	Evaluate(ctx context.Context, obs ObservationGetter, opts CheckerOptions) CheckState
}
```
Optionally, your rule can also implement `ValidateOptions(opts) error` for early validation.
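
As an illustration of early validation, the sketch below rejects option sets in which the critical threshold is higher than the warning threshold, since a score is critical when it falls below `criticalThreshold` and a warning when it falls below `warningThreshold`. The `CheckerOptions` map type and the `threshold` helper are assumptions made so the example is self-contained; in a real checker the SDK option helpers would take their place.

```go
package main

import (
	"errors"
	"fmt"
)

// CheckerOptions is a local stand-in for the SDK type (assumed to be a map).
type CheckerOptions map[string]any

// threshold (hypothetical helper) reads a numeric option, falling back to
// def when the key is absent and failing when the value is not a number.
func threshold(opts CheckerOptions, key string, def float64) (float64, error) {
	v, ok := opts[key]
	if !ok {
		return def, nil
	}
	switch n := v.(type) {
	case float64:
		return n, nil
	case int:
		return float64(n), nil
	}
	return 0, fmt.Errorf("option %q must be a number, got %T", key, v)
}

type dummyRule struct{}

// ValidateOptions checks that both thresholds are numeric and ordered.
func (r *dummyRule) ValidateOptions(opts CheckerOptions) error {
	warn, err := threshold(opts, "warningThreshold", 50)
	if err != nil {
		return err
	}
	crit, err := threshold(opts, "criticalThreshold", 20)
	if err != nil {
		return err
	}
	if crit > warn {
		return errors.New("criticalThreshold must not exceed warningThreshold")
	}
	return nil
}

func main() {
	r := &dummyRule{}
	fmt.Println(r.ValidateOptions(CheckerOptions{"warningThreshold": 50, "criticalThreshold": 20}))
	fmt.Println(r.ValidateOptions(CheckerOptions{"warningThreshold": 10, "criticalThreshold": 30}))
}
```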
The `Evaluate` method receives an `ObservationGetter` to retrieve the collected data:
```go
func (r *dummyRule) Evaluate(ctx context.Context, obs ObservationGetter, opts CheckerOptions) CheckState {
	var data DummyData
	if err := obs.Get(ctx, ObservationKeyDummy, &data); err != nil {
		return CheckState{Status: StatusError, Message: "..."}
	}

	warningThreshold := sdk.GetFloatOption(opts, "warningThreshold", 50)
	criticalThreshold := sdk.GetFloatOption(opts, "criticalThreshold", 20)

	switch {
	case data.Score < criticalThreshold:
		return CheckState{Status: StatusCrit, ...}
	case data.Score < warningThreshold:
		return CheckState{Status: StatusWarn, ...}
	default:
		return CheckState{Status: StatusOK, ...}
	}
}
```
**Status values**: `StatusOK`, `StatusWarn`, `StatusCrit`, `StatusError`, `StatusUnknown`.
You can define **multiple rules** per checker. Each rule evaluates the same collected data from a different angle. Users can enable/disable rules individually in the UI.
### Step 6: Wire It Up (main.go)
**File: `main.go`**
The standalone entry point is minimal — the SDK does all the heavy lifting:
```go
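// Listen address flag; the ":8080" default is an assumption of this excerpt.
var listenAddr = flag.String("listen", ":8080", "address to listen on")

func main() {
	flag.Parse()
	// Build an HTTP server around the provider and start serving.
	server := sdk.NewServer(dummy.Provider())
	server.ListenAndServe(*listenAddr)
}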
```
`sdk.NewServer` inspects your provider and automatically registers HTTP endpoints based on which interfaces it implements:
| Endpoint | Always | Requires |
|--------------------|--------|------------------------------|
| `GET /health` | Yes | — |
| `POST /collect` | Yes | — |
| `GET /definition` | — | `CheckerDefinitionProvider` |
| `POST /evaluate` | — | `CheckerDefinitionProvider` |
| `POST /report` | — | `CheckerMetricsReporter` or `CheckerHTMLReporter` |
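
The detection described in the table can be pictured as a series of type assertions on the provider. The sketch below is not the SDK's code: the interfaces are local placeholders with simplified method sets, `RenderHTML` is an invented method name for the HTML reporter, and `routesFor` only returns the list of routes instead of registering handlers.

```go
package main

import "fmt"

// Placeholder interfaces with simplified, assumed method sets.
type ObservationProvider interface{ Key() string }
type CheckerDefinitionProvider interface{ Definition() string }
type CheckerMetricsReporter interface{ ExtractMetrics() []float64 }
type CheckerHTMLReporter interface{ RenderHTML() string } // hypothetical method

// routesFor (hypothetical) lists the endpoints a server would expose for p,
// following the table above.
func routesFor(p ObservationProvider) []string {
	routes := []string{"GET /health", "POST /collect"} // always present
	if _, ok := p.(CheckerDefinitionProvider); ok {
		routes = append(routes, "GET /definition", "POST /evaluate")
	}
	_, metrics := p.(CheckerMetricsReporter)
	_, html := p.(CheckerHTMLReporter)
	if metrics || html {
		routes = append(routes, "POST /report")
	}
	return routes
}

// collectOnly implements just the required interface.
type collectOnly struct{}

func (collectOnly) Key() string { return "minimal" }

// fullProvider also provides a definition and metrics.
type fullProvider struct{}

func (fullProvider) Key() string               { return "dummy" }
func (fullProvider) Definition() string        { return "dummy definition" }
func (fullProvider) ExtractMetrics() []float64 { return []float64{73.2} }

func main() {
	fmt.Println(routesFor(collectOnly{}))
	fmt.Println(routesFor(fullProvider{}))
}
```

The practical consequence is that adding a method to the provider is enough to expose a new endpoint; no routing code needs to change.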
### Step 7: Create the Plugin Entrypoint
**File: `plugin/plugin.go`**
For in-process plugin mode, register your provider and definition in an `init()` function:
```go
package plugin

import (
	dummy "git.happydns.org/happyDomain/checker-dummy/checker"
	sdk "git.happydns.org/happyDomain/sdk/checker"
)

func init() {
	sdk.RegisterObservationProvider(dummy.Provider())
	sdk.RegisterChecker(dummy.Definition())
}
```
Then, in your happyDomain build, add a blank import:
```go
import _ "git.happydns.org/happyDomain/checker-dummy/plugin"
```
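
The blank import works because Go runs every imported package's `init()` function before `main()` starts. The sketch below reproduces that registration pattern in a single file with a hypothetical in-memory `registry`. The real SDK registry and its function signatures are not shown here and may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical stand-in for the SDK's internal checker table,
// mapping checker ID to display name.
var registry = map[string]string{}

// registerChecker (hypothetical) records a checker under its ID.
func registerChecker(id, name string) {
	registry[id] = name
}

// init runs automatically before main, which is what makes a blank import
// of a plugin package sufficient to register it.
func init() {
	registerChecker("dummy", "Dummy (example)")
}

// registeredIDs returns the known checker IDs in a stable order.
func registeredIDs() []string {
	ids := make([]string, 0, len(registry))
	for id := range registry {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// No explicit registration call here: init has already run.
	fmt.Println(registeredIDs())
}
```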
---
## Running the Checker
### Build and run locally
```bash
make build
./checker-dummy -listen :8080
```
### Docker
```bash
make docker
docker run -p 8080:8080 happydomain/checker-dummy
```
---
## Testing with curl
### Health check
```bash
curl http://localhost:8080/health
# {"status":"ok"}
```
### Get the checker definition
```bash
curl http://localhost:8080/definition
```
### Collect an observation
```bash
curl -X POST http://localhost:8080/collect \
-H "Content-Type: application/json" \
-d '{
"key": "dummy",
"options": {
"message": "Testing my checker!"
}
}'
```
Response:
```json
{
"data": {
"message": "Testing my checker!",
"score": 73.2,
"collected_at": "2026-01-15T10:30:00Z"
}
}
```
### Evaluate observations
```bash
curl -X POST http://localhost:8080/evaluate \
-H "Content-Type: application/json" \
-d '{
"observations": {
"dummy": "{\"message\":\"test\",\"score\":42.5,\"collected_at\":\"2026-01-15T10:30:00Z\"}"
},
"options": {
"warningThreshold": 50,
"criticalThreshold": 20
}
}'
```
Response (score 42.5 is below the warning threshold of 50):
```json
{
"states": [
{
"status": 3,
"message": "Score: 42.5 — test",
"code": "dummy_score_check"
}
]
}
```
Status codes: `1` = OK, `3` = Warning, `4` = Critical.
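
Note that in the request above each observation is passed as a JSON document encoded inside a JSON string, which is why the inner quotes are escaped. The following sketch builds such a body in Go. The `evaluateRequest` struct is inferred from the curl example only, so treat its shape as an assumption and not as the SDK's wire type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evaluateRequest mirrors the JSON layout of the curl example (inferred).
type evaluateRequest struct {
	Observations map[string]string `json:"observations"`
	Options      map[string]any    `json:"options"`
}

// buildEvaluateBody (hypothetical helper) encodes the observation first,
// then embeds the result as a string value under its observation key.
func buildEvaluateBody(key string, observation any, opts map[string]any) ([]byte, error) {
	raw, err := json.Marshal(observation)
	if err != nil {
		return nil, err
	}
	return json.Marshal(evaluateRequest{
		Observations: map[string]string{key: string(raw)},
		Options:      opts,
	})
}

func main() {
	body, err := buildEvaluateBody(
		"dummy",
		map[string]any{"message": "test", "score": 42.5},
		map[string]any{"warningThreshold": 50, "criticalThreshold": 20},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Encoding in two passes lets `encoding/json` handle the escaping, which is easier to get right than writing the backslashes by hand.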
### Extract metrics
```bash
curl -X POST http://localhost:8080/report \
-H "Content-Type: application/json" \
-d '{
"data": "{\"message\":\"test\",\"score\":73.2,\"collected_at\":\"2026-01-15T10:30:00Z\"}"
}'
```
---
## Deploying to happyDomain
### As an external checker (recommended)
1. Deploy your checker as a standalone service (Docker, systemd, etc.).
2. In happyDomain, set the checker's `endpoint` admin option to its URL (e.g., `http://checker-dummy:8080`).
3. happyDomain will call `/collect`, `/evaluate`, and `/report` automatically.
### As an in-process plugin
1. Add the blank import to your happyDomain build:
   ```go
   import _ "git.happydns.org/happyDomain/checker-dummy/plugin"
   ```
2. Rebuild happyDomain. The checker registers itself at startup.
---
## Going Further
Now that you understand the structure, here are ideas for your own checker:
- **HTTP checker**: send an HTTP request to a domain's web server and check the status code, response time, or TLS certificate expiry.
- **DNS checker**: query specific DNS record types and verify the response matches expectations.
- **SMTP checker**: connect to a mail server and verify it responds correctly to EHLO.
- **Whois checker**: check domain expiry date and alert before it lapses.
For a real-world example, look at [checker-ping](https://git.happydns.org/happyDomain/checker-ping), which implements ICMP ping monitoring with multiple targets, packet loss detection, and RTT metrics.
### Tips
- Keep `Collect` focused on data gathering. Put all threshold logic in `Evaluate`.
- Design your data struct to hold everything rules need — evaluation cannot re-collect.
- Use `sdk.GetFloatOption` / `sdk.GetIntOption` / `sdk.GetBoolOption` instead of raw type assertions. They handle the JSON/native type mismatch transparently.
- Always honour the `context.Context` — set timeouts and check for cancellation.
- Return partial results from `Collect` when possible (only return an error if the entire collection failed).
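
As a sketch of the context tip, the example below wraps a slow operation in `context.WithTimeout` and returns as soon as the context is cancelled. `slowProbe`, `collectWithTimeout`, and `probeMillis` are hypothetical names standing in for real monitoring work, and the fixed score is a placeholder.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowProbe (hypothetical) simulates monitoring work that takes d to finish
// but gives up as soon as ctx is cancelled or its deadline passes.
func slowProbe(ctx context.Context, d time.Duration) (float64, error) {
	select {
	case <-time.After(d):
		return 73.2, nil // placeholder measurement
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// collectWithTimeout derives a bounded context from the caller's context,
// so both the caller's cancellation and the local limit are honoured.
func collectWithTimeout(parent context.Context, work, limit time.Duration) (float64, error) {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel() // release the timer even on the success path
	return slowProbe(ctx, work)
}

// probeMillis is a convenience wrapper taking durations in milliseconds.
func probeMillis(workMs, limitMs int) (float64, error) {
	return collectWithTimeout(context.Background(),
		time.Duration(workMs)*time.Millisecond,
		time.Duration(limitMs)*time.Millisecond)
}

func main() {
	_, err := probeMillis(200, 20) // work outlasts the limit
	fmt.Println(errors.Is(err, context.DeadlineExceeded))

	score, err := probeMillis(1, 1000) // work finishes in time
	fmt.Println(score, err)
}
```

Deriving the timeout from the context passed to `Collect`, instead of from `context.Background()`, keeps the checker responsive to cancellation by happyDomain.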