How it works
Kapkan is a single, statically-linked Go binary with modular internals. There is no sidecar, no database and no separate web service to run: flow ingestion, the detection engine, BGP mitigation, the REST API, the dashboard and the notifiers are all packages compiled into one process. You point it at a YAML config, point your routers' exporters at it, and it runs.
Everything you observe (attacks, top talkers, learned baselines, bans) lives in process memory and is served straight from there. The only external dependencies are optional integrations (a ClickHouse server for history, an SMTP relay for email), and Kapkan runs without them.
The pipeline
The binary is organized as a single, one-directional pipeline. Each stage hands normalized data to the next, and the engine fans its results out to the three consumer stages:
- Ingest: UDP listeners decode sFlow v5, NetFlow v5/v9 and IPFIX datagrams (via the goflow2 library, in library mode) into a single normalized `Flow` representation. The sampling rate is read from the packet when the exporter reports it, otherwise from `sampling.default_rate`.
- Engine (hot path): every flow is folded into sharded per-host counters over a sliding window, sampling-corrected to real traffic units, and evaluated against the active thresholds (global, per-protocol, per-hostgroup, and learned baselines). This is the performance-critical stage; see Performance.
- Mitigate / notify / api: when a threshold trips, the engine emits an attack event. The mitigate stage announces or withdraws RTBH routes through the embedded BGP speaker, the notify stage fans the event out to your channels, and the API stage exposes live state to the dashboard and to callers.
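The sampling fallback and correction described for the ingest and engine stages can be sketched as follows. This is a minimal illustration, not Kapkan's actual code: the `Flow` fields and the `effectiveRate` and `scale` helpers are hypothetical names, and only the `sampling.default_rate` setting comes from the text above.

```go
package main

import "fmt"

// Flow is a hypothetical normalized flow record; the real type in
// internal/ingest may carry different fields.
type Flow struct {
	Bytes        uint64 // bytes in the sampled record
	Packets      uint64 // packets in the sampled record
	SamplingRate uint32 // 1-in-N rate reported by the exporter, 0 if absent
}

// effectiveRate prefers the exporter-reported rate and falls back to the
// configured default (sampling.default_rate). If neither is set, the flow
// is treated as unsampled so counts are never multiplied by zero.
func effectiveRate(f Flow, defaultRate uint32) uint64 {
	switch {
	case f.SamplingRate > 0:
		return uint64(f.SamplingRate)
	case defaultRate > 0:
		return uint64(defaultRate)
	default:
		return 1
	}
}

// scale converts sampled counts into estimated real traffic.
func scale(f Flow, defaultRate uint32) (bytes, packets uint64) {
	r := effectiveRate(f, defaultRate)
	return f.Bytes * r, f.Packets * r
}

func main() {
	reported := Flow{Bytes: 1500, Packets: 1, SamplingRate: 1000}
	missing := Flow{Bytes: 1500, Packets: 1}

	b, p := scale(reported, 512)
	fmt.Println(b, p) // 1500000 1000

	b, p = scale(missing, 512)
	fmt.Println(b, p) // 768000 512
}
```

One sampled 1500-byte packet at a 1-in-1000 rate therefore counts as roughly 1.5 MB of real traffic, which is why thresholds can be written in unsampled units.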
The data flow, in one line:
ingest → engine (hot path) → [mitigate, notify, api]
Note: One direction only
Flows move forward through the pipeline; the consumer stages never feed back into the hot path. The engine is the only stage that touches per-flow state, which is what keeps the hot path allocation-free and lock-light.
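One way to picture the one-directional shape is with Go channels. The sketch below is an assumption made for illustration: the `Flow` and `AttackEvent` types, the `runPipeline` function and the use of channels are all hypothetical and say nothing about how Kapkan wires its stages internally. What it does show is the property described above: data only moves forward, and only the engine goroutine touches per-flow state.

```go
package main

import "fmt"

// Flow and AttackEvent are hypothetical, heavily simplified types.
type Flow struct {
	Dst   string
	Bytes uint64
}

type AttackEvent struct {
	Host  string
	Bytes uint64
}

// runPipeline pushes flows through ingest -> engine -> consumers and
// returns the deliveries made to the three consumer stages.
func runPipeline(flows []Flow, threshold uint64) []string {
	in := make(chan Flow)
	events := make(chan AttackEvent, 16)

	// Ingest: hand normalized flows to the engine.
	go func() {
		defer close(in)
		for _, f := range flows {
			in <- f
		}
	}()

	// Engine: the only stage that owns per-host state.
	go func() {
		defer close(events)
		totals := make(map[string]uint64)
		fired := make(map[string]bool)
		for f := range in {
			totals[f.Dst] += f.Bytes
			if totals[f.Dst] >= threshold && !fired[f.Dst] {
				fired[f.Dst] = true
				events <- AttackEvent{Host: f.Dst, Bytes: totals[f.Dst]}
			}
		}
	}()

	// Consumers: each event fans out to all three; nothing flows back.
	var delivered []string
	for ev := range events {
		for _, stage := range []string{"mitigate", "notify", "api"} {
			delivered = append(delivered, stage+":"+ev.Host)
		}
	}
	return delivered
}

func main() {
	flows := []Flow{
		{Dst: "192.0.2.10", Bytes: 600},
		{Dst: "192.0.2.20", Bytes: 100},
		{Dst: "192.0.2.10", Bytes: 500},
	}
	fmt.Println(runPipeline(flows, 1000))
	// [mitigate:192.0.2.10 notify:192.0.2.10 api:192.0.2.10]
}
```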
Components
Internally, Kapkan follows the common Go project layout (cmd/, internal/, pkg/). Each pipeline stage lives in its own package, alongside packages for wiring, configuration and test tooling:
| Package | Responsibility |
|---|---|
| cmd/kapkan/ | main, flag parsing, signal handling |
| internal/app/ | wiring of all components; end-to-end test |
| internal/config/ | YAML load, validation, SIGHUP hot-reload |
| internal/ingest/ | goflow2 library-mode ingestion into a normalized Flow |
| internal/engine/ | sharded per-host counters, sliding window, threshold eval |
| internal/mitigate/ | embedded GoBGP: RTBH announce/withdraw, TTL, caps, dry-run |
| internal/notify/ | Telegram, Slack, email, webhook and exec-hook notifications |
| internal/api/ | REST API + Prometheus metrics |
| pkg/flowgen/ | synthetic NetFlow/sFlow generator for tests and load |
The same layout as a tree:
cmd/kapkan/ main, flag parsing, signal handling
internal/app/ wiring of all components; end-to-end test
internal/config/ YAML load, validation, SIGHUP hot-reload
internal/ingest/ goflow2 library-mode ingestion -> normalized Flow
internal/engine/ sharded per-host counters, sliding window, threshold eval
internal/mitigate/ embedded GoBGP: RTBH announce/withdraw, TTL, caps, dry-run
internal/notify/ Telegram, Slack, email, webhook and exec-hook notifications
internal/api/ REST API + Prometheus metrics
pkg/flowgen/ synthetic NetFlow/sFlow generator for tests and load
The key third-party libraries are goflow2 for flow decoding and GoBGP for the BGP speaker — both used in library mode, so there is no external collector or routing daemon to deploy alongside Kapkan. HTTP and structured logging use the Go standard library.
Data flow
ingest → engine (hot path) → [mitigate, notify, api]
Each hop, in order:
- Router exporter → ingest. Your routers send sampled flow records over UDP to the configured listen ports (`listen.sflow`, `listen.netflow`). The ingest package decodes the wire format and normalizes each record (addresses, ports, protocol, byte/packet counts, sampling rate) into one internal `Flow` type, regardless of which protocol produced it.
- Ingest → engine. Normalized flows enter the hot path. The engine attributes each flow to a destination host inside `networks`, multiplies its counts by the sampling rate so rates are expressed in real (unsampled) traffic, and updates that host's sliding-window counters. Destinations outside `networks` are counted in metrics but never trigger action.
- Engine → mitigate. When a host crosses a threshold, the engine raises an attack event. The mitigate stage decides whether to announce an RTBH route, honoring dry-run, the whitelist, the hostgroup `ban` policy and the ban cap, and schedules its TTL-based withdrawal.
- Engine → notify. The same event is delivered to every configured channel (Telegram, Slack, email, webhook, exec hook) with the attack's classification and flow sample attached.
- Engine → api. Live state (active and recent attacks, tracked hosts, bans) is read out of engine memory by the REST API and the embedded dashboard.
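The checks in the engine → mitigate hop can be read as a chain of guards. The sketch below is hypothetical throughout: the `Policy` and `Decision` types, the `decide` function and, in particular, the order in which the guards run are assumptions made for illustration, not Kapkan's documented behavior.

```go
package main

import (
	"fmt"
	"net/netip"
	"time"
)

// Policy is a hypothetical bundle of the mitigation settings named above.
type Policy struct {
	DryRun     bool
	Whitelist  []netip.Prefix
	BanEnabled bool          // the hostgroup's ban policy
	MaxBans    int           // cap on simultaneously active bans
	TTL        time.Duration // how long an announcement stays up
}

// Decision records whether to announce and why.
type Decision struct {
	Announce bool
	Reason   string
	TTL      time.Duration // zero unless Announce is true
}

// decide applies the guards in an assumed order: whitelist, hostgroup ban
// policy, ban cap, dry-run. Only a host that passes all four is announced.
func decide(host string, p Policy, activeBans int) Decision {
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return Decision{Reason: "invalid address"}
	}
	for _, pfx := range p.Whitelist {
		if pfx.Contains(addr) {
			return Decision{Reason: "whitelisted"}
		}
	}
	switch {
	case !p.BanEnabled:
		return Decision{Reason: "ban disabled for hostgroup"}
	case activeBans >= p.MaxBans:
		return Decision{Reason: "ban cap reached"}
	case p.DryRun:
		return Decision{Reason: "dry-run"}
	}
	return Decision{Announce: true, Reason: "announce", TTL: p.TTL}
}

// examplePolicy returns illustrative values only.
func examplePolicy() Policy {
	return Policy{
		Whitelist:  []netip.Prefix{netip.MustParsePrefix("192.0.2.0/28")},
		BanEnabled: true,
		MaxBans:    2,
		TTL:        10 * time.Minute,
	}
}

func main() {
	p := examplePolicy()
	fmt.Println(decide("192.0.2.5", p, 0).Reason)   // whitelisted
	fmt.Println(decide("192.0.2.200", p, 2).Reason) // ban cap reached

	d := decide("192.0.2.200", p, 0)
	fmt.Println(d.Announce, d.TTL) // true 10m0s

	p.DryRun = true
	fmt.Println(decide("192.0.2.200", p, 0).Reason) // dry-run
}
```

Returning a reason alongside the verdict keeps a suppressed announcement visible in logs and notifications, which matters most while running in dry-run.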
Because recent flows are buffered continuously (samples.*), the moment a threshold trips
the attack's dominant sources, ports and protocols are already attached to the event — there
is no post-detection capture delay.
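A fixed-size ring buffer is one common way to keep such a rolling sample of recent flows. The sketch below assumes that approach purely for illustration; `Sample`, `Ring` and `TopSource` are hypothetical names, and the text above does not say how the `samples.*` buffers are actually implemented.

```go
package main

import "fmt"

// Sample is a hypothetical, trimmed-down record of one recent flow.
type Sample struct {
	Src     string
	DstPort uint16
}

// Ring keeps the most recent len(buf) samples, overwriting the oldest.
type Ring struct {
	buf  []Sample
	next int  // index the next Add will write to
	full bool // true once the buffer has wrapped at least once
}

// NewRing allocates the buffer once; size must be greater than zero.
func NewRing(size int) *Ring { return &Ring{buf: make([]Sample, size)} }

// Add stores a sample without allocating.
func (r *Ring) Add(s Sample) {
	r.buf[r.next] = s
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns a copy of the buffered samples, oldest first.
func (r *Ring) Snapshot() []Sample {
	if !r.full {
		return append([]Sample(nil), r.buf[:r.next]...)
	}
	out := make([]Sample, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	return append(out, r.buf[:r.next]...)
}

// TopSource returns the most frequent source in samples; ties go to the
// lexicographically smaller address so the result is deterministic.
func TopSource(samples []Sample) (src string, count int) {
	counts := make(map[string]int)
	for _, s := range samples {
		counts[s.Src]++
	}
	for s, n := range counts {
		if n > count || (n == count && s < src) {
			src, count = s, n
		}
	}
	return src, count
}

func main() {
	r := NewRing(3)
	for _, src := range []string{
		"198.51.100.1", "203.0.113.7", "203.0.113.7", "198.51.100.9", "203.0.113.7",
	} {
		r.Add(Sample{Src: src, DstPort: 53})
	}
	src, n := TopSource(r.Snapshot())
	fmt.Println(src, n) // 203.0.113.7 2
}
```

Because the buffer is filled on every flow rather than after detection, the snapshot taken when a threshold trips already reflects the traffic that caused it.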
Performance
The hot path is the per-flow processing in internal/engine, and it is built to stay fast
under attack-scale flow rates:
- Sharded per-host counters. Host state is split across shards keyed by a hash of the IP (256 shards), so concurrent flows rarely contend on the same lock.
- Sliding window. Each host keeps a windowed view of its recent traffic, so thresholds are evaluated against a rolling rate rather than instantaneous spikes.
- Allocation-free hot path. Buffers are pre-allocated and counters are atomic; the per-flow path avoids heap allocations so the garbage collector stays out of the way.
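The sharding and atomic-counter ideas can be sketched together as below. This is an illustrative assumption, not the engine's code: the `Counters`, `shard` and `hostCounter` types, the FNV-1a hash and the read-lock fast path are hypothetical choices, and the sliding window is left out to keep the example short. Only the 256-shard figure comes from the text above.

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
	"sync/atomic"
)

const numShards = 256

// hostCounter holds one host's totals; atomics let many goroutines update
// it without holding the shard lock.
type hostCounter struct {
	bytes, packets atomic.Uint64
}

type shard struct {
	mu    sync.RWMutex
	hosts map[netip.Addr]*hostCounter
}

// Counters spreads hosts over numShards independent maps.
type Counters struct {
	shards [numShards]shard
}

func NewCounters() *Counters {
	c := &Counters{}
	for i := range c.shards {
		c.shards[i].hosts = make(map[netip.Addr]*hostCounter)
	}
	return c
}

// shardIndex hashes the address with FNV-1a (an assumed choice).
func shardIndex(a netip.Addr) int {
	h := uint32(2166136261)
	for _, b := range a.As16() {
		h ^= uint32(b)
		h *= 16777619
	}
	return int(h % numShards)
}

// Add folds one flow into its host. A known host costs a read lock and two
// atomic adds; only a host's first flow takes the write lock and allocates.
func (c *Counters) Add(dst netip.Addr, bytes, packets uint64) {
	s := &c.shards[shardIndex(dst)]
	s.mu.RLock()
	hc := s.hosts[dst]
	s.mu.RUnlock()
	if hc == nil {
		s.mu.Lock()
		if hc = s.hosts[dst]; hc == nil { // re-check under the write lock
			hc = &hostCounter{}
			s.hosts[dst] = hc
		}
		s.mu.Unlock()
	}
	hc.bytes.Add(bytes)
	hc.packets.Add(packets)
}

// Bytes returns the byte total recorded for dst.
func (c *Counters) Bytes(dst netip.Addr) uint64 {
	s := &c.shards[shardIndex(dst)]
	s.mu.RLock()
	hc := s.hosts[dst]
	s.mu.RUnlock()
	if hc == nil {
		return 0
	}
	return hc.bytes.Load()
}

// totalAfter hammers a single host from several goroutines.
func totalAfter(workers, flowsPerWorker int, bytesPerFlow uint64) uint64 {
	c := NewCounters()
	dst := netip.MustParseAddr("192.0.2.10")
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < flowsPerWorker; i++ {
				c.Add(dst, bytesPerFlow, 1)
			}
		}()
	}
	wg.Wait()
	return c.Bytes(dst)
}

func main() {
	fmt.Println(totalAfter(8, 1000, 100)) // 800000
}
```

Even in the worst case shown here, where every worker targets the same host, no update is lost; with traffic spread over many hosts, most updates land on different shards and do not contend at all.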
Two figures describe the target, at different scopes:
- The README states the engine sustains ≥20M flows/sec/core on the hot path: the per-core throughput of the in-memory folding step in isolation.
- The engineering target in `CLAUDE.md` is ≥200k flows/sec on 8 cores for end-to-end per-flow processing: the bar a hot-path change must clear in the `make bench` benchmarks before it is considered done.
Note: Benchmark before you trust it
The engine ships with go test -bench benchmarks (make bench). The project rule is to run
them before claiming any hot-path change is complete, so the figures above stay honest across
releases.
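For readers unfamiliar with Go benchmarks, the sketch below shows the kind of measurement involved, using the standard `testing` package from a plain program. It does not reproduce Kapkan's `make bench` suite: the `counter` type and the `fold`, `benchFold` and `allocsPerFold` functions are hypothetical stand-ins for a hot-path step.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// counter stands in for one host's hot-path state.
type counter struct {
	bytes, packets atomic.Uint64
}

// fold is the operation under test: two atomic adds, no allocation.
func (c *counter) fold(bytes, packets uint64) {
	c.bytes.Add(bytes)
	c.packets.Add(packets)
}

// benchFold has the same shape as a BenchmarkXxx function in a _test.go
// file; `go test -bench` would discover and run such a function.
func benchFold(b *testing.B) {
	var c counter
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		c.fold(1500, 1)
	}
}

// allocsPerFold reports the average number of heap allocations per call.
func allocsPerFold() float64 {
	var c counter
	return testing.AllocsPerRun(1000, func() { c.fold(1500, 1) })
}

func main() {
	// testing.Benchmark runs the function outside `go test`.
	res := testing.Benchmark(benchFold)
	fmt.Printf("%d ns/op, %d allocs/op\n", res.NsPerOp(), res.AllocsPerOp())

	// An allocation-free path should report zero here.
	fmt.Println("allocs per fold:", allocsPerFold())
}
```

The nanoseconds-per-operation figure depends entirely on the machine, so compare runs on the same hardware; the allocation count is the more portable signal that a change has not put the garbage collector back on the hot path.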
Configuration & reload
Kapkan is configured by a single YAML file, passed with -config. There is no layered or
environment-merged config to reason about — what is in the file (plus the few secrets read
from named environment variables) is the running state. Keeping it in git gives you a
diffable, reviewable record of every threshold and peer.
Configuration is hot-reloadable without restarting the daemon. You can trigger a reload in two ways:
- Send `SIGHUP` to the process (for example `sudo systemctl reload kapkan`).
- Call `POST /api/v1/config/reload` against the API.
Both re-read and re-validate the file in place. Detection thresholds, hostgroups, baselines,
notification settings and the dry_run switch all take effect on reload. A handful of
structural settings — notably the traffic-buffer sizing under samples.* — require a full
restart; see the Configuration reference for which keys reload and
which do not.
Warning: Reloads are validated
A reload re-validates the whole file. If the new config is invalid, the reload is rejected and the previous good config keeps running — a bad edit will not take the daemon down, but check the logs to confirm your change actually applied.
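The validate-then-swap behavior can be sketched with an atomic pointer: the new config replaces the old one only after it passes validation. Everything here is a hypothetical illustration. The `Config` fields, the `threshold_pps` name in the error message, the `Store` type and the `reloadOnSIGHUP` helper are assumptions, not Kapkan's actual types.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// Config is a hypothetical two-field stand-in for the YAML file.
type Config struct {
	DryRun       bool
	ThresholdPPS uint64
}

// Validate rejects configs that must never go live.
func (c *Config) Validate() error {
	if c.ThresholdPPS == 0 {
		return errors.New("threshold_pps must be greater than zero")
	}
	return nil
}

// Store holds the running config; readers never see a half-applied one.
type Store struct {
	cur atomic.Pointer[Config]
}

// Current returns the config in effect (nil before the first good load).
func (s *Store) Current() *Config { return s.cur.Load() }

// Reload loads and validates a candidate, swapping it in only on success.
// On any error the previous config stays in place.
func (s *Store) Reload(load func() (*Config, error)) error {
	next, err := load()
	if err != nil {
		return fmt.Errorf("load: %w", err)
	}
	if err := next.Validate(); err != nil {
		return fmt.Errorf("validate: %w", err)
	}
	s.cur.Store(next)
	return nil
}

// reloadOnSIGHUP triggers Reload whenever the process receives SIGHUP.
func reloadOnSIGHUP(s *Store, load func() (*Config, error)) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGHUP)
	go func() {
		for range ch {
			if err := s.Reload(load); err != nil {
				log.Printf("reload rejected, keeping previous config: %v", err)
			}
		}
	}()
}

func main() {
	var s Store
	good := func() (*Config, error) { return &Config{ThresholdPPS: 100000}, nil }
	bad := func() (*Config, error) { return &Config{ThresholdPPS: 0}, nil }

	fmt.Println(s.Reload(good))           // <nil>
	fmt.Println(s.Reload(bad) != nil)     // true
	fmt.Println(s.Current().ThresholdPPS) // 100000
}
```

The rejected reload returns an error but leaves `Current()` untouched, which is why the logs are the place to confirm that an edit actually took effect.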
Related
- Introduction: what Kapkan is and who it is for.
- Quickstart: build the binary and see your first detection.
- Detection & thresholds: how the engine decides what is an attack.
- Mitigation: how RTBH announcements are made and withdrawn.
- Configuration reference: every key in the YAML file.
- Metrics: the `kapkan_` Prometheus series.