Baselines

With a baseline block, Kapkan continuously learns each host's normal traffic level and tightens detection to match it. It keeps an EWMA (exponentially weighted moving average) of the real, sampling-corrected rate per host and per direction, plus a per-group total for hostgroups with calculation: total, and derives an effective threshold of learned_normal × factor. With the default factor of 3, a host that normally does 10k pps is flagged at roughly 30k pps instead of waiting for the global 80k pps threshold.

This is the "stop tuning thresholds by hand" mode. Where FastNetMon's automated baseline is an offline calculator you run and copy numbers from, Kapkan's is online: it follows your traffic continuously and re-derives the effective threshold as the host's normal level drifts.

Baselines tighten, never loosen

Your static thresholds always remain in force as a hard ceiling. A baseline can only make detection more sensitive for a host — it can never raise the bar above what you configured.

Example

baseline:
  factor: 3              # attack = traffic above learned_normal × factor
  half_life_seconds: 3600
  warmup_seconds: 600
  floor:
    pps: 5000
    mbps: 50
    flows_per_sec: 2000

The block is optional; without it, Kapkan uses your static thresholds alone. The effective threshold for each host and metric is:

effective = clamp(learned_normal × factor, low = floor, high = static_threshold)

In words: learned_normal × factor, bounded below by floor and above by the static thresholds (the ceiling).
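As a minimal sketch of that formula (Python only for illustration; effective_threshold is a hypothetical helper, not Kapkan's API), raising to the floor first and capping at the static threshold last keeps the ceiling in force in every case:

```python
def effective_threshold(learned_normal: float, factor: float,
                        floor: float, static_threshold: float) -> float:
    """effective = clamp(learned_normal * factor, low=floor, high=static_threshold)."""
    candidate = learned_normal * factor
    # Raise to the floor first, then cap at the static ceiling, so the static
    # threshold wins even if a floor were set above it (an assumed tie-break).
    return min(max(candidate, floor), static_threshold)
```

With the figures used on this page, a host learned at 10k pps gets 30,000; a quiet host at 1k pps is lifted to the 5,000 pps floor; and a host whose learned level is 40k pps stays capped at the 80,000 pps static threshold.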

Keys

Key | Default | Meaning
--- | --- | ---
factor | 3 | Multiplier on the learned normal level. The effective threshold is learned_normal × factor.
half_life_seconds | 3600 | EWMA half-life: the time over which an old observation loses half its weight. Larger values make the baseline slower and steadier.
warmup_seconds | 600 | A freshly observed host is gated by static thresholds only for this long, counted from its first real traffic.
floor.pps / floor.mbps / floor.flows_per_sec | example: 5000 / 50 / 2000 | Lower bound on the effective threshold per metric, so quiet hosts never become hair-triggers.
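The half_life_seconds key can be read as the following update rule. This is a sketch assuming each measured rate is folded in with a time-based decay; ewma_step is a hypothetical name, and Kapkan's internal update may differ in detail.

```python
def ewma_step(previous: float, sample: float,
              dt_seconds: float, half_life_seconds: float) -> float:
    """Fold one rate sample into an EWMA whose old weight halves every half_life_seconds."""
    # Fraction of weight the previous value keeps after dt_seconds have passed.
    keep = 2.0 ** (-dt_seconds / half_life_seconds)
    return keep * previous + (1.0 - keep) * sample
```

For example, ewma_step(10_000, 20_000, 3600, 3600) returns 15000.0: after one half-life the old level and the new level carry equal weight. Because the decay is expressed per second, splitting that hour into 360 ten-second windows of the same constant sample lands on the same value (up to floating-point rounding), so the half-life does not depend on how often samples arrive.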

Poisoning-aware design

A naive learned baseline is attackable: feed a host a slow, rising flood and it learns the flood as "normal," raising the bar until real attacks slip under it. Kapkan's baseline is built to resist this. The static thresholds stay as guards, and several rules bound what the baseline can ever do.

  • Ceiling. Traffic above the static thresholds always triggers. A poisoned or fast-grown baseline can never raise the effective threshold above what you configured.
  • Floor. The effective threshold never drops below floor, so a normally quiet host does not become a hair-trigger that fires on any small burst.
  • Frozen under attack. While a host is under attack — including the unban_hysteresis_seconds tail after the rate drops — its baseline is not trained at all. Attack traffic never teaches the host that the attack is normal.
  • Clamped learning. Outside attacks, each training sample is capped at baseline × factor before it updates the EWMA. A slow attacker ramp can therefore raise the baseline by at most a factor of 2^(factor−1) per half-life, which is about 4× per hour at the defaults (factor: 3, half_life_seconds: 3600). Climbing from a normal level to the static ceiling therefore takes hours, and the baseline can never go past it. Aggressive settings (a large factor or a short half_life_seconds) shrink that window, so choose them deliberately.
  • Learning only on real traffic. A direction with no traffic in the window never trains its baseline. An incoming-only host keeps its static outgoing threshold, and an empty total group never warms up to a zero baseline.
  • Warm-up. A freshly observed host is protected by static thresholds only for warmup_seconds, counted from its first real traffic. An evicted (long-quiet) host re-warms up when it returns.
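The training rules above can be sketched as a small per-host, per-direction state object. This is an illustration under stated assumptions, not Kapkan's implementation: DirectionBaseline, train, effective and window_s are hypothetical names; rates are assumed to arrive in fixed windows of window_s seconds; the caller is assumed to fold the unban_hysteresis_seconds tail into under_attack; and the first real window is assumed to seed the EWMA directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectionBaseline:
    """Hypothetical learned baseline for one host, one direction, one metric."""
    factor: float = 3.0
    half_life_seconds: float = 3600.0
    warmup_seconds: float = 600.0
    window_s: float = 10.0              # assumed fixed measurement window
    ewma: Optional[float] = None        # None until the first real traffic
    first_seen: Optional[float] = None  # timestamp of the first real traffic

    def train(self, rate: float, now: float, under_attack: bool) -> None:
        # Frozen under attack: the caller passes True for the whole attack,
        # including the unban_hysteresis_seconds tail.
        if under_attack:
            return
        # Learning only on real traffic: an empty window never trains.
        if rate <= 0:
            return
        if self.ewma is None:
            # First real traffic seeds the baseline and starts the warm-up clock.
            self.ewma = rate
            self.first_seen = now
            return
        # Clamped learning: a sample counts for at most ewma * factor.
        sample = min(rate, self.ewma * self.factor)
        keep = 2.0 ** (-self.window_s / self.half_life_seconds)
        self.ewma = keep * self.ewma + (1.0 - keep) * sample

    def effective(self, now: float, floor: float, static_threshold: float) -> float:
        # Warm-up: static thresholds only until warmup_seconds after first real traffic.
        if self.ewma is None or now - self.first_seen < self.warmup_seconds:
            return static_threshold
        # Clamp to [floor, static_threshold]; the ceiling is applied last.
        return min(max(self.ewma * self.factor, floor), static_threshold)
```

Feeding this object an attacker ramp that always sits far above the cap for 360 ten-second windows (one half-life at the defaults) grows the EWMA by just under 4×, consistent with the 2^(factor−1) bound in the Clamped learning rule; shorter windows approach the bound from below. Eviction of long-quiet hosts is not modeled here; resetting ewma and first_seen to None would make a returning host warm up again.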

A host already flooded at first sight

Warm-up traffic itself trains the initial baseline. A host that is already under a sub-static flood the first time Kapkan sees it learns that flood as "normal" — bounded by the static ceiling, but there is no clean reference for a host attacked from first sight. Set warmup_seconds to at least a few multiples of half_life_seconds so the baseline converges before it starts gating.
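One way to quantify the warm-up advice, assuming the EWMA is seeded from the first real window (the seeding rule is an assumption, not stated on this page): the weight the seed still carries when gating begins, after a warm-up of T_warmup seconds with half-life T_1/2, is

```latex
w_{\mathrm{seed}} = 2^{-T_{\mathrm{warmup}} / T_{1/2}},
\qquad
2^{-600/3600} \approx 0.89,
\qquad
2^{-3} = 0.125
```

At the defaults (600 s warm-up, 3600 s half-life) the seed still dominates when gating starts; a warm-up of three half-lives leaves it one eighth of the weight.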

Visibility

Learned levels are exposed per host in the REST API. The GET /api/v1/hosts snapshot carries baseline (incoming) and baseline_out (outgoing) alongside each host's measured per-direction rates and attack state. The dashboard renders the same learned baselines next to the top talkers, so you can see what each host is being held to.
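A short client sketch for pulling the learned levels out of the snapshot. Only the GET /api/v1/hosts path and the baseline and baseline_out fields come from this page; the base URL, the top-level JSON array, the ip field, and the absence of authentication are assumptions, and fetch_hosts and learned_baselines are hypothetical helpers.

```python
import json
import urllib.request


def fetch_hosts(base_url: str) -> list:
    """GET the hosts snapshot (assumed to be a JSON array of host objects)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/v1/hosts") as resp:
        return json.load(resp)


def learned_baselines(hosts: list) -> dict:
    """Map each host to its (incoming, outgoing) learned baseline.

    "ip" is an assumed key; baseline and baseline_out are passed through
    unchanged because their inner shape is not documented here.
    """
    return {h.get("ip"): (h.get("baseline"), h.get("baseline_out")) for h in hosts}
```

Usage would look like learned_baselines(fetch_hosts("http://127.0.0.1:8080")), where the address is a placeholder for wherever the REST API listens.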

Per-hostgroup

Baselines follow hostgroup scope. A group either inherits the global baseline block or overrides it wholesale — there is no partial merge. To opt a group out of baselines entirely while keeping the global block for everyone else, disable it on the group:

hostgroups:
  - name: dns-pool
    networks: ["203.0.113.128/26"]
    calculation: total
    baseline: { enabled: false }   # this group uses static thresholds only

For calculation: total groups, the baseline learns the group's summed traffic rather than per-host levels, and the same poisoning-aware rules apply to that total.
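The scoping rules above can be sketched as two small functions. They model YAML blocks as plain dicts; resolve_baseline, training_series, and the "total" series key are hypothetical, and treating a missing enabled key as enabled is an assumption.

```python
from typing import Dict, Optional


def resolve_baseline(global_block: Optional[dict],
                     group_block: Optional[dict]) -> Optional[dict]:
    """Return the baseline block a hostgroup runs with, or None for static-only."""
    # Wholesale override: a group-level block replaces the global one; keys are never merged.
    block = group_block if group_block is not None else global_block
    if block is None or not block.get("enabled", True):
        return None  # static thresholds only
    return block


def training_series(host_rates: Dict[str, float], calculation: str) -> Dict[str, float]:
    """What the baseline learns from one window of per-host rates."""
    if calculation == "total":
        # One summed series for the whole group ("total" is a hypothetical key).
        return {"total": sum(host_rates.values())}
    return dict(host_rates)  # otherwise one series per host
```

With the dns-pool entry above, resolve_baseline(global_block, {"enabled": False}) returns None, so that group stays on static thresholds, while a group that supplies only {"factor": 5} gets exactly that block and none of the global keys.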