RTBH mitigation

When the engine reports an attack, Kapkan mitigates it by announcing a remotely triggered blackhole (RTBH) route for the targeted host: a /32 (IPv4) or /128 (IPv6) prefix announced by an embedded GoBGP speaker, tagged with your RTBH community and pointed at a discard next-hop. Your edge routers match the community and drop all traffic to that host, so the flood is discarded before it reaches your network's core.

Mitigation is destination-based: the blackhole drops traffic addressed to the target. For an incoming flood, this stops the attack. For an outgoing attack from a compromised host, it takes that host offline, which usually stops the abuse; the outbound flood itself is stopped only where your edge also drops traffic sourced from blackholed prefixes, for example with uRPF.

Dry-run by default

Until you explicitly set dry_run: false, every would-be blackhole is computed, logged and exposed through the API — but never announced to your routers. Validate detection and BGP peering against production telemetry before any route can be sent. See Going live.

Mitigation methods

blackhole (RTBH) is the default method and the subject of the rest of this page. Two other methods are available via the mitigation key, each overridable per hostgroup:

  • FlowSpec (mitigation: flowspec) — drop only the attack vector with BGP FlowSpec rules (RFC 8955/8956) instead of blackholing the whole victim.
  • Escalation ladders (escalation:) — step the response up the longer an attack persists, from alert to FlowSpec to blackhole.

Both share the same ban lifecycle, TTL, unban hysteresis and max_active_bans cap described below.

BGP configuration

The bgp block defines Kapkan's BGP identity, the blackhole next-hops, the RTBH community, and the eBGP peers it announces to.

bgp:
  local_asn: 65000               # Kapkan's local ASN
  router_id: "198.51.100.1"      # must be a valid IPv4 address
  next_hop: "192.0.2.1"          # IPv4 discard next-hop
  next_hop6: "100::1"            # IPv6 discard next-hop (optional)
  community: "65000:666"         # RTBH community, ASN:value
  # communities: ["65000:666", "65000:777"]  # full set; overrides `community`
  # local_pref: 100              # optional LOCAL_PREF for iBGP peers; 0 = omit
  neighbors:
    - address: "198.51.100.254"
      remote_asn: 65001
      port: 179                  # optional; defaults to 179, override for testing

| Key | Meaning |
| --- | --- |
| local_asn | Kapkan's local ASN for the eBGP sessions. Required. |
| router_id | The BGP router ID. Must be a valid IPv4 address. |
| next_hop | The IPv4 discard next-hop announced with each /32 blackhole. |
| next_hop6 | The IPv6 discard next-hop announced with each /128 blackhole. Optional; when unset, Kapkan falls back to 100::1, an address inside the RFC 6666 discard-only prefix 100::/64. |
| community | The RTBH community in ASN:value form (for example 65000:666). Parsed once at load into the wire value sent with every route. |
| communities | Optional list of communities that overrides community when set, for upstreams that expect a full set. |
| local_pref | Optional LOCAL_PREF path attribute attached to blackholes (meaningful to iBGP peers). Default 0 omits it. |
| neighbors[] | The eBGP peers Kapkan announces to. Each entry takes an address, a remote_asn, and an optional port (default 179; override it for testing). |
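
As an illustration of what "parsed once at load into the wire value" could look like, the sketch below assumes the community is a standard 32-bit BGP community (RFC 1997), with the ASN in the high 16 bits and the value in the low 16 bits. The function names are hypothetical and are not Kapkan's actual code.

```python
from typing import List, Optional


def parse_community(text: str) -> int:
    """Parse 'ASN:value' into a 32-bit standard community (assumed encoding)."""
    asn_part, sep, value_part = text.partition(":")
    if not sep:
        raise ValueError(f"community must be ASN:value, got {text!r}")
    asn, value = int(asn_part), int(value_part)
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"community parts must fit in 16 bits: {text!r}")
    # High 16 bits carry the ASN, low 16 bits carry the value.
    return (asn << 16) | value


def resolve_communities(community: str,
                        communities: Optional[List[str]] = None) -> List[int]:
    """Return wire values; `communities` overrides `community` when set."""
    chosen = communities if communities else [community]
    return [parse_community(c) for c in chosen]


if __name__ == "__main__":
    print(hex(parse_community("65000:666")))
    print(resolve_communities("65000:666", ["65000:666", "65000:777"]))
```

Parsing at load time rather than per route means a malformed community fails fast, before any ban is created.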

community, communities, next_hop, next_hop6 and local_pref can all be overridden per hostgroup — see Per-hostgroup BGP attributes.

The blackhole route Kapkan builds for a target reads as <prefix> next-hop <next_hop> community <community> — exactly the string surfaced in the route field of the ban object. IPv6 targets use next_hop6 (or the 100::1 fallback) in place of next_hop.
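
The route string described above can be reproduced with a few lines of Python. This is a minimal sketch, not Kapkan's implementation: `host_prefix` and `route_string` are hypothetical names, and only the format and the 100::1 fallback come from this page.

```python
import ipaddress
from typing import Optional

IPV6_FALLBACK_NEXT_HOP = "100::1"  # fallback named in the text above


def host_prefix(target: str) -> str:
    """Return the host route for a target: /32 for IPv4, /128 for IPv6."""
    ip = ipaddress.ip_address(target)
    length = 32 if ip.version == 4 else 128
    return f"{ip}/{length}"


def route_string(target: str, next_hop: str, community: str,
                 next_hop6: Optional[str] = None) -> str:
    """Build '<prefix> next-hop <next_hop> community <community>'."""
    ip = ipaddress.ip_address(target)
    if ip.version == 6:
        # IPv6 targets use next_hop6, or the fallback when it is unset.
        hop = next_hop6 or IPV6_FALLBACK_NEXT_HOP
    else:
        hop = next_hop
    return f"{host_prefix(target)} next-hop {hop} community {community}"


if __name__ == "__main__":
    print(route_string("203.0.113.66", "192.0.2.1", "65000:666"))
    print(route_string("2001:db8::1", "192.0.2.1", "65000:666"))
```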

Peering happens in dry-run

The BGP speaker peers with your neighbors even while dry_run is true — only route announcements are gated on the flag. This lets you confirm sessions reach ESTABLISHED (logged as bgp peer state) and validate connectivity before you announce a single route.

Per-hostgroup BGP attributes

The global bgp block sets the default blackhole next-hops and RTBH community. A hostgroup can override any of them, so different customers can signal their own upstreams:

bgp:
  next_hop: "192.0.2.1"
  community: "65000:666"                          # the default RTBH community

hostgroups:
  - name: customer-a
    networks: ["203.0.113.64/26"]
    bgp:
      communities: ["65000:100", "65001:200"]     # customer-A's own blackhole signal
      next_hop: "192.0.2.50"                       # and discard next-hop
      local_pref: 250

Each field left unset inherits the global bgp value, so a group can override just its community while sharing the global next-hop. The resolved attributes are frozen on each ban when it is created: a config reload changes only future bans, never the route a live ban already announced. The per-ban next_hop, community and local_pref are visible in /api/v1/bans and in the route field.
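
The inheritance and freeze-on-create behavior can be sketched as follows. The config is modeled as plain dictionaries mirroring the YAML keys; `resolve_bgp_attrs` is a hypothetical helper, and matching the first hostgroup whose networks contain the target is an assumption, since this page does not specify how overlapping groups are resolved.

```python
import copy
import ipaddress

OVERRIDABLE = ("community", "communities", "next_hop", "next_hop6", "local_pref")


def resolve_bgp_attrs(target: str, global_bgp: dict, hostgroups: list) -> dict:
    """Resolve per-ban BGP attributes: hostgroup values override global ones."""
    ip = ipaddress.ip_address(target)
    resolved = {key: global_bgp.get(key) for key in OVERRIDABLE}
    for group in hostgroups:
        nets = [ipaddress.ip_network(n) for n in group.get("networks", [])]
        if any(ip.version == net.version and ip in net for net in nets):
            overrides = group.get("bgp", {})
            for key in OVERRIDABLE:
                if key in overrides:  # unset fields keep the global value
                    resolved[key] = overrides[key]
            break  # assumption: first matching hostgroup wins
    # Deep copy so a later config change cannot alter an existing ban.
    return copy.deepcopy(resolved)


if __name__ == "__main__":
    global_bgp = {"next_hop": "192.0.2.1", "community": "65000:666"}
    hostgroups = [{
        "name": "customer-a",
        "networks": ["203.0.113.64/26"],
        "bgp": {"communities": ["65000:100", "65001:200"],
                "next_hop": "192.0.2.50", "local_pref": 250},
    }]
    print(resolve_bgp_attrs("203.0.113.66", global_bgp, hostgroups))
```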

The ban lifecycle

A ban is the unit Kapkan tracks for each blackhole decision. Its lifecycle is bounded at every step by the safety model — there is no path to a permanent or runaway ban.

  1. Announce on detection. When the engine reports a new per-host attack and the host's policy permits banning, Kapkan announces the blackhole route (or, in dry-run, records the would-be route) and the ban enters the active state. A second attack — for example an incoming and an outgoing flood on the same host at once — shares the one RTBH route rather than announcing twice.
  2. Auto-withdraw on TTL. Every announcement carries a TTL. After ban.ttl_seconds the route is auto-withdrawn even if the attack is still ongoing. A persisting attack can refresh the TTL, but the new expiry is never later than one full TTL from the moment of the refresh. There are no permanent bans.
  3. Unban with hysteresis. A ban driven by an attack is withdrawn only after traffic stays below threshold for ban.unban_hysteresis_seconds. This anti-flap delay prevents a borderline attack from rapidly announcing and withdrawing the same route.
  4. Hard cap. ban.max_active_bans is an absolute ceiling on simultaneous active bans. Past the cap, new bans are refused (returned as a ban in the rejected state with reason max_active_bans reached) and operators are alerted — so Kapkan can never blackhole half your network in a broad attack.

ban:
  ttl_seconds: 1800            # every announcement auto-withdraws after this
  unban_hysteresis_seconds: 60 # traffic must stay below threshold this long before withdraw
  max_active_bans: 100         # hard cap on simultaneous bans

A ban is also withdrawn early if a config reload removes its target from the protected networks — Kapkan will not keep a route up for address space it no longer protects.
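
To make the interplay of TTL, hysteresis and the cap concrete, here is a small in-memory model of the lifecycle. It is a sketch under stated assumptions, not Kapkan's code: the class and method names are hypothetical, time is passed in explicitly as seconds, the withdrawal reasons "ttl expired" and "attack ended" are invented for illustration, and the TTL refresh is modeled as "expiry = now + ttl_seconds" each time the attack is reported again.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Ban:
    target: str
    state: str                           # "active", "withdrawn" or "rejected"
    expires_at: float
    reason: str = ""
    below_since: Optional[float] = None  # when traffic first fell below threshold


class BanTable:
    def __init__(self, ttl_seconds: float, unban_hysteresis_seconds: float,
                 max_active_bans: int) -> None:
        self.ttl = ttl_seconds
        self.hysteresis = unban_hysteresis_seconds
        self.max_active = max_active_bans
        self.bans: Dict[str, Ban] = {}

    def _active(self) -> List[Ban]:
        return [b for b in self.bans.values() if b.state == "active"]

    def on_attack(self, target: str, now: float) -> Ban:
        ban = self.bans.get(target)
        if ban is not None and ban.state == "active":
            # A second or persisting attack shares the existing route;
            # the expiry is refreshed to at most one TTL from now.
            ban.expires_at = now + self.ttl
            ban.below_since = None
            return ban
        if len(self._active()) >= self.max_active:
            # Hard cap: refuse the ban instead of announcing it.
            return Ban(target, "rejected", now, reason="max_active_bans reached")
        ban = Ban(target, "active", now + self.ttl)
        self.bans[target] = ban
        return ban

    def on_traffic_sample(self, target: str, above_threshold: bool,
                          now: float) -> None:
        ban = self.bans.get(target)
        if ban is None or ban.state != "active":
            return
        if above_threshold:
            ban.below_since = None      # attack resumed: restart the quiet period
        elif ban.below_since is None:
            ban.below_since = now       # quiet period starts

    def tick(self, now: float) -> None:
        for ban in self._active():
            if now >= ban.expires_at:
                ban.state, ban.reason = "withdrawn", "ttl expired"
            elif (ban.below_since is not None
                  and now - ban.below_since >= self.hysteresis):
                ban.state, ban.reason = "withdrawn", "attack ended"
```

Passing `now` explicitly instead of reading the clock keeps the model deterministic, which makes it easy to replay a timeline and check when a route would be withdrawn.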

Dry-run

In dry-run mode (dry_run: true, the default) Kapkan does everything except send the route. The target prefix, next-hop, community and the full route string are computed, the decision is logged as DRY-RUN: would announce blackhole route (not sent), and the ban is tracked in the API with dry_run: true and the live active/withdrawn states. TTL expiry and hysteresis run exactly as they would in production, so the entire lifecycle is observable before any route reaches a router.

Nothing is announced until you set dry_run: false and reload (SIGHUP or POST /api/v1/config/reload). Because peering happens in dry-run, the only behavior that changes when you flip the flag is the announce/withdraw itself.

Validate first

Run against production telemetry in dry-run, confirm detections fire on the right prefixes and the route fields are correct, then turn off dry-run. The full procedure is in Going live.

Manual bans

Operators can blackhole or release a host directly through the REST API:

POST /api/v1/ban
Content-Type: application/json

{"ip": "203.0.113.66"}

POST /api/v1/unban
Content-Type: application/json

{"ip": "203.0.113.66"}

A manual ban is created with manual: true and follows the same TTL and hysteresis as an automatic one. It honors every safety rule with no exceptions: a whitelisted target is refused (returns 409, reason whitelisted), a target outside the configured networks is refused (returns 409, reason outside configured networks), and exceeding max_active_bans is refused (returns 409). Both POST endpoints require the application/json content type. See the API reference for the full request and response shapes.
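
A client for these two endpoints might look like the sketch below, using only the Python standard library. The base URL and port are placeholders, the helper names are hypothetical, and any authentication your deployment requires is not shown because this page does not describe it.

```python
import json
import urllib.error
import urllib.request


def build_ban_request(base_url: str, ip: str,
                      action: str = "ban") -> urllib.request.Request:
    """Build a POST to /api/v1/ban or /api/v1/unban with a JSON body."""
    if action not in ("ban", "unban"):
        raise ValueError("action must be 'ban' or 'unban'")
    body = json.dumps({"ip": ip}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/{action}",
        data=body,
        headers={"Content-Type": "application/json"},  # required by both endpoints
        method="POST",
    )


def send(req: urllib.request.Request):
    """Send the request; return (status, body) even for refusals such as 409."""
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; a 409 refusal still carries a response body.
        return err.code, err.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder address: point this at your own Kapkan API listener.
    req = build_ban_request("http://127.0.0.1:8080", "203.0.113.66")
    print(req.get_method(), req.full_url, req.data)
```

Catching `HTTPError` matters here because a refused ban is an expected outcome, not a transport failure, and the caller usually wants to inspect the returned body.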

The ban object

The API returns each ban as a JSON object. These are the fields exactly as serialized by the mitigator:

| Field | Type | Meaning |
| --- | --- | --- |
| target | string | The blackholed host address (IPv4 or IPv6). |
| prefix | string | The host prefix announced: /32 for IPv4, /128 for IPv6. |
| metric | string | The metric that triggered the attack-driven ban (omitted for manual bans). |
| rate | number | The observed rate at ban time, in the metric's units (omitted when zero). |
| threshold | number | The threshold the rate crossed (omitted when zero). |
| next_hop | string | The discard next-hop announced with the route. |
| community | string | The RTBH community in ASN:value form. |
| local_pref | number | The LOCAL_PREF attached to the route (omitted when zero). |
| route | string | The full route as <prefix> next-hop <next_hop> community <community>, or a flowspec: ... summary for FlowSpec bans. |
| state | string | One of active, withdrawn, or rejected. |
| dry_run | boolean | Whether the ban was made in dry-run mode (route not actually sent). |
| manual | boolean | true for operator-requested bans, false for automatic ones. |
| started_at | timestamp | When the ban was created. |
| expires_at | timestamp | When the TTL auto-withdraws the route. |
| withdrawn_at | timestamp | When the route was withdrawn (omitted while active). |
| reason | string | Why the ban was withdrawn or rejected (omitted while active). |
| method | string | The mitigation method that produced the ban: blackhole or flowspec. |
| flowspec | array | The generated FlowSpec rules, when the method is flowspec. |
| escalation | array | The configured escalation ladder, when one is set. |
| escalation_step | number | The index of the ladder's current rung. |

A rejected ban is never announced — it records a refusal (whitelisted, outside networks, or cap reached) so the decision is auditable in the API and notifications.
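
As an example of consuming these fields, the sketch below separates routes that are really announced from dry-run and rejected entries. It assumes the API response has already been decoded into a list of ban objects; the exact response envelope is defined in the API reference, and the sample records here are made up for illustration.

```python
from typing import Dict, List


def live_blackholes(bans: List[dict]) -> List[str]:
    """Prefixes actually announced: active bans that are not dry-run."""
    return [
        b["prefix"]
        for b in bans
        if b.get("state") == "active" and not b.get("dry_run", False)
    ]


def rejections(bans: List[dict]) -> Dict[str, str]:
    """Map each rejected target to the recorded refusal reason."""
    return {
        b["target"]: b.get("reason", "")
        for b in bans
        if b.get("state") == "rejected"
    }


if __name__ == "__main__":
    # Illustrative records only; field names follow the table above.
    sample = [
        {"target": "203.0.113.66", "prefix": "203.0.113.66/32",
         "state": "active", "dry_run": False, "manual": False},
        {"target": "203.0.113.67", "prefix": "203.0.113.67/32",
         "state": "active", "dry_run": True, "manual": False},
        {"target": "203.0.113.68", "prefix": "203.0.113.68/32",
         "state": "rejected", "dry_run": False, "manual": True,
         "reason": "whitelisted"},
    ]
    print(live_blackholes(sample))
    print(rejections(sample))
```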

  • FlowSpec mitigation — surgical drops that spare the victim's other traffic.
  • Escalation ladders — step the response up the longer an attack persists.
  • Safety model — the dry-run, TTL, hysteresis, cap and whitelist rules enforced in code.
  • Going live — validate detection and peering, then turn off dry-run.
  • REST API — the ban endpoints, request shapes and response codes.