From Static Rate Limiting to Adaptive Traffic Management in Airbnb’s Key-Value Store
Airbnb describes upgrading Mussel, its multi-tenant key-value store, from static per-client QPS limits to an adaptive, layered QoS stack: resource-aware request-unit (RU) accounting, real-time load shedding driven by a p95 latency ratio and CoDel-like queueing, and hot-key detection with local caching and request coalescing. The post covers the algorithms (P² quantiles, Space-Saving top-k), per-dispatcher local control loops, calibration, production results (including a DDoS drill), and operational lessons.
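For readers unfamiliar with Space-Saving top-k, the hot-key detection technique named above, here is a minimal sketch of the textbook algorithm. It is not Mussel's implementation: the class name `SpaceSaving`, its methods, and the `capacity` parameter are hypothetical names chosen for illustration. The sketch uses a linear scan to find the smallest counter, which is fine for small capacities but not tuned for a hot path.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-k frequency tracker using at most `capacity` counters.

    Illustrative sketch of the textbook Space-Saving algorithm; names are
    hypothetical and not taken from Mussel.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count per tracked key
        self.errors: Dict[Hashable, int] = {}  # max overestimate per tracked key

    def offer(self, key: Hashable) -> None:
        """Record one occurrence of `key`."""
        if key in self.counts:
            # Already tracked: just bump its counter.
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            # Free slot available: start tracking with an exact count.
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # Table full: evict the key with the smallest counter and let the
            # newcomer inherit that count (plus one) as an upper bound.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + 1
            self.errors[key] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to `k` tracked keys with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]
```

Each tracked key's true frequency lies between `counts[key] - errors[key]` and `counts[key]`, so a key whose lower bound exceeds a chosen threshold can be treated as hot with no false positives. Memory stays bounded by `capacity` regardless of how many distinct keys appear in the stream.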