Airbnb engineering blog

Airbnb, 3 weeks ago

From Static Rate Limiting to Adaptive Traffic Management in Airbnb’s Key-Value Store

Airbnb describes upgrading Mussel, its multi-tenant key-value store, from static per-client QPS limits to an adaptive, layered QoS stack: resource-aware request-unit (RU) accounting, real-time load shedding driven by a p95 latency ratio and CoDel-like queueing, and hot-key detection with local caching and request coalescing. The post covers the algorithms (P² quantiles, Space-Saving top-k), per-dispatcher local control loops, calibration, production results (including a DDoS drill), and operational lessons.
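
The summary names Space-Saving top-k as the hot-key detection algorithm. The following is a minimal sketch of the textbook algorithm, not Mussel's implementation; the class name, method names, and `capacity` parameter are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-k counter that tracks at most `capacity` keys."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        # Upper bound on how much each tracked count may be overestimated.
        self.errors: Dict[Hashable, int] = {}

    def offer(self, key: Hashable) -> None:
        """Record one occurrence of `key`."""
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # Evict the key with the smallest count; the newcomer inherits
            # that count plus one, so counts never underestimate.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + 1
            self.errors[key] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k (key, estimated_count) pairs, largest first."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

The linear scan for the minimum keeps the sketch short; a production version would more likely use a structure that finds the minimum in constant time. Memory stays bounded by `capacity` regardless of how many distinct keys appear.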

Airbnb, 1 month ago

Building a Next-Generation Key-Value Store at Airbnb

Airbnb rearchitected its Mussel key-value store from a custom v1 storage backend to Mussel v2, which uses a NewSQL backend and a Kubernetes-native control plane. v2 adopts Kafka-backed durable writes, a stateless Dispatcher layer, dynamic range sharding/pre-splitting, and topology-aware TTL expiration, and it preserves Airflow→S3 bulk-load onboarding. The migration used a blue/green, per-table bootstrapping and dual-write pipeline (sampling, pre-splitting, bootstrap, checksums, catch-up) to move petabytes with zero downtime. Outcomes include reduced operational overhead, predictable p99 latency, high throughput, and improved tenancy/quota transparency.

Airbnb, 1 month ago

Viaduct, Five Years On: Modernizing the Data-Oriented Service Mesh

Airbnb announces that Viaduct is now open source and describes Viaduct Modern, a ground-up overhaul that simplifies the tenant API, strengthens framework/engine abstraction boundaries, introduces tenant modules and a typed-to-dynamic bridge (Kotlin wrappers over a dynamic engine), and supports gradual migration. The post covers operational improvements (observability, build time, dispatcher/shard routing) and argues that the platform scales to large, multi-team workloads while improving developer ergonomics.

Airbnb, 1 month ago

Taming Service-Oriented Architecture Using A Data-Oriented Service Mesh

Airbnb describes Viaduct, a GraphQL-based "data-oriented service mesh" that centralizes a schema across microservices to abstract dependencies, improve modularity and data agility, support serverless derived fields, and provide field-level observability and runtime reliability features.

Airbnb, 2 months ago

Migrating Airbnb’s JVM Monorepo to Bazel

Airbnb migrated its large JVM monorepo from Gradle to Bazel over 4.5 years to gain speed, reliability, and a uniform build infrastructure. They adopted remote build execution and caching, built an automated build-file generator (inspired by Gazelle) to manage fine-grained Bazel targets, implemented multi-version dependency support via multiple maven_install rules and conflict resolution, and migrated services and data pipelines (Spark/Flink) with testing and targeted fixes. Outcomes included large improvements in local build/test times, IntelliJ syncs, and deploy speed; key learnings covered partnering with pilot teams and avoiding premature optimization.

Airbnb, 3 months ago

Seamless Istio Upgrades at Scale

Airbnb describes its process for performing zero-downtime, gradual Istio upgrades at scale across thousands of pods and VMs. They run parallel Istiod revisions, use a rollouts.yml spec to deterministically assign namespaces to revisions, mutate Kubernetes manifests and pods with an internal framework (Krispr) during CI/admission, and coordinate VM upgrades with mxagent/mxrc and cloud tags, all while enforcing health limits and keeping rollouts deterministic.
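
One common way to get deterministic assignment of namespaces to revisions is to hash each namespace into a fixed bucket and compare it against cumulative rollout percentages. The sketch below illustrates that idea only; the summary does not describe the rollouts.yml format, so the function name, the `(revision, percent)` input shape, and the use of SHA-256 are all assumptions.

```python
import hashlib
from typing import List, Tuple


def assign_revision(namespace: str, rollout: List[Tuple[str, int]]) -> str:
    """Map a namespace to a revision, given ordered (revision, percent) pairs.

    Hypothetical input shape: percentages are integers that sum to 100.
    """
    if sum(percent for _, percent in rollout) != 100:
        raise ValueError("rollout percentages must sum to 100")
    # A stable hash keeps the result identical across runs and machines
    # (unlike Python's built-in hash(), which is salted per process).
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    cumulative = 0
    for revision, percent in rollout:
        cumulative += percent
        if bucket < cumulative:
            return revision
    raise AssertionError("unreachable: percentages sum to 100")
```

Because the bucket depends only on the namespace name, raising the first revision's share from 10 to 50 percent moves additional namespaces onto it without reshuffling those already assigned.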

Airbnb, 3 months ago

Achieving High Availability with distributed database on Kubernetes at Airbnb

Airbnb describes running an open-source distributed SQL database across three independent Kubernetes clusters (one per AZ), using Kubernetes operators, PVC/EBS-backed storage, admission hooks, and multi-cluster deployment patterns to achieve high availability, safe node replacements, and mitigations for EBS tail latency.

Airbnb, 4 months ago

Understanding and Improving SwiftUI Performance

Airbnb engineers analyze SwiftUI performance pitfalls (reflection-based view diffing that causes unnecessary body evaluations) and present practical fixes: auto-generated Equatable conformance via an @Equatable macro (with @SkipEquatable), splitting large view bodies into smaller diffable child views, and a SwiftLint rule based on SwiftSyntax to track view complexity. These changes reduced re-renders and improved scroll performance (e.g., ~15% fewer scroll hitches on the Search screen).

Airbnb, 4 months ago

Load Testing with Impulse at Airbnb

Airbnb describes Impulse, an internal load-testing-as-a-service framework composed of four modular components — a context-aware load generator (Java/Kotlin), dependency mocker (supports HTTP JSON, Thrift, GraphQL with latency configuration and replay), traffic collector (captures upstream/downstream production traffic for replay), and a testing API generator (wraps async flows for synchronous load tests). Impulse is containerized, decentralized, integrates with CI/CD and the observability stack, and enables teams to run self-service, high-fidelity load tests to find bottlenecks and capacity limits.

Airbnb, 5 months ago

Listening, Learning, and Helping at Scale: How Machine Learning Transforms Airbnb’s Voice Support…

Airbnb describes an ML-powered adaptive IVR for voice support that uses domain-adapted ASR, contact-reason intent detection, vector-embedding retrieval with an LLM-based re-ranker to surface Help Center articles, and a paraphrasing step (nearest-neighbor on curated summaries) to improve self-resolution. The post covers model performance improvements (WER reduction), production deployment (Issue Detection Service, parallel serving with low latency), evaluation metrics, and user engagement results.

Airbnb, 7 months ago

How Airbnb Measures Listing Lifetime Value

Airbnb describes its framework for measuring listing lifetime value (LTV): baseline LTV (predicted bookings over the next 365 days using ML models), incremental LTV (adjusting baseline for cannibalization via a production-function model linking supply and demand to bookings), and marketing-induced incremental LTV (measuring value from internal initiatives). The post covers modeling and evaluation challenges (waiting for 365-day labels, pandemic-driven distribution shifts), mitigation strategies (shorter training windows, granular geographic/external features), adoption of LightGBM for high-cardinality features, and operational practices including daily updates to predictions to manage uncertainty and evaluate marketing ROI.

Airbnb, 7 months ago

Embedding-Based Retrieval for Airbnb Search

Airbnb built an embedding-based retrieval system for Homes Search that maps queries and listings to embeddings via a two-tower network trained with contrastive learning. They constructed training pairs using trip-based positive/negative sampling from user journeys, precomputed listing embeddings offline daily, and evaluated ANN serving options. They compared IVF and HNSW (choosing IVF because of update and filter-integration constraints) and found that Euclidean distance produced more balanced clusters than dot product. The system improved booking metrics in production.

Airbnb, 7 months ago

Accelerating Large-Scale Test Migration with LLMs

Airbnb used LLMs and an automated, step-based pipeline to migrate ~3.5K React component tests from Enzyme to React Testing Library. Their approach used a state machine of validation/refactor steps, retry loops with dynamic prompts, and progressively larger prompt contexts (up to 50 related files and 40k–100k tokens), plus breadth-first prompt tuning. The bulk run migrated 75% of files in four hours and, after sample/tune/sweep iterations, reached 97% automation; the remaining files were finished manually. The migration preserved test intent and coverage and completed in six weeks, with LLM API costs and engineering time far lower than the initial manual estimate.
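
The step-and-retry structure described above can be sketched as a small loop: each step validates the file, and on failure a fixer is called with the errors before validating again. This is an illustrative sketch under stated assumptions, not Airbnb's pipeline; `run_pipeline`, the `Validator` and `Fixer` signatures, and the status strings are hypothetical, and the fixer stands in for an LLM call.

```python
from typing import Callable, List, Tuple

# A validator returns a list of error messages (empty means the step passes).
Validator = Callable[[str], List[str]]
# A fixer takes (source, errors, attempt_number) and returns revised source.
# In a real pipeline this would call an LLM with a prompt built from the errors.
Fixer = Callable[[str, List[str], int], str]


def run_pipeline(
    source: str,
    steps: List[Tuple[str, Validator]],
    fix: Fixer,
    max_retries: int = 3,
) -> Tuple[str, str]:
    """Advance a file through ordered steps, retrying each failed step.

    Returns (final_source, status), where status is "done" or
    "failed:<step name>" for the first step that exhausted its retries.
    """
    for name, validate in steps:
        attempt = 0
        errors = validate(source)
        while errors:
            if attempt >= max_retries:
                return source, f"failed:{name}"
            attempt += 1
            # The attempt number lets the fixer vary its prompt per retry.
            source = fix(source, errors, attempt)
            errors = validate(source)
    return source, "done"
```

Files that end in a `failed:` status are the natural candidates for the sample/tune/sweep iterations or for manual completion.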

Airbnb, 10 months ago

Improving Search Ranking for Maps

Airbnb modified its ranking approach for the map interface because list-style ranking by booking probability doesn't map to spatial UIs. They explored three approaches—restricting booking-probability variance among displayed pins, tiering pins (regular vs mini-pins) to prioritize attention, and re-centering the map toward high-probability listings—and validated each with online A/B experiments, achieving improvements in bookings and user discovery metrics. The work was also published at KDD '24.

Airbnb, 10 months ago

Airbnb at KDD 2024

Airbnb summarizes its KDD 2024 presence: multiple ADS and research-track papers and talks covering deep learning for search ranking (including maps and multi-objective ranking via distillation), online experimentation and metric decomposition, two-sided marketplace modeling (intent modeling, pricing, demand), listing embeddings, customer-support prediction integrated into ranking, and LLM pretraining on activity logs (using BERT) for downstream tasks and product-quality estimation using text embeddings and LLMs.

Airbnb, 11 months ago

My Journey To Airbnb | Vijaya Kaza

Vijaya Kaza outlines her technical leadership overseeing engineering for Trust & Safety and security at Airbnb, where teams build platforms, tools, and AI models to protect the community and secure infrastructure and information assets. Her prior work includes product engineering at Cisco and leading a transition from on‑prem to a cloud/SaaS model and cloud security portfolio growth at FireEye, plus product development for mobile security at Lookout. The post cites a reservation screening system designed to identify higher‑risk bookings to reduce disruptive events. It emphasizes that major technology shifts — cloud, mobile, and AI — create novel security challenges that shape engineering and threat mitigation priorities. The cybersecurity work focuses on implementing security controls and threat detection to safeguard Airbnb’s data, employees, and infrastructure.

Airbnb, 11 months ago

From Data to Insights: Segmenting Airbnb’s Supply

Airbnb describes how it segments listings by availability using calendar-derived features (availability rate, streakiness, seasonality), applies k-means clustering (with k selected via an elbow plot), validates the clusters via A/B tests, correlation analysis, and UX research, and productionizes the segmentation by training a decision tree and translating its rules into SQL (CASE WHEN) to run in the data warehouse at scale.
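
Translating a decision tree into a SQL CASE WHEN expression amounts to emitting one branch per leaf, with the path conditions joined by AND. The sketch below shows that translation on a hand-built tree; the `Leaf` and `Split` types, the thresholds, and the segment labels are hypothetical, and only the feature names come from the summary above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # segment name assigned at this leaf


@dataclass
class Split:
    feature: str      # column name in the warehouse table
    threshold: float
    left: "Node"      # taken when feature <= threshold
    right: "Node"     # taken when feature > threshold


Node = Union[Leaf, Split]


def tree_to_branches(node: Node, path: Tuple[str, ...] = ()) -> List[str]:
    """Return one 'WHEN ... THEN ...' line per leaf, in left-to-right order."""
    if isinstance(node, Leaf):
        predicate = " AND ".join(path) or "TRUE"
        return [f"WHEN {predicate} THEN '{node.label}'"]
    le = f"{node.feature} <= {node.threshold}"
    gt = f"{node.feature} > {node.threshold}"
    return tree_to_branches(node.left, path + (le,)) + tree_to_branches(
        node.right, path + (gt,)
    )


def tree_to_sql(node: Node, column: str = "segment") -> str:
    """Render the whole tree as a CASE expression aliased to `column`."""
    branches = "\n  ".join(tree_to_branches(node))
    return f"CASE\n  {branches}\nEND AS {column}"


# Hypothetical two-level tree over the calendar-derived features.
example_tree = Split(
    "availability_rate", 0.5,
    Leaf("low_availability"),
    Split("streakiness", 0.3, Leaf("steady"), Leaf("streaky")),
)
print(tree_to_sql(example_tree))
```

Because every leaf carries its full path as a predicate, the branches are mutually exclusive and their order in the CASE expression does not affect the result.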

Airbnb, 11 months ago

Building a User Signals Platform at Airbnb

Airbnb built the User Signals Platform (USP), a Lambda-style stream processing platform that ingests Kafka events, processes transforms in Flink (using RocksDB for state), writes append-only records to a key-value store for online serving, and supports user signals, user segments, and session engagements. The platform emphasizes developer-friendly, config-driven workflows (Python + Java transforms), sub-second end-to-end latency goals, asynchronous compute for heavy operations (e.g., ML), and operational practices (latency metrics and hot-standby Task Managers) to support >1M events/sec and 70k QPS.

Airbnb, 11 months ago

Airbnb’s AI-powered photo tour using Vision Transformer

Airbnb’s AI-powered photo tour uses fine-tuned Vision Transformers to classify listing photos into 16 room types and groups similar images via a Siamese network. To overcome limited labeled data, the team applied transfer learning, multi-task training, ensemble methods, and knowledge distillation to boost accuracy while managing computational cost.

Airbnb, 11 months ago

Adopting Bazel for Web at Scale

Airbnb migrated its large web monorepo to Bazel to get better parallelism, caching, and a unified build system. They prepared the repo by breaking dependency cycles (using a minimum feedback arc set) and auto-generating BUILD.bazel files with a sync-configs tool (using jest-haste-map, Watchman, and parts of Gazelle). They migrated key CI jobs incrementally: TypeScript type checking (a custom rule, plus packaging node_modules/ts inputs into tar archives), ESLint (restricting rules to per-file inputs), and Jest (tarred deps, remote and Docker-layer caching, preserving symlinks, and fixes for implicit deps via a bazelKeep comment). The changes improved CI performance (for example, TypeScript ~34% faster, ESLint ~35% faster, Jest 42% faster incremental / 29% overall), reduced CI input sizes, and preserved or improved developer experience. Future work includes warm Bazel hosts, further build-graph optimizations, and exploring SquashFS.
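
A feedback arc set is a set of edges whose removal leaves a dependency graph acyclic. Finding a minimum one is NP-hard in general, so the sketch below substitutes a simpler technique: it collects the back edges found by a depth-first search, which always form a valid (but not necessarily minimum) feedback arc set. The function name and graph representation are assumptions, and this is not the method from the post.

```python
from typing import Dict, List, Tuple


def find_back_edges(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return DFS back edges; removing them makes the graph acyclic.

    `graph` maps each package to the packages it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}
    for node, deps in graph.items():
        color.setdefault(node, WHITE)
        for dep in deps:
            color.setdefault(dep, WHITE)

    removed: List[Tuple[str, str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        for dep in graph.get(node, []):
            if color[dep] == GRAY:
                # dep is an ancestor on the current path: this edge closes a cycle.
                removed.append((node, dep))
            elif color[dep] == WHITE:
                visit(dep)
        color[node] = BLACK

    for node in list(color):
        if color[node] == WHITE:
            visit(node)
    return removed
```

The recursive traversal keeps the sketch short; a graph with very deep dependency chains would need an explicit stack to stay within Python's recursion limit.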