Pinterest Engineering: Inventive engineers building the first visual discovery engine, 300 billion ideas and counting.

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 2 of 2)

Part 2 of Pinterest's Moka series focuses on infrastructure: deploying Spark on AWS EKS with Terraform/Helm and EKS Blueprints, a Fluent Bit + S3 logging pipeline (with CloudWatch for control-plane logs), observability via OpenTelemetry, Prometheus-style metrics, and kube-state-metrics, multi-architecture image pipelines (Hadoop/Spark debs and Corretto Java 11 base images) for Intel and ARM, ingress-nginx and AWS NLB patterns for exposing the Spark UI, a centralized Spark History Server per environment, and a React/TypeScript internal UI (ITP). The article also covers operational learnings (networking, multi-account setups, pod identities) and planned adoption of TiDB, Flink, Ray, and PyTorch on EKS.

Developer Experience at Pinterest: The Journey to PinConsole

Pinterest built PinConsole, an internal developer portal on top of Backstage, to improve developer experience and reduce tool fragmentation. They extended Backstage with custom authentication (OAuth, LDAP), a synchronized entity data model, PostgreSQL databases hosted on AWS RDS, UI theming with Pinterest's Gestalt design system, and a custom PinCompute plugin that interfaces with Kubernetes (custom CRDs and multi-cluster aggregation) via a PinCompute API. The platform integrates GitHub (GraphQL), Jira, PagerDuty, and observability dashboards, and applies frontend performance optimizations (React, Apollo Client caching, code-splitting, server-side rendering, and CDN/API-gateway caching). The article covers architecture choices, performance and scalability tactics, adoption metrics, lessons learned, and a roadmap for future platform capabilities.

Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes

Pinterest migrated its in-house search system Manas to their Kubernetes platform (PinCompute). During testing they observed rare 100x latency spikes on leaf nodes. OS-level and blackbox debugging traced the cause to cAdvisor's WSS estimation (container_referenced_bytes), which uses smaps and clear_refs to scan and clear page-table accessed bits; for very large memory-mapped indices this scanning caused kernel-level contention and TLB effects that stalled requests. The team disabled cAdvisor's WSS estimation and filed a cAdvisor issue. The post details the debugging process, kernel interactions, and lessons learned about isolation and observability.
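The mechanism behind the spikes is easy to see in miniature: the WSS estimator clears the accessed bits via /proc/&lt;pid&gt;/clear_refs, then re-reads smaps and sums the Referenced counters. A minimal sketch of the reading half, parsing smaps text (sample data below is illustrative, not from the incident):

```python
def referenced_bytes(smaps_text: str) -> int:
    """Sum the Referenced: fields of an smaps dump (kB -> bytes).

    This mirrors how cAdvisor estimates working-set size: after writing
    "1" to /proc/<pid>/clear_refs to clear the page-table accessed bits,
    it re-reads smaps and totals the per-mapping Referenced counters.
    For huge memory-mapped indices, that kernel-side scan is what caused
    the contention described above.
    """
    total_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith("Referenced:"):
            # e.g. "Referenced:         2048 kB"
            total_kb += int(line.split()[1])
    return total_kb * 1024

sample = """\
7f0000000000-7f0000400000 r--p 00000000 00:00 0
Referenced:         2048 kB
7f0000400000-7f0000800000 rw-p 00000000 00:00 0
Referenced:          512 kB
"""
print(referenced_bytes(sample))  # 2621440 bytes (2560 kB)
```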

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 1 of 2)

Pinterest describes Moka, a next-generation Spark-on-Kubernetes (EKS) data processing platform to replace its Hadoop/Monarch stack. Part one covers application-layer design: Spark Operator usage and hardening, Archer job submission and Spinner/Airflow integration, remote shuffle (Celeborn), scheduling with YuniKorn, migration dry-run validation, performance tuning (JDK/Graviton), deployment tooling (Terraform/Helm), logging/observability, and operational learnings. The platform emphasizes containerization, autoscaling, resource management, and compatibility with existing pipelines.

Scaling Pinterest ML Infrastructure with Ray: From Training to End-to-End ML Pipelines

Pinterest extended Ray beyond training to power end-to-end ML pipelines (feature development, sampling, labeling, and production retrains). They built a Ray Data native transformation API, implemented Iceberg bucket joins and writes to avoid expensive pre-joins, persisted features (S3 + Iceberg) for reuse and production launch (Galaxy), and optimized Ray Data internals (pyarrow conversions, Numba JIT, UDF consolidation) to achieve large iteration and throughput improvements.
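The bucket-join idea is that when both tables are partitioned by the same hash of the join key, matching keys always land in the same bucket, so the join runs bucket-by-bucket with no full shuffle or pre-joined table. A pure-Python sketch (toy data and function names are illustrative; Iceberg applies this via its bucket partition transform):

```python
from collections import defaultdict

NUM_BUCKETS = 4  # in Iceberg this would be the table's bucket(N, key) spec

def bucketize(rows, key, n=NUM_BUCKETS):
    """Assign each row to hash(key) % n, as a bucket partition transform does."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets

def bucket_join(left, right, key):
    """Join two datasets bucket-by-bucket: because both sides share the
    same bucketing, matching keys are co-located, so each bucket joins
    locally and no cross-bucket shuffle is needed."""
    lb, rb = bucketize(left, key), bucketize(right, key)
    out = []
    for b in lb:
        index = defaultdict(list)
        for row in rb.get(b, []):
            index[row[key]].append(row)
        for lrow in lb[b]:
            for rrow in index.get(lrow[key], []):
                out.append({**lrow, **rrow})
    return out

pins = [{"pin_id": i, "sig": f"s{i}"} for i in range(6)]
feats = [{"pin_id": i, "emb": [i]} for i in range(0, 6, 2)]
print(sorted(r["pin_id"] for r in bucket_join(pins, feats, "pin_id")))  # [0, 2, 4]
```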

Unlocking Efficient Ad Retrieval: Offline Approximate Nearest Neighbors in Pinterest Ads

Pinterest evaluates offline ANN retrieval for ads to reduce infrastructure cost and improve throughput for static query contexts. They migrated from HNSW to IVF for larger indexes, implemented offline workflows that precompute K nearest neighbors and store results in a KV store, and applied this to similar-item ads and visual-embedding candidate generators. Experiments show comparable or improved recall/CTR and substantial infra cost savings. Pinterest plans an offline ANN framework with index hyperparameter tuning and recall monitoring.
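The offline pattern is: precompute each query's K nearest neighbors in a batch job, write the result lists to a KV store, and turn online retrieval into a single key lookup. A small sketch with brute-force cosine similarity standing in for the IVF index (all names and data are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def build_offline_ann(queries, corpus, k=2):
    """Precompute top-k neighbors per query embedding and write them to a
    KV store (a dict here; a real KV service in production). Brute force
    stands in for the IVF index used at production scale."""
    kv = {}
    for qid, q in queries.items():
        ranked = sorted(corpus, key=lambda cid: cosine(q, corpus[cid]), reverse=True)
        kv[qid] = ranked[:k]
    return kv

corpus = {"ad1": [1.0, 0.0], "ad2": [0.9, 0.1], "ad3": [0.0, 1.0]}
queries = {"pin42": [1.0, 0.05]}
kv = build_offline_ann(queries, corpus, k=2)
print(kv["pin42"])  # ['ad1', 'ad2'] — serving is now just a KV lookup
```

This trade only works for static query contexts (as the post notes): the query set must be enumerable offline, which is why it fits similar-item ads rather than arbitrary real-time queries.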

Next-Level Personalization: How 16k+ Lifelong User Actions Supercharge Pinterest’s Recommendations

Pinterest describes TransActV2, a recommender-system upgrade that models lifelong user action sequences (up to ~16k actions), introduces a Next Action Loss with impression-based negatives, and ships extensive engineering optimizations (on-device NN selection, custom Triton kernels, pinned memory, request deduplication) to achieve large online and offline gains in engagement and major latency reductions.

Automated Migration and Scaling of Hadoop™ Clusters

Pinterest SREs built the Hadoop Control Center (HCC) to automate in-place migration and safe scaling (especially scale-in) of large Hadoop/YARN clusters on AWS. HCC coordinates decommissioning, monitors replication/JMX/CMDB, manipulates ASG sizes via the AWS API (to avoid unsafe AWS scale-in behavior), integrates with Terraform and Puppet workflows, and exposes CLI and reporting to reduce manual steps and prevent HDFS data loss.
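The core safety property is terminating a *specific*, already-drained instance while decrementing desired capacity, instead of letting AWS pick a victim that still holds HDFS blocks. A sketch of that loop with stub classes standing in for the boto3 autoscaling client and HCC's Hadoop hooks (the control flow is a simplified assumption, not HCC's actual code):

```python
class FakeASG:
    """Stub for boto3's autoscaling client in this sketch."""
    def __init__(self):
        self.terminated = []
    def terminate_instance_in_auto_scaling_group(self, InstanceId,
                                                 ShouldDecrementDesiredCapacity):
        self.terminated.append(InstanceId)

class FakeCluster:
    """Stub for the Hadoop-facing decommission/recommission hooks."""
    def decommission(self, iid): pass
    def recommission(self, iid): pass

def safe_scale_in(asg, cluster, instance_ids, replication_ok):
    """Scale in one node at a time: drain its YARN/HDFS roles, confirm
    replication metrics report no under-replicated blocks, then terminate
    that specific instance with the desired capacity decremented, so the
    ASG never chooses an arbitrary, still-data-holding victim."""
    removed = []
    for iid in instance_ids:
        cluster.decommission(iid)
        if not replication_ok(iid):
            cluster.recommission(iid)   # back out rather than risk data loss
            continue
        asg.terminate_instance_in_auto_scaling_group(
            InstanceId=iid, ShouldDecrementDesiredCapacity=True)
        removed.append(iid)
    return removed

asg = FakeASG()
done = safe_scale_in(asg, FakeCluster(), ["i-1", "i-2", "i-3"],
                     replication_ok=lambda iid: iid != "i-2")
print(done)  # ['i-1', 'i-3'] — i-2 was recommissioned, not terminated
```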

Adopting Docs-as-Code at Pinterest

Pinterest adopted a docs-as-code approach and built PDocs, a centralized documentation platform that crawls Markdown projects across repositories and renders them into a unified site. The implementation uses a pdocs.yaml config model, the Unified ecosystem to process Markdown ASTs, React components from Gestalt, and Next.js for static rendering; a custom Node.js server, CLI, GitHub integrations, server-sent events, and search/GenAI integrations support the developer experience. The post covers adoption, UX choices (draft/publish, page trees, edit links), and future plans (converters, live editors, health dashboards).

Healthier Personalization with Surveys

The post describes Pinterest’s use of Pinner surveys to evaluate and tune personalized recommendation models and home feed relevance. It details the Home feed Relevance Survey, which samples Pinners and asks them to rate Pins chosen for them and to provide follow-up reasons when Pins are not relevant. Surveys are used proactively to validate models before running experiments and diagnostically to detect distribution or relevance issues (for example, a spike in low-relevance, wrong-language Pins). Pinterest also describes an internal guild of PhD-trained survey experts and explains that survey signals are combined with engagement signals and community guidelines to balance personalization for wellbeing.

Modernizing Home Feed Pre-Ranking Stage

Pinterest modernized its home-feed pre-ranking (lightweight scoring) by replacing legacy two-tower light rankers with a jointly-trained request-level and item-level model that are decoupled at serving. They implemented an early-funnel logging pipeline to mitigate sample selection bias, adopted online item feature fetching with a root-leaf sharded serving architecture to enable early user-item crossings and caching, and used distillation (KL) plus BCE on logged negatives to better align pre-ranking with the L2 ranker. They also set up auto-retraining and observed engagement improvements.
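The combined objective can be sketched as BCE on logged labels plus a KL term pulling the pre-ranker's score distribution over a request's items toward the L2 ranker's. The mixing weight `lam` below is illustrative, not Pinterest's:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    return [e / sum(es) for e in es]

def preranker_loss(student_logits, teacher_logits, labels, lam=0.5):
    """BCE on logged labels (positives plus logged negatives) plus a KL
    distillation term that aligns the pre-ranker's in-request distribution
    with the L2 ranker's, as in the training setup described above."""
    bce = 0.0
    for s, y in zip(student_logits, labels):
        p = sigmoid(s)
        bce += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    bce /= len(labels)
    ps, pt = softmax(student_logits), softmax(teacher_logits)
    kl = sum(t * math.log(t / s) for t, s in zip(pt, ps))
    return bce + lam * kl

# When the student agrees with the teacher, the KL term vanishes:
aligned = preranker_loss([2.0, -1.0], [2.0, -1.0], [1, 0])
misaligned = preranker_loss([2.0, -1.0], [-1.0, 2.0], [1, 0])
print(aligned < misaligned)  # True
```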

How Pinterest Accelerates ML Feature Iterations via Effective Backfill

Pinterest describes a multi-year evolution of their ML feature backfill system: v1 used Spark + Airflow to run full backfills (with S3 Parquet checkpointing and PySpark), which exposed cost, concurrency, and partition-management pain points. v2 introduced a two-stage backfill (stage 1: staged feature tables; stage 2: promotion), adopted Iceberg (dynamic partition inserts, snapshots, bucketing) to reduce shuffles and enable fast rollback, and implemented bucketing/sorting for better compression and faster joins. Future work moves joining to training time via Ray (map-side bucket joins) to avoid materializing full training tables and further accelerate feature iteration.

500X Scalability of Experiment Metric Computing with Unified Dynamic Framework

Pinterest built the Unified Dynamic Framework (UDF) to standardize and scale experiment metric computation for its Helium experimentation platform. UDF uses Apache Airflow dynamic DAGs to process metrics in dynamic, parallel batches, persists in-progress metric lists to avoid duplicate computation, and provides automatic backfills, tracking, and notifications. Results: faster metric delivery (4x), support for 100x more metrics today with design for 500x, and major reductions in engineering overhead and partial-data incidents. Metrics outputs are stored in Druid and the system integrates with centralized Experiment Metrics Metadata.
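The batching-plus-dedup idea can be sketched in a few lines: shard the pending metric list into parallel batches while skipping anything persisted as in-progress, so a restarted or concurrent run never recomputes the same metric. In Airflow 2 a list like this would typically feed dynamic task mapping; the function and sizes here are illustrative:

```python
def plan_batches(all_metrics, in_progress, batch_size=3):
    """Split pending metrics into dynamic batches, skipping metrics already
    persisted as in-progress, mirroring UDF's dedup-by-persistence so no
    metric is computed twice across restarts or concurrent runs."""
    pending = [m for m in all_metrics if m not in in_progress]
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]

metrics = [f"metric_{i}" for i in range(8)]
running = {"metric_2", "metric_5"}
batches = plan_batches(metrics, running)
print(batches)
# [['metric_0', 'metric_1', 'metric_3'], ['metric_4', 'metric_6', 'metric_7']]
```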

Multi-gate-Mixture-of-Experts (MMoE) model architecture and knowledge distillation in Ads…

Pinterest describes adopting a Multi-gate Mixture-of-Experts (MMoE) architecture plus knowledge distillation to improve Ads engagement modeling. They compare expert architectures (DCNv2, MaskNet, FinalMLP, MLP), use mixed-precision inference and lightweight gates to reduce serving cost (≈40% latency reduction in benchmarks), apply distillation (batch-stage, pairwise-style loss) to mitigate missing training data, avoid distillation in incremental training due to overfitting, and report significant offline and online metric improvements.
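Structurally, MMoE shares one pool of experts across tasks while giving each task its own softmax gate over them, which is what keeps the per-task overhead lightweight. A toy forward pass (experts and gates here are arbitrary callables, not the DCNv2/MaskNet variants compared in the post):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    return [e / sum(es) for e in es]

def mmoe_forward(x, experts, gates):
    """Multi-gate Mixture-of-Experts: every task reuses the shared expert
    outputs, but each task's own gate softmax-weights them, letting tasks
    specialize without separate towers."""
    expert_outs = [expert(x) for expert in experts]
    task_outs = []
    for gate in gates:  # one gate per task (e.g. CTR, CVR)
        weights = softmax(gate(x))
        task_outs.append(sum(w * o for w, o in zip(weights, expert_outs)))
    return task_outs

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
gates = [lambda x: [1.0, 0.0, 0.0],   # task A leans on expert 0
         lambda x: [0.0, 0.0, 1.0]]   # task B leans on expert 2
outs = mmoe_forward(3.0, experts, gates)
print([round(o, 3) for o in outs])  # [3.669, 0.391]
```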

Migrating 3.7 Million Lines of Flow Code to TypeScript

Pinterest migrated 3.7 million lines of Flow-annotated JavaScript to TypeScript in ~8 months using a big‑bang codemod approach (typescripify + community codemods), changes to linting and editor configs, migration of generated types (Thrift, OpenAPI, Relay), Babel-based transpilation with minimal output diffs, extensive automation and daily validation, and a staged canary rollout — resulting in improved type safety and developer experience.
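To make the codemod step concrete, here is a deliberately toy regex version of two mechanical Flow-to-TypeScript rewrites (maybe types and `mixed`). The real migration ran AST-based tools; this only illustrates the category of transformation:

```python
import re

def flow_to_ts(source: str) -> str:
    """Toy codemod for a few mechanical Flow -> TypeScript rewrites.
    AST-based codemods are what a real migration uses; these regexes
    just show the shape of the change."""
    out = re.sub(r"^// @flow\n", "", source, flags=re.MULTILINE)  # drop pragma
    out = re.sub(r"\bmixed\b", "unknown", out)          # Flow mixed -> TS unknown
    out = re.sub(r":\s*\?(\w+)", r": null | \1", out)   # maybe type ?T -> null | T
    return out

flow_src = """// @flow
function firstName(user: ?User): mixed {
  return user ? user.name : null;
}
"""
print(flow_to_ts(flow_src))
```

Anchoring the maybe-type rewrite to a preceding colon is what keeps it from mangling ternaries, a hint at why the real tools work on ASTs rather than text.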

Improving Pinterest Search Relevance Using Large Language Models

Pinterest built an LLM-based relevance pipeline for Search: a cross-encoder LLM teacher (fine-tuned on human labels and enriched Pin text) generates 5-scale relevance labels at scale; those labels are used to distill a lightweight servable student model that combines embeddings (SearchSAGE, PinSAGE), visual embeddings, BM25/text-match signals, and engagement features for real-time ranking. They experimented with multiple pretrained LMs (multilingual BERT, T5, mDeBERTa, XLM-RoBERTa, Llama-3-8B), used techniques like QLoRA, quantization, gradient checkpointing, and mixed precision for efficiency, and reported offline and online gains (e.g., +2.18% nDCG@20 and >1% relevance / >1.5% fulfillment improvements in A/B tests).
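Since the headline gain is quoted in nDCG@20, it helps to recall how nDCG@k is computed over graded labels like the teacher's 5-scale ones: DCG of the ranking, normalized by DCG of the ideal ordering. A minimal sketch (the label list is made up):

```python
import math

def dcg(labels, k):
    """Discounted cumulative gain over graded relevance labels, using the
    common exponential gain (2^rel - 1) and log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg(labels, k=20):
    """nDCG@k: DCG of the observed ranking divided by DCG of the ideal
    (descending-label) ordering, so 1.0 means a perfect ranking."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0

# 5-scale labels for the top results, best-to-worst positionwise
ranked = [3, 4, 2, 0, 1]
print(round(ndcg(ranked, k=5), 4))
```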

How Pinterest Leverages Honeycomb to Enhance CI Observability and Improve CI Build Stability

Pinterest's Mobile Builds team uses Honeycomb to gain deep observability into CI workflows (dashboards, traces, correlation, derived metrics), identify bottlenecks in Buildkite pipelines, instrument Bazel build scripts, and implement error categorization and alert routing via integrations (Buildkite logs + AWS EventBridge + Buildkite Jobs API) to improve build stability and streamline on-call.
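Error categorization in this kind of pipeline usually reduces to a rule table mapping log patterns to categories, which then drive routing. A hedged sketch (the categories and patterns below are invented examples, not Pinterest's rules):

```python
import re

# Illustrative category -> pattern table; real rules would live alongside
# the Buildkite log ingestion and feed Honeycomb derived columns and
# alert routing.
ERROR_RULES = [
    ("infra_oom",      re.compile(r"OutOfMemoryError|Killed process")),
    ("bazel_analysis", re.compile(r"ERROR: Analysis of target .+ failed")),
    ("flaky_network",  re.compile(r"Connection reset by peer|timed out")),
]

def categorize(log_line: str) -> str:
    """Map a raw CI log line to a coarse error category so alerts can be
    routed to the owning team instead of a generic on-call."""
    for name, pattern in ERROR_RULES:
        if pattern.search(log_line):
            return name
    return "uncategorized"

print(categorize("java.lang.OutOfMemoryError: GC overhead limit exceeded"))
# infra_oom
```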

Change Data Capture at Pinterest

Pinterest built a Generic CDC platform using Debezium + Kafka Connect to capture changes from thousands of database shards. They split control plane and data plane (control on an AWS ASG, data plane running Kafka Connect distributed across AZs), solved large-scale issues (OOMs, rebalancing churn, duplicate tasks, slow failovers) via bootstrapping, rate limiting, configuration changes and a Kafka upgrade, and plan further scalability and DR work.
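At the per-shard level, a Debezium source is just a Kafka Connect connector registered via a JSON config. A sketch of one such submission, with placeholder hostnames and names (not Pinterest's actual configuration; key names follow Debezium 2.x MySQL connector conventions):

```python
import json

# Illustrative Kafka Connect submission for one Debezium MySQL shard.
connector_config = {
    "name": "cdc-shard-0001",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",                      # one task per shard keeps failover granular
        "database.hostname": "shard-0001.example.internal",
        "database.port": "3306",
        "database.server.id": "184054",        # must be unique, like a MySQL replica ID
        "topic.prefix": "cdc.shard-0001",      # Debezium 2.x topic naming
        "table.include.list": "app.pins",
    },
}
# POSTing this JSON to the Kafka Connect REST API (/connectors) registers it.
print(json.dumps(connector_config, sort_keys=True)[:60])
```

Multiplying this by thousands of shards is precisely what surfaces the rebalancing churn and duplicate-task issues the post describes.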

Resource Management with Apache YuniKorn™ for Apache Spark™ on AWS EKS at Pinterest

Pinterest migrated half of its Spark batch workload from a Hadoop/YARN platform (Monarch) to a Kubernetes-based platform (Moka) on AWS EKS. To do this they adopted Apache YuniKorn as an application-aware scheduler, added application-level resource-usage logging (ingested via Fluent Bit to S3), implemented a new resource-allocation algorithm using OR-Tools CP-SAT, and built orchestration/routing (Archer and CCR) to maintain SLOs while improving cluster/resource utilization.
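The allocation problem is: turn observed per-queue usage into guaranteed and elastic (max) shares that respect total cluster capacity. The sketch below uses a naive proportional split as a stand-in for the OR-Tools CP-SAT solve; the policy and numbers are invented for illustration:

```python
def allocate(queues, total_capacity):
    """Toy stand-in for Moka's allocation step: grant each YuniKorn queue
    its observed usage as a guaranteed share, then split the remaining
    capacity proportionally as the elastic max share. The real system
    solves this with OR-Tools CP-SAT under SLO constraints."""
    guaranteed = {q: min(usage, total_capacity) for q, usage in queues.items()}
    used = sum(guaranteed.values())
    spare = max(total_capacity - used, 0)
    alloc = {}
    for q, g in guaranteed.items():
        share = spare * (g / used) if used else 0.0
        alloc[q] = {"guaranteed": g, "max": g + share}
    return alloc

plan = allocate({"etl": 400, "ml": 200, "adhoc": 100}, total_capacity=1000)
print(plan["etl"])  # guaranteed 400 vCPUs, elastic max ≈ 571
```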

Ray Batch Inference at Pinterest (Part 3)

Pinterest describes Ray Batch Inference, an SDK built on Ray Data for offline batch model inference. The post covers architecture (streaming execution, heterogeneous clusters, Ray actors), implementation details (map_batches, pyarrow zero-copy, carryover columns), multi-model inference, accumulators for metrics, vLLM integration for LLMs, and concrete results (4.5x throughput, 30x cost savings). It also notes adoption across teams and future plans like KubeRay integration and Ray Tune.
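The streaming map_batches pattern can be shown in miniature: push fixed-size batches through an inference callable so the dataset is never materialized at once, with identifier columns carried over beside the new predictions. A pure-Python stand-in (Ray Data adds actor pools, pyarrow zero-copy, and heterogeneous CPU/GPU stages on top):

```python
from typing import Iterator

def map_batches(rows, fn, batch_size=2) -> Iterator[dict]:
    """Minimal stand-in for Ray Data's map_batches: stream fixed-size
    batches through a callable, yielding results as they are produced
    rather than materializing the whole dataset."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from fn(batch)
            batch = []
    if batch:                     # flush the final partial batch
        yield from fn(batch)

def model(batch):
    # Carryover columns: ids pass through untouched next to new predictions.
    return [{"id": r["id"], "score": 2 * r["x"]} for r in batch]

out = list(map_batches([{"id": i, "x": i * 0.5} for i in range(5)], model))
print(out[:2])  # [{'id': 0, 'score': 0.0}, {'id': 1, 'score': 1.0}]
```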