Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 2 of 2)
Part 2 of Pinterest's Moka series focuses on infrastructure. It covers deploying Spark on AWS EKS with Terraform, Helm, and EKS Blueprints; a logging pipeline built on Fluent Bit and S3, with control-plane logs in CloudWatch; observability using OpenTelemetry, Prometheus-style metrics, and kube-state-metrics; multi-architecture image pipelines for Intel and ARM (Hadoop/Spark .deb packages and Corretto Java 11 base images); ingress via ingress-nginx combined with AWS load balancer (NLB) patterns for the Spark UI; a centralized Spark History Server per environment; and an internal React/TypeScript UI (ITP). The article also covers operational learnings (networking, multi-account setups, pod identities) and future adoption of TiDB, Flink, Ray, and PyTorch on EKS.