Uber: Go microservices and 24,000 CPU cores saved via PGO
24,000 CPU cores saved fleet-wide via profile-guided optimisation on Go microservices. Source: go.dev/blog/pgo-preview
Uber's Go stack
Uber runs thousands of Go microservices handling hundreds of millions of trips across 70+ countries. Their Go services include the Geofence service (checking whether a location is in a pickup zone), dispatch (matching drivers to riders), pricing, payment processing, and dozens of data pipeline services.
Uber has been Go-first for backend services since approximately 2015. Their internal Go library ecosystem (YARPC for RPC, fx for dependency injection, zap for logging) is substantial. This is not a new startup decision; it's a decade of investment in Go at scale.
Profile-guided optimisation: what it is and why it matters
Profile-guided optimisation (PGO) lets the Go compiler use real production CPU profiles to make better optimisation decisions. Instead of making static guesses about which code paths are hot, the compiler inlines functions that are actually called frequently, optimises branch prediction for real-world distributions, and avoids overhead on cold paths.
Go 1.20 shipped PGO as a preview; Go 1.21 made it generally available, applied automatically when a default.pgo profile sits in the main package directory. Uber applied PGO across their Go fleet. The result: a 4% CPU reduction fleet-wide. That sounds small. At Uber's scale (600,000+ CPU cores running Go services), 4% is 24,000 cores. That's 24,000 cores of infrastructure cost eliminated without changing any application code.
The counterpoint to Discord's Rust migration
Discord's migration shows Rust winning on latency for a GC-sensitive workload. Uber's PGO result shows Go winning on throughput at hyperscale through compiler-level optimisation. Both are right. The lesson: don't switch languages because of a benchmark. Switch because you have a specific, measured, production problem that the alternative solves.