DISCORD · Go to Rust · 2020

Discord: why they switched from Go to Rust

GC pauses eliminated

Source: discord.com/blog/why-discord-is-switching-from-go-to-rust

What Discord built and why it mattered

Discord's Read States service tracks which channels and messages each user has read. At Discord's scale (hundreds of millions of users, one Read State per user per channel), this service sits on a hot path: it is queried whenever a user connects and whenever a message is sent or read. It's the first thing Discord queries to show you whether you have unread messages.

The Go implementation kept millions of Read States in an in-memory LRU cache. It worked, but the Go runtime forces a garbage collection at least every 2 minutes, and each collection had to scan this large heap to find what could be freed. During each collection, response times and CPU usage spiked, so the service showed a latency spike roughly every two minutes.

8M: concurrent Read States (in-memory LRU cache)
~2 min: GC cycle frequency (causing latency spikes)
0: GC pauses after the Rust rewrite was complete

Why the Go fixes didn't work

Discord's engineering team tried the standard Go performance fixes before switching to Rust:

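Fix tried: Tuning GOGC. According to Discord's post, changing the GC percent made no visible difference. The service allocated too slowly to trigger collections on its own, so the spikes came from the collection the Go runtime forces at least every 2 minutes.
Fix tried: Memory pools. Reduced allocations, which helped, but did not stop the periodic cycles, and each cycle still had to scan the large LRU cache.
Fix tried: Rewriting hot paths. Helped latency overall but didn't solve the periodic GC spike. The problem was structural: the runtime, not the application, decides when a collection runs, and its cost grows with the size of the live heap.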

What the Rust rewrite achieved

Discord rewrote the Read States service in Rust. Rust has no garbage collector: the ownership system lets the compiler decide at compile time where each value is freed, so memory is released deterministically at runtime, as soon as its owner goes out of scope. There are no GC cycles to pause execution. The Rust version also used less memory and had lower median latency (not just improved tail latency).
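To make the "freed deterministically" point concrete, here is a minimal sketch of a bounded cache in Rust where an evicted entry is released the moment it is evicted. It is not Discord's implementation: the names `ReadState`, `last_read_message_id`, `Cache`, `FREED`, and `run_demo` are hypothetical, the `(user_id, channel_id)` key is an assumption, and the `VecDeque`-based recency list is chosen for brevity rather than speed.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many ReadState values have been dropped (freed) so far.
static FREED: AtomicUsize = AtomicUsize::new(0);

type Key = (u64, u64); // assumed key shape: (user_id, channel_id)

// Hypothetical stand-in for one cached entry; the real fields are not given here.
struct ReadState {
    last_read_message_id: u64,
}

impl Drop for ReadState {
    // Runs at the exact point the value's owner gives it up.
    fn drop(&mut self) {
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

// Tiny fixed-capacity LRU cache. Recency tracking is O(n) to keep the sketch short.
struct Cache {
    capacity: usize,
    order: VecDeque<Key>, // front = least recently used
    entries: HashMap<Key, ReadState>,
}

impl Cache {
    fn new(capacity: usize) -> Self {
        Cache { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    fn put(&mut self, key: Key, value: ReadState) {
        // If the key already existed, the old value is dropped right here.
        self.entries.insert(key, value);
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
        if self.entries.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                // `remove` returns the owned value; dropping it frees it immediately.
                drop(self.entries.remove(&oldest));
            }
        }
    }

    fn get(&mut self, key: Key) -> Option<&ReadState> {
        if self.entries.contains_key(&key) {
            // Mark the key as most recently used.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
        }
        self.entries.get(&key)
    }
}

// Returns the number of freed entries observed at three points in time.
fn run_demo() -> Vec<usize> {
    let base = FREED.load(Ordering::SeqCst);
    let freed = || FREED.load(Ordering::SeqCst) - base;
    let mut seen = Vec::new();
    {
        let mut cache = Cache::new(2);
        cache.put((1, 10), ReadState { last_read_message_id: 100 });
        cache.put((1, 11), ReadState { last_read_message_id: 200 });
        // Touch (1, 10) so that (1, 11) becomes the least recently used entry.
        assert_eq!(cache.get((1, 10)).map(|s| s.last_read_message_id), Some(100));
        seen.push(freed()); // nothing evicted yet
        cache.put((2, 10), ReadState { last_read_message_id: 300 });
        assert!(cache.get((1, 11)).is_none()); // (1, 11) was evicted
        seen.push(freed()); // the evicted entry is already freed
    } // `cache` goes out of scope: the two remaining entries are freed here
    seen.push(freed());
    seen
}

fn main() {
    println!("{:?}", run_demo()); // prints [0, 1, 3]
}
```

The counter moves at the eviction itself and again at the closing brace, not at some later collection cycle. The cost of freeing is paid in small, predictable steps, which is the property the paragraph above describes.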

According to Discord's blog post: “Not only was performance improved, but the latency was now rock solid. There were no longer any latency spikes. The Rust service was just faster, full stop.”

The lesson: Rust doesn't replace Go everywhere

Discord didn't rewrite their entire backend in Rust. They rewrote one service that had a specific, measurable GC problem at their scale. The other Go services continue to run fine. This is an appropriate use of Rust: a targeted rewrite where you can show the problem is GC-specific and the rewrite is worth the cost.
