Welcome to News Feed at Scale. We're building Facebook and Twitter-style timelines for billions of users.
News feed systems must solve three fundamental problems. First, write amplification — one post creates millions of timeline updates across the platform. Second, read latency — users expect instant feed loads in under 200 milliseconds. And third, fairness — celebrity posts can't overwhelm the system and starve out regular content. These three constraints are often in tension with each other, and that's what makes this problem interesting.
list
Looking at the overall architecture, we have three main sections. On the left is the write path — a user posts content, it goes through a fan-out service, then into a message queue. The storage layer in the middle contains three databases — a timeline store for pre-computed feeds, a post store for the actual content, and a social graph tracking who follows whom. On the read path, clients go through Redis cache and a timeline aggregator to fetch from both the timeline and post stores. The fan-out service also updates the social graph as needed.
mermaid
We have two fundamental approaches to distributing posts to followers, and they have opposite trade-offs. Let's examine each one.
image
The push model works like this: when Alice posts, we immediately write to all of her followers' timelines. {{step}}Fast reads are the major advantage here. The timeline is pre-computed, so responses come back in under 100 milliseconds, no aggregation is needed, and the data is cache-friendly. {{step}}But writes are slow due to write amplification. One post becomes N writes, where N is the number of followers. The cost scales linearly with follower count, which creates a bottleneck for celebrities. {{step}}This approach is best for regular users with fewer than 10,000 followers, who make up the majority of the platform and generate predictable load. {{step}}The implementation uses a message queue pattern to handle the distributed fan-out work.
cards
The pull model is the opposite. When Bob loads his feed, we query posts from the people he follows in real time. {{step}}Fast writes are the big win here: a post is a single database insert with O of one complexity, there is no fan-out delay, and celebrities are handled gracefully. {{step}}But reads are slow because we need to query the recent posts of each of the N accounts Bob follows, sort and merge the results, and handle cache invalidation. {{step}}This approach works best for celebrities and publishers with more than one million followers, high post volume, and broadcast use cases. {{step}}The implementation uses a scatter-gather query pattern to fetch from multiple sources in parallel.
cards
Let's think about what happens when someone with 100 million followers posts. This is where the push model hits a hard wall.
image
Here are the numbers. With 100 million followers, the push model has to create 100 million timeline writes for a single post. Finishing that fan-out in about 10 seconds would take 10 million writes per second. At a more typical 10,000 writes per second, the fan-out takes roughly 3 hours, so followers see the post slowly appear in their feeds over time. The real-world impact is significant: delayed visibility for viral content, queue backlog that affects other posts, and database hotspots on the celebrity user IDs that can cause cascading failures.
stats list
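The arithmetic behind those figures can be checked with a short calculation. This is a back-of-the-envelope sketch: the function name `fanOutSeconds` is made up for illustration, and it assumes throughput stays constant for the whole fan-out.

```typescript
// Back-of-the-envelope fan-out duration: followers / sustained write throughput.
// Assumes constant throughput and ignores retries and queueing delay.
function fanOutSeconds(followers: number, writesPerSecond: number): number {
  if (writesPerSecond <= 0) throw new Error("writesPerSecond must be positive");
  return followers / writesPerSecond;
}

const followers = 100_000_000;

// At 10,000 writes per second: 10,000 seconds, or roughly 3 hours.
const slowHours = fanOutSeconds(followers, 10_000) / 3600;

// At 10 million writes per second: 10 seconds.
const fastSeconds = fanOutSeconds(followers, 10_000_000);

console.log(slowHours.toFixed(1), fastSeconds); // prints "2.8 10"
```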
The solution is a hybrid approach combining push and pull based on follower count. {{step}}Regular users with fewer than 10,000 followers use the push model with fan-out on write, pre-computed timelines, and fast reads. {{step}}Power users with 10,000 to one million followers use async push with delays, partial fan-out, and caching of their top followers. {{step}}Celebrities with more than one million followers use the pull model with fan-out on read, no pre-computation, and on-demand fetching. {{step}}When serving a user's timeline, we merge together pushed posts from their friends and pulled posts from celebrities they follow, then sort everything by timestamp.
cards
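The tiering rule can be sketched as a small routing function. The thresholds come from the tiers described above; the names `FanOutStrategy` and `chooseStrategy` are hypothetical, and treating exactly one million followers as a power user is an assumption about the boundary.

```typescript
type FanOutStrategy = "push" | "async-push" | "pull";

const REGULAR_MAX = 10_000;        // below this: fan-out on write
const CELEBRITY_MIN = 1_000_000;   // above this: fan-out on read

// Pick a distribution strategy from the author's follower count.
function chooseStrategy(followerCount: number): FanOutStrategy {
  if (followerCount < REGULAR_MAX) return "push";
  if (followerCount <= CELEBRITY_MIN) return "async-push";
  return "pull";
}
```

In practice the thresholds would likely be tuned per platform, which is why they are kept as named constants here.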
Now let's walk through how a post gets from creation to appearing in a user's feed in milliseconds.
image
Looking at the write path, when a user posts, the system first checks their follower count. Every post is persisted to the post store. If the author has fewer than 10,000 followers, the post then goes into the immediate fan-out queue. Between 10,000 and one million followers, it goes through async fan-out. Above one million, it goes only to the post store. The immediate or async queue distributes the post to the timeline store, which triggers cache invalidation. The full pipeline has five stages: validating content and media, persisting the post to the database, routing based on follower count, queuing to the message broker, and distributing via either fan-out or a caching strategy.
mermaid list
Here's the code for a fan-out worker that processes messages from the queue. It takes a message containing the post ID and author ID, then batch fetches followers in chunks of 1000. For each batch, it parallelizes timeline inserts using Promise.all so multiple followers are written simultaneously. After writing, it invalidates the affected timeline caches using deleteMany. The key optimizations are batch processing to reduce database round trips, parallel writes to maximize throughput, targeted cache invalidation to avoid clearing unrelated caches, and a dead letter queue for failed writes that need manual investigation.
code list
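A minimal sketch of a worker along these lines, under stated assumptions: `FanOutDeps` and its methods are hypothetical interfaces standing in for the social graph, timeline store, cache, and dead letter queue, and the `timeline:<id>` cache key format is invented for illustration. Each insert catches its own error so that one failed write does not reject the whole Promise.all batch.

```typescript
interface FanOutMessage {
  postId: string;
  authorId: string;
}

// Hypothetical dependencies; a real system would back these with a database and Redis.
interface FanOutDeps {
  getFollowers(authorId: string, offset: number, limit: number): Promise<string[]>;
  insertTimelineEntry(followerId: string, postId: string): Promise<void>;
  deleteMany(cacheKeys: string[]): Promise<void>;
  deadLetter(followerId: string, postId: string, reason: string): Promise<void>;
}

const BATCH_SIZE = 1000;

// Fans one post out to all followers and returns the number of successful writes.
async function processFanOut(msg: FanOutMessage, deps: FanOutDeps): Promise<number> {
  let written = 0;
  for (let offset = 0; ; offset += BATCH_SIZE) {
    // Batch fetch followers to reduce database round trips.
    const batch = await deps.getFollowers(msg.authorId, offset, BATCH_SIZE);
    if (batch.length === 0) break;

    // Write the whole batch in parallel; failures go to the dead letter queue.
    const outcomes = await Promise.all(
      batch.map(async (followerId) => {
        try {
          await deps.insertTimelineEntry(followerId, msg.postId);
          return { followerId, ok: true };
        } catch (err) {
          await deps.deadLetter(followerId, msg.postId, String(err));
          return { followerId, ok: false };
        }
      }),
    );

    // Targeted invalidation: only the timelines that actually changed.
    const succeeded = outcomes.filter((o) => o.ok).map((o) => o.followerId);
    if (succeeded.length > 0) {
      await deps.deleteMany(succeeded.map((id) => `timeline:${id}`));
    }
    written += succeeded.length;

    if (batch.length < BATCH_SIZE) break; // last page
  }
  return written;
}
```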
On the read side, when a client requests their timeline, the system checks Redis cache first. If there's a hit, return immediately. If not, query the timeline store for pre-computed posts, then fetch post metadata in parallel, pull celebrity posts on-demand, merge everything, and rank by timestamp. Finally, cache the result and return to the client. Key optimizations include caching timelines for 30 to 60 seconds, pre-fetching post content in parallel to reduce latency, lazy-loading images and videos, and cursor-based pagination to handle large feeds efficiently.
mermaid list
Here's the code for timeline aggregation. First, check the cache and return if there's a hit. Then fetch the pre-computed timeline built from pushed posts, over-fetching by 2x to account for filtering. Next, get the list of celebrities the user follows and fetch their recent posts in parallel using Promise.all. Merge the pushed and pulled posts, sort by timestamp descending, and slice to the requested limit. Enrich with post metadata like author info and engagement counts. Finally, cache for 60 seconds and return to the client. This pattern lets us serve cached content quickly while still including the latest posts from the celebrities the user follows.
code
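The merge step on its own can be sketched as a pure function. The `FeedPost` shape and the `mergeTimeline` name are assumptions for illustration, and the de-duplication by post ID is an added safeguard that the narration does not mention.

```typescript
interface FeedPost {
  id: string;
  authorId: string;
  createdAt: number; // epoch milliseconds
}

// Merge pushed and pulled posts, newest first, trimmed to the requested limit.
function mergeTimeline(pushed: FeedPost[], pulled: FeedPost[], limit: number): FeedPost[] {
  // De-duplicate by post ID in case a post arrives through both paths (assumption).
  const byId = new Map<string, FeedPost>();
  for (const post of [...pushed, ...pulled]) byId.set(post.id, post);

  return Array.from(byId.values())
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}

const feed = mergeTimeline(
  [
    { id: "p1", authorId: "friend", createdAt: 100 },
    { id: "p2", authorId: "friend", createdAt: 300 },
  ],
  [{ id: "c1", authorId: "celebrity", createdAt: 200 }],
  2,
);
console.log(feed.map((p) => p.id).join(",")); // prints "p2,c1"
```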
We use a multi-layer caching approach to hit sub-100 millisecond read times at scale.
image
We have four distinct cache layers. {{step}}Browser cache stores data client-side with 5 to 10 minute TTL using localStorage or IndexedDB, giving instant results when users revisit. {{step}}CDN cache handles static assets like images and videos, is geo-distributed globally, and uses a 1-hour TTL. {{step}}Redis cache stores hot pre-computed feeds with 30 to 60 second TTL and is critical for handling cache storms when many users request simultaneously. {{step}}Database query cache saves results for repeated queries like celebrity post lookups and follower graph traversals with a 5-minute TTL.
cards
Let's look at cache invalidation patterns in code. Pattern one is TTL-based, the simplest approach, which relies on automatic expiration and provides eventual consistency. Pattern two, labeled write-through here, is strictly a delete-on-write invalidation: we delete the cached entry when new data arrives, so the next read fetches fresh data. Pattern three is tag-based invalidation, where we delete all timeline caches for affected followers using a single operation. And pattern four is lazy invalidation, where we return slightly stale data while refreshing in the background asynchronously, minimizing blocking operations.
code
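As a rough illustration of the first three patterns, here is an in-memory sketch. `TtlCache` is a hypothetical stand-in for Redis, not a Redis client, and the injectable clock is an assumption added so that expiry can be demonstrated without waiting.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Pattern one: TTL-based expiry. Expired entries are treated as misses.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Patterns two and three: delete on write, for one key or for many followers at once.
  // Returns how many keys were actually present.
  deleteMany(keys: string[]): number {
    let removed = 0;
    for (const key of keys) {
      if (this.entries.delete(key)) removed++;
    }
    return removed;
  }
}
```

Lazy invalidation is left out of this sketch because it needs a background refresh, which depends on how the real cache and data source are wired together.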
Beyond chronological ordering, we use machine learning-powered relevance scoring. {{step}}Engagement signals track user interaction patterns such as like, share, and comment history, time spent on similar posts, and the strength of the user's relationship to the author. {{step}}Content features cover characteristics of the post itself: video ranks higher than images, which rank higher than text; topic relevance is computed using N-L-P embeddings; and a recency decay function gradually de-prioritizes older posts. {{step}}Real-time scoring uses lightweight models like linear regression or gradient boosted trees with under 10 millisecond inference latency, retrained daily or weekly. {{step}}A-B testing continuously optimizes the system by testing ranking weights, measuring engagement lift, and iterating on features based on what drives user retention and satisfaction.
cards
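To make the scoring idea concrete, here is a toy linear scorer with exponential recency decay. Everything specific in it is an assumption: the feature names, the weights, and the 6-hour half-life are placeholders rather than values from a trained model.

```typescript
interface RankingFeatures {
  affinity: number;        // 0..1, strength of the relationship to the author
  engagementRate: number;  // 0..1, predicted chance of a like, share, or comment
  topicRelevance: number;  // 0..1, similarity between post topic and user interests
  ageHours: number;        // hours since the post was created
}

// Hypothetical weights; a real system would learn these from engagement data.
const WEIGHTS = { affinity: 0.4, engagementRate: 0.35, topicRelevance: 0.25 };
const HALF_LIFE_HOURS = 6;

// Exponential decay: the multiplier halves every HALF_LIFE_HOURS.
function recencyDecay(ageHours: number): number {
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
}

// Linear combination of features, scaled down as the post ages.
function relevanceScore(f: RankingFeatures): number {
  const linear =
    WEIGHTS.affinity * f.affinity +
    WEIGHTS.engagementRate * f.engagementRate +
    WEIGHTS.topicRelevance * f.topicRelevance;
  return linear * recencyDecay(f.ageHours);
}
```

A model this simple is cheap enough to evaluate per post at read time, and the weights are exactly the kind of parameter an A-B test would vary.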
Here are the key takeaways. First, use a hybrid fan-out approach — push for regular users, pull for celebrities, and merge on read to get the best of both worlds. Second, implement multi-layer caching at the browser, C-D-N, Redis, and database levels, each with appropriate time-to-live values. Third, make the write-read trade-off deliberately by pre-computing timelines on write to optimize read latency. And fourth, rank by predicted relevance using M-L models instead of pure chronological order to boost engagement. Next steps are to study Twitter's timeline architecture, read Instagram's feed ranking papers, and practice implementing Redis cache patterns for high-throughput systems.
list