Designing Twitter: A System Design Interview Question

Last Updated : 31 Oct, 2025

Designing Twitter (or the Facebook feed, or Facebook search) is a common interview question. Many candidates fear this round more than the coding round because they are unsure which topics and trade-offs to cover in the limited time.

1. How Would You Design Twitter?

Don't jump into technical details immediately when you are asked this question in an interview. Don't charge off in one direction; it only creates confusion between you and the interviewer. Many candidates make this mistake and immediately start listing tools or frameworks such as MongoDB, Bootstrap, and MapReduce.

Instead, treat the question as you would a real-life project: first define the problem and clarify the problem statement.

2. Requirements for Twitter System Design

2.1 Functional Requirements:

2.2 Non-Functional Requirements:

2.3 Explicit Non‑Goals (MVP)

3. Capacity Estimation for Twitter System Design

To estimate the system's capacity, we need to analyze the expected daily traffic, storage, and bandwidth.

3.1 Traffic Estimation:

Let us assume we have 1 billion total users with 200 million daily active users (DAU), and on average each user tweets 5 times a day. This gives us 1 billion tweets per day.

200 million * 5 tweets = 1 billion/day

Tweets can also contain media such as images or videos. We can assume that 10 percent of tweets include media files, which gives us an additional 100 million files per day to store.

10 percent * 1 billion = 100 million/day

The requests per second (RPS) for our system will be:

1 billion requests per day translates into about 12K requests per second.

1 billion / (24 hrs * 3600 seconds) = 12K requests/second
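The traffic arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the inputs are the assumptions stated above (200 million DAU, 5 tweets per user per day, 10 percent media).

```python
SECONDS_PER_DAY = 24 * 3600

daily_active_users = 200_000_000
tweets_per_user_per_day = 5
media_percent = 10  # share of tweets that carry a media file

tweets_per_day = daily_active_users * tweets_per_user_per_day
media_files_per_day = tweets_per_day * media_percent // 100
requests_per_second = tweets_per_day / SECONDS_PER_DAY  # daily average, not peak

print(f"tweets/day:      {tweets_per_day:,}")
print(f"media files/day: {media_files_per_day:,}")
print(f"average RPS:     {requests_per_second:,.0f}")
```

The result is a daily average (roughly 11.6K, rounded up to 12K in the text); real traffic is bursty, so peak load would be higher than this figure.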

Throughput

3.2 Storage Estimation:

Let's assume each message is 100 bytes on average; we will then require about 100 GB of database storage every day.

1 billion * 100 bytes = 100 GB/day

Per our requirements, 10 percent of our daily messages (100 million) are media files. Assuming each file is 50 KB on average, we will require 5 TB of storage every day.

100 million * 50 KB = 5 TB/day

Over 10 years, we will require about 19 PB of storage.

(5 TB + 0.1 TB) * 365 days * 10 years ≈ 19 PB

3.3 Bandwidth Estimation

As our system handles 5.1 TB of ingress every day, we will require a minimum bandwidth of around 60 MB per second.

5.1 TB / (24 hrs * 3600 seconds) = 60 MB/second
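The storage and bandwidth figures can be checked the same way. The sketch below assumes decimal units (1 KB = 1000 bytes) and uses illustrative variable names; it only restates the assumptions from the text.

```python
GB, TB, PB = 10**9, 10**12, 10**15  # decimal units (assumption)
SECONDS_PER_DAY = 24 * 3600

tweets_per_day = 1_000_000_000
media_files_per_day = 100_000_000
avg_tweet_bytes = 100
avg_media_bytes = 50_000  # 50 KB

text_bytes_per_day = tweets_per_day * avg_tweet_bytes
media_bytes_per_day = media_files_per_day * avg_media_bytes
total_bytes_per_day = text_bytes_per_day + media_bytes_per_day

ten_year_bytes = total_bytes_per_day * 365 * 10
ingress_bytes_per_second = total_bytes_per_day / SECONDS_PER_DAY

print(f"text storage/day:  {text_bytes_per_day / GB:.0f} GB")
print(f"media storage/day: {media_bytes_per_day / TB:.0f} TB")
print(f"10-year storage:   {ten_year_bytes / PB:.1f} PB")
print(f"average ingress:   {ingress_bytes_per_second / 10**6:.1f} MB/s")
```

The exact values are about 18.6 PB and 59 MB/s, which the text rounds to 19 PB and 60 MB/s. Replication and index overhead are not included.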

3.4 Identity & ID Generation

1. Snowflake-style 64-bit IDs (k-sortable, time-ordered)

A 64-bit integer that encodes time + machine identity + per-millisecond sequence, so IDs are roughly increasing by time and unique without round-trips to a DB.
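A minimal sketch of such a generator follows. The bit layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence), the custom epoch, and the class and function names are assumptions chosen for illustration, not a description of any production implementation.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch in milliseconds (assumed value)
WORKER_BITS = 10          # matches the 10-bit worker ID space
SEQUENCE_BITS = 12        # assumed: up to 4096 IDs per worker per millisecond
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical Snowflake-style ID generator for one worker."""

    def __init__(self, worker_id):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake_id):
    """Split an ID back into its three fields."""
    return {
        "timestamp_ms": (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS,
        "worker_id": (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }
```

Because the timestamp occupies the high bits, sorting by ID approximates sorting by creation time, which is why such IDs can double as a timeline sort key.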

2. Where Snowflake IDs are used

3. Assigning worker IDs (10-bit space)

4. Failure & edge cases to plan for

5. User IDs vs. Tweet IDs

4. Use Case Design for Twitter System Design

In the above Diagram,

5. Low Level Design for Twitter System Design

A low-level design of Twitter dives into the details of individual components and functionalities. Here's a breakdown of some key aspects:

[Figure: Low-level design]

5.1 Data storage:

5.2 Core functionalities:

5.3 Additional considerations:

[Figure: Low-level design, continued]

6. High Level Design for Twitter System Design

This section discusses the high-level design for Twitter.

[Figure: High-level design of Twitter]

6.1 Architecture:

For Twitter, we use a microservices architecture, since it makes it easier to scale horizontally and decouple our services. Each service owns its own data model. We will divide our system into a few core services.

6.2 User Services

This service handles user-related concerns such as authentication and user information. The login page, sign-up page, profile page, and home page are handled by the User Service.

6.3 Newsfeed Service:

This service handles the generation and publishing of user newsfeeds, which this section covers in more detail. A newsfeed seems easy enough to implement, but a lot of things can make or break this feature. So, let's divide the problem into two parts:

6.3.1 Generation:

Let's assume we want to generate the feed for user A, we will perform the following steps:

Feed generation is an intensive process and can take quite a lot of time, especially for users who follow many people. To improve performance, the feed can be pre-generated and stored in the cache; a mechanism can then periodically update the feed and apply our ranking algorithm to the new tweets.
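As a rough illustration of pre-generating and caching a feed, the sketch below collects tweets from the accounts a user follows, orders them, and stores the result. The in-memory dictionaries stand in for the real stores, recency is used as a placeholder for the ranking algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: str
    created_at: float
    text: str


def generate_feed(user_id, follows, tweets_by_author, feed_cache, limit=20):
    """Build and cache the feed for one user.

    follows:          dict mapping user -> set of followed users
    tweets_by_author: dict mapping author -> list of Tweet
    feed_cache:       dict standing in for a cache such as Redis
    """
    followees = follows.get(user_id, set())
    candidates = [
        tweet
        for author in followees
        for tweet in tweets_by_author.get(author, [])
    ]
    # Placeholder ranking: newest first. A real system would apply its
    # ranking algorithm here.
    ranked = sorted(candidates, key=lambda t: t.created_at, reverse=True)[:limit]
    feed_cache[user_id] = ranked
    return ranked
```

A periodic job could call `generate_feed` for active users so that reads are served from `feed_cache` instead of being computed on request.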

6.3.2 Publishing

Publishing is the step where the feed data is pushed to each specific user. This can be quite a heavy operation, as a user may have millions of friends or followers. To deal with this, we have three different approaches:

[Figure: Pull design]

[Figure: Push design]
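The trade-off between pushing and pulling can be sketched in code. The example below assumes the approaches are pull (fan-out on read), push (fan-out on write), and a hybrid of the two; it pushes tweets from ordinary accounts into follower inboxes and pulls tweets from high-follower accounts at read time. The threshold value and all class and method names are hypothetical.

```python
from collections import defaultdict


class FanoutService:
    """Hybrid fan-out sketch: push for small accounts, pull for large ones."""

    def __init__(self, celebrity_threshold=10_000):
        self.threshold = celebrity_threshold  # assumed cutoff, tune per system
        self.followers = defaultdict(set)     # author -> followers
        self.following = defaultdict(set)     # user -> authors followed
        self.inbox = defaultdict(list)        # user -> pushed (ts, tweet_id)
        self.outbox = defaultdict(list)       # author -> own (ts, tweet_id)

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, tweet_id, ts):
        self.outbox[author].append((ts, tweet_id))
        if not self._is_celebrity(author):
            # Push model: write into every follower's inbox at post time.
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, tweet_id))

    def home_timeline(self, user, limit=20):
        entries = set(self.inbox[user])
        # Pull model: merge in tweets from followed celebrities at read time.
        for author in self.following[user]:
            if self._is_celebrity(author):
                entries.update(self.outbox[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

Pushing keeps reads cheap but makes a single post by a very popular account expensive; pulling does the opposite, which is why a hybrid split by follower count is a common compromise.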

6.4 Tweet service:

The tweet service handles tweet-related use cases such as posting a tweet, favorites, etc.

6.5 Retweets:

Retweets are one of our extended requirements. To implement this feature, we can simply create a new tweet with the user id of the user retweeting the original tweet and then modify the type enum and content property of the new tweet to link it with the original tweet.
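A minimal sketch of that idea is shown below. The `RETWEET` enum value, the choice to store the original tweet's ID in `content`, and the field names are assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass
from enum import Enum


class TweetType(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    RETWEET = "retweet"  # assumed extra enum value for retweets


@dataclass
class Tweet:
    id: str
    user_id: str
    type: TweetType
    content: str
    created_at: float


def create_retweet(original, retweeter_id):
    """Create a new tweet owned by the retweeter that points at the original."""
    return Tweet(
        id=str(uuid.uuid4()),
        user_id=retweeter_id,
        type=TweetType.RETWEET,
        content=original.id,  # link to the original tweet instead of copying text
        created_at=time.time(),
    )
```

When rendering, a client or timeline service would check `type` and, for a retweet, load the tweet whose ID is stored in `content`.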

6.6 Search Service:

This service is responsible for search-related functionality. The search service returns results such as top posts and latest posts, which are ordered by ranking.

6.7 Media Service:

This service will handle media (images, videos, files, etc.) uploads.

6.8 Analytics Service:

This service will be used for metrics and analytics use cases.

6.9 Ranking Algorithm:

We will need a ranking algorithm to rank each tweet according to its relevance to each specific user.

Example: Facebook used to utilize an EdgeRank algorithm. Here, the rank of each feed item is described by:

Rank = Affinity * Weight * Decay

Where,

Nowadays, algorithms are much more complex, and ranking is done using machine learning models that can take thousands of factors into consideration.
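For intuition, the EdgeRank-style formula above can be written as a small function. The exponential half-life form of the decay term and the sample numbers are assumptions; the original formula only states that the three factors are multiplied.

```python
def time_decay(age_hours, half_life_hours=6.0):
    """Assumed decay: the score halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def edge_rank(affinity, weight, age_hours, half_life_hours=6.0):
    """Rank = Affinity * Weight * Decay."""
    return affinity * weight * time_decay(age_hours, half_life_hours)


# Hypothetical feed items: (label, affinity, weight, age in hours)
items = [
    ("close friend, old post", 0.9, 1.0, 24.0),
    ("acquaintance, fresh post", 0.3, 1.0, 0.5),
    ("close friend, fresh video", 0.9, 2.0, 1.0),
]
ranked = sorted(items, key=lambda i: edge_rank(i[1], i[2], i[3]), reverse=True)
for label, *_ in ranked:
    print(label)
```

With these sample values, the fresh video from a close friend ranks first and the day-old post ranks last, because the decay term shrinks its score by a factor of 16.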

6.10 Search Service

6.12 Notifications Service:

6.13 Architecture Overview

[Mobile/Web]
│ HTTPS / HTTP2 + TLS

[API Gateway / Edge] ──> [Auth] ──> [Rate Limiter] ──> [Service Mesh]
│ │
│ ├── Tweet Service (write path)
│ ├── Engagement Service (likes/retweets/replies)
│ ├── Social Graph Service (follow edges)
│ ├── Timeline Service (home/profile)
│ ├── Search Service (query/index)
│ ├── Media Service (upload/origin)
│ ├── Notification Service (push/email)
│ └── User Service (profiles, identity)

├─► CDN (images/video, thumbnails)
└─► WebSocket/SSE fanout (live updates)

Core Data Plane:

6.14 Write Path (Tweet/Create)

6.15 Home Timeline (Read Path & Fan-out Strategy)

a. Approaches

b. Read Flow

  1. Client calls GET /timeline/home?cursor=….
  2. Check cache (Redis): materialized page for user. On miss, fetch TimelineEntries (pinned lists) and rank.
  3. Join with tweet bodies (batch mget), filter blocks/mutes, apply visibility.
  4. Return paginated results + nextCursor (opaque).
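The opaque cursor in step 4 can be sketched as an encoded position in the timeline. The example below is a simplified, in-memory illustration; the cursor format (base64-encoded JSON holding the last entry's sort key) and the function names are assumptions, and a real service would typically also sign or encrypt the cursor.

```python
import base64
import json


def encode_cursor(inserted_at, tweet_id):
    raw = json.dumps({"t": inserted_at, "id": tweet_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["t"], data["id"]


def get_home_timeline(entries, cursor=None, page_size=2):
    """Return one page of a timeline.

    entries: list of (inserted_at, tweet_id) tuples sorted newest first,
             standing in for the user's cached TimelineEntries.
    """
    if cursor is not None:
        position = decode_cursor(cursor)
        # Keep only entries strictly older than the last one already served.
        entries = [entry for entry in entries if entry < position]
    page = entries[:page_size]
    has_more = len(entries) > page_size
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return {"items": [tweet_id for _, tweet_id in page], "nextCursor": next_cursor}
```

Keying the cursor on the last entry's sort key, rather than on a numeric offset, keeps pages stable when new tweets are inserted at the top between requests.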

c. Ranking (MVP → ML)

d. Caching

a. Indexing

b. Query

c. Trends

6.19 Media Pipeline

6.20 Notifications

7. Data Model Design for Twitter System Design

This is the general data model, which reflects our requirements.

[Figure: Database design for Twitter]

In the diagram, we have the following tables:

7.1 Users:

This table contains a user's information such as name, email, DOB, and other details. The ID is an auto-generated field and is unique.

Users
{
ID: Autofield,
Name: Varchar,
Email: Varchar,
DOB: Date,
Created At: Date
}

7.2 Tweets:

As the name suggests, this table stores tweets and their properties, such as type (text, image, video, etc.) and content. It also stores the UserID of the author.

Tweets
{
id: uuid,
UserID: uuid,
type: enum,
content: varchar,
createdAt: timestamp
}

7.3 Favorites:

This table maps tweets with users for the favorite tweets functionality in our application.

Favorites
{
id: uuid,
UserID: uuid,
TweetID: uuid,
CreatedAt: timestamp
}

7.4 Followers:

This table maps followers to followees (the users being followed), as users can follow each other. The relationship is N:M.

Followers
{
id: uuid,
followerID: uuid,
followeeID: uuid,
}
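To make the N:M follow relationship concrete, here is a small sketch using Python's built-in `sqlite3` module. It is illustrative only: the column names are adapted to snake_case, and it uses a composite primary key on (follower, followee) instead of the separate `id` column so that duplicate follows are rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE followers (
        follower_id TEXT NOT NULL REFERENCES users(id),
        followee_id TEXT NOT NULL REFERENCES users(id),
        PRIMARY KEY (follower_id, followee_id),
        CHECK (follower_id <> followee_id)
    );
    -- Secondary index so "who follows X?" does not scan the whole table.
    CREATE INDEX idx_followers_followee ON followers(followee_id);
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "Ada"), ("u2", "Bob"), ("u3", "Cy")],
)
conn.executemany(
    "INSERT INTO followers VALUES (?, ?)",
    [("u1", "u2"), ("u3", "u2"), ("u2", "u1")],
)


def followers_of(user_id):
    rows = conn.execute(
        "SELECT follower_id FROM followers WHERE followee_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def following_of(user_id):
    rows = conn.execute(
        "SELECT followee_id FROM followers WHERE follower_id = ? ORDER BY followee_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Each row is one directed edge, so the same table answers both "who does this user follow?" and "who follows this user?".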

7.5 Feeds:

This table stores feed properties with the corresponding userID.

Feeds
{
id: uuid,
userID: uuid,
UpdatedAt: Timestamp
}

7.6 Feeds_Tweets:

This table maps tweets to feeds. Their relationship is N:M.

feeds_tweets{
id: uuid,
tweetID: uuid,
feedID: uuid
}

7.7 What Kind of Database is used in Twitter?

7.8 Core Entities (Logical Model – Production)

7.9 Indexing & Sharding Keys (Practical)

7.10 Storage Choices (Operational)

8. API Design for Twitter System Design

A basic API design for our services:

8.1 Post a Tweet:

This API will allow the user to post a tweet on the platform.

{
userID: UUID,
content: string,
mediaURL?: string
}

8.2 Follow or unfollow a user

This API will allow the user to follow or unfollow another user.

Follow

{ followerID: UUID, followeeID: UUID }

Unfollow

{ followerID: UUID, followeeID: UUID }

Parameters

8.3 Get NewsFeed

This API will return all the tweets to be shown within a given newsfeed.

getNewsfeed

{ userID: UUID }

Parameters

9. Microservices Used for Twitter System Design

9.1 Data Partitioning

To scale out our databases, we will need to partition our data. Horizontal partitioning (aka sharding) can be a good first step. We can use partitioning schemes such as:

The above approaches can still cause uneven data and load distribution; we can solve this using consistent hashing.
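A minimal consistent-hashing ring is sketched below to show why it helps: when a shard is added, only the keys that land on the new shard move. The use of SHA-256, the number of virtual nodes, and the class name are illustrative assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per shard smooth the distribution
        self._ring = []       # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

For example, `ConsistentHashRing(["shard-a", "shard-b", "shard-c"]).get_node("user:42")` returns the shard that owns that key; after `add_node("shard-d")`, keys either stay where they were or move to `shard-d`, never between the existing shards.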

9.2 Mutual friends

9.3 Metrics and Analytics

9.4 Caching

9.5 Media access and storage

9.6 Content Delivery Network (CDN)

10. Scalability for Twitter System Design

Let us identify and resolve scalability bottlenecks, such as single points of failure, in our design:

To make our system more resilient we can do the following:

11. Caching Strategy

12. Partitioning & Hot Keys

13. Consistency Model

14. Reliability, SRE & Observability

15. Security, Privacy & Abuse

16. Cost Controls

17. Failure Modes & Mitigations (Interview-Gold)

18. E2E Sequence (Tweet → Follower Home)

Client → API → TweetSvc: validate, allocate id
TweetSvc → Store: write quorum OK
TweetSvc → Bus: publish TweetCreated
FanoutWorker (consumes):

19. Schema Sketches (illustrative)

Tweet{ tweetId PK, authorId, text, createdAt, replyToId?, quoteOfId?, media[], lang, visibility, entities{hashtags[],mentions[]} }

FollowEdge{ followerId, followeeId, createdAt, state, PRIMARY KEY(followerId, followeeId) }

TimelineEntry{ userId, tweetId, insertedAt, rankScore, source, PRIMARY KEY(userId, insertedAt DESC, tweetId) }

Like{ userId, tweetId, createdAt, PRIMARY KEY(userId, tweetId) }

Retweet{ userId, srcTweetId, retweetId, createdAt, PRIMARY KEY(userId, srcTweetId) }

Notification{ userId, notifId, type, actorId, subjectId, createdAt, state }

20. Search Relevance Features (starter)