Designing Facebook Messenger | System Design Interview

Last Updated : 6 Apr, 2026

We are designing a real-time messaging app similar to Facebook Messenger that can support millions of users. The system focuses on scalable architecture and efficient communication. It also includes features like group messaging, media sharing, and notifications.

1. System Requirements

This section defines what Facebook Messenger must achieve in terms of features and performance.

1. Functional Requirements

This section lists the main capabilities the system should provide to users.

2. Non-Functional Requirements

This section describes the performance, scalability, and security expectations of the system.

2. Design of Facebook Messenger

Now let's discuss the overall architecture of the app. Specifically, let's see how we can send messages from one user to another:

[Figure: FB - API Server]

3. Communication Protocol

What happens when user A sends a message to user B?

When user A sends a message, it reaches the server, which should instantly deliver it to user B. However, traditional HTTP does not support server-initiated communication. Therefore, a different approach (like persistent connections) is needed for real-time messaging.

There are a few options we can use. Let's discuss them and their trade-offs:

1. **HTTP polling**

The client repeatedly sends requests to the server to check for new messages, but most of the time the server responds with no new information. This leads to unnecessary network traffic and increased server load. It also causes higher latency since messages are only received when the client asks for them. Overall, this approach is inefficient for real-time communication.
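The cost of polling can be made concrete with a small simulation. This is only a sketch, not a real HTTP client: the function name `simulate_polling`, the 5-second interval, and the message arrival times are all assumptions chosen for illustration.

```python
def simulate_polling(arrival_times, poll_interval, duration):
    """Simulate a client that polls every `poll_interval` seconds.

    Returns (total_polls, empty_polls, worst_delay), where worst_delay is the
    longest time any message waited on the server before being picked up.
    """
    pending = sorted(arrival_times)
    total_polls = 0
    empty_polls = 0
    worst_delay = 0.0
    t = poll_interval
    while t <= duration:
        total_polls += 1
        # Everything that arrived at or before this poll is delivered now.
        delivered = [a for a in pending if a <= t]
        pending = [a for a in pending if a > t]
        if delivered:
            worst_delay = max(worst_delay, t - min(delivered))
        else:
            empty_polls += 1
        t += poll_interval
    return total_polls, empty_polls, worst_delay


# Three messages arrive within one minute; the client polls every 5 seconds.
total, empty, delay = simulate_polling([7.0, 31.0, 32.0], poll_interval=5, duration=60)
print(total, empty, delay)  # prints: 12 10 4.0
```

In this made-up scenario, 10 of the 12 requests return nothing, and a message can wait up to 4 seconds before the client sees it. Polling more often lowers the delay but raises the share of empty requests.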

2. **Long polling**

In this model, the client sends an HTTP request and the server holds it open until new data is available before responding. The client then immediately opens a new request after each response, which creates a near-continuous connection. Latency is lower than with regular polling, because data is delivered as soon as it is available. However, it still relies on repeated connections and is not ideal for real-time chat systems. It is more suitable for use cases like notifications.
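The server side of long polling can be sketched with `asyncio`: the handler waits on a per-user queue and returns either when a message arrives or when a timeout expires. The `Mailbox` class and its method names are hypothetical, and an in-process queue stands in for a real HTTP request.

```python
import asyncio


class Mailbox:
    """Per-user inbox held on the server (hypothetical name)."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def post(self, message):
        await self._queue.put(message)

    async def long_poll(self, timeout):
        """Hold the request open until a message arrives or `timeout` elapses.

        An empty list means the request timed out and the client should
        immediately issue a new one.
        """
        try:
            first = await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return []
        messages = [first]
        # Drain anything else that is already waiting.
        while not self._queue.empty():
            messages.append(self._queue.get_nowait())
        return messages


async def demo():
    mailbox = Mailbox()

    async def sender():
        await asyncio.sleep(0.05)
        await mailbox.post("hello from A")

    task = asyncio.create_task(sender())
    got = await mailbox.long_poll(timeout=1.0)    # returns as soon as data exists
    await task
    idle = await mailbox.long_poll(timeout=0.05)  # nothing pending, so it times out
    return got, idle


result = asyncio.run(demo())
print(result)
```

The first call returns shortly after the sender posts, well before its one-second timeout. The second call shows the idle case, where the client would simply reconnect.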

3. **WebSockets**

WebSockets maintain a persistent connection between the client and server, enabling full-duplex communication. This allows both the client and server to send data to each other in real time without repeated requests. The connection stays open for the entire session, making it ideal for real-time applications like chat. However, there are practical limits on how many concurrent connections a server can handle.

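WebSockets run over TCP. A figure of about 65,000 concurrent connections per server is often quoted because TCP port numbers are 16 bits, but that ceiling applies to connections from one client IP address to one server endpoint, not to a server as a whole. In practice, the number of connections a server can hold is bounded by memory, file descriptors, and CPU. Either way, a single server cannot serve a large user base, so multiple servers are required. A load balancer distributes connections across these servers. So we are going to insert a load balancer and draw in some API servers, as shown below.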

[Figure: FB - Load balancer]
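One way a load balancer might spread long-lived connections is a least-connections policy, sketched below. The article does not say which policy is used, so this choice, the class name `LeastConnectionsBalancer`, and the server names are assumptions.

```python
class LeastConnectionsBalancer:
    """Assign each new connection to the server with the fewest open ones."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # server -> open connections
        self.assignment = {}                         # client -> server

    def connect(self, client_id):
        # Ties are broken by server name so the result is deterministic.
        server = min(self.active, key=lambda name: (self.active[name], name))
        self.active[server] += 1
        self.assignment[client_id] = server
        return server

    def disconnect(self, client_id):
        server = self.assignment.pop(client_id)
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["api-1", "api-2", "api-3"])
placed = [balancer.connect(f"user-{i}") for i in range(6)]
print(placed)  # two connections per server

balancer.disconnect("user-1")       # frees a slot on api-2
print(balancer.connect("user-6"))   # api-2, now the least loaded
```

Because WebSocket connections stay open for a whole session, counting open connections tends to balance load better than counting requests, which is why this policy is used in the sketch.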

4. API Used

Here we show only three API servers, but a real system that supports hundreds of millions of users will need hundreds or thousands of them, because a single API server can handle only thousands of requests at a time.

This introduces a new problem. Before, a message could go from one user to another through a single chat API server. Now that the system is distributed, the two users may be connected to different API servers, so the servers need a way to communicate with each other. One design pattern we can adopt is a message queue, which is a natural solution for passing messages between servers in a distributed system.

The message service below implements this message queue. Each API server publishes messages into the centralized queue and subscribes to updates for the users connected to it. When a new message comes in, it is added to the queue, and any server listening for messages for that user receives the update and forwards the message to the user.

[Figure: FB - Message Service]
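The publish/subscribe flow between API servers can be sketched in memory as follows. A real deployment would use a networked broker; the `MessageService` and `ApiServer` classes, their method names, and the list that stands in for a WebSocket are all hypothetical.

```python
from collections import defaultdict


class MessageService:
    """In-memory stand-in for the centralized message queue."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # user_id -> API servers

    def subscribe(self, user_id, server):
        self._subscribers[user_id].add(server)

    def unsubscribe(self, user_id, server):
        self._subscribers[user_id].discard(server)

    def publish(self, recipient_id, message):
        """Forward to every server subscribed for the recipient; return the count."""
        servers = list(self._subscribers.get(recipient_id, ()))
        for server in servers:
            server.deliver(recipient_id, message)
        return len(servers)


class ApiServer:
    def __init__(self, name, message_service):
        self.name = name
        self.message_service = message_service
        self.sockets = {}  # user_id -> messages pushed to that user's connection

    def on_connect(self, user_id):
        self.sockets[user_id] = []
        self.message_service.subscribe(user_id, self)

    def on_disconnect(self, user_id):
        self.sockets.pop(user_id, None)
        self.message_service.unsubscribe(user_id, self)

    def on_client_message(self, sender_id, recipient_id, text):
        message = {"from": sender_id, "text": text}
        return self.message_service.publish(recipient_id, message)

    def deliver(self, user_id, message):
        self.sockets[user_id].append(message)


service = MessageService()
server_1 = ApiServer("api-1", service)
server_2 = ApiServer("api-2", service)
server_1.on_connect("alice")
server_2.on_connect("bob")

# Alice is on api-1 and Bob is on api-2; the queue bridges the two servers.
receivers = server_1.on_client_message("alice", "bob", "hi")
print(receivers, server_2.sockets["bob"])
```

Here `publish` returning 0 would mean no server currently holds a connection for the recipient, which is the signal an offline path could act on.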

5. High-Level System Architecture

Each server handling WebSocket connections can hold only a finite number of them. The often-quoted figure of about 65,000 comes from the 16-bit TCP port range, although the practical ceiling is set by memory, file descriptors, and CPU rather than by ports on the server side. Either way, a single server cannot support a very large number of concurrent users in real-time applications. As the number of users grows, this limit becomes a major bottleneck, so relying on just one server is not practical for a scalable system.

To address this limitation, multiple servers are deployed to handle the growing number of connections. A load balancer is introduced to distribute incoming requests evenly across these servers. This ensures that no single server is overloaded and improves system performance. As a result, the system becomes more scalable, reliable, and capable of handling millions of concurrent users.

[Figure: FB - Database]

6. Data Types

So, how are we going to store and model this data in our database? We know we will need a few key tables, such as users and messages. We will also probably need the concept of conversations, which are groups of users who are supposed to receive the same messages.

The last thing we need is a way to query which users are part of a conversation and which conversations a user is part of. For that, we add one more table, **conversation_users**, which stores the mapping from a conversation id to a user id.

[Figure: Datatypes]
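The four tables can be sketched with SQLite to show how the mapping table answers both membership questions. Only the table names come from the text above; every column name, type, and index here is an assumption, and SQLite is used just because it runs without setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE conversations (
    id    INTEGER PRIMARY KEY,
    title TEXT
);
-- Many-to-many mapping between conversations and users.
CREATE TABLE conversation_users (
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    user_id         INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (conversation_id, user_id)
);
-- Supports the reverse lookup: conversations for a given user.
CREATE INDEX idx_conversation_users_user ON conversation_users(user_id);
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    sender_id       INTEGER NOT NULL REFERENCES users(id),
    body            TEXT NOT NULL,
    sent_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO conversations (id, title) VALUES (?, ?)",
                 [(10, "alice and bob"), (11, "group chat")])
conn.executemany("INSERT INTO conversation_users VALUES (?, ?)",
                 [(10, 1), (10, 2), (11, 1), (11, 2), (11, 3)])


def members_of(conversation_id):
    """Which users are part of this conversation?"""
    rows = conn.execute(
        "SELECT user_id FROM conversation_users "
        "WHERE conversation_id = ? ORDER BY user_id", (conversation_id,))
    return [row[0] for row in rows]


def conversations_of(user_id):
    """Which conversations is this user part of?"""
    rows = conn.execute(
        "SELECT conversation_id FROM conversation_users "
        "WHERE user_id = ? ORDER BY conversation_id", (user_id,))
    return [row[0] for row in rows]


print(members_of(11))        # [1, 2, 3]
print(conversations_of(1))   # [10, 11]
```

Messages reference a conversation rather than a recipient, so one-to-one chats and group chats are handled the same way in this sketch.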

7. Scalability

In particular, one thing to consider is the cost of going to the database and retrieving messages from it repeatedly. To reduce that cost, we add a caching service or caching layer to the architecture, such as a read-through cache.
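A read-through cache sits in front of the database and loads missing entries itself, so callers only ever talk to the cache. The sketch below is a minimal in-process version; the class name `ReadThroughCache`, the LRU eviction policy, and the dictionary standing in for the database are assumptions.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve reads from memory and fall through to `loader` on a miss."""

    def __init__(self, loader, capacity=1000):
        self._loader = loader
        self._capacity = capacity
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)      # the cache, not the caller, reads the database
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)


# A dictionary stands in for the messages table.
database = {10: ["hi", "hello"]}
db_reads = []


def load_messages(conversation_id):
    db_reads.append(conversation_id)
    return list(database.get(conversation_id, []))


cache = ReadThroughCache(load_messages, capacity=2)
cache.get(10)
cache.get(10)
print(len(db_reads), cache.hits, cache.misses)  # prints: 1 1 1

# When a new message is written, drop the stale entry so the next read reloads it.
database[10].append("bye")
cache.invalidate(10)
print(cache.get(10))
```

Invalidating on write is one simple way to keep the cache consistent; a time-to-live on entries is a common alternative.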

**How are we going to store media?**

Next, how are we going to store media such as images and videos, and upload them to the right place? We are not going to store them in our database. Instead, we will use a separate storage platform, such as an object storage service like Amazon S3. To serve that media more efficiently we also want caching, and in this case we would use something like a CDN.
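The split between database, object storage, and CDN can be sketched as follows: the bytes go to the object store under a key, the database keeps only that key, and clients fetch the media through a CDN URL built from it. This does not call any real storage API. The `ObjectStore` class, the content-hash key layout, and the hostname `cdn.example.com` are all hypothetical.

```python
import hashlib

CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN hostname


class ObjectStore:
    """In-memory stand-in for an object storage bucket."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def upload_media(store, data, extension):
    """Store the bytes and return the key that a message row would reference."""
    digest = hashlib.sha256(data).hexdigest()
    # Content-addressed key: identical uploads map to the same object.
    key = f"media/{digest[:2]}/{digest}.{extension}"
    store.put(key, data)
    return key


def media_url(key):
    """URL handed to clients, so reads go through the CDN rather than the store."""
    return f"{CDN_BASE_URL}/{key}"


store = ObjectStore()
photo = b"fake image bytes"
key = upload_media(store, photo, "jpg")
print(media_url(key))
```

Keeping only the key in the message record keeps database rows small, and the CDN can cache each object close to the users who request it.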

The last thing we want to add to the architecture is a way to notify users who are offline about messages they may have missed. For this, we add a notification service that the message service contacts whenever the recipient is offline.
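The online/offline decision can be sketched as a small routing function: deliver over the live connection if one exists, otherwise hand a preview to the notification service. The names `NotificationService` and `route_message`, the 40-character preview, and the dictionary of connections are assumptions for illustration.

```python
class NotificationService:
    """Stand-in for a push notification sender; it only records what it would send."""

    def __init__(self):
        self.sent = []

    def notify(self, user_id, preview):
        self.sent.append((user_id, preview))


def route_message(recipient_id, text, online_connections, notifications):
    """Deliver in real time when possible, otherwise notify the offline user."""
    connection = online_connections.get(recipient_id)
    if connection is not None:
        connection.append(text)  # stands in for a push over the WebSocket
        return "delivered"
    notifications.notify(recipient_id, text[:40])
    return "notified"


online = {"bob": []}  # bob has an open connection; carol does not
notifications = NotificationService()

print(route_message("bob", "lunch?", online, notifications))
print(route_message("carol", "lunch?", online, notifications))
```

In both cases the message itself would still be persisted, so an offline user can load it from the database when they reconnect.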

[Figure: FB - Scalability]