Design Distributed Cache | System Design

Last Updated : 4 May, 2026

Designing a Distributed Cache system involves building a fast, scalable, and reliable layer to store frequently accessed data. It helps reduce latency and improves overall system performance.

Caching

In computing, a cache is a high-speed storage layer that temporarily stores frequently accessed data to improve performance. It helps serve future requests faster compared to fetching data from the primary storage.

Distributed Caching

Distributed caching is a technique where cached data is stored across multiple servers to improve scalability and performance. It reduces load on the main database and speeds up data access.
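The way a cache takes load off the main database can be sketched with a cache-aside read path. This is only an illustration under stated assumptions: the article does not name a specific caching pattern, `SlowDatabase` and `CachedReader` are hypothetical names, and a local dict stands in for the distributed cache.

```python
class SlowDatabase:
    """Stand-in for the primary data store; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.rows.get(key)


class CachedReader:
    """Cache-aside read path: check the cache first, fall back to the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}  # a local dict stands in for the distributed cache

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        value = self.database.fetch(key)  # cache miss: read primary storage
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value


db = SlowDatabase({"user:1": "Alice"})
reader = CachedReader(db)
first = reader.get("user:1")   # miss: served by the database
second = reader.get("user:1")  # hit: served by the cache
print(first, second, db.queries)
```

Both reads return the same value, but only the first one reaches the database, which is the load reduction the paragraph above describes.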

[Figure: Distributed Cache]

1. System Requirements

This section defines what the distributed cache system should do and how well it should perform under different conditions.

1. Functional Requirements for Distributed Cache Design

Functional Requirements define what the cache system must do to meet application needs.

2. Non-Functional Requirements for Distributed Cache Design

Non-Functional Requirements define how well the cache system performs under various conditions.

2. Use Case Diagram

A use case diagram helps visualize the interactions between users and the system.

[Figure: Use case diagram]

3. Capacity Estimation

Capacity estimation involves calculating the expected load on the system.

1. Traffic Estimate

Estimate the read and write requests per second to make sure the cache can handle the expected load.

For example, with 10,000 reads per second and 1,000 writes per second, the cache must sustain about 11,000 operations per second.

2. Storage Estimate

Determine the total data size to be stored in the cache.

If each entry is 1 KB and there are 1 million entries, the total storage required is about 1 GB (1 KB × 1,000,000).

3. Bandwidth Estimate

Calculate the required bandwidth for read and write operations.

For example, if each read returns 1 KB and there are 10,000 reads per second, the read bandwidth is 10 MB/s (1 KB × 10,000).

4. Memory Estimate

Determine memory requirements per node and across the cluster.

If each node holds 10 GB of data and there are 10 nodes, the cluster needs 100 GB of memory in total.
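The four estimates above can be reproduced with a few lines of arithmetic. This sketch uses the example figures from the text and decimal units (1 KB = 1,000 bytes); the 1 KB write size is an assumption, since the text only gives a size for reads.

```python
KB = 1_000        # decimal units, matching the round numbers in the text
MB = 1_000 * KB
GB = 1_000 * MB

reads_per_sec = 10_000
writes_per_sec = 1_000
entry_size = 1 * KB
entries = 1_000_000
node_capacity = 10 * GB
nodes = 10

total_requests_per_sec = reads_per_sec + writes_per_sec
storage_bytes = entries * entry_size
read_bandwidth = reads_per_sec * entry_size
write_bandwidth = writes_per_sec * entry_size  # assumes writes are also 1 KB
cluster_memory = nodes * node_capacity

print(f"traffic:         {total_requests_per_sec} requests/s")
print(f"storage:         {storage_bytes / GB} GB")
print(f"read bandwidth:  {read_bandwidth / MB} MB/s")
print(f"write bandwidth: {write_bandwidth / MB} MB/s")
print(f"cluster memory:  {cluster_memory / GB} GB")
```

Note that the cluster memory (100 GB) is far larger than the 1 GB storage estimate; the two examples in the text use independent figures, so a real estimate would derive node count from the storage requirement plus replication and overhead.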

4. High-Level Design

The high-level design of a distributed cache system outlines the overall architecture, key components, and their interactions. It focuses on the big picture, ensuring that the system is scalable, fault-tolerant, and efficient.

[Figure: High-level design (HLD)]

The high-level design of a distributed cache system, as illustrated in the above diagram, outlines the major components and their interactions to achieve a scalable, fault-tolerant, and efficient caching mechanism. Key components include:

5. Low-Level Design

The low-level design (LLD) of the distributed cache system, as depicted in the diagram below, describes the detailed interactions and responsibilities of each component. It covers the specific classes or modules, their functions, and how they collaborate to achieve the desired functionality.

[Figure: Low-level design (LLD)]

Components of the Low-Level Design include:

6. Database Design

A distributed cache system combines in-memory storage with backend databases to provide fast data access and durability. Its design ensures data consistency, fault tolerance, and efficient cache management, often integrating with databases for persistence.

CacheEntry Table

**SQL for the above table (MySQL syntax):**

```sql
-- KEY is a reserved word in MySQL, so the column name is quoted with backticks.
CREATE TABLE CacheEntry (
    `key` VARCHAR(255) PRIMARY KEY,
    value TEXT,
    expiration_time TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
```
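To show how such a table might be used for persistence, here is a minimal sketch that runs against an in-memory SQLite database from Python's standard library. It is an adaptation, not the article's schema: SQLite has no `ON UPDATE CURRENT_TIMESTAMP`, so `updated_at` is set explicitly, timestamps are stored as Unix seconds, and the `put` and `get` helpers are hypothetical names. The upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CacheEntry (
        "key" TEXT PRIMARY KEY,
        value TEXT,
        expiration_time REAL,      -- Unix timestamp; NULL means no expiry
        created_at REAL NOT NULL,
        updated_at REAL NOT NULL
    )
    """
)


def put(key, value, ttl=None, now=None):
    """Insert or update an entry; ttl is in seconds, None means no expiry."""
    now = time.time() if now is None else now
    expires = None if ttl is None else now + ttl
    conn.execute(
        """
        INSERT INTO CacheEntry ("key", value, expiration_time, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT("key") DO UPDATE SET
            value = excluded.value,
            expiration_time = excluded.expiration_time,
            updated_at = excluded.updated_at
        """,
        (key, value, expires, now, now),
    )


def get(key, now=None):
    """Return the stored value, or None if the key is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        """
        SELECT value FROM CacheEntry
        WHERE "key" = ? AND (expiration_time IS NULL OR expiration_time > ?)
        """,
        (key, now),
    ).fetchone()
    return row[0] if row else None


put("session:42", "payload", ttl=30)
print(get("session:42"))
```

Filtering on `expiration_time` at read time means expired rows are never returned, even if a separate cleanup job has not deleted them yet.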

7. Microservices and APIs

In a distributed cache system, microservices play a crucial role in ensuring modularity, scalability, and maintainability. Each microservice handles a specific set of functionalities and interacts with other microservices through well-defined APIs.

1. Cache Service

The Cache Service is responsible for handling read and write operations on the cache.

**1. Set Cache Data API:**

**Endpoint:** POST /cache

**Request:**

```json
{
  "key": "string",
  "value": "string",
  "ttl": "integer" // Time-to-live in seconds
}
```

**Response:**

```json
{
  "status": "success",
  "message": "Data cached successfully."
}
```

**2. Get Cache Data API:**

**Endpoint:** GET /cache/{key}

**Request:** None (key is part of the URL)

**Response:**

```json
{
  "key": "string",
  "value": "string",
  "ttl": "integer" // Remaining time-to-live in seconds
}
```

**3. Delete Cache Data API:**

**Endpoint:** DELETE /cache/{key}

**Request:** None (key is part of the URL)

**Response:**

```json
{
  "status": "success",
  "message": "Data deleted successfully."
}
```
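The three Cache Service operations above could be backed by something like the following in-memory sketch. It is not a definitive implementation: the class name `CacheService`, the injectable `clock`, lazy expiry on read, and the `not_found` response for a missing key are all assumptions that the API description does not specify.

```python
import time


class CacheService:
    """In-memory sketch of the set / get / delete operations with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._entries[key] = (value, expires_at)
        return {"status": "success", "message": "Data cached successfully."}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self._clock()
        if expires_at is not None and now >= expires_at:
            del self._entries[key]  # lazily drop the expired entry
            return None
        remaining = None if expires_at is None else int(expires_at - now)
        return {"key": key, "value": value, "ttl": remaining}

    def delete(self, key):
        if self._entries.pop(key, None) is None:
            # Assumed behaviour: the API above only documents the success case.
            return {"status": "not_found", "message": "Key not found."}
        return {"status": "success", "message": "Data deleted successfully."}


cache = CacheService()
cache.set("user:1", "Alice", ttl=60)
print(cache.get("user:1"))
print(cache.delete("user:1"))
```

Expired entries are removed only when they are read; a production cache would usually also run background expiry so that untouched keys do not hold memory indefinitely.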

2. Replication Service

The Replication Service manages the replication of data across multiple cache nodes to ensure data availability and fault tolerance.

**1. Replicate Data API:**

**Endpoint:** POST /replicate

**Request:**

```json
{
  "key": "string",
  "value": "string"
}
```

**Response:**

```json
{
  "status": "success",
  "message": "Data replicated successfully."
}
```

**2. Get Replication Status API:**

**Endpoint:** GET /replication/status/{key}

**Request:** None (key is part of the URL)

**Response:**

```json
{
  "key": "string",
  "replication_status": "string" // e.g., "completed", "in_progress", "failed"
}
```
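A minimal sketch of how the two replication operations above might fit together is shown below. It assumes synchronous replication to every replica, with dict-like objects standing in for remote cache nodes; the class name, the `error` response, and the `not_found` status are hypothetical and not part of the API described above.

```python
class ReplicationService:
    """Copies each write to every replica and records the outcome per key."""

    def __init__(self, replicas):
        self.replicas = replicas  # dict-like stores standing in for cache nodes
        self.status = {}          # key -> "in_progress" | "completed" | "failed"

    def replicate(self, key, value):
        self.status[key] = "in_progress"
        try:
            for replica in self.replicas:
                replica[key] = value  # a real system would make a network call
        except Exception:
            self.status[key] = "failed"
            # Assumed error shape: the API above only documents success.
            return {"status": "error", "message": "Replication failed."}
        self.status[key] = "completed"
        return {"status": "success", "message": "Data replicated successfully."}

    def replication_status(self, key):
        return {
            "key": key,
            "replication_status": self.status.get(key, "not_found"),
        }


node_a, node_b = {}, {}
service = ReplicationService([node_a, node_b])
print(service.replicate("user:1", "Alice"))
print(service.replication_status("user:1"))
```

Writing to all replicas before reporting success favours consistency over write latency; an asynchronous variant would return earlier and leave the status at `in_progress` until the copies finish.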

3. Node Management Service

The Node Management Service handles the addition and removal of cache nodes, ensuring the system can scale dynamically.

**1. Add Node API:**

**Endpoint:** POST /node/add

**Request:**

```json
{
  "node_id": "string",
  "node_address": "string"
}
```

**Response:**

```json
{
  "status": "success",
  "message": "Node added successfully."
}
```

**2. Remove Node API:**

**Endpoint:** DELETE /node/remove/{node_id}

**Request:** None (node_id is part of the URL)

**Response:**

```json
{
  "status": "success",
  "message": "Node removed successfully."
}
```

4. Coordinator Service

The Coordinator Service manages consistent hashing, rebalancing, and overall coordination of the cache nodes.

**1. Rebalance Data API:**

**Endpoint:** POST /rebalance

**Request:**

```json
{
  "action": "start" // or "stop" to halt rebalancing
}
```

**Response:**

```json
{
  "status": "success",
  "message": "Rebalancing initiated."
}
```

**2. Get Rebalance Status API:**

**Endpoint:** GET /rebalance/status

**Request:** None

**Response:**

```json
{
  "rebalance_status": "string" // e.g., "in_progress", "completed", "not_started"
}
```
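The consistent hashing that the Coordinator Service manages can be sketched as a hash ring with virtual nodes. This is one common way to implement it, not necessarily the design the article has in mind: the `HashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a point on the ring (SHA-256, used only for its even spread)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring; each node owns several virtual points on the ring."""

    def __init__(self, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> node_id

    def add_node(self, node_id):
        for i in range(self.virtual_nodes):
            point = _hash(f"{node_id}#{i}")
            self._owners[point] = node_id
            bisect.insort(self._points, point)

    def remove_node(self, node_id):
        self._points = [p for p in self._points if self._owners[p] != node_id]
        self._owners = {p: n for p, n in self._owners.items() if n != node_id}

    def node_for(self, key):
        """Return the node owning the first ring point clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[index]]


ring = HashRing()
for node in ("node-a", "node-b", "node-c"):
    ring.add_node(node)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

When a node joins, the only keys that change owner are the ones the new node takes over, so rebalancing has to copy just that subset instead of reshuffling the whole cache.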

5. Monitoring and Management Service

This service tracks performance metrics and health status of the cache system, providing insights for administrators.

**1. Get Cache Metrics API:**

**Endpoint:** GET /metrics

**Request:** None

**Response:**

```json
{
  "cache_hits": "integer",
  "cache_misses": "integer",
  "node_health": [
    {
      "node_id": "string",
      "status": "string" // e.g., "healthy", "unhealthy"
    }
  ]
}
```

**2. Get Node Health API:**

**Endpoint:** GET /node/health/{node_id}

**Request:** None (node_id is part of the URL)

**Response:**

```json
{
  "node_id": "string",
  "status": "string" // e.g., "healthy", "unhealthy"
}
```
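The metrics payload above could be produced by a small collector such as the one sketched here. The `CacheMetrics` class and its method names are hypothetical, and `hit_ratio` is an extra convenience that the API description does not include.

```python
class CacheMetrics:
    """Tracks hit/miss counters and per-node health for the monitoring endpoints."""

    def __init__(self):
        self.cache_hits = 0
        self.cache_misses = 0
        self.node_health = {}  # node_id -> "healthy" | "unhealthy"

    def record_lookup(self, hit):
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def set_node_health(self, node_id, healthy):
        self.node_health[node_id] = "healthy" if healthy else "unhealthy"

    def hit_ratio(self):
        """Fraction of lookups served from the cache (0.0 when there were none)."""
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def snapshot(self):
        """Build a dict shaped like the GET /metrics response."""
        return {
            "cache_hits": self.cache_hits,
            "cache_misses": self.cache_misses,
            "node_health": [
                {"node_id": node_id, "status": status}
                for node_id, status in sorted(self.node_health.items())
            ],
        }


metrics = CacheMetrics()
for hit in (True, True, True, False):
    metrics.record_lookup(hit)
metrics.set_node_health("node-a", True)
metrics.set_node_health("node-b", False)
print(metrics.snapshot())
print(metrics.hit_ratio())
```

The hit ratio is usually the first number administrators look at: a falling ratio suggests the cache is too small or the TTLs are too short for the workload.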

8. Scalability for Distributed Cache Design

To ensure scalability, the system should support horizontal scaling, load balancing, and efficient data distribution.
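Why efficient data distribution matters for horizontal scaling can be seen with a small experiment. The sketch below places keys with naive modulo hashing (an assumed baseline, not something the article proposes) and counts how many keys land on a different node when a cluster grows from 4 to 5 nodes; `node_index` is a hypothetical helper.

```python
import hashlib


def node_index(key, node_count):
    """Naive placement: hash the key and take it modulo the number of nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


keys = [f"item:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if node_index(k, 4) != node_index(k, 5))
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys map to a different node after adding a fifth node")
```

In expectation about four in five keys move under modulo placement, which would invalidate most of the cache on every scaling event. This is the reason the Coordinator Service relies on consistent hashing, where adding a node relocates only the share of keys that the new node takes over.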