Apache HBase

Last Updated : 24 Apr, 2026

Apache HBase is a distributed, scalable, NoSQL database built on top of the Hadoop Distributed File System (HDFS). It is modeled after Google's Bigtable and is designed for storing large volumes of sparse, unstructured, or semi-structured data across clusters. HBase is column-oriented, supports horizontal scaling, and allows for real-time read/write access, making it an essential component of the Hadoop ecosystem.
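Because HBase follows the Bigtable design, its logical data model can be pictured as a sparse, sorted map: row key, then column (`family:qualifier`), then timestamped versions of a value. The Python sketch below is a hypothetical, in-memory illustration of that model. The `SparseTable` class and its methods are invented for this example and are not part of any HBase client API.

```python
from collections import defaultdict


class SparseTable:
    """Toy, in-memory model of HBase's logical data model (hypothetical).

    Layout: row key -> "family:qualifier" -> timestamp -> value.
    Only cells that are written take up space, so rows can be sparse.
    Real HBase stores row keys, columns, and values as byte arrays.
    """

    def __init__(self, families):
        # Column families are declared when the table is created.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value  # each write adds a new version

    def get(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row, {column: newest value}) in sorted row-key order."""
        for row in sorted(self.rows):
            cells = self.rows[row]
            yield row, {col: vs[max(vs)] for col, vs in sorted(cells.items())}


table = SparseTable(["info", "msg"])
table.put("user2", "info:name", "Bob", ts=1)
table.put("user1", "info:name", "Alice", ts=1)
table.put("user1", "info:name", "Alicia", ts=2)
table.put("user1", "msg:last", "hi", ts=3)
print(table.get("user1", "info:name"))   # Alicia
print(table.get("user2", "msg:last"))    # None (user2 has no msg cells)
print([row for row, _ in table.scan()])  # ['user1', 'user2']
```

Nothing is stored for the `msg` cells of `user2`, which is why wide tables with mostly empty columns stay cheap in this model.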

**Example:** Facebook migrated from Cassandra to HBase in 2010 to power its messaging infrastructure, which needed a scalable, real-time system to unify chat, email, and SMS conversations.

[Figure: HBase evolution flowchart]

Key Features of Apache HBase

Architecture of Apache HBase

Apache HBase follows a master-slave architecture and is built on top of Hadoop HDFS. Here's how its major components work together:

[Figure: Apache HBase architecture]

**Here’s how each part of the Apache HBase architecture works in detail:**

1. HMaster

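The HMaster acts as the master node of the HBase cluster. Its main responsibilities include assigning regions to RegionServers and rebalancing them when the load is uneven, handling schema (DDL) operations such as creating, altering, and deleting tables, and monitoring RegionServers so that their regions can be recovered when one fails.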

If the HMaster fails, a backup HMaster can take over to ensure high availability.
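To make region assignment and failover concrete, here is a minimal Python sketch, assuming a simple "least-loaded server" placement policy. `ToyMaster` and its methods are hypothetical; HBase's actual balancer considers many more factors, and a real HMaster learns about failures through ZooKeeper rather than through a direct method call.

```python
from collections import Counter


class ToyMaster:
    """Hypothetical sketch of two HMaster duties: assigning regions to
    RegionServers and reassigning them when a RegionServer fails.
    The least-loaded policy is an illustration, not HBase's real balancer."""

    def __init__(self, servers):
        self.servers = list(servers)  # live RegionServers
        self.assignment = {}          # region name -> server name

    def _least_loaded(self):
        load = Counter(self.assignment.values())
        # min() returns the first server with the lowest load, so ties
        # are broken by the order the servers were listed in.
        return min(self.servers, key=lambda s: load[s])

    def assign(self, region):
        server = self._least_loaded()
        self.assignment[region] = server
        return server

    def server_died(self, dead):
        """Drop a failed server and move its regions to the survivors."""
        self.servers.remove(dead)
        orphans = sorted(r for r, s in self.assignment.items() if s == dead)
        for region in orphans:
            del self.assignment[region]
        for region in orphans:
            self.assign(region)
        return orphans


master = ToyMaster(["rs1", "rs2", "rs3"])
for region in ["r1", "r2", "r3", "r4", "r5", "r6"]:
    master.assign(region)
print(master.server_died("rs2"))  # ['r2', 'r5']
print(sorted(Counter(master.assignment.values()).items()))  # [('rs1', 3), ('rs3', 3)]
```

After `rs2` fails, its two regions are spread over the surviving servers, so the cluster keeps serving every region with an even load.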

2. RegionServer

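A RegionServer is a worker node that serves client read and write requests. Each RegionServer manages multiple regions, where each region is a contiguous range of a table's rows. Internally, a RegionServer uses a BlockCache, an in-memory read cache for frequently accessed data; a MemStore for each column family of each region, which buffers new writes in memory; and HFiles, the files on HDFS to which a full MemStore is flushed.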

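Each RegionServer also maintains a Write-Ahead Log (WAL) for crash recovery: every edit is written to the WAL before it is applied in memory, so edits that were not yet flushed to disk can be replayed after a crash. If a RegionServer fails, the HMaster reassigns its regions to other RegionServers, which recover the lost edits by replaying the WAL.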
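A RegionServer's write path appends each edit to the WAL before applying it to an in-memory MemStore, which is later flushed to a sorted, immutable HFile. The Python sketch below illustrates that path and WAL-based recovery in a single process. `ToyStore`, `flush_size`, and `crash_and_recover` are hypothetical names; in a real cluster the WAL lives on HDFS, and a different RegionServer replays it after a failure.

```python
import bisect


class ToyStore:
    """Hypothetical sketch of one RegionServer store's write and read path.

    Writes are appended to the WAL first, then applied to the MemStore.
    When the MemStore holds flush_size entries it is flushed to an
    immutable, sorted "HFile" (a sorted list here). Real HBase flushes
    by size in bytes (hbase.hregion.memstore.flush.size), not entry count.
    """

    def __init__(self, flush_size=3):
        self.flush_size = flush_size
        self.wal = []          # append-only log; survives a simulated crash
        self.memstore = {}     # in-memory buffer; lost on a crash
        self.hfiles = []       # sorted (key, value) lists, oldest first
        self.flushed_upto = 0  # WAL entries before this index are in HFiles

    def put(self, key, value):
        self.wal.append((key, value))  # 1. durable log first
        self.memstore[key] = value     # 2. then the in-memory buffer
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.flushed_upto = len(self.wal)

    def get(self, key):
        # Newest data wins: check the MemStore, then HFiles newest-first.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            i = bisect.bisect_left(hfile, (key,))  # binary search: file is sorted
            if i < len(hfile) and hfile[i][0] == key:
                return hfile[i][1]
        return None

    def crash_and_recover(self):
        """Lose the MemStore, then rebuild it from unflushed WAL entries."""
        self.memstore = {}
        for key, value in self.wal[self.flushed_upto:]:
            self.memstore[key] = value


store = ToyStore(flush_size=3)
for i in range(4):
    store.put(f"row{i}", f"v{i}")  # row0..row2 trigger a flush; row3 stays in memory
store.crash_and_recover()
print(len(store.hfiles), store.memstore)     # 1 {'row3': 'v3'}
print(store.get("row1"), store.get("row3"))  # v1 v3
```

The unflushed edit for `row3` survives the simulated crash only because it was logged to the WAL before being buffered in memory.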

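3. Region

A region is a contiguous, sorted range of a table's rows, defined by a start row key and an end row key. Regions are the unit that HBase distributes across RegionServers: a table starts as a single region, and a region is split in two automatically when it grows too large.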
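Because each region covers a row-key range, finding the region that owns a row amounts to a binary search over the regions' sorted start keys. The Python sketch below illustrates that lookup and a split. `ToyRegionMap` is a hypothetical class; in HBase, splits are usually triggered automatically when a region grows too large rather than by a call like `split()` here.

```python
import bisect


class ToyRegionMap:
    """Hypothetical sketch of how a table's row-key space is divided into
    regions. Region i covers [start_keys[i], start_keys[i + 1]); the first
    region starts at the empty key and the last one runs to the end."""

    def __init__(self):
        self.start_keys = [""]  # a new table starts as one region

    def region_for(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def split(self, split_key):
        """Split the region containing split_key into two at that key."""
        if split_key in self.start_keys:
            raise ValueError(f"a region already starts at {split_key!r}")
        bisect.insort(self.start_keys, split_key)

    def ranges(self):
        ends = self.start_keys[1:] + [None]  # None means "end of table"
        return list(zip(self.start_keys, ends))


regions = ToyRegionMap()
regions.split("m")  # e.g. the single region grew too large
regions.split("f")
print(regions.ranges())  # [('', 'f'), ('f', 'm'), ('m', None)]
print(regions.region_for("apple"), regions.region_for("f"), regions.region_for("zebra"))  # 0 1 2
```

A region's start key is inclusive and its end key exclusive, so the row `"f"` belongs to the second region, not the first.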

4. ZooKeeper

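ZooKeeper is an external, highly available coordination service. HBase uses ZooKeeper to track which RegionServers are alive, to elect the active HMaster from among the backup masters, and to store the location of the `hbase:meta` table, which clients use to find the RegionServer that hosts a given row.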

Without ZooKeeper, the cluster cannot coordinate properly: for example, failed RegionServers would go undetected, and no active HMaster could be elected.
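HBase detects failed RegionServers through ZooKeeper sessions: each RegionServer registers an ephemeral znode that disappears when its session expires, and the HMaster is notified through a watch. The Python sketch below imitates that pattern with heartbeats and a timeout. `ToyCoordinator` is hypothetical and does not use the real ZooKeeper client API.

```python
class ToyCoordinator:
    """Hypothetical stand-in for ZooKeeper's role in HBase: each RegionServer
    keeps a session alive with heartbeats; when a session times out, its
    registration (like an ephemeral znode) is removed and watchers are told."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # server -> time of last heartbeat
        self.watchers = []        # callbacks, e.g. the HMaster

    def register(self, server, now):
        self.last_heartbeat[server] = now

    def heartbeat(self, server, now):
        if server in self.last_heartbeat:
            self.last_heartbeat[server] = now

    def watch(self, callback):
        self.watchers.append(callback)

    def expire_sessions(self, now):
        """Remove servers whose heartbeats are too old and notify watchers."""
        dead = sorted(s for s, t in self.last_heartbeat.items()
                      if now - t > self.session_timeout)
        for server in dead:
            del self.last_heartbeat[server]
            for callback in self.watchers:
                callback(server)
        return dead

    def live_servers(self):
        return sorted(self.last_heartbeat)


zk = ToyCoordinator(session_timeout=10)
failed = []
zk.watch(failed.append)       # e.g. the HMaster watching for failures
zk.register("rs1", now=0)
zk.register("rs2", now=0)
zk.heartbeat("rs1", now=8)    # rs2 stops sending heartbeats
print(zk.expire_sessions(now=15))  # ['rs2']
print(zk.live_servers(), failed)   # ['rs1'] ['rs2']
```

In the real system, the watcher would be the HMaster, which would then reassign the failed server's regions.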

5. HDFS (Hadoop Distributed File System)

HBase uses HDFS to store its data on disk. HDFS holds the HFiles, which contain the table data (optionally compressed), as well as the WAL files used for durability. Because HDFS replicates each block across multiple DataNodes, the data remains available even if individual disks or machines fail.
