Create and Configure Azure HDInsight

Last Updated : 23 Jul, 2025

In our chapter about PolyBase, we presented this SQL Server 2016 feature for querying CSV files stored in Azure Storage accounts. We mentioned that PolyBase also lets you query data in Hadoop (HDInsight) from SQL Server. HDInsight is a popular Azure service that you will eventually need to interact with if you use SQL Server, so this article explains it for newcomers.

What is Hadoop?

Hadoop is built on an extremely scalable distributed file system (HDFS) used for handling big data. There are many scenarios where a traditional database such as SQL Server or Oracle is not the optimal way to store data. For instance, storing all the images and videos of a service like YouTube or Facebook in a traditional database would be very expensive. Hadoop was invented for this kind of workload: it can handle petabytes of data by spreading it across many computers. With Hadoop, you can manage both SQL and NoSQL data and distribute it across several servers.
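To make the idea of distributing data across servers more concrete, here is a minimal sketch of how a large file could be cut into blocks and copied to several nodes. The 128 MB block size and replication factor of 3 are the usual HDFS defaults; the round-robin placement and the node names are simplifying assumptions for illustration only, since real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE_MB = 128  # usual HDFS default block size
REPLICATION = 3      # usual HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [node, ...]} using a simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Each replica of a block lands on a different node.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


# Hypothetical 500 MB file spread over four hypothetical nodes.
layout = place_blocks(file_size_mb=500, nodes=["node1", "node2", "node3", "node4"])
for block, replicas in layout.items():
    print(f"block {block}: {replicas}")
```

Because every block exists on more than one node, losing a single server does not lose data, and several servers can read different blocks of the same file in parallel.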

What is HDInsight?

Understanding Of Primary Terminologies

Configuring Azure HDInsight: A Step-By-Step Guide

**Step 1:** We will learn how to create a Hadoop cluster, upload a CSV file, and query the file using Hive (a SQL-like query language in Hadoop).

Creating A Linux Cluster

**Step 2:** Cluster configuration

Cluster Configurations

Choose HBase for the cluster configuration:

Cluster Type

**Step 3:** In the credentials section, specify a login to access and administer the cluster, and a second account to connect over SSH.

Creating HDInsight Cluster
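The two accounts from this step are used at two different endpoints: the cluster login is for the web dashboard, and the SSH account is for the SSH endpoint. The sketch below builds both addresses from a cluster name, following the usual `azurehdinsight.net` naming pattern. The cluster name `mycluster` and the user `sshuser` are hypothetical placeholders.

```python
def cluster_endpoints(cluster_name, ssh_user):
    """Build the web dashboard URL and the SSH command for an HDInsight cluster."""
    return {
        # Web dashboard, opened with the cluster login (admin) account.
        "web_ui": f"https://{cluster_name}.azurehdinsight.net",
        # SSH endpoint, opened with the separate SSH account.
        "ssh": f"ssh {ssh_user}@{cluster_name}-ssh.azurehdinsight.net",
    }


endpoints = cluster_endpoints("mycluster", "sshuser")
print(endpoints["web_ui"])
print(endpoints["ssh"])
```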

**Step 4:** HDInsight data is stored in an Azure Storage account, inside a container. A container is similar to a folder for storing information in Azure. You can also specify the location where the data is stored. Usually, you should choose a location close to you.

Azure Accounts, containers and location
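Files in the storage account and container are typically addressed from the cluster with a `wasbs://` URI that combines both names. The following is a small sketch of how such a URI is assembled; the account name `mystorage`, the container name `mycontainer`, and the file path are hypothetical.

```python
def wasbs_uri(account, container, path=""):
    """Build a wasbs:// URI for a path inside an Azure Storage blob container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


uri = wasbs_uri("mystorage", "mycontainer", "/hive/customers.csv")
print(uri)
```

Note that the container comes before the `@` and the storage account after it, which is easy to get backwards.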

**Step 5:** Then press View all to see all the different options:

Configuring And Pricing HDInsight Cluster

**Step 6:** Select an existing resource group or create a new one. Resource groups are used to group resources and make administration easier. Then press Create:

Resource Group

**Step 7:** Log in with the credentials you created and press Log In:

Authentication With Credentials

**Step 8:** The dashboard shows hardware information such as disk usage, node uptime, number of live nodes, memory, and network usage:

Hardware Metrics And Dashboard
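The same kind of information can usually be read programmatically, because HDInsight clusters expose the Ambari REST API behind the cluster login. The sketch below only prepares a request for the list of cluster hosts and does not send it. The cluster name, user, and password are hypothetical placeholders, and the exact fields returned depend on the cluster.

```python
import base64
import urllib.request


def ambari_hosts_request(cluster_name, user, password):
    """Prepare (but do not send) a GET request for the cluster's host list."""
    url = f"https://{cluster_name}.azurehdinsight.net/api/v1/clusters/{cluster_name}/hosts"
    # Ambari uses HTTP basic authentication with the cluster login account.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


req = ambari_hosts_request("mycluster", "admin", "example-password")
print(req.get_method(), req.full_url)
# To actually send it against a real cluster: urllib.request.urlopen(req).read()
```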

**Step 9:** Go to the Query tab and click default. The table created by default is the Hive sample table:

Sample Query Table

**Step 10:** You can query the customers CSV file using the following query:

Query Custom Csv File
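The exact query from the screenshot is not reproduced in the text, so the following is only a sketch of what querying a CSV file with Hive usually looks like: an external table is defined over the folder that holds the file, and then queried like any other table. The table name, the column list (`id`, `name`, `city`), and the storage location are all hypothetical; adjust them to the real layout of your customers file.

```python
def customers_table_ddl(table, location):
    """Build a HiveQL statement that exposes a comma-separated file as a table."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  id INT,\n"        # hypothetical columns; match them to the CSV
        "  name STRING,\n"
        "  city STRING\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


def select_all(table, limit=10):
    """Build a HiveQL query that returns the first rows of the table."""
    return f"SELECT * FROM {table} LIMIT {limit};"


ddl = customers_table_ddl(
    "customers",
    "wasbs://mycontainer@mystorage.blob.core.windows.net/hive/customers/",
)
query = select_all("customers")
print(ddl)
print(query)
```

The printed statements can be pasted into the Hive query editor. An external table leaves the CSV file in place, so dropping the table later does not delete the data.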

**Step 11:** In the results, you see the values of the CSV file as if it were a table:

Results displayed

**Step 12:** Once Microsoft Azure Storage Explorer (MASE) is installed, connect to the Azure Storage account and blob container created in Step 4 and go to the hive folder:

folders in HDInsight

Conclusion

Azure HDInsight is a cloud service that lets organizations work with big data in a fully managed environment for Apache Hadoop and Spark clusters. By understanding its features, configuration process, supported cluster types, and data storage options, users can apply Azure HDInsight to gain insights from their data analytics work.