structured data

What is structured data?

Structured data is data that has been organized into a formatted repository, typically a database, so that its elements are addressable for more effective processing and analysis. Each data element resides in a fixed field within a record or file.

Structured data contrasts with unstructured and semi-structured data. The three types of data exist on a continuum, with unstructured data being the least formatted and structured data being the most formatted. The more structured a set of data is, the more amenable to processing and analysis it is.

How does structured data work?

Structured data needs a data model and a data repository, which is usually a database. A data model organizes elements of data and defines how they relate to one another. For example, a data model might specify that the data element representing a customer in a database contain several smaller elements, or attributes, that each hold specific customer information. In this scenario, examples of structured data include a person's name, phone number, address and ZIP code.
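As a rough illustration, the customer data model described above can be sketched as a Python dataclass. The class name `Customer`, its field names and the sample values are hypothetical, chosen only to mirror the attributes listed in the text; a real data model would usually live in a database schema rather than in application code.

```python
from dataclasses import dataclass, asdict

# Hypothetical data model: one customer element made of smaller attributes.
@dataclass
class Customer:
    name: str
    phone: str
    address: str
    zip_code: str  # stored as text so leading zeros survive

customer = Customer(
    name="Jane Doe",
    phone="555-0100",
    address="12 Main St",
    zip_code="02110",
)

# Each attribute is individually addressable by name.
print(customer.zip_code)  # 02110
print(asdict(customer))   # the whole record as a field -> value mapping
```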

The structure of the data can also be enforced. For example, the ZIP code field might accept only numeric data that is exactly five characters long. This maintains the integrity of the data by preventing values that don't fit this description from being entered into the database. By its nature, structured data can be logically grouped by similar values and constraints.
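One way to enforce the ZIP code rule above is a `CHECK` constraint in the table definition. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table name, column names and the helper `try_insert` are hypothetical. The constraint uses SQLite's `GLOB` operator to reject any value containing a non-digit character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint enforces the structure: exactly five digit characters.
conn.execute(
    """
    CREATE TABLE customer (
        name     TEXT NOT NULL,
        zip_code TEXT NOT NULL
            CHECK (length(zip_code) = 5 AND zip_code NOT GLOB '*[^0-9]*')
    )
    """
)

def try_insert(name, zip_code):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?)", (name, zip_code))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Jane Doe", "02110"))  # True
print(try_insert("John Roe", "2110A"))  # False: contains a letter
print(try_insert("Ann Poe", "021101"))  # False: six characters
```

Storing the ZIP code as text with a digits-only check, rather than as an integer, is a design choice that keeps leading zeros such as the one in `02110`.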

Data is defined narrowly by these constraints and written to defined slots in a data repository. For instance, in a database, each field in a record is discrete, and its information can be retrieved either on its own or along with data from other fields, in various combinations.
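To make the idea of discrete, individually retrievable fields concrete, here is a small sketch using `sqlite3` with an in-memory table. The table layout and sample rows are hypothetical; the point is that one query reads a single field while another combines two fields and filters on a third.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, phone TEXT, city TEXT, zip_code TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Jane Doe", "555-0100", "Boston", "02110"),
        ("John Roe", "555-0199", "Denver", "80202"),
    ],
)

# A single field on its own.
names = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY name")]

# Two fields combined, filtered on a third field.
contact = conn.execute(
    "SELECT name, phone FROM customer WHERE zip_code = ?", ("80202",)
).fetchall()

print(names)    # ['Jane Doe', 'John Roe']
print(contact)  # [('John Roe', '555-0199')]
```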

Databases make data easier to understand and use so that it yields useful information. Structured data is often stored in rows and columns in a relational database, which links tables of data together so they can be queried with a broader set of search criteria to return more detailed information.
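The table linking described above is typically done with a shared key column and a join. This is a minimal sketch under assumed names: the `customer` and `orders` tables, their columns and the sample values are all hypothetical. The `customer_id` column ties each order to a customer, so one query can return per-customer order totals that neither table holds on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Jane Doe'), (2, 'John Roe');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
    """
)

# The shared customer_id column links the tables, so a single query can
# combine them and aggregate order data per customer.
rows = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id), SUM(o.total)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Jane Doe', 2, 65.0), ('John Roe', 1, 15.5)]
```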

A database query language, such as structured query language (SQL), lets database administrators and other users interact with and manipulate the data in the database. Extract, transform and load (ETL) processes are sometimes used to integrate data from different structured databases into a data warehouse.
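The ETL pattern mentioned above can be sketched in a few lines. This is an illustrative toy, not a production pipeline: three in-memory `sqlite3` databases stand in for two source systems and a warehouse, and every table name (`contacts`, `accounts`, `dim_customer`), column name and the helper `normalize_zip` is hypothetical. SQL extracts the rows, Python transforms both source shapes into one schema, and the result is loaded into the warehouse table.

```python
import sqlite3

# Two hypothetical source databases with differently shaped customer tables.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE contacts (full_name TEXT, zip TEXT)")
crm.execute("INSERT INTO contacts VALUES ('Jane Doe', '2110')")  # leading zero lost upstream

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (first TEXT, last TEXT, postal_code INTEGER)")
billing.execute("INSERT INTO accounts VALUES ('John', 'Roe', 80202)")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (name TEXT, zip_code TEXT, source TEXT)")

# Extract: pull rows out of each source with SQL.
crm_rows = crm.execute("SELECT full_name, zip FROM contacts").fetchall()
billing_rows = billing.execute("SELECT first, last, postal_code FROM accounts").fetchall()

# Transform: map both shapes onto one schema and pad ZIP codes to five characters.
def normalize_zip(z):
    return str(z).zfill(5)

unified = [(name, normalize_zip(z), "crm") for name, z in crm_rows]
unified += [(f"{first} {last}", normalize_zip(z), "billing") for first, last, z in billing_rows]

# Load: write the unified rows into the warehouse table.
warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", unified)
warehouse.commit()

result = warehouse.execute("SELECT * FROM dim_customer ORDER BY name").fetchall()
print(result)  # [('Jane Doe', '02110', 'crm'), ('John Roe', '80202', 'billing')]
```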

Table showing relational vs. nonrelational databases

Relational databases typically handle structured data, while nonrelational databases are often used for unstructured data.

Benefits and drawbacks of structured data

There are many benefits to using structured data, including:

There are also drawbacks to structured data, such as:

Use cases for structured data

Some common examples of how structured data is used include:

Table showing structured vs. unstructured data

Key areas of differentiation between structured and unstructured data include the type of analysis, schema used, type of format and various storage criteria.

Structured vs. unstructured data

There are several key differences between unstructured and structured data. Whereas structured data is highly specific and conforms to a predefined data model, unstructured data does not. Unstructured data is generally stored in its native format. It is more plentiful and flexible than structured data, but it is harder to manipulate and often requires more advanced methods and data science techniques to do so.
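A toy comparison shows why unstructured data is harder to manipulate. In this sketch, the record, the social media post and the phone-number pattern are all hypothetical. Reading the phone number from a structured record is a direct field lookup, while getting it out of free text needs an extraction step; the regular expression here only recognizes one phone format and stands in for the more advanced techniques real unstructured data usually requires.

```python
import re

# Structured: the phone number sits in a named field and is read directly.
record = {"name": "Jane Doe", "phone": "555-0100", "zip_code": "02110"}
phone_from_record = record["phone"]

# Unstructured: the same fact is buried in free text and must be extracted.
post = "Loved the service! Call me back at 555-0100 before Friday, thanks. -Jane"
match = re.search(r"\b\d{3}-\d{4}\b", post)  # brittle: one format only
phone_from_text = match.group() if match else None

print(phone_from_record)  # 555-0100
print(phone_from_text)    # 555-0100
```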

Examples of unstructured data include social media posts; IoT remote sensor data; rich media, such as images, video and audio; and webpages. Unstructured data is often stored in data lakes.

When HTML microdata is used as an SEO tool, it adds structure to the otherwise unstructured data of webpages.
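To show how microdata adds structure, here is a simplified sketch that uses Python's standard `html.parser` module to pull named properties out of a page fragment. The fragment, which uses `itemscope`, `itemtype` and `itemprop` attributes with a schema.org `Person` type, is a made-up example, and the `MicrodataParser` class is a hypothetical helper. It is not a complete microdata implementation: it ignores nested items and properties whose values live in attributes.

```python
from html.parser import HTMLParser

# A hypothetical webpage fragment annotated with schema.org microdata.
PAGE = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="telephone">555-0100</span>
  <p>Some free-form biography text that carries no itemprop.</p>
</div>
"""

class MicrodataParser(HTMLParser):
    """Collect the text of elements that carry an itemprop attribute.

    Simplified: ignores nested itemscopes, markup nested inside a property,
    and meta/link elements whose values live in attributes.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop name of the open element, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.props[self._current] = data.strip()

parser = MicrodataParser()
parser.feed(PAGE)
parser.close()
print(parser.props)  # {'name': 'Jane Doe', 'telephone': '555-0100'}
```

The unannotated paragraph is skipped, which is the point: only the marked-up parts of the page become machine-readable fields.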

List of unstructured data types

Unstructured data can come from the web or other applications and does not need to fit into a predefined schema to be stored or used.

Data lake governance can be difficult because of the sheer amount of unstructured data stored.

This was last updated in January 2023
