Dataset Inspection and Visualization

Last Updated : 6 May, 2026

Dataset inspection and visualization are among the first steps in any data science project. They help you understand your data, spot patterns and identify issues such as missing values or mislabeled examples before you build models. Tools like the Hugging Face Dataset Viewer make this process faster and more interactive.

Hugging Face Dataset Viewer

The Hugging Face Dataset Viewer is built into the Hugging Face Hub and lets you explore datasets directly in the browser, without writing any code. It turns data inspection into a clean, interactive experience.
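The data the Viewer displays can also be reached from scripts through Hugging Face's public dataset viewer API. The sketch below only builds request URLs; it assumes the `splits`, `rows` and `search` endpoints on `https://datasets-server.huggingface.co`. The helper names, the dataset id `your-username/your-dataset` and the config name `default` are hypothetical, and only `fetch_json` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of Hugging Face's dataset viewer API
BASE_URL = "https://datasets-server.huggingface.co"


def viewer_api_url(endpoint: str, **params) -> str:
    """Build a request URL for a dataset viewer API endpoint (hypothetical helper)."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (this performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical dataset id used for illustration only
DATASET = "your-username/your-dataset"

# List the available splits, like the Viewer's split selector
splits_url = viewer_api_url("splits", dataset=DATASET)

# One "page" of ten rows starting at row 100, like the Viewer's pagination
rows_url = viewer_api_url(
    "rows", dataset=DATASET, config="default", split="train", offset=100, length=10
)

# Full-text search within a split, like the Viewer's search box
search_url = viewer_api_url(
    "search", dataset=DATASET, config="default", split="train", query="cat", offset=0, length=10
)
```

Calling `fetch_json(rows_url)` would download that page of rows as JSON.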

**Step 1: Open the Hugging Face Dataset Hub**

**Step 2: Search for a Dataset**

[Figure: Searching for a dataset]

**Step 3: Access the Dataset Viewer**

**Step 4: Explore Data in Table Format**

[Figure: Dataset]

**Step 5: Navigate Through Data**

[Figure: Pagination]

**Step 6: Use Search Functionality**

**Step 7: Inspect Different Data Types**

[Figure: Text data shown directly]

**Step 8: Check Dataset Splits**

Visualization of Hugging Face Datasets

The Dataset Viewer is designed for inspecting rows and columns. To explore patterns, clusters and trends visually, you can load a Hugging Face dataset into a dedicated tool such as Renumics Spotlight.

**Step 1: Install Required Libraries**

Run the following command in your terminal:

```bash
pip install datasets renumics-spotlight transformers torch
```
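Before moving on, a small standard-library check can confirm that the packages are installed in the active environment. `installed_versions` is a hypothetical helper; it looks up installed distribution versions with `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ["datasets", "renumics-spotlight", "transformers", "torch"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```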

**Step 2: Import Required Libraries**

Import the libraries needed to load the dataset, run the pre-trained model and launch Spotlight.

```python
import torch
from datasets import load_dataset
from renumics import spotlight
from transformers import ViTForImageClassification, ViTImageProcessor, ViTModel
```

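**Step 3: Load a Dataset from Hugging Face**

Load the first 500 images of the CIFAR-100 test split. The `test[:500]` slice keeps the example quick to run.

```python
# Each row holds a PIL image ("img") plus its fine and coarse class labels
ds = load_dataset("cifar100", split="test[:500]")
```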

**Step 4: Add Model Predictions**

Run a pre-trained ViT classifier for CIFAR-100 over every image and store the predicted class index in a new `prediction` column.

```python
model_name = "Ahmed9275/Vit-Cifar100"

# The processor resizes and normalizes images the way the model expects
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(model_name)


def add_predictions(example):
    image = example["img"].convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)

    # The class with the highest logit is the predicted label index
    example["prediction"] = outputs.logits.argmax(dim=-1).item()
    return example


ds = ds.map(add_predictions)
```

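**Step 5: Add Embeddings**

Extract a feature vector (embedding) for each image from the same checkpoint. The final-layer hidden state of the [CLS] token serves as the image embedding.

```python
# Load the ViT backbone from the same checkpoint, without the classification head
feature_model = ViTModel.from_pretrained(model_name)


def add_embedding(example):
    image = example["img"].convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = feature_model(**inputs)

    # Token 0 is [CLS]; drop the batch dimension and convert to NumPy
    example["embedding"] = outputs.last_hidden_state[:, 0].squeeze().numpy()
    return example


ds = ds.map(add_embedding)
```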
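Embeddings are useful because similar images end up with similar vectors. As a quick sanity check outside Spotlight, here is a sketch of cosine-similarity nearest neighbours in NumPy. It assumes the embeddings have been stacked into a 2-D array (for example `np.array(ds["embedding"])`); `cosine_neighbours` is a hypothetical helper and the toy vectors are made up.

```python
import numpy as np


def cosine_neighbours(embeddings: np.ndarray, query_index: int, k: int = 3) -> np.ndarray:
    """Return indices of the k rows most similar to row `query_index`, excluding itself."""
    # L2-normalize so that a dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[query_index]
    scores[query_index] = -np.inf  # never return the query itself
    return np.argsort(-scores)[:k]


# Toy 2-D "embeddings"; real model embeddings are much higher-dimensional
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(cosine_neighbours(emb, query_index=0, k=1))  # [1]
```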

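**Step 6: Launch Spotlight**

Open the dataset, including its predictions and embeddings, in Spotlight. The `dtype` argument marks the `embedding` column as an embedding, so Spotlight treats it as a vector rather than a plain list of numbers.

```python
spotlight.show(ds, dtype={"embedding": spotlight.Embedding})
```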
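To draw embeddings on a screen they have to be reduced to two dimensions. A rough, static preview can be made with principal component analysis (PCA) in NumPy; this is a swapped-in technique for illustration and not necessarily the projection Spotlight itself uses. The clustered data below is made up, and `pca_2d` is a hypothetical helper.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Two made-up clusters of 16-dimensional vectors standing in for image embeddings
emb = np.vstack([rng.normal(0, 1, (50, 16)), rng.normal(5, 1, (50, 16))])
coords = pca_2d(emb)
print(coords.shape)  # (100, 2)
```

The two clusters should land on opposite sides of the first component, which is the kind of structure a similarity map makes visible.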

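**Output:** `spotlight.show()` starts a local Spotlight app and opens it in the browser, where you can browse the images alongside their labels, predictions and embeddings.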

If your dataset contains numerical or tabular data, summary statistics and simple plots (distributions, correlations, category counts) are often enough to reveal patterns, relationships and the overall behavior of the data.
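A sketch with pandas and matplotlib on a small made-up table; for a Hugging Face dataset, `ds.to_pandas()` gives an equivalent DataFrame. The column names and the output file name `overview.png` are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up tabular data standing in for a numerical dataset
df = pd.DataFrame(
    {
        "age": [23, 35, 31, 52, 46, 29, 61, 38],
        "income": [28, 52, 47, 80, 71, 40, 95, 58],  # in thousands
        "segment": ["a", "b", "b", "c", "c", "a", "c", "b"],
    }
)

print(df.describe())                 # count, mean, std, min, quartiles, max
print(df["segment"].value_counts())  # balance of a categorical column
print(df[["age", "income"]].corr())  # pairwise Pearson correlation

# Distribution of one column and relationship between two columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(df["age"], bins=5)
ax1.set_title("age distribution")
ax2.scatter(df["age"], df["income"])
ax2.set_xlabel("age")
ax2.set_ylabel("income")
fig.tight_layout()
fig.savefig("overview.png")
```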

Advantages

Limitations