Tall Arrays and mapreduce

Analyze big data sets in parallel using MATLAB® tall arrays and datastores, or mapreduce, on parallel pools and on Spark™ and Hadoop® clusters

You can use Parallel Computing Toolbox™ to evaluate tall-array expressions in parallel using a parallel pool on your desktop. Tall arrays let you run big data applications on data sets that do not fit in your machine's memory. You can also use Parallel Computing Toolbox to scale up tall-array processing by connecting to a parallel pool running on a MATLAB Parallel Server™ cluster. Alternatively, you can use a Spark-enabled Hadoop cluster running MATLAB Parallel Server. For more information, see Big Data Workflow Using Tall Arrays and Datastores.
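
As a minimal sketch of the desktop workflow described above, the following code evaluates a tall-array expression on a parallel pool. It assumes the airlinesmall.csv sample file that ships with MATLAB and its ArrDelay variable; substitute your own data location and variable names.

```matlab
% Reuse the current parallel pool, or start one with the default profile.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% Direct tall-array evaluation to the pool.
mapreducer(pool);

% Create a datastore for the sample file (assumed data set) and
% keep only the arrival-delay column.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Operations on a tall array are deferred until you call gather.
tt = tall(ds);
meanDelay = mean(tt.ArrDelay, 'omitnan');

% Trigger the evaluation on the pool and bring the result into memory.
meanDelay = gather(meanDelay);
```

To scale the same code up, you would typically change only the pool or cluster passed to mapreducer, not the tall-array expressions themselves.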

Functions

Key Functions

tall	Create tall array
datastore	Create datastore for large collections of data
mapreduce	Programming technique for analyzing data sets that do not fit in memory
mapreducer	Define parallel execution environment for mapreduce and tall arrays
partition	Partition a datastore
numpartitions	Number of datastore partitions
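
The following sketch illustrates how datastore and mapreduce from the table above fit together. It assumes the airlinesmall.csv sample file that ships with MATLAB; the function names maxDelayMapper and maxDelayReducer and the key 'MaxArrDelay' are hypothetical and chosen only for illustration. In releases before R2024a, local functions in a script must appear at the end of the file.

```matlab
% Datastore over the sample file (assumed data set), one column only.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Run the map and reduce functions defined below; the output is a datastore.
outds = mapreduce(ds, @maxDelayMapper, @maxDelayReducer);
result = readall(outds);

% Map step: emit the maximum arrival delay found in each chunk of data.
function maxDelayMapper(data, ~, intermKVStore)
    add(intermKVStore, 'MaxArrDelay', max(data.ArrDelay));
end

% Reduce step: combine the per-chunk maxima into a single overall maximum.
function maxDelayReducer(key, intermValIter, outKVStore)
    overallMax = -Inf;
    while hasnext(intermValIter)
        overallMax = max(overallMax, getnext(intermValIter));
    end
    add(outKVStore, key, overallMax);
end
```

If a parallel pool is open and set as the execution environment with mapreducer, the same call to mapreduce runs on the pool workers.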

Classes

Key Classes

parallel.Pool	Parallel pool of workers
parallel.cluster.Hadoop	Hadoop cluster for mapreducer, mapreduce and tall arrays
parallel.cluster.Spark	Spark cluster for mapreducer, mapreduce and tall arrays (Since R2022b)

Examples and How To

Big Data Workflow Using Tall Arrays and Datastores
Learn about typical workflows using tall arrays to analyze big data sets.
Use Tall Arrays on a Parallel Pool
Discover tall arrays in Parallel Computing Toolbox and MATLAB Parallel Server.
Process Big Data in the Cloud
This example shows how to access a large data set in the cloud and process it in a cloud cluster using MATLAB® capabilities for big data.
Use Parallel Computing to Optimize Big Data Set for Analysis
This example shows how to optimize data preprocessing for analysis using parallel computing. (Since R2024a)
Use Tall Arrays on a Spark Cluster
Create and use tall tables on Spark clusters without changing your MATLAB code.
Run mapreduce on a Parallel Pool
Try mapreduce for advanced analysis of big data using Parallel Computing Toolbox.
Run mapreduce on a Hadoop Cluster
Learn about mapreduce for advanced big data analysis on a Hadoop cluster.
Partition a Datastore in Parallel
Use partition to split your datastore into smaller parts.
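
As one possible illustration of partitioning a datastore in parallel, the sketch below splits a datastore across the workers of a parallel pool and computes a mean in a parfor-loop. It assumes the airlinesmall.csv sample file that ships with MATLAB and its ArrDelay variable; the variable names are illustrative.

```matlab
% Datastore over the sample file (assumed data set), one column only.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Ask for a number of partitions suited to the current parallel pool.
pool = gcp;
n = numpartitions(ds, pool);

% Accumulate a running sum and count across partitions.
total = 0;
count = 0;
parfor ii = 1:n
    % Each iteration reads only its own partition of the datastore.
    subds = partition(ds, n, ii);
    while hasdata(subds)
        t = read(subds);
        delays = t.ArrDelay(~isnan(t.ArrDelay));
        total = total + sum(delays);
        count = count + numel(delays);
    end
end

meanDelay = total / count;
```

Here total and count act as parfor reduction variables, so each worker contributes partial sums that MATLAB combines after the loop.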

Concepts

Run Code on Parallel Pools
Learn about starting and stopping parallel pools, pool size, and cluster selection.

Featured Examples

Process Big Data in the Cloud

Access a large data set in the cloud and process it in a cloud cluster using MATLAB® capabilities for big data.

Use Parallel Computing to Optimize Big Data Set for Analysis

Optimize data preprocessing for analysis using parallel computing. (Since R2024a)