Export Data to Parquet File Using Record Block - MATLAB & Simulink

When you log data using a Record block, you can choose to log data to the workspace, a file, or both. One file type that you can record data to is a Parquet file. Parquet is an open-source file format with efficient compression and encoding of column-oriented data that is often used when processing big data.

Use the Record block to log scalar and multidimensional data from a signal, message, bus, or array of buses to a Parquet file. You can log real and complex data of any built-in data type or user-defined data types such as buses, enumerations, and fixed-point data. The Record block can also log variable-size signals, but it does not support visualizing them or logging them to a Parquet file.

To log data to a Parquet file, double-click the Record block. Click the Record To File button arrow. From this menu, set the file type to *.parquet. By default, the filename is recording.parquet, and the file is saved in the local folder. You can also change other options specific to Parquet files, such as the compression style, time column configuration, and row group options.

The Record To File menu expanded in the Record block.

You can also configure the Record block to log data to a Parquet file programmatically using the set_param function. First, set the Record block to record to a file. Then, specify the name of the Parquet file.

set_param("MyModel/Record","RecordToFile","on","FileName","myRecording.parquet")

Once the Record block is configured to record data to a Parquet file, you can also use the set_param function to set other Parquet file options, such as the compression style.

set_param("MyModel/Record","ParquetCompression","compact")
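The programmatic steps can be collected into one setup script. The following is a minimal sketch, assuming a model named MyModel with a Record block at MyModel/Record is on the MATLAB path and that the file is written to the current folder. It uses only the parameter names and values that appear in this topic.

```matlab
% Assumes a model named MyModel that contains a Record block named Record.
mdl = "MyModel";
blk = mdl + "/Record";
load_system(mdl);

% Parameter names and values are the ones used elsewhere in this topic.
set_param(blk,"RecordToFile","on","FileName","myRecording.parquet");
set_param(blk,"ParquetCompression","compact");

% Run the simulation. The file is written when the simulation stops.
sim(mdl);

% Assumption: the recording is saved in the current folder.
assert(isfile("myRecording.parquet"),"Parquet file was not created.")
```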

You can log data to a Parquet file using normal, accelerator, and rapid accelerator mode simulations. All signals connected to the Record block are written to the Parquet file when the simulation is paused or stopped. By default, when you record data to a Parquet file, the Record block stores all logged metadata associated with the Simulink® simulation in a JSON sidecar. The sidecar contains information such as the block path, port index, and sample time.
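After the simulation stops, you can inspect the recorded file with the parquetinfo and parquetread functions. The sketch below writes a small stand-in file with parquetwrite so that it runs without a model. The column names time and sine_data are hypothetical and may differ from the names that the Record block produces.

```matlab
% Stand-in file so the sketch runs without a model. The column names
% "time" and "sine_data" are hypothetical, not from a real recording.
file = fullfile(tempdir,"recordDemo.parquet");
time = (0:0.1:0.5)';
sine_data = sin(time);
parquetwrite(file,table(time,sine_data));

% Inspect the file layout without loading the data.
info = parquetinfo(file);
disp(info.VariableNames)
disp(info.VariableTypes)
disp(info.VariableCompression)

% Load every column, or only the columns you need.
T = parquetread(file);
S = parquetread(file,"SelectedVariableNames","sine_data");
```

To inspect a real recording, replace the stand-in path with the name of the file that the Record block wrote, for example myRecording.parquet.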

The Record block supports logging some data types that are not supported by a Parquet file. Most data types, such as double, int, or string, do not change when the Record block saves data to a Parquet file. This table shows the data types supported by the Record block and how that data is represented when saved to a Parquet file.

Simulink Data Type | Parquet File Logical Data Type
double | double
single | single
int8 | int8
int16 | int16
int32 | int32
int64 | int64
uint8 | uint8
uint16 | uint16
uint32 | uint32
uint64 | uint64
string | string
Boolean | Boolean
half | double
fixed point | double (fixed-point data is stored in the JSON sidecar)
enum | int32
image | Data type of underlying image data
datetime | double representation of epoch time

For more information about Parquet file data types, see Apache Parquet Data Type Mappings.

How the Record block formats data in the Parquet file depends on the type of signal being recorded. This table shows how each type of Simulink signal is recorded in the Parquet file.

Simulink Signal Type | Parquet File Logging Format
Scalar signal | Single column with a scalar value at each time step
Scalar signal with complex data | Single column with a 1-by-2 vector representing the real and imaginary parts of the complex value at each time step
Nonscalar signal | Depends on how the signal is represented in the Record block. Single signal with multidimensional sample points: single column with sample values in the form of a vector, list of column vectors, or a nested list of column vectors for each time step. Set of channels: separate columns for each channel, each containing scalar values for each time step.
Nonscalar signal with complex data | Depends on how the signal is represented in the Record block. Single signal with multidimensional sample points: single column containing 1-by-2 vectors representing the real and imaginary parts of each sample value, nested in a vector, list of column vectors, or a nested list of column vectors at each time step. Set of channels: separate columns for each channel, each containing a 1-by-2 vector representing the real and imaginary parts of the complex value at each time step.
Virtual or nonvirtual bus | Separate columns for each element in the bus or bus hierarchy
Array of buses | Separate columns for each element in the array of buses

Single-Rate and Multirate Data

You can choose to save data to a Parquet file using shared or individual time columns. When you save single-rate data with a shared time column, the first column in the file contains time data, followed by columns containing signal data. The Record block logs data to a Parquet file using shared time columns by default.

A model that logs two signals to a Record block, with a Parquet file that contains one time column followed by two columns of signal data.

When you save multirate data using shared time columns, data is grouped by shared time vectors. Time columns specify the sample times for signals to the right, up to the next time vector.

A model that logs five signals to a Record block. Three signals have a sample time of 0.5, while the other two have a sample time of 0.1. In the Parquet file, columns for the three signals with a 0.5 sample time follow the time column with time steps of 0.5. Then, columns for the two signals with a 0.1 sample time follow a separate time column with time steps of 0.1.

A Parquet file requires that all columns be of equal length. When you record signals that are not of equal length to the same Parquet file, the Record block fills the empty cells in the shorter columns with NULL.
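When you read a multirate file back into MATLAB, you typically need to drop the padding before you analyze the slower signals. The sketch below builds a stand-in file with two time groups and assumes that NULL entries in double columns are read back as NaN. All column names are hypothetical.

```matlab
% Stand-in multirate data: one signal at 0.5 s and one at 0.1 s.
tSlow = (0:0.5:1)';   slow = 2*tSlow;    % 3 samples
tFast = (0:0.1:1)';   fast = tFast.^2;   % 11 samples

% Pad the shorter columns so that every column has the same length.
n = numel(tFast);
pad = @(x) [x; NaN(n - numel(x),1)];
T = table(pad(tSlow),pad(slow),tFast,fast, ...
    'VariableNames',["time_slow","slow_data","time_fast","fast_data"]);

file = fullfile(tempdir,"multirateDemo.parquet");
parquetwrite(file,T);

% Read the file and strip the padded rows from the slow group.
R = parquetread(file);
valid = ~isnan(R.time_slow);
slowTT = timetable(seconds(R.time_slow(valid)),R.slow_data(valid), ...
    'VariableNames',"slow");
fastTT = timetable(seconds(R.time_fast),R.fast_data, ...
    'VariableNames',"fast");
```

Keeping each rate in its own timetable avoids carrying the padding into later calculations such as mean or interpolation.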

To save data to a Parquet file using individual time columns, select > > . Alternatively, you can configure the Record block to log data to the Parquet file using individual time columns programmatically using the Time parameter. To specify how to log time data, first set the block to record data to a Parquet file.

set_param("MyModel/Record","RecordToFile","on","FileName","myRecording.parquet","SharedTimeColumn","off")

When you save data using individual time columns, the software saves data in pairs of time and signal data columns.

A model that logs two signals to a Record block, with the Parquet file using separate time columns for each signal data column.

Complex Signals

The Record block exports complex sample values to a Parquet file as a 1×2 vector, where the first element is the real part and the second element is the imaginary part of the complex value.

A model that logs complex data to a Record block, with the Parquet file storing real and imaginary parts as a two-element vector. For example, at time 0.2, the signal value 0.3973 + 0.5960i is saved as [0.3973, 0.5960].
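To work with the recorded values as complex numbers again, you can recombine each pair. This is a minimal sketch that assumes the complex column arrives in MATLAB as a cell array with one two-element vector per time step. The variable pairs is a hand-built stand-in. Only the last pair comes from the example above, and the other values are illustrative.

```matlab
% Stand-in for a complex signal column: one [real, imaginary] pair per
% time step. Only the last pair is taken from the example in this topic.
pairs = {[0; 1]; [0.25; 0.75]; [0.3973; 0.5960]};

% Recombine each pair. Indexing with p(1) and p(2) works for both row
% and column vectors.
z = cellfun(@(p) complex(p(1),p(2)),pairs);

disp(z)
```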

Multidimensional Signal Data

Multidimensional signal data with fixed dimensions can be represented in the Record block in two ways: as a single signal with multidimensional sample points, or as a set of channels.

When multidimensional signal data is represented as a single signal with multidimensional sample points, the data for each time step is stored in the Parquet file as vectors for one-dimensional arrays, a list of column vectors for two-dimensional arrays, or as a nested list of column vectors for arrays with more than two dimensions.

A model logs a 2 by 3 matrix using a Record block, saving data to a Parquet file with two columns: time and signal data grouped in three 1 by 2 column-wise vectors of the 2 by 3 matrix signal at each sample time.
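To rebuild matrix samples from this layout, you can concatenate the column vectors of each time step. The sketch below assumes that each sample is available in MATLAB as a cell array of column vectors, one cell per matrix column. The variable samples is a hand-built stand-in with illustrative values, and the actual shape returned when reading the file may differ.

```matlab
% Stand-in for two time steps of a 2-by-3 matrix signal. Each sample is
% a list of three column vectors, one per matrix column.
samples = { {[1;2],[3;4],[5;6]}; {[7;8],[9;10],[11;12]} };

% Force each vector to a column, then join the columns side by side.
toMatrix = @(s) cell2mat(cellfun(@(c) c(:),s(:)',"UniformOutput",false));
mats = cellfun(toMatrix,samples,"UniformOutput",false);

% Stack the samples along the third dimension: rows-by-columns-by-time.
A = cat(3,mats{:});

disp(A(:,:,1))
```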

By default, signals with samples that contain fewer than five elements are represented as channels. When multidimensional signal data is represented as channels in the Record block, the Parquet file allocates a separate column for each channel.

You can control how multidimensional signal data is saved to a Parquet file using the expand function, the collapse function, or the signal table in the Record block or the Simulation Data Inspector. When you change the representation of a multidimensional signal in the Record block, the change is reflected in the Simulation Data Inspector and vice versa. For more information, see Analyze Multidimensional Signal Data.

When you save multidimensional signals that contain complex data to a Parquet file, each sample element is a nested 1×2 vector, where the first element is the real part and the second element is the imaginary part of the complex value. For real values, the second element is 0.

A model logs a 2 by 3 matrix signal containing complex data using a Record block. In the saved Parquet file, there are two columns: time and signal data grouped into three 1 by 2 column-wise vectors of the 2 by 3 matrix signal at each sample time. Each element of the sample values is represented as a pair of real and imaginary components in the form [real, imaginary].

The Record block does not support saving variable-size signals to a Parquet file.

Buses

You can export data logged from virtual or nonvirtual buses to a Parquet file. In the Parquet file, dots in signal names specify the bus hierarchy.

A model containing a nested bus connected to a Record block. The associated Parquet file uses dot notation to specify the bus hierarchy. For example, the signal named sine is an element of nestedBus, which is an element of topBus. In the Parquet file, this signal is named topBus.nestedBus.sine_data.

You can also use the Record block to save arrays of buses to a Parquet file. The Record block saves each element in the array of buses as a column in the Parquet file.

A model that logs an array of two buses to a Record block. Each bus in the array of buses contains two signals named a and b. The Record block uses a combination of index and dot notation in the Parquet file. For example, the column for the signal named a in the first nonvirtual bus is labeled AOB(1).a_data.
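Because these column names contain dots and parentheses, they are not valid MATLAB identifiers, so you access them with dynamic names or select them by prefix. The sketch below uses an in-memory table as a stand-in for recorded data. The names topBus.nestedBus.sine_data and AOB(1).a_data come from the examples above, and the remaining names are assumed to follow the same pattern.

```matlab
% Stand-in table with bus-style column names. The AOB(1).b_data and
% AOB(2) names are assumed by analogy with the documented pattern.
names = ["topBus.nestedBus.sine_data", ...
         "AOB(1).a_data","AOB(1).b_data", ...
         "AOB(2).a_data","AOB(2).b_data"];
T = array2table(zeros(3,numel(names)),'VariableNames',names);

% Access one column by its full name.
sine = T.("topBus.nestedBus.sine_data");

% Select every column that belongs to one level of the hierarchy.
vars = string(T.Properties.VariableNames);
nestedCols = vars(startsWith(vars,"topBus.nestedBus."));
firstBus = T(:,startsWith(vars,"AOB(1)."));
```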

Enumerated Data

The Record block supports logging enumerated data. When you save enumerated data to a Parquet file, the Record block exports only the underlying integer data as int32 values.

For example, the MyColors class in this model defines a set of enumerated values consisting of six colors, each associated with an integer value between 0 and 5.

Logged enumerated data visualized in the Record block.

When you save the enumerated data to a Parquet file, only the underlying integer values associated with each enumerated value are saved in the file.

Model that records enumerated data. The Parquet file logs the underlying integer values associated with each enumerated value but does not log the enumerated name.
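Because the file stores only integers, you need the original mapping to recover the names after you read the data. This is a minimal sketch that uses a categorical array instead of the MyColors class so that it runs without the class definition. The color names and the sample values are hypothetical. Only the range 0 to 5 comes from the example above.

```matlab
% Stand-in for an int32 column read from the Parquet file.
vals = int32([0; 3; 5; 1]);

% Hypothetical names for the six enumerated values 0 through 5.
colorNames = ["Red","Orange","Yellow","Green","Blue","Purple"];

% Map each integer to its name.
labels = categorical(double(vals),0:5,colorNames);

disp(labels)
```

If the enumeration class is on the MATLAB path, converting the integers with the class itself is an alternative that keeps the mapping in one place.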

See Also

Record | Playback | parquetread | parquetinfo
