Preprocess Data for Domain-Specific Deep Learning Applications - MATLAB & Simulink

Data preprocessing is used for training, validation, and inference. Preprocessing consists of a series of deterministic operations that normalize or enhance desired data features. For example, you can normalize data to a fixed range or rescale data to the size required by the network input layer.
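As a minimal sketch of the two operations mentioned above, the following code resizes a synthetic image to an assumed 224-by-224 input size and then rescales it to the range [0, 1]. The image data, the variable names, and the target size are assumptions for illustration only.

```matlab
% Hypothetical input: a synthetic 300-by-400 RGB image with values in [0, 255].
rng(0);                                  % fix the seed so the run is repeatable
X = 255*rand(300,400,3);

% Assumed network input size (height, width); replace with your network's value.
inputSize = [224 224];

% Resize to the target size, then normalize to the fixed range [0, 1].
Xresized = imresize(X,inputSize);
Xnorm = rescale(Xresized,0,1);

disp(size(Xnorm))
```

Rescaling after resizing keeps the final values inside the target range, because interpolation during resizing can slightly overshoot the original minimum and maximum.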

Preprocessing can occur at two stages in the deep learning workflow.

Data augmentation consists of randomized operations that are applied to the training data while the network is training. Augmentation increases the effective amount of training data and helps to make the network invariant to common distortions in the data. For example, you can add artificial noise to training data so that the network is invariant to noise.

To augment training data, start by loading your data into a datastore. For more information, see Datastores for Deep Learning. Some built-in datastores apply a specific and limited set of augmentations to data for specific applications. You can also apply your own set of augmentation operations on data in the datastore by using the transform and combine functions. During training, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set.
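The datastore workflow described above can be sketched as follows. This example assumes an in-memory array of synthetic images wrapped in an arrayDatastore, and uses additive Gaussian noise as a stand-in augmentation. The data, the noise level, and the variable names are hypothetical.

```matlab
% Hypothetical training set: eight 32-by-32 grayscale images stored in memory.
X = rand(32,32,1,8);
ads = arrayDatastore(X,"IterationDimension",4);

% Assumed augmentation: additive Gaussian noise with standard deviation 0.05.
% Each read returns a cell array, so apply the operation to every cell.
addNoise = @(data) cellfun(@(img) img + 0.05*randn(size(img)), ...
    data,"UniformOutput",false);
augmentedDs = transform(ads,addNoise);

% Reading the same observation twice gives two different noisy versions.
first = read(augmentedDs);
reset(augmentedDs);
second = read(augmentedDs);
disp(isequal(first{1},second{1}))
```

Because the noise is drawn each time the datastore is read, every pass over the data sees a different perturbation of the same underlying observations. To pair the augmented predictors with responses, you could join this datastore with a second datastore by using combine.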

Image Processing Applications

Augment image data to simulate variations in the image acquisition. For example, the most common image augmentation operations are geometric transformations such as rotation and translation, which simulate variations in the camera orientation with respect to the scene. Color jitter simulates variations of lighting conditions and color in the scene. Artificial noise simulates distortions caused by the electrical fluctuations in the sensor and analog-to-digital conversion errors. Blur simulates an out-of-focus lens or movement of the camera with respect to the scene.

Common image preprocessing operations include noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and morphology.

If you have Image Processing Toolbox™, then you can process data using these operations as well as any other functionality in the toolbox. For an example that shows how to create and apply these transformations, see Augment Images for Deep Learning Workflows.

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Resize images | Resize images by a fixed scaling factor or to a target size | imresize, imresize3 (Image Processing Toolbox) | The original image is on the left. The resized image is on the right. |
| Warp images | Apply random reflection, rotation, scale, shear, and translation to images | randomAffine2d (Image Processing Toolbox), randomAffine3d (Image Processing Toolbox) | From left to right, the figure shows the original image, the reflected image, the rotated image, and the scaled image. |
| Crop images | Crop an image to a target size from the center or a random position | centerCropWindow2d (Image Processing Toolbox), centerCropWindow3d (Image Processing Toolbox); randomWindow2d (Image Processing Toolbox), randomCropWindow3d (Image Processing Toolbox) | The image cropped from the center is on the left. The image cropped from a random position is on the right. |
| Jitter color | Randomly adjust image hue, saturation, brightness, or contrast | jitterColorHSV (Image Processing Toolbox) | From left to right, the figure shows the original image with random adjustments to the image hue, saturation, brightness, and contrast. |
| Simulate noise | Add random Gaussian, Poisson, salt and pepper, or multiplicative noise | imnoise (Image Processing Toolbox) | The image with randomly added salt and pepper noise is on the left. The image with randomly added Gaussian noise is on the right. |
| Simulate blur | Add Gaussian or directional motion blur | imgaussfilt (Image Processing Toolbox), imgaussfilt3 (Image Processing Toolbox); imfilter (Image Processing Toolbox) | The image with a Gaussian blur is on the left. The image with a directional motion blur is on the right. |
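As an illustration of how several of these operations might be chained, the following sketch applies a random warp, color jitter, noise, and blur to a sample image. It assumes Image Processing Toolbox is installed. The parameter ranges are arbitrary values chosen for illustration, not recommended settings.

```matlab
I = imread("peppers.png");               % sample RGB image shipped with MATLAB

% Random geometric transformation: rotation in [-45, 45] degrees and a
% horizontal reflection applied with 50% probability.
tform = randomAffine2d("Rotation",[-45 45],"XReflection",true);
outputView = affineOutputView(size(I),tform);
Iwarped = imwarp(I,tform,"OutputView",outputView);

% Random color jitter (ranges are assumed values, chosen for illustration).
Ijittered = jitterColorHSV(Iwarped,"Saturation",[-0.2 0.2], ...
    "Brightness",[-0.3 0.1]);

% Artificial sensor noise and a slight out-of-focus blur.
Inoisy = imnoise(Ijittered,"gaussian");
Iblurred = imgaussfilt(Inoisy,1.5);
```

Using affineOutputView with its default settings keeps the warped image the same size as the input, which is convenient when the network expects a fixed input size.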

Object Detection

Object detection data consists of an image and bounding boxes that describe the location and characteristics of objects in the image.

If you have Computer Vision Toolbox™, then you can use the Image Labeler (Computer Vision Toolbox) and the Video Labeler (Computer Vision Toolbox) apps to interactively label ROIs and export the label data for training a neural network. If you have Automated Driving Toolbox™, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image, you must perform an identical transformation to the corresponding bounding boxes. If you have Computer Vision Toolbox, then you can process bounding box data using the operations in the table. For an example that shows how to create and apply these transformations, see Augment Bounding Boxes for Object Detection. For more information, see Get Started with Object Detection Using Deep Learning (Computer Vision Toolbox).

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Resize bounding boxes | Resize bounding boxes by a fixed scaling factor or to a target size | bboxresize (Computer Vision Toolbox) | The original image with bounding box is on the left. The resized image with resized bounding box is on the right. |
| Crop bounding boxes | Crop a bounding box to a target size from the center or a random position | bboxcrop (Computer Vision Toolbox) | The original image with bounding box is on the left. The cropped image and cropped bounding box are on the right. |
| Warp bounding boxes | Apply reflection, rotation, scale, shear, and translation to bounding boxes | bboxwarp (Computer Vision Toolbox) | From left to right, the figure shows the original image and bounding box, the reflected image and bounding box, the rotated image and bounding box, and the scaled image and bounding box. |
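The requirement to transform the image and its bounding boxes identically can be sketched as follows. The example assumes Image Processing Toolbox and Computer Vision Toolbox are installed. The bounding box coordinates, the scale factor, and the overlap threshold are hypothetical values.

```matlab
I = imread("peppers.png");
bbox = [150 100 120 90];                 % hypothetical box, [x y width height]

% Resize the image and its box by the same scale factor.
scale = 0.5;
Iresized = imresize(I,scale);
bboxResized = bboxresize(bbox,scale);

% Apply one random reflection and scale to the image and to the box.
tform = randomAffine2d("XReflection",true,"Scale",[0.8 1.2]);
outputView = affineOutputView(size(I),tform);
Iwarped = imwarp(I,tform,"OutputView",outputView);
[bboxWarped,keptIdx] = bboxwarp(bbox,tform,outputView, ...
    "OverlapThreshold",0.25);
```

The second output of bboxwarp lists which of the input boxes remain after warping, so you can use it to keep any associated class labels aligned with the surviving boxes.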

Semantic Segmentation

Semantic segmentation data consists of images and corresponding pixel labels represented as categorical arrays.

If you have Computer Vision Toolbox, then you can use the Image Labeler (Computer Vision Toolbox) and the Video Labeler (Computer Vision Toolbox) apps to interactively label pixels and export the label data for training a neural network. If you have Automated Driving Toolbox, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image, you must perform an identical transformation to the corresponding pixel label image. If you have Image Processing Toolbox, then you can preprocess pixel label images using the functions in the table and any other toolbox function that supports categorical input. For an example that shows how to create and apply these transformations, see Augment Pixel Labels for Semantic Segmentation. For more information, see Getting Started with Semantic Segmentation Using Deep Learning (Computer Vision Toolbox).

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Resize pixel labels | Resize pixel label images by a fixed scaling factor or to a target size | imresize | The original image with pixel labels is on the left. The resized image with pixel labels is on the right. |
| Crop pixel labels | Crop a pixel label image to a target size from the center or a random position | imcrop (Image Processing Toolbox); centerCropWindow2d (Image Processing Toolbox), centerCropWindow3d (Image Processing Toolbox); randomWindow2d (Image Processing Toolbox), randomCropWindow3d (Image Processing Toolbox) | The image with pixel labels cropped from the center is on the left. The image with pixel labels cropped from a random position is on the right. |
| Warp pixel labels | Apply random reflection, rotation, scale, shear, and translation to pixel label images | randomAffine2d (Image Processing Toolbox), randomAffine3d (Image Processing Toolbox) | From left to right, the figure shows the original image and pixel labels, the reflected image and pixel labels, the rotated image and pixel labels, and the scaled image and pixel labels. |
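A minimal sketch of keeping an image and its pixel labels aligned is shown here. It assumes Image Processing Toolbox is installed and builds a hypothetical two-class label image by thresholding, purely so that the example has labels to work with. The class names and sizes are illustrative.

```matlab
I = imread("peppers.png");

% Hypothetical pixel labels: threshold the red channel into two classes.
L = categorical(I(:,:,1) > 128,[false true],["background" "object"]);

% Compute one crop window and apply it to both the image and the labels.
targetSize = [200 200];
window = centerCropWindow2d(size(I),targetSize);
Icropped = imcrop(I,window);
Lcropped = imcrop(L,window);

% Resize both by the same factor. For categorical input, imresize uses
% nearest-neighbor interpolation by default, so no new label values appear.
Iresized = imresize(Icropped,0.5);
Lresized = imresize(Lcropped,0.5);
```

Computing the window once and reusing it is the key design choice: if the image and the labels each drew their own random window, the labels would no longer describe the pixels they are paired with. The same pattern applies when you swap centerCropWindow2d for randomWindow2d.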

Lidar Processing Applications

Lidar Toolbox™ enables you to design, analyze, and test lidar systems. You can perform object detection and tracking, semantic segmentation, shape fitting, and registration. Raw point cloud data from lidar sensors requires basic processing before you can use it for these advanced workflows.

Lidar Toolbox provides tools to perform preprocessing such as downsampling, filtering, aligning, and extracting features from point cloud data. You can also augment and transform point clouds to increase the diversity of your training data.

Use the Lidar Viewer (Lidar Toolbox) app to visualize, analyze, and measure point cloud data. You can preprocess data by using the built-in preprocessing algorithms or import a custom algorithm. For more information, see Create Custom Preprocessing Workflow with Lidar Viewer (Lidar Toolbox).

You can create labeled ground truth training data by using the Lidar Labeler (Lidar Toolbox) app. For more information on automated labeling, see Automate Ground Truth Labeling for Vehicle Detection Using PointPillars (Lidar Toolbox).

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Clean and filter point cloud data | Downsample point cloud data using downsampling algorithms; apply median filtering; remove noise | pcdownsample (Computer Vision Toolbox); pcmedian (Lidar Toolbox); pcdenoise (Computer Vision Toolbox) | From left to right, the figure shows the original point cloud, the downsampled point cloud, and the point cloud with median filtering. |
| Organize point cloud | Convert point cloud into organized format, where you arrange the data as rows and columns according to the spatial relationship between the points | pcorganize (Lidar Toolbox) | size(ptCloudUnorg.Location) returns [37879 3]. After ptCloudOrg = pcorganize(ptCloudUnorg,params), size(ptCloudOrg.Location) returns [64 1024 3]. |
| Create blocked point clouds | When your data is too large to fit into memory, divide and process the point cloud as discrete blocks | blockedPointCloud (Lidar Toolbox); blockedPointCloudDatastore (Lidar Toolbox) | The original point cloud is on the left. The blocked point cloud is on the right. |
| Augment point cloud data | Apply a geometric transformation, such as random rotation, translation, shearing, and scaling; randomly add bounding boxes to the training data | pctransform (Computer Vision Toolbox), transform; sampleLidarData (Lidar Toolbox), pcBboxOversample (Lidar Toolbox) | For geometric transformation, from left to right, the figure shows the original point cloud, the rotated point cloud, and the sheared point cloud. For data augmentation, the figure shows the original point cloud on the left and the augmented point cloud on the right. |
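The cleaning and augmentation rows of the table can be sketched on synthetic data as follows. This example assumes Computer Vision Toolbox is installed and that the rigidtform3d object is available, which is the case only in recent releases. The point cloud, the grid step, and the rotation range are hypothetical.

```matlab
% Hypothetical unorganized point cloud: 5000 random points in a unit cube.
rng(0);
ptCloud = pointCloud(rand(5000,3));

% Downsample by averaging the points that fall in each 0.1-unit grid cell.
ptDown = pcdownsample(ptCloud,"gridAverage",0.1);

% Remove outliers, using the default neighborhood settings.
ptClean = pcdenoise(ptDown);

% Augment with a random rotation about the z-axis (angles in degrees).
% rigidtform3d is assumed to be available (recent releases only).
theta = 360*rand;
tform = rigidtform3d([0 0 theta],[0 0 0]);
ptAug = pctransform(ptClean,tform);
```

A rigid rotation changes only the point coordinates, so the augmented cloud has the same number of points as the cleaned one.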

Signal Processing Applications

Signal Processing Toolbox™ enables you to denoise, smooth, detrend, and resample signals. You can augment training data with noise, multipath fading, and synthetic signals such as pulses and chirps. You can also create labeled sets of signals by using the Signal Labeler (Signal Processing Toolbox) app and the labeledSignalSet (Signal Processing Toolbox) object. For an example that shows how to create and apply these transformations, see Waveform Segmentation Using Deep Learning.

Wavelet Toolbox™ and Signal Processing Toolbox enable you to generate 2-D time-frequency representations of time series data that you can use as image inputs for signal classification applications. For an example, see Classify Time Series Using Wavelet Analysis and Deep Learning. Similarly, you can extract sequences from signal data to use as input for LSTM networks. For an example, see Classify ECG Signals Using Long Short-Term Memory Networks (Signal Processing Toolbox).

Communications Toolbox™ expands on signal processing functionality to enable you to perform error correction, interleaving, modulation, filtering, synchronization, and equalization of communication systems. For an example that shows how to create and apply these transformations, see Modulation Classification with Deep Learning.

You can process signal data using the functions in the table as well as any other functionality in each toolbox.

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Clean signals | Apply median filtering or moving average to signal; remove polynomial trend; resample signal to new fixed rate | medfilt1 (Signal Processing Toolbox), smoothdata; detrend; downsample (Signal Processing Toolbox), interp (Signal Processing Toolbox), upsample (Signal Processing Toolbox) | Plot with original and mean filtered signal on the left. Plot with original and detrended signal on the right. |
| Filter signals | Perform lowpass, highpass, and bandstop filtering of IIR and FIR signals; design IIR and FIR filters; apply IIR and FIR filters | bandpass (Signal Processing Toolbox), bandstop (Signal Processing Toolbox), highpass (Signal Processing Toolbox), lowpass (Signal Processing Toolbox); butter (Signal Processing Toolbox), designfilt (Signal Processing Toolbox), fir1 (Signal Processing Toolbox), gaussdesign (Signal Processing Toolbox), rcosdesign (Signal Processing Toolbox); filter | Plot with original and bandpass filtered signal on the left. Plot with original and lowpass filtered signal on the right. |
| Augment signals | Add white Gaussian noise to signal using Communications Toolbox; adjust time information of the signal, and perform multipath fading using Communications Toolbox; add synthetic chirps and waveforms | awgn (Communications Toolbox); chirp (Signal Processing Toolbox), square (Signal Processing Toolbox), rectpuls (Signal Processing Toolbox), sawtooth (Signal Processing Toolbox) | Plot with original signal and signal with added white Gaussian noise. |
| Create time-frequency representations | Create spectrograms, scalograms, and other 2-D representations of 1-D signals | pspectrum (Signal Processing Toolbox), xspectrogram (Signal Processing Toolbox); fsst (Signal Processing Toolbox), ifsst (Signal Processing Toolbox); stft (Signal Processing Toolbox), istft (Signal Processing Toolbox); cwt (Wavelet Toolbox) | From left to right, the figure shows the original signals, the cross-spectrogram, an original signal, and the CWT scalogram. |
| Extract features from signals | Extract time-domain, frequency-domain, and time-frequency features from signals | signalTimeFeatureExtractor (Signal Processing Toolbox); signalFrequencyFeatureExtractor (Signal Processing Toolbox); signalTimeFrequencyFeatureExtractor (Signal Processing Toolbox) | From left to right, the figure shows the original signal, the instantaneous frequency, the instantaneous spectral entropy, and the scalar spectral entropy. |
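To illustrate how the cleaning, augmentation, and time-frequency rows might fit together, the following sketch processes a synthetic signal. It assumes Signal Processing Toolbox and Communications Toolbox are installed. The sample rate, the test signal, the filter length, and the signal-to-noise ratio are hypothetical.

```matlab
fs = 1000;                               % assumed sample rate in Hz
t = (0:1/fs:1-1/fs)';
rng(0);
x = sin(2*pi*50*t) + 2*t;                % 50 Hz tone with a linear trend

% Clean: remove the linear trend, then apply a 5-point median filter.
xDetrended = detrend(x);
xFiltered = medfilt1(xDetrended,5);

% Augment: add white Gaussian noise at a 10 dB signal-to-noise ratio.
xNoisy = awgn(xFiltered,10,"measured");

% Time-frequency representation: short-time Fourier transform.
[s,f,tFrames] = stft(xNoisy,fs);
spectrogramImage = abs(s);               % magnitude, usable as a 2-D input
```

The magnitude of the short-time Fourier transform is one possible 2-D representation. A scalogram from cwt or the output of pspectrum could fill the same role.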

Audio Processing Applications

Audio Toolbox™ provides tools for audio processing, speech analysis, and acoustic measurement. Use these tools to extract auditory features and transform audio signals. Augment audio data with randomized or deterministic time scaling, time stretching, and pitch shifting. You can also create labeled ground truth training data by using the Signal Labeler (Signal Processing Toolbox) app. You can process audio data using the functions in this table as well as any other functionality in the toolbox. For an example that shows how to create and apply these transformations, see Augment Audio Dataset (Audio Toolbox).

Audio Toolbox also provides MATLAB® and Simulink® support for pretrained audio deep learning networks. Locate and classify sounds with YAMNet and estimate pitch with CREPE. Extract VGGish or OpenL3 feature embeddings to input to machine learning and deep learning systems. The Audio Toolbox pretrained networks are available in Deep Network Designer. For a YAMNet example, see Adapt Pretrained Audio Network for New Data Using Deep Network Designer.

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Augment audio data | Perform random or deterministic pitch shifting, time-scale modification, time shifting, noise addition, and volume control | audioDataAugmenter (Audio Toolbox), audioTimeScaler (Audio Toolbox), shiftPitch (Audio Toolbox), stretchAudio (Audio Toolbox) | From left to right, the figure shows the original audio, the audio with time stretch applied, the audio with gain applied, and the audio with time shift applied. |
| Extract audio features | Extract spectral parameters from audio segments | audioFeatureExtractor (Audio Toolbox), mfcc (Audio Toolbox) | Plot of original audio. Processed output: a struct with fields mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13], mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26], mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39], spectralCentroid: 40, pitch: 41 |
| Create time-frequency representations | Create mel spectrograms and other 2-D representations of audio signals; prepare audio signals to feed to pretrained deep learning networks | melSpectrogram (Audio Toolbox), mdct (Audio Toolbox); crepePreprocess (Audio Toolbox), openl3Preprocess (Audio Toolbox), vggishPreprocess (Audio Toolbox), yamnetPreprocess (Audio Toolbox) | From left to right, the figure shows the original audio, the mel spectrogram, and the modified discrete cosine transform. |
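As a rough sketch of the augmentation and feature extraction rows, the following code augments a synthetic tone and computes two common representations. It assumes Audio Toolbox is installed. The sample rate, the tone, the number of augmentations, and the number of mel bands are hypothetical.

```matlab
fs = 16000;                              % assumed sample rate in Hz
t = (0:1/fs:1-1/fs)';
audioIn = 0.5*sin(2*pi*440*t);           % hypothetical input: a 1 s, 440 Hz tone

% Create two randomly augmented copies using the default probabilities.
augmenter = audioDataAugmenter("NumAugmentations",2);
augmented = augment(augmenter,audioIn,fs);   % table of augmented signals

% Mel spectrogram of the first augmented signal, with an assumed 32 bands.
S = melSpectrogram(augmented.Audio{1},fs,"NumBands",32);

% Mel-frequency cepstral coefficients of the original signal.
coeffs = mfcc(audioIn,fs);
```

Each row of the table returned by augment holds one augmented signal together with a record of which augmentations were applied, which is useful when you want to trace a training example back to its source.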

Text Analytics

Text Analytics Toolbox™ includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. Use these tools to extract text from popular file formats, preprocess raw text, extract individual words or multiword phrases (n-grams), convert text into numerical representations, and build statistical models. You can process text data using the functions in this table as well as any other functionality in the toolbox. For an example showing how to get started, see Prepare Text Data for Analysis (Text Analytics Toolbox).

| Processing Type | Description | Sample Functions | Sample Output |
| --- | --- | --- | --- |
| Tokenize text | Parse text into words and punctuation | tokenizedDocument (Text Analytics Toolbox) | Original: "A few tree limbs greater than 6 inches down on HWY 18 in Roseland." Processed output: 15 tokens: A few tree limbs greater than 6 inches down on HWY 18 in Roseland . |
| Clean text | Remove variations in word forms and case; remove punctuation; remove stop words, short words, and long words | normalizeWords (Text Analytics Toolbox); erasePunctuation (Text Analytics Toolbox); removeStopWords (Text Analytics Toolbox), removeShortWords (Text Analytics Toolbox), removeLongWords (Text Analytics Toolbox) | Processed output after normalizing words and case: 15 tokens: a few tree limb great than 6 inch down on hwy 18 in roseland . After removing punctuation: 14 tokens: a few tree limb great than 6 inch down on hwy 18 in roseland. After removing stop words, short words, and long words: 8 tokens: few tree limb great inch down hwy roseland |
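The tokenizing and cleaning steps in the table can be sketched as one short pipeline. This example assumes Text Analytics Toolbox is installed and reuses the sample sentence from the table. The word length limits are hypothetical, and because normalizeWords is called with its default settings, the exact tokens may differ from the sample output shown in the table.

```matlab
str = "A few tree limbs greater than 6 inches down on HWY 18 in Roseland.";

% Tokenize: split the raw text into words and punctuation.
doc = tokenizedDocument(str);

% Clean: lowercase, reduce words to a root form, then drop punctuation,
% stop words, and very short or very long words (length limits are assumed).
cleaned = lower(doc);
cleaned = normalizeWords(cleaned);
cleaned = erasePunctuation(cleaned);
cleaned = removeStopWords(cleaned);
cleaned = removeShortWords(cleaned,2);
cleaned = removeLongWords(cleaned,15);

% Compare the number of tokens before and after cleaning.
disp(doclength(doc))
disp(doclength(cleaned))
```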

See Also

transform | combine | read | trainnet | trainingOptions | dlnetwork