Accelerate Link-Level Simulations with Parallel Processing

This example shows how to accelerate link-level simulations by using a cluster of workers from a parallel pool.

Introduction

Link-level simulations require a large number of frames to provide statistically valid results. Therefore, these simulations can take a long time to run. Parallel computing is a common technique to speed up these simulations. This example shows how to run link-level simulations by using MATLAB® workers from a parallel pool (requires Parallel Computing Toolbox™).

Parallel Computing Toolbox enables you to use the full processing power of multicore desktops by executing applications on workers (MATLAB computational engines) that run locally. Without changing the code, you can run the same applications on clusters or clouds.

For an example of how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.

You can parallelize the link-level simulation across Nworkers parallel workers. Each worker runs the same link simulation at every SNR point, but uses a different random stream to generate the random bits and noise samples. Each worker simulates Nslots_per_worker slots, so the total number of slots in the simulation is Nslots_per_worker × Nworkers. The example then combines the throughput measurements from all workers to produce the overall throughput.

Parallel workers on a cluster. Each worker runs a portion of the overall number of slots with a different random number generator.
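To make the partitioning concrete, this minimal sketch combines made-up per-worker error counts in the same spirit as the example. The numbers and the variable names slotErrors and slotsPerWorker are illustrative assumptions, not output from the simulation.

```matlab
% Hypothetical per-worker results: rows are workers, columns are SNR points
slotsPerWorker = 10;
slotErrors = [ 9 4 0;    % worker 1
               8 5 1;    % worker 2
              10 3 0];   % worker 3
numWorkers = size(slotErrors,1);

% The total number of slots is slots per worker times number of workers
totalSlots = slotsPerWorker*numWorkers;

% Sum the errors over workers, then compute the overall throughput per SNR point
totalErrors = sum(slotErrors,1);
throughputPct = 100*(1 - totalErrors/totalSlots);
disp(throughputPct)
```

Because every worker simulates every SNR point, the per-worker rows can be summed column by column without any reweighting.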

To show how to speed up link-level simulations by using parallel processing, this example uses a simplified link-level simulation modeling a 5G link with one antenna, one layer, AWGN channel, and no HARQ.

Set Simulation Parameters

Set the SNR points and the overall number of frames to simulate.

SNRdB = 5.7:0.1:6.2; % SNR in dB
numFrames = 12;      % Number of frames to simulate

Configure the carrier, PDSCH, and DL-SCH.

carrier = nrCarrierConfig;
pdsch = nrPDSCHConfig;
pdsch.Modulation = "16QAM";
pdsch.PRBSet = 0:carrier.NSizeGrid-1; % Full band allocation

[encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder();

Configure Parallel Pool

By default, this example enables parallel execution. Alternatively, you can disable parallel execution, for example, when debugging your code.

enableParallelism = true;

Create a parallel pool and get the number of workers if parallel execution is enabled.

if (enableParallelism)
    pool = gcp; % create parallel pool, requires Parallel Computing Toolbox
    numWorkers = pool.NumWorkers;
    maxNumWorkers = pool.NumWorkers;
else
    numWorkers = 1;    % No parallelism
    maxNumWorkers = 0; % Used to convert the parfor-loop into a for-loop
end

Starting parallel pool (parpool) using the 'Processes' profile ...
01-Jul-2024 12:17:03: Job Queued. Waiting for parallel pool job with ID 1 to start ...
01-Jul-2024 12:18:04: Job Queued. Waiting for parallel pool job with ID 1 to start ...
Connected to parallel pool with 12 workers.

Configure Random Number Generator

To reproduce the same set of random bits and noise samples in a parfor-loop each time the loop runs, you must control random generation by assigning a particular substream for each worker. First, create a constant random stream to avoid unnecessary copying of the random stream multiple times to each worker. Use a generator with substream support. Substreams provide mutually independent random streams to each worker. For information about random number streams on workers, see Control Random Number Streams on Workers (Parallel Computing Toolbox) and Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox).

randStr = RandStream('Threefry','Seed',0);
constantStream = parallel.pool.Constant(randStr);
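The effect of substreams can be checked without a parallel pool. This minimal sketch uses the same generator type and seed as the example. The variable names firstRun, otherWorker, and secondRun are illustrative assumptions.

```matlab
% Same generator type and seed as in the example
stream = RandStream('Threefry','Seed',0);

stream.Substream = 1;            % Position the stream at the start of substream 1
firstRun = rand(stream,1,3);

stream.Substream = 2;            % A different substream, as another worker would use
otherWorker = rand(stream,1,3);

stream.Substream = 1;            % Return to substream 1 to replay its values
secondRun = rand(stream,1,3);

disp(isequal(firstRun,secondRun))    % Compare the two draws from substream 1
disp(isequal(firstRun,otherWorker))  % Compare draws from substreams 1 and 2
```

Setting the Substream property repositions the stream at the beginning of that substream, which is why the two draws from substream 1 match regardless of what was generated in between.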

Simulate PDSCH Throughput

Calculate the number of slots per worker by taking into account the number of frames to simulate and the available number of workers. Use the ceil function to ensure that all workers simulate the same number of slots. This operation may result in the total number of frames simulated being slightly larger than the value specified in numFrames.

% Calculate the number of slots per worker
numSlotsPerWorker = ceil((numFrames*carrier.SlotsPerFrame)/numWorkers);
disp("Parallel execution: "+enableParallelism)

Display the number of workers. This value depends on the workers available to you and the settings of your parallel pool. This example sets the number of workers to 1 if enableParallelism = false.

disp("Number of workers: "+numWorkers)

disp("Number of slots per worker: "+numSlotsPerWorker)

Number of slots per worker: 10
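With 12 workers, the 120 requested slots divide evenly. The following sketch shows the rounding effect for a pool size that does not divide the slot count. The pool size of 7 is a hypothetical value chosen for illustration, and slotsPerFrame stands in for carrier.SlotsPerFrame.

```matlab
numFrames = 12;        % Requested number of frames
slotsPerFrame = 10;    % Slots per frame, matching the 120 slots in 12 frames of this example
numWorkers = 7;        % Hypothetical pool size that does not divide 120 evenly

% Round up so that every worker simulates the same number of slots
numSlotsPerWorker = ceil(numFrames*slotsPerFrame/numWorkers);
totalSlots = numSlotsPerWorker*numWorkers;
totalFrames = totalSlots/slotsPerFrame;

fprintf("Slots per worker: %d\n",numSlotsPerWorker)
fprintf("Simulated frames: %.1f (requested %d)\n",totalFrames,numFrames)
```

In this case each worker simulates 18 slots, so the run covers 12.6 frames instead of 12. Any quantity normalized by simulated time is more accurate if it uses the simulated duration in place of the requested one.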

The simulation is based on a parallel loop that uses the workers from the parallel pool. By setting maxNumWorkers = 0, you can switch between parallel and serial execution when testing your code. This setting allows you to debug your code. You cannot set a breakpoint in the body of the parfor-loop, but you can set breakpoints within functions called from the body of the parfor-loop.
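As a minimal illustration of that switch, this toy loop uses the same parfor syntax with maxNumWorkers = 0, so the body runs serially in the client. The squares variable is an assumption made for this sketch only.

```matlab
maxNumWorkers = 0;          % 0 runs the loop body in the client, without workers
squares = zeros(1,4);       % Sliced output variable, one element per iteration

parfor (k = 1:4,maxNumWorkers)
    squares(k) = k^2;
end

disp(squares)
```

Setting maxNumWorkers to the pool size instead distributes the same iterations across the workers, with no other change to the loop.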

% Results storage
numSNRPoints = numel(SNRdB);
numSlotErrorsPerWorker = zeros(numWorkers,numSNRPoints);
simulatedBitsPerWorker = zeros(numWorkers,numSNRPoints);
numCorrectBitsPerWorker = zeros(numWorkers,numSNRPoints);

% Parallel processing, worker parfor-loop
parfor (workerIdx = 1:numWorkers,maxNumWorkers)
    % Set random streams to ensure repeatability
    % Use substreams in the generator so each worker uses mutually independent streams
    stream = constantStream.Value;      % Extract the stream from the Constant
    stream.Substream = workerIdx;       % Set substream value = parfor index
    RandStream.setGlobalStream(stream); % Set global stream per worker

    % Per worker processing: PDSCH link
    resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker);

    % Gather results
    numSlotErrorsPerWorker(workerIdx,:) = resultsPerWorker.NumSlotErrors;
    simulatedBitsPerWorker(workerIdx,:) = resultsPerWorker.NumBits;
    numCorrectBitsPerWorker(workerIdx,:) = resultsPerWorker.NumCorrectBits;

end % parfor

% Combine results from all workers
totalNumTrBlkErrors = sum(numSlotErrorsPerWorker,1);
totalSimulatedTrBlks = numSlotsPerWorker*numWorkers*ones(1,numSNRPoints);
totalSimulatedFrames = totalSimulatedTrBlks/carrier.SlotsPerFrame;
totalsimulatedBits = sum(simulatedBitsPerWorker,1);
totalCorrectBits = sum(numCorrectBitsPerWorker,1);

% Throughput results calculation
throughput = 100*(1-totalNumTrBlkErrors./totalSimulatedTrBlks);
throughputMbps = 1e-6*totalCorrectBits/(numFrames*10e-3);
ResultsTable = table(SNRdB.',totalsimulatedBits.',totalNumTrBlkErrors.',totalSimulatedTrBlks.',totalSimulatedFrames.',throughput.',throughputMbps.');
ResultsTable.Properties.VariableNames = ["SNR" "Simulated bits" "Tr Block errors" "Number of Tr Blocks" "Number of frames" "Throughput (%)" "Throughput (Mbps)"];
disp(ResultsTable)

SNR    Simulated bits    Tr Block errors    Number of Tr Blocks    Number of frames    Throughput (%)    Throughput (Mbps)
___    ______________    _______________    ___________________    ________________    ______________    _________________

5.7      1.8749e+06            120                  120                   12                    0                  0      
5.8      1.8749e+06            108                  120                   12                   10             1.5624      
5.9      1.8749e+06             67                  120                   12               44.167             6.9006      
  6      1.8749e+06             31                  120                   12               74.167             11.588      
6.1      1.8749e+06              6                  120                   12                   95             14.843      
6.2      1.8749e+06              1                  120                   12               99.167             15.494      
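The percentage column can be reproduced directly from the block error counts. This short check copies the error counts and the number of transport blocks from the table above. The variable names blockErrors and numBlocks are chosen for this sketch.

```matlab
% Values copied from the results table
blockErrors = [120 108 67 31 6 1];   % Transport block errors per SNR point
numBlocks = 120;                     % Transport blocks simulated per SNR point

% Same formula as in the example: fraction of error-free blocks, in percent
throughputPct = 100*(1 - blockErrors/numBlocks);
disp(round(throughputPct,3))
```

For instance, 31 errors in 120 blocks at 6 dB leaves 89 error-free blocks, or about 74.167%.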

Accelerate Simulation

You can reduce the simulation time by increasing the number of workers. You can use all the workers on your local machine or use multiple workers in a cluster. You do not need to set the number of workers in the example code. To configure the number of workers, use the Cluster Profile Manager in the Parallel menu on the MATLAB® Home tab. For more information on how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.

The table shows the results of running the example three times for 1000 frames with different worker configurations.

| | 1 Worker on Desktop (No Parallelism) | 6 Workers on Desktop | 96 Workers in Cluster |
| --------------- | ------------------------------------ | -------------------- | --------------------- |
| Simulation Time | 3543 sec (~1 hr) | 983 sec (~16 min) | 108 sec (~1.8 min) |
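To quantify the gain, this sketch computes the speedup relative to the single-worker run and the parallel efficiency (speedup divided by number of workers) from the times in the table. Efficiency is a standard metric added here for illustration and is not part of the original example.

```matlab
% Simulation times in seconds, taken from the table
numWorkers = [1 6 96];
simTime = [3543 983 108];

speedup = simTime(1)./simTime;      % Relative to the single-worker run
efficiency = speedup./numWorkers;   % Fraction of ideal linear scaling

disp(table(numWorkers.',speedup.',efficiency.', ...
    'VariableNames',{'Workers','Speedup','Efficiency'}))
```

The speedup is roughly 3.6 with 6 workers and 32.8 with 96 workers, which corresponds to efficiencies of about 0.60 and 0.34. Scaling is therefore sublinear for these runs.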

Local Functions

function resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker)
% Simplified PDSCH link simulation executed by all workers

resultsPerWorker.NumSlotErrors = zeros(1,numel(SNRdB));
resultsPerWorker.NumBits = zeros(1,numel(SNRdB));
resultsPerWorker.NumCorrectBits = zeros(1,numel(SNRdB)); % Row vector, consistent with the other fields

ofdmInfo = nrOFDMInfo(carrier);

% for all SNR points
for snrIdx = 1:length(SNRdB)

    % Noise power calculation
    SNR = 10^(SNRdB(snrIdx)/10); % Linear SNR
    % No need to normalize N0 by the number of receive antennas as
    % there is only one
    N0 = 1/sqrt(double(ofdmInfo.Nfft)*SNR);

    % Process all the slots per worker
    for nSlot = 0:numSlotsPerWorker-1

        % New slot number
        carrier.NSlot = nSlot;

        % Transmit and receive slot (AWGN channel)
        [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0);

        % Store results
        resultsPerWorker.NumSlotErrors(snrIdx) = resultsPerWorker.NumSlotErrors(snrIdx)+blkerr;
        resultsPerWorker.NumBits(snrIdx) = resultsPerWorker.NumBits(snrIdx)+trBlkSize;
        resultsPerWorker.NumCorrectBits(snrIdx) = resultsPerWorker.NumCorrectBits(snrIdx)+sum(~blkerr .* trBlkSize);

    end % for nSlot = 0:numSlotsPerWorker-1

end % for all SNR points

end

function [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0)

% Generate PDSCH indices info and indices for present slot
[pdschIndices,pdschInfo] = nrPDSCHIndices(carrier,pdsch);

% Calculate transport block sizes
trBlkSize = nrTBS(pdsch.Modulation,pdsch.NumLayers,numel(pdsch.PRBSet),pdschInfo.NREPerPRB,encodeDLSCH.TargetCodeRate,0);

% Get new transport blocks (single codeword) and flush decoder soft buffer
trBlk = randi([0 1],trBlkSize,1);
setTransportBlock(encodeDLSCH,trBlk);
decodeDLSCH.TransportBlockLength = trBlkSize;
resetSoftBuffer(decodeDLSCH,0);

% DL-SCH encoding
codedTrBlock = encodeDLSCH(pdsch.Modulation,pdsch.NumLayers,pdschInfo.G,0);

% PDSCH encoding
pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);

% Create resource grid and map PDSCH
pdschGrid = nrResourceGrid(carrier,1,"OutputDataType","single");
pdschGrid(pdschIndices) = pdschSymbols;

% OFDM modulation
[txWaveform,waveformInfo] = nrOFDMModulate(carrier,pdschGrid);

% AWGN channel
noise = N0*randn(size(txWaveform),"like",txWaveform);
rxWaveform = txWaveform + noise;

% OFDM demodulation
rxGrid = nrOFDMDemodulate(carrier,rxWaveform);

% Extract PDSCH
pdschRx = nrExtractResources(pdschIndices,rxGrid);

% PDSCH decoding, assume noise variance is known
noiseEst = (N0.^2*waveformInfo.Nfft);
[dlschLLRs,~] = nrPDSCHDecode(carrier,pdsch,pdschRx,noiseEst);

% DL-SCH decoding
[~,blkerr] = decodeDLSCH(dlschLLRs,pdsch.Modulation,pdsch.NumLayers,0);

end
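The two noise-related expressions in these functions are consistent with each other: substituting N0 = 1/sqrt(Nfft*SNR) into noiseEst = N0^2*Nfft gives 1/SNR. This sketch checks that algebra numerically. The FFT size of 1024 is an assumed value for illustration and is not read from nrOFDMInfo.

```matlab
SNRdB = 6;                 % One of the simulated SNR points, in dB
nfft = 1024;               % Assumed FFT size, for illustration only

SNR = 10^(SNRdB/10);       % Linear SNR
N0 = 1/sqrt(nfft*SNR);     % Noise scaling applied to the time-domain samples
noiseEst = N0^2*nfft;      % Noise variance passed to the PDSCH decoder

fprintf("noiseEst = %.4f, 1/SNR = %.4f\n",noiseEst,1/SNR)
```

The FFT size cancels out, so the noise variance passed to the decoder depends only on the SNR point.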

function [encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder()
% Coding rate
codeRate = 490/1024;

% Create DL-SCH encoder object
encodeDLSCH = nrDLSCH;
encodeDLSCH.MultipleHARQProcesses = false;
encodeDLSCH.TargetCodeRate = codeRate;

% Create DL-SCH decoder object
decodeDLSCH = nrDLSCHDecoder;
decodeDLSCH.MultipleHARQProcesses = false;
decodeDLSCH.TargetCodeRate = codeRate;
decodeDLSCH.LDPCDecodingAlgorithm = "Normalized min-sum";
decodeDLSCH.MaximumLDPCIterationCount = 20;

end

See Also

parfor (Parallel Computing Toolbox)
