Accelerate Link-Level Simulations with Parallel Processing
This example shows how to accelerate link-level simulations by using a cluster of workers from a parallel pool.
Introduction
Link-level simulations require a large number of frames to provide statistically valid results. Therefore, these simulations can take a long time to run. Parallel computing is a common technique to speed up these simulations. This example shows how to run link-level simulations by using MATLAB® workers from a parallel pool (requires Parallel Computing Toolbox™).
Parallel Computing Toolbox enables you to use the full processing power of multicore desktops by executing applications on workers (MATLAB computational engines) that run locally. Without changing the code, you can run the same applications on clusters or clouds.
For an example of how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.
You can parallelize the link-level simulation over Nworkers parallel workers. Each worker runs the same link simulation with different random processes to generate random bits and noise samples. Each worker simulates Nslots_per_worker slots and runs all the required SNR points. Therefore, the total number of slots simulated per SNR point is Nslots_per_worker × Nworkers. The example combines the throughput measurements from all workers to produce the overall throughput.
To show how to speed up link-level simulations by using parallel processing, this example uses a simplified link-level simulation modeling a 5G link with one antenna, one layer, AWGN channel, and no HARQ.
Set Simulation Parameters
Set the SNR points and the overall number of frames to simulate.
SNRdB = 5.7:0.1:6.2; % SNR in dB
numFrames = 12;      % Number of frames to simulate
Configure the carrier, PDSCH, and DL-SCH.
carrier = nrCarrierConfig;
pdsch = nrPDSCHConfig;
pdsch.Modulation = "16QAM";
pdsch.PRBSet = 0:carrier.NSizeGrid-1; % Full band allocation
[encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder();
Configure Parallel Pool
By default, this example enables parallel execution. Alternatively, you can disable parallel execution, for example, when debugging your code.
enableParallelism = true;
Create a parallel pool and get the number of workers if parallel execution is enabled.
if (enableParallelism)
    pool = gcp; % Create parallel pool, requires Parallel Computing Toolbox
    numWorkers = pool.NumWorkers;
    maxNumWorkers = pool.NumWorkers;
else
    numWorkers = 1;    % No parallelism
    maxNumWorkers = 0; % Used to convert the parfor-loop into a for-loop
end
Starting parallel pool (parpool) using the 'Processes' profile ...
01-Jul-2024 12:17:03: Job Queued. Waiting for parallel pool job with ID 1 to start ...
01-Jul-2024 12:18:04: Job Queued. Waiting for parallel pool job with ID 1 to start ...
Connected to parallel pool with 12 workers.
Configure Random Number Generator
To reproduce the same set of random bits and noise samples in a parfor-loop each time the loop runs, you must control random number generation by assigning a particular substream to each worker. First, create a constant random stream to avoid unnecessary copying of the random stream multiple times to each worker. Use a generator with substream support. Substreams provide mutually independent random streams to each worker. For information about random number streams on workers, see Control Random Number Streams on Workers (Parallel Computing Toolbox) and Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox).
randStr = RandStream('Threefry','Seed',0);
constantStream = parallel.pool.Constant(randStr);
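You can check the effect of substreams on the client, without a parallel pool. The following minimal sketch is an added illustration and not part of the original example. It uses the same generator and seed as above, and the variable names are hypothetical. Setting the Substream property positions the stream at the beginning of that substream, so selecting the same substream again should reproduce the same numbers.

```matlab
% Illustrative sketch: substreams give repeatable and distinct sequences
s = RandStream('Threefry','Seed',0);

s.Substream = 1;    % Position the stream at the start of substream 1
a = rand(s,1,5);

s.Substream = 2;    % Switch to a different substream
b = rand(s,1,5);

s.Substream = 1;    % Select substream 1 again
c = rand(s,1,5);

disp("Same substream repeats: "+isequal(a,c))
disp("Different substreams differ: "+~isequal(a,b))
```

In the parfor-loop of this example, the substream index is the loop index, so each iteration draws from its own substream regardless of which worker executes it.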
Simulate PDSCH Throughput
Calculate the number of slots per worker by taking into account the number of frames to simulate and the available number of workers. Use the ceil function to ensure that all workers simulate the same number of slots. This operation may result in the total number of frames simulated being slightly larger than the value specified in numFrames.
% Calculate the number of slots per worker
numSlotsPerWorker = ceil((numFrames*carrier.SlotsPerFrame)/numWorkers);
disp("Parallel execution: "+enableParallelism)
Display the number of workers. This value depends on the workers available to you and the settings of your parallel pool. This example sets the number of workers to 1 if enableParallelism is false.
disp("Number of workers: "+numWorkers)
disp("Number of slots per worker: "+numSlotsPerWorker)
Number of slots per worker: 10
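The rounding introduced by ceil can be illustrated with a few worker counts. This is a minimal sketch added for illustration. It assumes 10 slots per frame, which matches the output above (12 frames over 12 workers gives 10 slots per worker), and the worker counts are hypothetical.

```matlab
% Illustrative sketch: effect of ceil on the total number of simulated slots
numFrames = 12;
slotsPerFrame = 10;     % Assumed value, consistent with the output above
for numWorkers = [1 7 12]
    numSlotsPerWorker = ceil((numFrames*slotsPerFrame)/numWorkers);
    totalSlots = numSlotsPerWorker*numWorkers;
    fprintf("%2d workers: %3d slots per worker, %3d slots in total (%.1f frames)\n", ...
        numWorkers,numSlotsPerWorker,totalSlots,totalSlots/slotsPerFrame);
end
```

When the number of workers does not divide the number of slots evenly, as with 7 workers here, each worker simulates 18 slots and the simulation covers 126 slots (12.6 frames) instead of 120.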
The simulation is based on a parallel loop that uses the workers from the parallel pool. By setting maxNumWorkers = 0, you can switch from parallel to serial execution when testing your code. This setting allows you to debug your code. You cannot set a breakpoint in the body of the parfor-loop, but you can set breakpoints within functions called from the body of the parfor-loop.
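The following minimal sketch, added for illustration, isolates the two-argument form of parfor on a toy loop. The variable squares and the loop body are hypothetical. With a maximum of 0 workers, the loop body runs serially on the client.

```matlab
% Illustrative sketch: the second parfor argument limits the number of workers
maxNumWorkers = 0;          % 0 runs the loop body serially on the client
squares = zeros(1,4);
parfor (idx = 1:4,maxNumWorkers)
    squares(idx) = idx^2;   % Sliced output variable, one element per iteration
end
disp(squares)
```

Keeping the loop as a parfor-loop in both modes means the same code path is exercised when debugging serially and when running on the pool.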
% Results storage
numSNRPoints = numel(SNRdB);
numSlotErrorsPerWorker = zeros(numWorkers,numSNRPoints);
simulatedBitsPerWorker = zeros(numWorkers,numSNRPoints);
numCorrectBitsPerWorker = zeros(numWorkers,numSNRPoints);
% Parallel processing, worker parfor-loop
parfor (workerIdx = 1:numWorkers,maxNumWorkers)
% Set random streams to ensure repeatability
% Use substreams in the generator so each worker uses mutually independent streams
stream = constantStream.Value; % Extract the stream from the Constant
stream.Substream = workerIdx; % Set substream value = parfor index
RandStream.setGlobalStream(stream); % Set global stream per worker
% Per worker processing: PDSCH link
resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker);
% Gather results
numSlotErrorsPerWorker(workerIdx,:) = resultsPerWorker.NumSlotErrors;
simulatedBitsPerWorker(workerIdx,:) = resultsPerWorker.NumBits;
numCorrectBitsPerWorker(workerIdx,:) = resultsPerWorker.NumCorrectBits;
end % parfor
% Combine results from all workers
totalNumTrBlkErrors = sum(numSlotErrorsPerWorker,1);
totalSimulatedTrBlks = numSlotsPerWorker*numWorkers*ones(1,numSNRPoints);
totalSimulatedFrames = totalSimulatedTrBlks/carrier.SlotsPerFrame;
totalsimulatedBits = sum(simulatedBitsPerWorker,1);
totalCorrectBits = sum(numCorrectBitsPerWorker,1);
% Throughput results calculation
throughput = 100*(1-totalNumTrBlkErrors./totalSimulatedTrBlks);
throughputMbps = 1e-6*totalCorrectBits/(numFrames*10e-3);
ResultsTable = table(SNRdB.',totalsimulatedBits.',totalNumTrBlkErrors.',totalSimulatedTrBlks.',totalSimulatedFrames.',throughput.',throughputMbps.');
ResultsTable.Properties.VariableNames = ["SNR" "Simulated bits" "Tr Block errors" "Number of Tr Blocks" "Number of frames" "Throughput (%)" "Throughput (Mbps)"];
disp(ResultsTable)
    SNR    Simulated bits    Tr Block errors    Number of Tr Blocks    Number of frames    Throughput (%)    Throughput (Mbps)
    ___    ______________    _______________    ___________________    ________________    ______________    _________________
    5.7      1.8749e+06            120                  120                   12                    0                  0
    5.8      1.8749e+06            108                  120                   12                   10             1.5624
    5.9      1.8749e+06             67                  120                   12               44.167             6.9006
      6      1.8749e+06             31                  120                   12               74.167             11.588
    6.1      1.8749e+06              6                  120                   12                   95             14.843
    6.2      1.8749e+06              1                  120                   12               99.167             15.494
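As a sanity check, you can reproduce one row of the table by hand. The following sketch is an added illustration for the 5.9 dB row. The transport block size of 15624 bits is an assumption inferred from the table (1.8749e+06 simulated bits over 120 transport blocks), not a value stated by the example.

```matlab
% Illustrative check of one row of the results table (SNR = 5.9 dB)
numTrBlks = 120;            % From the "Number of Tr Blocks" column
numTrBlkErrors = 67;        % From the "Tr Block errors" column
trBlkSize = 15624;          % Assumed: inferred from 1.8749e+06 bits over 120 blocks
numFrames = 12;
frameDuration = 10e-3;      % Frame duration in seconds

throughputPercent = 100*(1-numTrBlkErrors/numTrBlks);
correctBits = (numTrBlks-numTrBlkErrors)*trBlkSize;
throughputMbps = 1e-6*correctBits/(numFrames*frameDuration);
fprintf("Throughput: %.3f %%, %.4f Mbps\n",throughputPercent,throughputMbps)
```

Under that assumption, the computed values agree with the 44.167 % and 6.9006 Mbps reported in the table.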
Accelerate Simulation
You can reduce the simulation time by increasing the number of workers. You can use all the workers on your local machine or use multiple workers in a cluster. You do not need to set the number of workers in the example code. To configure the number of workers, use the Cluster Profile Manager in the Parallel menu on the MATLAB® Home tab. For more information on how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.
The table shows the results of running the example three times for 1000 frames with different worker configurations.
| | 1 Worker on Desktop (No Parallelism) | 6 Workers on Desktop | 96 Workers in Cluster |
| --- | --- | --- | --- |
| Simulation Time | 3543 sec (~1 hr) | 983 sec (~16 min) | 108 sec (~1.8 min) |
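To put these timings in perspective, the following sketch, added for illustration, computes the speedup relative to the single-worker run and the corresponding parallel efficiency. The variable names are hypothetical, and only the times from the table are used.

```matlab
% Illustrative sketch: speedup and parallel efficiency from the table above
numWorkers = [1 6 96];
simTime = [3543 983 108];               % Simulation time in seconds
speedup = simTime(1)./simTime;
efficiency = 100*speedup./numWorkers;   % Percentage of ideal linear speedup
disp(table(numWorkers.',simTime.',speedup.',efficiency.', ...
    'VariableNames',["Workers" "Time (s)" "Speedup" "Efficiency (%)"]))
```

The speedup is roughly 3.6 with 6 workers and roughly 32.8 with 96 workers, which is below ideal linear scaling. The desktop and cluster hardware may differ, so treat the efficiency figures as a rough indication only.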
Local Functions
function resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker)
% Simplified PDSCH link simulation executed by all workers
resultsPerWorker.NumSlotErrors = zeros(1,numel(SNRdB));
resultsPerWorker.NumBits = zeros(1,numel(SNRdB));
resultsPerWorker.NumCorrectBits = zeros(1,numel(SNRdB));
ofdmInfo = nrOFDMInfo(carrier);
% for all SNR points
for snrIdx = 1:length(SNRdB)
% Noise power calculation
SNR = 10^(SNRdB(snrIdx)/10); % SNR in linear scale
% No need to normalize N0 by the number of receive antennas as
% there is only one
N0 = 1/sqrt(double(ofdmInfo.Nfft)*SNR);
% Process all the slots per worker
for nSlot = 0:numSlotsPerWorker-1
% New slot number
carrier.NSlot = nSlot;
% Transmit and receive slot (AWGN channel)
[blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0);
% Store results
resultsPerWorker.NumSlotErrors(snrIdx) = resultsPerWorker.NumSlotErrors(snrIdx)+blkerr;
resultsPerWorker.NumBits(snrIdx) = resultsPerWorker.NumBits(snrIdx)+trBlkSize;
resultsPerWorker.NumCorrectBits(snrIdx) = resultsPerWorker.NumCorrectBits(snrIdx)+sum(~blkerr .* trBlkSize);
end % for nSlot = 0:numSlotsPerWorker-1
end % for all SNR points
end
function [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0)
% Generate PDSCH indices info and indices for present slot
[pdschIndices,pdschInfo] = nrPDSCHIndices(carrier,pdsch);
% Calculate transport block sizes
trBlkSize = nrTBS(pdsch.Modulation,pdsch.NumLayers,numel(pdsch.PRBSet),pdschInfo.NREPerPRB,encodeDLSCH.TargetCodeRate,0);
% Get new transport blocks (single codeword) and flush decoder soft buffer
trBlk = randi([0 1],trBlkSize,1);
setTransportBlock(encodeDLSCH,trBlk);
decodeDLSCH.TransportBlockLength = trBlkSize;
resetSoftBuffer(decodeDLSCH,0);
% DL-SCH encoding
codedTrBlock = encodeDLSCH(pdsch.Modulation,pdsch.NumLayers,pdschInfo.G,0);
% PDSCH encoding
pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);
% Create resource grid and map PDSCH
pdschGrid = nrResourceGrid(carrier,1,"OutputDataType","single");
pdschGrid(pdschIndices) = pdschSymbols;
% OFDM modulation
[txWaveform,waveformInfo] = nrOFDMModulate(carrier,pdschGrid);
% AWGN channel
noise = N0*randn(size(txWaveform),"like",txWaveform);
rxWaveform = txWaveform + noise;
% OFDM demodulation
rxGrid = nrOFDMDemodulate(carrier,rxWaveform);
% Extract PDSCH
pdschRx = nrExtractResources(pdschIndices,rxGrid);
% PDSCH decoding, assume noise variance is known
noiseEst = (N0.^2*waveformInfo.Nfft);
[dlschLLRs,~] = nrPDSCHDecode(carrier,pdsch,pdschRx,noiseEst);
% DL-SCH decoding
[~,blkerr] = decodeDLSCH(dlschLLRs,pdsch.Modulation,pdsch.NumLayers,0);
end
function [encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder()
% Coding rate
codeRate = 490/1024;
% Create DL-SCH encoder object
encodeDLSCH = nrDLSCH;
encodeDLSCH.MultipleHARQProcesses = false;
encodeDLSCH.TargetCodeRate = codeRate;
% Create DL-SCH decoder object
decodeDLSCH = nrDLSCHDecoder;
decodeDLSCH.MultipleHARQProcesses = false;
decodeDLSCH.TargetCodeRate = codeRate;
decodeDLSCH.LDPCDecodingAlgorithm = "Normalized min-sum";
decodeDLSCH.MaximumLDPCIterationCount = 20;
end
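The noise scaling used in pdschLink and slotTxRxAWGN can be checked by substituting the expression for N0 into the noise variance passed to nrPDSCHDecode. This short derivation is an added illustration that uses only the expressions in the code above, where N_FFT denotes the Nfft value and SNR is the linear SNR:

$$
N_0 = \frac{1}{\sqrt{N_{\mathrm{FFT}}\,\mathrm{SNR}}}, \qquad
\mathrm{noiseEst} = N_0^2\,N_{\mathrm{FFT}} = \frac{N_{\mathrm{FFT}}}{N_{\mathrm{FFT}}\,\mathrm{SNR}} = \frac{1}{\mathrm{SNR}}
$$

For example, at 6 dB the linear SNR is 10^0.6 ≈ 3.98, so noiseEst ≈ 0.251 regardless of the FFT size.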
See Also
parfor (Parallel Computing Toolbox)
Topics
- Scale Up from Desktop to Cluster (Parallel Computing Toolbox)
- Control Random Number Streams on Workers (Parallel Computing Toolbox)
- Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox)