Increase Image Resolution Using VDSR Network Running on FPGA - MATLAB & Simulink (original) (raw)

This example shows how to create a high-resolution image from a low-resolution image using a very-deep super-resolution (VDSR) neural network running on an FPGA. Use Deep Learning HDL Toolbox™ to deploy the VDSR network and MATLAB® to retrieve the high-resolution image. To create the trained VDSR network used in this example, see Increase Image Resolution Using Deep Learning.

Create Sample Low-Resolution Image

The test data set, testImages, contains 20 undistorted images. Load the images into an imageDatastore object.

fileNames = ["sherlock.jpg","peacock.jpg","fabric.png","greens.jpg", ... "hands1.jpg","kobi.png","lighthouse.png","office_4.jpg", ... "onion.png","pears.png","yellowlily.jpg","indiancorn.jpg", ... "flamingos.jpg","sevilla.jpg","llama.jpg","parkavenue.jpg", ... "strawberries.jpg","trailer.jpg","wagon.jpg","football.jpg"]; filePath = fullfile(matlabroot,"toolbox","images","imdata")+filesep; filePathNames = strcat(filePath,fileNames); testImages = imageDatastore(filePathNames);

Display the test images as a montage.

Select one of the test images to use to test the super-resolution network.

testImage = "sherlock.jpg"; Ireference = imread(testImage); Ireference = im2double(Ireference); imshow(Ireference) title("High-Resolution Reference Image")

Create a low-resolution version of the high-resolution reference image by using imresize with a scaling factor of 0.25. The high-frequency components of the image are lost during the down scaling.

scaleFactor = 0.25; Ilowres = imresize(Ireference,scaleFactor,"bicubic"); imshow(Ilowres) title("Low-Resolution Image")

Improve Image Resolution Using Bicubic Interpolation

A standard way to increase image resolution without deep learning is to use bicubic interpolation. Upscale the low-resolution image using bicubic interpolation so that the resulting high-resolution image is the same size as the reference image.

[nrows,ncols,np] = size(Ireference); Ibicubic = imresize(Ilowres,[nrows ncols],"bicubic"); imshow(Ibicubic) title("High-Resolution Image Obtained Using Bicubic Interpolation")

Improve Image Resolution Using a Pretrained VDSR Network in Single Precision

VDSR is trained using only the luminance channel of an image because human perception is more sensitive to changes in brightness than to changes in color.

Convert the low-resolution image from the RGB color space to the luminance (Iy) and chrominance (Icb and Icr) channels by using the rgb2ycbcr (Image Processing Toolbox) function.

Iycbcr = rgb2ycbcr(Ilowres); Iy = Iycbcr(:,:,1); Icb = Iycbcr(:,:,2); Icr = Iycbcr(:,:,3);

Upscale the luminance and two chrominance channels using bicubic interpolation. The upsampled chrominance channels, Icb_bicubic and Icr_bicubic, require no further processing.

Iy_bicubic = imresize(Iy,[nrows ncols],"bicubic"); Icb_bicubic = imresize(Icb,[nrows ncols],"bicubic"); Icr_bicubic = imresize(Icr,[nrows ncols],"bicubic");

Load Pretrained Network

Load the pretrained VDSR network.

load('trainedVDSRNet.mat')

View the network layers by using the Deep Network Designer app.

Define FPGA Board Interface

Define the target FPGA board programming interface by using a dlhdl.Target object. Create a programming interface with a custom name for your target device and an Ethernet interface to connect the target device to the host computer.

hT = dlhdl.Target('Xilinx',Interface="Ethernet");

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board and the bitstream uses the single data type.

hW = dlhdl.Workflow(Network=net,Bitstream='zcu102_single',Target=hT);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.

dn = compile(hW,'InputFrameNumberLimit',600);

Compiling network for Deep Learning FPGA prototyping ...

Targeting FPGA bitstream zcu102_single.

The network includes the following layers:

 1   'InputLayer'             Image Input         41×41×1 images                                                      (SW Layer)
 2   'Conv1'                  2-D Convolution     64 3×3×1 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
 3   'ReLU1'                  ReLU                ReLU                                                                (HW Layer)
 4   'Conv2'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 5   'ReLU2'                  ReLU                ReLU                                                                (HW Layer)
 6   'Conv3'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 7   'ReLU3'                  ReLU                ReLU                                                                (HW Layer)
 8   'Conv4'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 9   'ReLU4'                  ReLU                ReLU                                                                (HW Layer)
10   'Conv5'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
11   'ReLU5'                  ReLU                ReLU                                                                (HW Layer)
12   'Conv6'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
13   'ReLU6'                  ReLU                ReLU                                                                (HW Layer)
14   'Conv7'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
15   'ReLU7'                  ReLU                ReLU                                                                (HW Layer)
16   'Conv8'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
17   'ReLU8'                  ReLU                ReLU                                                                (HW Layer)
18   'Conv9'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
19   'ReLU9'                  ReLU                ReLU                                                                (HW Layer)
20   'Conv10'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
21   'ReLU10'                 ReLU                ReLU                                                                (HW Layer)
22   'Conv11'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
23   'ReLU11'                 ReLU                ReLU                                                                (HW Layer)
24   'Conv12'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
25   'ReLU12'                 ReLU                ReLU                                                                (HW Layer)
26   'Conv13'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
27   'ReLU13'                 ReLU                ReLU                                                                (HW Layer)
28   'Conv14'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
29   'ReLU14'                 ReLU                ReLU                                                                (HW Layer)
30   'Conv15'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
31   'ReLU15'                 ReLU                ReLU                                                                (HW Layer)
32   'Conv16'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
33   'ReLU16'                 ReLU                ReLU                                                                (HW Layer)
34   'Conv17'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
35   'ReLU17'                 ReLU                ReLU                                                                (HW Layer)
36   'Conv18'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
37   'ReLU18'                 ReLU                ReLU                                                                (HW Layer)
38   'Conv19'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
39   'ReLU19'                 ReLU                ReLU                                                                (HW Layer)
40   'Conv20'                 2-D Convolution     1 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
41   'FinalRegressionLayer'   Regression Output   mean-squared-error with response 'ResponseImage'                    (SW Layer)

Notice: The layer 'InputLayer' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.

Notice: The layer 'FinalRegressionLayer' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.

Compiling layer group: Conv1>>Conv20 ...

Compiling layer group: Conv1>>Conv20 ... complete.

Allocating external memory buffers:

      offset_name          offset_address    allocated_space 
_______________________    ______________    ________________

"InputDataOffset"           "0x00000000"     "16.0 MB"       
"OutputResultOffset"        "0x01000000"     "16.0 MB"       
"SchedulerDataOffset"       "0x02000000"     "0.0 MB"        
"SystemBufferOffset"        "0x02000000"     "28.0 MB"       
"InstructionDataOffset"     "0x03c00000"     "4.0 MB"        
"ConvWeightDataOffset"      "0x04000000"     "4.0 MB"        
"EndOffset"                 "0x04400000"     "Total: 68.0 MB"

Network compilation complete.

Program the Bitstream onto the FPGA and Download Network Weights

To deploy the network on the Xilinx® Zynq® UltraScale+ MPSoC ZCU102 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board using the output of the compile method and the programming file, downloads the network weights and biases, and displays the progress messages and the time it takes to deploy the network.

Programming FPGA Bitstream using Ethernet...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...

Copying FPGA programming files to SD card...

Setting FPGA bitstream and devicetree for boot...

Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd

Set Bitstream to hdlcoder_rd/zcu102_single.bit

Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd

Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb

Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Rebooting Xilinx SoC at 192.168.1.101...

Reboot may take several seconds...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming the FPGA bitstream has been completed successfully.

Loading weights to Conv Processor.

Conv Weights loaded. Current time is 27-Dec-2022 13:38:36

Test Network

Pass the upscaled luminance component, Iy_bicubic, through the trained VDSR network. To do this, use the helper function, runNetworkOnHWVDSR and observe the residual image obtained. To view the code of this function, see Helper Functions.

Finished writing input activations.

Running in multi-frame mode with 384 inputs.

          Deep Learning Processor Profiler Performance Results

               LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                     -------------             -------------              ---------        ---------       ---------

Network 9648715 0.04386 384 3704890389 22.8 Conv1 76769 0.00035 Conv2 527210 0.00240 Conv3 527660 0.00240 Conv4 527358 0.00240 Conv5 527328 0.00240 Conv6 527678 0.00240 Conv7 527244 0.00240 Conv8 527190 0.00240 Conv9 527233 0.00240 Conv10 527560 0.00240 Conv11 527263 0.00240 Conv12 527219 0.00240 Conv13 527536 0.00240 Conv14 527414 0.00240 Conv15 527228 0.00240 Conv16 527596 0.00240 Conv17 527401 0.00240 Conv18 527211 0.00240 Conv19 527451 0.00240 Conv20 79115 0.00036

The clock frequency of the DL processor is: 220MHz

I = double(Iresidual); imshow(Iresidual,[]) title("Residual Image from VDSR")

Add the residual image to the upscaled luminance component to generate the high-resolution VDSR luminance component.

Isr = Iy_bicubic + Iresidual;

Concatenate the high-resolution VDSR luminance component with the upscaled color components. Convert the image to the RGB color space by using the ycbcr2rgb (Image Processing Toolbox) function. The result is the final high-resolution color image.

Ivdsr = ycbcr2rgb(cat(3,Isr,Icb_bicubic,Icr_bicubic)); imshow(Ivdsr) title("High-Resolution Image Obtained Using VDSR")

Improve Image Resolution By Using a Quantized VDSR Network

The single data type VDSR network performs at 22.8 frames per second. To improve the network performance quantize the network.

Create `dlquantizer` Object

Create a quantized network object by using the dlquantizer object. Set the target execution environment to FPGA.

dlq = dlquantizer(net,ExecutionEnvironment="FPGA");

Calibrate Quantized Network

Use the calibrate method to exercise the network by using sample inputs to collect the range information. The calibrate method exercises the network and collects the dynamic ranges for the learnable parameters of the layers of the network.

For best quantization results, the calibration data must be a representative of actual inputs that are predicted by the network.

imds = augmentedImageDatastore([640,960],Iy_bicubic);
calibrate(dlq,imds);

Prepare Network for Deployment

hW = dlhdl.Workflow(Network=dlq,Bitstream='zcu102_int8',Target=hT);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.

dn = compile(hW,'InputFrameNumberLimit', 600);

Compiling network for Deep Learning FPGA prototyping ...

Targeting FPGA bitstream zcu102_int8.

The network includes the following layers:

 1   'InputLayer'             Image Input         41×41×1 images                                                      (SW Layer)
 2   'Conv1'                  2-D Convolution     64 3×3×1 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
 3   'ReLU1'                  ReLU                ReLU                                                                (HW Layer)
 4   'Conv2'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 5   'ReLU2'                  ReLU                ReLU                                                                (HW Layer)
 6   'Conv3'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 7   'ReLU3'                  ReLU                ReLU                                                                (HW Layer)
 8   'Conv4'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
 9   'ReLU4'                  ReLU                ReLU                                                                (HW Layer)
10   'Conv5'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
11   'ReLU5'                  ReLU                ReLU                                                                (HW Layer)
12   'Conv6'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
13   'ReLU6'                  ReLU                ReLU                                                                (HW Layer)
14   'Conv7'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
15   'ReLU7'                  ReLU                ReLU                                                                (HW Layer)
16   'Conv8'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
17   'ReLU8'                  ReLU                ReLU                                                                (HW Layer)
18   'Conv9'                  2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
19   'ReLU9'                  ReLU                ReLU                                                                (HW Layer)
20   'Conv10'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
21   'ReLU10'                 ReLU                ReLU                                                                (HW Layer)
22   'Conv11'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
23   'ReLU11'                 ReLU                ReLU                                                                (HW Layer)
24   'Conv12'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
25   'ReLU12'                 ReLU                ReLU                                                                (HW Layer)
26   'Conv13'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
27   'ReLU13'                 ReLU                ReLU                                                                (HW Layer)
28   'Conv14'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
29   'ReLU14'                 ReLU                ReLU                                                                (HW Layer)
30   'Conv15'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
31   'ReLU15'                 ReLU                ReLU                                                                (HW Layer)
32   'Conv16'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
33   'ReLU16'                 ReLU                ReLU                                                                (HW Layer)
34   'Conv17'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
35   'ReLU17'                 ReLU                ReLU                                                                (HW Layer)
36   'Conv18'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
37   'ReLU18'                 ReLU                ReLU                                                                (HW Layer)
38   'Conv19'                 2-D Convolution     64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
39   'ReLU19'                 ReLU                ReLU                                                                (HW Layer)
40   'Conv20'                 2-D Convolution     1 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
41   'FinalRegressionLayer'   Regression Output   mean-squared-error with response 'ResponseImage'                    (SW Layer)

Notice: The layer 'InputLayer' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.

Notice: The layer 'FinalRegressionLayer' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.

Compiling layer group: Conv1>>Conv20 ...

Compiling layer group: Conv1>>Conv20 ... complete.

Allocating external memory buffers:

      offset_name          offset_address    allocated_space 
_______________________    ______________    ________________

"InputDataOffset"           "0x00000000"     "8.0 MB"        
"OutputResultOffset"        "0x00800000"     "8.0 MB"        
"SchedulerDataOffset"       "0x01000000"     "0.0 MB"        
"SystemBufferOffset"        "0x01000000"     "28.0 MB"       
"InstructionDataOffset"     "0x02c00000"     "4.0 MB"        
"ConvWeightDataOffset"      "0x03000000"     "4.0 MB"        
"EndOffset"                 "0x03400000"     "Total: 52.0 MB"

Network compilation complete.

Program Bitstream onto FPGA and Download Network Weights

Programming FPGA Bitstream using Ethernet...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...

Copying FPGA programming files to SD card...

Setting FPGA bitstream and devicetree for boot...

Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd

Set Bitstream to hdlcoder_rd/zcu102_int8.bit

Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd

Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb

Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Rebooting Xilinx SoC at 192.168.1.101...

Reboot may take several seconds...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming the FPGA bitstream has been completed successfully.

Loading weights to Conv Processor.

Conv Weights loaded. Current time is 27-Dec-2022 13:41:24

Test Network

Pass the upscaled luminance component, Iy_bicubic, through the trained VDSR network. To do this, use the helper function, runNetworkOnHW.

Finished writing input activations.

Running in multi-frame mode with 384 inputs.

          Deep Learning Processor Profiler Performance Results

               LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                     -------------             -------------              ---------        ---------       ---------

Network 2740874 0.01096 384 1052577708 91.2 Conv1 38433 0.00015 Conv2 148211 0.00059 Conv3 148181 0.00059 Conv4 148406 0.00059 Conv5 148126 0.00059 Conv6 148047 0.00059 Conv7 148156 0.00059 Conv8 148122 0.00059 Conv9 148311 0.00059 Conv10 148289 0.00059 Conv11 148534 0.00059 Conv12 148056 0.00059 Conv13 148181 0.00059 Conv14 148113 0.00059 Conv15 148195 0.00059 Conv16 148209 0.00059 Conv17 148183 0.00059 Conv18 148456 0.00059 Conv19 148257 0.00059 Conv20 34357 0.00014

The clock frequency of the DL processor is: 250MHz

Iresidual = double(Iresidual); imshow(Iresidual,[]) title("Residual Image from Quantized VDSR")

Add the residual image to the upscaled luminance component to generate the high-resolution VDSR luminance component.

Isr = Iy_bicubic + Iresidual;

Ivdsrquantized = ycbcr2rgb(cat(3,Isr,Icb_bicubic,Icr_bicubic)); imshow(Ivdsrquantized) title("High-Resolution Image Obtained Using Quantized VDSR")

Compare Visual and Performance Results

Examine a small region inside of each image. Specify a region of interest (ROI) using a vector roi in the format [x y width height]. The elements define the x- and y- coordinates of the top-left corner, and the width and height of the ROI.

Crop the high-resolution images to this ROI, and display the results as a montage. The quantized VDSR image has clearer details and sharper resolutions than the high-resolution image created using single data type VDSR.

montage({imcrop(Ivdsr,roi);imcrop(Ivdsrquantized,roi)}) title("High Resolution Results Using VDSR (Left) vs. Quantized VDSR (Right)");

The quantized VDSR network has a performance of 91.2 frames per second compared to the 22.8 frames per second for the single-data-type network.

Helper Functions

The runNetworkOnHWVDSR function:

Generates a bicubic interpolation of the input image.
Splits the bicubic interpolation image into smaller blocks.
Passes the smaller blocks as multi-frame inputs to the deployed network.
Uses the predict method to retrieve the higher-resolution individual smaller block images.
Combines the higher-resolution block images into the output image and returns the output image.

networkInputSize = net.Layers(1).InputSize(1:2); imgTestSize = size(Iy_bicubic); Iy_bicubic_new = zeros(imgTestSize + ([41 41]-mod(imgTestSize,41))); Iy_bicubic_new(1:imgTestSize(1), 1:imgTestSize(2)) = Iy_bicubic ; numberOfBlocks = ceil(imgTestSize./networkInputSize); totalBlocks = prod(numberOfBlocks); splitImage = mat2cell(Iy_bicubic_new, networkInputSize(1)*ones(1, numberOfBlocks(1)), networkInputSize(1)*ones(1, numberOfBlocks(2))); multiFrameInput = zeros([networkInputSize 1 totalBlocks]); for i=1:(totalBlocks) multiFrameInput(:,:,:,i) = splitImage{i}; end

residualImage = hW.predict(multiFrameInput, 'Profile','on');

concatenatedResult = []; for i=1:numberOfBlocks(2) subset = residualImage(:,:,numberOfBlocks(1)(i-1)+1:inumberOfBlocks(1)); verticalConcatenation = []; for j=1:numberOfBlocks(1) verticalConcatenation = [verticalConcatenation; subset(:,:,j)]; end concatenatedResult = [concatenatedResult verticalConcatenation]; end Iresidual = concatenatedResult(1:imgTestSize(1), 1:imgTestSize(2));

References

[1] Kim, J., J. K. Lee, and K. M. Lee. "Accurate Image Super-Resolution Using Very Deep Convolutional Networks." Proceedings of the IEEE® Conference on Computer Vision and Pattern Recognition. 2016, pp. 1646-1654.

Increase Image Resolution Using VDSR Network Running on FPGA - MATLAB & Simulink (original) (raw)

Create Sample Low-Resolution Image

Improve Image Resolution Using Bicubic Interpolation

Improve Image Resolution Using a Pretrained VDSR Network in Single Precision

Load Pretrained Network

Define FPGA Board Interface

Prepare Network for Deployment

Compile Network

Compiling network for Deep Learning FPGA prototyping ...

Targeting FPGA bitstream zcu102_single.

The network includes the following layers:

Notice: The layer 'InputLayer' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.

Notice: The layer 'FinalRegressionLayer' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.

Compiling layer group: Conv1>>Conv20 ...

Compiling layer group: Conv1>>Conv20 ... complete.

Allocating external memory buffers:

Network compilation complete.

Program the Bitstream onto the FPGA and Download Network Weights

Programming FPGA Bitstream using Ethernet...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...

Copying FPGA programming files to SD card...

Setting FPGA bitstream and devicetree for boot...

Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd

Set Bitstream to hdlcoder_rd/zcu102_single.bit

Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd

Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb

Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Rebooting Xilinx SoC at 192.168.1.101...

Reboot may take several seconds...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming the FPGA bitstream has been completed successfully.

Loading weights to Conv Processor.

Conv Weights loaded. Current time is 27-Dec-2022 13:38:36

Test Network

Finished writing input activations.

Running in multi-frame mode with 384 inputs.

Improve Image Resolution By Using a Quantized VDSR Network

Create dlquantizer Object

Calibrate Quantized Network

Prepare Network for Deployment

Compile Network

Compiling network for Deep Learning FPGA prototyping ...

Targeting FPGA bitstream zcu102_int8.

The network includes the following layers:

Notice: The layer 'InputLayer' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.

Notice: The layer 'FinalRegressionLayer' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.

Compiling layer group: Conv1>>Conv20 ...

Compiling layer group: Conv1>>Conv20 ... complete.

Allocating external memory buffers:

Network compilation complete.

Program Bitstream onto FPGA and Download Network Weights

Programming FPGA Bitstream using Ethernet...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...

Copying FPGA programming files to SD card...

Setting FPGA bitstream and devicetree for boot...

Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd

Set Bitstream to hdlcoder_rd/zcu102_int8.bit

Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd

Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb

Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Rebooting Xilinx SoC at 192.168.1.101...

Reboot may take several seconds...

Attempting to connect to the hardware board at 192.168.1.101...

Connection successful

Programming the FPGA bitstream has been completed successfully.

Loading weights to Conv Processor.

Conv Weights loaded. Current time is 27-Dec-2022 13:41:24

Test Network

Finished writing input activations.

Running in multi-frame mode with 384 inputs.

Compare Visual and Performance Results

Helper Functions

References

See Also

Create `dlquantizer` Object