G-CRS:GPU Accelerated Cauchy Reed-Solomon Coding (original) (raw)
Section 1: Throughput Under Different Workloads
Section 1.1 Introduction
We measured the Throughput of G-CRS with each block contains 128 threads, and the number of blocks ranged from 24 to 213. This section provided the source code and experimental results for our measurement on Maxwell GTX 980 and Pascal Titan X
Section 1.2 Source Code
Source Code: MeasureBufferSize.zip
Please take a look at the README.md to see how to run the program
Section 1.3 Experimental Results
Results on Maxwell GTX 980
m=1 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 11.95 | 11.60 | 11.74 | 11.70 | 11.77 |
25 | 23.56 | 22.99 | 23.31 | 23.38 | 23.64 |
26 | 47.18 | 45.00 | 46.53 | 41.90 | 42.32 |
27 | 57.38 | 50.34 | 52.48 | 50.03 | 55.94 |
28 | 71.24 | 77.57 | 83.39 | 77.92 | 69.04 |
29 | 93.14 | 99.74 | 101.86 | 98.15 | 89.26 |
210 | 114.82 | 115.98 | 118.86 | 116.74 | 113.27 |
211 | 128.72 | 131.94 | 128.92 | 129.22 | 128.50 |
212 | 139.38 | 139.12 | 138.04 | 138.09 | 138.61 |
213 | 143.78 | 144.19 | 144.28 | 144.67 | 144.71 |
m=2 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 12.07 | 11.64 | 11.77 | 11.73 | 12.07 |
25 | 24.42 | 23.09 | 20.77 | 20.75 | 21.29 |
26 | 29.93 | 26.40 | 27.43 | 27.47 | 27.19 |
27 | 48.47 | 57.10 | 57.52 | 54.49 | 43.60 |
28 | 68.15 | 73.79 | 74.59 | 71.70 | 62.92 |
29 | 92.38 | 91.57 | 94.52 | 92.61 | 86.62 |
210 | 105.82 | 109.96 | 105.58 | 105.78 | 105.17 |
211 | 117.68 | 119.47 | 119.02 | 118.03 | 118.62 |
212 | 126.80 | 127.90 | 126.96 | 126.98 | 126.20 |
213 | 131.40 | 130.35 | 130.53 | 130.47 | 131.38 |
m=3 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 11.94 | 11.65 | 10.63 | 10.54 | 10.77 |
25 | 15.79 | 14.20 | 14.26 | 13.29 | 13.52 |
26 | 26.84 | 23.01 | 23.40 | 22.02 | 23.55 |
27 | 47.66 | 43.70 | 42.08 | 40.05 | 42.44 |
28 | 65.55 | 67.68 | 65.97 | 63.31 | 53.83 |
29 | 88.00 | 86.82 | 84.40 | 79.72 | 73.30 |
210 | 98.95 | 97.65 | 102.07 | 96.13 | 88.14 |
211 | 109.60 | 109.05 | 110.17 | 105.3 | 97.94 |
212 | 116.71 | 116.20 | 116.98 | 112.00 | 104.65 |
213 | 120.60 | 118.91 | 120.46 | 114.89 | 107.87 |
m=4 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 11.71 | 9.62 | 9.56 | 9.32 | 9.63 |
25 | 14.44 | 11.73 | 12.09 | 11.48 | 12.54 |
26 | 24.17 | 26.60 | 26.17 | 23.29 | 20.10 |
27 | 41.54 | 43.82 | 42.77 | 39.98 | 33.52 |
28 | 63.53 | 58.97 | 58.05 | 52.87 | 47.26 |
29 | 80.42 | 80.23 | 71.36 | 63.30 | 60.02 |
210 | 92.98 | 92.57 | 85.99 | 74.69 | 69.45 |
211 | 102.41 | 102.95 | 95.02 | 82.01 | 76.42 |
212 | 108.53 | 108.16 | 99.28 | 85.25 | 80.00 |
213 | 111.48 | 109.81 | 102.65 | 87.79 | 82.14 |
Results on Pascal Titan X
m=1 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 14.16 | 14.95 | 15.15 | 15.16 | 15.04 |
25 | 36.66 | 33.89 | 34.83 | 33.92 | 35.43 |
26 | 76.69 | 69.39 | 74.92 | 72.04 | 74.01 |
27 | 82.28 | 80.96 | 81.66 | 76.19 | 78.41 |
28 | 145.82 | 125.30 | 127.37 | 116.20 | 124.03 |
29 | 159.27 | 151.46 | 154.13 | 153.34 | 158.76 |
210 | 225.33 | 227.30 | 219.83 | 207.59 | 230.65 |
211 | 249.45 | 239.98 | 236.42 | 245.38 | 259.73 |
212 | 267.65 | 259.40 | 254.72 | 253.72 | 268.44 |
213 | 277.29 | 269.12 | 267.61 | 267.68 | 277.87 |
m=2 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 18.56 | 21.22 | 21.77 | 19.37 | 19.87 |
25 | 44.46 | 42.97 | 34.07 | 33.12 | 34.8 |
26 | 41.43 | 40.04 | 40.59 | 37.89 | 38.70 |
27 | 75.62 | 71.36 | 68.79 | 67.12 | 69.54 |
28 | 117.73 | 114.90 | 110.02 | 100.84 | 99.76 |
29 | 164.95 | 171.73 | 162.69 | 160.06 | 153.72 |
210 | 207.43 | 195.60 | 203.04 | 189.38 | 195.53 |
211 | 235.23 | 235.10 | 217.73 | 209.91 | 224.77 |
212 | 252.54 | 240.16 | 243.93 | 239.96 | 250.72 |
213 | 256.18 | 251.11 | 244.91 | 244.62 | 258.17 |
m=3 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 17.56 | 19.88 | 19.00 | 18.10 | 18.92 |
25 | 20.84 | 20.15 | 20.09 | 17.77 | 19.54 |
26 | 55.18 | 49.17 | 44.59 | 45.19 | 44.52 |
27 | 81.21 | 74.56 | 72.32 | 69.59 | 71.75 |
28 | 122.28 | 107.83 | 105.39 | 102.08 | 102.87 |
29 | 198.34 | 174.65 | 152.89 | 127.04 | 126.47 |
210 | 202.57 | 189.84 | 171.17 | 159.15 | 153.76 |
211 | 222.40 | 205.54 | 203.70 | 192.70 | 217.93 |
212 | 236.22 | 226.82 | 220.66 | 215.90 | 228.09 |
213 | 240.97 | 231.18 | 229.94 | 226.69 | 237.06 |
m=4 (Unit for Throughput: GB/s)
Blocks | w=4 | w=5 | w=6 | w=7 | w=8 |
---|---|---|---|---|---|
24 | 18.83 | 17.62 | 17.91 | 15.39 | 16.62 |
25 | 22.23 | 20.15 | 20.35 | 18.97 | 19.45 |
26 | 42.54 | 37.20 | 36.41 | 36.87 | 36.92 |
27 | 78.57 | 67.11 | 66.35 | 61.05 | 63.53 |
28 | 112.47 | 98.60 | 95.82 | 89.88 | 90.13 |
29 | 176.87 | 144.88 | 140.98 | 137.58 | 140.28 |
210 | 203.15 | 178.58 | 174.79 | 162.83 | 166.07 |
211 | 218.14 | 198.77 | 199.25 | 189.35 | 178.73 |
212 | 221.21 | 212.86 | 207.82 | 197.71 | 193.00 |
213 | 226.567 | 218.06 | 213.79 | 205.06 | 201.72 |
Section 2: Peak Raw Coding Performance
Section 2.1 Introduction
We measured the raw coding throughput of G-CRS. The related code and experimental results are provided in this section.
Section 2.2 Source Code
Source Code: PeakRawMeasure.zip
Please take a look at the README.md to see how to run the program
Section 2.3 Experimental Results
Section 3: Optimization Analysis
Section 3.1 Introduction
We evaluate the effectiveness of our optimization strategies by adding our optimization strategies one by one, the code is in the file named EffectivenessOfOpt.zip. We investigate the dominating factor that contributes the most to the performance improvement by removing one optimization strategy each time, the code is in the file named DominatedFactor.zip.
Section 3.2 Source Code
Source Code: EffectivenessOfOpt.zip
Source Code: DominatedFactor.zip
Please take a look at the README.md to see how to run the program
Section 3.3 Experimental Results for Effectiveness of our Optimization Strategies
Results of Encoding on Maxwell GTX 980, note that the unit for throughput is GB/s for each table
k=10, m=1
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 21.00 | 16.64 | 16.88 | 16.61 | 16.79 |
Bitmatrix | 108.76 | 106.49 | 108.55 | 103.54 | 111.07 |
EfficientAccess | 144.28 | 142.29 | 142.73 | 138.00 | 137.65 |
Bandwidth | 145.31 | 143.07 | 143.60 | 141.28 | 141.26 |
G-CRS | 145.19 | 144.70 | 145.38 | 145.69 | 145.22 |
k=10, m=2
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 16.38 | 15.40 | 15.32 | 14.60 | 14.66 |
Bitmatrix | 52.45 | 51.26 | 52.75 | 50.08 | 54.10 |
EfficientAccess | 99.48 | 86.37 | 83.40 | 74.42 | 74.01 |
Bandwidth | 132.22 | 130.96 | 129.12 | 116.08 | 110.60 |
G-CRS | 132.15 | 131.07 | 131.74 | 131.83 | 132.10 |
k=10, m=3
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 14.78 | 14.11 | 14.01 | 13.24 | 12.95 |
Bitmatrix | 34.50 | 33.18 | 33.97 | 31.80 | 34.81 |
EfficientAccess | 65.25 | 56.92 | 55.04 | 49.15 | 48.77 |
Bandwidth | 121.29 | 107.68 | 99.31 | 84.65 | 77.53 |
G-CRS | 121.18 | 119.94 | 120.69 | 115.97 | 108.48 |
k=10, m=4
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 13.30 | 12.55 | 11.99 | 11.11 | 8.01 |
Bitmatrix | 25.81 | 24.26 | 24.87 | 23.04 | 25.96 |
EfficientAccess | 49.00 | 42.78 | 41.25 | 36.79 | 36.27 |
Bandwidth | 92.05 | 75.59 | 65.75 | 57.35 | 51.56 |
G-CRS | 111.86 | 110.89 | 104.07 | 88.64 | 82.62 |
Results on Pascal Titan X, note that the unit for throughput is GB/s for each table
k=10, m=1
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 27.30 | 23.24 | 26.13 | 25.75 | 26.19 |
Bitmatrix | 247.81 | 226.05 | 219.92 | 198.12 | 250.64 |
EfficientAccess | 275.92 | 268.05 | 262.76 | 259.54 | 277.43 |
Bandwidth | 276.33 | 269.67 | 263.55 | 260.41 | 277.06 |
G-CRS | 278.77 | 272.38 | 267.65 | 267.04 | 279.52 |
k=10, m=2
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 25.87 | 24.60 | 24.81 | 24.26 | 24.39 |
Bitmatrix | 142.74 | 129.84 | 129.95 | 119.73 | 134.80 |
EfficientAccess | 214.25 | 189.66 | 184.63 | 173.71 | 174.02 |
Bandwidth | 254.54 | 244.07 | 237.00 | 231.69 | 244.75 |
G-CRS | 256.23 | 249.15 | 245.73 | 243.33 | 257.80 |
k=10, m=3
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 21.00 | 16.64 | 16.88 | 16.61 | 16.79 |
Bitmatrix | 93.20 | 83.97 | 85.40 | 77.84 | 86.44 |
EfficientAccess | 147.76 | 132.20 | 128.17 | 116.88 | 115.22 |
Bandwidth | 237.42 | 217.59 | 210.35 | 197.95 | 186.95 |
G-CRS | 240.68 | 233.10 | 227.06 | 223.70 | 238.60 |
k=10, m=4
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
Base | 17.87 | 17.06 | 19.35 | 19.72 | 13.17 |
Bitmatrix | 69.89 | 62.69 | 63.26 | 57.95 | 65.93 |
EfficientAccess | 112.99 | 100.39 | 96.80 | 88.05 | 85.91 |
Bandwidth | 216.47 | 185.47 | 163.45 | 143.10 | 128.80 |
G-CRS | 228.91 | 220.57 | 213.31 | 203.72 | 200.66 |
Section 3.4 Experimental Results for Dominating Factor
Results of Encoding on Maxwell GTX 980, note that the unit for throughput is GB/s for each table
k=10, m=1
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 23.67 | 17.91 | 18.01 | 17.91 | 17.82 |
-EfficientAccess | 103.92 | 100.83 | 104.09 | 98.36 | 110.58 |
-Bandwidth | 144.71 | 144.29 | 145.17 | 143.94 | 144.74 |
-DivergenceOpt | 145.31 | 143.07 | 143.60 | 141.28 | 141.26 |
G-CRS | 145.19 | 144.70 | 145.38 | 145.69 | 145.22 |
k=10, m=2
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 17.95 | 17.73 | 17.89 | 17.62 | 15.66 |
-EfficientAccess | 104.23 | 100.26 | 100.49 | 95.90 | 109.58 |
-Bandwidth | 127.15 | 111.13 | 118.24 | 86.37 | 111.93 |
-DivergenceOpt | 132.22 | 130.96 | 129.12 | 116.08 | 110.60 |
G-CRS | 132.15 | 131.07 | 131.74 | 131.83 | 132.10 |
k=10, m=3
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 18.02 | 17.74 | 17.79 | 16.37 | 14.44 |
-EfficientAccess | 107.23 | 97.64 | 91.84 | 79.49 | 80.61 |
-Bandwidth | 94.29 | 72.48 | 79.17 | 55.57 | 74.04 |
-DivergenceOpt | 121.29 | 107.68 | 99.31 | 84.65 | 77.53 |
G-CRS | 121.18 | 119.94 | 120.69 | 115.97 | 108.48 |
k=10, m=4
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 17.19 | 17.55 | 16.33 | 15.93 | 8.89 |
-EfficientAccess | 102.72 | 98.35 | 52.33 | 54.46 | 69.59 |
-Bandwidth | 73.09 | 53.24 | 59.21 | 40.42 | 55.02 |
-DivergenceOpt | 92.05 | 75.59 | 65.75 | 57.35 | 51.56 |
G-CRS | 111.86 | 110.89 | 104.07 | 88.64 | 82.62 |
Results on Pascal Titan X, note that the unit for throughput is GB/s for each table
k=10, m=1
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 30.36 | 27.46 | 27.65 | 27.31 | 27.26 |
-EfficientAccess | 239.29 | 227.47 | 217.89 | 194.79 | 249.79 |
-Bandwidth | 280.78 | 263.64 | 265.05 | 256.51 | 278.27 |
-DivergenceOpt | 276.33 | 269.67 | 263.55 | 260.41 | 277.06 |
G-CRS | 278.77 | 272.38 | 267.65 | 267.04 | 279.52 |
k=10, m=2
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 26.88 | 27.10 | 27.45 | 27.03 | 23.91 |
-EfficientAccess | 253.88 | 210.52 | 203.05 | 183.86 | 241.61 |
-Bandwidth | 244.14 | 204.11 | 216.51 | 186.12 | 224.33 |
-DivergenceOpt | 254.54 | 244.07 | 237.00 | 231.69 | 244.75 |
G-CRS | 256.23 | 249.15 | 245.73 | 243.33 | 257.80 |
k=10, m=3
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 27.12 | 26.75 | 27.23 | 25.43 | 22.16 |
-EfficientAccess | 229.26 | 176.00 | 172.87 | 146.08 | 170.87 |
-Bandwidth | 199.13 | 155.94 | 169.03 | 132.75 | 169.40 |
-DivergenceOpt | 237.42 | 217.59 | 210.35 | 197.95 | 186.95 |
G-CRS | 240.68 | 233.10 | 227.06 | 223.70 | 238.60 |
k=10, m=4
w | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
-Bitmatrix | 26.13 | 26.72 | 25.46 | 24.95 | 23.09 |
-EfficientAccess | 222.01 | 188.70 | 110.48 | 114.58 | 168.92 |
-Bandwidth | 166.10 | 121.27 | 134.42 | 98.53 | 130.52 |
-DivergenceOpt | 216.47 | 185.47 | 163.45 | 143.10 | 128.80 |
G-CRS | 228.91 | 220.57 | 213.31 | 203.72 | 200.66 |