WiFiServerSecure: Cache SSL sessions by ZakCodes · Pull Request #7774 · esp8266/Arduino

WiFiClientSecure::setSession allows users to use a feature in BearSSL to cache the SSL session to a server.

BearSSL also supports caching SSL sessions on the server side, so I've added the method WiFiServerSecure::setCache. It lets the user set up a cache that allows BearSSL to resume clients' SSL sessions, which greatly shortens the TLS handshake.

Here are the steps that I have followed when implementing this feature:

If you want to test this feature, I encourage you to use these examples and to enable and disable the cache to see the performance improvements. In order to reset the server's cache, you simply have to reset the microcontroller.

Testing the examples

Here's a Ruby script that I wrote to measure the performance improvement this PR brings. It makes 100 requests using a new SSL session each time, then 100 more reusing the same session.

    #!/bin/env ruby

    require 'net/http'
    require 'benchmark'

    DOMAIN = "<your controller's IP>"
    TIMES = 100

    TEST_URI = URI("https://#{DOMAIN}/")

    def start_session()
      http = Net::HTTP.new(TEST_URI.host, TEST_URI.port)
      http.use_ssl = true
      # Allow self signed certificates
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      return http
    end

    request = Net::HTTP::Get.new(TEST_URI)
    request["Connection"] = "close"

    Benchmark.bm(20) do |bm|
      http = start_session()
      bm.report("don't reuse session:") {
        TIMES.times do |_|
          http = start_session()
          http.request(request)
        end
      }

      # The cached session of the last request is used.
      # Otherwise this would massively slow down the first request
      # when we're trying to test the improvement of cached sessions.
      bm.report("reuse session:") {
        TIMES.times do |_|
          response = http.request(request) # Reuse the cached session.
        end
      }
    end
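The "reuse session" half of the script only measures an improvement if the client really offers its saved session and the server accepts it. The following is a minimal sketch of one way to check that from Ruby with the standard `openssl` library. The helper names `insecure_tls12_context` and `tls_handshake` are hypothetical, and capping the client at TLS 1.2 is an assumption made so that the session is complete as soon as `connect` returns.

```ruby
require 'socket'
require 'openssl'

# Build one client context to share across connections.
def insecure_tls12_context
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE     # allow self-signed certificates
  ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION  # assumption: stay on TLS 1.2
  ctx
end

# Hypothetical helper: performs one handshake, optionally offering a saved
# session, and returns the negotiated session and whether it was resumed.
def tls_handshake(ctx, host, port, session = nil)
  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.sync_close = true              # closing the TLS socket also closes the TCP socket
  ssl.session = session if session   # offer the saved session to the server
  ssl.connect
  { session: ssl.session, reused: ssl.session_reused? }
ensure
  ssl ? ssl.close : tcp&.close
end
```

A possible use is to call `tls_handshake(ctx, host, 443)` once, then call it again with the first result's `:session` as the last argument. With the server cache enabled, the second call would be expected to report `reused: true`; with the cache disabled, or after the microcontroller has been reset, it should report `false`.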

Results

I used this script to test the BearSSL_Server and HelloServerBearSSL examples in three configurations: before this PR, after this PR with the cache disabled, and after this PR with the cache enabled. For BearSSL_Server, I ran the test once with the RSA key and once with the EC key.

BearSSL_Server with the RSA key

Before the PR
                              user     system      total        real
    don't reuse session:  0.288973   0.082934   0.371907 (183.731719)
    reuse session:        0.285080   0.091825   0.376905 (184.243461)

After the PR without caching
                              user     system      total        real
    don't reuse session:  0.367846   0.071787   0.439633 (183.834841)
    reuse session:        0.340344   0.089750   0.430094 (184.414041)

After the PR with caching
                              user     system      total        real
    don't reuse session:  0.312145   0.102594   0.414739 (184.679486)
    reuse session:        0.204378   0.063052   0.267430 (  6.986146)

Summary
| | Don't reuse the session (s) | Reuse the session (s) | Improvement |
|---|---|---|---|
| Before the PR | 183.731719 | 184.243461 | 0.997 |
| After the PR (without caching) | 183.834841 | 184.414041 | 0.997 |
| After the PR (with caching) | 184.679486 | 6.986146 | 26.435 |

The improvement ratio is the "Don't reuse the session" time divided by the "Reuse the session" time.
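As a worked example of that ratio, here is a short sketch that recomputes the Improvement column from the wall-clock totals in the RSA summary table. The names `RSA_TOTALS_S`, `REQUESTS_PER_RUN`, `improvement_ratio` and `seconds_per_request` are made up for this illustration; the numbers are copied from the table.

```ruby
# Wall-clock totals in seconds, copied from the RSA summary table:
# [don't reuse the session, reuse the session].
RSA_TOTALS_S = {
  'Before the PR'                  => [183.731719, 184.243461],
  'After the PR (without caching)' => [183.834841, 184.414041],
  'After the PR (with caching)'    => [184.679486, 6.986146]
}

REQUESTS_PER_RUN = 100 # each total covers 100 requests

# Improvement = time without session reuse / time with session reuse.
def improvement_ratio(no_reuse_s, reuse_s)
  (no_reuse_s / reuse_s).round(3)
end

# Average time of a single request within one run.
def seconds_per_request(total_s)
  total_s / REQUESTS_PER_RUN
end

RSA_TOTALS_S.each do |label, (no_reuse_s, reuse_s)|
  puts format('%-32s ratio %.3f, %.3f s vs %.3f s per request',
              label,
              improvement_ratio(no_reuse_s, reuse_s),
              seconds_per_request(no_reuse_s),
              seconds_per_request(reuse_s))
end
```

Because each total covers 100 requests, dividing by 100 gives the average time per request, which is roughly 1.85 s without resumption and 0.07 s with it in the cached RSA case.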

BearSSL_Server with the EC key

Before the PR
                              user     system      total        real
    don't reuse session:  0.302631   0.104204   0.406835 ( 35.997831)
    reuse session:        0.305480   0.119427   0.424907 ( 36.882044)

After the PR without caching

                              user     system      total        real
    don't reuse session:  0.339462   0.088985   0.428447 ( 36.511242)
    reuse session:        0.320082   0.097038   0.417120 ( 37.105757)

After the PR with caching
                              user     system      total        real
    don't reuse session:  0.338105   0.107970   0.446075 ( 36.216895)
    reuse session:        0.224127   0.066317   0.290444 (  5.896962)

Summary
| | Don't reuse the session (s) | Reuse the session (s) | Improvement |
|---|---|---|---|
| Before the PR | 35.997831 | 36.882044 | 0.976 |
| After the PR (without caching) | 36.511242 | 37.105757 | 0.984 |
| After the PR (with caching) | 36.216895 | 5.896962 | 6.142 |

The improvement ratio is the "Don't reuse the session" time divided by the "Reuse the session" time.

HelloServerBearSSL

Before the PR

                              user     system      total        real
    don't reuse session:  0.304455   0.081010   0.385465 (184.723887)
    reuse session:        0.321123   0.100000   0.421123 (184.567364)

After the PR without caching

                              user     system      total        real
    don't reuse session:  0.344634   0.115067   0.459701 (187.112183)
    reuse session:        0.371673   0.073724   0.445397 (185.084550)

After the PR with caching

                              user     system      total        real
    don't reuse session:  0.345185   0.102189   0.447374 (185.611363)
    reuse session:        0.221588   0.078481   0.300069 (  7.925369)

Summary
| | Don't reuse the session (s) | Reuse the session (s) | Improvement |
|---|---|---|---|
| Before the PR | 184.723887 | 184.567364 | 1.001 |
| After the PR (without caching) | 187.112183 | 185.084550 | 1.011 |
| After the PR (with caching) | 185.611363 | 7.925369 | 23.420 |

The improvement ratio is the "Don't reuse the session" time divided by the "Reuse the session" time.

Analysis

These numbers show that, when caching is enabled and the client resumes its session, this PR makes HTTPS requests about 25x faster with an RSA key and about 6x faster with an EC key. When caching isn't enabled, this PR doesn't appear to affect performance negatively.

BearSSL_Server is faster than HelloServerBearSSL, and its improvement is greater. This PR only speeds up the TLS handshake, so the longer the server takes to parse the request and build a response, the smaller the relative improvement. HelloServerBearSSL implements a full web server, which is slower than BearSSL_Server, which answers every request with the same response without parsing it.

Note that in the part of the script that reuses the session, the first request of the session isn't timed, because this PR doesn't improve it.

Testing the TLS handshake improvement

The previous test measured the speed improvement for the full HTTP request, but this PR should only improve the TLS handshake.
To see this, I tested BearSSL_Server with an RSA key and an EC key using Firefox's network timing analyzer.
This test is a little less rigorous, because I didn't run it 100 times like the others, but it shows the improvement for each part of the request.

RSA key

Before the PR

[Image: rsa-before]

After the PR without caching

[Image: rsa-after-nc]

After the PR with caching

First request:
[Image: rsa-after-wc-1]

All subsequent requests:
[Image: rsa-after-wc-2]

Summary

| | Connection (ms) | Connection improvement | TLS Setup (ms) | TLS Setup improvement | Waiting (ms) | Waiting improvement | Total without connection (ms) | Total improvement |
|---|---|---|---|---|---|---|---|---|
| Before the PR | 6 | 1 | 1730 | 1 | 96 | 1 | 1826 | 1 |
| After the PR, without caching | 48 | 0.125 | 1730 | 1 | 92 | 1.043 | 1822 | 1.002 |
| After the PR, with caching, not cached | 52 | 0.115 | 1740 | 0.994 | 90 | 1.067 | 1830 | 0.998 |
| After the PR, with caching, cached | 3 | 2 | 39 | 44.359 | 15 | 6.4 | 54 | 33.815 |

The improvement is the measure before the PR over the current measure.
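In every row of the table, the "Total without connection" value equals the TLS Setup time plus the Waiting time. The sketch below recomputes those totals and their improvement from the RSA measurements; the constant and method names are invented for this illustration, and the numbers are copied from the table.

```ruby
# [tls_setup_ms, waiting_ms] for the RSA key, copied from the table above.
RSA_PHASES_MS = {
  'Before the PR'                          => [1730, 96],
  'After the PR, without caching'          => [1730, 92],
  'After the PR, with caching, not cached' => [1740, 90],
  'After the PR, with caching, cached'     => [39, 15]
}

# Total without connection = TLS setup + waiting.
def total_without_connection(tls_setup_ms, waiting_ms)
  tls_setup_ms + waiting_ms
end

# Improvement = measure before the PR / current measure.
def phase_improvement(before_ms, current_ms)
  (before_ms.to_f / current_ms).round(3)
end

baseline_total = total_without_connection(*RSA_PHASES_MS['Before the PR'])

RSA_PHASES_MS.each do |label, (tls_setup_ms, waiting_ms)|
  total = total_without_connection(tls_setup_ms, waiting_ms)
  puts format('%-40s %5d ms  %.3f', label, total,
              phase_improvement(baseline_total, total))
end
```

The connection time is left out of the total, which matches the column name. It varies between 3 ms and 52 ms across the rows without following the cache setting.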

EC key

Before the PR

[Image: ec-before]

After the PR without caching

[Image: ec-after-nc]

After the PR with caching

First request:
[Image: ec-after-wc-1]

All subsequent requests:
[Image: ec-after-wc-2]

Summary

| | Connection (ms) | Connection improvement | TLS Setup (ms) | TLS Setup improvement | Waiting (ms) | Waiting improvement | Total without connection (ms) | Total improvement |
|---|---|---|---|---|---|---|---|---|
| Before the PR | 28 | 1 | 309 | 1 | 38 | 1 | 347 | 1 |
| After the PR, without caching | 2 | 14 | 312 | 0.99 | 39 | 0.974 | 351 | 0.989 |
| After the PR, with caching, not cached | 41 | 0.683 | 309 | 1 | 44 | 0.864 | 353 | 0.983 |
| After the PR, with caching, cached | 3 | 9.333 | 84 | 3.679 | 13 | 2.923 | 97 | 3.577 |

The improvement is the measure before the PR over the current measure.

Analysis

These numbers show the same thing as the previous tests: this PR greatly improves the speed of cached requests and doesn't have any noticeable downside.

This test also shows clearly how much of a bottleneck the TLS handshake is without this PR, and how much it improves when the server caches the clients' sessions.

Somehow it also slightly improves the waiting time for the server response. I don't think it means that the server decrypts the request faster or processes it faster in any way. I simply think that this is because when resuming cached sessions the client ends the handshake instead of the server. This means that the client can start sending the application data at the same time as it sends the TLS record to end the handshake. This would therefore reduce the time the client has to wait for the server response.

You can see this on pages 35 and 36 of the TLS 1.2 standard (RFC 5246):

  Client                                               Server

  ClientHello                  -------->
                                                  ServerHello
                                                 Certificate*
                                           ServerKeyExchange*
                                          CertificateRequest*
                               <--------      ServerHelloDone
  Certificate*
  ClientKeyExchange
  CertificateVerify*
  [ChangeCipherSpec]
  Finished                     -------->
                                           [ChangeCipherSpec]
                               <--------             Finished
  Application Data             <------->     Application Data

         Figure 1.  Message flow for a full handshake

  Client                                                Server

  ClientHello                   -------->
                                                   ServerHello
                                            [ChangeCipherSpec]
                                <--------             Finished
  [ChangeCipherSpec]
  Finished                      -------->
  Application Data              <------->     Application Data

      Figure 2.  Message flow for an abbreviated handshake
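To make the difference between the two figures concrete, here is a small sketch that transcribes both message flows and groups consecutive messages from the same sender into flights. It is only a model of the diagrams above, not of BearSSL's implementation, and the names `flight_senders` and `client_waits` are made up for this illustration.

```ruby
# Message flows transcribed from Figures 1 and 2; each entry is
# [sender, message]. Starred messages are optional in the standard.
FULL_HANDSHAKE = [
  [:client, 'ClientHello'],
  [:server, 'ServerHello'],
  [:server, 'Certificate*'],
  [:server, 'ServerKeyExchange*'],
  [:server, 'CertificateRequest*'],
  [:server, 'ServerHelloDone'],
  [:client, 'Certificate*'],
  [:client, 'ClientKeyExchange'],
  [:client, 'CertificateVerify*'],
  [:client, '[ChangeCipherSpec]'],
  [:client, 'Finished'],
  [:server, '[ChangeCipherSpec]'],
  [:server, 'Finished']
]

ABBREVIATED_HANDSHAKE = [
  [:client, 'ClientHello'],
  [:server, 'ServerHello'],
  [:server, '[ChangeCipherSpec]'],
  [:server, 'Finished'],
  [:client, '[ChangeCipherSpec]'],
  [:client, 'Finished']
]

# Group consecutive messages from the same sender into flights and
# return the sender of each flight, in order.
def flight_senders(flow)
  flow.chunk_while { |a, b| a.first == b.first }
      .map { |flight| flight.first.first }
end

# Number of times the client has to wait for a server flight before it
# may send application data.
def client_waits(flow)
  flight_senders(flow).count(:server)
end

[['full', FULL_HANDSHAKE], ['abbreviated', ABBREVIATED_HANDSHAKE]].each do |name, flow|
  senders = flight_senders(flow)
  puts "#{name}: #{senders.length} flights, last sent by #{senders.last}, " \
       "client waits #{client_waits(flow)} time(s)"
end
```

In this model the full handshake has four flights and ends with the server, so the client waits twice before sending application data. The abbreviated handshake has three flights and ends with the client, so the client waits once and can send its request right after its own Finished message.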

Conclusion

This PR doesn't seem to have any negative performance impact, only positive ones. Once the user enables the cache, requests over resumed sessions were about 6 times (EC key) to 25 times (RSA key) faster in the benchmarks above. The slower the encryption is, the more this feature boosts performance. However, users need to be aware that the TLS client they're using must also cache the session in order to experience this performance boost. This is why it is clearly mentioned in the documentation.