Improving the implementation of cryptography in Tezos Octez (original) (raw)


A safety-critical program is only as trustworthy as the libraries it relies on, so we at Nomadic Labs pay close attention to our tools and dependencies — i.e. to our toolchain.

Our toolchain is based on HACL*; a verified library of cryptographic primitives, which include the hash functions which are the backbone of blockchain technology (principally: Blake2), and the digital signatures which we use to assure transaction authenticity (Ed25519, P-256, and secp256k1 2).

In this blog post, we’ll discuss recent improvements to our cryptography toolchain, and how we integrated them into practical OCaml programming of the Tezos Octez implementation. Namely:

In this blog post, we’ll survey

2. The HACL* library

2.1 Quick, correct, and compatible: pick three

HACL* is a cryptographic library1 offering a comprehensive collection of cryptographic primitives (we give three examples below).

HACL* is complex, safety-critical code. Accordingly, it is written in and formally verified using the F* language, and then extracted (compiled) to correct and efficient C using KreMLin, a tool which translates a subset of F* to C. KreMLin also facilitates using the generated C code in OCaml projects — such as the Tezos Octez implementation — by automatically building the scaffolding that developers would otherwise write themselves.

The resulting code offers three formal safety guarantees:

  1. Functional correctness: the code’s behaviour complies with its specification.
  2. Memory safety: memory is managed correctly — so no buffer overflows, dereferencing null pointers, accessing memory after it has been freed, etc.
  3. Secret independence: the C instructions executed, in what order, and any memory accesses, do not depend on any secret values. This protects against timing attacks that might slurp unintended information.

For more details see the original HACL* paper.

The point of this toolstack is that it gives us three important properties which it is far from trivial to reconcile:

  1. Safety guarantees as described above, while
  2. retaining the state-of-the-art performance required of a cryptographic library in a real-world system, and
  3. safely interfacing with the code from within OCaml.

2.2 Three cryptographic primitives

The Tezos Octez implementation has been using HACL* since before its launch, and Nomadic Labs has been actively supporting the continued development of HACL*: via grants to the Prosecco team at Inria Paris; and through the work of engineers at Nomadic.

HACL* offers implementations for all but one2 of the core crytographic primitives used by the Nomadic Labs Octez implementation of Tezos.

Let’s survey three examples, which are cryptographic primitives recently introduced into HACL* and relevant to the Tezos implementation:

2.2.1 P-256

P-256 (also called secp256r1) is an elliptic curve signature algorithm and one of the three signature schemes supported in Tezos: P-256; Ed25519; and secp256k1. P-256 is a NIST standard with wide industry support. It allows interoperability with HSMs (hardware security modules), including hardware wallets and Apple’s Secure Enclave.

Any Tezos address generated with P-256 starts with tz3.

A verified implementation of P-256 by the Prosecco team is now in HACL* and replaces our previous library.

2.2.2 SHA-3

Version 1 of the Tezos protocol environment introduced three hash functions based on the Keccak algorithm:

Starting with the Edo protocol upgrade, these three hash functions are available as Michelson opcodes, alongside the previously-present hash functions BLAKE2 and SHA-256 and SHA-512 (which are two versions of SHA-2).

2.2.3 BLAKE2

BLAKE2 is the main hash function of the Tezos protocol.BLAKE2 hashes everything from individual keys and messages, to whole blocks.3

We now use BLAKE2 via its new HACL* implementation. This gives us the three formal safety guarantees above, and furthermore HACL* offers

Modern cryptographic algorithms, including BLAKE2, are often designed from the ground up to allow implementors to make certain optimizations where possible, such as using hardware features like AVX2 or SIMD mentioned above where the hardware supports this. See a paper on how this fits into HACL*’s verification pipeline: for example, the SIMD implementation is around 30% faster than the non-SIMD one.4

3. EverCrypt: an API for HACL* (and more)

3.1 How EverCrypt enhances HACL*

Having a great library like HACL* is one thing. But how to package its features for developers to efficiently deploy in working code? This is where EverCrypt can help.

EverCrypt is a cryptographic provider that bundles the cryptographic primitives present in HACL* into a unified package. EverCrypt — which like HACL* is written in F* and extracted to correct and efficient C using KreMLin — does not replace HACL* so much as provide an additional interface to access the suite of cryptographic primitives offered by HACL*. EverCrypt also bundles routines written in highly-optimized assembly verified with a tool called Vale.5

We can sum this up as follows:

3.2 Advantages of having a cryptographic provider

In practice, accessing cryptographic primitives through EverCrypt offers two advantages over accessing them directly (e.g. using HACL*): multiplexing and agile interfaces.

See this paper for a much more in-depth look at how EverCrypt works and how it supports the development of verified applications.

3.3 The OCaml API

So now we have a cryptographic library (HACL*) and a cryptographic provider (EverCrypt).

There remains a technical issue: calling C functions (like those of HACL* and EveryCrypt) from an OCaml application (like Octez). How best to call a C function from OCaml?

It depends! One common mechanism is a foreign function interface (FFI). The OCaml FFI allows developers to call C functions; it’s up to the developer to write a binding that matches the signature of the C function, and to manage the relevant memory.

Interfacing with an external library using the OCaml FFI works, but it can be error-prone, time-consuming, and can duplicate effort for each project that uses the external library.

To ameliorate this and facilitate adoption in the OCaml ecosystem, EverCrypt and HACL* support an OCaml API primarily developed by Nomadic Labs. Released as the hacl-star package on opam, this provides an idiomatic, high-level interface to the EverCrypt and HACL* APIs. Starting with the recently-released version 0.4, the API is also fully documented.

As is often the case in programming, a more convenient interface is also a safer one, and hacl-star offers safety benefits compared to binding with the C code directly:

  1. The lower-level bindings which interact directly with the C code are automatically generated as part of the compilation of F* code to C and are thus valid by construction with respect to the C code.
  2. In the higher-level interface, all function calls check the preconditions that these functions have in F*. The formal guarantees of the verified code only hold if the arguments satisfy these preconditions (such as buffers being a certain length or not passing the same buffer as multiple arguments). This prevents developers from using the API incorrectly, and is therefore safer.

The rest of this blog post dives into a description of how the tools above actually get invoked in the Octez codebase. We are proud of our code and would love you to read this — but you are also welcome to skip to the further reading!

4. A deep dive into the OCaml API

The OCaml API is split into

Let’s look in more detail at how hacl-star-raw replaces manually-written bindings with automatically generated ones, and how hacl-star builds on top of this to offer a convenient, safer API.

4.1 The low-level API hacl-star-raw

Consider the example of the SHA-256 function from HACL*. Its C signature is:

void Hacl_Hash_SHA2_hash_256(uint8_t *input, uint32_t input_len, uint8_t *dst);

To call it from OCaml we can write a C stub file and match the C types with compatible OCaml FFI types:

`#include <caml/mlvalues.h> #include <caml/bigarray.h> #include "Hacl_Hash.h"

CAMLprim value ml_Hacl_Hash_SHA2_hash_256(value input, value input_len, value dst) { Hacl_Hash_SHA2_hash_256(String_val(input), Int_val(input_len) String_val(dst)); return Val_unit; } `

Then we can bind this external function to an OCaml function which we can call in the rest of the code as we would any OCaml function:

external sha2_256_hash : Bytes.t -> int -> Bytes.t -> unit = "ml_Hacl_Hash_SHA2_hash_256" [@@noalloc]

However, there’s a simpler way to write bindings.

Using the Ctypes library, we can just write OCaml declarations for the C functions that we want to bind, and the library takes care of the rest. A Ctypes declaration for the SHA-256 example above would look like this:

`open Ctypes module Bindings(F:Cstubs.FOREIGN) = struct open F

let hacl_Hash_SHA2_hash_256 =
  foreign "Hacl_Hash_SHA2_hash_256"
    (ocaml_bytes @-> uint32_t @-> ocaml_bytes @-> returning void)

end `

This is cleaner than before, but we’ve only bound a single function – the SHA-256 hash. We would need to write one such binding for every function from the C library that we want to use in OCaml.

We now take this a step further by having the KreMLin tool automatically generate these Ctypes declarations, at the same time as the C code. This automation saves time and effort and furthermore it ensures that the signatures are correct with respect to the original F* code and the extracted C code, and that they remain in sync when the F* code changes.

This functionality in KreMLin can be fine-tuned,

The output is a collection of bindings that resemble the snippet above, along with the required Ctypes boilerplate and a .depend file listing dependencies between the generated bindings to be used as part of the build.

Thus, we generate bindings for the entirety of the EverCrypt/HACL* code every time a new snapshot is produced. This is then all packaged as the hacl-star-raw opam package.

4.2 The high-level API hacl-star

hacl-star-raw is better than writing C bindings by hand, but it still offers a low-level, C style interface to the library.

To complement this we developed a handwritten idiomatic OCaml API. This further improves convenience and safety; primarily because the preconditions of the original F* functions (which are lost when compiling the F* code to C) can be enforced at runtime.

To illustrate, let’s look at the original F* signature (permalink) of the SHA-256 function above:

`module B = LowStar.Buffer

let hash_st (a: hash_alg) = input:B.buffer uint8 -> input_len:size_t { B.length input = v input_len } -> dst:hash_t a-> ST.Stack unit (requires (fun h -> B.live h input /
B.live h dst /
B.disjoint input dst /
B.length input <= max_input_length a)) (ensures (fun h0 _ h1 -> B.(modifies (loc_buffer dst) h0 h1) /
Seq.equal (B.as_seq h1 dst) (Spec.Agile.Hash.hash a (B.as_seq h0 input))))

[...]

val hash_256: hash_st SHA2_256 `

Let’s break hash_st down:

  1. We can see that hash_256 is an instantiation of a generic SHA-2 function sha_st, parameterized by the specific algorithm. This is a pervasive pattern throughout the library.
  2. Just as we specified in the OCaml binding above, it takes three arguments:
    1. input, which is a uint8_t buffer,
    2. input_len, with a refinement specifying that the size of input must be equal to input_len, and
    3. dst of type hash_t a, which is also a uint8_t buffer of the correct digest size for algorithm a.

Glossing over some of the details, we see in the requires clause above that the function hash_st (and therefore hash_256 too) has certain liveness, disjointness, and other preconditions. In particular:

In the original F* these preconditions are statically checked when compiling code that calls hash_256. But in OCaml we only get to work with the extracted C code, in which this information has been erased (as seen in the C signature of the function). Users of the C library must make sure that the arguments they pass respect the original preconditions of these functions.

In the OCaml API, we do check all of these preconditions at runtime. For example, this is what the functor used internally for hash functions roughly looks like:

module Make_HashFunction (C: Buffer) (Impl : sig val hash_alg : alg val hash : C.buf -> uint32 -> C.buf -> unit end) = struct let hash ~msg ~digest = check_max_input_len Impl.hash_alg (C.size msg); assert (C.size digest = digest_len Impl.hash_alg); assert (C.disjoint msg digest); Impl.hash (C.ctypes_buf msg) (C.size_uint32 msg) (C.ctypes_buf digest) end

Make_HashFunction is parameterized both by

Before calling the bound C function, the buffers are checked to have the correct size and to be disjoint (which for Bytes simply means checking for inequality).

4.3 Using the high-level API

We’ve seen how the OCaml library comes together, from the HACL* C code, to the low-level Ctypes bindings, to the idiomatic OCaml API. Now let’s look at how it can be used.

Most cryptographic algorithms exposed through this OCaml API can be called in more than one way, to suit different use cases. Our SHA-256 example above will conveniently illustrate them.

The API is split in two interfaces:

In Hacl, SHA-256 can be used in two styles:

  1. Hacl.SHA2_256.hash which takes a buffer representing the message and returns the digest
  2. Hacl.SHA2_256.Noalloc.hash which, more in keeping with the C style, takes as inputs both the message buffer and the output buffer into which the digest will be written

The first style is usually more convenient, but the second style can be useful in cases where an output buffer has already been allocated and we don’t want to allocate a new one. The choice is the programmer’s to make: most modules in Hacl and EverCrypt offer both styles above.

EverCrypt offers three further options:

  1. EverCrypt.SHA2_256 is an identical interface to Hacl.SHA2_256, but with a different underlying implementation: Hacl.SHA2_256 uses the portable C implementation of SHA-256; whereas EverCrypt.SHA2_256 uses the multiplexing EverCrypt interface which automatically uses code relying on Intel SHA extensions if the architecture supports it.
  2. EverCrypt.Hash.hash is an agile and multiplexing interface to all hashing functions supported in EverCrypt and it is parameterized by the hashing algorithm:
    let digest = EverCrypt.Hash.hash ~alg:SHA2_256 ~msg  
      
  1. The EverCrypt.Hash incremental hashing interface can be used when we need to update the internal state repeatedly before generating a digest:
    let st = EverCrypt.Hash.init ~alg:SHA2_256 in  
    EverCrypt.Hash.update ~st ~msg; (* can be called multiple times *)  
    let digest = EverCrypt.Hash.finish ~st  
      

5. Further reading

Taken together, these changes constitute a significant investment in improving our codebase, which has brought us up-to-date with the best technology available and allowed us to increase reliability and flexibility, without compromising on performance.

You can read further in:

We hope this survey has been informative and may inspire you to use EverCrypt in your own OCaml projects too!

Acknowledgements

Thanks to Karthikeyan Bhargavan, Natalia Kulatova, and Marina Polubelova of the Prosecco team at INRIA, and to Jonathan Protzenko of Microsoft Research, for their help, support, and collaboration. At Nomadic Labs, the work reported on in this article was carried out principally by Victor Dumitrescu.