Recursive-length prefix (RLP) serialization

Recursive Length Prefix (RLP) serialization is used extensively in Ethereum's execution clients. RLP standardizes the transfer of data between nodes in a space-efficient format. Its purpose is to encode arbitrarily nested arrays of binary data, and it is the primary encoding method used to serialize objects in Ethereum's execution layer. RLP encodes structure only: with the exception of positive integers, it delegates the encoding of specific data types (e.g. strings, floats) to higher-order protocols. Positive integers must be represented in big-endian binary form with no leading zeroes (which makes the integer value zero equivalent to the empty byte array). Any higher-order protocol using RLP must treat deserialized positive integers with leading zeroes as invalid.
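The integer rule can be illustrated with a short Python sketch. The helper names `int_to_min_bytes` and `min_bytes_to_int` are hypothetical and not part of any RLP library; the sketch only shows how a higher-order protocol might produce the minimal big-endian form and reject leading zeroes.

```python
def int_to_min_bytes(value: int) -> bytes:
    """Shortest big-endian byte string for a non-negative integer (0 -> b'')."""
    if value < 0:
        raise ValueError("negative integers have no RLP integer form")
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def min_bytes_to_int(data: bytes) -> int:
    """Inverse of int_to_min_bytes; rejects non-minimal input."""
    if data[:1] == b"\x00":
        raise ValueError("leading zero byte: not a minimal integer encoding")
    return int.from_bytes(data, "big")


print(int_to_min_bytes(0))     # b''
print(int_to_min_bytes(1024))  # b'\x04\x00'
```

The resulting byte string is then encoded like any other string.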

More information is available in the Ethereum yellow paper (Appendix B).

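To use RLP to encode a dictionary, the two suggested canonical forms are:

  - use [[k1,v1],[k2,v2]...] with keys in lexicographic order
  - use the higher-level Patricia Tree encoding as Ethereum does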
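A minimal sketch of one way to make a dictionary canonical, assuming it is represented as a list of [key, value] pairs sorted by key, so that two dictionaries with the same contents always produce the same item. The helper name `dict_to_pairs` is hypothetical, and both keys and values are assumed to be byte strings.

```python
from typing import Dict, List


def dict_to_pairs(mapping: Dict[bytes, bytes]) -> List[List[bytes]]:
    """Return [[k1, v1], [k2, v2], ...] with keys in bytewise lexicographic order."""
    return [[key, mapping[key]] for key in sorted(mapping)]


# Insertion order does not affect the result.
first = dict_to_pairs({b"name": b"alice", b"age": b"\x1e"})
second = dict_to_pairs({b"age": b"\x1e", b"name": b"alice"})
print(first == second)  # True
```

The nested list returned here is an ordinary RLP item and can be passed to an encoder unchanged.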

Definition

The RLP encoding function takes in an item. An item is defined as follows:

  - a string (i.e. byte array) is an item
  - a list of items is an item

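For example, all of the following are items:

  - an empty string;
  - the string containing the word "cat";
  - a list containing any number of strings;
  - a more complex data structure like ["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"].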

Note that in the context of the rest of this page, 'string' means "a certain number of bytes of binary data"; no special encodings are used, and no knowledge about the content of the strings is implied (except as required by the rule against non-minimal positive integers).

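RLP encoding is defined as follows:

  - For a positive integer, it is converted to the shortest byte array whose big-endian interpretation is the integer, and then encoded as a string according to the rules below.
  - For a single byte whose value is in the [0x00, 0x7f] range, that byte is its own RLP encoding.
  - Otherwise, if a string is 0-55 bytes long, the RLP encoding consists of a single byte with value 0x80 plus the length of the string, followed by the string. The range of the first byte is thus [0x80, 0xb7].
  - If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xb7 plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. For example, a 1024-byte string is encoded as \xb9\x04\x00 followed by the string. The range of the first byte is thus [0xb8, 0xbf].
  - If a string is 2^64 bytes long or longer, it may not be encoded.
  - If the total payload of a list (i.e. the combined length of all its RLP-encoded items) is 0-55 bytes long, the RLP encoding consists of a single byte with value 0xc0 plus the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xc0, 0xf7].
  - If the total payload of a list is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xf7 plus the length in bytes of the length of the payload in binary form, followed by the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xf8, 0xff].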

In code, this is:

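    def rlp_encode(item):
        # A string is a bytes object; a single byte below 0x80 is its own encoding.
        if isinstance(item, bytes):
            if len(item) == 1 and item[0] < 0x80:
                return item
            return encode_length(len(item), 0x80) + item
        # A list: concatenate the encodings of its items, then prefix the payload length.
        elif isinstance(item, list):
            output = b''
            for element in item:
                output += rlp_encode(element)
            return encode_length(len(output), 0xc0) + output
        raise TypeError("item must be bytes or a list of items")

    def encode_length(length, offset):
        # Short form: the length fits into the prefix byte itself.
        if length < 56:
            return bytes([length + offset])
        # Long form: the prefix gives the size of the big-endian length that follows.
        elif length < 256**8:
            binary_length = to_binary(length)
            return bytes([len(binary_length) + offset + 55]) + binary_length
        raise ValueError("input too long")

    def to_binary(x):
        # Big-endian bytes of x with no leading zeroes; 0 maps to the empty string.
        if x == 0:
            return b''
        # Integer division avoids the precision loss of int(x / 256) for large x.
        return to_binary(x // 256) + bytes([x % 256])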

Examples
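The following sketch prints the encodings of a few small items. So that the block runs on its own, it includes a condensed encoder (`encode` and `encode_length` here are a compact restatement of the encoding rules, not a reference implementation), and strings are modelled as Python `bytes`.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of the given length."""
    if length < 56:
        return bytes([offset + length])
    if length >= 256**8:
        raise ValueError("input too long")
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def encode(item) -> bytes:
    """Encode a bytes object or a (possibly nested) list of bytes objects."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return encode_length(len(item), 0x80) + item
    payload = b"".join(encode(element) for element in item)
    return encode_length(len(payload), 0xc0) + payload


print(encode(b"dog").hex())                   # 83646f67
print(encode([b"cat", b"dog"]).hex())         # c88363617483646f67
print(encode(b"").hex())                      # 80 (empty string, also the integer 0)
print(encode([]).hex())                       # c0 (empty list)
print(encode(b"\x00").hex())                  # 00 (the byte 0x00, not the integer 0)
print(encode(b"\x0f").hex())                  # 0f (the integer 15)
print(encode(b"\x04\x00").hex())              # 820400 (the integer 1024)
print(encode([[], [[]], [[], [[]]]]).hex())   # c7c0c1c0c3c0c1c0
print(encode(b"a" * 56)[:2].hex())            # b838 (long-form string prefix)
print(encode(b"a" * 1024)[:3].hex())          # b90400
```

Note the difference between the empty string (0x80), which is also how the integer zero is encoded, and the single byte 0x00, which encodes as itself.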

RLP decoding

Following the rules of RLP encoding, the input to RLP decoding is treated as an array of binary data. The decoding process is as follows:

  1. read the first byte (i.e. the prefix) of the input to determine the data type, the length of the actual data, and its offset;
  2. decode the data according to its type and offset, respecting the minimal encoding rule for positive integers;
  3. continue decoding the rest of the input.

The rules for determining the data type and offset are as follows:

  1. if the first byte (i.e. the prefix) is in the range [0x00, 0x7f], the data is a string, and the string is exactly the first byte itself;
  2. if the first byte is in the range [0x80, 0xb7], the data is a string whose length equals the first byte minus 0x80, and the string follows the first byte;
  3. if the first byte is in the range [0xb8, 0xbf], the data is a string; the length of the string, stored in a number of bytes equal to the first byte minus 0xb7, follows the first byte, and the string follows its length;
  4. if the first byte is in the range [0xc0, 0xf7], the data is a list; the concatenation of the RLP encodings of all items of the list, whose total payload length equals the first byte minus 0xc0, follows the first byte;
  5. if the first byte is in the range [0xf8, 0xff], the data is a list; the total payload length of the list, stored in a number of bytes equal to the first byte minus 0xf7, follows the first byte, and the concatenation of the RLP encodings of all items of the list follows the payload length.
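The five prefix ranges can be summarized in a small lookup sketch. The function name `classify_prefix` and the shape of its return value are assumptions made for illustration only.

```python
def classify_prefix(prefix: int) -> tuple:
    """Map a prefix byte to (type, form, n).

    For the "single byte" and "short" forms, n is the payload length in bytes.
    For the "long" forms, n is the number of bytes that hold the payload length.
    """
    if not 0 <= prefix <= 0xff:
        raise ValueError("prefix must be a single byte")
    if prefix <= 0x7f:
        return ("string", "single byte", 1)
    if prefix <= 0xb7:
        return ("string", "short", prefix - 0x80)
    if prefix <= 0xbf:
        return ("string", "long", prefix - 0xb7)
    if prefix <= 0xf7:
        return ("list", "short", prefix - 0xc0)
    return ("list", "long", prefix - 0xf7)


print(classify_prefix(0x83))  # ('string', 'short', 3)
print(classify_prefix(0xb9))  # ('string', 'long', 2)
print(classify_prefix(0xc8))  # ('list', 'short', 8)
```

A full decoder additionally has to check that the input actually contains as many bytes as the prefix announces.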

In code, this is:

    def rlp_decode(data):
        # Decode exactly one item; anything left over is treated as an error.
        item, consumed = decode_item(data)
        if consumed != len(data):
            raise ValueError("trailing bytes after RLP item")
        return item

    def decode_item(data):
        # Return the first item in data and the number of bytes it occupies.
        offset, data_len, kind = decode_length(data)
        payload = data[offset:offset + data_len]
        if kind is bytes:
            return payload, offset + data_len
        # A list: keep decoding items until the payload is used up.
        items, position = [], 0
        while position < len(payload):
            element, consumed = decode_item(payload[position:])
            items.append(element)
            position += consumed
        return items, offset + data_len

    def decode_length(data):
        # Return (payload offset, payload length, type) for the first item in data.
        length = len(data)
        if length == 0:
            raise ValueError("input is empty")
        prefix = data[0]
        if prefix <= 0x7f:
            return (0, 1, bytes)
        elif prefix <= 0xb7:
            str_len = prefix - 0x80
            if length > str_len:
                return (1, str_len, bytes)
        elif prefix <= 0xbf:
            len_of_str_len = prefix - 0xb7
            if length > len_of_str_len:
                str_len = to_integer(data[1:1 + len_of_str_len])
                if length > len_of_str_len + str_len:
                    return (1 + len_of_str_len, str_len, bytes)
        elif prefix <= 0xf7:
            list_len = prefix - 0xc0
            if length > list_len:
                return (1, list_len, list)
        else:
            len_of_list_len = prefix - 0xf7
            if length > len_of_list_len:
                list_len = to_integer(data[1:1 + len_of_list_len])
                if length > len_of_list_len + list_len:
                    return (1 + len_of_list_len, list_len, list)
        # Reached when the input is shorter than the prefix announces.
        raise ValueError("input does not conform to RLP encoding form")

    def to_integer(b):
        # Interpret b as a big-endian unsigned integer.
        length = len(b)
        if length == 0:
            raise ValueError("input is empty")
        elif length == 1:
            return b[0]
        return b[-1] + to_integer(b[:-1]) * 256
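As a worked example, the decoding steps can be traced by hand on the nine bytes c8 83 63 61 74 83 64 6f 67. The sketch below deliberately handles only what this input needs (one short list containing short strings) and is not a general decoder.

```python
encoded = bytes.fromhex("c88363617483646f67")

# Step 1: the prefix 0xc8 lies in [0xc0, 0xf7], so the data is a list whose
# payload is 0xc8 - 0xc0 = 8 bytes long and starts right after the prefix.
prefix = encoded[0]
assert 0xc0 <= prefix <= 0xf7
payload = encoded[1:1 + (prefix - 0xc0)]

# Steps 2 and 3: decode one item, then continue with the rest of the payload.
items = []
position = 0
while position < len(payload):
    item_prefix = payload[position]
    # Only short strings ([0x80, 0xb7]) occur in this particular input.
    assert 0x80 <= item_prefix <= 0xb7
    item_len = item_prefix - 0x80
    items.append(payload[position + 1:position + 1 + item_len])
    position += 1 + item_len

print(items)  # [b'cat', b'dog']
```

Each inner prefix 0x83 announces a three-byte string, so the payload splits into "cat" and "dog" with no bytes left over.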
