packed_data - Documentation for Ruby 3.5
Packed Data
Quick Reference
These tables summarize the directives for packing and unpacking.
For Integers
Directive | Meaning |
---|---|
C | 8-bit unsigned (unsigned char) |
S | 16-bit unsigned, native endian (uint16_t) |
L | 32-bit unsigned, native endian (uint32_t) |
Q | 64-bit unsigned, native endian (uint64_t) |
J | pointer width unsigned, native endian (uintptr_t) |
c | 8-bit signed (signed char) |
s | 16-bit signed, native endian (int16_t) |
l | 32-bit signed, native endian (int32_t) |
q | 64-bit signed, native endian (int64_t) |
j | pointer width signed, native endian (intptr_t) |
S_ S! | unsigned short, native endian |
I I_ I! | unsigned int, native endian |
L_ L! | unsigned long, native endian |
Q_ Q! | unsigned long long, native endian (raises ArgumentError if the platform has no long long type) |
J! | uintptr_t, native endian (same as J) |
s_ s! | signed short, native endian |
i i_ i! | signed int, native endian |
l_ l! | signed long, native endian |
q_ q! | signed long long, native endian (raises ArgumentError if the platform has no long long type) |
j! | intptr_t, native endian (same as j) |
S> s> S!> s!> L> l> L!> l!> I!> i!> Q> q> Q!> q!> J> j> J!> j!> | each the same as the directive without >, but big endian (S> is the same as n; L> is the same as N) |
S< s< S!< s!< L< l< L!< l!< I!< i!< Q< q< Q!< q!< J< j< J!< j!< | each the same as the directive without <, but little endian (S< is the same as v; L< is the same as V) |
n | 16-bit unsigned, network (big-endian) byte order |
N | 32-bit unsigned, network (big-endian) byte order |
v | 16-bit unsigned, VAX (little-endian) byte order |
V | 32-bit unsigned, VAX (little-endian) byte order |
U | UTF-8 character |
w | BER-compressed integer |
For Floats
Directive | Meaning |
---|---|
D d | double-precision, native format |
F f | single-precision, native format |
E | double-precision, little-endian byte order |
e | single-precision, little-endian byte order |
G | double-precision, network (big-endian) byte order |
g | single-precision, network (big-endian) byte order |
For Strings
Directive | Meaning |
---|---|
A | arbitrary binary string (remove trailing nulls and ASCII spaces) |
a | arbitrary binary string |
Z | null-terminated string |
B | bit string (MSB first) |
b | bit string (LSB first) |
H | hex string (high nibble first) |
h | hex string (low nibble first) |
u | UU-encoded string |
M | quoted-printable, MIME encoding (see RFC2045) |
m | base64 encoded string (RFC 2045) (default); base64 encoded string (RFC 4648) if followed by 0 |
P | pointer to a structure (fixed-length string) |
p | pointer to a null-terminated string |
Additional Directives for Packing
Directive | Meaning |
---|---|
@ | moves to absolute position |
X | back up a byte |
x | null byte |
Additional Directives for Unpacking
Directive | Meaning |
---|---|
@ | skip to the offset given by the length argument |
X | skip backward one byte |
x | skip forward one byte |
Packing and Unpacking
Certain Ruby core methods deal with packing and unpacking data:
- Method Array#pack: Formats each element in array self into a binary string; returns that string.
- Method String#unpack: Extracts data from string self, forming objects that become the elements of a new array; returns that array.
- Method String#unpack1: Does the same, but unpacks and returns only the first extracted object.
Each of these methods accepts a string template, consisting of zero or more directive characters, each followed by zero or more modifier characters.
Examples (directive 'C' specifies ‘unsigned character’):
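[65].pack('C')      # => "A"  # One element, one directive.
[65, 66].pack('CC') # => "AB" # Two elements, two directives.
[65, 66].pack('C')  # => "A"  # Extra element is ignored.
[65].pack('')       # => ""   # No directives.
[65].pack('CC')               # Extra directive raises ArgumentError.
'A'.unpack('C')     # => [65]      # One character, one directive.
'AB'.unpack('CC')   # => [65, 66]  # Two characters, two directives.
'AB'.unpack('C')    # => [65]      # Extra character is ignored.
'A'.unpack('CC')    # => [65, nil] # Extra directive generates nil.
'AB'.unpack('')     # => []        # No directives.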
The string template may contain any mixture of valid directives (directive 'c' specifies ‘signed character’):
[65, -1].pack('cC')   # => "A\xFF"
"A\xFF".unpack('cC')  # => [65, 255]
The string template may contain whitespace (which is ignored) and comments, each of which begins with character '#' and continues up to and including the next newline:
[0,1].pack(" C #foo \n C ")    # => "\x00\x01"
"\0\1".unpack(" C #foo \n C ") # => [0, 1]
Any directive may be followed by either of these modifiers:
'*'
- The directive is to be applied as many times as needed:
[65, 66].pack('C*') # => "AB"
'AB'.unpack('C*')   # => [65, 66]
Integer count
- The directive is to be applied count times:
[65, 66].pack('C2') # => "AB"
[65, 66].pack('C3') # Raises ArgumentError.
'AB'.unpack('C2')   # => [65, 66]
'AB'.unpack('C3')   # => [65, 66, nil]
Note: Directives in %w[A a Z m] use count differently; see String Directives.
If elements don’t fit the provided directive, only the least significant bits are encoded:
[257].pack("C").unpack("C") # => [1]
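As a rough illustration of the two points above, the sketch below mixes a counted directive with a starred one and then checks that packing with 'C' behaves like masking to the low 8 bits. The helper name low_byte is hypothetical:

```ruby
# 'C2' consumes two elements, 'n*' consumes all that remain.
packed = [1, 2, 3, 4].pack('C2n*')
packed.bytes          # => [1, 2, 0, 3, 0, 4]
packed.unpack('C2n*') # => [1, 2, 3, 4]

# Packing with 'C' keeps only the low 8 bits of the integer.
def low_byte(n)
  [n].pack('C').unpack1('C')
end

low_byte(257)                       # => 1
low_byte(-1)                        # => 255
low_byte(0x1234) == (0x1234 & 0xFF) # => true
```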
Packing Method
Method Array#pack accepts optional keyword argument buffer that specifies the target string (instead of a new string):
[65, 66].pack('C*', buffer: 'foo') # => "fooAB"
The method can accept a block:
[65, 66].pack('C*') {|s| p s }     # => "AB"
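One plausible use of the buffer keyword is to accumulate several packed records in a single string instead of concatenating many small ones. A minimal sketch, assuming a hypothetical record of a 16-bit id and a 32-bit value; the name append_records is illustrative:

```ruby
# Append each [id, value] pair to the same buffer as a 16-bit and a
# 32-bit big-endian integer. Array#pack appends to the given buffer.
def append_records(buffer, records)
  records.each { |id, value| [id, value].pack('nN', buffer: buffer) }
  buffer
end

buf = String.new # empty, mutable, binary string
append_records(buf, [[1, 10], [2, 20]])
buf.bytesize       # => 12
buf.unpack('nNnN') # => [1, 10, 2, 20]
```

The buffer must be a mutable String; String.new is used here rather than a string literal so the sketch does not depend on whether literals are frozen.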
Unpacking Methods
Methods String#unpack and String#unpack1 each accept an optional keyword argument offset that specifies an offset into the string:
'ABC'.unpack('C*', offset: 1)  # => [66, 67]
'ABC'.unpack1('C*', offset: 1) # => 66
Both methods can accept a block:
ret = []
"ABCD".unpack("C*") {|c| ret << c }
ret                               # => [65, 66, 67, 68]
'AB'.unpack1('C*') {|ele| p ele } # => 65
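The offset keyword makes it possible to step through a string of fixed-size records without slicing it. The following is a sketch under an assumed layout (16-bit id followed by a 32-bit value); PAIR_TEMPLATE, PAIR_SIZE, and read_pairs are hypothetical names:

```ruby
PAIR_TEMPLATE = 'nN' # 16-bit id, 32-bit value, both big-endian
PAIR_SIZE = 6        # 2 + 4 bytes per record

# Unpack consecutive records by advancing the offset one record at a time.
def read_pairs(bytes)
  pairs = []
  offset = 0
  while offset + PAIR_SIZE <= bytes.bytesize
    pairs << bytes.unpack(PAIR_TEMPLATE, offset: offset)
    offset += PAIR_SIZE
  end
  pairs
end

data = [1, 10, 2, 20].pack('nNnN')
read_pairs(data) # => [[1, 10], [2, 20]]
```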
Integer Directives
Each integer directive specifies the packing or unpacking for one element in the input or output array.
8-Bit Integer Directives
'c'
- 8-bit signed integer (like C signed char):
[0, 1, 255].pack('c*')    # => "\x00\x01\xFF"
s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF"
s.unpack('c*')            # => [0, 1, -1]
'C'
- 8-bit unsigned integer (like C unsigned char):
[0, 1, 255].pack('C*')    # => "\x00\x01\xFF"
s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF"
s.unpack('C*')            # => [0, 1, 255]
16-Bit Integer Directives
's'
- 16-bit signed integer, native-endian (like C int16_t):
[513, -514].pack('s*')
s = [513, 65022].pack('s*')
s.unpack('s*') # => [513, -514]
'S'
- 16-bit unsigned integer, native-endian (like C uint16_t):
[513, -514].pack('S*')
s = [513, 65022].pack('S*')
s.unpack('S*') # => [513, 65022]
'n'
- 16-bit network integer, big-endian:
s = [0, 1, -1, 32767, -32768, 65535].pack('n*')
# => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF"
s.unpack('n*')
# => [0, 1, 65535, 32767, 32768, 65535]
'v'
- 16-bit VAX integer, little-endian:
s = [0, 1, -1, 32767, -32768, 65535].pack('v*')
# => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF"
s.unpack('v*')
# => [0, 1, 65535, 32767, 32768, 65535]
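32-Bit Integer Directives
'l'
- 32-bit signed integer, native-endian (like C int32_t):
s = [67305985, -50462977].pack('l*')
s.unpack('l*') # => [67305985, -50462977]
'L'
- 32-bit unsigned integer, native-endian (like C uint32_t):
s = [67305985, 4244504319].pack('L*')
s.unpack('L*') # => [67305985, 4244504319]
'N'
- 32-bit network integer, big-endian:
s = [0,1,-1].pack('N*')
# => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF"
s.unpack('N*') # => [0, 1, 4294967295]
'V'
- 32-bit VAX integer, little-endian:
s = [0,1,-1].pack('V*')
# => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF"
s.unpack('V*') # => [0, 1, 4294967295]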
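To make the difference between the network and VAX directives concrete, the sketch below packs one 32-bit value both ways and shows what happens when the wrong byte order is used for unpacking. The helper name byte_orders is hypothetical:

```ruby
# Return the bytes produced for the same integer in both byte orders.
def byte_orders(n)
  { network: [n].pack('N').bytes, vax: [n].pack('V').bytes }
end

orders = byte_orders(0x01020304)
orders[:network] # => [1, 2, 3, 4]
orders[:vax]     # => [4, 3, 2, 1]

# Unpacking with the other byte order yields a different number
# (0x04030201) rather than an error.
[0x01020304].pack('N').unpack1('V') # => 67305985
```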
64-Bit Integer Directives
'q'
- 64-bit signed integer, native-endian (like C int64_t):
s = [578437695752307201, -506097522914230529].pack('q*')
s.unpack('q*') # => [578437695752307201, -506097522914230529]
'Q'
- 64-bit unsigned integer, native-endian (like C uint64_t):
s = [578437695752307201, 17940646550795321087].pack('Q*')
s.unpack('Q*') # => [578437695752307201, 17940646550795321087]
Platform-Dependent Integer Directives
'i'
- Platform-dependent width signed integer, native-endian (like C int):
s = [67305985, -50462977].pack('i*')
s.unpack('i*')
'I'
- Platform-dependent width unsigned integer, native-endian (like C unsigned int):
s = [67305985, -50462977].pack('I*')
s.unpack('I*')
'j'
- Pointer-width signed integer, native-endian (like C intptr_t):
s = [67305985, -50462977].pack('j*')
s.unpack('j*')
'J'
- Pointer-width unsigned integer, native-endian (like C uintptr_t):
s = [67305985, 4244504319].pack('J*')
s.unpack('J*')
Other Integer Directives
'U'
- UTF-8 character:
s = [4194304].pack('U*')
s.unpack('U*') # => [4194304]
'w'
- BER-encoded integer (see BER encoding):
s = [1073741823].pack('w*')
s.unpack('w*') # => [1073741823]
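The BER-compressed form stores a non-negative integer in 7-bit groups, most significant group first, with the high bit set on every byte except the last. The following is a hand-written sketch of that scheme for comparison with the 'w' directive; ber_bytes is a hypothetical helper, not how Ruby implements it:

```ruby
# Encode a non-negative integer as BER-compressed bytes by hand.
def ber_bytes(n)
  raise ArgumentError, 'BER integers must be non-negative' if n.negative?

  bytes = [n & 0x7F]                 # last byte: high bit clear
  n >>= 7
  while n > 0
    bytes.unshift((n & 0x7F) | 0x80) # earlier bytes: high bit set
    n >>= 7
  end
  bytes
end

ber_bytes(300)        # => [130, 44]
[300].pack('w').bytes # => [130, 44]
```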
Modifiers for Integer Directives
For the following directives, a '!' or '_' modifier may be appended to select the underlying platform’s native size.
'i', 'I'
- C int, always native size.
's', 'S'
- C short.
'l', 'L'
- C long.
'q', 'Q'
- C long long, if available.
'j', 'J'
- C intptr_t, always native size.
Native size modifiers are silently ignored for always native size directives.
The endian modifiers may also be appended to the directives above:
'>'
- Big-endian.
'<'
- Little-endian.
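The effect of the modifiers can be inspected directly. This sketch reports the byte widths of a few native-size forms on the running platform (the results vary, so none are shown) and then checks the endian equivalences listed in the quick reference. The helper name native_widths is hypothetical:

```ruby
# Byte widths of native-size forms on the current platform.
# 'q!' is left out because it raises ArgumentError where the platform
# has no long long type.
def native_widths
  %w[s! i l! j].to_h { |directive| [directive, [0].pack(directive).bytesize] }
end

native_widths # values depend on the platform

# With an explicit endian modifier the result no longer depends on the host.
[1].pack('s>') == [1].pack('n') # => true
[1].pack('l<') == [1].pack('V') # => true
[1].pack('q>').bytes            # => [0, 0, 0, 0, 0, 0, 0, 1]
```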
Float Directives
Each float directive specifies the packing or unpacking for one element in the input or output array.
Single-Precision Float Directives
'F' or 'f'
- Native format:
s = [3.0].pack('F')
s.unpack('F')       # => [3.0]
'e'
- Little-endian:
s = [3.0].pack('e') # => "\x00\x00@@"
s.unpack('e')       # => [3.0]
'g'
- Big-endian:
s = [3.0].pack('g') # => "@@\x00\x00"
s.unpack('g')       # => [3.0]
Double-Precision Float Directives
'D' or 'd'
- Native format:
s = [3.0].pack('D')
s.unpack('D')       # => [3.0]
'E'
- Little-endian:
s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@"
s.unpack('E')       # => [3.0]
'G'
- Big-endian:
s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00"
s.unpack('G')       # => [3.0]
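The practical difference between the two precisions shows up on a round trip: a Ruby Float is double-precision, so it survives 'E' unchanged, while 'e' rounds it to 4 bytes. A minimal sketch; round_trip is a hypothetical helper:

```ruby
# Pack a float with the given directive and unpack it again.
def round_trip(value, directive)
  [value].pack(directive).unpack1(directive)
end

round_trip(0.1, 'E') == 0.1 # => true  (8 bytes, nothing lost)
round_trip(0.1, 'e') == 0.1 # => false (rounded to single precision)
round_trip(3.0, 'e') == 3.0 # => true  (3.0 is exactly representable)
[0.1].pack('e').bytesize    # => 4
[0.1].pack('E').bytesize    # => 8
```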
A float value may be infinity or not-a-number:
inf = 1.0/0.0
[inf].pack('f')
"\x00\x00\x80\x7F".unpack('f')
nan = inf/inf
[nan].pack('f')
"\x00\x00\xC0\x7F".unpack('f')
String Directives
Each string directive specifies the packing or unpacking for one byte in the input or output string.
Binary String Directives
'A'
- Arbitrary binary string (space padded; count is width); nil is treated as the empty string:
['foo'].pack('A')
['foo'].pack('A*')
['foo'].pack('A2')
['foo'].pack('A4')
[nil].pack('A')
[nil].pack('A*')
[nil].pack('A2')
[nil].pack('A4')
"foo\0".unpack('A')
"foo\0".unpack('A4')
"foo\0bar".unpack('A10')
"foo ".unpack('A')
"foo ".unpack('A4')
"foo".unpack('A4')
russian = "\u{442 435 441 442}"
russian.size
russian.bytesize
[russian].pack('A')
[russian].pack('A*')
russian.unpack('A')
russian.unpack('A2')
russian.unpack('A4')
russian.unpack('A*')
'a'
- Arbitrary binary string (null padded; count is width):
["foo"].pack('a')
["foo"].pack('a*')
["foo"].pack('a2')
["foo\0"].pack('a4')
[nil].pack('a')
[nil].pack('a*')
[nil].pack('a2')
[nil].pack('a4')
"foo\0".unpack('a')
"foo\0".unpack('a4')
"foo ".unpack('a4')
"foo".unpack('a4')
"foo\0bar".unpack('a4')
'Z'
- Same as 'a', except that null is added or ignored with '*':
["foo"].pack('Z*')
[nil].pack('Z*')
"foo\0".unpack('Z*')
"foo".unpack('Z*')
"foo\0bar".unpack('Z*')
Bit String Directives
'B'
- Bit string (most significant bit first):
['11111111' + '00000000'].pack('B*')
['10000000' + '01000000'].pack('B*')
['1'].pack('B0')
['1'].pack('B1')
['1'].pack('B2')
['1'].pack('B3')
['1'].pack('B4')
['1'].pack('B5')
['1'].pack('B6')
"\xff\x00".unpack("B*")
"\x01\x02".unpack("B*")
"".unpack("B0")
"\x80".unpack("B1")
"\x80".unpack("B2")
"\x80".unpack("B3")
'b'
- Bit string (least significant bit first):
['11111111' + '00000000'].pack('b*')
['10000000' + '01000000'].pack('b*')
['1'].pack('b0')
['1'].pack('b1')
['1'].pack('b2')
['1'].pack('b3')
['1'].pack('b4')
['1'].pack('b5')
['1'].pack('b6')
"\xff\x00".unpack("b*")
"\x01\x02".unpack("b*")
"".unpack("b0")
"\x01".unpack("b1")
"\x01".unpack("b2")
"\x01".unpack("b3")
Hex String Directives
'H'
- Hex string (high nibble first):
['10ef'].pack('H*')
['10ef'].pack('H0')
['10ef'].pack('H3')
['10ef'].pack('H5')
['fff'].pack('H3')
['fff'].pack('H4')
['fff'].pack('H5')
['fff'].pack('H6')
['fff'].pack('H7')
['fff'].pack('H8')
"\x10\xef".unpack('H*')
"\x10\xef".unpack('H0')
"\x10\xef".unpack('H1')
"\x10\xef".unpack('H2')
"\x10\xef".unpack('H3')
"\x10\xef".unpack('H4')
"\x10\xef".unpack('H5')
'h'
- Hex string (low nibble first):
['10ef'].pack('h*')
['10ef'].pack('h0')
['10ef'].pack('h3')
['10ef'].pack('h5')
['fff'].pack('h3')
['fff'].pack('h4')
['fff'].pack('h5')
['fff'].pack('h6')
['fff'].pack('h7')
['fff'].pack('h8')
"\x01\xfe".unpack('h*')
"\x01\xfe".unpack('h0')
"\x01\xfe".unpack('h1')
"\x01\xfe".unpack('h2')
"\x01\xfe".unpack('h3')
"\x01\xfe".unpack('h4')
"\x01\xfe".unpack('h5')
Pointer String Directives
'P'
- Pointer to a structure (fixed-length string):
s = ['abc'].pack('P')
s.unpack('P*')
".".unpack("P")
("\0" * 8).unpack("P")
[nil].pack("P")
'p'
- Pointer to a null-terminated string:
s = ['abc'].pack('p')
s.unpack('p*')
".".unpack("p")
("\0" * 8).unpack("p")
[nil].pack("p")
Other String Directives
'M'
- Quoted-printable, MIME encoding; text mode, but input must use LF and output uses LF (see RFC 2045):
["a b c\td \ne"].pack('M')
["\0"].pack('M')
["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n"
("a"*73+"=\na=\n").unpack('M') == ["a"*74]
(("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023]
"a b c\td =\n\ne=\n".unpack('M')
"=00=\n".unpack('M')
"pre=31=32=33after".unpack('M')
"pre=\nafter".unpack('M')
"pre=\r\nafter".unpack('M')
"pre=".unpack('M')
"pre=\r".unpack('M')
"pre=hoge".unpack('M')
"pre==31after".unpack('M')
"pre===31after".unpack('M')
'm'
- Base64-encoded string; count specifies the number of input bytes between newlines, rounded down to the nearest multiple of 3; if count is zero, no newlines are added (see RFC 4648):
[""].pack('m')
["\0"].pack('m')
["\0\0"].pack('m')
["\0\0\0"].pack('m')
["\377"].pack('m')
["\377\377"].pack('m')
["\377\377\377"].pack('m')
"".unpack('m')
"AA==\n".unpack('m')
"AAA=\n".unpack('m')
"AAAA\n".unpack('m')
"/w==\n".unpack('m')
"//8=\n".unpack('m')
"////\n".unpack('m')
"A\n".unpack('m')
"AA\n".unpack('m')
"AA=\n".unpack('m')
"AAA\n".unpack('m')
[""].pack('m0')
["\0"].pack('m0')
["\0\0"].pack('m0')
["\0\0\0"].pack('m0')
["\377"].pack('m0')
["\377\377"].pack('m0')
["\377\377\377"].pack('m0')
"".unpack('m0')
"AA==".unpack('m0')
"AAA=".unpack('m0')
"AAAA".unpack('m0')
"/w==".unpack('m0')
"//8=".unpack('m0')
"////".unpack('m0')
'u'
- UU-encoded string:
[""].pack("u")
["a"].pack("u")
["aaa"].pack("u")
"".unpack("u")
"#86)C\n".unpack("u")
Offset Directives
'@'
- Begin packing at the given byte offset; for packing, null fill or shrink if necessary:
[1, 2].pack("C@0C")     # => "\x02"
[1, 2].pack("C@1C")     # => "\x01\x02"
[1, 2].pack("C@5C")     # => "\x01\x00\x00\x00\x00\x02"
[*1..5].pack("CCCC@2C") # => "\x01\x02\x05"
For unpacking, cannot move outside the string:
"\x01\x00\x00\x02".unpack("C@3C") # => [1, 2]
"\x00".unpack("@1C")              # => [nil]
"\x00".unpack("@2C")              # Raises ArgumentError.
'X'
- For packing, shrink by the given byte count:
[0, 1, 2].pack("CCXC")  # => "\x00\x02"
[0, 1, 2].pack("CCX2C") # => "\x02"
For unpacking, rewind the unpacking position by the given byte count:
"\x00\x02".unpack("CCXC") # => [0, 2, 2]
Cannot move outside the string:
[0, 1, 2].pack("CCX3C")   # Raises ArgumentError.
"\x00\x02".unpack("CX3C") # Raises ArgumentError.
'x'
- Skip forward by the given byte count; for packing, null fill:
[].pack("x0") # => ""
[].pack("x")  # => "\x00"
[].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00"
For unpacking, cannot move outside the string:
"\x00\x00\x02".unpack("CxC") # => [0, 2]
"\x00\x00\x02".unpack("x3C") # => [nil]
"\x00\x00\x02".unpack("x4C") # Raises ArgumentError.