What is PEA file format, features, specs
What are PEA files, format features and specifications
PEA file format specifications version 1.6
PEA file extension
PEA (.pea file extension), an acronym for Pack, Encrypt, Authenticate, designates a file format focused on data security. It aims to provide archiving, compression, and multi-volume file splitting (spanning) in a single pass, along with flexible schemes of optional checksum / hash integrity checking and authenticated file encryption (AES in EAX or HMAC mode, alternatively Twofish and Serpent in EAX mode). PEA file format specifications are released into the public domain.
PEA specifications document
PEA file format specifications and implementation notes (pdf)
PEA file format compression specs
Pea compression is optional. At the current level of implementation only the following levels are defined: PCOMPRESS0 (store only, no compression), and PCOMPRESS1..3, based on Deflate (using zlib's compress/uncompress reference code) at compression levels 3, 6, and 9 respectively.
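The level mapping above can be sketched with Python's standard `zlib` module. This is only an illustration of the PCOMPRESS-to-zlib correspondence described in the text: the helper names `pea_compress` and `pea_decompress` are hypothetical, and the sketch does not reproduce how the Pea utility frames compressed data inside an archive.

```python
import zlib

# PCOMPRESS level -> zlib compression level, as listed in the text.
# PCOMPRESS0 is "store only", so it has no zlib level.
PCOMPRESS_TO_ZLIB = {1: 3, 2: 6, 3: 9}


def pea_compress(data: bytes, pcompress: int) -> bytes:
    """Compress a buffer at the given PCOMPRESS level (hypothetical helper)."""
    if pcompress == 0:
        return data  # store only, no compression
    return zlib.compress(data, PCOMPRESS_TO_ZLIB[pcompress])


def pea_decompress(data: bytes, pcompress: int) -> bytes:
    """Inverse of pea_compress for the same PCOMPRESS level."""
    if pcompress == 0:
        return data
    return zlib.decompress(data)


sample = b"PEA " * 1000
sizes = {level: len(pea_compress(sample, level)) for level in range(4)}
```

Here `sizes[0]` equals the input size, while levels 1 to 3 trade more CPU time for smaller output.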
PEA file format encryption and hashing specs
PEA format security model acts at 3 levels: objects (input files and folders sent to the .pea archive), volumes (output archive files, which can be spanned to a user-defined size) and streams (the actual output data stream, which is formed by multiple input files and can be written to multiple output volumes); the check at each of those levels can be omitted as needed by the user.
- Object level integrity checking is performed to detect errors with object level granularity on raw input data and all associated data (name, size, attributes, date-time);
- Current implementation allows: Checksum (Adler32, CRC32, CRC64), Hash (MD5, SHA1, RIPEMD-160, SHA256, SHA512, Whirlpool, SHA3-256, SHA3-512, BLAKE2S, BLAKE2B)
- Volume level integrity check is communication oriented and allows single corrupted volumes to be discarded, in order to minimize the retransmission overhead in case of error;
- Current implementation allows same Checksum and Hash algorithms featured by Object level check
- Stream level check offers wide choice of algorithms up to authenticated encryption, protecting privacy and authenticity of a group of objects sharing same security needs, including tags generated by object level checks;
- Current implementation allows same Checksum and Hash algorithms featured by Object and Volume levels, plus Authenticated encryption schemes: HMAC mode AES128, EAX mode AES128, AES256, Serpent128, Serpent256, Twofish128, Twofish256, triple cascade encryption combining AES, Twofish, and Serpent each 256 bit in EAX mode.
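As a rough illustration of the checksum and hash options listed above, the following sketch computes a control tag with Python's standard library. It covers only the subset of algorithms that `zlib` and `hashlib` always provide (CRC64, RIPEMD-160 and Whirlpool are left out), the function name `control_tag` is hypothetical, and the big-endian encoding of the checksums is an assumption, not taken from the PEA specification.

```python
import hashlib
import zlib

# Checksums available in the standard library; byte order is an assumption.
CHECKSUMS = {
    "adler32": lambda data: zlib.adler32(data).to_bytes(4, "big"),
    "crc32": lambda data: zlib.crc32(data).to_bytes(4, "big"),
}

# Hashes from the list above that hashlib always provides.
HASHES = ("md5", "sha1", "sha256", "sha512",
          "sha3_256", "sha3_512", "blake2s", "blake2b")


def control_tag(algorithm: str, data: bytes) -> bytes:
    """Return the checksum or hash of data for the named algorithm."""
    name = algorithm.lower()
    if name in CHECKSUMS:
        return CHECKSUMS[name](data)
    if name in HASHES:
        return hashlib.new(name, data).digest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def verify(algorithm: str, data: bytes, expected_tag: bytes) -> bool:
    """Recompute the tag and compare it with the stored one."""
    return control_tag(algorithm, data) == expected_tag


object_data = b"contents of one input file"
object_tag = control_tag("SHA256", object_data)
```

The same function could be applied to an object, a volume, or a whole stream; what differs between the three levels is which bytes are fed to it.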
PEA file format volume spanning specs
Volume spanning allows the archive to be split into volumes of arbitrary size, with the only constraint that each volume must be at least 10 bytes larger than the volume control tag, so that the minimum information needed by the extraction application can be passed through the archive's header.
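A minimal sketch of the idea, under stated assumptions: each volume is modeled as a payload followed by a CRC32 control tag, and the size constraint is checked before splitting. The tag placement, the byte order, and the names `split_into_volumes` and `find_corrupted` are illustrative choices, not the layout defined in the PEA specification.

```python
import zlib

TAG_SIZE = 4     # CRC32 used as volume control tag in this sketch
MIN_EXTRA = 10   # volumes must be at least 10 bytes larger than the tag


def split_into_volumes(stream: bytes, volume_size: int) -> list[bytes]:
    """Split a stream into volumes of volume_size bytes, tag included."""
    if volume_size < TAG_SIZE + MIN_EXTRA:
        raise ValueError("volume size too small for tag plus header data")
    payload_size = volume_size - TAG_SIZE
    volumes = []
    for offset in range(0, len(stream), payload_size):
        payload = stream[offset:offset + payload_size]
        tag = zlib.crc32(payload).to_bytes(TAG_SIZE, "big")
        volumes.append(payload + tag)
    return volumes


def find_corrupted(volumes: list[bytes]) -> list[int]:
    """Return the 1-based numbers of volumes whose tag does not match."""
    corrupted = []
    for number, volume in enumerate(volumes, start=1):
        payload, tag = volume[:-TAG_SIZE], volume[-TAG_SIZE:]
        if zlib.crc32(payload).to_bytes(TAG_SIZE, "big") != tag:
            corrupted.append(number)
    return corrupted
```

Because every volume carries its own tag, a receiver can call `find_corrupted` and request only the listed volumes again instead of the whole archive.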
PEA specs revisions
PEA file format standard, as defined in version 1 revision 5 specification, can store a single stream containing an unlimited number of objects, each up to 2^64 bytes in size. The current Pea executable supports the 1.5 file format specifications (in practice, archives are limited by memory and filesystem rather than by the format) and is backward compatible with previous revisions of the format.
PEA 2.0 file format specifications extend the concepts behind PEA 1.x file format and can store an unlimited number of streams, but the format is not currently supported by the Pea archiving utility.
PEA format specifications table: max file size, compression, security...
The following table summarizes the features and limitations that apply to the file format and to the current implementation:
Feature | PEA file format | Current utility implementation |
---|---|---|
Archive | ||
Maximum PEA archive size | Unlimited: the format design sets no upper limit on archive size, only filesystem size limitations apply | Limited to 16 YB (yottabyte), up to 999999 volumes of 2^64-1 bytes each. Please note that, under current understanding, with 128 bit block encryption it is safe not to encrypt more than 2^64 bytes with the same key, and better to stay one or more orders of magnitude below. |
Stream number | 1.3: single stream; 2.0 unlimited number of streams; | Single stream (1.3 file format) |
Output | ||
Security | Optional Authenticated Encryption, at stream level only. HMAC mode: AES128, EAX mode: AES 128 or 256bit, Serpent 128 / 256, Twofish 128 / 256, triple cascade encryption: AES+Twofish+Serpent, Twofish+Serpent+AES, Serpent+AES+Twofish each 256 bit in EAX mode | |
Integrity check | AE tag (see security section) or hash or checksum at stream level, plus hash or checksum for input objects, and for output volumes. Currently supported: Adler32, CRC32, CRC64 checksum algorithms; MD5, SHA1, RIPEMD-160, SHA-2 and SHA-3 families, and Whirlpool hash algorithms. | |
Error correction | No scheme featured at current level of development | |
Communication recovery | Independent volume control checks allow corrupted volumes to be identified (the first volume may be needed to know the volume check algorithm) | No specific tool developed; volume check is done during extraction, so that only corrupted volumes need to be downloaded again |
Data recovery | Stream control tags allow correct streams to be recognized; if finer granularity is needed, object control tags allow correct objects to be recognized; input object names and POD trigger allow objects and streams to be identified within the archive data | No specific tool developed for error-resistant data extraction; however, object check errors are reported so that corrupted and non-corrupted data can be identified if the extraction is successful |
Support for multi volume output | Native, requires a single pass. Raw file spanning compatible with Unix split command, and applications like HJSplit and 7-Zip. | |
Volume number | 1..unlimited | 1..999999 (6 digit counter string in output file name, after .pea file extension) |
Volume size | Volume tag size +1.. unlimited; first volume must contain at least 10 bytes of data so that the unpacking application can parse the archive header and calculate the volume tag size | Volume tag size +1.. 2^64-1 (qword variable); first volume must contain at least 10 bytes of data |
Compression | Native, requires single pass; schemes: PCOMPRESS0: no compression; PCOMPRESS1..3 based on Deflate using zlib's compress/uncompress, level 3, 6 and 9 respectively | |
Solid archive | Compression modes that allow creating solid archives are not implemented | |
Input | ||
Input types | 1.3: files and dirs; 2.0: files, dirs, metadata stored as messages triggers | Files and dirs (1.3) |
Maximum number of files / objects in a PEA archive | 1..unlimited, theoretically a PEA archive can accept an unlimited number of input files | Host system memory limited (input object list is stored in a dynamic array of strings) |
Maximum size of input file for PEA archive | 0..2^64-1 bytes (16 EB) maximum size for each input file | 0..2^64-1 bytes (16 EB) maximum size, likely limited by underlying filesystem technology |
Input object qualified name size (size 0 means that the archive object is a trigger, with no input object mapped to the archive object) | 1..2^16-1 (64 KB) of characters under any encoding | 1..32K (exceeding needs, longer values are considered errors) |
Metadata | Objects attributes and last modification time, optionally comments and any kind of meta content using messages | Save object attributes and object last modification time. Restore only object attributes (on Microsoft Windows), nothing on *x |
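
The size limits in the table can be checked with a few lines of integer arithmetic. The sketch below is only a worked calculation based on the figures quoted above (999999 volumes, 2^64-1 bytes per volume, 2^64 bytes per input file); the variable names are illustrative.

```python
MAX_VOLUMES = 999_999          # 6 digit volume counter
MAX_VOLUME_BYTES = 2**64 - 1   # qword volume size

EIB = 2**60    # exbibyte
YIB = 2**80    # yobibyte
YB = 10**24    # decimal yottabyte

# Largest input file: 2^64 bytes is exactly 16 EiB.
max_input_eib = 2**64 / EIB

# Largest archive the current utility can address.
max_archive_bytes = MAX_VOLUMES * MAX_VOLUME_BYTES
max_archive_yib = max_archive_bytes / YIB   # about 15.26
max_archive_yb = max_archive_bytes / YB     # about 18.45
```

The result is roughly 15.3 YiB (about 18.4 decimal YB), so the "16 YB" figure in the table is best read as an order-of-magnitude value, close to 2^64 bytes times 2^20 volumes.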
Triple cascaded encryption: AES, Twofish, Serpent each 256 bit in EAX mode
PEA supports multiple chained encryption, cascading AES, Twofish, and Serpent, each 256 bit in EAX mode
Each cipher is separately keyed through PBKDF2, scrypt (default), or both
KDF options:
- with PBKDF2, the key schedule of each cipher is based on a different hash primitive, which is run for a different number of iterations: Whirlpool x 25000 for AES, SHA512 x 50000 for Twofish, SHA3-512 x 75000 for Serpent (Whirlpool is significantly slower than SHA512, which in turn is slower than SHA3-512). PEA format revision 1.4 introduced a variable, user-defined number of KDF rounds for the triple cascaded encryption, up to 25 million rounds for each of the 3 algorithms. Please also note that rounds are based on 512 bit hash primitives, which are more resource intensive than their 256 bit counterparts.
- with scrypt KDF, the key schedule workload impacts not only the CPU but also memory, in order to increase resilience to dictionary attacks. Requiring from 64 MB up to 1 GB of RAM (depending on the KDF workload option) for each instance severely raises the requirements for building a hardware setup to brute force the password, making it difficult to implement such a machine with ASIC or FPGA.
- Hybrid KDF (introduced in 1.6 revision) uses scrypt for AES (as specified in scrypt section) and for Twofish (with half the N parameter and doubling the r parameter, same p parameter), and uses PBKDF2 for Serpent (75,000 iterations, plus up to 25M additional iterations to increase the work load as specified in PBKDF2 section).
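The PBKDF2 option can be sketched with `hashlib.pbkdf2_hmac`. The hash names and base iteration counts come from the list above, and the 96 byte salt and 256 bit key sizes come from elsewhere in this section; everything else is an assumption made for illustration: the function name `derive_key`, the `extra_iterations` parameter, and the omission of the per-cipher password modification that the format applies. Whirlpool is not available in every OpenSSL build, so the sketch checks for it at run time.

```python
import hashlib
import os

# cipher -> (hash primitive, base PBKDF2 iterations), from the text.
PBKDF2_SETUP = {
    "AES": ("whirlpool", 25_000),
    "Twofish": ("sha512", 50_000),
    "Serpent": ("sha3_512", 75_000),
}
SALT_BYTES = 96   # separate pseudorandom salt per cipher
KEY_BYTES = 32    # 256 bit key


def derive_key(cipher: str, password: bytes, salt: bytes,
               extra_iterations: int = 0) -> bytes:
    """Derive a 256 bit key for one cipher of the cascade (illustrative)."""
    hash_name, base_iterations = PBKDF2_SETUP[cipher]
    if hash_name not in hashlib.algorithms_available:
        raise RuntimeError(f"{hash_name} is not provided by this Python build")
    return hashlib.pbkdf2_hmac(hash_name, password, salt,
                               base_iterations + extra_iterations,
                               dklen=KEY_BYTES)


salt_twofish = os.urandom(SALT_BYTES)
key_twofish = derive_key("Twofish", b"correct horse", salt_twofish)
```

Using a different hash, salt, and iteration count per cipher means that the three derived keys share nothing but the password.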
KDF workload can be increased, as specified in the tenth byte of the header:
1 use 128 MB RAM for scrypt KDFs, and +200K iterations for the PBKDF2 KDF
2 use 256 MB RAM for scrypt KDFs, and +500K iterations for the PBKDF2 KDF
3 use 512 MB RAM for scrypt KDFs, and +1M iterations for the PBKDF2 KDF
4 use 1 GB RAM for scrypt KDFs, and +2M iterations for the PBKDF2 KDF
5 use 1 GB RAM, with p = 2 for scrypt KDFs, and +5M iterations for the PBKDF2 KDF
6 use 1 GB RAM, with p = 4 for scrypt KDFs, and +10M iterations for the PBKDF2 KDF
7 use 1 GB RAM, with p = 8 for scrypt KDFs, and +25M iterations for the PBKDF2 KDF
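The workload levels above can be captured in a small lookup table. In the sketch below, the RAM amounts, the `p` values for levels 5 to 7, and the extra PBKDF2 iterations are taken from the list; `p = 1` for levels 1 to 4 and the block size `r = 8` are assumptions, since the text does not state them. The memory estimate uses the usual scrypt approximation of 128 × N × r bytes.

```python
# level -> scrypt memory (MiB), scrypt p, extra PBKDF2 iterations.
# p = 1 for levels 1..4 is an assumption; the text only gives p from level 5.
WORK_LEVELS = {
    1: {"scrypt_mib": 128, "p": 1, "extra_pbkdf2": 200_000},
    2: {"scrypt_mib": 256, "p": 1, "extra_pbkdf2": 500_000},
    3: {"scrypt_mib": 512, "p": 1, "extra_pbkdf2": 1_000_000},
    4: {"scrypt_mib": 1024, "p": 1, "extra_pbkdf2": 2_000_000},
    5: {"scrypt_mib": 1024, "p": 2, "extra_pbkdf2": 5_000_000},
    6: {"scrypt_mib": 1024, "p": 4, "extra_pbkdf2": 10_000_000},
    7: {"scrypt_mib": 1024, "p": 8, "extra_pbkdf2": 25_000_000},
}

SERPENT_BASE_ITERATIONS = 75_000  # PBKDF2 base count for Serpent


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate memory used by one scrypt instance: 128 * N * r bytes."""
    return 128 * n * r


def scrypt_n_for(memory_mib: int, r: int = 8) -> int:
    """Cost parameter N that yields the requested memory (r = 8 assumed)."""
    return (memory_mib * 2**20) // (128 * r)


def serpent_iterations(level: int) -> int:
    """Total PBKDF2 iterations for Serpent in hybrid mode at a given level."""
    return SERPENT_BASE_ITERATIONS + WORK_LEVELS[level]["extra_pbkdf2"]
```

For example, with `r = 8` a 128 MB target corresponds to `N = 2**17`, and level 7 brings the Serpent PBKDF2 count to 25,075,000 iterations.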
The key schedule of each cipher is provided with a separate 96 byte pseudorandom salt
The password is modified when provided as input for the key schedule of each cipher; modifications are trivial xor operations with non-secret values and counters, with the sole purpose of initializing the key derivation with different values and being a further factor (alongside different salt, and different hash / iteration number) to guarantee that keys are statistically independent
Password verification tag is the xor of the 3 password verification tags of each encryption function, and is written / verified only after all 3 key initialization functions are completed
Each block between password verification tag and stream authentication tag is encrypted with all 3 ciphers
A 1..128 bytes block of random data is added after password verification tag in order to mask exact archive size (this is the first block to be encrypted/decrypted)
Each cipher generates its own 128 bit stream authentication tag; the tags are concatenated and hashed with SHA3-384, and the SHA3-384 value is checked for verification. This requires all 3 tags to match the expected values and does not allow ciphers to be authenticated separately
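The tag handling described above can be sketched without any cipher code, treating the per-cipher tags as opaque byte strings. The salt size, the 1..128 byte padding range, the xor combination, and the SHA3-384 step come from the text; the concatenation order (AES, Twofish, Serpent) and all function names are assumptions made for this example.

```python
import hashlib
import os
import secrets

CIPHERS = ("AES", "Twofish", "Serpent")
SALT_BYTES = 96   # separate salt for each cipher's key schedule
TAG_BYTES = 16    # 128 bit stream authentication tag per cipher


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise xor of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def password_verification_tag(tags: list[bytes]) -> bytes:
    """Xor the per-cipher password verification tags into a single tag."""
    combined = tags[0]
    for tag in tags[1:]:
        combined = xor_bytes(combined, tag)
    return combined


def stream_authentication_value(tags: list[bytes]) -> bytes:
    """Hash the three concatenated 128 bit tags with SHA3-384."""
    if len(tags) != 3 or any(len(tag) != TAG_BYTES for tag in tags):
        raise ValueError("expected three 128 bit tags")
    return hashlib.sha3_384(b"".join(tags)).digest()


def random_padding() -> bytes:
    """Random block of 1..128 bytes used to mask the exact archive size."""
    return secrets.token_bytes(secrets.randbelow(128) + 1)


salts = {name: os.urandom(SALT_BYTES) for name in CIPHERS}
```

Since only the SHA3-384 value is compared, a mismatch in any single cipher's tag makes the whole stream fail verification, which matches the "all 3 tags" requirement above.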
Multiple encryption, if correctly implemented, is meant under current understandings to:
- Provide a larger keyspace than each single cipher, but smaller than the sum of the key lengths due to the possibility of meet-in-the-middle attacks. However, such a large keyspace may be overkill even in the event of significant quantum computing advancements: Grover's quantum algorithm, the best known quantum attack for unstructured search problems such as brute-force key search, provides a quadratic speed-up over classical computing (about 2^128 evaluations for a 256 bit key). Under those assumptions, as a rule of thumb, a quantum computer will not be able to brute force a 256 bit keyspace faster than a classical machine can brute force a 128 bit keyspace, which is currently considered safe by a wide margin.
- Provide a security margin even if all but one of the algorithms used as cipher (or key schedule hash) are compromised by a breakthrough in cryptanalysis, which seems unlikely given the amount of theoretical work and real-life testing behind mainstream primitives available today.
Drawbacks of multiple encryption are:
- The inherent added complexity makes multiple encryption more prone to implementation errors
- Running multiple algorithms requires more computing power and consequently reduces performance. The performance penalty for cascaded encryption may be decisive for some classes of applications, but in the case of file archiving, as with PeaZip, where many other operations are involved (some potentially far slower, such as reading from and writing to disk), the performance hit is quite reasonable:
- Test machine: notebook with Intel Core i7-8565U CPU, 4 physical cores with hyper-threading (8 logical cores), 8 GB RAM, 512 GB PCIe NVMe SSD, NTFS filesystem
- Benchmark creation of PEA archive from 100MB input:
- 7 seconds archive creation, 3 seconds archive extraction with AES256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks
- 8 seconds archive creation, 4 seconds archive extraction with Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks (slower than AES and Twofish)
- 10 seconds archive creation, 6 seconds archive extraction with AES+Twofish+Serpent 256 EAX, deflate compression, CRC32 and SHA3-256 integrity checks; the purposely slower key schedule, employed at startup for multiple encryption modes, also accounts for part of the extra time
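The benchmark figures above translate into throughput and relative overhead as follows. This is plain arithmetic on the timings quoted in the list (100 MB input), with illustrative variable names; it is not a new measurement.

```python
INPUT_MB = 100

# scheme -> (creation seconds, extraction seconds), from the list above.
BENCHMARKS = {
    "AES256 EAX": (7, 3),
    "Serpent256 EAX": (8, 4),
    "AES+Twofish+Serpent 256 EAX": (10, 6),
}

# Throughput in MB/s for creation and extraction.
throughput = {
    scheme: (INPUT_MB / create_s, INPUT_MB / extract_s)
    for scheme, (create_s, extract_s) in BENCHMARKS.items()
}

# Slowdown of the triple cascade relative to single AES256.
cascade = BENCHMARKS["AES+Twofish+Serpent 256 EAX"]
aes = BENCHMARKS["AES256 EAX"]
creation_slowdown = cascade[0] / aes[0]      # about 1.43x
extraction_slowdown = cascade[1] / aes[1]    # 2.0x
```

On this test machine the triple cascade creates archives at about 10 MB/s against about 14.3 MB/s for AES256 alone, which is the "quite reasonable" penalty the text refers to.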
For a more complete explanation and discussion of the PEA format specifications, please see the documentation about Pea archive format design (.pdf).
Use cases for PEA archive format
.PEA
Author: Giorgio Tani, 2006
- no maximum number of input files
- no maximum archive size
- 2^64 bytes max size for each input file
SPEED
Pea format features average speed, due to its lightweight, quick Deflate-based compression algorithm, and efficient encryption and hashing algorithms.
COMPRESSION RATIO
Pea format features moderate compression, due to fast Deflate-based compression, comparable with compression ratios of GZ and classic ZIP format, making it suitable to archive or back up large quantities of data in reasonable time.
ADVANCED OPTIONS
Pea format lacks some features of competing formats, but offers advanced security-focused characteristics, such as AES-based authenticated encryption (which can optionally be replaced by Serpent or Twofish EAX mode authenticated encryption), and triple cascade encryption.