7-Zip Open Discussion: New Compression Method for 7-Zip called ZStandard
Created: 2016-06-27
Updated: 2022-06-15
Hello Igor,
Zstd, short for Zstandard, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.
It is provided as a BSD-licensed package, hosted on GitHub: https://github.com/Cyan4973/zstd
I have been working on adding this new codec for a while now, and everything seems to work very well. Is it possible to include this new method in your mainline 7-Zip version?
Here is the link to my 7-Zip ZStd homepage: https://mcmilk.de/projects/7-Zip-ZStd/ ...
Thank you a lot for 7-Zip, it is fast and stable ... and with ZStd very very fast for making backups to USB 3.0 Disks with USB 3.0 Speed :-)
Currently I don't add new external methods to the 7z code.
And if you use a 0x4F711xx ID, you should probably request it or notify me.
The id is from Rich, he did the first versions of the plugin and got an id from you.
So this should be fine ;)
As I remember I suggested 0x4F710xx for his codecs (LZHAM).
But you use 0x4F711xx - that is another range.
It's not a problem, but I must know about this range and update methods.txt with these IDs.
Hello Igor,
sorry for this. I thought, that 0x4F71101 was already registered and agreed with you :(
It would be fine, if I can leave the define to the current value ;-)
These are the two external codecs I currently know of:
- 0x4F71001 - LZHAM (http://richg42.blogspot.de/2015/11/lzham-custom-codec-plugin-for-7-zip.html)
- 0x4F71101 - Zstd (http://www.zstd.net)
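The relationship between the two IDs and their ranges can be sketched as follows. This is only an illustration: the table holds just the two ranges named in this thread, and the helper names are hypothetical, not part of 7-Zip.

```python
# Sketch: split a 7-Zip method ID into bytes and look up its external range.
# EXTERNAL_RANGES only lists the two ranges mentioned in this thread.
EXTERNAL_RANGES = {
    0x4F710: "LZHAM",  # IDs of the form 0x4F710xx
    0x4F711: "Zstd",   # IDs of the form 0x4F711xx
}


def method_id_bytes(method_id: int) -> str:
    """Format an ID such as 0x4F71101 as space-separated hex bytes."""
    raw = method_id.to_bytes(4, "big")
    return " ".join(f"{b:02X}" for b in raw)


def external_range(method_id: int):
    """Return the owner of the 0x4F7RRxx range, or None if it is not listed."""
    return EXTERNAL_RANGES.get(method_id >> 8)
```

Dropping the low byte (`method_id >> 8`) is what separates the range from the individual method number inside it.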
Is there any way that ZStandard will get into the mainline? The source is BSD licensed, and Yann Collet will be happy with including v1.0, which will be released when the beta period is over... this will be in some months, I think.
With best regards, Tino
Yes, you can use 0x4F71101, and I'll update methods.txt list with 0x4F711xx range.
Now I don't plan to include any new external codec to 7-Zip.
Perhaps it makes sense to add an additional API to the 7z.dll interface that would let developers who use 7z.dll plug in their own codecs without recompiling 7z.dll? Something like a new exported function RegisterCodecFactory(DWORD ACodecID, ICodecFactory AFactory)? The ICodecFactory object would then create the required codec when 7z.dll requests it. This solution has an extra perk: codecs could be written in practically any language, from C to Python :)
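To make the proposal concrete, here is a minimal sketch of what such a factory registry could look like. Everything in it is hypothetical: 7z.dll exports no such function, and the names below only mirror the suggestion in this post.

```python
from typing import Callable, Dict

# Hypothetical registry mapping codec IDs to factories.
# This illustrates the proposal only; it is not a 7z.dll API.
_factories: Dict[int, Callable[[], object]] = {}


def register_codec_factory(codec_id: int, factory: Callable[[], object]) -> None:
    """Register a callable that creates a codec object for codec_id."""
    if codec_id in _factories:
        raise ValueError(f"codec id {codec_id:#x} is already registered")
    _factories[codec_id] = factory


def create_codec(codec_id: int) -> object:
    """Create a codec on request, as the host library would do."""
    try:
        return _factories[codec_id]()
    except KeyError:
        raise LookupError(f"no codec registered for id {codec_id:#x}") from None


class DummyZstdCodec:
    """Placeholder standing in for a real codec implementation."""
    name = "zstd"
```

A caller would register once, for example `register_codec_factory(0x4F71101, DummyZstdCodec)`, and the host would later call `create_codec(0x4F71101)` whenever an archive needs that method.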
7-Zip supports external codecs for 7z.dll. You just need to place the DLL in the "Codecs" folder. It works for both extraction and compression.
But probably they want some additional parameter support in 7z.dll and GUI. So they recompile it.
Google translation:
> Perhaps it makes sense to add additional api in 7z.dll interface that enables developers using 7z.dll, use their own codecs without recompiling 7z.dll? Something like a new exported function RegisterCodecFactory (DWORD ACodecID, ICodecFactory AFactory)? And ICodecFactory object will 7z.dll request to create the required codec. This solution has an additional feature - the codecs can write almost anything, since with C, Python ending :)
Hello Igor,
thanks a lot for adding the ID to methods.txt. I will try to keep the 7-Zip Zstd version up to date, because the speed of Zstandard at around 100 MiB/s is needed for my GPL USB-Backup program.
with best regards, Tino
Note that if a new version of your codec's decoder changes (and is not compatible with old data), you must change the ID.
The decoder will handle older versions correctly; the version is saved within the ZStd stream.
There is no need to change the ID for every new release.
I started using ZStd for the backup program with version 0.5, and all versions since then (0.5.x, 0.6.x, 0.7.x) can be decoded with that 7za.dll, which is currently around 450 KB statically compiled, including all these methods: ppmd, deflate, bzip2, zstd and lzma.
If the algorithms are different, then we may need different ID ranges for these codecs in some cases, so that they can be extended in the future with new versions.
So you must write full details for each new method:
- Author
- Date of creation
- If new versions are possible in the future, which numbers you will use in these cases.
For example, you add ZStandard, but you are not the author of the original codec.
So we can have 2 ways:
11 01 - is ID for ZStandard
or
11 01 - is ID for ZStandard from Reichardt
What about any new future versions of ZStandard?
And you must write how your version is related to the original ZStandard code. For example, is it possible to write another implementation that will still be ZStandard?
Probably
- you support some subset (maybe the full set) of ZStandard features.
- you selected some way to encode ZStandard properties as 7-Zip properties.
So actually it's not ZStandard, but something like ZStandard-Reichardt.
And you must describe in detail all these additions to the original ZStandard code.
Same things for LZ4 / LZ5.
Last edit: Igor Pavlov 2016-12-27
Hello Igor,
thanks for your fast reply.
The algorithms are different, yes. Could you assign new ID ranges for LZ4 and LZ5?
The authors of the different codecs are as follows:
LZ4 and ZStandard: Yann Collet
LZ5: Przemyslaw Skibinski
The original streams of these 3 codecs are wrapped directly in the open 7-Zip container format. They use a 5-byte header that defines version numbers and compression level information, so that these can be shown in the 7zFM GUI.
I am very sorry... I forgot to mention that I added direct Lzip, LZ4, LZ5 and ZStandard archive support. So using tar files compressed with these 4 codecs is also possible, like this:
7z x -so test.tar.zstd | 7z l -si -ttar
REM -> show contents of zstd compressed tar archive test.tar.zstd
7z x -so test.tar.lz | 7z l -si -ttar
REM -> show contents of lzip compressed tar archive test.tar.lz
I have currently added these handler GUIDs to my GUID.txt file:
0E Zstd
0F Lz4
10 Lz5
C6 Lzip
Could you assign them as well, or give me some other IDs I should use for that?
So try to describe the information about all these codecs in a txt file.
1) what exact code is used (exact version information and version history)
2) what modes of original code are supported.
3) exact description of encoding headers.
Memory requirements for different values of property ranges.
4) what are the expectations for possible new versions of these codecs?
That information can help to select a good ID range.
If these codecs can be used as an external archive format, then describe that also. Does it use an additional header?
About ID for archive format. It's not so important as codecs id.
We can change archive-ID at any time.
Now 7-Zip supports 1-byte ID in macros in RegisterArc.h.
But full ID can be longer. So you can try to change macro source code.
Or use IDs from 50-5F range.
Last edit: Igor Pavlov 2016-12-28
Last edit: Tino Reichardt 2016-12-28
Please update it with details about the version.
Byte _ver_major;
Byte _ver_minor;
Byte _level;
Byte _reserved[2];
What is ver? Is it decoding or encoding version?
How must the decoder treat that header?
Does the decoder just ignore all these fields (5 bytes)?
Are all zstd streams compatible across all versions?
And why we need all these properties?
What do you show in "Method" column to user?
- threading is supported through skippable frame id 0x184D2A50U
What does it mean?
Is it your addition to ZStandard?
Or it's original zstd feature?
- the codec is used as archive handler also, see ZstdHandler.cpp
- when compiled with ZSTD_LEGACY_SUPPORT, then support is increased to these
additional version numbers of zstd: v0.1 up to v0.7
Also write how you support old versions as a codec (in 7z), if you support them.
> What is ver? Is it decoding or encoding version?
The version, which was used for compression.
> How must the decoder treat that header?
This header is informational only. It's not used for decompressing the data.
> Does the decoder just ignore all these fields (5 bytes)?
Yes.
> Are all zstd streams compatible across all versions?
All zstd versions older than 0.8 (v0.1 up to v0.7) are considered legacy and are only supported when ZSTD_LEGACY_SUPPORT is defined at compile time. 7-Zip zstd has it enabled and can decompress all old versions....
ZStandard reached version 1.0 in August 2016... the format has been considered stable since then and will not be changed in the future.
> And why we need all these properties?
They are just generic information, to be used for
> What do you show in "Method" column to user?
The zstd version and the level which was used for compression of that file.
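The 5-byte properties discussed here can be sketched as below. The field order follows the struct quoted earlier in the thread (major version, minor version, level, two reserved bytes); the function names and the label format are assumptions made for illustration, not what the plugin actually displays.

```python
import struct
from typing import NamedTuple

# One byte each for major version, minor version and level,
# followed by two reserved (padding) bytes: 5 bytes in total.
PROPS_FORMAT = "<BBB2x"


class ZstdProps(NamedTuple):
    ver_major: int
    ver_minor: int
    level: int


def pack_props(ver_major: int, ver_minor: int, level: int) -> bytes:
    """Build the informational 5-byte property block."""
    return struct.pack(PROPS_FORMAT, ver_major, ver_minor, level)


def parse_props(data: bytes) -> ZstdProps:
    """Read the property block; the reserved bytes are skipped."""
    if len(data) != struct.calcsize(PROPS_FORMAT):
        raise ValueError("expected exactly 5 property bytes")
    return ZstdProps(*struct.unpack(PROPS_FORMAT, data))


def method_label(props: ZstdProps) -> str:
    """Hypothetical text for a Method column; the real format may differ."""
    return f"ZSTD:v{props.ver_major}.{props.ver_minor},l={props.level}"
```

Since the fields are informational only, a decoder can skip `parse_props` entirely and still decode the data.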
>> * threading is supported through skippable frame id 0x184D2A50U
>
> What does it mean?
> Is it your addition to ZStandard?
> Or it's original zstd feature?
It has a special meaning: it allows the stream to be decompressed in a multithreaded way. This is currently an optional addition to the zstd stream. It's ignored by older versions and remains compatible in this way. Yann is currently adding support for multithreading in the same way, so the upcoming release 1.2.0 will have it also ;-) But the feature will remain optional I think... so this is handled by skippable frames within zstd.
See the zstdmt branch for more details about it: https://github.com/facebook/zstd/tree/zstdmt
And also the zstd compression format description: https://github.com/facebook/zstd/blob/zstdmt/doc/zstd_compression_format.md
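As a rough illustration of why a skippable frame stays compatible, here is a sketch of how a reader can step over one. It assumes the layout given in the zstd format description: a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian size, then that many bytes of user data. The function name is hypothetical, and what the multithreaded encoder stores inside the frame is not modeled.

```python
import struct

SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def skip_skippable_frames(buf: bytes, pos: int = 0) -> int:
    """Return the offset of the first byte after any leading skippable frames."""
    while pos + 8 <= len(buf):
        magic, size = struct.unpack_from("<II", buf, pos)
        if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
            break  # a regular frame (or other data) starts here
        end = pos + 8 + size
        if end > len(buf):
            raise ValueError("truncated skippable frame")
        pos = end  # ignore the user data completely
    return pos
```

A decoder that does not know about the threading information only needs the size field to jump to the next frame.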
There is ZStandard, and there is 7z-ZStandard.
If you write a ZStandard stream without any properties, as bzip2 does in 7z, for example, then there is no question.
But you have created a new thing (7z-ZStandard with 5 bytes of properties).
So everything about these 5 bytes of properties is now YOUR problem.
So you must describe every aspect of these 5 bytes.
For example, you must write that the decoder MUST ignore all fields of these properties and try to decode the stream with the default ZStandard code.
Is it really so?
Even if version contains 0.1?
I'm still not sure about the 0x184D2A50 matter. I don't know which specification it belongs in.
> Is it really so?
> Even if version contains 0.1?
Yes. ZStandard detects the version itself from the zstd header, which follows the 5 Byte 7z Container header.
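A minimal sketch of that kind of detection follows. It assumes the frame magic numbers as listed in the zstd sources (0xFD2FB528 for v0.8 and later, 0xFD2FB522 to 0xFD2FB527 for the legacy v0.2 to v0.7 formats); v0.1 is left out here, and the values should be checked against the zstd legacy headers before relying on them.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # assumed: frame format used by v0.8 and later

# Assumed legacy magic numbers; v0.1 is intentionally not covered.
LEGACY_MAGICS = {
    0xFD2FB522: "0.2",
    0xFD2FB523: "0.3",
    0xFD2FB524: "0.4",
    0xFD2FB525: "0.5",
    0xFD2FB526: "0.6",
    0xFD2FB527: "0.7",
}


def detect_zstd_format(stream_start: bytes) -> str:
    """Classify a stream by its first 4 bytes (little-endian magic number)."""
    if len(stream_start) < 4:
        raise ValueError("need at least 4 bytes")
    (magic,) = struct.unpack_from("<I", stream_start)
    if magic == ZSTD_MAGIC:
        return "current"
    if magic in LEGACY_MAGICS:
        return "legacy v" + LEGACY_MAGICS[magic]
    return "unknown"
```

Because the magic number sits in the compressed data itself, this check needs nothing from the 7z properties.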
What do you mean?
"5 Byte 7z Container header" probably is stored in 7z header at the end of archive.
and "zstd header" probably is stored in data stream at the start of 7z archive at offset 32 from start of 7z file.
So we can't say that "zstd header, which follows the 5 Byte 7z Container header". These headers are stored in different places.
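The two locations can be computed from the first 32 bytes of a .7z file. The sketch below assumes the signature header layout from the 7z format description: 6 signature bytes, 2 version bytes, a 4-byte CRC, an 8-byte NextHeaderOffset, an 8-byte NextHeaderSize and a 4-byte NextHeaderCRC. The function name is hypothetical, and CRC checks are omitted.

```python
import struct

SIGNATURE = b"7z\xBC\xAF\x27\x1C"
SIGNATURE_HEADER_SIZE = 32


def locate_7z_header(first_32_bytes: bytes):
    """Return (packed_data_start, header_start, header_size) for a .7z file."""
    if len(first_32_bytes) < SIGNATURE_HEADER_SIZE:
        raise ValueError("need the first 32 bytes of the file")
    if not first_32_bytes.startswith(SIGNATURE):
        raise ValueError("not a 7z signature")
    # NextHeaderOffset and NextHeaderSize follow the signature (6 bytes),
    # the version (2 bytes) and the start header CRC (4 bytes).
    next_offset, next_size = struct.unpack_from("<QQ", first_32_bytes, 12)
    packed_data_start = SIGNATURE_HEADER_SIZE
    header_start = SIGNATURE_HEADER_SIZE + next_offset
    return packed_data_start, header_start, next_size
```

The compressed data, and with it the zstd frame header, begins at offset 32, while the coder properties live in the header that starts at `header_start`.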
Yes, you are right. The informational 5-byte extra header is stored at the end of the archive.