This document describes the file format that WinZip uses to compress files using methods not described in the Zip file appnote.txt specification. This includes the following:
Method | WinZip Version | Release Date |
PPMd | 10 Beta | August 2005 |
WavPack | 11 Beta | October 2006 |
JPEG | 12 | September 2008 |
XZ | 18 | September 2013 |
MP3 | 21 | September 2016 |
Zstandard | 24 | July 2019 |
Reference | 25 | September 2020 |
In WinZip 12.1, released in May of 2009, the Zipx file was introduced. The Zipx file is a Zip file that uses any of the aforementioned compression methods or the LZMA or bzip2 compression methods as documented in the Zip file appnote.txt specification.
Without compromising the basic Zip file format, WinZip Computing has extended the format specification to support new compression methods. Additionally, we are providing information about no-cost third-party source code for these methods, most of which is used by WinZip. We believe that using the free compression code and this specification will make it easy for all developers to add compatible compression to their Zip file utilities.
From time to time, we may update the information provided here, for example, to document any changes to the file formats or to add additional notes or implementation tips.
This document is not a tutorial on compression or Zip file structure. While we have attempted to provide the necessary details to implement the compression methods within Zip files, developers and other interested third parties will need to understand basic compression concepts and have familiarity with the overall Zip file format.
WinZip Computing makes no warranties regarding the information provided in this document. In particular, WinZip Computing does not represent or warrant that the information provided here is free from errors or is suitable for any particular use or that the file formats described here will be supported in future versions of WinZip. You should test and validate all code and techniques in accordance with good programming practice.
PPMd is an open-source data compression algorithm developed by Dmitry Shkarin. WinZip uses Version I, revision 1 of the algorithm. Information and source code for PPMd I rev. 1 can be found on the internet at http://www.compression.ru/ds/.
WinZip prepends a single two-byte WORD to the PPMd compressed data in a Zip file. This WORD contains the following three fields:
These fields are packed into the two byte WORD (stored in Intel low-byte/high-byte order) at the beginning of a compressed file as indicated in the following C++ code snippet:
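As an illustration of the packing, here is a minimal C++ sketch. It assumes a layout of model order minus one in bits 0-3, sub-allocator size in MB minus one in bits 4-11, and model restoration method in bits 12-15, which matches the PPMd entry in recent revisions of PKWARE's appnote.txt. Treat that layout as an assumption to verify. The struct and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical parameter bundle; the names are illustrative only.
struct PpmdParams {
    unsigned order;      // model order (assumed stored as order - 1, bits 0-3)
    unsigned memory_mb;  // sub-allocator size in MB (assumed stored as size - 1, bits 4-11)
    unsigned restore;    // model restoration method (assumed stored in bits 12-15)
};

// Pack the three fields into one 16-bit value under the assumed layout.
inline std::uint16_t pack_ppmd(const PpmdParams& p) {
    return static_cast<std::uint16_t>(((p.order - 1u) & 0x0Fu) |
                                      (((p.memory_mb - 1u) & 0xFFu) << 4) |
                                      ((p.restore & 0x0Fu) << 12));
}

// Reverse of pack_ppmd: recover the three fields from the 16-bit value.
inline PpmdParams unpack_ppmd(std::uint16_t w) {
    return PpmdParams{(w & 0x0Fu) + 1u, ((w >> 4) & 0xFFu) + 1u, (w >> 12) & 0x0Fu};
}

// Serialize in Intel order: low byte first, then high byte.
inline std::array<std::uint8_t, 2> to_bytes(std::uint16_t w) {
    return {static_cast<std::uint8_t>(w & 0xFF), static_cast<std::uint8_t>(w >> 8)};
}
```

Under these assumptions, an order of 8 with a 50 MB sub-allocator and restoration method 0 packs to 0x0317, which is written to the file as the bytes 0x17 0x03.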
For files compressed with PPMd, WinZip sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
WavPack is an open-source lossless audio compression format developed by David Bryant. Information and source code for WavPack 4.32, which is backward and forward compatible with the version of WavPack used in WinZip 11.0 or later, can be found on the internet at http://www.wavpack.com.
WinZip stores each WavPack compressed .WAV file as output from the WavPack routines in the Zip file immediately after the end of the local header. It sets the compression method to 97 in the local and central headers. WinZip also sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
NOTE: It is necessary to compress the entire sequence of bytes used to store a sample, not just the bits used to represent the sample itself. For example, 12-bit samples require 2 bytes (16 bits) in the .WAV file format. If you tell WavPack that the sample size is only 12 bits, it will ignore the 4 unused bits during compression and set them to 0 on extraction, regardless of what they were set to in the original file. Telling WavPack that the sample size is the full 16 bits, rather than the actual 12 bits, is necessary to ensure that the extracted file is bit-for-bit identical to the original.
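The rounding described in this note can be sketched in C++ as follows. The function name is hypothetical, and how the result is passed to the WavPack library is left out; the sketch only shows the arithmetic of widening a sample size to whole bytes.

```cpp
// Number of bits one sample occupies in the .WAV data: the valid bit count
// rounded up to a whole number of bytes. This is the sample size to report
// to the compressor so that padding bits survive a round trip.
inline unsigned container_bits(unsigned valid_bits) {
    return ((valid_bits + 7u) / 8u) * 8u;
}
```

For the 12-bit case in the note, `container_bits(12)` yields 16; sizes that already fill whole bytes, such as 16 or 24, are returned unchanged.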
Compressed JPEG is a lossless compression format for JPEG files. Information for Compressed JPEG 1.0, used in WinZip 12.0 or later, can be found on the internet at https://www.winzip.com/wz-jpg-comp.pdf.
WinZip stores each compressed JPEG file as output from the compressed JPEG routine in the Zip file immediately after the end of the local header. It sets the compression method to 96 in the local and central headers. WinZip also sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
XZ is an open-source data compression format co-developed by Igor Pavlov and Lasse Collin. It incorporates the LZMA2 compression algorithm combined with optional pre-processing filters for greater compression. Information and source code can be found online at http://tukaani.org/xz/.
WinZip supports the complete XZ 1.0.4 specification with the following two restrictions:
WinZip creates a stream block with a check type of NONE. Data integrity is handled via the standard CRC-32 mechanism described in the Zip format specification. All XZ check types including CRC-64 and SHA-256 are supported for applications that implement them. Multiple blocks within a stream are supported and must be zero-padded as described in the XZ specification.
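A reader may want to see which check type an XZ stream declares before deciding how to verify it. The following C++17 sketch reads the check type from the 12-byte stream header as laid out in the XZ file format specification (6 magic bytes, 2 stream-flag bytes, 4 bytes of CRC32 over the flags). The function and enum names are hypothetical, and the sketch does not validate the header's own CRC32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Check-type IDs defined by the XZ file format specification.
enum class XzCheck : std::uint8_t { None = 0x00, Crc32 = 0x01, Crc64 = 0x04, Sha256 = 0x0A };

// Returns the raw check-type ID (0-15) from an XZ stream header, or
// std::nullopt if the buffer does not start with a plausible header.
inline std::optional<std::uint8_t> xz_check_type(const std::uint8_t* data, std::size_t size) {
    static const std::uint8_t kMagic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};
    if (size < 12 || std::memcmp(data, kMagic, 6) != 0) return std::nullopt;
    // Stream flags: the first byte and the high nibble of the second are reserved.
    if (data[6] != 0x00 || (data[7] & 0xF0) != 0) return std::nullopt;
    return static_cast<std::uint8_t>(data[7] & 0x0F);
}
```

A result of 0 (`XzCheck::None`) corresponds to the case described above, where integrity relies on the Zip entry's CRC-32 alone.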
For files compressed with XZ, WinZip sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
MP3 audio file compression is an open-source algorithm developed by Matthias Stirner for his compression program, PackMP3. It is licensed under LGPL v3. Information and source code can be found on GitHub.
WinZip stores the compressed data in the Zip file immediately after the end of the local header. It sets the compression method to 94 in the local and central headers. WinZip also sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
Zstandard (zstd) compression is an open-source data compression algorithm developed by Yann Collet at Facebook. It is dual-licensed under BSD + GPLv2. Information and source code can be found on GitHub.
WinZip stores the compressed data in the Zip file immediately after the end of the local header. It sets the compression method to 93 in the local and central headers. WinZip also sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
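To show where the compression method value lives, here is a minimal C++ sketch that reads it from a local file header. It relies on the standard Zip local header layout (signature "PK\3\4", method stored as a little-endian 16-bit value at offset 8, 30-byte fixed part). The enum lists only method numbers stated in this document; all identifiers are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Method numbers stated in this document.
enum ZipxMethod : std::uint16_t {
    kReference = 92,
    kZstandard = 93,
    kMp3       = 94,
    kJpeg      = 96,
    kWavPack   = 97,
};

// Read a little-endian 16-bit value.
inline std::uint16_t read_u16le(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Returns the compression method from a local file header, or -1 if the
// buffer is shorter than the 30-byte fixed part or the signature is wrong.
inline int local_header_method(const std::uint8_t* h, std::size_t size) {
    if (size < 30) return -1;
    if (h[0] != 'P' || h[1] != 'K' || h[2] != 3 || h[3] != 4) return -1;
    return read_u16le(h + 8);
}
```

The compressed data then starts immediately after the local header, that is, after the fixed part plus the file name and extra field lengths recorded in it.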
The Reference method indicates the file is compressed as a reference. References are used to link duplicate files within the Zip file. Duplicate files are defined as those that match identically in both content and size but may have different names, dates, or other attributes or exist in different folders.
The compressed data for a reference consists of the SHA-1 hash of the original file data. A SHA-1 hash is 160 bits; thus, every file reference has a compressed size of 20 bytes. Accordingly, it is recommended that files below a minimum size threshold (WinZip uses 4 KB) not be handled as duplicates.
In addition to the compressed data, the Reference method requires an extra data field containing identifying information to be attached to the central header, and optionally, the local header. A similar extra data field is also attached to the central/local headers of the file being referenced (the referent). This establishes the link between the two.
The extra data header ID for a Reference is 0x9903. All fields are stored in little-endian format except for the UUID, which is in big-endian format. The complete extra data field is 24 bytes long: two bytes for the header ID, two bytes for the data size, and 20 data bytes (the 4-byte CRC followed by the 16-byte UUID).
Offset | Size (bytes) | Content |
0 | 2 | Extra field header ID (0x9903) |
2 | 2 | Extra data size |
4 | 4 | 32-bit CRC |
8 | 16 | UUID |
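The layout in the table can be sketched in C++ as an encoder and decoder for the 24-byte field. The struct and function names are hypothetical; the byte order follows the description above (little-endian integers, UUID bytes copied unchanged).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint16_t kReferenceExtraId = 0x9903;
constexpr std::uint16_t kReferenceExtraDataSize = 20;  // 4-byte CRC + 16-byte UUID

struct ReferenceExtra {
    std::uint32_t crc;                  // 32-bit CRC over selected header fields
    std::array<std::uint8_t, 16> uuid;  // big-endian (network order), stored as-is
};

// Build the complete 24-byte extra field: ID, size, CRC, UUID.
inline std::array<std::uint8_t, 24> encode_reference_extra(const ReferenceExtra& e) {
    std::array<std::uint8_t, 24> out{};
    out[0] = static_cast<std::uint8_t>(kReferenceExtraId & 0xFF);
    out[1] = static_cast<std::uint8_t>(kReferenceExtraId >> 8);
    out[2] = static_cast<std::uint8_t>(kReferenceExtraDataSize & 0xFF);
    out[3] = static_cast<std::uint8_t>(kReferenceExtraDataSize >> 8);
    for (int i = 0; i < 4; ++i)
        out[4 + i] = static_cast<std::uint8_t>((e.crc >> (8 * i)) & 0xFF);
    std::memcpy(&out[8], e.uuid.data(), 16);
    return out;
}

// Parse a 24-byte extra field; returns false if the ID or size does not match.
inline bool decode_reference_extra(const std::uint8_t* p, std::size_t size, ReferenceExtra& e) {
    if (size < 24) return false;
    if ((p[0] | (p[1] << 8)) != kReferenceExtraId) return false;
    if ((p[2] | (p[3] << 8)) != kReferenceExtraDataSize) return false;
    e.crc = static_cast<std::uint32_t>(p[4]) | (static_cast<std::uint32_t>(p[5]) << 8) |
            (static_cast<std::uint32_t>(p[6]) << 16) | (static_cast<std::uint32_t>(p[7]) << 24);
    std::memcpy(e.uuid.data(), p + 8, 16);
    return true;
}
```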
The purpose of the CRC is to verify that the extra data is attached to the correct header when unzipping. It might be incorrect if the Zip file was changed by an application that doesn't understand or support duplicate file references.
The CRC is calculated from the following fields in the following order:
1. The compression method from the header, widened to a 32-bit unsigned integer
2. The timestamp from the header, widened to a 32-bit unsigned integer
3. The CRC-32 value from the header, which is already a 32-bit unsigned integer
4. Finally, the 16 data bytes comprising the UUID
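One possible reading of this calculation is sketched below in C++. Several points are assumptions rather than statements from this document: that the CRC is the same CRC-32 used for Zip entry data, that the three widened integers are fed to it in little-endian byte order, and that the caller supplies the timestamp as a single 32-bit value (how the header's time and date fields are combined is not specified here). All function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used for Zip entries.
inline std::uint32_t crc32_ieee(const std::uint8_t* data, std::size_t size) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Write a 32-bit value in little-endian order (assumed serialization).
inline void put_u32le(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF);
}

// CRC over: method (widened), timestamp (widened), entry CRC-32, then the UUID.
inline std::uint32_t reference_extra_crc(std::uint16_t method,
                                         std::uint32_t timestamp,
                                         std::uint32_t entry_crc32,
                                         const std::array<std::uint8_t, 16>& uuid) {
    std::array<std::uint8_t, 28> buf{};
    put_u32le(&buf[0], method);
    put_u32le(&buf[4], timestamp);
    put_u32le(&buf[8], entry_crc32);
    std::copy(uuid.begin(), uuid.end(), buf.begin() + 12);
    return crc32_ieee(buf.data(), buf.size());
}
```

Because the byte order and timestamp composition are assumptions, results should be compared against a Zipx file written by WinZip before relying on this sketch.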
The UUID is created as per RFC 4122. In particular, all fields are in big-endian or network byte order. WinZip uses a Variant 1, Version 4 (randomly generated) UUID.
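A minimal C++ sketch of generating such a UUID follows, assuming `std::random_device` is an acceptable randomness source on the target platform (its quality is implementation-defined). The function name is hypothetical. The version and variant bits are placed as RFC 4122 describes: version 4 in the high nibble of octet 6, and the bit pattern 10 in the top two bits of octet 8.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Produce 16 bytes in network order forming a randomly generated UUID.
inline std::array<std::uint8_t, 16> make_uuid_v4() {
    std::random_device rd;
    std::array<std::uint8_t, 16> u{};
    for (auto& b : u) b = static_cast<std::uint8_t>(rd() & 0xFF);
    u[6] = static_cast<std::uint8_t>((u[6] & 0x0F) | 0x40);  // version 4
    u[8] = static_cast<std::uint8_t>((u[8] & 0x3F) | 0x80);  // RFC 4122 variant (10xx xxxx)
    return u;
}
```

Since the bytes are already in network order, they can be copied directly into the UUID portion of the extra data field.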
WinZip sets the version needed to extract and version made by fields in the local and central headers to the same values it would use if the files had been compressed with the Deflate algorithm.
When unzipping a Reference (method 92), the following procedure should be used:
1. Verify that the extra data attached to the reference's header is correct (validate the 32-bit CRC).
2. Search the central directory for the referent. A referent is identified by an entry that is not compressed as a reference (method 92) and has an extra data field (ID 0x9903) with a matching UUID.
3. Verify that the extra data attached to the referent's header is correct (validate the 32-bit CRC).
4. Decompress the referent's data as you normally would, but produce the output for the reference entry.
5. Verify that the SHA-1 hash (the compressed data of the reference) is correct for the decompressed data of the referent.
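The search for the referent in this procedure could look like the following C++17 sketch. It assumes the central directory has already been parsed into a hypothetical `Entry` type whose `reference_uuid` is set only when a 0x9903 extra field with a valid CRC was found. CRC validation, decompression, and the SHA-1 comparison are left to the caller.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Uuid = std::array<std::uint8_t, 16>;

// Hypothetical, already-parsed view of one central directory entry.
struct Entry {
    std::uint16_t method;                // compression method from the header
    std::optional<Uuid> reference_uuid;  // UUID from a validated 0x9903 extra field, if any
};

constexpr std::uint16_t kReferenceMethod = 92;

// Returns the index of the referent: the first entry that is not itself a
// reference and carries the same UUID, or std::nullopt if there is none.
inline std::optional<std::size_t> find_referent(const std::vector<Entry>& entries,
                                                const Uuid& uuid) {
    for (std::size_t i = 0; i < entries.size(); ++i) {
        const Entry& e = entries[i];
        if (e.method != kReferenceMethod && e.reference_uuid && *e.reference_uuid == uuid)
            return i;
    }
    return std::nullopt;
}
```

If no referent is found, or any of the CRC or SHA-1 checks fails, the reference cannot be extracted reliably and should be reported as an error rather than silently skipped.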
One final note on encryption: duplicate file references should not be used in combination with encryption because of the inherent conflicts. WinZip does not support this combination.
All credit and our thanks go to:
Document version: 3.2
Last modified: March, 2021
Copyright© 2003- Corel Corporation
All Rights Reserved