	The .lzma File Format
	=====================

	0. Preface
	0.1. Notices and Acknowledgements
	0.2. Changes
	1. File Format
	1.1. Header
	1.1.1. Properties
	1.1.2. Dictionary Size
	1.1.3. Uncompressed Size
	1.2. LZMA Compressed Data
	2. References


	0. Preface

	This document describes the .lzma file format, which is
	sometimes also called LZMA_Alone format. It is a legacy file
	format, which is being or has been replaced by the .xz format.
	The MIME type of the .lzma format is `application/x-lzma'.

	The most commonly used programs for handling .lzma files are
	LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document
	describes some of the differences between these implementations
	and gives hints about which subset of the .lzma format is the
	most portable.


	0.1. Notices and Acknowledgements

	This file format was designed by Igor Pavlov for use in
	LZMA SDK. This document was written by Lasse Collin
	<lasse.collin@tukaani.org> using the documentation found
	in the LZMA SDK.

	This document has been put into the public domain.


	0.2. Changes

	Last modified: 2022-07-13 21:00+0300

	Compared to the previous version (2011-04-12 11:55+0300),
	section 1.1.3 was modified to allow the End of Payload Marker
	with a known Uncompressed Size.


	1. File Format

	+-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+
	|         Header          |   LZMA Compressed Data   |
	+-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+

	A .lzma file consists of a 13-byte Header followed by the
	LZMA Compressed Data.

	Unlike the .gz, .bz2, and .xz formats, it is not possible to
	concatenate multiple .lzma files as is and expect the
	decompression tool to decode the resulting file as if it were
	a single .lzma file.

	For example, the command line tools from LZMA Utils and
	LZMA SDK silently ignore all the data after the first .lzma
	stream. In contrast, the command line tool from XZ Utils
	considers the .lzma file to be corrupt if there is data after
	the first .lzma stream.


	1.1. Header

	+------------+----+----+----+----+--+--+--+--+--+--+--+--+
	| Properties |  Dictionary Size  |   Uncompressed Size   |
	+------------+----+----+----+----+--+--+--+--+--+--+--+--+
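	As a minimal sketch, the C code below splits a 13-byte header
	buffer into the three fields shown above, reading the two
	multi-byte fields as little endian integers as sections 1.1.2
	and 1.1.3 specify. The struct and function names are
	hypothetical, and the code does not validate any field values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct holding the raw fields of the 13-byte header. */
struct lzma_header {
    uint8_t  properties;        /* byte 0: encoded lc/lp/pb */
    uint32_t dict_size;         /* bytes 1-4, little endian */
    uint64_t uncompressed_size; /* bytes 5-12, little endian */
};

/* Read an unsigned little endian integer that is n bytes long (n <= 8). */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i)
        v |= (uint64_t)p[i] << (8 * i);  /* byte i carries bits 8i..8i+7 */
    return v;
}

/* Hypothetical helper: copy the header fields out of buf without
   checking whether their values are valid. */
static void parse_lzma_header(const uint8_t buf[13], struct lzma_header *h)
{
    h->properties = buf[0];
    h->dict_size = (uint32_t)read_le(buf + 1, 4);
    h->uncompressed_size = read_le(buf + 5, 8);
}
```

	For example, a header consisting of the bytes 5D 00 00 80 00
	followed by eight FF bytes parses as Properties 0x5D, a
	Dictionary Size of 0x00800000 (8 MiB), and the special
	"unknown" Uncompressed Size.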


	1.1.1. Properties

	The Properties field contains three properties. An abbreviation
	is given in parentheses, followed by the value range of the
	property. The field consists of

	1) the number of literal context bits (lc, [0, 8]);
	2) the number of literal position bits (lp, [0, 4]); and
	3) the number of position bits (pb, [0, 4]).

	The properties are encoded using the following formula:

	Properties = (pb * 5 + lp) * 9 + lc
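	For the common lc/lp/pb values 3/0/2, the formula gives
	(2 * 5 + 0) * 9 + 3 = 93, or 0x5D. A minimal encoder sketch in
	C, with a hypothetical function name, that rejects values
	outside the ranges listed above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode lc/lp/pb into the Properties byte using
   Properties = (pb * 5 + lp) * 9 + lc. Returns false if any value is
   outside its allowed range (lc in [0, 8], lp and pb in [0, 4]). */
static bool lzma_props_encode(unsigned lc, unsigned lp, unsigned pb,
                              uint8_t *out)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return false;
    *out = (uint8_t)((pb * 5 + lp) * 9 + lc);  /* at most 224 */
    return true;
}
```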

	The following C code illustrates a straightforward way to
	decode the Properties field:

	    uint8_t lc, lp, pb;
	    uint8_t prop = get_lzma_properties();
	    if (prop > (4 * 5 + 4) * 9 + 8)
	        return LZMA_PROPERTIES_ERROR;

	    pb = prop / (9 * 5);
	    prop -= pb * 9 * 5;
	    lp = prop / 9;
	    lc = prop - lp * 9;
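	The fragment above relies on get_lzma_properties() and
	LZMA_PROPERTIES_ERROR, which are not defined in this document.
	A self-contained variant of the same arithmetic, assuming a
	hypothetical function that takes the byte as a parameter and
	reports errors with a bool, might look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode the Properties byte into lc, lp, and pb.
   Values above (4 * 5 + 4) * 9 + 8 = 224 cannot be produced by any
   valid lc/lp/pb combination and are rejected. */
static bool lzma_props_decode(uint8_t prop, unsigned *lc, unsigned *lp,
                              unsigned *pb)
{
    unsigned p = prop;
    if (p > (4 * 5 + 4) * 9 + 8)
        return false;

    *pb = p / (9 * 5);   /* each pb step spans 9 * 5 = 45 values */
    p -= *pb * 9 * 5;    /* remainder is lp * 9 + lc */
    *lp = p / 9;
    *lc = p - *lp * 9;
    return true;
}
```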

	XZ Utils has an additional requirement: lc + lp <= 4. Files
	which don't follow this requirement cannot be decompressed
	with XZ Utils. Usually this isn't a problem since the most
	common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb
	combination that the files created by LZMA Utils can have,
	but LZMA Utils can decompress files with any lc/lp/pb.
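	A sketch of a check for the XZ Utils requirement, assuming the
	caller has the raw Properties byte. Taking prop % 45 removes
	the pb part, since each pb step spans 9 * 5 = 45 values. The
	function name is hypothetical and is not part of XZ Utils:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the Properties byte is in range and
   also satisfies the additional XZ Utils requirement lc + lp <= 4. */
static bool props_ok_for_xz_utils(uint8_t prop)
{
    if (prop > (4 * 5 + 4) * 9 + 8)
        return false;
    unsigned p = prop % (9 * 5);  /* drop the pb part: p = lp * 9 + lc */
    unsigned lp = p / 9;
    unsigned lc = p % 9;
    return lc + lp <= 4;
}
```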


	1.1.2. Dictionary Size

	Dictionary Size is stored as an unsigned 32-bit little endian
	integer. Any 32-bit value is possible, but for maximum
	portability, only sizes of 2^n and 2^n + 2^(n-1) should be
	used.
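	One way to test whether a Dictionary Size is one of the
	portable values: 2^n has exactly one bit set, and
	2^n + 2^(n-1) equals 3 * 2^(n-1), so it is divisible by 3 with
	a power of two as the quotient. A sketch with a hypothetical
	function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if d is 2^n or 2^n + 2^(n-1). */
static bool dict_size_is_portable(uint32_t d)
{
    if (d == 0)
        return false;
    if ((d & (d - 1)) == 0)   /* exactly one bit set: d = 2^n */
        return true;
    if (d % 3 != 0)
        return false;
    uint32_t h = d / 3;       /* must equal 2^(n-1); h >= 1 here */
    return (h & (h - 1)) == 0;
}
```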

	LZMA Utils creates only files with dictionary size 2^n,
	16 <= n <= 25. LZMA Utils can decompress files with any
	dictionary size.

	XZ Utils creates and decompresses .lzma files only with
	dictionary sizes 2^n and 2^n + 2^(n-1). If some other
	dictionary size is specified when compressing, the value
	stored in the Dictionary Size field is rounded up, but the
	specified value is still used in the actual compression code.
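	The rounding can be sketched as a search for the smallest value
	of the form 2^n or 2^n + 2^(n-1) that is not below the requested
	size. This is an illustration under that assumption, not
	XZ Utils' actual code, and the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: round size up to the nearest 2^n or
   2^n + 2^(n-1). Returns 0 if the result would not fit in 32 bits,
   i.e. if size > 3 << 30. A size of 0 yields 1. */
static uint32_t round_up_dict_size(uint32_t size)
{
    for (unsigned n = 0; n < 32; ++n) {
        uint64_t a = (uint64_t)1 << n;           /* 2^n */
        uint64_t b = n > 0 ? a + (a >> 1) : a;   /* 2^n + 2^(n-1) */
        if (size <= a)
            return (uint32_t)a;
        if (size <= b)
            return (uint32_t)b;
    }
    return 0;
}
```

	For example, a requested size of 5 MiB (5242880 bytes) would be
	stored as 6 MiB (2^22 + 2^21 = 6291456 bytes), while the
	compressor itself still uses 5 MiB.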


	1.1.3. Uncompressed Size

	Uncompressed Size is stored as an unsigned 64-bit little endian
	integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
	that Uncompressed Size is unknown. End of Payload Marker (*)
	is used if Uncompressed Size is unknown. End of Payload Marker
	is allowed but rarely used if Uncompressed Size is known.
	XZ Utils 5.2.5 and older don't support .lzma files that have
	End of Payload Marker together with a known Uncompressed Size.

	XZ Utils rejects files whose Uncompressed Size field specifies
	a known size that is 256 GiB or more. This is to reject false
	positives when trying to guess if the input file is in the
	.lzma format. When Uncompressed Size is unknown, there is no
	limit for the uncompressed size of the file.
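	The XZ Utils acceptance rule for this field can be sketched as
	below; 256 GiB is 2^38 bytes. The macro and function names are
	hypothetical, and the sketch covers only the size check, not
	the rest of the format detection:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the special "unknown" value. */
#define DOT_LZMA_SIZE_UNKNOWN UINT64_MAX   /* 0xFFFF_FFFF_FFFF_FFFF */

/* Hypothetical helper: an unknown size is always accepted; a known
   size is accepted only if it is below 256 GiB (2^38 bytes). */
static bool uncompressed_size_plausible(uint64_t size)
{
    if (size == DOT_LZMA_SIZE_UNKNOWN)
        return true;
    return size < ((uint64_t)1 << 38);
}
```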

	(*) Some tools use the term End of Stream (EOS) marker
	instead of End of Payload Marker.


	1.2. LZMA Compressed Data

	A detailed description of the format of this field is out of
	the scope of this document.


	2. References

	LZMA SDK - The original LZMA implementation
	http://7-zip.org/sdk.html

	7-Zip
	http://7-zip.org/

	LZMA Utils - LZMA adapted to POSIX-like systems
	http://tukaani.org/lzma/

	XZ Utils - The next generation of LZMA Utils
	http://tukaani.org/xz/

	The .xz file format - The successor of the .lzma format
	http://tukaani.org/xz/xz-file-format.txt