mirror of https://github.com/google/brotli
Spec clarifications for Section 8.
Based on Mark Adler's review comments.
parent 8618383b9b
commit dcdc68e68b
@@ -1106,7 +1106,7 @@ The number of static dictionary words for a given length is:

 .nf
 NWORDS[length] = 0 (if length < 4)
-NWORDS[length] = (1 << NDBITS[lengths]) (if length >= 4)
+NWORDS[length] = (1 << NDBITS[length]) (if length >= 4)
 .fi

 DOFFSET and DICTSIZE are defined by the following recursion:
@@ -1122,7 +1122,7 @@ index is:

 offset(length, index) = DOFFSET[length] + index * length

-Each static dictionary word has NTRANSFORMS different forms, given by
+Each static dictionary word has 121 different forms, given by
 applying a word transformation to a base word in the DICT array. The
 list of word transformations is given in Appendix B. The static
 dictionary word for a <length, distance> pair can be reconstructed as
@@ -1131,21 +1131,21 @@ follows:
 .nf
 word_id = distance - (max allowed distance + 1)
 index = word_id % NWORDS[length]
-base_word = DICT[offset(length, index)..offset(length, index+1))
+base_word = DICT[offset(length, index)..offset(length, index+1)-1]
 transform_id = word_id >> NDBITS[length]
 .fi

 The string copied to the output stream is computed by applying the
 transformation to the base dictionary word. If transform_id is
-greater than NTRANSFORMS - 1 or length is greater than 24, the
+greater than 120 or length is greater than 24, the
 compressed data set is invalid.

-Each word transformation has the follwing form:
+Each word transformation has the following form:

 transform_i(word) = prefix_i + T_i(word) + suffix_i

 where the _i subscript denotes the transform_id above. Each T_i
-is one of the following 20 elementary transforms:
+is one of the following 21 elementary transforms:

 .nf
 Identity, OmitLast1, ..., OmitLast9, UppercaseFirst, UppercaseAll,
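Review note (not part of the commit): a minimal Python sketch of the lookup described in the hunk above. The NDBITS values and DICT bytes are toy values chosen for illustration, not the real tables from the spec, and the helper names `build_doffset` and `lookup_base_word` are hypothetical. DOFFSET is assumed to pack the words of each length back to back; the hunk refers to that recursion without showing it. The slice at the end also illustrates that the inclusive range `a..b-1` selects the same bytes as the half-open range `a..b)`.

```python
# Toy tables for illustration only; the real NDBITS values and DICT bytes
# are defined by the spec and are NOT reproduced here.
NDBITS = [0, 0, 0, 0, 2, 1]      # indexed by word length
MAX_LENGTH = len(NDBITS) - 1     # toy limit; the spec text uses 24
NTRANSFORMS = 121                # count stated in the diff above


def nwords(length):
    """NWORDS[length]: 0 below length 4, otherwise 1 << NDBITS[length]."""
    return 0 if length < 4 else 1 << NDBITS[length]


def build_doffset():
    """Assumed recursion: words of each length are packed back to back."""
    doffset = [0] * (MAX_LENGTH + 2)
    for length in range(MAX_LENGTH + 1):
        doffset[length + 1] = doffset[length] + length * nwords(length)
    return doffset


DOFFSET = build_doffset()
# Four toy words of length 4, then two toy words of length 5.
DICT = b"timedownlifeleft" + b"aboutafter"


def offset(length, index):
    return DOFFSET[length] + index * length


def lookup_base_word(length, distance, max_allowed_distance):
    """Split a <length, distance> pair into (base_word, transform_id)."""
    if length < 4 or length > MAX_LENGTH:
        raise ValueError("no dictionary words of this length")
    word_id = distance - (max_allowed_distance + 1)
    if word_id < 0:
        raise ValueError("distance is not a dictionary reference")
    index = word_id % nwords(length)
    transform_id = word_id >> NDBITS[length]
    if transform_id > NTRANSFORMS - 1:
        raise ValueError("transform_id out of range")
    # Python slices are half-open, so this covers bytes
    # offset(length, index) .. offset(length, index + 1) - 1 inclusive.
    base_word = DICT[offset(length, index):offset(length, index + 1)]
    return base_word, transform_id
```

With these toy tables, a distance that is 7 past the maximum allowed distance and a length of 4 gives word_id 6, which splits into index 2 and transform_id 1.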
@@ -1169,7 +1169,7 @@ The form of these elementary transforms are as follows:
 .fi

 For the purposes of UppercaseAll, word is parsed into UTF-8
-characters an coverted to upper-case by taking 1 - 3 bytes at a time,
+characters and converted to upper-case by taking 1 - 3 bytes at a time,
 using the algorithm below:

 .nf
@@ -1196,6 +1196,9 @@ executed only once.

 Appendix B. contains the list of transformations by specifying the
 prefix, elementary transform and suffix components of each of them.
+Note that the OmitFirst8 elementary transform is not used in the list
+of transformations. The strings in Appendix B. are in C string format
+with respect to escape (backslash) characters.

 .ti 0
 9. Compressed data format

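Review note (not part of the commit): a small Python sketch of the form transform_i(word) = prefix_i + T_i(word) + suffix_i. The three table entries are made up for illustration and are not taken from Appendix B; `identity`, `omit_last`, and `apply_transform` are hypothetical helper names, and only two of the elementary transforms are modeled. The `\n` in the last suffix is a single byte, which mirrors the C string escape convention that the added text describes for Appendix B.

```python
def identity(word):
    """Identity elementary transform: the word is left unchanged."""
    return word


def omit_last(n):
    """Return an elementary transform that drops the last n bytes."""
    def transform(word):
        return word[:max(len(word) - n, 0)]
    return transform


# (prefix, elementary transform, suffix) triples.
# Made-up entries for illustration; NOT the Appendix B list.
TOY_TRANSFORMS = [
    (b"", identity, b""),
    (b"", identity, b" "),
    (b" ", omit_last(1), b"ing\n"),   # "\n" is one byte, as in a C string
]


def apply_transform(transform_id, word):
    """Compute prefix + T(word) + suffix for one toy table entry."""
    prefix, elementary, suffix = TOY_TRANSFORMS[transform_id]
    return prefix + elementary(word) + suffix
```

A decoder following this shape would index the real table with the transform_id obtained from the dictionary lookup and copy the resulting bytes to the output.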