Opened 10 years ago

Closed 7 years ago

Last modified 7 years ago

#1896 closed Bugs (fixed)

boost::iostreams::gzip_decompressor silently ignores multiple members

Reported by: Bruce MacDonald <bruce@…>
Owned by: Jonathan Turkanis
Milestone: Boost 1.41.0
Component: iostreams
Version: Boost 1.35.0
Severity: Problem
Keywords: iostreams gzip gzip_decompressor
Cc:


The gzip file format (RFC 1952) allows multiple "members" to be concatenated in one gzip file. I have a provider who, unfortunately, sends me gzipped files consisting of about 8 members of roughly 500M (uncompressed) each. Why they are doing this I don't know, but this seems to be a long-standing weirdness with gzip. The gzip command-line utility simply decompresses all the members into a single stream, and this is the behaviour I would expect from gzip_decompressor.

The gzip_decompressor only processes the first member and silently ignores the others. In fact, the implementation of read_footer attempts to slurp the rest of the compressed file into a string which it then discards.

In order to read the rest of the members, all we have to do is read and process the actual trailer (8 bytes) and then recursively process the rest of the input (perhaps after invoking close() on ourselves?).
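The member-by-member loop described above can be sketched without Boost or zlib. This is a toy, not the library's implementation: it assumes each member has a bare 10-byte header (no optional fields) and a single stored (uncompressed) deflate block, and the names make_member, decode_all_members and crc32_of are made up for the illustration. What it does show is the shape of the fix: after the body, read the 8-byte trailer (CRC-32, then ISIZE, both little-endian per RFC 1952), then carry on with whatever input is left.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum stored in the gzip trailer.
std::uint32_t crc32_of(const std::string& s) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

void put_le32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu));
}

std::uint32_t get_le32(const Bytes& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}

// Hypothetical helper: builds one gzip member holding `text` in a single stored block.
Bytes make_member(const std::string& text) {
    assert(text.size() <= 0xFFFF);  // a stored block holds at most 65535 bytes
    // ID1 ID2 CM=8 FLG=0 MTIME(4 bytes) XFL OS=255 (unknown)
    Bytes m = {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255};
    const std::uint16_t len = static_cast<std::uint16_t>(text.size());
    const std::uint16_t nlen = static_cast<std::uint16_t>(~len);
    m.push_back(0x01);  // BFINAL=1, BTYPE=00 (stored)
    m.push_back(static_cast<std::uint8_t>(len & 0xFF));
    m.push_back(static_cast<std::uint8_t>(len >> 8));
    m.push_back(static_cast<std::uint8_t>(nlen & 0xFF));
    m.push_back(static_cast<std::uint8_t>(nlen >> 8));
    m.insert(m.end(), text.begin(), text.end());
    put_le32(m, crc32_of(text));                             // trailer: CRC-32
    put_le32(m, static_cast<std::uint32_t>(text.size()));    // trailer: ISIZE
    return m;
}

// Decodes every member in `in`, not just the first one.
std::string decode_all_members(const Bytes& in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {  // input remains, so another member must follow
        if (in.size() - pos < 23 || in[pos] != 0x1f || in[pos + 1] != 0x8b ||
            in[pos + 2] != 8 || in[pos + 3] != 0)
            throw std::runtime_error("bad or unsupported gzip header");
        pos += 10;
        if (in[pos] != 0x01)
            throw std::runtime_error("toy decoder handles one stored block only");
        const std::uint32_t len = in[pos + 1] | (static_cast<std::uint32_t>(in[pos + 2]) << 8);
        const std::uint32_t nlen = in[pos + 3] | (static_cast<std::uint32_t>(in[pos + 4]) << 8);
        if ((len ^ nlen) != 0xFFFFu) throw std::runtime_error("corrupt stored block");
        pos += 5;
        if (in.size() - pos < len + 8) throw std::runtime_error("truncated member");
        const std::string text(in.begin() + pos, in.begin() + pos + len);
        pos += len;
        // Read exactly the 8-byte trailer and verify it against this member alone.
        if (get_le32(in, pos) != crc32_of(text) || get_le32(in, pos + 4) != text.size())
            throw std::runtime_error("trailer mismatch");
        pos += 8;
        out += text;
    }
    return out;
}
```

Concatenating make_member("hello ") and make_member("world") and passing the result to decode_all_members yields "hello world", which is the behaviour the gzip command-line utility has for multi-member files. The checksum here is computed per member from scratch; a filter that keeps a running checksum would need to clear it at each member boundary.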

I have attempted to write a fix for this myself but have been defeated by the complexity of the library.

Attachments (1)

t.gz (111 bytes) - added by Bruce MacDonald <bruce@…> 10 years ago.
Toy example gzip file containing two strings.


Change History (7)

Changed 10 years ago by Bruce MacDonald <bruce@…>

Attachment: t.gz added

Toy example gzip file containing two strings.

comment:1 Changed 9 years ago by Jonathan Turkanis

Status: new → assigned

Yes, the implementation is tricky because when the end of a deflated sequence is reached, the symmetric filter will usually have some unconsumed characters in the buffer, which need to be fed back through the decompressor as part of the next member.

Now that I've figured out the problem, it shouldn't be too hard to implement.

comment:2 Changed 9 years ago by Jonathan Turkanis

Resolution: fixed
Status: assigned → closed

(In [46001]) added support for archives with multiple members; added tests for metadata and for multiple members (fixes #1896)

comment:3 Changed 8 years ago by come.raczy@…

Milestone: Boost 1.36.0 → Boost 1.41.0
Resolution: fixed
Status: closed → reopened

It doesn't look like the change 46001 made it into boost_1_36_0 or any other release.

Since the change seems to work, would it be possible to push it into 1.41.0?

comment:4 Changed 7 years ago by Steven Watanabe

Resolution: fixed
Status: reopened → closed

iostreams was fully merged to the release branch in [56830].

comment:5 Changed 7 years ago by bruce@…

Yes, but multiple members don't work in 1.43.0. An exception is thrown during footer processing on the second member because the CRCs don't match.

I have fixed this locally by adding:

crc_ = 0;

in zlib_base::reset() on line 145 of libs/iostreams/src/zlib.cpp.
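To illustrate why a stale checksum breaks the second member, here is a minimal standalone sketch. RunningCrc and check_examples are hypothetical names and this is not the code in zlib.cpp; the class only mimics a running CRC-32 that starts from 0, in the same convention as zlib's crc32(). Each member's trailer holds the CRC of that member's data alone, so a checksum carried over from the first member cannot match.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a decompressor's running checksum.
class RunningCrc {
public:
    // Folds more data into the running CRC-32 (reflected polynomial 0xEDB88320).
    void update(const std::string& data) {
        std::uint32_t c = ~crc_;
        for (unsigned char ch : data) {
            c ^= ch;
            for (int k = 0; k < 8; ++k)
                c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
        crc_ = ~c;
    }
    // The analogue of adding `crc_ = 0;` to reset().
    void reset() { crc_ = 0; }
    std::uint32_t value() const { return crc_; }

private:
    std::uint32_t crc_ = 0;
};

void check_examples() {
    RunningCrc crc;
    crc.update("a");                      // payload of the first member
    assert(crc.value() == 0xE8B7BE43u);   // matches the first trailer

    crc.update("123456789");              // second member, checksum not reset
    assert(crc.value() != 0xCBF43926u);   // does not match the second trailer

    crc.reset();                          // start the second member from scratch
    crc.update("123456789");
    assert(crc.value() == 0xCBF43926u);   // standard CRC-32 check value
}
```

Without the reset, the value after the second member is the CRC of both payloads joined together, which is what makes the footer comparison fail.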

comment:6 Changed 7 years ago by Steven Watanabe

I fixed the crc problem a few weeks ago.
