Ticket #1896 (closed Bugs: fixed)

Opened 6 years ago

Last modified 4 years ago

boost::iostreams::gzip_decompressor silently ignores multiple members

Reported by: Bruce MacDonald <bruce@…> Owned by: turkanis
Milestone: Boost 1.41.0 Component: iostreams
Version: Boost 1.35.0 Severity: Problem
Keywords: iostreams gzip gzip_decompressor Cc:

Description

The gzip file format (RFC 1952) allows multiple "members" to be concatenated in one gzip file. I have a provider who, unfortunately, sends me gzipped files consisting of about eight members, each roughly 500 MB uncompressed. I don't know why they do this, but it seems to be a long-standing quirk of gzip. The gzip command-line utility simply decompresses all the members as a single stream, and this is the behaviour I would expect from gzip_decompressor.
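For illustration, the following C++ sketch builds a two-member file by writing complete members back to back. It assumes each member stores its data in a single uncompressed ("stored", BTYPE=00) deflate block, so no zlib dependency is needed, and it writes no optional header fields; `crc32`, `make_stored_member` and `concat_members` are hypothetical helpers, not part of Boost.Iostreams.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum in the gzip trailer.
std::uint32_t crc32(const std::string& s) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Append a 32-bit value in little-endian order, as gzip requires.
void put_le32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

// Hypothetical helper: one complete gzip member (header, deflate data, trailer)
// whose deflate data is a single stored block.
std::vector<unsigned char> make_stored_member(const std::string& payload) {
    assert(payload.size() < 65536);  // a stored block holds at most 65535 bytes
    std::vector<unsigned char> m = {
        0x1f, 0x8b,  // ID1, ID2: gzip magic
        8,           // CM: deflate
        0,           // FLG: no optional fields
        0, 0, 0, 0,  // MTIME: not set
        0,           // XFL
        255          // OS: unknown
    };
    const auto len = static_cast<std::uint16_t>(payload.size());
    const auto nlen = static_cast<std::uint16_t>(~len);
    m.push_back(0x01);  // BFINAL = 1, BTYPE = 00 (stored)
    m.push_back(len & 0xff);
    m.push_back(len >> 8);    // LEN
    m.push_back(nlen & 0xff);
    m.push_back(nlen >> 8);   // NLEN, the one's complement of LEN
    m.insert(m.end(), payload.begin(), payload.end());
    put_le32(m, crc32(payload));                              // CRC32
    put_le32(m, static_cast<std::uint32_t>(payload.size()));  // ISIZE
    return m;
}

// A multi-member file is just complete members written back to back.
std::vector<unsigned char> concat_members(const std::vector<std::string>& parts) {
    std::vector<unsigned char> file;
    for (const auto& p : parts) {
        const auto m = make_stored_member(p);
        file.insert(file.end(), m.begin(), m.end());
    }
    return file;
}
```

Concatenating two real .gz files (for example with `cat a.gz b.gz > t.gz`) produces the same kind of multi-member file.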

The gzip_decompressor only processes the first member and silently ignores the others. In fact, the implementation of read_footer attempts to slurp the rest of the compressed file into a string which it then discards.

In order to read the rest of the members, all we have to do is read and process the actual 8-byte trailer (the CRC-32 and the uncompressed size) and then recursively process the rest of the input (perhaps after invoking close() on ourselves?).
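The approach described above can be sketched as a loop in C++: decode one member, check its 8-byte trailer (the CRC-32 of the uncompressed data followed by ISIZE, the uncompressed length modulo 2^32, both little-endian per RFC 1952), then treat the next byte as the start of a new member instead of discarding the rest. This is a minimal sketch under stated assumptions, not the Boost implementation: the whole file is in memory, headers carry no optional fields, and each member's deflate data is a single stored block, so no inflater is needed. `gunzip_all`, `get_le32` and `need` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum in the gzip trailer.
std::uint32_t crc32(const std::string& s) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Little-endian 32-bit read, used for both trailer fields.
std::uint32_t get_le32(const std::vector<unsigned char>& in, std::size_t i) {
    return static_cast<std::uint32_t>(in[i]) |
           (static_cast<std::uint32_t>(in[i + 1]) << 8) |
           (static_cast<std::uint32_t>(in[i + 2]) << 16) |
           (static_cast<std::uint32_t>(in[i + 3]) << 24);
}

// Throw instead of reading past the end of a truncated member.
void need(const std::vector<unsigned char>& in, std::size_t pos, std::size_t n) {
    if (in.size() - pos < n) throw std::runtime_error("truncated gzip member");
}

// Hypothetical decoder: returns the concatenated data of *every* member.
std::string gunzip_all(const std::vector<unsigned char>& in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        // Header: ID1 ID2 CM FLG MTIME(4) XFL OS
        need(in, pos, 10);
        if (in[pos] != 0x1f || in[pos + 1] != 0x8b || in[pos + 2] != 8)
            throw std::runtime_error("not a gzip member");
        if (in[pos + 3] != 0)
            throw std::runtime_error("optional header fields not handled here");
        pos += 10;

        // Deflate data: this sketch accepts only one final stored block.
        need(in, pos, 5);
        if (in[pos] != 0x01)
            throw std::runtime_error("only a single stored block is handled here");
        const std::size_t len = in[pos + 1] | (in[pos + 2] << 8);
        const std::size_t nlen = in[pos + 3] | (in[pos + 4] << 8);
        if ((len ^ nlen) != 0xffffu) throw std::runtime_error("LEN/NLEN mismatch");
        pos += 5;
        need(in, pos, len);
        const std::string data(reinterpret_cast<const char*>(in.data() + pos), len);
        pos += len;

        // Trailer (8 bytes): CRC-32 of the data, then ISIZE (length mod 2^32).
        need(in, pos, 8);
        if (get_le32(in, pos) != crc32(data)) throw std::runtime_error("CRC mismatch");
        if (get_le32(in, pos + 4) != static_cast<std::uint32_t>(data.size()))
            throw std::runtime_error("ISIZE mismatch");
        pos += 8;
        assert(pos <= in.size());

        out += data;  // instead of stopping, loop: the next byte starts a new member
    }
    return out;
}
```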

I have attempted to write a fix for this myself but have been defeated by the complexity of the library.

Attachments

t.gz (111 bytes) - added by Bruce MacDonald <bruce@…> 6 years ago.
Toy example gzip file containing two strings.

Change History

Changed 6 years ago by Bruce MacDonald <bruce@…>

  • attachment t.gz added

comment:1 Changed 6 years ago by turkanis

  • Status changed from new to assigned

Yes, the implementation is tricky because when the end of a deflated sequence is reached, the symmetric filter will usually have some unconsumed characters in the buffer, which need to be fed back through the decompressor as part of the next member.

Now that I've figured out the problem, it shouldn't be too hard to implement.
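The buffering problem can be illustrated with a toy format. In the C++ sketch below, a hypothetical streaming decoder receives input in arbitrary chunks; since a chunk boundary rarely coincides with a member boundary, the decoder stops in the middle of a chunk, and the driver must hand the unconsumed bytes to a reset decoder as the beginning of the next member. The member format (one length byte, then that many payload bytes) and the names `ToyMemberDecoder` and `decode_all` are invented for illustration; they stand in for deflate and are not Boost's symmetric filter interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy stand-in for an inflater. A "member" here is one length byte followed by
// that many payload bytes (invented format, not deflate).
class ToyMemberDecoder {
public:
    // Consumes bytes from [begin, end) and appends payload to `out`.
    // Returns true as soon as the current member is complete; `begin` then
    // points at the first unconsumed byte, which belongs to the next member.
    bool filter(const char*& begin, const char* end, std::string& out) {
        while (begin != end) {
            if (!have_len_) {
                remaining_ = static_cast<unsigned char>(*begin++);
                have_len_ = true;
            } else if (remaining_ > 0) {
                out += *begin++;
                --remaining_;
            }
            if (have_len_ && remaining_ == 0) return true;
        }
        return have_len_ && remaining_ == 0;
    }
    // Prepare for a new member (the analogue of resetting the inflater).
    void reset() { have_len_ = false; remaining_ = 0; }
private:
    bool have_len_ = false;
    std::size_t remaining_ = 0;
};

// Decodes every member while receiving input in chunks of `chunk` bytes.
std::string decode_all(const std::string& input, std::size_t chunk) {
    assert(chunk > 0);
    ToyMemberDecoder dec;
    std::string out;
    bool in_member = false;  // started a member that has not finished yet
    for (std::size_t off = 0; off < input.size(); off += chunk) {
        const char* begin = input.data() + off;
        const char* end = input.data() + std::min(off + chunk, input.size());
        while (begin != end) {
            in_member = true;
            if (dec.filter(begin, end, out)) {
                // Member finished mid-chunk: whatever is left in [begin, end)
                // must be fed to a reset decoder instead of being dropped.
                dec.reset();
                in_member = false;
            } else {
                assert(begin == end);  // unfinished member: chunk fully consumed
            }
        }
    }
    if (in_member) throw std::runtime_error("input ends inside a member");
    return out;
}
```

With real deflate data the same pattern applies: zlib's inflate() returns Z_STREAM_END at the end of a deflate stream and leaves any bytes it has not consumed in avail_in, and those bytes (the trailer, then the next member) must not be thrown away.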

comment:2 Changed 6 years ago by turkanis

  • Status changed from assigned to closed
  • Resolution set to fixed

(In [46001]) added support for archives with multiple members; added tests for metadata and for multiple members (fixes #1896)

comment:3 Changed 5 years ago by come.raczy@…

  • Status changed from closed to reopened
  • Resolution fixed deleted
  • Milestone changed from Boost 1.36.0 to Boost 1.41.0

It doesn't look like the change 46001 (https://svn.boost.org/trac/boost/changeset/46001) made it into boost_1_36_0 or any other release.

Since the change seems to work, would it be possible to push it into 1.41.0?

comment:4 Changed 4 years ago by steven_watanabe

  • Status changed from reopened to closed
  • Resolution set to fixed

iostreams was fully merged to the release branch in [56830].

comment:5 Changed 4 years ago by bruce@…

Yes, but multiple members don't work in 1.43.0. An exception is thrown during footer processing of the second member because the CRCs don't match.

I have fixed this locally by adding:

crc_ = 0;

in zlib_base::reset() on line 145 of libs/iostreams/src/zlib.cpp.
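The reason the reset is needed: each member's trailer holds the CRC-32 of that member's data alone, so a running checksum has to start again from zero at every member boundary. The C++ sketch below uses a hypothetical `Crc32` accumulator and `check_members` function (not Boost's `zlib_base`) to show that carrying the first member's value into the second makes the check fail, matching the symptom reported above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical incremental CRC-32 (reflected polynomial 0xEDB88320), gzip's checksum.
class Crc32 {
public:
    // Feeding data in pieces gives the same result as feeding it all at once.
    void update(const std::string& s) {
        std::uint32_t c = ~crc_;
        for (unsigned char b : s) {
            c ^= b;
            for (int k = 0; k < 8; ++k)
                c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
        crc_ = ~c;
    }
    std::uint32_t value() const { return crc_; }
    // Must run at each member boundary, like `crc_ = 0;` in the fix above.
    void reset() { crc_ = 0; }
private:
    std::uint32_t crc_ = 0;
};

// Compares each member's data with the CRC stored in its trailer.
// `reset_between_members` toggles the fix on or off.
bool check_members(const std::vector<std::string>& data,
                   const std::vector<std::uint32_t>& stored_crc,
                   bool reset_between_members) {
    assert(data.size() == stored_crc.size());
    Crc32 crc;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (reset_between_members) crc.reset();
        crc.update(data[i]);
        if (crc.value() != stored_crc[i]) return false;  // "the CRCs don't match"
    }
    return true;
}
```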

comment:6 Changed 4 years ago by steven_watanabe

I fixed the crc problem a few weeks ago.
