Ticket #5908 (closed Bugs: fixed)

Opened 3 years ago

Last modified 2 years ago

iostreams gzip fails to handle optional extra fields in gzip header

Reported by: Travis Abbott <typedef.struct@…> Owned by: turkanis
Milestone: To Be Determined Component: iostreams
Version: Boost 1.47.0 Severity: Problem
Keywords: gzip Cc:

Description

When bit 2 (FEXTRA) is set in a gzip header's flags, the iostreams code fails to read the XLEN field before it starts reading the extra field data. The code to read XLEN is actually there, but it gets skipped. This means the code goes directly into a loop like: while (--xlen != 0) with xlen still set to 0. As a result, the rest of the file gets slurped in by the extra-field reading code (or at least until xlen wraps around to 0 again, which could take a while). I ran into this because many popular file formats in bioinformatics (BAM, tabix) are gzipped and include extra optional fields in their headers.
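
For reference, here is a minimal sketch of the header layout from RFC 1952, showing where XLEN has to be read when FEXTRA is set. This is not the Boost.Iostreams code: the function name gzip_header_size and the flag constant names are made up for illustration, and the whole header is assumed to be in memory already.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Flag bits of the FLG byte (RFC 1952, section 2.3.1).
// The constant names are illustrative, not taken from Boost.
constexpr std::uint8_t kFlagHcrc    = 0x02;  // bit 1
constexpr std::uint8_t kFlagExtra   = 0x04;  // bit 2 (FEXTRA)
constexpr std::uint8_t kFlagName    = 0x08;  // bit 3
constexpr std::uint8_t kFlagComment = 0x10;  // bit 4

// Hypothetical helper: returns the offset of the first byte of compressed
// data in a single gzip member, i.e. the total size of its header.
std::size_t gzip_header_size(const std::vector<std::uint8_t>& buf)
{
    auto need = [&](std::size_t n) {
        if (buf.size() < n) throw std::runtime_error("truncated gzip header");
    };

    need(10);
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        throw std::runtime_error("bad gzip magic");
    if (buf[2] != 8)
        throw std::runtime_error("unsupported compression method");

    const std::uint8_t flags = buf[3];
    std::size_t pos = 10;  // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS

    if (flags & kFlagExtra) {
        // XLEN (2 bytes, little-endian) must be read first; only then can
        // exactly XLEN bytes of extra field data be skipped.
        need(pos + 2);
        const std::size_t xlen =
            buf[pos] | (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2;
        need(pos + xlen);
        pos += xlen;
    }

    // FNAME and FCOMMENT are zero-terminated strings.
    auto skip_zstring = [&] {
        while (true) {
            need(pos + 1);
            if (buf[pos++] == 0) break;
        }
    };
    if (flags & kFlagName) skip_zstring();
    if (flags & kFlagComment) skip_zstring();

    if (flags & kFlagHcrc) {  // optional 2-byte header CRC
        need(pos + 2);
        pos += 2;
    }
    return pos;
}
```

If the XLEN read is skipped and xlen is left at 0, the skip has no valid length to work with, which is the failure described above.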

I've attached an example gzipped file with an optional header, a test program that should demonstrate the problem (against 1.47 and latest svn), as well as a patch that fixes it.

Attachments

sample.txt.gz Download (68 bytes) - added by Travis Abbott <typedef.struct@…> 3 years ago.
gzip file with extra comment in header
example.cpp Download (733 bytes) - added by Travis Abbott <typedef.struct@…> 3 years ago.
test case that demonstrates the problem (run ./example sample.txt.gz)
iostreams-gzip.patch Download (493 bytes) - added by Travis Abbott <typedef.struct@…> 3 years ago.
patch that fixes the issue (patch -p0 < iostreams-gzip.patch from top level of the repo)
iostreams-gzip_hdr_test.patch Download (2.4 KB) - added by typedef.struct@… 2 years ago.
second patch adding unit test to prevent future regressions.

Change History

Changed 3 years ago by Travis Abbott <typedef.struct@…>

gzip file with extra comment in header

Changed 3 years ago by Travis Abbott <typedef.struct@…>

test case that demonstrates the problem (run ./example sample.txt.gz)

Changed 3 years ago by Travis Abbott <typedef.struct@…>

patch that fixes the issue (patch -p0 < iostreams-gzip.patch from top level of the repo)

comment:1 Changed 2 years ago by typedef.struct@…

Seems like this is still present in 1.48 and current SVN. To be clear, the sample file is not empty. The desired output would be for the test program to read "hello there", but it gets nothing. You can compare to gzip -dc. The supplied patch still works with current SVN.

comment:2 Changed 2 years ago by typedef.struct@…

Also worth mentioning is that this worked in 1.40, but the bug appeared when support for multiple compressed objects in a single gzipped stream was introduced.

Changed 2 years ago by typedef.struct@…

second patch adding unit test to prevent future regressions.

comment:3 Changed 2 years ago by typedef.struct@…

The test attached in the second patch (iostreams-gzip_hdr_test.patch) will fail with the current code, demonstrating the inability to parse certain types of RFC 1952 compliant gzip headers. Application of the first patch will cause the test to pass.

comment:4 Changed 2 years ago by turkanis

  • Status changed from new to assigned

I have applied the patches to trunk.

comment:5 Changed 2 years ago by danieljames

  • Status changed from assigned to closed
  • Resolution set to fixed

(In [77368]) Iostreams: Merge from trunk.
