Ticket #3853 (closed Bugs: fixed)

Opened 4 years ago

Last modified 4 years ago

bzip2_decompressor reads only one block of pbzip2-created files

Reported by: Darko Veberic <darko.veberic@…> Owned by: turkanis
Milestone: Boost 1.42.0 Component: iostreams
Version: Boost 1.41.0 Severity: Problem
Keywords: Cc:

Description

reading bz2 files created with the parallel version "pbzip2" stops exactly after correctly decompressing the first bzip2 block (900k). the same bz2 file is correctly decompressed by the non-parallel bunzip2, so imho the bzip2 filter should either do the same or throw an exception if this is some kind of unrecognized format.

i've been able to reproduce this bug with boost 1.35.1, 1.36.0, 1.40.0 and 1.41.0, so i guess it is present in all versions. the bug can be seen by running the attached programs:

c++ spit.cc -o spit c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor ./spit >test.txt bzip2 -c test.txt >test.txt.bz2 pbzip2 -c test.txt >test.txt.parallel.bz2

running

./test_bzip2_decompressor test.txt.bz2 | wc

outputs the correct 500 lines:

500 250000 995395

while

./test_bzip2_decompressor test.txt.parallel.bz2 | wc

finds only 452 lines and only 900k in total:

452 226164 900000

Attachments

spit.cc Download (227 bytes) - added by Darko Veberic <darko.veberic@…> 4 years ago.
creates test file
test_bzip2_decompressor.cc Download (519 bytes) - added by Darko Veberic <darko.veberic@…> 4 years ago.
decompressor using boost iostream filter

Change History

Changed 4 years ago by Darko Veberic <darko.veberic@…>

creates test file

Changed 4 years ago by Darko Veberic <darko.veberic@…>

decompressor using boost iostream filter

comment:1 Changed 4 years ago by Darko Veberic <darko.veberic@…>

sorry, the line in the original bug description got concatenated. it should read:

c++ spit.cc -o spit

c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor

./spit >test.txt

bzip2 -c test.txt >test.txt.bz2

pbzip2 -c test.txt >test.txt.parallel.bz2

comment:2 Changed 4 years ago by jeff.gilchrist@…

It seems like boost does not support multiple bz2 streams in a file. Both bzip2 and pbzip2 support reading multiple bz2 streams, so when you see the end of a bz2 sequence, you have to check whether it is also the EOF. If it is not, look for another bz2 sequence and, if one is found, concatenate the decompressed results.

So you can have files like:

|bz1|

or

|bz1|bz2|bz3|bz4| etc...

And you need to support both if you want to be compatible with bzip2 and pbzip2 since they both support this and it is a valid format.
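
The loop described above can be sketched roughly as follows. This is only an illustration of the control flow, not the Boost implementation: `decode_all_streams`, `StreamResult` and `DecodeOne` are hypothetical names, and `toy_decode_one` is a made-up stand-in format (one length byte followed by the payload) used in place of a real single-stream bzip2 decoder so that the sketch has no library dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Result of decoding exactly one compressed stream (hypothetical type).
struct StreamResult {
    std::string output;    // decompressed bytes of this one stream
    std::size_t consumed;  // number of input bytes the stream occupied
};

// Hypothetical single-stream decoder: decodes the stream starting at `offset`.
using DecodeOne =
    std::function<StreamResult(const std::string& input, std::size_t offset)>;

// Decode |s1|s2|s3|... by restarting the decoder after each end-of-stream
// until the input is exhausted, concatenating the decompressed results.
std::string decode_all_streams(const std::string& input,
                               const DecodeOne& decode_one) {
    std::string result;
    std::size_t pos = 0;
    while (pos < input.size()) {  // end of a stream is not necessarily EOF
        StreamResult r = decode_one(input, pos);
        if (r.consumed == 0)
            throw std::runtime_error("decoder made no progress");
        result += r.output;
        pos += r.consumed;
    }
    return result;
}

// Toy stand-in for a real decoder (NOT bzip2): one length byte, then payload.
StreamResult toy_decode_one(const std::string& input, std::size_t offset) {
    assert(offset < input.size());
    std::size_t len = static_cast<unsigned char>(input[offset]);
    if (offset + 1 + len > input.size())
        throw std::runtime_error("truncated stream");
    return {input.substr(offset + 1, len), 1 + len};
}
```

A decoder that returns after the first end-of-stream corresponds to running the loop body only once, which is the behaviour reported in this ticket: the first stream is decoded correctly and the rest of the file is silently ignored.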

comment:3 Changed 4 years ago by steven_watanabe

  • Status changed from new to closed
  • Resolution set to fixed

(In [63057]) Allow bzip2_decompressor to process multiple concatenated streams. Fixes #3853.
