Ticket #5629 (assigned Bugs)

Opened 3 years ago

Last modified 2 months ago

base64 encode/decode for std::istreambuf_iterator/std::ostreambuf_iterator

Reported by: nen777w@… Owned by: ramey
Milestone: To Be Determined Component: serialization
Version: Boost 1.45.0 Severity: Problem
Keywords: Cc:

Description

Compiler: MSVS 2008. The code:

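#include <algorithm>   // std::copy
#include <fstream>     // std::ifstream, std::ofstream
#include <iterator>    // std::istreambuf_iterator, std::ostreambuf_iterator

#include "boost/archive/iterators/base64_from_binary.hpp"
#include "boost/archive/iterators/binary_from_base64.hpp"
#include "boost/archive/iterators/transform_width.hpp"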

//typedefs
typedef  std::istreambuf_iterator<char>    my_istream_iterator;
typedef  std::ostreambuf_iterator<char>    my_ostream_iterator;

typedef boost::archive::iterators::base64_from_binary<
          boost::archive::iterators::transform_width< my_istream_iterator, 6, 8>
> bin_to_base64;

typedef boost::archive::iterators::transform_width<
    boost::archive::iterators::binary_from_base64< my_istream_iterator >, 8, 6
> base64_to_bin;

void test()
{
   {
        //INPUT FILE!!!
    std::ifstream ifs("test.zip", std::ios_base::in|std::ios_base::binary);
    std::ofstream ofs("test.arc", std::ios_base::out|std::ios_base::binary);

    std::copy(
        bin_to_base64( my_istream_iterator(ifs >> std::noskipws) ),
        bin_to_base64( my_istream_iterator() ),
        my_ostream_iterator(ofs)
    );
  }

  {
    std::ifstream ifs("test.arc", std::ios_base::in|std::ios_base::binary);
    std::ofstream ofs("test.rez", std::ios_base::out|std::ios_base::binary);

    std::copy(
        base64_to_bin( my_istream_iterator(ifs >> std::noskipws) ),
        base64_to_bin( my_istream_iterator() ),
        my_ostream_iterator(ofs)
    );
  }
}

Result:

1) If the INPUT FILE is a ZIP file, the result is:

a) _DEBUG_ERROR("istreambuf_iterator is not dereferencable") is raised; it can be disabled or ignored.
b) The resulting file "test.rez" has one superfluous byte compared with the INPUT FILE.

2) If the INPUT FILE is any other file (binary or text), everything is OK.

Change History

comment:1 Changed 3 years ago by nen777w@…

If it helps, the workaround code and an example of how to use it can be found here:  http://rsdn.ru/forum/cpp.applied/4317966.1.aspx

comment:2 Changed 18 months ago by anonymous

This is not so much a bug as a missing feature: there is no function to add or remove "=" padding. See  http://stackoverflow.com/questions/8033942/boost-base64-url-encode-decode
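The missing padding step can be sketched outside the iterators. This is only an illustration under stated assumptions: add_base64_padding and strip_base64_padding are hypothetical helper names, not part of Boost, and they operate on a complete encoded string using the "=" padding rule from RFC 4648.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): append '=' until the length
// of the encoded text is a multiple of four.
std::string add_base64_padding(std::string encoded)
{
    // An unpadded base64 string can never have length % 4 == 1.
    assert(encoded.size() % 4 != 1);
    while (encoded.size() % 4 != 0)
        encoded.push_back('=');
    return encoded;
}

// Hypothetical helper (not a Boost API): remove at most two trailing '='.
std::string strip_base64_padding(std::string encoded)
{
    std::size_t removed = 0;
    while (removed < 2 && !encoded.empty()
           && encoded[encoded.size() - 1] == '=') {
        encoded.erase(encoded.size() - 1);
        ++removed;
    }
    return encoded;
}
```

The encoder output would be passed through add_base64_padding before writing, and the input through strip_base64_padding before decoding.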

comment:3 Changed 18 months ago by Ruslan Teliuk <nen777w@…>

comment:4 Changed 17 months ago by dave

  • Owner changed from dave to jeffrey.hellrung

comment:5 Changed 17 months ago by ramey

  • Owner changed from jeffrey.hellrung to ramey
  • Status changed from new to assigned

comment:6 Changed 17 months ago by dave

  • Component changed from iterator to serialization

comment:7 Changed 16 months ago by iGene

The root cause is that sequences whose size is not divisible by four cause a buffer overrun. Here is my workaround.

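#include <algorithm>   // std::copy
#include <cassert>
#include <cstdlib>     // rand
#include <cstring>     // strcmp
#include <iterator>    // std::ostreambuf_iterator
#include <sstream>
#include <string>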

struct to_base64 : public std::stringstream {
	to_base64(const std::string& str);
	to_base64(const char* begin, const char* end);
};

struct from_base64 : public std::stringstream {
	from_base64(const std::string& str);
	from_base64(const char* begin, const char* end);
};

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/archive/iterators/ostream_iterator.hpp>

// slightly generalized version of the example here:
// http://stackoverflow.com/questions/7053538/how-do-i-encode-a-string-to-base64-using-only-boost

template <typename TransformIterator>
static void apply(const char* begin, const char* end, std::stringstream& target) {
	std::copy(TransformIterator(begin), TransformIterator(end), std::ostreambuf_iterator<char>(target));
}
template <typename TransformIterator>
static void applyTwice(const char* begin, const char* end, std::stringstream& target) {
	long size = end - begin;
	int remainder = size % 4;
	const char* truncated = end - remainder;
	apply<TransformIterator>(begin, truncated, target);
	if (remainder) {
		assert(remainder != 1); /* it can never be =1 if this whole thing about dividing by four is correct */
		char padded[4] = { 'A', 'A', 'A', 'A' };
		const char* src = truncated; 
		char* dest = &padded[0];
		while (src != end)
			*(dest++) = *(src++);
		apply<TransformIterator>(&padded[0], &padded[sizeof(padded)], target);
		std::ios::streampos pos = target.tellp();
		pos -= (4 - remainder);
		target.seekp(pos);
	}
}

using namespace boost::archive::iterators;

typedef base64_from_binary<transform_width<const char*, 6, 8> > to;
to_base64::to_base64(const char* begin, const char* end) { apply<to>(begin, end, *this); }
to_base64::to_base64(const std::string& str) { apply<to>(str.c_str(), str.c_str() + str.length(), *this); }

typedef transform_width<binary_from_base64<const char*>, 8, 6> from;
from_base64::from_base64(const char* begin, const char* end) { applyTwice<from>(begin, end, *this); }
from_base64::from_base64(const std::string& str) { applyTwice<from>(str.c_str(), str.c_str() + str.length(), *this); }

int main()
{
	size_t length = 0;
	do {
		// generate source bytes
		char source[100]; // length stays below 100; RAND_MAX + 1 is too large for the stack on some platforms
		for (size_t pos = 0; pos < length; ++pos)
			source[pos] = '0' + char(rand() % 32);
		source[length] = '\0';
		// convert them to base64
		to_base64 b(&source[0], &source[length]);
		std::string b64 = b.str();
		// and convert them back
		from_base64 result(b64.c_str(), b64.c_str() + b64.size());
		// compare as binary
		size_t size = (size_t)result.tellp();
		assert(size == length);
		char dest[100]; // size equals length here, which stays below 100
		result.read(&dest[0], size);
		for (size_t pos = 0; pos < length; ++pos)
			assert(source[pos] == dest[pos]);
		// compare as text
		std::string asString = result.str();
		assert(!strcmp(asString.c_str(), &source[0]));
	} while (++length < 100);

	return 0;
}

comment:8 follow-up: ↓ 9 Changed 15 months ago by anonymous

Already fixed?

comment:9 in reply to: ↑ 8 Changed 12 months ago by anonymous

Replying to anonymous:

Already fixed?

Not entirely. Although a fix was put into Boost 1.53 so that '=' characters no longer cause the decoder to crash, it still doesn't treat them as padding. The fix causes the decoder to add nulls to the end of the decoded value, which is probably not what you want. Granted, you should be able to figure out the right thing to do given the number of '=' characters at the end of the encoded stream, but you shouldn't have to.

Also, the encoder still won't produce a padded encoding.
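For anyone who has to work around this in the meantime, the number of real payload bytes can be derived from the '=' count as described above. A minimal sketch, assuming the input is a complete, padded encoding as defined by RFC 4648; base64_decoded_size is a hypothetical helper name, not a Boost function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): number of payload bytes
// represented by a padded base64 string.
std::size_t base64_decoded_size(const std::string& encoded)
{
    const std::size_t n = encoded.size();
    // Padded base64 always comes in groups of four characters.
    assert(n % 4 == 0);

    std::size_t padding = 0;
    if (n >= 1 && encoded[n - 1] == '=') {
        ++padding;
        // A second '=' only counts if it directly precedes the first.
        if (n >= 2 && encoded[n - 2] == '=')
            ++padding;
    }
    // Each group of four characters carries three bytes, minus the padding.
    return n / 4 * 3 - padding;
}
```

The decoded output could then be truncated to this size to drop the trailing nulls.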

comment:10 Changed 7 months ago by prantlf@…

The problem of padding could be solved by making the transform_width stateful. If you want to encode/decode to/from BASE64 and your input comes in chunks or as a stream, you need to save your previous state to be able to continue with the next chunk correctly.

The transform_width currently goes "blindly" for the next item pointed to by the iterator, although there may not be enough items to finish the minimum quantum. It even overruns the buffer if your input sequence is not zero-padded. This eager reading of the zero padding also makes it useful only for converting a complete buffer in one pass.

If the transform_width "knew" that a minimum quantum of units has to be available before it can produce an output unit, it would prevent the buffer overflow and allow transforming chunked input.

I added the end-iterator and state to the transform_width (either directly or to the transformed iterator by an iterator adaptor). Reading ahead and storing the next value makes the code a little longer and not so compact. Also, it runs significantly slower than a hand-coded BASE64 encoder; I'm not sure why. Maybe it is the copying of the iterators around?

Would it make sense to include a stateful transform_width in boost?
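To make the idea of carrying state between chunks concrete, here is a hand-coded sketch. It is not the modified transform_width described above and not a Boost API; chunked_base64_encoder and its members are hypothetical names. It keeps up to two leftover bytes between calls and only emits padding when the stream is finished.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stateful encoder (not a Boost API).
class chunked_base64_encoder {
public:
    chunked_base64_encoder() : pending_(0), pending_count_(0) {}

    // Encode one chunk. Bytes that do not complete a 3-byte quantum
    // are kept as state for the next call.
    std::string update(const unsigned char* data, std::size_t size)
    {
        std::string out;
        for (std::size_t i = 0; i < size; ++i) {
            pending_ = (pending_ << 8) | data[i];
            if (++pending_count_ == 3)
                emit(out, 4);
        }
        return out;
    }

    // Flush the leftover bytes and add '=' padding.
    std::string finish()
    {
        std::string out;
        if (pending_count_ == 1) {
            pending_ <<= 16;      // fill up to 24 bits with zeros
            emit(out, 2);
            out += "==";
        } else if (pending_count_ == 2) {
            pending_ <<= 8;
            emit(out, 3);
            out += "=";
        }
        return out;
    }

private:
    // Write the first `chars` 6-bit groups of the 24-bit quantum.
    void emit(std::string& out, int chars)
    {
        static const char alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        assert(chars >= 2 && chars <= 4);
        for (int i = 0; i < chars; ++i)
            out.push_back(alphabet[(pending_ >> (18 - 6 * i)) & 0x3F]);
        pending_ = 0;
        pending_count_ = 0;
    }

    unsigned long pending_;   // up to 24 bits of buffered input
    int pending_count_;       // number of buffered bytes (0..2 between calls)
};
```

Because the encoder never reads past the data it was given, it needs no zero padding in the input buffer.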

comment:11 Changed 2 months ago by ramey

I did spend a significant amount of time on this while (apparently) not getting it quite right.

How about:

a) updating the current test so that it fails

b) suggesting a patch

Robert Ramey
