
Opened 10 years ago

Last modified 7 years ago

#1836 assigned Bugs

bug in serializing wide character strings

Reported by: Jeff Faust <jeff@…>
Owned by: Robert Ramey
Milestone: To Be Determined
Component: serialization
Version: Boost 1.34.1
Severity: Problem
Keywords: wstring wchar_t
Cc: jeff@…, Sohail Somani

Description

We've discovered a problem in how Boost writes and reads wide character strings (wchar_t* and std::wstring) through narrow character file streams (std::ifstream and std::ofstream). The problem stems from the fact that wide characters are written and read as a sequence of narrow characters (in text_oarchive_impl.ipp and text_iarchive_impl.ipp, respectively). On Windows, an EOF character (value 26 decimal, Ctrl-Z) terminates the reading of a file opened as a text stream. Some wide characters have that value as one of their bytes, so reading that byte ends the read early. We have worked around the issue by deriving our own input and output archives from text_iarchive_impl<Archive> and text_oarchive_impl<Archive> and overriding load_override() and save_override() for std::wstring and wchar_t*. Our implementation steps through the wide characters and writes them one by one as wchar_t to the archive. This isn't very elegant, and the resulting file is even less readable than with the current implementation, but it does resolve the problem.

Although the test test_simple_class does test wstrings, it only uses the characters 'a'-'z', so it does not expose this problem.
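To illustrate the byte-level cause described above, here is a minimal sketch that uses only the standard library (no Boost). The helper names raw_bytes, contains_ctrl_z and self_check are hypothetical and exist only for this illustration. It checks whether any byte of a wide string equals 26 (0x1A), and uses U+011A as one example of a wide character whose encoding as wchar_t contains that byte.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw bytes of a wide string,
// in the platform's native byte order.
std::vector<unsigned char> raw_bytes(const std::wstring& ws) {
    std::vector<unsigned char> out(ws.size() * sizeof(wchar_t));
    if (!out.empty()) {
        std::memcpy(out.data(), ws.data(), out.size());
    }
    return out;
}

// Hypothetical helper: true if any byte of the string equals 26 (0x1A),
// the value the ticket says ends a text-mode read on Windows.
bool contains_ctrl_z(const std::wstring& ws) {
    for (unsigned char b : raw_bytes(ws)) {
        if (b == 0x1A) {
            return true;
        }
    }
    return false;
}

// Small self-check of the two cases discussed in the ticket.
void self_check() {
    // 'a'-'z' occupy 0x61-0x7A, with the remaining bytes zero,
    // so a test limited to those letters never produces 0x1A.
    assert(!contains_ctrl_z(L"abcdefghijklmnopqrstuvwxyz"));

    // U+011A has 0x1A as one of its bytes in either byte order.
    std::wstring risky(1, static_cast<wchar_t>(0x011A));
    assert(contains_ctrl_z(risky));
}
```

A regression test for this ticket would presumably need at least one string for which contains_ctrl_z returns true.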


Change History (10)

comment:1 Changed 10 years ago by Ostap Kutsyy <ostapkl@…>

Why don't you use text_wi|oarchive? This was designed for wide characters and strings.

comment:2 in reply to:  1 Changed 10 years ago by jefffaust

Replying to Ostap Kutsyy <ostapkl@gmail.com>:

Why don't you use text_wi|oarchive? This was designed for wide characters and strings.

Frankly, I've never seen those in the documentation. Now that I look, there they are... in the "Archive Concepts" and "Implementation Notes" sections. The wide character classes are not even part of the "Text Archive Class Diagram".

I've asked the developer working on this if this will solve our problem. I'll follow up after he looks into it.

Thanks for the help!

Jeff

comment:3 Changed 10 years ago by jefffaust

Ostap,

Using text_w?archive does address our problem. Thanks for the help. However, I still consider this a bug. Attempting to serialize a wstring should fail to compile, in the same way that "cout << wstring();" fails to compile. As it is currently, it fails at runtime in frustratingly subtle ways.

Jeff
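As a sketch of the compile-time behaviour asked for in comment 3, the standard stream operators already refuse to mix character types, and that can be detected and enforced with a static_assert. The names is_streamable, save_checked and self_check below are hypothetical; this is not how Boost.Serialization is implemented, only an illustration of the idea under that assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// C++11-compatible stand-in for std::void_t.
template <class...>
struct make_void { using type = void; };
template <class... Ts>
using void_t = typename make_void<Ts...>::type;

// Hypothetical trait: true when `stream << value` compiles.
template <class Stream, class T, class = void>
struct is_streamable : std::false_type {};

template <class Stream, class T>
struct is_streamable<
    Stream, T,
    void_t<decltype(std::declval<Stream&>() << std::declval<const T&>())>>
    : std::true_type {};

// Matching character types work; a wide string on a narrow stream does not.
static_assert(is_streamable<std::ostream, std::string>::value,
              "narrow string on narrow stream");
static_assert(is_streamable<std::wostream, std::wstring>::value,
              "wide string on wide stream");
static_assert(!is_streamable<std::ostream, std::wstring>::value,
              "wide string on narrow stream is rejected");

// Hypothetical guard: turns the mismatch into a readable compile error
// instead of a subtle runtime failure.
template <class Stream, class T>
void save_checked(Stream& os, const T& value) {
    static_assert(is_streamable<Stream, T>::value,
                  "value type cannot be written to this stream type");
    os << value;
}

// Small self-check: the matching combinations still work at runtime.
void self_check() {
    std::ostringstream narrow;
    save_checked(narrow, std::string("abc"));
    assert(narrow.str() == "abc");

    std::wostringstream wide;
    save_checked(wide, std::wstring(L"abc"));
    assert(wide.str() == L"abc");
}
```

Calling save_checked with a std::ostringstream and a std::wstring would stop at the static_assert, which is the kind of failure the comment argues for.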

comment:4 Changed 10 years ago by Robert Ramey

Status: new → assigned

comment:5 Changed 9 years ago by (none)

Milestone: Boost 1.35.1

Milestone Boost 1.35.1 deleted

comment:6 Changed 9 years ago by Sohail Somani

Cc: Sohail Somani added

Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)

If you agree, I have a patch in the works for this issue.

comment:7 in reply to:  6 Changed 9 years ago by Sohail Somani

Replying to sohail:

Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)

Should not support wide characters...

comment:8 Changed 9 years ago by Robert Ramey

I don't think that data types should be coupled to archives.

That is, any data that can be serialized to one kind of archive should be serializable to ALL kinds of archives. That is why std::wstring must be serializable into a text or xml_archive. The real fix is to make adjustments so that all characters are rendered. This is not trivial to do without a big performance hit. So it's an open issue for now.

comment:9 in reply to:  8 Changed 7 years ago by Dean Michael Berris

Milestone: To Be Determined

Replying to ramey:

This is not trivial to do without a big performance hit. So it's an open issue for now.

Do you have a suggestion as to how this should be addressed? For example, should serializing a std::wstring to a text archive yield an appropriately encoded (maybe Base64) string, then when read back be appropriately decoded? How will this work on binary archives and in other user-provided archives (like the ones Boost.MPI provides)?

comment:10 Changed 7 years ago by Robert Ramey

Speaking from memory of when I last looked at this, I concluded that what was needed was an escape mechanism. This would render some characters as \134 or something like that. It would be implemented in the "stack" of iterator adaptors which are used to save/load the string. This would entail character-by-character processing, which I was thinking would be a performance hit - which it would be. BUT, now I realize that serialization of a wstring to a char archive is not a common operation, so performance really isn't an issue. The way to address this is to look at the documentation for "dataflow" iterators and the related code. It wouldn't be too hard to craft another "escape" layer and insert it into the iterator stack to handle this. Feel free to take this on.

Robert Ramey
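The escape mechanism described in comment 10 could look roughly like the sketch below. It is written as plain functions over strings, not as a Boost "dataflow" iterator adaptor, and every name in it (escape_bytes, unescape_bytes, wide_to_escaped, escaped_to_wide, self_check) is hypothetical. The assumed scheme is: each byte that is not printable ASCII, and the backslash itself, is written as a backslash followed by three octal digits, so a backslash becomes \134 and byte 26 becomes \032. The bytes are taken in native order, so the output is not portable between platforms with a different wchar_t size or byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Escape every byte that is not printable ASCII, plus the backslash,
// as a backslash followed by exactly three octal digits.
std::string escape_bytes(const std::string& raw) {
    std::string out;
    for (unsigned char b : raw) {
        if (b >= 0x20 && b < 0x7F && b != '\\') {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('\\');
            out.push_back(static_cast<char>('0' + ((b >> 6) & 7)));
            out.push_back(static_cast<char>('0' + ((b >> 3) & 7)));
            out.push_back(static_cast<char>('0' + (b & 7)));
        }
    }
    return out;
}

// Inverse of escape_bytes. Assumes the input was produced by escape_bytes.
std::string unescape_bytes(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 3 < text.size()) {
            int v = (text[i + 1] - '0') * 64 +
                    (text[i + 2] - '0') * 8 +
                    (text[i + 3] - '0');
            out.push_back(static_cast<char>(static_cast<unsigned char>(v)));
            i += 3;  // skip the three octal digits just consumed
        } else {
            out.push_back(text[i]);
        }
    }
    return out;
}

// Render a wide string as escaped narrow text (native byte order).
std::string wide_to_escaped(const std::wstring& ws) {
    std::string raw(ws.size() * sizeof(wchar_t), '\0');
    if (!raw.empty()) {
        std::memcpy(&raw[0], ws.data(), raw.size());
    }
    return escape_bytes(raw);
}

// Rebuild the wide string from the escaped narrow text.
std::wstring escaped_to_wide(const std::string& text) {
    const std::string raw = unescape_bytes(text);
    std::wstring ws(raw.size() / sizeof(wchar_t), L'\0');
    if (!ws.empty()) {
        std::memcpy(&ws[0], raw.data(), ws.size() * sizeof(wchar_t));
    }
    return ws;
}

// Small self-check: the escaped form never contains byte 26 and round-trips.
void self_check() {
    std::wstring ws = L"abc";
    ws.push_back(static_cast<wchar_t>(0x011A));  // contains byte 0x1A
    const std::string text = wide_to_escaped(ws);
    assert(text.find('\x1A') == std::string::npos);
    assert(escaped_to_wide(text) == ws);
}
```

Because the escaped text contains only printable ASCII, a narrow text stream on Windows would not see an EOF byte in it. The cost is the character-by-character pass that the comment mentions.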
