Modify

Ticket #4214 (closed Patches: fixed)

Opened 4 years ago

Last modified 4 years ago

Use collection_size_type/version_type consistently (fixes Boost.MPI)

Reported by: dgregor Owned by: ramey
Milestone: Boost 1.44.0 Component: serialization
Version: Boost Development Trunk Severity: Problem
Keywords: Cc:

Description

Boost.MPI has some special archivers (e.g., ignore_skeleton_oarchive) that ignore some of the structural aspects of serializing data. It does so by recognizing attempts to serialize values of type collection_size_type and version_type and then silently ignoring them which, in the narrow domain that ignore_skeleton_oarchive is used, works well. The current crop of failures in the MPI library (skeleton_content_test-* are failing sporadically) all appears due to inconsistent use of collection_size_type/version_type.

The attached patch updates the serialization library to use collection_size_type and version_type consistently. Tests pass for GCC and Clang with both the Serialization and MPI libraries, and this ends up fixing a pretty serious regression in the MPI library.

Currently, Boost.MPI is broken on both trunk and the release branch because

Attachments

boost-serialization-version-collection.patch Download (6.3 KB) - added by dgregor 4 years ago.
Patch to use collection_size_type/version_type consistently

Change History

Changed 4 years ago by dgregor

Patch to use collection_size_type/version_type consistently

comment:1 Changed 4 years ago by dgregor

  • Status changed from new to closed
  • Resolution set to fixed

(In [62003]) Merge Boost.Serialization fixes for Boost.MPI (consistent use of collection_size_type/version_type). Fixes #4214.

comment:2 Changed 4 years ago by dgregor

  • Milestone changed from To Be Determined to Boost 1.44.0

comment:3 Changed 4 years ago by ramey

  • Status changed from closed to reopened
  • Resolution fixed deleted

I'm a little concerned that this might break the ability to read older archives. Did you consider this?

I think what may have happened is:

a) I changed version_type unsigned char b) so an archive is created with a different type than before. c) then when things are read in - we read 16 bit int - error. d) so you've fixed it so that now it reads an 8 bit type. - no error on test. e) BUT - now any attempt to read an archive created in a previous system won't work - IT will have an error when it is trying to load.

In short, I think I introduced a bug in 1.43 (or ? 1.42). and that now we've fixed it in the wrong place. The best fix is likely:

a) increment the value returned by get_libray_version() b) include conditional code around your fix.

Does this make sense to you?

Robert Ramey

comment:4 Changed 4 years ago by dgregor

Sorry I missed this in my original fix. I've reverted my changes and will take another shot at it.

I am a little unhappy with the dependency of the serialization code in boost/serialization.*.hpp on boost::archive::version_type, especially since there is a completely different type boost::serialization::version_type. As part of this change, I would also like to move over to boost::serialization::version_type for versioning container serializations (and will teach Boost.MPI to ignore those versions as well).

comment:5 Changed 4 years ago by ramey

Actually I'm pretty unhappy with it as well.

I'm interested in getting it addressed in a definitive way. But as you can see, it requires a lot of thought to get it "right".

Right now I'm concerned that the change which created the issue here might have created binary_?archives. This would be a huge hassle to deal with.

It never occurred to me that both boost::serialization::version_type and boost::archive::version_type exist. Obviously this is a bad idea and likely the source of some problems that I'm not aware of. So clearly, the concepts of class version and archive library version are muddled.

A couple of random but related observations:

a) collection_size is sort of a problem. unsigned int was unsatisfactory. So we made collection_size_type. But it turns out that each collection (vector, list, etc.) has it's own size_type - e.g. vector::size_t. So collection_size_type isn't really any more "correct" than the original "unsigned int". In practice it's probably not a problem. But someday it could pop up.

b) The library has evolved much more since it's acceptance than is generally appreciated. It's MUCH less of a hack than it used to be and much more regular. This occurred over years as bugs were fixed. Also a large number of issues came up as people almost immediately started to put serialization code into DLLS. Every compiler has it's own rules for things like code stripping. These rules vary between debug and release versions for each compiler. And of course each compiler has it's own attribute syntax for adding the information that these rules use. See force_include.hpp. This raises lot's of issues in that, as far as I can tell, dynamic linking isn't in any way addressed in the C++ standard.

c) A number of things like EXPORT and now VERSION were "almost correct". Of course this turns out to be worse than "wrong" because it makes the issues harder to tease apart. I think EXPORT is almost there. VERSION is still a festering sore.

d) Applying fixes to the serialization library is much harder than other libraries because one wants to avoid invalidating existing archives.

Given all this, along with the fact that I've continually tweaked it, it's absolutely amazing to me that the library works as well as it does.

I'm happy to consider/integrate any changes you want to make in these or other areas.

Robert Ramey

comment:6 Changed 4 years ago by ramey

I've checked in changes that I think will address this without making previously created archives obsolete. Please watch the appropriate tests and let me know if this fixes things.

Robert Ramey

comment:7 Changed 4 years ago by ramey

Doug,

I'm still struggling with this. Not so much with the MPI aspect but with maintaining backward compatibility.

I had looked at your patch. I didn't (and still don't) understand the "skeleton" concept but I had concluded (maybe incorrectly) that the problem was that somewhere in MPI there was a dependency upon the size of version_type. And of course the existence of 3 different version types didn't help any. So it seemed the correct fix was just to make the version_type consistent at one byte wide.

I made changes to implement this, and still some changes had to be made in MPI and all tests pass. BUT it's still not right for loading old archives.

So my question is: suppose version_type and item_version_type are made larger than 1 char. Archives already now need specialized code to handle these particular types. But even though I might save/load only one byte for current archives, I need to retrieve a larger type older archives. So I would like to make the types larger than one byte, even though the current version only saves/loads 1 byte.

Would this break MPI again?

Any insight you can give me would be appreciated.

Robert Ramey

comment:8 Changed 4 years ago by ramey

  • Status changed from reopened to closed
  • Resolution set to fixed
View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.