Opened 5 years ago

Last modified 4 years ago

#9177 reopened Bugs

Improved serialization of floating point values

Reported by: John Maddock
Owned by: Robert Ramey
Milestone: To Be Determined
Component: serialization
Version: Boost Development Trunk
Severity: Problem
Keywords:
Cc:

Description

Currently there are two weaknesses in floating point serialization:

1) There is no handling of long double, or of user-defined floating point types that have been declared primitive.
2) The current code for float and double may fail to print enough digits to round trip the value when the value is less than 1 but not small enough to trigger the automatic switch to scientific format (which, from memory, happens at around 10^-5).

The attached patch addresses both of these issues, and coincidentally simplifies the code as well.
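A minimal sketch of the general idea follows. It is not the attached patch. Writing through one template keyed on std::numeric_limits<T> covers long double (and user types that provide the same operations) as well as float and double. Using std::scientific with max_digits10 significant digits avoids short output for small values. The names save_float and load_float are hypothetical.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: write a value of any floating point type T so that it
// can be read back exactly. Assumes T supports operator<< and specializes
// std::numeric_limits (including the C++11 member max_digits10).
template <class T>
std::string save_float(const T& value) {
    std::ostringstream os;
    // In scientific notation the precision counts digits after the decimal
    // point, so max_digits10 - 1 gives max_digits10 significant digits.
    os << std::scientific
       << std::setprecision(std::numeric_limits<T>::max_digits10 - 1)
       << value;
    return os.str();
}

// Hypothetical helper: read the text back into a T.
template <class T>
T load_float(const std::string& text) {
    std::istringstream is(text);
    T value = T();
    is >> value;
    return value;
}
```

For example, load_float<double>(save_float(0.1)) yields 0.1 again, and the same calls work unchanged for long double.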

Attachments (3)

basic_text_oprimitive.hpp.patch (2.4 KB) - added by John Maddock 5 years ago.
basic_text_oprimitive.hpp.2.patch (1.5 KB) - added by John Maddock 5 years ago.
t.cpp (3.4 KB) - added by John Maddock 5 years ago.


Change History (14)

Changed 5 years ago by John Maddock

comment:1 Changed 5 years ago by Robert Ramey

Resolution: fixed
Status: new → closed

I've incorporated this patch into the code. A couple of observations though.

a) User types such as a multi-precision float need to implement operator<<; otherwise a compile-time error is triggered.

b) The serialization trait for these types must be marked "primitive" in order to reach the correct code.

c) And of course these special types need the corresponding members defined in std::numeric_limits.

Getting any of the above wrong will likely create a compile error that is pretty confusing. Oh well.
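The three requirements can be sketched for a toy type. Everything here is hypothetical. my_float is a thin wrapper around double that stands in for something like a multiprecision float. Its numeric_limits specialization simply inherits double's constants as a shortcut; a real type would define every member for its own representation. Requirement b) needs the Boost headers, so it appears only as a comment.

```cpp
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined floating point type.
struct my_float {
    double v;
};

// a) Stream operators: without these, writing the type as a primitive fails to compile.
inline std::ostream& operator<<(std::ostream& os, const my_float& x) { return os << x.v; }
inline std::istream& operator>>(std::istream& is, my_float& x) { return is >> x.v; }

// c) numeric_limits specialization. Shortcut for this sketch only: reuse
// double's constants instead of defining each member for my_float.
namespace std {
template <>
struct numeric_limits<my_float> : numeric_limits<double> {};
}  // namespace std

// b) With Boost.Serialization the type would also be marked primitive, e.g.
//    BOOST_CLASS_IMPLEMENTATION(my_float, boost::serialization::primitive_type)
// (omitted here because it requires the Boost headers).

// Generic writer that relies only on a) and c).
template <class T>
std::string to_text(const T& value) {
    std::ostringstream os;
    os << std::scientific
       << std::setprecision(std::numeric_limits<T>::max_digits10 - 1)
       << value;
    return os.str();
}
```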

One thing I asked for some time ago was for is_float to be defined in terms of numeric_limits, more or less as you have done here. At least that would keep things in one place. Actually, it's not clear what the designers of numeric_limits had in mind by not explicitly specifying is_float.

Question: what does this mean for someone who synthesizes a floating point decimal type? What will this code do? And what about a fixed point decimal? I'm generally concerned about undefined behavior and unintended consequences when we are "too clever", as I think we might be here.

Anyway, I gave credit to you in the comments for the fix.

Robert Ramey

I remember that some special types, like the int64 type in Microsoft's compiler, failed to support the stream << and >> operators. I don't know if that's still true.

comment:2 Changed 5 years ago by anonymous

For decimal floating point types it *may* stream out 2 digits too many. We should really be using C++11's max_digits10 rather than digits10+2. That could be fixed by checking for BOOST_NO_CXX11_NUMERIC_LIMITS: use max_digits10 when the macro is not defined and digits10+2 when it is.
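The suggested fallback could look roughly like the sketch below. In Boost, BOOST_NO_CXX11_NUMERIC_LIMITS comes from Boost.Config, normally through <boost/config.hpp>. This sketch does not include that header and only tests the macro, so on its own it always takes the C++11 branch. The function name is hypothetical.

```cpp
#include <limits>

// Hypothetical helper: number of significant decimal digits to write so that
// a value of type T round trips.
template <class T>
int round_trip_digits() {
#ifndef BOOST_NO_CXX11_NUMERIC_LIMITS
    // C++11: exactly the number of digits needed, for binary and decimal types alike.
    return std::numeric_limits<T>::max_digits10;
#else
    // Pre-C++11 fallback: only an approximation of max_digits10. For IEEE float
    // it gives 8 rather than 9, and for decimal types it can be two too many.
    return std::numeric_limits<T>::digits10 + 2;
#endif
}
```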

For fixed point decimals the floating point code will not be triggered. These types have no floating radix point (max_exponent is zero), and they are generally regarded as "exact" in the same sense that integers are exact, albeit truncating.
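One way to express that distinction in terms of numeric_limits is sketched below. Both the trait looks_like_float and the fixed point type fixed2 are hypothetical. The numeric_limits specialization for fixed2 defines only the members the trait reads. That is enough for this illustration, but it is not a complete specialization.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical trait: T is treated as floating point when numeric_limits is
// specialized for it, it is not an integer, and it has a non-zero exponent range.
template <class T>
struct looks_like_float
    : std::integral_constant<bool,
          std::numeric_limits<T>::is_specialized &&
          !std::numeric_limits<T>::is_integer &&
          std::numeric_limits<T>::max_exponent != 0> {};

// Hypothetical fixed point decimal with two digits after the point,
// stored as a scaled integer.
struct fixed2 {
    long long hundredths;
};

namespace std {
// Only the members used by the trait (plus a few descriptive ones) are defined.
template <>
struct numeric_limits<fixed2> {
    static constexpr bool is_specialized = true;
    static constexpr bool is_integer = false;
    static constexpr bool is_exact = true;   // exact, albeit truncating
    static constexpr int radix = 10;
    static constexpr int max_exponent = 0;   // no floating radix point
};
}  // namespace std
```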

You're right about special types often not having std lib support - though typically that's a transitional thing when they're first introduced?

comment:3 Changed 5 years ago by anonymous

Hmmm, what about floating point decimal? This has been proposed as a library.

comment:4 Changed 5 years ago by John Maddock

As mentioned above, they're fine, especially if you use numeric_limits<>::max_digits10 rather than digits10+2 when BOOST_NO_CXX11_NUMERIC_LIMITS is not defined. Note that since we're talking about class types here, none of this code will be triggered by default; it requires the user to mark the type as fundamental (using your traits classes). I guess that with a bit more introspection we could probably figure out whether a type has a serialize member or not, and if not, forward to the operator<< code... but that may be too clever, or not?

comment:5 Changed 5 years ago by John Maddock

Ah, just looked at your updated code and there are a number of mistakes:

a) You misspelled my name.
b) It uses numeric_limits<float>, which will clearly only work for type float and not for double, long double, etc.
c) The formula you use for the number of digits is suitable for base 2 only.
d) You're still not outputting in scientific format. That results in one less significant digit being printed (strange but true), and as a result values do not always round trip.
e) I see you also removed the save-and-restore of the formatting options. Frankly that may not matter, but I thought it was a nicety to reset these at the end of the procedure, so that classes with their own serialize functions see whatever iostream flags the user originally set on the stream.
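Point c) can be made concrete with the formula that C11 gives for FLT_DECIMAL_DIG and related macros. A type with p digits in radix b needs p decimal digits if b is a power of 10, and ceil(1 + p log10 b) otherwise. The sketch below uses hypothetical function names. It compares that formula with an integer shortcut that hard-codes log10(2) as roughly 0.3010. The shortcut reproduces max_digits10 for float and double, but it is far too small for a decimal type such as IEEE decimal64, which has 16 decimal digits.

```cpp
#include <cmath>
#include <limits>

// Decimal digits needed to round trip a value with p digits in the given radix,
// following the C11 formula for FLT_DECIMAL_DIG and friends.
// (Only radix 10 is handled as the power-of-10 case in this sketch.)
inline int decimal_digits_needed(int p, int radix) {
    if (radix == 10) return p;
    return static_cast<int>(std::ceil(1.0 + p * std::log10(static_cast<double>(radix))));
}

// Applies the general formula to a type via its numeric_limits.
template <class T>
int digits_for() {
    return decimal_digits_needed(std::numeric_limits<T>::digits,
                                 std::numeric_limits<T>::radix);
}

// Radix-2-only shortcut: 2 + p * log10(2), with log10(2) approximated as 0.3010.
inline int base2_only_digits(int p) {
    return 2 + p * 3010 / 10000;
}
```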

I'm attaching a patch for these, which also uses numeric_limits<>::max_digits10 when available.
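The "strange but true" part of d) comes from how the stream precision is interpreted. In the default notation it is the total number of significant digits. In std::scientific it is the number of digits after the decimal point. As a result, the same setting yields one more significant digit in scientific format. A small illustration (format_with is a hypothetical helper):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with the given precision, in either the
// default (general) notation or scientific notation.
inline std::string format_with(double value, int precision, bool scientific) {
    std::ostringstream os;
    if (scientific) {
        os << std::scientific;
    }
    os << std::setprecision(precision) << value;
    return os.str();
}
```

With precision 3, the value 0.123456 is written as 0.123 (three significant digits) in the default notation, but as 1.235e-01 (four significant digits) in scientific notation.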

comment:6 Changed 5 years ago by John Maddock

Resolution: fixed
Status: closed → reopened

Changed 5 years ago by John Maddock

comment:7 Changed 5 years ago by John Maddock

I'm also attaching a test case (I'll let you pick a better name for the file). You have to build and run it 3 times, with TEST_FLOAT, TEST_DOUBLE and TEST_LONG_DOUBLE defined in turn.
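The contents of t.cpp are not reproduced on this page. The following is only a hypothetical sketch of how a macro-selected round-trip test might be organized. It sweeps a geometric sequence of values from 1e-30 to 1e30, an arbitrary range that includes many values below 1. It then counts the values, and their negations, that fail to round trip through text written with max_digits10 significant digits. So that it builds on its own, this sketch falls back to double when none of the macros is defined.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>

#if defined(TEST_FLOAT)
typedef float test_type;
#elif defined(TEST_LONG_DOUBLE)
typedef long double test_type;
#else  // TEST_DOUBLE (this sketch also defaults to double)
typedef double test_type;
#endif

// Hypothetical helper: write value with max_digits10 significant digits,
// read it back, and report whether the result is identical.
inline bool round_trips(test_type value) {
    std::stringstream ss;
    ss << std::scientific
       << std::setprecision(std::numeric_limits<test_type>::max_digits10 - 1)
       << value;
    test_type back = 0;
    ss >> back;
    return !ss.fail() && back == value;
}

// Count failures over a geometric sweep from 1e-30 up to 1e30, both signs.
inline int count_round_trip_failures() {
    int failures = 0;
    const test_type limit = static_cast<test_type>(1e30);
    const test_type factor = static_cast<test_type>(1.7);
    for (test_type v = static_cast<test_type>(1e-30); v < limit; v *= factor) {
        if (!round_trips(v)) ++failures;
        if (!round_trips(-v)) ++failures;
    }
    return failures;
}
```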

Changed 5 years ago by John Maddock

Attachment: t.cpp added

comment:8 Changed 5 years ago by Robert Ramey

Thanks for the interesting comments.

a) Sorry about misspelling your name.

b) I noticed the float vs double problem after check-in; it's fixed now.

c) Usage of the Kahan formula: in my own tests I found it didn't work. It seemed to me that it should work in our case, given that the radix of our floating point types should be 2. If I understand this correctly (which I doubt), the code in your patch should work the same as mine. When I got these results, I changed to the current version you see in the repository, which uses digits10+2. This gave no round trip errors with gcc and a maximum round trip error of 1 with MSVC 9.0.

d) I fixed the scientific format issue.

e) I implemented save/restore of the stream state using the io state savers, for both the precision and the format (scientific). This should leave the stream in the same state it was in before. I haven't uploaded this yet.
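Boost provides this as boost::io::ios_flags_saver and boost::io::ios_precision_saver. For readers without Boost at hand, the following is a minimal sketch of the same RAII idea using only the standard library. stream_state_guard and write_scientific are hypothetical names.

```cpp
#include <iomanip>
#include <ios>
#include <limits>
#include <ostream>
#include <sstream>  // for the usage example below

// Hypothetical RAII guard: records the stream's format flags and precision on
// construction and restores them on destruction.
class stream_state_guard {
public:
    explicit stream_state_guard(std::ios_base& s)
        : stream_(s), flags_(s.flags()), precision_(s.precision()) {}
    ~stream_state_guard() {
        stream_.flags(flags_);
        stream_.precision(precision_);
    }
    stream_state_guard(const stream_state_guard&) = delete;
    stream_state_guard& operator=(const stream_state_guard&) = delete;

private:
    std::ios_base& stream_;
    std::ios_base::fmtflags flags_;
    std::streamsize precision_;
};

// Hypothetical writer: changes the format only for the duration of the call.
inline void write_scientific(std::ostream& os, double value) {
    stream_state_guard guard(os);
    os << std::scientific
       << std::setprecision(std::numeric_limits<double>::max_digits10 - 1)
       << value;
}
```

For example, after std::ostringstream os; os.precision(4); write_scientific(os, 1.0); the stream still reports precision 4 and the default (non-scientific) notation.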

I used digits10 + 2. I have no problem incorporating your patch; I was reluctant to do so because I didn't find BOOST_NO_CXX11_NUMERIC_LIMITS in the 1.54 documentation. I'll presume it's in the next release.

I'll incorporate your test in the future.

One big problem is the usage of BOOST_CHECK_EQUAL. Previously I had used my own ad hoc method to test whether values were "close enough". Of course, now that we're looking at this more carefully, it became clear that my test wasn't really correct. I moved to your float_distance function, which addresses these issues. I set a tolerance of 1 or 2 bits of difference. Now I'm wondering whether I should set that to zero; I'll look into that later. I really don't have confidence that all of the wide variety of platforms I would like the serialization library to target implement floating point correctly. I'll think about this.
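The float_distance mentioned here is presumably Boost.Math's boost::math::float_distance, which counts the representable values between two floating point numbers. For illustration only, a standard-library-only equivalent for finite doubles is sketched below (ulp_distance and ordered_bits are hypothetical names). It maps each bit pattern onto an ordered integer line. With that mapping, a tolerance of 0 means an exact round trip, and a tolerance of 1 allows the last bit to differ.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a double's bit pattern onto a monotonically ordered signed integer line,
// so that adjacent representable values differ by exactly 1 and +0.0 == -0.0.
inline std::int64_t ordered_bits(double x) {
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);
    // Negative doubles have the sign bit set; reflect them below zero.
    return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable-value steps between a and b (finite values only).
inline std::uint64_t ulp_distance(double a, double b) {
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    return ia > ib ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
                   : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}
```

A test can then check ulp_distance(original, reloaded) <= tolerance, with tolerance set to 0 once the platforms involved are trusted.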

Thanks for your help with all this.

Robert Ramey

comment:9 Changed 5 years ago by Robert Ramey

Resolution: fixed
Status: reopened → closed

comment:10 Changed 4 years ago by boost@…

This is still a problem for floating point in 1.55. The value 2.23783695e+038 does not round trip.

I can't tell from Trac whether this was supposed to be fixed in 1.55. If it wasn't, apologies in advance.

Please take a look at Bruce Dawson's page on portably serializing floats.
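A minimal check for the reported value is sketched below. It assumes the value is a float, as its nine significant digits suggest, and float_round_trips is a hypothetical name. Nine significant digits (float's max_digits10) are enough to identify any float uniquely. So on a standard library whose conversions round correctly, this check should pass. If it fails on a given platform, that would point at the conversion routines rather than at the number of digits written.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>

// Round trip one float through text using max_digits10 significant digits.
inline bool float_round_trips(float value) {
    std::stringstream ss;
    ss << std::scientific
       << std::setprecision(std::numeric_limits<float>::max_digits10 - 1)
       << value;
    float back = 0.0f;
    ss >> back;
    return !ss.fail() && back == value;
}
```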

comment:11 Changed 4 years ago by Robert Ramey

Resolution: fixed
Status: closed → reopened