Modify

Opened 4 years ago

Closed 4 years ago

Last modified 21 months ago

#9233 closed Bugs (wontfix)

Suboptimal shift by multiples of 8 bits

Reported by: Domagoj Šarić Owned by: John Maddock
Milestone: To Be Determined Component: multiprecision
Version: Boost 1.54.0 Severity: Optimization
Keywords: Cc:

Description

It seems to me that the operation from the subject is needlessly fat, as if it is using a generic implementation for a generic/'random' number of bits...

Attachments (0)

Change History (5)

comment:1 Changed 4 years ago by John Maddock

Resolution: wontfix
Status: newclosed

I don't believe there's any way to improve this - shifts by whole numbers of "limbs" are optimized, but individual bytes are not in a suitable order for a std::memmove for example.

comment:2 Changed 21 months ago by Domagoj Šarić

Maybe you could add/separate specialized functions for shifting: for whole limbs, for whole bytes and 'generic'. Then the shift operator can be implemented as if ( bits % limb_bits == 0 ) shift_limbs(); elseif ( bits % 8 == 0 ) shift_bytes(); else { generic() }; and rely on IPO to directly select the optimal implementation when bits is a compile time constant...

ps. why are limbs on win64 32bit integers (as opposed to 64 bit ones)?

Last edited 21 months ago by Domagoj Šarić (previous) (diff)

comment:3 Changed 21 months ago by John Maddock

Fixed in https://github.com/boostorg/multiprecision/commit/cb1a41835f4566987b96c7a7034088e22d83f77b

It turns out that memmove is (very slightly) faster than moving whole limbs so that's the preferred method when available.

Limbs are half the size of the largest available integers - so 32-bit for msvc as there's no 128-bit integer support (unlike mingw for example).

comment:4 in reply to:  3 Changed 21 months ago by Domagoj Šarić

Replying to johnmaddock:

Fixed in https://github.com/boostorg/multiprecision/commit/cb1a41835f4566987b96c7a7034088e22d83f77b

Thanks John :) I no longer use/need this functionality so I didn't test but from a cursory skim it looks good...except that maybe the (little) endian checks are not really necessary: if you make them in only one place - where you decide on the underlying/raw/limb layout - if you order the limbs in 'machine endianess' (I'm guessing) the rest of the code can be endianness agnostic (and more readily assume correct byte ordering/use memcpy/move/set more easily)...

It turns out that memmove is (very slightly) faster than moving whole limbs so that's the preferred method when available.

It's also smaller (one call instead of a loop for every instantiation;)

Limbs are half the size of the largest available integers - so 32-bit for msvc as there's no 128-bit integer support (unlike mingw for example).

Thanks for the explanation ;)

comment:5 Changed 21 months ago by anonymous

if you order the limbs in 'machine endianess' (I'm guessing) the rest of the code can be endianness agnostic (and more readily assume correct byte ordering/use memcpy/move/set more easily)

No I'd have to rewrite everything - multiplication, addition, subtraction, the works. Two versions for each, one little endian, one big. And I'd have no means of testing the big endian code either...

Not going to happen, sorry!

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain John Maddock.
The resolution will be deleted.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.