broot vs brootinv performancs
Niels Möller
nisse at lysator.liu.se
Tue Oct 30 15:26:30 CET 2012
I've tried some more benchmarking (sorry, patches to speed not yet
checked in). To my disapppointment, broot appear to be 20% slower than
brootinv (except for small sizes, where it wins because it does the
initial few iterations without bignum overhead).
I don't understand why. When describing the iterations below, let n be
the target size of the iteraton.
brootinv uses the iteration
r' <-- ((k+1) r - r^{k+1} y) / k
which is
mul_1,
powlo,
mullo_n,
sub_n,
bdiv_q_1,
all of size n. For broot, I've rewritten the iteration as
r' <-- r - (a^{k-1} (r^2)^{(k+1)/2} - r) / k
which gave a modest improvement. We still have cancellation in the
subtraction in parenthesis (and I don't yet use wraparound, neither does
brootinv). The operations are then
sqr (input size n/2, output size n)
powlo (size n, exponent size one bit smaller)
mullo_n (size n)
bdiv_q_1, (size n/2)
mpn_neg (size n/2)
The mullo_n is the same. The sqr + powlo ought to be roughly the same as
the powlo i brootinv (consider small k so windowing is not useful). I
had expected a significant win because a mpn_neg of size n/2 ought to be
faster than the combination of mul_1, sub_n of size n and bdiv_q_1 of
size n/2.
There's also a powlo (a^{k-1}) and mullo outside of the loop. I expected
them to contribute little to the total running time. But mullo seems to
take 13% for large sizes and mpn_powlo is not currently supported by
speed, so it's not so easy to measure).
Maybe I ought to power a and r at the same time as
a^{k-1} r^{k+1} = (a*r)^{k-1} * r^2
or so. Some wraparound or mulmid tricks may be applicable to
Or maybe I should rather try to optimize brootinv to take advantage of
cancellation and wraparound.
Another note: I also wish for a "bdiv_nq_1" which returns - u / d (mod
B^n) rather than u / d (mod B^n), then one could eliminate the mpn_neg.
(And this is somewhat related to my bdiv work a few months ago, which
also returns a quotient with different sign).
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list