# Re: implementation note for scapegoat tree

On Mar 28, 10:51 am, Ben Pfaff <b...@xxxxxxxxxxxxxxx> wrote:
I'm what you might call a balanced binary tree enthusiast, and so
when I rather belatedly encountered the idea of scapegoat trees a
few months ago I decided I'd take a stab at implementing them
when I got a chance. I finally got around to it this week and I
thought I'd pass along something that I'd noticed during
implementation.

In case you don't know what a scapegoat tree is, you can look at
http://en.wikipedia.org/wiki/Scapegoat_tree
which includes a link to the original Galperin and Rivest paper
on them.

One potentially tricky part of scapegoat tree design is the
choice of alpha, which is a real constant picked by the
implementor, that must be greater than 0.5 and less than 1. We
must be able to efficiently calculate h_alpha =
floor(log(n)/log(1/alpha)) for integer n > 0. One way to do so
would be to maintain a table relating h_alpha to the minimum
value of n that yields that h_alpha. Then, we can track the
value of h_alpha(n) in the number of nodes in the tree n as nodes
are inserted and deleted.

But I decided to use a different approach. I chose alpha =
sqrt(2)/2 = 1/sqrt(2) ~= .707. Then, computing h_alpha is a
matter of computing a logarithm in base sqrt(2). This is
actually easy: simply compute twice the base-2 logarithm, then
adjust upward by 1 if necessary.

Here's the code I ended up with for this. I omit the
implementation of count_leading_zeros, which can be computed with
a single machine instruction on many architectures (including
x86):

/* Returns floor(log2(x)).
Undefined if X is zero. */
static inline int
floor_log2 (size_t x)
{
return sizeof (size_t) * CHAR_BIT - 1 - count_leading_zeros (x);
}

/* Returns ceil(pow(sqrt(2), x * 2 + 1)).
Defined for X from 0 up to the number of bits in size_t minus
1. */
static inline size_t
pow_sqrt2 (int x)
{
/* These constants are sqrt(2) multiplied by 2**63 or 2**31,
respectively, and then rounded to nearest. */
#if SIZE_MAX >> 31 >> 1
return (UINT64_C(0xb504f333f9de6484) >> (63 - x)) + 1;
#else
return (0xb504f334 >> (31 - x)) + 1;
#endif
}

/* Returns floor(log(n)/log(sqrt(2))).
Undefined if N is 0. */
static inline int
calculate_h_alpha (size_t n)
{
int log2 = floor_log2 (n);

/* The correct answer is either 2 * log2 or one more. So we
see if n >= pow(sqrt(2), 2 * log2 + 1) and if so, add 1. */
return (2 * log2) + (n >= pow_sqrt2 (log2));
}

With -O3, GCC compiles calculate_h_alpha to a body of only ten or
so simple arithmetic instructions on x86.

How well does your tree perform?

.