Re: Find second minimum number



Mark P said:

Googmeister wrote:

You can find the second minimum (along with the minimum)
in n + log n comparisons. Hint: the solution requires no fancy
techniques or mathematical analysis.


Do tell, how do you do this with only n + log n comparisons?

He's right, and it's easy if you know how. :-)

Donald Knuth discusses maximum-finding. Clearly, this is the same problem,
wearing a false moustache and ridiculously large spectacles.

He first proves that the search for a maximum amongst n items involves at
least n - 1 comparisons.

He then discusses quadratic selection. It works like this (and there is a
hidden message in the data I've chosen):

q w e r t y u i o p a s d f g h j k z x c v b n m

Rearrange in a square, as best you can:

I: q w e r t
II: y u i o p
III: a s d f g
IV: h j k z x
V: c v b n m

Group I: q < w, e < w, r < w, t < w : 4 comparisons. w is largest.
Group II: y > u, y > i, y > o, y > p : 4 comparisons. y is largest.
Group III: a < s, s > d, s > f, s > g : 4 comparisons. s is largest.
Group IV: h < j, j < k, k < z, z > x : 4 comparisons. z is largest.
Group V: c < v, v > b, v > n, v > m : 4 comparisons. v is largest.

That's 20 comparisons so far. We can now pull out the largest with 4 more
comparisons:

w < y, y > s, y < z, z > v. So z is the largest value in the group. So far
so good, and we've made 24 comparisons, which is equal to n - 1.

Now, consider the possibilities for the second largest. EITHER it's one of
the winners in the other four groups, OR it's one of the losers in z's
group.

Strictly speaking, we already know that w < y, y > s, so we don't need to
compare those again, but if z had been in Group I, we wouldn't have known
this, so let's pretend we don't.

w < y, y > s, y > v, so y is the second-greatest of the group winners. Now
we have to trawl through Group IV's losers: y > h, y > j, y > k, y > x

This gives us y as the second-largest member, in seven more comparisons - a
total of 31, which is just 1 higher than n + sqrt(n).

Okay, what about cubic selection? Here, we split the data up into a cube
(I've added @ as a rather sad glyph for "omega", arbitrarily in the middle
somewhere - we will assume it has a value > z):

q w e r t y u i o p a s d f @ g h j k l z x c v b n m

Dice it!

q w e r t y u i o
p a s d f @ g h j
k l z x c v b n m

q < w, w > e : 2, w
p > a, p < s : 2, s
k < l, l < z : 2, z

r < t, t < y : 2, y
d < f, f < @ : 2, @
x > c, x > v : 2, x

u > i, u > o : 2, u
g < h, h < j : 2, j
b < n, n > m : 2, n

18 comparisons so far.

w > s, w < z : 2, z
y < @, @ > x : 2, @
u > j, u > n : 2, u

24 so far.

z < @, @ > u : 2, @

26 comparisons, for 27 items. Perfect. Now for second place. Clearly, z is
not the highest (because @ is), and yet it is the highest out of its
original 3x3 group. Likewise u. So the second highest is either z, or u, or
y, or x (the two losers to @ in the semi-final), or d, or f (the two losers
to @ in the first round).

d < f, f < y, y > x, y < z, z > u. So z is the second highest, and we
learned this after a further five comparisons: 31 in all, for 27 items.
This is just 1 higher than n + pow(n, 1.0 / 3.0).

You can extend this scheme indefinitely. Consider n = 16.

q w e r t y u i o p a s d f g h

Split into n^(1.0/4.0) groups:

q w e r t y u i

o p a s d f g h

Split each group into n^(1.0/4.0) groups:

q w e r
t y u i

o p a s
d f g h

Split each group into n^(1.0/4.0) groups:

q w
e r

t y
u i

o p
a s

d f
g h

Compare:

q < w : 1, w
e < r : 1, r

t < y : 1, y
u > i : 1, u

o < p : 1, p
a < s : 1, s

d < f : 1, f
g < h : 1, h

8 so far (n = 16, remember).

w > r: 1, w
y > u: 1, y

p < s: 1, s
f < h: 1, h

12 so far (n = 16)

w < y: 1, y
s > h: 1, s

14 so far.

y > s: 1, y

15 so far. The second place is either s, w, or u. This takes up two more
comparisons: s < w, w > u.

So we have 17 comparisons to get the top two from 16. This is as near to n +
log n as makes no odds.

As you increase the number of "dimensions" of the search, you approximate n
+ log n ever more closely.

Let us arbitrarily choose to have a 16-dimensional search, where n is 65536.

We predict that we will take 65535 + log2(65536) searches - i.e. 65551
altogether.

We divide the data into 32768 groups, and we have one comparison in each
group. That costs us 32768 comparisons so far. Clearly we keep halving the
number of comparisons, and I don't think anyone will quibble now when I say
that this comes to 65535 searches to establish the "winner".

To find the second place, we need only look at all the data items "beaten"
by the ultimate winner. There are 15 "rounds", and so 15 items in that
group. This means we need another 14 comparisons to establish second place.

And thus we actually come in at two under par!

So yes, it's certainly possible to do this in N + log N - much to my
surprise, I must admit.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
.