rotate revisited
From: Richard Harter (cri_at_tiac.net)
Date: 04/06/04
- Next message: gswork: "Re: Powerpoint 2000 - Using VBA to return username"
- Previous message: Richard Heathfield: "Re: measuring job performance"
- Next in thread: Ian Woods: "Re: rotate revisited"
- Reply: Ian Woods: "Re: rotate revisited"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 06 Apr 2004 07:39:34 GMT
My apologies for the length of this posting.
Statement of the problem:
Suppose we have an array A containing n equi-spaced elements.
The object is to rotate the elements of A by k locations for some k,
0 < k < n. We can express this by
A[i] <- A[i+k], i < n-k
<- A[i+k-n], i >= n-k
An alternative way to express it is to represent A as the
concatenation
of two arrays P and Q of lengths k and n-k respectively. Then we have
A = PQ
rotate(A) = QP
There are three major approaches to effecting rotation that I
know of; these are:
(1) Move one of the components into a scratch array, move the
other component into its final position, and then move the
component in the scratch array into its final position.
The cost of this method (excluding loop costs) is:
Data moves: n + min(k, n-k)
Aux storage: min(k, n-k)
The catch in assessing costs is that the method lends itself to
bulk data moves (via hardware instuctions or loop unrolling) so
that data move costs are not comparable with those of other
methods. Likewise the logic costs are not comparable except for
naive implementations.
In short, it shouuld be by far the fastest method of the three,
but it does require a substantial amount of auxilliary storage.
(2) Reverse each component separately and then reverse the entire
array. Let r() be the reversal function. Then we have
QP = r(r(P)r(Q))
Calculating the costs of this method is straightforward. The
reversal of an array of length n requires n/2 swaps. Each swap
requires 3 data moves. Ergo, the number of data moves is
3n/2. Each element gets swapped twice, so the total number of
data moves is 3n. The number of loop tests is n/2u where u is
the loop unrolling factor.
(3) Treat the rotation as a permutation of the array, and
execute each permutation cycle separately. This is the most
complicated of the three, potentially requires the fewest
operations, and has the most varied implementations.
Let d be the greatest common denominator of n and k. Then the
permutation that carries A into rotate(A) has d cycles. For
example, suppose that n=14 and k=6, so that d=gcd(14,6)=2. Then
the permutation cycles are:
(0, 6, 12, 4, 10, 2, 8)
(1, 7, 13, 5, 11, 3, 9)
Each cycle is of length n/d (7 in this case). A cycle consists
of runs of indices increasing by k. Thus in our example, the
runs in the first cycle are
(0, 6, 12), (4, 10), and (2,8).
When the cycle begins at the beginning of a run the length of a
run is either ldiv(n,k) or udiv(n,k) where ldiv is integer
division rounded down and udiv is integer division rounded up.
In our example the lengths are ldiv(14,6)=2 and udiv(14,6)=3.
If, however, we start the cycle in the middle or end of a run
that run will be broken, e.g.,
(8, 0, 6, 12, 4, 10, 2)
has two runs of length one, one run length of two, one run of
length three.
When k > n/2 the long runs are descending runs. For example,
suppose n=14, k=10. There are two cycles,
(13, 9, 5, 1, 11, 7, 3)
(12, 8, 4, 0, 10, 6, 2)
In this example there are descending runs oflengths
ldiv(n,n-k)=ldiv(14,4)=3, and udiv(14,4)=3. In many
implementations it is convient to divide the algorithm into two
cases, k < n/2, and k > n/2. In the latter case we replace k by
n-k and change the direction of rotation.
There are two special cases, k=0, and k=n/2. If k=0 there is
nothing to do. If k=n/2 we can swap the first and last halves.
The swap requires 3n/2 data moves and n/2 loop tests; this cannot
be improved upon. The cases are:
k = 0 Do nothing
0 < k < n/2 Increment = k, ascending runs
k = n/2 Swap first and last halves
n/2 < k < n Decrement = n-k, descending runs
I will use the term "small slide" for a data move that is less
than n/2 locations, and the term "big slide" for a data move that
is greater than n/2,
When the runs are ascending big slides occur when the array index
is in the last k elements; similarly, when runs are descending
big slides occur when the array index is in the first k elements.
In either case the total number of big slides is k.
It follows that the number of big slides in a cycle is k/d;
likewise the number of runs within a cycle is also k/d.
When effecting the rotation of a cycle the last big slide is not
executed. Instead the saved initial value is used. In other
words the sequence of events is:
An initial value is saved.
A loop consisting of k/d runs of small slides alternates
with k/d-1 big slides.
Array[final index] is filled with the saved value.
There are two things to settle in translating the above into
code. The first is how to terminate the loop. The simplest way
is to determine the final index in advance (it will be the
starting index + n - k) and exit when the final index is reached.
The second is how to handle the alternation of small slide runs
and big slides. One way is to use a simple loop with a test to
determine if the current move is a big slide or a small slide.
An example of this method is:
index = first
last = first + n - k
save = A[first]
while (index .ne. last) begin
next = index + k
if (next .ge. n) next = next - k
A[index] = A[next]
index = next
end
A[last] = save
The costs of this code for a single cycle of length n/d are:
Data moves: n/d + 1
loop tests: n/d
if tests: n/d - 1
The costs for processing the entire rotation (including a loop
over d) are:
Data moves: n + d
Tests 2n
The above code performs two tests per element, a loop test and a
test to see whether the move is a big slide or a small slide. We
can reduce the number of tests per element by putting in an inner
loop for the small slide runs. A naive (and slightly erroneous)
implementation for ascending runs (k .lt. n/2)looks like this:
nk = n - k
index = first
last = first + nk
save = A[first]
while (index .ne. last) begin
while (index .lt. nk) begin
A[index] = A[index + k]
index = index + k
end
A[index] = A[index - nk]
index = index - nk
end
A[last] = save
This code is wrong because the actual loop is a "loop and a
half". That is, the final big slide should not be made. Because
it is being made, the loop termination test is also wrong. We
can fix this is either of two ways. The first is to move a copy
of the small slide loop outside the cycle loop. The second is to
move the termination test to the middle of the cycle loop. The
code for the second alternative looks like this:
nk = n - k
index = first
last = first + nk
save = A[first]
loop begin
while (index .lt. nk) begin
A[index] = A[index + k]
index = index + k
end
if (index .eq. last) exitloop
A[index] = A[index - nk]
index = index - nk
end
A[last] = save
The inner loop will be executed (n-k)/d times; costs for the loop
are
Data moves: (n-k)/d
Loop tests n/d
The loop termination test will be executed k/d times and the big
slide will be executed k/d -1 times. The total costs for this
code are:
Data moves: n + d
Loop tests: n + k + d
The number of loop tests can be reduced by k by using an "until"
loop instead of a "while" loop. This eliminates one loop test
for each small slide loop.
A similar economy can be employed in the outer loop over the
number of cycles. If for some reason it is desirable to avoid
computing the gcd, we can count the number of small slide loops
executed; when the last cycle is completed this number will be k.
Thus our code for ascending runs can look like this:
nk = n - k
first = 0
count = 0
until (count .ge. k) begin
last = first + nk
index = first
save = A[first]
loop begin
count += 1
until (index .ge. nk) begin
A[index] = A[index + k]
index += k
end
if (index .eq. last) exitloop
A[index] = A[index - nk]
index -= nk
end
A[last] = save
first += 1
end
The analogous code for descending runs is:
nk = n - k
first = n - 1
count = 0
until (count .ge. nk) begin
last = first - k
index = first
save = A[first]
loop begin
count += 1
until (index .ge. nk) begin
A[index] = A[index - nk]
index -= nk
end
if (index .eq. last) exitloop
A[index] = A[index + k]
index += k
end
A[last] = save
first -= 1
end
This code clearly is the most economical method for rotating an
array in terms of operations used since it uses n+d data moves
and loop tests.
Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
A university is what a college becomes when the faculty
loses interest in students. - John Ciardi
- Next message: gswork: "Re: Powerpoint 2000 - Using VBA to return username"
- Previous message: Richard Heathfield: "Re: measuring job performance"
- Next in thread: Ian Woods: "Re: rotate revisited"
- Reply: Ian Woods: "Re: rotate revisited"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|