Re: arm11/armv6 right shift signed packed values

<johann.koenig@xxxxxxxxx> wrote in message news:53ea9945-3838-40b2-836d-2c8f08c30efa@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I'm attempting to pack some 16-bit values for output after doing some work on them.
They're currently held as r7 = v0|v4 and r10 = v1|v5 (high|low halfword). They all
need to be arithmetically shifted right by 3, either before or after repacking. The
output will be v1|v0 and v5|v4 (little-endian architecture). I managed to get v1|v0
written reasonably efficiently:
mov r8, r10, asr #3 ; 1>>3|xxx
pkhtb r8, r8, r7, asr #19 ; 1>>3|0>>3
str r8, [r0], r2 ; o1|o0, post inc
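To check what this two-instruction sequence computes, here is a small C model of the register operations. It is only a sketch: asr32, pkhtb and pack_v1_v0 are hypothetical helper names standing in for the ARM instructions, and the model assumes the layout described above (high|low halfword, so r7 = v0|v4 and r10 = v1|v5).

```c
#include <stdint.h>

/* Hypothetical C stand-ins for ARM register operations (not a real API). */

/* Arithmetic shift right by n (1..31), like the ASR operand shift. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);   /* copy the sign bit into the vacated bits */
    return r;
}

/* PKHTB Rd, Rn, Rm, ASR #sh: top halfword from Rn,
   bottom halfword from (Rm ASR sh). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned sh)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, sh) & 0x0000FFFFu);
}

/* The v1|v0 sequence; layout assumed high|low: r7 = v0|v4, r10 = v1|v5. */
static uint32_t pack_v1_v0(uint32_t r7, uint32_t r10)
{
    uint32_t r8 = asr32(r10, 3);    /* mov   r8, r10, asr #3     ; v1>>3 in top half */
    return pkhtb(r8, r7, 19);       /* pkhtb r8, r8, r7, asr #19 ; v0>>3 in bottom   */
}
```

For example, with r7 = 0xFF9C0007 (v0 = -100, v4 = 7) and r10 = 0x03E8FFFF (v1 = 1000, v5 = -1), pack_v1_v0 returns 0x007DFFF3, i.e. 125|-13. The asr #19 works because v0 already sits in the top halfword, so the shift sees its real sign bit.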

But v5|v4 is a little ugly because those values are in the low halfwords, so a
right shift is going to drag in the bottom bits of the upper halfword (right?).
Right now I'm sign-extending each value, then writing individual halfwords:
sxth r10, r10 ; sign-extend v5
mov r8, r10, asr #3 ; 5 >> 3
strh r8, [r0, #2] ; o5

sxth r7, r7 ; sign-extend v4
mov r8, r7, asr #3 ; 4 >> 3
strh r8, [r0], r2 ; o4, post inc
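The suspicion is correct: an arithmetic right shift of the whole register moves the low three bits of the upper halfword into the top of the lower halfword. The C model below (hypothetical helper names, same high|low layout assumed) compares shifting the raw register with sign-extending it first, which is what SXTH does.

```c
#include <stdint.h>

/* Hypothetical C stand-ins for ARM register operations (not a real API). */

/* Arithmetic shift right by n (1..31), like the ASR operand shift. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);   /* copy the sign bit into the vacated bits */
    return r;
}

/* SXTH Rd, Rm: sign-extend the low halfword to 32 bits (portable form,
   relies on unsigned wrap-around modulo 2^32). */
static uint32_t sxth(uint32_t x)
{
    return ((x & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* Low halfword of (Rm ASR #3) taken straight from the packed register. */
static uint32_t low_shift_raw(uint32_t rm)
{
    return asr32(rm, 3) & 0xFFFFu;
}

/* Low halfword after sxth then ASR #3, as in the v5/v4 sequence. */
static uint32_t low_shift_sxth(uint32_t rm)
{
    return asr32(sxth(rm), 3) & 0xFFFFu;
}
```

With r10 = 0x00050010 (v1 = 5, v5 = 16), the raw shift leaves 0xA002 in the low halfword instead of 0x0002, because bits 0 and 2 of v1 land in bits 13 and 15.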

I found this sequence:
PKHBT R3, R1, R2, LSL #15 ; R3 = [R2>>1, R1]
PKHTB R3, R3, R1, ASR #1 ; R3 = [R2>>1, R1>>1]
However, it seems to rely on the inputs being full 32-bit words.
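Why the PKHBT/PKHTB pair needs full words can be seen by modelling it in C. The helper names are hypothetical; the semantics follow the ARMv6 definitions (PKHBT keeps the bottom halfword of Rn and takes the top halfword of the shifted Rm, PKHTB the reverse). Each instruction keeps bits 16..1 of its source, so bit 16 of R1 and R2 must be the real sign bit, which is only true for correctly sign-extended 32-bit values.

```c
#include <stdint.h>

/* Hypothetical C stand-ins for ARM register operations (not a real API). */

/* Arithmetic shift right by n (1..31), like the ASR operand shift. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);   /* copy the sign bit into the vacated bits */
    return r;
}

/* PKHBT Rd, Rn, Rm, LSL #sh: bottom halfword from Rn,
   top halfword from (Rm LSL sh). */
static uint32_t pkhbt(uint32_t rn, uint32_t rm, unsigned sh)
{
    return (rn & 0x0000FFFFu) | ((rm << sh) & 0xFFFF0000u);
}

/* PKHTB Rd, Rn, Rm, ASR #sh: top halfword from Rn,
   bottom halfword from (Rm ASR sh). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned sh)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, sh) & 0x0000FFFFu);
}

/* The found pair: R3 = [R2>>1, R1>>1] (top, bottom). */
static uint32_t halve_pair(uint32_t r1, uint32_t r2)
{
    uint32_t r3 = pkhbt(r1, r2, 15);   /* top    = bits 16..1 of R2 */
    return pkhtb(r3, r1, 1);           /* bottom = bits 16..1 of R1 */
}
```

With R1 = 0xFFFFFFFD (-3) and R2 = 100 the result is 0x0032FFFE, i.e. [50, -2]. With R1 = 0x1234FFFD, where only the low halfword holds -3, the bottom half comes out as 0x7FFE.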

Is there a better way to do this?

An easy alternative would be to shift r10 and r7 left by 16, which moves v5 and v4
into the top halfwords, and then apply your first sequence. That saves an
instruction and lets you use a single str instead of two strh.
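The suggestion can be checked with the same kind of C model. It is a sketch assuming r7 and r10 may be overwritten (mov r10, r10, lsl #16 and mov r7, r7, lsl #16, then the original mov/pkhtb/str); the helper names are hypothetical. That is four data-processing instructions plus one str, against two sxth, two mov and two strh.

```c
#include <stdint.h>

/* Hypothetical C stand-ins for ARM register operations (not a real API). */

/* Arithmetic shift right by n (1..31), like the ASR operand shift. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t r = x >> n;
    if (x & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);   /* copy the sign bit into the vacated bits */
    return r;
}

/* PKHTB Rd, Rn, Rm, ASR #sh: top halfword from Rn,
   bottom halfword from (Rm ASR sh). */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned sh)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, sh) & 0x0000FFFFu);
}

/* Suggested v5|v4 sequence: move the low halfwords to the top,
   then reuse the v1|v0 sequence. Layout: r7 = v0|v4, r10 = v1|v5. */
static uint32_t pack_v5_v4(uint32_t r7, uint32_t r10)
{
    r10 <<= 16;                     /* mov   r10, r10, lsl #16   ; v5|0        */
    r7 <<= 16;                      /* mov   r7, r7, lsl #16     ; v4|0        */
    uint32_t r8 = asr32(r10, 3);    /* mov   r8, r10, asr #3     ; v5>>3|xxx   */
    return pkhtb(r8, r7, 19);       /* pkhtb r8, r8, r7, asr #19 ; v5>>3|v4>>3 */
}
```

For r7 = 0xFF9C0007 and r10 = 0x03E8FFFF (v4 = 7, v5 = -1) the model returns 0xFFFF0000, i.e. -1|0, ready for a single str.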

However, the best option would be to avoid shifting at this stage altogether.
Unless this is the final result, delaying the shift until the next processing step
might be cheaper. Another possibility, if you do any additions, is to use the
halving forms (such as SHADD16) so that the result is already shifted.
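ARMv6 has parallel halving additions such as SHADD16, which add the two signed halfword lanes and shift each sum right by one without losing the carry. The C model below is a sketch of that per-lane behaviour (sext16 and shadd16 are hypothetical names). It shows how one bit of the shift can be folded into an addition at no extra cost.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a 16-bit lane to int32_t (portable). */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Model of SHADD16 Rd, Rn, Rm: for each signed halfword lane,
   Rd.lane = (Rn.lane + Rm.lane) >> 1, keeping the full 17-bit sum. */
static uint32_t shadd16(uint32_t rn, uint32_t rm)
{
    uint32_t rd = 0;
    for (unsigned lane = 0; lane < 2; lane++) {
        unsigned sh = 16 * lane;
        int32_t s = sext16(rn >> sh) + sext16(rm >> sh); /* cannot overflow */
        int32_t h = (s >= 0) ? s / 2 : -((1 - s) / 2);   /* floor(s / 2)    */
        rd |= ((uint32_t)h & 0xFFFFu) << sh;
    }
    return rd;
}
```

For example, adding 0x7530FFFD and 0x7530FFFC (30000|-3 and 30000|-4) gives 0x7530FFFC (30000|-4): the intermediate sum of 60000 would overflow a halfword, but the halved result does not.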