Mombu the Programming Forum

Go Back   Mombu the Programming Forum > Programming > shuffling bits
User Name
Password
REGISTER NOW! Mark Forums Read




Reply
1 31st October 20:54
geomrock
External User
 
Posts: 1
Default shuffling bits



Is there a way to shuffle bits on x86 hardware.

I have three integers

A = a_1 a_2 a_3 ... a_32
B = b_1 b_2 b_3 ... b_32
C = c_1 c_2 c_3 ... c_32

where all a_i's b_i's and c_i's are bits
and my shuffle looks like this

X = a_1 b_1 c_1 a_2 b_2 c_2 ... a_32 b_32 c_32

Is there a way i can generate X using MMX or SIMD instructions
such that X is 128 bit integer.


Thanks in advance for your comments,
--Elijah
  Reply With Quote


 


2 31st October 20:54
bx. c
External User
 
Posts: 1
Default shuffling bits



well.. 32 bits + 32 bits + 32 bits = 96 bits... so... could you define where
you'd like the remaining 32 bits in X to be? (or how they should be
defined?)
  Reply With Quote
3 31st October 20:54
matt taylor
External User
 
Posts: 1
Default shuffling bits


3 is an awkward number. If you can, I would suggest interleaving 4 32-bit
variables. There are no convenient x86 instructions for bit swizzling,
although multiplies can sometimes accomplish that.

I can think of 2 basic strategies:
1. Expand 4 32-bit values into 4 separate 128-bit values by inserting 0s in
between bits, then or them together
2. Concatenate all 4 32-bit values together to form a 128-bit value, then
swizzle using shifts & masking.

I can't think of any decent way of interleaving individual bits. Perhaps you
could somehow abuse punpck* instructions. Data swizzling is not one of the
x86's strong points.

-Matt
  Reply With Quote
4 31st October 20:54
bx. c
External User
 
Posts: 1
Default shuffling bits


yah.. 4 would be a better number... but in either case, i don't know SSE,..
so i don't know any 128-bit media instructions to work with...

the best i'd come up with, would be the double-precision shift instructions,
SHLD and SHRD.... shift a single bit at a time.. combine that with a loop..
taking a bit from each reg, each time through the loop... and you have the
finalized product....
  Reply With Quote
5 31st October 20:54
matt taylor
External User
 
Posts: 1
Default shuffling bits


<snip>

Well at a minimum I would use the shift instructions instead of shld/shrd:
; Swizzle A -> edx inserting 0s
; Trashes eax, ecx
mov eax, A ; eax = A
mov ecx, 8 ; ecx = bits to copy
xor edx, edx ; edx = masked copy of A

...copyloop:
lea edx, [edx*8] ; edx = edx << 3
shr eax, 1 ; CF = next_bit
adc edx, edx ; edx = (edx << 1) | next_bit
dec ecx
jnz .copyloop

Here's how to do it with multiplies:
; Swizzle al -> eax inserting 0s
movzx eax, byte A
imul eax, 0x00000101 ; ah = al
and eax, 0x0000F00F ; ah = <A7, A6, A5, A4>, al = <A3, A2, A1, A0> imul eax, 0x00001111
and eax, 0x03030303 ; eax = <A7 | A6, A5 | A4, A3 | A2, A1 | A0>
lea eax, [eax+eax*8] ; Swizzle into nibbles
and eax, 0x11111111

Perhaps someone can improve upon either of those. Each produces 8 bits of
the result at a time. The multiply method takes 17 clocks on an Athlon, 8 of
which are spent in the multiply unit. Unrolling could give a throughput of
~8 clocks per 8 bits on the multiply routine.

-Matt
  Reply With Quote
6 31st October 20:55
bit
External User
 
Posts: 1
Default shuffling bits


If you need to do many swizzles the BT and ADC/RCL/RCR instructions are
the way to go in general - use the carry flag. The AMD processors do
quite well on bit instructions.

mov eax, A
mov ebx, B
mov ecx, C
mov edi, pResult ; pointer to three DWORDs

bt eax, 31
adc edx, edx
bt ebx, 31
adc edx, edx
bt ecx, 31
adc edx, edx

bt eax, 30
adc edx, edx
bt ebx, 30
adc edx, edx
bt ecx, 30
adc edx, edx

....{repeat a few times}

bt eax, 21
adc edx, edx
bt ebx, 21
adc edx, edx
mov [edi][4*2], esi
bt ecx, 21
adc edx, edx
dec edx

....{repeat a few times}

bt eax, 10
adc edx, edx
mov [edi][4*1], edx
bt ebx, 10
adc edx, edx
bt ecx, 10
adc edx, edx
dec edx

....{repeat a few times}

bt eax, 1
adc edx, edx
bt ebx, 1
adc edx, edx
bt ecx, 1
adc edx, edx

bt eax, 0
adc edx, edx
bt ebx, 0
adc edx, edx
bt ecx, 0
adc edx, edx

mov [edi][4*0], edx


bitRAKE
  Reply With Quote
7 31st October 20:55
geomrock
External User
 
Posts: 1
Default shuffling bits


The rest can all be zeroes...
  Reply With Quote
8 31st October 20:55
geomrock
External User
 
Posts: 1
Default shuffling bits


I was thinking that there might be a MMX/SSE or some other
instruction that already does this job for me...I've see
shuffling instructions in the INTEL manuals...what are they
for? What does shuffle instruction mean in SSE?
  Reply With Quote
9 31st October 20:55
matt taylor
External User
 
Posts: 1
Default shuffling bits


<snip>

MMX/SSE shuffles give you 4 multiplexers with 4 inputs. The source and
destination MMX/SSE registers are partitioned into fourths (2-bytes for MMX,
4-bytes for SSE). Each destination fourth can select any of the source
fourths. SSE will also let you select 2-byte quantities from the low or high
8-bytes of the register (pshuflw/pshufhw).

The unpck* instructions are more appropriate since they interleave different
quantities of data. However, you can't use them for interleaving bits.

-Matt
  Reply With Quote
10 31st October 20:55
ivan korotkov
External User
 
Posts: 1
Default shuffling bits


They shuffle bytes, not bits. PSHUFW mm1, mm2/mem, imm8 does the following:
mm1[i] = mm2[BITS(imm8, i, 2)], i = 0..3, where mm1/2 are represented as
arrays of 4 words and BITS(expr, i, j) means bits i to i+j of expr. Again,
SHUFPS xmm1, xmm2/mem, imm8 does the same with XMM registers represented as
4 single-precision floats. There are also SSE2 instructions that operate
with integer
words in XMM regs. MMX\SIMD aren't going to help you in your case.

Ivan
  Reply With Quote
Reply


Thread Tools
Display Modes




666