find words that contains some specific letters
John, i think you've missed a trick there - doesn't Rotate_Left do a
barrel shift, where the bits that fall off the left-hand end get shoved
back in at the right, rather than a plain shift? If so, the basic
algebraic structure is actually a bit different to java's hash.
If Rotate_Left is a plain shift, then the Ada hash is based only on the
last 10 characters of the string. Multiplying by eight is like shifting
left three places; modding by 2*32 is like anding with 0xffffffff (if
you'll pardon my verbs). When you add in a character, the rightmost bit
you can affect is bit 0, the LSB; after ten shifts of three bits, that is
now at bit thirty, and after one more, it's at bit thirty-three, and has
vanished.
If Rotate_Left is a barrel shift, then our bit of interest doesn't vanish,
because it comes back to the right-hand end of the hash after the eleventh
shift in position 1. Because 3 is coprime with 32, it will then proceed to
make a complete circuit of all positions in the hash over the following
twenty-one shifts, winding up back in position 0 after the thirty-second.
I have a vague worry that this means that strings built from the same
32-character blocks arranged in any order will have the same hash, but i
suspect that this is not the case, because of carries. My maths isn't up
to proving it, though.
I still think this is a shoddy hash. It has no diffusion: if you hash two
strings which differ at one character position, the difference between the
hashes will always be confined to the six**** bits which correspond to
where that character ended up in the rotating hash. Java's
multiply-and-mod approach quickly spreads the bits of each character out
over the whole hash.
tom
--
Science of a sufficiently advanced form is indistinguishable from magic
|