Twiddling the Bits: Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer

Wednesday, September 28, 2011

Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer

Austin Appleby's superb MurmurHash3 is one of the best general purpose hash functions available today. Still, as good as it is, there are a couple of minor things about it that make me uneasy. I want to talk about one of those things today.

Here is the 64-bit finalizer from MurmurHash3:

UInt64 MurmurHash3Mixer( UInt64 key )
  {
  key ^= (key >> 33);
  key *= 0xff51afd7ed558ccd;
  key ^= (key >> 33);
  key *= 0xc4ceb9fe1a85ec53;
  key ^= (key >> 33);

  return key;
  }

The goal of a finalizer (sometimes called a "bit mixer") is to take an intermediate hash value that may not be thoroughly mixed and increase its entropy to obtain both better distribution and fewer collisions among hashes. Small differences in input values, as you might get when hashing nearly identical data, should result in large differences in output values after mixing.

Ideally, flipping a single bit in the input key results in all output bits changing with a probability of 0.5. This is called the Avalanche effect. Cryptographically-secure hashes are willing to spend an impressive number of computer cycles getting this probability close to 0.5. Hash functions that don't have to be cryptographically-secure trade away some avalanche accuracy in favor of performance. MurmurHash3 gets close to 0.5 with a remarkable economy of effort: Just two multiplies, three shifts and three exclusive-ors. Austin reports MurmurHash3 avalanches all bits to within 0.25% bias.

The overall structure of the finalizer (shifts and multiplies) was designed by Austin. The constants were chosen by "a simple simulated-annealing algorithm" (his description.) The part that troubles me is the simulation was driven with random numbers. That doesn't seem right. After all, if the inputs were truly random what would be the point of mixing? It seems to me that a good mixer should be able to take a series of, say, counting numbers and produce output that is virtually indistinguishable from random.

First question: Could I build a better mixer by training the simulation on low-entropy inputs such as counting numbers and high-likelihood bit patterns? Here are the results for MurmurHash3 and the top 14 mixers my simulation found:

4,853,184 Low-entropy keys
Mixer	Maximum error	Mean error
MurmurHash3 mixer	0.005974634384355	0.000265202074121
Mix01	0.000791851287732	0.000190873655461
Mix02	0.000885192071844	0.000197317606821
Mix03	0.000828116139837	0.000191444267889
Mix04	0.000991926125199	0.000199204707592
Mix05	0.000932171539344	0.000195036565653
Mix06	0.000819874127995	0.000199337714668
Mix07	0.000805656657567	0.000194049828213
Mix08	0.000906415252337	0.000191828700595
Mix09	0.000927020281943	0.000194149835046
Mix10	0.000877774261186	0.000194585176663
Mix11	0.000955867323390	0.000194238221367
Mix12	0.000932377589640	0.000193450189656
Mix13	0.000789996835068	0.000190778327016
Mix14	0.000800917500758	0.000191264175101

The maximum error tends to be about seven times lower. Mean error is lower too but only slightly. The answer, then, is yes, it does get better results with low-entropy keys. But that leads to the second question: Does training on low-entropy keys result in better or worse performance with high-entropy keys? To find out, I tested it with 100 million cryptographic-quality random numbers:

100,000,000 Random keys
Mixer	Maximum error	Mean error
MurmurHash3 mixer	0.000212380000000	0.000040445117187
Mix01	0.000177410000000	0.000040054211426
Mix02	0.000179150000000	0.000039797316895
Mix03	0.000170070000000	0.000040068117676
Mix04	0.000185470000000	0.000039775007324
Mix05	0.000192510000000	0.000039626535645
Mix06	0.000195660000000	0.000040216433105
Mix07	0.000193810000000	0.000039834248047
Mix08	0.000196590000000	0.000039063793945
Mix09	0.000174280000000	0.000039541943359
Mix10	0.000181790000000	0.000039569926758
Mix11	0.000181140000000	0.000039501779785
Mix12	0.000183920000000	0.000039622690430
Mix13	0.000175610000000	0.000039437180176
Mix14	0.000182000000000	0.000040092158203

Yes, we can do slightly better than MurmurHash3 even on random test keys.

On average this is a small but measurable improvement that comes at no performance cost. For the worst-case, low-entropy keys that concern us the most, it provides a significant improvement.

Here's a handy tip: A great use for a mixer like this is to use it to "purify" untrustworthy input hash keys. If a calling function provides poor-quality hashes (even counting numbers!) as input to your code then purifying it with one of these mixers will insure it does no harm.

Here are the parameters for the mixers I tested:

Mixer parameters
Mixer	Mixer operations
MurmurHash3 mixer	33	0xff51afd7ed558ccd	33	0xc4ceb9fe1a85ec53	33
Mix01	31	0x7fb5d329728ea185	27	0x81dadef4bc2dd44d	33
Mix02	33	0x64dd81482cbd31d7	31	0xe36aa5c613612997	31
Mix03	31	0x99bcf6822b23ca35	30	0x14020a57acced8b7	33
Mix04	33	0x62a9d9ed799705f5	28	0xcb24d0a5c88c35b3	32
Mix05	31	0x79c135c1674b9add	29	0x54c77c86f6913e45	30
Mix06	31	0x69b0bc90bd9a8c49	27	0x3d5e661a2a77868d	30
Mix07	30	0x16a6ac37883af045	26	0xcc9c31a4274686a5	32
Mix08	30	0x294aa62849912f0b	28	0x0a9ba9c8a5b15117	31
Mix09	32	0x4cd6944c5cc20b6d	29	0xfc12c5b19d3259e9	32
Mix10	30	0xe4c7e495f4c683f5	32	0xfda871baea35a293	33
Mix11	27	0x97d461a8b11570d9	28	0x02271eb7c6c4cd6b	32
Mix12	29	0x3cd0eb9d47532dfb	26	0x63660277528772bb	33
Mix13	30	0xbf58476d1ce4e5b9	27	0x94d049bb133111eb	31
Mix14	30	0x4be98134a5976fd3	29	0x3bc0993a5ad19a13	31

19 comments:

PatrickMarch 22, 2012 at 2:03 AM
Thanks for sharing this.

I was wondering : Could the finalization mix in 32 bit murmurhash3 be improved in a similar fashion?

Cheers!
ReplyDelete
Replies
David StaffordMarch 25, 2012 at 12:01 PM
The 32-bit mixer has the same structure as the 64-bit mixer and both were generated by simulated annealing so, while I can't be certain without trying it, I strongly suspect the answer is yes.
ReplyDelete
Replies
ThomasSeptember 19, 2017 at 10:12 AM
nice information :)

best regards
thomas from kuechengeraete-22.blogspot.com
versicherung-wv.blogspot.com | versicherung-wv.de | applicate-soft.de
ReplyDelete
Replies
UnknownDecember 1, 2017 at 10:01 PM
That's very interesting, thanks for sharing it! Can you provide more details how you found those new mixer?
ReplyDelete
Replies
Pelle EvensenJuly 13, 2018 at 7:05 PM
Nice work!

Coincidentally, I have been using a quite similar construction that seems to behave more like a pseudo-random permutation. At least I can so far not find any obvious ways to do measurements that clearly will distinguish it from a PRP:
http://mostlymangling.blogspot.com/2018/07/mathjax.html
ReplyDelete
Replies
cherioSeptember 12, 2018 at 10:50 AM
This seems to have become a "go to" page referenced even in Java 8 source code. I am interested to see more in regards to mix function properties. Specifically I'd like to know if it is a 1-to-1 function and whether the function can produce collisions e.g. assuming y=mix(x) can it produce the same "y" for two different "x"?
ReplyDelete
Replies
Chris_FJune 14, 2019 at 8:25 PM
Looking at your table Mix13 seem to perform the best in both scenarios. Then I recognized that I had seen those constants before, in implementations of SplitMix64. https://github.com/lemire/testingRNG/blob/master/source/splitmix64.h
ReplyDelete
Replies
UnknownJuly 29, 2020 at 5:52 PM
Vcein lklo1 23rd346689097r5,04ww2
ReplyDelete
Replies
mickeyarthurNovember 3, 2021 at 5:09 AM
To be honest, I have been accidentally landed on your site because I was looking at Cheap Assignment Writing Service Uk to write my assignment. however, I know your post is not useful to me regarding my assignment, but I found it really helpful and informative, and I promise I will also share this information with others
ReplyDelete
Replies
periyannanJanuary 23, 2022 at 10:06 PM
Your website is very good and nice information was provided in your site, thanks for sharing.
artificial intelligence internship | best final year projects for cse | internship certificate online | internship for mba finance students | internship meaning in tamil

ReplyDelete
Replies
Media FosterApril 20, 2022 at 12:28 AM
This is really interesting, you are such a great blogger. Visit media foster for creative and professional website design and Digital Marketing Company in Mohali and Also get Digital Marketing Course in Mohali
TOP IT Company in Mohali
best SEO Company in Mohali
ReplyDelete
Replies
veerediweddingApril 20, 2022 at 4:50 AM
We provide very affordable packages on car services in Chandigarh.
luxury car rental in chandigarh
luxury wedding cars in chandigarh
ReplyDelete
Replies
영양군출장안마24시June 29, 2022 at 6:46 PM
금정구출장샵
연제구출장샵
수영구출장샵
사상구출장샵
수성구출장샵
달서구출장샵
ReplyDelete
Replies
thuyoneOctober 18, 2023 at 1:47 PM
How do you look for these numbers (multipliers and shifts)?
I noticed that variant 13, which seems to be the best, uses non-prime numbers (multipliers). Would it be better if it did? And, obviously, how to find then?
ReplyDelete
Replies
Anayat global worksNovember 29, 2023 at 9:58 PM
Step into a realm of digital brilliance with the industry's top-rated Mobile and Web Development Firm, recognized among the Best Web Development Companies. Our passion for innovation and commitment to perfection redefine your online presence, making us the trusted choice for transformative and unparalleled web development solutions.
ReplyDelete
Replies
Search place adsNovember 30, 2023 at 3:51 AM
Unlock opportunities and connect effortlessly with our platform's free classified ads feature. Post free ads with ease, whether you're buying, selling, or exploring services. Experience a seamless and empowering way to engage with a community of possibilities. Join us today and redefine your online experience.
ReplyDelete
Replies

Add comment

Wednesday, September 28, 2011

Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer

4,853,184 Low-entropy keys

100,000,000 Random keys

Mixer parameters

19 comments: