Apologies in advance. This is a really boring post.
For reasons you either know about by now or don’t care about, I was curious as to how well String’s hashCode was distributed (I suspected the answer was “not very”). I ran a few quick experiments to verify this.
For your amusement, here is a list of all hash collisions between alphanumeric strings of size 2: http://www.drmaciver.com/collisions.txt and here is a list of all those which don’t collide with any others: http://www.drmaciver.com/noncolliding.txt
Some statistics: There are 3844 alphanumeric strings of size 2. Of these, 3570 collide with at least one other string. That is, only 274 of these strings (or about 7% of them) *don’t* collide with anything else.
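If you want to reproduce the count yourself, something like the following should do it. This is a minimal sketch rather than the code I actually ran: the class name `TwoCharCollisions` and its helper methods are made up for illustration, and "collide" here means sharing a `hashCode` with another alphanumeric string of the same length.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoCharCollisions {
    // The 62 alphanumeric characters: digits, upper case, lower case.
    static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Counts the alphanumeric strings of the given length whose hashCode
    // is shared by no other alphanumeric string of that length.
    static int countNonColliding(int length) {
        Map<Integer, Integer> counts = new HashMap<>();
        tally("", length, counts);
        int unique = 0;
        for (int n : counts.values()) {
            if (n == 1) {
                unique++;
            }
        }
        return unique;
    }

    // Enumerates every string of the remaining length and records
    // how many strings map to each hash code.
    private static void tally(String prefix, int remaining, Map<Integer, Integer> counts) {
        if (remaining == 0) {
            counts.merge(prefix.hashCode(), 1, Integer::sum);
            return;
        }
        for (int i = 0; i < ALPHABET.length(); i++) {
            tally(prefix + ALPHABET.charAt(i), remaining - 1, counts);
        }
    }

    public static void main(String[] args) {
        int total = ALPHABET.length() * ALPHABET.length();
        int unique = countNonColliding(2);
        // prints "3844 strings, 274 with a unique hash code"
        System.out.println(total + " strings, " + unique + " with a unique hash code");
    }
}
```

Brute force is fine here because the search space is tiny; a map from hash code to count is all the bookkeeping you need.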
Oh well. It’s a good thing no one would be stupid enough to rely on hashCode to distinguish the contents of two objects.
Edit: Additional facts I originally posted in the comments.
For what it’s worth, an even smaller proportion of strings have unique hash codes at 3 characters: 3948 of the 238328 don’t collide, or about 1.7% of them.
This of course doesn’t mean that the probability of a hash collision is really high. In reality it’s acceptably low. It’s just a demonstration that it’s not hard to find colliding pairs.
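To see why colliding pairs are so easy to come by: the documented formula for `String.hashCode` is s[0]*31^(n-1) + ... + s[n-1], so for a two-character string it is just 31 * first + second. Bumping the first character up by one and the second down by 31 leaves the sum unchanged. A quick sketch of that trick (the class `ConstructCollision` and its `shift` method are hypothetical names, and the result is only a sensible string when both shifted characters are still ones you want):

```java
public class ConstructCollision {
    // Builds a two-character string with the same hashCode as the input:
    // 31 * (first + 1) + (second - 31) == 31 * first + second.
    static String shift(String s) {
        char first = (char) (s.charAt(0) + 1);
        char second = (char) (s.charAt(1) - 31);
        return "" + first + second;
    }

    public static void main(String[] args) {
        String original = "Aa";
        String shifted = shift(original); // "BB"
        System.out.println(original + " -> " + original.hashCode());
        System.out.println(shifted + " -> " + shifted.hashCode());
    }
}
```

Both lines print the hash code 2112, since 31 * 65 + 97 and 31 * 66 + 66 are the same number.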
The following are all the String hash collisions in the SOWPODS Scrabble dictionary:
- isohel, epistolaries
- righto, buzzards
- hierarch, crinolines
- inwork, hypercatalexes
- wainages, presentencing
- variants, gelato
- misused, horsemints
- trichothecenes, locular
- pomatoes, eructation
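Finding pairs like these in a word list is just a matter of grouping the words by hash code and keeping the groups with more than one member. A rough sketch, assuming the words are distinct and already loaded into a list (the class `DictionaryCollisions` and the tiny sample list are illustrative, not the code or data file I used):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DictionaryCollisions {
    // Groups words by hashCode and returns only the groups that
    // contain more than one word, i.e. the collisions.
    static List<List<String>> collidingGroups(List<String> words) {
        Map<Integer, List<String>> byHash = new TreeMap<>();
        for (String word : words) {
            byHash.computeIfAbsent(word.hashCode(), k -> new ArrayList<>()).add(word);
        }
        List<List<String>> groups = new ArrayList<>();
        for (List<String> group : byHash.values()) {
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // A handful of words standing in for a full dictionary.
        List<String> sample = List.of("variants", "gelato", "misused", "horsemints", "cat");
        for (List<String> group : collidingGroups(sample)) {
            System.out.println(String.join(", ", group));
        }
    }
}
```

For a real dictionary you would read the words from a file instead of hard-coding them; the grouping step stays the same.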