java.lang.String#hashCode

Apologies in advance. This is a really boring post.

For reasons you either know about by now or don’t care about, I was curious as to how well String’s hashCode was distributed (I suspected the answer was “not very”). I ran a few quick experiments to verify this.

For your amusement, here is a list of all hash collisions between alphanumeric strings of size 2: http://www.drmaciver.com/collisions.txt and here is a list of all which don’t collide with any others http://www.drmaciver.com/noncolliding.txt

Some statistics: There are 3844 alphanumeric strings of size 2. Of these 3570 collide with at least one other string. That is, 274 of these strings (or about 7% of them) *don’t* collide with something else.

Oh well. It’s a good thing no one would be stupid enough to rely on hashCode to distinguish the contents of two objects.

Edit: Additional facts I originally posted in the comments.

For what it’s worth, even fewer strings have unique hash codes for 3 characters. 3948 don’t collide, or about 1.6% of them.

This of course doesn’t mean that probability of a hash collision is really high. In reality it’s acceptably low. It’s just a demonstration that it’s not hard to find colliding pairs.

The following consist of all the String hash collisions contained in the sowpods scrabble dictionary:

  • isohel, epistolaries
  • righto, buzzards
  • hierarch, crinolines
  • inwork, hypercatalexes
  • wainages, presentencing
  • variants, gelato
  • misused, horsemints
  • trichothecenes, locular
  • pomatoes, eructation
This entry was posted in programming and tagged , on by .

14 thoughts on “java.lang.String#hashCode

  1. chessguy

    Sounds like you found what you were looking for. Which makes me a little suspicious… Now I’m not usually one to defend Java, by any means, but this experiment seems a little…strange. If you’ve gone this far, why not use the same infrastructure to do a quick calculation and see about 3-character strings? or 4? or….whatever. What about toString()’ing some non-string objects? Testing only 2-character strings can’t be a very good measure.

  2. david Post author

    Well part of the reason I didn’t do it for much larger numbers is that it takes a longish time to run for them as the run time for calculating this is exponential in the number of characters. :-)

    I did try it out for three character strings, but the results were uninterestingly similar so I didn’t bother posting them. If you really care I’ll run the results for larger numbers when I get home (I didn’t put the code anywhere central and can’t be bothered to recreate it here) and post them.

    For what it’s worth, this isn’t intended as evidence that java Strings are terrible and awful. I just wanted to demonstrate that hash collisions are not remotely a rare thing.

  3. dave

    I once worked at a bank where the former Cobol programmers who made up the Java development staff thought you had to override hashCode() for every class you defined — and almost universally defined it as “return 1;”

  4. Cetin Sert

    I tested it with the same string size using a-zA-Z0-9. 32-bit Vista using .NET and mono.

    running on .NET 3.5:
    3844 two-char strings
    14776336 comparisons
    0 collisions with one or more items
    0 total collisions

    running on mono 1.9.1:
    3844 two-char strings
    14776336 comparisons
    **3570** collisions with one or more items
    5250 total collisions

    http://sert.homelinux.org/projects/mono-collisions.txt

    ** the number is the same ^_^

  5. david Post author

    For what it’s worth, even fewer strings have unique hash codes for 3 characters. 3948 don’t collide, or about 1.6% of them.

    Also, this of course doesn’t mean that probability of a hash collision is really high. In reality it’s acceptable low. It’s just a demonstration that it’s not hard to find colliding pairs.

  6. david Post author

    Utterly pointless fact of the day. The following consist of all the String hash collisions contained in sowpods:

    isohel, epistolaries
    righto, buzzards
    hierarch, crinolines
    inwork, hypercatalexes
    wainages, presentencing
    variants, gelato
    misused, horsemints
    trichothecenes, locular
    pomatoes, eructation

  7. david Post author

    It doesn’t actually matter which version I’m using. java.lang.String has a hash code implementation defined as part of the standard library specification, and it’s been the same since… not sure. 1.2 or 1.3 I think.

  8. jual cd rpp kurikulum 2013

    Audio books can be purchased online at a fraction of cost. Sometimes you can find them for free. Audible and ebook libraries and bookstores are both dedicated to their customers by providing the largest selection of audible books that are downloadable online or purchased in local stores. Some of the top providers are eBay, Amazon and several audible clubs. With millions of audio books available, various formats are used for the convenience of everyone ranging from children to adults. Some of those formats included are MP3 players, MP3 compatible mobile phones, iPods, iPhones, PDA’s, CD’s as well as kindles and android devices. Here is a list of places that offer audible books:

Comments are closed.