I wrote a post recently about how disappointed I was that the optimiser couldn’t outsmart some clever Java code for computing hash codes. Well, here’s a faster hash code along the same lines.

The hash code implemented in `Arrays.hashCode`

is a polynomial hash, it applies to any data type with a positional interpretation. It takes the general form where . In other words, it’s a dot product of the elements of the array and some powers of 31. Daniel Lemire’s implementation makes it explicit to the optimiser, in a way it won’t otherwise infer, that this operation is data parallel. If it’s really just a dot product it can be made even more obvious at the cost of a loss of flexibility.

Imagine you are processing fixed or limited length strings (VARCHAR(255) or an URL) or coordinates of a space of fixed dimension. Then you could pre-compute the coefficients in an array and write the hash code explicitly as a dot product. Java 9 uses AVX instructions for dot products, so it should be very fast.

public class FixedLengthHashCode { private final int[] coefficients; public FixedLengthHashCode(int maxLength) { this.coefficients = new int[maxLength + 1]; coefficients[maxLength] = 1; for (int i = maxLength - 1; i >= 0; --i) { coefficients[i] = 31 * coefficients[i + 1]; } } public int hashCode(int[] value) { int result = coefficients[0]; for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) { result += coefficients[i + 1] * value[i]; } return result; } }

This is really explicit, unambiguously parallelisable, and the results are remarkable.

Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
---|---|---|---|---|---|---|---|

HashCode.BuiltIn | thrpt | 1 | 10 | 10.323026 | 0.223614 | ops/us | 100 |

HashCode.BuiltIn | thrpt | 1 | 10 | 0.959246 | 0.038900 | ops/us | 1000 |

HashCode.BuiltIn | thrpt | 1 | 10 | 0.096005 | 0.001836 | ops/us | 10000 |

HashCode.FixedLength | thrpt | 1 | 10 | 20.186800 | 0.297590 | ops/us | 100 |

HashCode.FixedLength | thrpt | 1 | 10 | 2.314187 | 0.082867 | ops/us | 1000 |

HashCode.FixedLength | thrpt | 1 | 10 | 0.227090 | 0.005377 | ops/us | 10000 |

HashCode.Unrolled | thrpt | 1 | 10 | 13.250821 | 0.752609 | ops/us | 100 |

HashCode.Unrolled | thrpt | 1 | 10 | 1.503368 | 0.058200 | ops/us | 1000 |

HashCode.Unrolled | thrpt | 1 | 10 | 0.152179 | 0.003541 | ops/us | 10000 |

Modifying the algorithm slightly to support limited variable length arrays degrades performance slightly, but there are seemingly equivalent implementations which do much worse.

public class FixedLengthHashCode { private final int[] coefficients; public FixedLengthHashCode(int maxLength) { this.coefficients = new int[maxLength + 1]; coefficients[0] = 1; for (int i = 1; i <= maxLength; ++i) { coefficients[i] = 31 * coefficients[i - 1]; } } public int hashCode(int[] value) { final int max = value.length; int result = coefficients[max]; for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) { result += coefficients[max - i - 1] * value[i]; } return result; } }

Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
---|---|---|---|---|---|---|---|

FixedLength | thrpt | 1 | 10 | 19.172574 | 0.742637 | ops/us | 100 |

FixedLength | thrpt | 1 | 10 | 2.233006 | 0.115285 | ops/us | 1000 |

FixedLength | thrpt | 1 | 10 | 0.227451 | 0.012231 | ops/us | 10000 |

The benchmark code is at github.