Objects, equals() and hash-values

Figure 413. Hashing principle Slide presentation
Hashing principle
I want the 12p one

Figure 414. Quickly identify by simple value Slide presentation
  • Where is the blond haired guy?

  • I take the pink flower.

  • The 334.50$ cellular phone.


Figure 415. Hashing in Java and equals() Slide presentation

Method hashCode(): Instance 0 ⇨ o.hashCode(), of type int.

  • Same value on repeated invocation

  • Objects with identical value with respect to equals() must have identical hash values:

    true == a.equals(b)a.hashCode() == b.hashCode().

  • Conversely: Two instances differing with respect to equals() may have identical hash values.

Consequence: equals() and hashCode() must be redefined simultaneously!


Figure 416. Rectangle equals(...) and hashCode() Slide presentation
public class Rectangle {
    int width, height;
    @Override public boolean equals(Object o) {
        if (o instanceof  Rectangle r) {
            return width == r.width
                    && height == r.height;
        } else {
            return false;
        }
    }
    @Override public int hashCode() {
        return width + height;
    }
}

Figure 417. Rectangle hash values Slide presentation
public class Rectangle {
    int width, height;
...
    @Override public int hashCode() {
        return width + height;
    }
}
width height hash value
1 3 4
2 2 4
5 5 10
2 7 9
4 9 13

Figure 418. Better hashCode() method Slide presentation
public class Rectangle {
    int width, height;
...
    @Override public int hashCode() {
        return width + 13 * height;
    }
}
width height hash value
1 3 40
2 2 28
5 5 70
2 7 93
4 9 121

exercise No. 128

Choosing a good hashCode() method

Q:

We consider:

public class TimePeriod {
  private final int hours, minutes, seconds;
  /**
   * A time period within a day e.g. 4 hours, 23 minutes and 11 seconds.
   *
   * @param hours Hours ranging from 0 to 23
   * @param minutes Minutes ranging from 0 to 59
   * @param seconds Seconds ranging from 0 to 59
   */
  public TimePeriod(final int hours, final int minutes, final int seconds) {
    this.hours = hours;
    this.minutes = minutes;
    this.seconds = seconds;
  }

  @Override
  public boolean equals(final Object o) {
    if (o instanceof TimePeriod t) {
      return hours == t.hours &&
          minutes == t.minutes &&
          seconds == t.seconds;
    } else {
      return false;
    }
  }
}

Which of the two hashCode() implementations is better with respect to quickly telling whether two TimePeriod instances are having the same value or not?

Method 1 Method 2
public class TimePeriod {
...
  @Override
  public int hashCode() {
    return seconds + 60 * (minutes + 60 * hours);
  }
}
public class TimePeriod {
...
  @Override 
  public int hashCode() {
    return seconds + minutes + hours;
  }
}

Tip

Excerpt from Object.hashCode():

However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

A:

According to Figure 415, “Hashing in Java and equals() part of a hashCode() methods contract is:

true == a.equals(b)a.hashCode() == b.hashCode().

A so called perfect hashCode() method in addition will return different values whenever two instances a and b differ in value with respect to the underlying equals() method:

false == a.equals(b)a.hashCode() != b.hashCode().

Combining these two statements a perfect hashCode() method will have the following property:

a.equals(b) == (a.hashCode() == b.hashCode())

Method 1 serves this purpose: Its returns a given TimePeriod instance into its total amount of seconds equivalent. Example: (1 hour, 2 minutes, 5 seconds) amounts to (1 * 60 + 2) * 60 + 5 = 3725 seconds. Due to its specification an instance of TimePeriod cannot exceed 24 hours being equivalent to 24 * 60 * 60 = 86400 seconds. Thus a 4-byte int is safe with respect to overflow issues.

Conclusion: Whenever two instances of TimePeriod differ their method 1 hashCode() values will differ as well.

This does not hold with respect to method 2. Consider two instances (1 hour, 2 minutes, 3 seconds) vs. (3 hours, 2 minutes, 1 second) returning an identical hashCode() of 1 + 2 + 3 = 3 + 2 + 1 = 6. So method 2 requiring just two additions offers (slightly) better runtime performance at the expense of a higher hash value collision rate.

Note

Perfect hash functions are rare with respect to real world modeling problems. In the current example Timeperiod instances are limited by 24 hours, 59 minutes and 59 seconds. This limit is equal to 89999 seconds fitting well into the count of 2 32 different int values.

exercise No. 129

String and good hashCode() implementations.

Q:

In the previous exercise we found a perfect hashCode() implementation:

public class TimePeriod {
...
  @Override
  public int hashCode() {
    return seconds + 60 * (minutes + 60 * hours);
  }
}

Does a perfect hash function exist for String instances as well?

Tip

Consider the possible number of different strings.

A:

It is not possible to construct a perfect hashCode() method acting on strings. A Java String consists of individual char elements each requiring two bytes representing 2 16 different characters. Depending on a string's length we have:

String length (number of characters) Number of different strings
1 2 16
2 2 2 × 16
3 2 3 × 16

Thus considering just the union of zero (empty), one- and two character strings we have 1 + 2 16 + 2 32 possibilities exceeding the 2 32 count of different int values. Thus hash value clashes are inevitable.

The JDK's String.hashCode() implementation already reveals conflicts for ASCII strings of length 2:

Code Execution result
System.out.println("hashcode of Aa: " + "Aa".hashCode());
System.out.println("hashcode of BB: " + "BB".hashCode());
hashcode of Aa: 2112
hashcode of BB: 2112