Implementing equals() and hashCode()

Let's begin with a puzzle. What is the output of this program?

import java.util.HashSet;
import java.util.Set;

class Person {
    protected String name;
    public Person(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Person other = (Person) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }
    @Override
    public String toString() {
        return "Person[name=" + name + "]";
    }
}

public class Puzzle {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<Person>();

        people.add(new Person("Alice"));
        people.add(new Person("Bob"));
        people.add(new Person("Alice"));

        System.out.println(people.size());
        for(Person p : people){
            System.out.println(p);
        }
    }
}

If you haven't run the program, you probably expect the output to be

2
Person[name=Alice]
Person[name=Bob]

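Wrong! The real output is as follows (the order of the names may differ between runs, because HashSet does not guarantee iteration order):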

3
Person[name=Alice]
Person[name=Bob]
Person[name=Alice]

Why? What happened? Well, it's one of the most common Java mistakes. We violated the fundamental requirement for the hashCode() method:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

We've overridden the equals() method without overriding hashCode(), and the result is incorrect program behavior.

The HashSet implementation relies on the contract between equals() and hashCode(): if two hash codes differ, the objects cannot be equal. Person inherits hashCode() from Object, which typically returns a distinct value for each instance, so the HashSet computes different hash codes for the first new Person("Alice") and the second new Person("Alice"). The set therefore treats them as different elements (they usually land in different buckets) without ever calling equals(). Thus, we have two Alice objects in the set, even though they are "equal" according to the equals() method.
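To see the mechanism in isolation, here is a minimal sketch. The class names NoHashPerson and BucketDemo are made up for this illustration; the class mirrors the puzzle's Person but is reduced to the essentials. The two hash code results are not guaranteed to differ, they just almost always do, which is why the comments are hedged.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class for illustration: equals() is overridden, hashCode() is not.
class NoHashPerson {
    final String name;

    NoHashPerson(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        return name.equals(((NoHashPerson) obj).name);
    }
    // hashCode() is deliberately left as inherited from Object
}

class BucketDemo {
    public static void main(String[] args) {
        NoHashPerson first = new NoHashPerson("Alice");
        NoHashPerson second = new NoHashPerson("Alice");

        // equals() says the objects are the same...
        System.out.println(first.equals(second)); // true

        // ...but the inherited hash codes almost certainly differ
        System.out.println(first.hashCode() == second.hashCode());

        Set<NoHashPerson> people = new HashSet<NoHashPerson>();
        people.add(first);

        // The lookup uses the hash code first, so it almost certainly misses
        System.out.println(people.contains(second));
    }
}
```

Only the first line of output is certain. The other two depend on the hash codes that the JVM happens to assign, and in practice both print false.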

Rule to remember: Always redefine hashCode() whenever you redefine equals().
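Applied to the puzzle, the rule could look like the sketch below. It is one possible fix, not the only one: it uses java.util.Objects (available since Java 7) to keep the null handling short, and the FixedPuzzle class name is made up for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Person {
    protected String name;

    Person(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Person other = (Person) obj;
        // Objects.equals handles the case where either name is null
        return Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Built from the same field that equals() compares; 0 for a null name
        return Objects.hashCode(name);
    }

    @Override
    public String toString() {
        return "Person[name=" + name + "]";
    }
}

class FixedPuzzle {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<Person>();

        people.add(new Person("Alice"));
        people.add(new Person("Bob"));
        people.add(new Person("Alice")); // rejected: equal to the first Alice

        System.out.println(people.size()); // prints 2
    }
}
```

The important point is that hashCode() is computed from exactly the fields that equals() compares, so equal objects always produce the same hash code.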

Solution: generate equals and hashCode

Even if you know the rule, it's easy to make a mistake when implementing hashCode() and equals(). I remember making one on at least three separate occasions in my student days, and (after the first time) I knew what the danger was. The solution is to generate equals() and hashCode(). Any decent Java IDE has a hashCode()/equals() generator. Use it. Even if you don't like the code it generates, it will save you from common pitfalls, such as incorrect null handling or breaking the contract between hashCode() and equals(). It's much easier to modify generated code than to write it from scratch.
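For a class with more than one field, generated code tends to look roughly like the sketch below. The Employee class and its fields are invented for this example, and the exact output depends on your IDE and its version, so treat this as an approximation of the common "multiply by 31" pattern rather than the output of any particular tool.

```java
// Hypothetical class for illustration, with one object field and one primitive field
class Employee {
    private final String name;
    private final int age;

    Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        // Each field compared in equals() is mixed into the result
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + age;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Employee other = (Employee) obj;
        if (age != other.age)
            return false;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }
}
```

Note that both methods cover the same set of fields and handle a null name. Those are exactly the two details that are easy to get wrong by hand.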

In the Java community, it's popular to use Apache Commons Lang EqualsBuilder and HashCodeBuilder to implement hashCode() and equals(). Personally, I don't like it. The reflection-based variants (reflectionEquals() and reflectionHashCode()) are slow. If your code makes a lot of comparisons (for example, if you implement a cache), then it's better to have the equality checks in compiled code. Moreover, the implementations have bugs. Even if the bug that made equals() incompatible with hashCode() for BigDecimal is fixed, there is no guarantee that no similar bugs remain. Plus, I'm sure you won't bother to browse through the library code to see if it works for you. It's better to have explicit code.

OneWebSQL

OneWebSQL generates compatible equals() and hashCode() methods for your Java beans. The code is explicit so you see what you get :-)

Have you ever had problems with equals() and hashCode() methods?