两个对象值相同(x.equals(y) == true)，但是可能存在hashCode不同吗?

博主：茶白
发布时间：2021 年 10 月 28 日
暂无评论
16992字数
分类： java

# 考点

这道题仍然是考察JVM层面的基本知识，面试官认为，基本功扎实，才能写出健壮性和稳定性很高的代码。

# 涉及到的技术知识

`(x.equals(y)==true)`，这段代码，看起来非常简单，但其实里面还是涉及了一些底层知识点的，首先我们基于`equals`这个方法进行探索。

`equals`这个方法，在每个对象中都存在，以String类型为例，其方法定义如下

```java hljs
public boolean equals(Object anObject) {
  if (this == anObject) { 
    return true;
  }
  if (anObject instanceof String) { //判断对象实例是否是String
    String anotherString = (String)anObject; //强转成string类型
    int n = value.length;
    if (n == anotherString.value.length) { //如果两个字符串相等，那么它们的长度自然相等。
      //遍历两个比较的字符串，转换为char类型逐个进行比较。
      char v1[] = value;  
      char v2[] = anotherString.value;
      int i = 0;
      while (n-- != 0) {
        if (v1[i] != v2[i]) //采用`==`进行判断，如果不相同，则返回false
          return false;
        i++;
      }
      return true; //否则返回true。
    }
  }
  return false;
}
```

首先来分析第一段代码，判断传递进来的这个对象和当前对象实例`this`是否相等，如果相等则返回`true`。

```java hljs
  if (this == anObject) { 
    return true;
  }
```

那`==`号的处理逻辑是怎么实现的呢？

## 了解`==`判断

在java语言中`==`操作符号，这个比较大家都知道，是基于引用对象的比较，具体其实还有一些其他的区别。

JVM会根据`==`两边相互比较的操作类型不同，在编译时生成不同的指令。

1. 对于boolean，byte、short、int、long这种整形操作数，会生成`if_icmpne`指令，该指令用于比较整形数值是否相等。关于if_icmpne指令可以参见：[Chapter 4. The class File Format](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.10.1.9.if_icmp_cond),它在Hotspot VM中的bytecodeInterpreter源码中的具体实现如下

```c++ hljs
   #define COMPARISON_OP(name, comparison)                                  
   CASE(_if_icmp##name): {                                            
     int skip = (STACK_INT(-2) comparison STACK_INT(-1))            
       ? (int16_t)Bytes::get_Java_u2(pc + 1) : 3;         
     address branch_pc = pc;                                        
     UPDATE_PC_AND_TOS(skip, -2);                                   
     DO_BACKEDGE_CHECKS(skip, branch_pc);                           
     CONTINUE;                                                      
   }
   ```

可以看到实质是按照comparison表达式比较操作数栈中偏移量为-1和-2的两个INT值。
2. 如果操作数是对象的话，编译器则会生成**if_acmpne** 指令，与**if_icmpne** 相比将i(int)改成了a(object reference)。这个指令在JVM规范中的表述：[Chapter 4. The class File Format](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.10.1.9.if_acmp_cond)，它在Hotspot VM中相应的实现可参考：

```c++ hljs
   COMPARISON_OP(name, comparison)                                    
     CASE(_if_acmp##name): {                                            
     int skip = (STACK_OBJECT(-2) comparison STACK_OBJECT(-1))      
       ? (int16_t)Bytes::get_Java_u2(pc + 1) : 3;        
     address branch_pc = pc;                                        
     UPDATE_PC_AND_TOS(skip, -2);                                   
     DO_BACKEDGE_CHECKS(skip, branch_pc);                           
     CONTINUE;                                                      
   }
   ```

从`STACK_OBJECT(-2) comparison STACK_OBJECT(-1)`这一句即可看出比较的其实是操作数栈上两个对象在堆中的指针。

对于JVM有一定了解得同学，必然知道`((x==y)=true)`这个判断，如果`x`和`y`的内存地址相同，那么意味着就是同一个对象，因此直接返回`true`。

> 因此从上面的分析中，得到的结论是，`==`判断，比较的是两个对象的内存地址，如果`==`返回true，说明内存地址相同。

## String.equals源码

继续分析`equals`中的源码，剩余部分源码的实现逻辑是

1. 比较两个字符串的长度是否相等，如果不相等，直接返回`false`
2. 把两个String类型转换为char[]数组，并且按照数组顺序逐步比较每一个char字符，如果不相等，同样返回`false`

```java hljs
public boolean equals(Object anObject) {
  //省略
  if (anObject instanceof String) { //判断对象实例是否是String
    String anotherString = (String)anObject; //强转成string类型
    int n = value.length;
    if (n == anotherString.value.length) { //如果两个字符串相等，那么它们的长度自然相等。
      //遍历两个比较的字符串，转换为char类型逐个进行比较。
      char v1[] = value;  
      char v2[] = anotherString.value;
      int i = 0;
      while (n-- != 0) {
        if (v1[i] != v2[i]) //采用`==`进行判断，如果不相同，则返回false
          return false;
        i++;
      }
      return true; //否则返回true。
    }
  }
  return false;
}
```

## ==和equals

通过上面的分析，我们知道，在 Java 中比较两个对象是否相等主要是通过 `==`号，比较的是他们在内存中的存放地址。Object 类是 Java 中的超类，是所有类默认继承的，如果一个类没有重写 Object 的 `equals`方法，那么通过`equals`方法也可以判断两个对象是否相同，因为它内部就是通过`==`来实现的。

```java hljs
public boolean equals(Object obj) {
  return (this == obj);
}
```

这里的相同，是说比较的两个对象是否是同一个对象，即在内存中的地址是否相等。而我们有时候需要比较两个对象的内容是否相同，即类具有自己特有的“逻辑相等”概念，而不是想了解它们是否指向同一个对象。

例如比较如下两个字符串是否相同`String a = "Hello"` 和 `String b = new String("Hello")`，这里的相同有两种情形，是要比较 a 和 b 是否是同一个对象（内存地址是否相同），还是比较它们的内容是否相等？这个具体需要怎么区分呢？

如果使用 `==` 那么就是比较它们在内存中是否是同一个对象，但是 String 对象的默认父类也是 Object，所以默认的`equals`方法比较的也是内存地址，所以我们要重写 `equals`方法，正如 String 源码中所写的那样。

1. 先比较内存地址
2. 再比较value值

这样当我们 `a == b`时是判断 a 和 b 是否是同一个对象，`a.equals(b)`则是比较 a 和 b 的内容是否相同，这应该很好理解。

JDK 中不止 String 类重写了equals 方法，还有数据类型 Integer，Long，Double，Float等基本也都重写了 `equals` 方法。所以我们在代码中用 Long 或者 Integer 做业务参数的时候，如果要比较它们是否相等，记得需要使用 `equals` 方法，而不要使用 `==`。

因为使用 `==`号会有意想不到的坑出现，像这种数据类型很多都会在内部封装一个常量池，例如 IntegerCache，LongCache 等等。当数据值在某个范围内时会直接从常量池中获取而不会去新建对象。

如果要使用`==`，可以将这些数据包装类型转换为基本类型之后，再通过`==`来比较，因为基本类型通过`==`比较的是数值，但是在转换的过程中需要注意 NPE（NullPointException）的发生。

## 了解Class中的HashCode

> 回过头再看一下面试题： 两个对象值相同(x.equals(y) == true)，但是可能存在hash code不同吗?

这个结果返回`true`，假设`x`和`y`是String类型，意味着它满足两个点。

1. `x`和`y`有可能是同一个内存地址。
2. `x`和`y`这两个字符串的值是相同的。

基于这两个推断，我们还没办法和hash code联系上，也就是说，这两个对象的引用地址相同，和hashCode是否有关系？

在Java中，任何一个对象都是派生自Object，而在Object中，有一个`native`方法`hashCode()`。

```java hljs
public native int hashCode();
```

### 为什么要有hashCode？

对于包含容器结构的程序语言来说，基本上都会涉及到hashCode，它的主要作用是为了配合基于散列的集合一起工作，比如HashSet、HashTable、ConcurrentHashMap、HashMap等。

在这类的集合中添加元素时，首先需要判断添加的元素是否存在（不允许存在重复元素），也许大多数人都会想到调用equals方法来逐个进行比较，这个方法确实可行。但是如果集合中已经存在一万条数据或者更多的数据，如果采用equals方法去逐一遍历每个元素的值进行比较，效率会非常低。

此时hashCode方法的作用就体现出来了，当集合要添加新的对象时，先调用这个对象的hashCode方法，得到对应的hashcode值，实际上在HashMap的具体实现中会用一个table保存已经存进去的对象的hashcode值：

1. 如果table中没有该hashcode值，它就可以直接存进去，不用再进行任何比较了；
2. 如果存在该hashcode值， 就调用它的equals方法与新元素进行比较，相同的话就不存了，不相同就散列其它的地址，所以这里存在一个冲突解决的问题，这样一来实际调用equals方法的次数就大大降低了.

> 说通俗一点：Java中的hashCode方法就是根据一定的规则将与对象相关的信息（比如对象的存储地址，对象的字段等）映射成一个数值，这个数值称作为散列值。下面这段代码是java.util.HashMap的中put方法的具体实现：

```java hljs
 public V put(K key, V value) {
   if (key == null)
     return putForNullKey(value);
   int hash = hash(key.hashCode());
   int i = indexFor(hash, table.length);
   for (Entry<K,V> e = table[i]; e != null; e = e.next) {
     Object k;
     if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
       V oldValue = e.value;
       e.value = value;
       e.recordAccess(this);
       return oldValue;
     }
   }
 
   modCount++;
   addEntry(hash, key, value, i);
   return null;
 }
```

所以通过`hashCode`,减少了查询比较的次数，优化了查询的效率同时也就减少了查询的时间。

### hashCode方法的实现

> 下面是hashCode这个方法的完整注释说明。

```java hljs
 /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java™ programming language.)
     *
     * @return  a hash code value for this object.
     * @see     java.lang.Object#equals(java.lang.Object)
     * @see     java.lang.System#identityHashCode
     */
public native int hashCode();

```

从注释的描述可以知道，hashCode 方法返回该对象的哈希码值。它可以为像 HashMap 这样的哈希表有益。Object 类中定义的 hashCode 方法为不同的对象返回不同的整形值。具有迷惑异议的地方就是
`This is typically implemented by converting the internal address of the object into an integer`
这一句，意为通常情况下实现的方式是将对象的内部地址转换为整形值。

如果你不深究就会认为它返回的就是对象的内存地址，我们可以继续看看它的实现，但是因为这里是 native 方法所以我们没办法直接在这里看到内部是如何实现的。native 方法本身非 java 实现，如果想要看源码，只有下载完整的 jdk 源码，Oracle 的 JDK 是看不到的，OpenJDK 或其他开源 JRE 是可以找到对应的 C/C++ 代码。我们在 OpenJDK 中找到 [Object.c](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/3462d04401ba/src/share/native/java/lang/Object.c) 文件，可以看到hashCode 方法指向 `JVM_IHashCode` 方法来处理。

```java hljs
static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};
```

而`JVM_IHashCode`方法实现在 [jvm.cpp](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/prims/jvm.cpp)中的定义为：

```c++ hljs
JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))  
  JVMWrapper("JVM_IHashCode");  
  // as implemented in the classic virtual machine; return 0 if object is NULL  
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;  
JVM_END 
```

这里是一个三目表达式，真正计算获得 hashCode 值的是[ObjectSynchronizer::FastHashCode](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/synchronizer.cpp)，它具体的实现在[synchronizer.cpp](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/synchronizer.cpp)中，截取部分关键代码片段。

```c++ hljs
intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
  
  //省略代码片段
  
  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}
```

从以上代码片段中可以发现，实际计算hashCode的是 `get_next_hash`，该方法的部分代码定义如下。

```cpp hljs
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}
```

从`get_next_hash`的方法中我们可以看到，如果从0开始算的话，这里提供了6种计算`hash`值的方案，有自增序列，随机数，关联内存地址等多种方式，其中官方默认的是最后一种，即随机数生成。可以看出`hashCode`也许和内存地址有关系，但不是直接代表内存地址的，具体需要看虚拟机版本和设置。

> 上面的整体描述还是比较复杂，直接说结论：
>
> 1. 一个对象的hashCode，默认情况下如果没有重写，则由JVM中的`get_next_hash`方法来生成，这个生成的方式不一定和内存地址有关，默认是用随机数生成。两个不同的对象，他们生成的hashCode可能会相同，如果存在这个问题，就是所谓的`hash冲突`，在HashMap中，解决`hash冲突`的方法是链式寻址法。
> 2. 使用`==`这个表达式判断，如果返回`true`，意味着两个对象的hashCode一定相同。