Re: Improving String.equals() implementation



On Sun, 5 Oct 2008, zerg wrote:

Tom Anderson wrote:
On Fri, 3 Oct 2008, Mark Space wrote:

Eric Sosman wrote:
Mark Space wrote:
John W Kennedy wrote:

And don't forget the null check, class check, and length check that all have to come before the data check.

Why check for null? Throw a NPE....

Violates the equals() contract, that's why: thing.equals(null)
should yield false and throw no exception.

Ah, ok. That's what I get for posting before checking the docs....

But:

public boolean equals(Object obj) {
try {
if (this == obj) return true ; // controversial!
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars[i] != that.chars[i]) return false ;
return true ;
} catch (ClassCastException e) {
return false ;
} catch (NullPointerException e) {
return false ;
}
}

In the case of comparing to an actual String, this should be quite fast - i don't think there's any per-execution overhead to setting up a try block. When comparing to null or a non-String, though, there will be a throw-catch, which is disproportionately expensive.

Pardon me, but ...

Yuck.

Oh, absolutely! I was merely trying to show that a NullPointerException could be used to handle the null condition.

This isn't any kind of savings at all, since the compiler still inserts the equivalent bytecode to the instanceof test into the method before the cast.

I wasn't claiming it was faster than the sensible way of doing it - just that in the common (hopefully) case, it wouldn't be significantly slower.

In fact, there are separate tests for null and for not instanceof String, as if you wrote:

if (this == null) throw new NullPointerException();
if (!this instanceof String) throw new ClassCastException();
String that = (String) obj;

which may actually be worse than

if (this == null) return false;
if (!this instanceof String) return false;
String that = (String) obj;

since in the latter case, the compiler may be able to optimize away the first statement, with static analysis telling it that the second statement would return false anyway in that case and that the first statement has no side effects. I very much doubt it will figure out that the semantics are the same removing the null test in the exception-throwing case, even though the catch blocks are in the same method, unless the optimizer's coders were downright heroic in their skills and efforts.

On the other hand, it might be able to remove the null check from the instanceof implementation, if it inlines enough of it, which would boil down to much the same thing.

The real problem would be the overhead of allocating and throwing the exceptions - as you say, i don't think the compiler will optimise those away.

Besides the lack of actual efficiency improvement, there's also the aesthetic/style/clunkiness/readability/maintainability matter of the gross misuse of exceptions.

I don't think i'd call it gross misuse. It might not be to your taste, or indeed to the vast majority of people's taste, but there's a difference between that and 'gross misuse'.

I think there is a perfectly valid philosophy of coding which says you should write straight-line code on the assumption that everything will work, and then write exception handlers to deal with the case where it doesn't. This approach leads to very concise, simple code describing the common case, and well-defined blocks of code describing how to deal with the exceptions. In this case, the expectation is for a String as an argument, and the exceptions are nulls and things which aren't Strings.

You could argue that in this case, nulls and non-strings aren't exceptional, they're part of the normal input that the method handles, in which case this style would be inappropriate by virtue of asymmetry. I don't think i'd agree with that myself.

Anyway, i actually think this approach is stylistically preferable to the nested if approach:

if (obj == this) {
return true ;
} else {
if (obj != null) {
if (obj instanceof String) {
String that = (String)obj ;
if (this.length == that.length) {
for (int i = 0 ; i < this.length ; ++i)
if (this.chars[i] != that.chars[i]) return false ;
return true ;
} else {
return false ;
}
} else {
return false ;
}
} else {
return false
}
}

Although of course that's a bit of a strawman, because it could be written more simply with else-ifs and some swapping-round:

if (obj == this) {
return true ;
} else if (obj == null) {
return false ;
} else if (!obj instanceof String) {
return false ;
} else {
String that = (String)obj ;
if (this.length != that.length) {
return false ;
} else {
for (int i = 0 ; i < this.length ; ++i)
if (this.chars[i] != that.chars[i]) return false ;
return true ;
}
}

Or, better yet, guard clauses:

if (obj == this) return true ;
if (obj == null) return false ;
if (!obj instanceof String) return false ;
String that = (String)obj ;
if (this.length != that.length) return false ;
for (int i = 0 ; i < this.length ; ++i)
if (this.chars[i] != that.chars[i]) return false ;
return true ;

I believe I've already mentioned, in another thread, my company's employment contract's "early termination" clause for anyone who writes code like that. :-)

Then i'm rather glad i work for a different company!

Or rather, i hope i do ...

tom

--
Don't believe his lies.
.



Relevant Pages

  • Re: Confused about using Reflection to examine a collection
    ... the precise type that Reflection is saying it is. ... say that 'obj' is actually a custom collection. ... If it were a basic type, such as 'string' then I would do a cast like this: ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Update connect string in queries.
    ... Dim obj As DAO.QueryDef, dbs As DAO.Database ... Dim strConn As String, strDSN As String, strDescription As String ... Set dbs = CurrentDB ...
    (microsoft.public.access.modulesdaovba)
  • Re: Update connect string in queries.
    ... Public Sub RefreshQPassConnections() ... Dim obj As DAO.QueryDef, dbs As DAO.Database ... Dim strConn As String, strDSN As String, strDescription As String ... Set dbs = CurrentDB ...
    (microsoft.public.access.modulesdaovba)
  • Re: a[3} slower than a.x; a.z; a.z
    ... String^ ReadFirstLineFromFile{ ... lock l(obj); ... … do something with shared state...
    (comp.lang.cpp)
  • Re: Improving String.equals() implementation
    ... should yield false and throw no exception. ... public boolean equals(Object obj) { ... In the case of comparing to an actual String, this should be quite fast - i don't think there's any per-execution overhead to setting up a try block. ... In fact, there are separate tests for null and for not instanceof String, as if you wrote: ...
    (comp.lang.java.programmer)