A fast way to judge an anagram



Two words are anagrams if one can form by permuting the letters in the
other. To judge whether two strings are anagrams, a well-know method is
sorting the two strings and compare the sorted result. Here is a new
way to do it with linear complexity.

1. Build a look-up table with 26 primes. As we know, prime numbers are
natural numbers has only two divisors (1 and itself).
2. Each character is one-toone mapping into an entry in the look-up
table.
3. The product of prime numbers associated with all characters in a
string is the signature of the string.
4. If two strings are anagrams, their signatures must be equal
(obviously). And if the signatures are equal, the corresponding strings
are anagrams (can be proved by the commutative property of
multiplication and the perperty of primes).

Here is the C99 source code:
/*assume upper case
if true return 0.
*/
long is_anagram(const char* str1, const char* str2)
{
static const unsigned char look_up[26] =
{
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101
};

size_t size1 = strlen(str1);
size_t size2 = strlen(str2);
if( size1 != size2 )
return size1 - size2;

long index1=1, index2=1;
for( size_t i=0; i<size1; ++i )
{
index1 *= look_up[ str1[i]-'A' ];
index2 *= look_up[ str2[i]-'A' ];
}

return index1 - index2;
}

This method can be proved by using the property A*B=C <=>
LOG(A)+LOG(B)=LOG(C).

Here is the CPP code:

double is_anagram_better(const char* str1, const char* str2)
{
static const double look_up[26] =
{
log(2), log(3), log(5), log(7), log(11),
log(13), log(17), log(19), log(23), log(29),
log(31), log(37), log(41), log(43), log(47),
log(53), log(59), log(61), log(67), log(71),
log(73), log(79), log(83), log(89), log(97),
log(101)
};

size_t size1 = strlen(str1);
size_t size2 = strlen(str2);
if( size1 != size2 )
return size1 - size2;

double index1=1, index2=1;
for( size_t i=0; i<size1; ++i )
{
index1 += look_up[ str1[i]-'A' ];
index2 += look_up[ str2[i]-'A' ];
}

return index1 - index2;
}

.



Relevant Pages

  • Re: A fast way to judge an anagram
    ... To judge whether two strings are anagrams, ... You write both a reliable test routine and a fast routine. ...
    (comp.programming)
  • Re: A fast way to judge an anagram
    ... To judge whether two strings are anagrams, ... Build a look-up table with 26 primes. ...
    (comp.programming)
  • Re: Another puzzle
    ... > So this time I a puzzle of which I dont have any time or space constraints- ... find if they are anagrams of each other given the ... For arbitrarily large strings it becomes O. ...
    (comp.programming)
  • Re: A fast way to judge an anagram
    ... To judge whether two strings are anagrams, ... Build a look-up table with 26 primes. ... long is_anagram(const char* str1, const char* str2) ...
    (comp.programming)
  • Re: A fast way to judge an anagram
    ... To judge whether two strings are anagrams, ... Build a look-up table with 26 primes. ... It's well known that sorting can be done in linear time when the input range is bounded ...
    (comp.programming)