Re: efficient comparison



On Mon, 27 Aug 2007 21:20:31 -0700, "Jim Langston"
<tazmaster@xxxxxxxxxxxxxx> wrote:

<bob@xxxxxxxxxxxxxx> wrote in message
news:1188266948.054544.252730@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Let's say you have a set of names called set A. You also have a
different
set of names called set B. What is the most efficient way of
comparing the sets?

By comparison, I mean finding the names that need to be added to set B
and
deleted from set B to form set A.

It seems like a simple sort of each list then going through and comparing.
Rather trivial actually.

It's not that simple, nor that trivial. The "most efficient" way
depends on the structure of the names, the emphasis given on
storage cost vs computation cost and even on the relative sizes
of the sets A and B.

The trivial answer is to use hash tables, i.e., enter one set
into a hash table and then check the occurences of the other in
the table. It's the O(n) solution...

.... but there are complications. Hash tables are not cache
friendly. Moreover they are not as efficient as radix based
trees (e.g., Patricia trees, ternary tree, and UR trees) if the
names are strings. If the names are integers there are really
efficient sorts available, except that most O(n) and
O(n*log log n) integer sorts aren't cache friendly for large n.



.



Relevant Pages

  • Re: hash() yields different results for different platforms
    ... considering to add a "hash" column in the table, make it a unique key, ... I believe this will be faster than making the "url" column unique key ... and hashing a string are both O. ... result in a big savings compared to comparing regular strings; ...
    (comp.lang.python)
  • Re: [Full-disclosure] Signature or checksum? (was: MD5 considered harmful)
    ... otherwise authenticated MD5/SHA-256 hash. ... Otherwise, if I'm an attacker, ... the use of signatures provides less security than comparing ...
    (Full-Disclosure)
  • Re: Suggestions for double-hashing scheme
    ... > by computing of the hash function). ... Comparing can be relatively expensive, ... tables sizes (for example, I ran all sizes between 740 elements and ...
    (comp.programming)
  • Re: Hash
    ... effect as if they were an Oaccess map. ... Oinsertion compared to Ofor hash tables. ... thats just comparing worst case. ... For a tree-like map structure, ...
    (comp.programming)
  • Re: compare-by-hash (was Re: sharing /etc/passwd)
    ... Hash: SHA1 ... infinite number of inputs, you are guaranteed an infinite number of ... > blocks comparing as equal exists, ... hashes to generate a collision. ...
    (FreeBSD-Security)