Re: How good are checksums?
From: Roedy Green (see_at_mindprod.com.invalid)
Date: 04/29/04
- Next message: Christophe Vanfleteren: "Re: Nested Tags : Jakarta Taglibs"
- Previous message: Roedy Green: "Re: Programming is not as much fun/more fun than it used to be."
- In reply to: Roedy Green: "Re: How good are checksums?"
- Next in thread: Roedy Green: "Re: How good are checksums?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 29 Apr 2004 21:08:08 GMT
On Thu, 29 Apr 2004 18:58:24 GMT, Roedy Green
<see@mindprod.com.invalid> wrote or quoted :
>the logic I use in The Replicator and Untouch for this is to compute a
>checksum and compare file lengths. This makes the odds of a false
>duplicate very low. I use a fast Adler-32 checksum.
>
>Even without the length check, the odds are only 1/2^32 of getting a
>false duplicate.
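The fingerprint described in the quoted text is a file's length plus a 32-bit checksum. It can be sketched with the JDK's `java.util.zip.Adler32` class. This is a minimal illustration under stated assumptions, not the actual code in The Replicator or Untouch. The class name `Fingerprint`, its method names, and the 64 KiB read buffer are all hypothetical choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

/** Hypothetical helper: a file "fingerprint" made of its length plus an Adler-32 checksum. */
final class Fingerprint {
    final long length;
    final long adler32;

    Fingerprint(long length, long adler32) {
        this.length = length;
        this.adler32 = adler32;
    }

    /** Fingerprints an in-memory byte array. */
    static Fingerprint of(byte[] data) {
        Adler32 sum = new Adler32();
        sum.update(data, 0, data.length);
        return new Fingerprint(data.length, sum.getValue());
    }

    /** Fingerprints a file, streaming it through the checksum in large chunks. */
    static Fingerprint of(Path file) throws IOException {
        Adler32 sum = new Adler32();
        long length = 0;
        byte[] buf = new byte[64 * 1024]; // assumed buffer size; any large buffer works
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sum.update(buf, 0, n);
                length += n;
            }
        }
        return new Fingerprint(length, sum.getValue());
    }

    /** A match means "probably the same": both length and checksum must agree. */
    boolean probablySameAs(Fingerprint other) {
        return length == other.length && adler32 == other.adler32;
    }
}
```

Matching fingerprints only say the files are *probably* identical. Differing fingerprints prove the files differ.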
In The Replicator, it does not matter if two unrelated documents hash
to the same key. It matters only if the immediately previous version
of a document hashes to the same key and has the same length.
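Under that scheme, each document is compared only against its own previous version, so collisions between unrelated documents are harmless. The sketch below shows the idea under that assumption. It is not The Replicator's real implementation, and the class name `ChangeDetector` and method `changed` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Adler32;

/**
 * Hypothetical sketch: decide whether a document changed since its previous
 * version by comparing only against that one previous (length, Adler-32) pair.
 * Unrelated documents that happen to share a checksum never interfere,
 * because each document name has its own entry.
 */
final class ChangeDetector {
    /** Last seen {length, checksum} per document name. */
    private final Map<String, long[]> previous = new HashMap<>();

    /** Returns true if the content differs from the previously recorded version, or is new. */
    boolean changed(String name, byte[] content) {
        Adler32 sum = new Adler32();
        sum.update(content, 0, content.length);
        long[] current = {content.length, sum.getValue()};
        long[] old = previous.put(name, current);
        // A false "unchanged" requires the new version to collide with the
        // old one on BOTH checksum and length.
        return old == null || old[0] != current[0] || old[1] != current[1];
    }
}
```

Keying the map by document name is the design choice that makes this work. It means only one comparison per document, never a search across all documents.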
Your problem is similar to handling collisions in a hash table.
See http://mindprod.com/jgloss/hashtable.html
Two different documents can hash to the same key, but two identical
documents cannot hash to different keys. If two documents hash to the
same key, you can do a finer check for duplicates, even a byte-by-byte
compare. This is not too painful if you do the I/O in whacking big
chunks as raw bytes.
Most of the time you won't have collisions, so you won't have to do
the compare.
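The two-stage approach above works like a hash table. Files are first bucketed by (length, checksum), and the expensive byte-by-byte compare runs only inside a bucket where keys collide. Here is a minimal sketch of that idea using `java.util.zip.Adler32`. The class name `DuplicateFinder`, its methods, and the 64 KiB chunk size are hypothetical assumptions, not code from the post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.zip.Adler32;

/** Hypothetical sketch: find byte-identical files via checksum buckets plus a confirming compare. */
final class DuplicateFinder {
    private static final int CHUNK = 64 * 1024; // assumed chunk size for raw-byte I/O

    /** Returns groups of 2+ files whose contents are byte-for-byte identical. */
    static List<List<Path>> findDuplicates(List<Path> files) throws IOException {
        // Stage 1: bucket by (length, Adler-32), like hashing into a hash table.
        Map<String, List<Path>> buckets = new HashMap<>();
        for (Path f : files) {
            String key = Files.size(f) + ":" + adler32(f);
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        }
        // Stage 2: only files sharing a bucket (a "collision") get the byte compare.
        List<List<Path>> result = new ArrayList<>();
        for (List<Path> bucket : buckets.values()) {
            List<Path> remaining = new ArrayList<>(bucket);
            while (remaining.size() > 1) {
                Path first = remaining.remove(0);
                List<Path> group = new ArrayList<>();
                group.add(first);
                for (Iterator<Path> it = remaining.iterator(); it.hasNext(); ) {
                    Path other = it.next();
                    if (sameBytes(first, other)) {
                        group.add(other);
                        it.remove();
                    }
                }
                if (group.size() > 1) result.add(group);
            }
        }
        return result;
    }

    /** Byte-by-byte comparison, reading both files in large chunks. */
    static boolean sameBytes(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) return false;
        byte[] bufA = new byte[CHUNK];
        byte[] bufB = new byte[CHUNK];
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            while (true) {
                int nA = fill(inA, bufA);
                int nB = fill(inB, bufB);
                if (nA != nB) return false;
                if (nA == 0) return true; // both streams exhausted with no mismatch
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) return false;
                }
            }
        }
    }

    /** Reads until buf is full or EOF; returns the number of bytes read (0 at EOF). */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) break;
            total += n;
        }
        return total;
    }

    /** Streams a file through Adler-32. */
    private static long adler32(Path f) throws IOException {
        Adler32 sum = new Adler32();
        byte[] buf = new byte[CHUNK];
        try (InputStream in = Files.newInputStream(f)) {
            int n;
            while ((n = in.read(buf)) != -1) sum.update(buf, 0, n);
        }
        return sum.getValue();
    }
}
```

Since most buckets hold a single file, `sameBytes` rarely runs. That is the point made above about collisions being uncommon.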
If the two files have the same name and same timestamp, you should be
able to trust the OS that they are the same file without even
examining contents. :-)
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.