Re: Shorter checksum than MD5

From: Mercuro (this_at_is.invalid)
Date: 09/09/04


Date: Thu, 09 Sep 2004 12:13:01 GMT

Paul Rubin wrote:

>
> How about putting a timestamp in each record, so you only have to
> compare the records that have been updated since the last period
> comparison.
>

OK, I will give some more information:

I have a proprietary system which I can't modify,
but it uses FoxPro DBF files, which I can read.
I have found all the data that I want to have in a
MySQL table. (This table will be used to look up
prices and to find other information about articles.)

Since I'm not able to put timestamps on
changed records, I got the idea to compute a checksum
for each record and save it in the MySQL table.
Every night I would SELECT all checksums
together with the article numbers and then compare
them one by one with newly calculated checksums from
the DBF file. Only records whose checksum has changed
would be UPDATEd, and missing article numbers would be
INSERTed.
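
A minimal sketch of that nightly comparison, assuming the stored
checksums are first loaded into a dict keyed by article number so each
lookup is a single step. The function name plan_sync and the sample
data are made up for illustration, and the sketch is written for
current Python rather than the Python version used in this post:

```python
def plan_sync(stored, current):
    """Decide which article IDs need an UPDATE and which need an INSERT.

    stored  -- {article_id: checksum} as read from the MySQL table
    current -- {article_id: checksum} freshly computed from the DBF file
    """
    to_update = []
    to_insert = []
    for article_id, checksum in current.items():
        if article_id not in stored:
            to_insert.append(article_id)   # not yet in MySQL
        elif stored[article_id] != checksum:
            to_update.append(article_id)   # checksum differs: record changed
    return sorted(to_update), sorted(to_insert)


# Hypothetical sample data: article 2 changed, article 4 is new.
stored = {1: "a3f9c", 2: "07b21", 3: "ffe10"}
current = {1: "a3f9c", 2: "99999", 4: "12345"}
print(plan_sync(stored, current))
```

Article 3 exists only in the MySQL table; like the description above,
the sketch leaves such records alone.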

This is the code I have for now:
(I will probably replace md5 with crc32.)

import sys, os, string, dbfreader, md5
from string import strip

# import MySQL module
import MySQLdb

# connect
db = MySQLdb.connect( .... )

# create a cursor
cursor = db.cursor()

cursor.execute("SELECT ID, md5sum, 0 FROM ARTIKEL;")
resultaat = list(cursor.fetchall())
f = dbfreader.DBFFile("ARTIKEL.DBF")

f.open()
i = 0
while 1:
         i += 1
         updated = 0
         rec = f.get_next_record()
         if rec is None:
             break
         pr_kassa = str(rec["PR_KASSA"])
         ID = rec["ID"]
         IDs = str(ID)
         assortiment = strip(str(rec["ASSORTIMENT"]))[0:1]
         pr_tarief = str(rec["PR_TARIEF"])
         status = strip(str(rec["STATUS"]))[0:1]
         pr_aank = str(rec["PR_AANK"])
         # escape single quotes so the name can be embedded in the SQL below
         benaming = string.join(string.split(str(rec["BENAMING"]), "'"), "\\'")

         # only 5 hex digits (20 bits) of the digest are kept as the checksum
         md5sum = md5.new(pr_kassa + IDs + assortiment + pr_tarief +
                          status + pr_aank + benaming).hexdigest()[3:8]

         if (i % 100) == 0:
             print "record %i: ID %s" % (i, IDs)
             # make the list more optimal to search through:
             # move the first 90 entries to the end
             tmp = resultaat[:90]
             resultaat = resultaat[90:]
             resultaat.extend(tmp)

         if resultaat is not None:
             for record in resultaat:
                 if record[0] == ID:
                     #record[2] = 1
                     if record[1] != md5sum:
                         print "update record (ID: %s)" % IDs
                         # update existing record: md5 sum does not match
                         cursor.execute(
                             "UPDATE ARTIKEL SET benaming='%s', status=%s, "
                             "assortiment='%s', pr_aank=%s, pr_tarief=%s, "
                             "pr_kassa=%s, md5sum='%s' WHERE ID=%s ;" %
                             (benaming, status, assortiment, pr_aank,
                              pr_tarief, pr_kassa, md5sum, IDs))
                     updated = 1
                     break

         if updated == 0 and ID < 8000000:
             # new record
             print "new record (ID: %s)" % IDs
             cursor.execute(
                 "INSERT INTO ARTIKEL (ID, benaming, status, assortiment, "
                 "pr_aank, pr_tarief, pr_kassa, md5sum) "
                 "VALUES ( %s, '%s', %s, '%s', %s, %s, %s, '%s' );" %
                 (IDs, benaming, status, assortiment, pr_aank,
                  pr_tarief, pr_kassa, md5sum))

f.close()

#############################################
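
For the crc32 idea, a minimal sketch using zlib.crc32 from the standard
library, written for current Python. The helper name record_checksum
and the choice of separator are assumptions for illustration, not part
of the code above. Two points it tries to show: joining the fields with
a separator keeps "1" + "23" and "12" + "3" from producing the same
input, and the full 32-bit CRC is still only 8 hex digits. By
comparison, the 5 hex digits kept by hexdigest()[3:8] are 20 bits, so
if the digits are roughly uniform, about one changed record in a
million would keep its old checksum and be missed.

```python
import zlib


def record_checksum(fields, sep="\x1f"):
    """Return an 8-digit hex CRC32 over the field values of one record.

    The separator (ASCII "unit separator", an arbitrary choice here)
    keeps adjacent fields from running together in the checksummed data.
    """
    data = sep.join(str(f) for f in fields).encode("utf-8")
    # mask to an unsigned 32-bit value so the result is the same everywhere
    return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)


# Same characters, different field boundaries: the checksums differ.
a = record_checksum(["1", "23"])
b = record_checksum(["12", "3"])
print(a, b, a != b)
```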

If anybody has any better ideas, I'm happy to hear
them!


