Re: Shorter checksum than MD5
From: Mercuro (this_at_is.invalid)
Date: 09/09/04
- Next message: Mercuro: "Re: Shorter checksum than MD5"
- Previous message: Brian Inglis: "Re: Xah Lee's Unixism"
- In reply to: Paul Rubin: "Re: Shorter checksum than MD5"
- Next in thread: Paul Rubin: "Re: Shorter checksum than MD5"
- Reply: Paul Rubin: "Re: Shorter checksum than MD5"
Date: Thu, 09 Sep 2004 12:13:01 GMT
Paul Rubin wrote:
>
> How about putting a timestamp in each record, so you only have to
> compare the records that have been updated since the last period
> comparison.
>
OK, I will give some more information:
I have a proprietary system, which I can't modify.
But it uses FoxPro DBF files, which I can read.
I have found all the data I want to have in a
MySQL table. (This table will be used to look up
prices and to find other information about articles.)
Since I'm not able to put timestamps on
changed records, I got the idea to put a checksum
on each record and save it in the MySQL table.
Every night I would 'SELECT' all checksums
together with the article numbers and then compare
them one by one with newly calculated checksums from
the DBF file. Only records whose checksum has changed
would be 'UPDATED', and missing numbers would be 'INSERTED'.
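The nightly comparison described above can be sketched roughly as follows. This is only an illustration, written for a current Python rather than the version used in the rest of this post; `plan_sync` and the sample checksums are made-up names and values, and keeping the stored checksums in a dict keyed by article number is an assumption, not something the existing setup requires.

```python
def plan_sync(stored, current):
    """Split the current records into ones to UPDATE and ones to INSERT.

    stored  -- dict mapping article ID -> checksum saved in MySQL
    current -- dict mapping article ID -> checksum computed from the DBF file
    """
    to_update = []
    to_insert = []
    for article_id, checksum in current.items():
        if article_id not in stored:
            to_insert.append(article_id)      # number missing in MySQL
        elif stored[article_id] != checksum:
            to_update.append(article_id)      # checksum changed
    return sorted(to_update), sorted(to_insert)


# Hypothetical sample data: article 2 changed, article 4 is new.
stored = {1: "a3f", 2: "b07", 3: "c19"}
current = {1: "a3f", 2: "ffe", 3: "c19", 4: "d42"}
to_update, to_insert = plan_sync(stored, current)
print("update:", to_update, "insert:", to_insert)
```

With a dict, each lookup by article number takes roughly constant time, so the stored list does not have to be scanned for every DBF record.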
This is the code I have for now:
(I will probably replace md5 with crc32)
import sys, os, string, dbfreader, md5
from string import strip

# import MySQL module
import MySQLdb

# connect
db = MySQLdb.connect( .... )

# create a cursor
cursor = db.cursor()
cursor.execute("SELECT ID, md5sum, 0 FROM ARTIKEL;")
resultaat = list(cursor.fetchall())

f = dbfreader.DBFFile("ARTIKEL.DBF")
f.open()
i = 0
while 1:
    i += 1
    updated = 0
    rec = f.get_next_record()
    if rec is None:
        break
    pr_kassa = str(rec["PR_KASSA"])
    ID = rec["ID"]
    IDs = str(ID)
    assortiment = strip(str(rec["ASSORTIMENT"]))[0:1]
    pr_tarief = str(rec["PR_TARIEF"])
    status = strip(str(rec["STATUS"]))[0:1]
    pr_aank = str(rec["PR_AANK"])
    # escape single quotes so the name can be embedded in the SQL string
    benaming = string.join(string.split(str(rec["BENAMING"]), "'"), "\\'")
    # short checksum: 5 hex digits taken from the md5 digest
    md5sum = md5.new(pr_kassa + IDs + assortiment + pr_tarief + status +
                     pr_aank + benaming).hexdigest()[3:8]
    if (i % 100) == 0:
        print "record %i: ID %s" % (i, IDs)
        # rotate the list to make it faster to search through
        tmp = resultaat[:90]
        resultaat = resultaat[90:]
        resultaat.extend(tmp)
    if resultaat is not None:
        for record in resultaat:
            if record[0] == ID:
                #record[2] = 1
                if record[1] != md5sum:
                    print "update record (ID: %s)" % IDs
                    # update existing record, md5 sum does not match
                    cursor.execute(
                        ("UPDATE ARTIKEL SET benaming='%s', status=%s, "
                         "assortiment='%s', pr_aank=%s, pr_tarief=%s, "
                         "pr_kassa=%s, md5sum='%s' WHERE ID=%s ;") %
                        (benaming, status, assortiment, pr_aank,
                         pr_tarief, pr_kassa, md5sum, IDs))
                # the record exists in MySQL, so it must not be inserted again
                updated = 1
                break
    if (updated == 0) and (ID < 8000000):
        # new record
        print "new record (ID: %s)" % IDs
        # eight columns, eight placeholders, eight values
        cursor.execute(
            ("INSERT INTO ARTIKEL (ID, benaming, status, assortiment, "
             "pr_aank, pr_tarief, pr_kassa, md5sum) "
             "VALUES ( %s, '%s', %s, '%s', %s, %s, %s, '%s' );") %
            (IDs, benaming, status, assortiment, pr_aank,
             pr_tarief, pr_kassa, md5sum))
f.close()
#############################################
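For the crc32 replacement mentioned above, a minimal sketch could look like this. It is written for a current Python, `record_checksum` is a made-up helper name, the field values are invented sample data, and the separator between fields is an extra assumption that the code above does not have.

```python
import zlib


def record_checksum(fields):
    """Return an 8-character hex CRC-32 of the joined field values."""
    # Assumption: a separator is added between fields so that
    # ("ab", "c") and ("a", "bc") do not produce the same input.
    data = "\x1f".join(fields).encode("utf-8")
    # Mask to 32 bits so the result is always an unsigned value.
    return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)


# Hypothetical record before and after a price change.
old = record_checksum(["12.50", "1001", "A", "10.00", "1", "8.75", "Koffie"])
new = record_checksum(["12.95", "1001", "A", "10.00", "1", "8.75", "Koffie"])
print(old, new, old != new)
```

The result is 8 hex digits (32 bits). For comparison, the `[3:8]` slice of the md5 hex digest in the code above keeps 5 hex digits, which is 20 bits.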
If anybody has any better ideas, I'm happy to hear
them!
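One thing that might be worth changing in the UPDATE and INSERT statements above is the manual escaping of quotes: the values can be passed separately from the SQL text and the driver does the quoting. The sketch below uses sqlite3 as a stand-in so it runs without a MySQL server, and its cut-down ARTIKEL table and sample values are invented. sqlite3 uses `?` placeholders; with MySQLdb the placeholder is `%s` and the values go in as the second argument of `cursor.execute()`.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the table is a reduced,
# hypothetical version of ARTIKEL with only three columns.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE ARTIKEL (ID INTEGER PRIMARY KEY, benaming TEXT, md5sum TEXT)")

# The name contains a single quote and is passed without manual escaping.
benaming = "Jan's koffie"
cursor.execute(
    "INSERT INTO ARTIKEL (ID, benaming, md5sum) VALUES (?, ?, ?)",
    (1001, benaming, "a3f9c"))

# Same idea for an update of an existing record.
cursor.execute(
    "UPDATE ARTIKEL SET benaming = ?, md5sum = ? WHERE ID = ?",
    ("Jan's thee", "0b71e", 1001))
db.commit()

row = cursor.execute(
    "SELECT benaming, md5sum FROM ARTIKEL WHERE ID = ?", (1001,)).fetchone()
print(row)
```

Passing values this way also means the checksum can be computed over the unescaped name.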