Shorter checksum than MD5

Mercuro this at is.invalid
Thu Sep 9 08:13:01 EDT 2004


Paul Rubin wrote:


> 
> How about putting a timestamp in each record, so you only have to
> compare the records that have been updated since the last period
> comparison.  
> 

OK, I will give some more information:

I have a proprietary system that I can't modify, but it uses FoxPro DBF
files, which I can read. I have located all the data that I want to have
in a MySQL table. (This table will be used to look up prices and to find
other information about articles.)

Since I'm not able to put timestamps on changed records, I got the idea
to calculate a checksum for each record and save it in the MySQL table.
Every night I would SELECT all checksums together with the article
numbers and then compare them one by one with newly calculated checksums
from the DBF file. Only records whose checksums have changed would be
UPDATEd, and missing article numbers would be INSERTed.
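
A minimal sketch of that nightly comparison, assuming the checksums from MySQL and the ones freshly calculated from the DBF file have each been loaded into a dict keyed by article number. The function name and the sample values are made up for illustration, and the sketch uses Python 3 syntax:

```python
def diff_checksums(stored, current):
    """Compare stored checksums with freshly calculated ones.

    Both arguments map article ID -> checksum string.
    Returns (to_update, to_insert) as sorted lists of IDs.
    """
    to_update = []
    to_insert = []
    for article_id, checksum in current.items():
        if article_id not in stored:
            to_insert.append(article_id)      # not yet in the MySQL table
        elif stored[article_id] != checksum:
            to_update.append(article_id)      # record changed since last run
    return sorted(to_update), sorted(to_insert)


# Made-up sample data, for illustration only.
stored = {1001: "a3f9c", 1002: "77b0e", 1003: "0c1d2"}
current = {1001: "a3f9c", 1002: "5e5e5", 1004: "9aa10"}

to_update, to_insert = diff_checksums(stored, current)
print(to_update)   # [1002]
print(to_insert)   # [1004]
```

A dict lookup by ID avoids scanning a list of rows for every DBF record, so the order of the two sources no longer matters.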

This is the code I have for now:
(I will probably replace md5 with crc32.)

import sys, os, string, dbfreader, md5
from string import strip

# import MySQL module
import MySQLdb

# connect
db = MySQLdb.connect( .... )

# create a cursor
cursor = db.cursor()

cursor.execute("SELECT ID, md5sum, 0 FROM ARTIKEL;")
resultaat = list(cursor.fetchall())
f = dbfreader.DBFFile("ARTIKEL.DBF")

f.open()
i = 0
while 1:
         i += 1
         updated = 0
         rec=f.get_next_record()
         if rec==None:
             break
         pr_kassa = str(rec["PR_KASSA"])
         ID = rec["ID"]
         IDs = str(ID)
         assortiment = strip(str(rec["ASSORTIMENT"]))[0:1]
         pr_tarief = str(rec["PR_TARIEF"])
         status = strip(str(rec["STATUS"]))[0:1]
         pr_aank = str(rec["PR_AANK"])
         # escape single quotes so the name can be embedded in the SQL strings
         benaming = string.join(string.split(str(rec["BENAMING"]), "'"), "\\'")

         md5sum = md5.new(pr_kassa + IDs + assortiment + pr_tarief +
                          status + pr_aank + benaming).hexdigest()[3:8]

         if (i % 100) == 0:
             print "record %i: ID %s" % (i, IDs)
             # make the list more optimal to search through
             tmp = resultaat[:90]
             resultaat = resultaat[90:]
             resultaat.extend(tmp)

         if resultaat != None:
           for record in resultaat:
             if record[0] == ID:
                 #record[2] = 1
                 if record[1]!=md5sum:
                     print "update record (ID: %s)" % IDs
                     # update existing record; md5 sum does not match
                     sql = ("UPDATE ARTIKEL SET "
                            "benaming='%s', status=%s, assortiment='%s', "
                            "pr_aank=%s, pr_tarief=%s, pr_kassa=%s, "
                            "md5sum='%s' WHERE ID=%s ;")
                     cursor.execute(sql % (benaming, status, assortiment,
                                           pr_aank, pr_tarief, pr_kassa,
                                           md5sum, IDs))
                 updated = 1
                 break

         if updated == 0 and ID < 8000000:
             # new record
             print "new record (ID: %s)" % IDs
             # eight columns, so eight placeholders
             sql = ("INSERT INTO ARTIKEL (ID, benaming, status, assortiment, "
                    "pr_aank, pr_tarief, pr_kassa, md5sum) "
                    "VALUES ( %s, '%s', %s, '%s', %s, %s, %s, '%s' );")
             cursor.execute(sql % (IDs, benaming, status, assortiment,
                                   pr_aank, pr_tarief, pr_kassa, md5sum))



f.close()
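
The statements above build the SQL by string formatting and escape quotes by hand. A possible alternative is to let the database driver bind the values. The sketch below shows the idea with Python 3 and an in-memory SQLite database as a stand-in for the MySQL connection, using a cut-down, hypothetical version of the ARTIKEL table with made-up values. SQLite uses `?` as the placeholder; MySQLdb uses `%s` with the values passed as the second argument to `execute()`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL connection.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE ARTIKEL (ID INTEGER PRIMARY KEY, benaming TEXT, md5sum TEXT)"
)

# The values are passed separately from the SQL text, so the quotes in
# the name need no manual escaping.
cursor.execute(
    "INSERT INTO ARTIKEL (ID, benaming, md5sum) VALUES (?, ?, ?)",
    (1001, "O'Brien's widget", "a3f9c"),
)
cursor.execute(
    "UPDATE ARTIKEL SET md5sum=? WHERE ID=?",
    ("5e5e5", 1001),
)

cursor.execute("SELECT benaming, md5sum FROM ARTIKEL WHERE ID=?", (1001,))
row = cursor.fetchone()
print(row)
db.close()
```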

#############################################
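
For the planned switch from md5 to crc32, a sketch along these lines might work (Python 3 syntax). It uses `zlib.crc32` from the standard library and gives an 8-character hex string instead of a slice of the md5 digest. The function name, the separator between fields and the sample record are my own assumptions, not part of the code above:

```python
import zlib


def record_checksum(fields):
    """Return an 8-digit hex CRC32 of the given field values."""
    # Join with a separator that should not occur in the data, so that
    # ("12", "3") and ("1", "23") do not produce the same input.
    data = "\x1f".join(str(f) for f in fields).encode("utf-8")
    # Mask to 32 bits so the result is always an unsigned value.
    return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)


# Made-up sample record, for illustration only.
checksum = record_checksum(["12.50", 1001, "A", "10.00", "1", "8.75", "Widget"])
print(checksum)
```

A 32-bit checksum can collide, so a changed record could occasionally go unnoticed; whether that risk is acceptable depends on the number of articles and how often they change.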



If anybody has any better ideas, I'm happy to hear 
them!


