Faster md5/1-way encryption?

Mike C. Fletcher mcfletch at rogers.com
Wed Apr 24 18:28:11 EDT 2002


Something that hits me right off the bat:

 >             md5_zeile = md5.new(zeile).hexdigest()
 >             self_Inhalt += md5_zeile + linesep

Would likely be much better as:

	myResults = []
	...
	myResults.append( md5.new(zeile).hexdigest() )
	...
	myResults = string.join( myResults, os.linesep )

That saves you from creating and destroying a huge number of strings 
that, by the end of processing, have grown very large.  If you can skip 
that last step (join) and just use writelines to save to file, even 
better (writelines does not add line separators, so append them yourself).
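
A rough sketch of the list-then-writelines idea, in case it helps. It is 
written against hashlib, the current home of MD5 in the standard library 
(the md5 module quoted above was later removed), and the names hash_lines 
and write_digests are made up for illustration.  It assumes the lines are 
read as bytes, e.g. from a file opened in 'rb' mode:

```python
import hashlib
import io


def hash_lines(lines):
    """Return one hex MD5 digest per input line (lines are bytes)."""
    digests = []
    for line in lines:
        # Appending to a list is cheap; no ever-growing string is rebuilt.
        digests.append(hashlib.md5(line).hexdigest())
    return digests


def write_digests(digests, out, linesep="\n"):
    """Write the digests to a text file object, one per line."""
    # writelines() adds no separators, so each digest gets one here.
    out.writelines(digest + linesep for digest in digests)


if __name__ == "__main__":
    sample = io.BytesIO(b"first line\nsecond line\n")
    out = io.StringIO()
    results = hash_lines(sample)
    write_digests(results, out)
    print(len(results), "lines hashed")
    print(out.getvalue(), end="")
```

len(results) gives the line count for free, and nothing larger than a 
single digest string is ever copied around.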

Oh, and:
	lines = len( myResults)

can replace your 'i' counter in that approach (won't be a noticeable 
savings, I just hate counters :) ).

Finally, file.tell() may be fairly slow (depends on implementation), as 
it may require flushing all buffers so that the file can give an 
accurate result as to current position.  Unless you really need that 
accuracy, consider just accumulating the length of the lines you're 
processing and adding 1 or 2 each time for the length of os.linesep.
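
Here is a minimal sketch of tracking the position without tell(), again 
using hashlib and hypothetical names (hash_with_progress, on_progress).  
One assumption worth flagging: it reads the file in binary mode, where 
each line still carries its own terminator, so len(line) is already the 
exact byte count and the "add 1 or 2" correction is only needed when a 
text-mode read has translated the line endings:

```python
import hashlib
import io


def hash_with_progress(stream, on_progress=None):
    """Hash each line of a binary stream, tracking bytes consumed so far.

    on_progress, if given, is called as on_progress(position, line_count)
    after every line; a GUI could read the same numbers from attributes.
    """
    digests = []
    position = 0
    for line in stream:
        digests.append(hashlib.md5(line).hexdigest())
        # Binary lines include "\n" or "\r\n", so this is the true offset.
        position += len(line)
        if on_progress is not None:
            on_progress(position, len(digests))
    return digests, position


if __name__ == "__main__":
    data = io.BytesIO(b"one\r\ntwo\nthree")
    digests, position = hash_with_progress(data)
    print(len(digests), "lines,", position, "bytes")
```

The running total can be compared against os.path.getsize() of the input 
to drive a progress gauge.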

MD5 can process gigabytes of information pretty quickly, so I'd be 
surprised if a 100,000 line file is a huge problem.
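
If you want to check that on your own machine rather than take my word 
for it, something like the following would do.  It is only a sketch: 
time_md5_lines is a made-up helper, it uses hashlib rather than the md5 
module, and the synthetic 80-byte lines are an assumption about what the 
real data looks like.  It still creates a fresh MD5 object for every 
line, as the original code does:

```python
import hashlib
import time


def time_md5_lines(line_count=100000, line_length=80):
    """Hash line_count identical synthetic lines; return (count, seconds)."""
    line = b"x" * (line_length - 1) + b"\n"
    start = time.perf_counter()
    for _ in range(line_count):
        # One new MD5 object per line, mirroring the per-line approach.
        hashlib.md5(line).hexdigest()
    elapsed = time.perf_counter() - start
    return line_count, elapsed


if __name__ == "__main__":
    count, seconds = time_md5_lines()
    print("hashed %d lines in %.3f seconds" % (count, seconds))
```

If the hashing alone finishes quickly, the time is going somewhere else 
in the loop.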

HTH,
Mike

Alexander Skwar wrote:
> Hi!
...
> The files which should be processed are rather large (100,000+ lines)
> and on my machine this takes ages to run.  I'm currently doing it like
> this:
> 
> 
> import os
> import xreadlines
> import threading
> import md5
> 
> class md5Thread(threading.Thread):
...
>     def run(self):
>         dateiname = self.Dateiname
>         self.Zeilenanzahl = os.path.getsize(dateiname)
>         linesep = os.linesep
>         datei = file(dateiname, 'r', 1)
>         
>         self.Zeilennummer = 0
>         self_Inhalt = ''
>         i = 0
>         for zeile in xreadlines.xreadlines(datei):
>             md5_zeile = md5.new(zeile).hexdigest()
>             self_Inhalt += md5_zeile + linesep
>             self.Zeilennummer = datei.tell()
>             self.ZeilennummerZeile = i
>             i += 1
>             
>         self.Inhalt = self_Inhalt
>         self.done = 1
...
> Further, I use a wxTimer to poll the current byteposition of the thread
> and display it in a wxGauge and wxTextctrl.
> 
> Since this is my first Python app, there are for sure some things that
> can be optimized.
> 
> I suppose what's taking so very long is 
> a) The md5 generation itself
> b) That I instantiate a new md5 object for every single line and thus
> will have 100,000+ objects.
> 
> Could somebody please tell me, how I could speed the whole thing up?  I
> don't have to calculate the md5 sum of every line, but I need to use
> some sort of 1-way encryption.  So, if there are faster alternatives
> than md5, I'd be happy as well.
> 
> Thanks,
> 
> Alexander Skwar


-- 
_______________________________________
   Mike C. Fletcher
   http://members.rogers.com/mcfletch/






