help in debugging file.seek, file.read

Prateek surekap at gmail.com
Sun Apr 29 18:01:06 EDT 2007


I have a wierd sort of problem.
I'm writing a bunch of sets to a file (each element is a fixed length
string).

I was originally using the built-in sets type but due to a processing
issue, I had to shift to a Python implementation (see
http://groups.google.com/group/comp.lang.python/browse_thread/thread/77e06005e897653c/12270083be9a67f6).
I'm using Raymond Hettinger's very graciously provided TransactionSet
recipe from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511496

My old code was pretty speedy but the new code is quite slow and
according to some hotshot profiles it has nothing to do with the new
TransactionSet. *Some* of the file.seek and file.read calls
occasionally block for insane amounts (>10s) of time when reading no
more than 45 bytes of data from the file. So when I'm running load-
tests, this eventually happens like so:

Each i_id statement is the time taken for 100 successive commits.
Each RTH (Read Table Header) statement is the time taken for a single
read call for 45 bytes of data
# sz__TABLE_HEADER_FORMAT__ is 45
hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
#hdr = f.read(sz__TABLE_HEADER_FORMAT__) # Tried this as well

Loading items...
i_id:  0 0.000111103057861
i_id:  100 1.01557397842
i_id:  200 1.14013886452
i_id:  300 1.16142892838
i_id:  400 1.16356801987
i_id:  500 1.36410307884
i_id:  600 1.34421014786
i_id:  700 1.30385017395
i_id:  800 1.48079919815
i_id:  900 1.41147589684
RTH:  0.582525968552
RTH:  2.77490496635
i_id:  1000 5.16863512993
i_id:  1100 1.73725795746
i_id:  1200 1.56621193886
i_id:  1300 1.81338000298
i_id:  1400 1.69464302063
i_id:  1500 1.74725604057
i_id:  1600 2.35222291946
i_id:  1700 1.85096788406
i_id:  1800 2.20518493652
i_id:  1900 1.94831299782
i_id:  2000 2.03350806236
i_id:  2100 2.32529306412
i_id:  2200 2.44498205185
RTH:  0.105868816376
i_id:  2300 3.65522289276
i_id:  2400 4.2119910717
i_id:  2500 4.21354198456
RTH:  0.115046024323
RTH:  0.122591972351
RTH:  2.88115119934
RTH:  10.5908679962
i_id:  2600 18.8498170376
i_id:  2700 2.42577004433
i_id:  2800 2.47392010689
i_id:  2900 2.88293218613

So I have no idea why this is happening (it is also happening with
seek operations).
Any guidance on how to debug this?

thanks,
Prateek




More information about the Python-list mailing list