Help with arrays of strings

Jon Smirl jonsmirl at gmail.com
Mon Jul 31 20:37:51 EDT 2006


I only have a passing acquaintance with Python and I need to modify some
existing code. This code is going to get called with 10GB of data so it
needs to be fairly fast. 

http://cvs2svn.tigris.org/ is code for converting a CVS repository to
Subversion. I'm working on changing it to convert from CVS to git.

The existing Python RCS parser provides me with the CVS deltas as
strings. I need to get these deltas into an array of lines so that I can
apply the diff commands that add/delete lines (like 10 d20, etc.). What is
the most efficient way to do this? The data structure needs to be
able to apply the diffs efficiently too.
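
To make the question concrete, here is a minimal sketch of one possible approach, not a tested solution. It assumes the deltas use the RCS `diff -n` command form, where `aN M` adds the next M lines after original line N and `dN M` deletes M lines starting at original line N, and that the commands appear in ascending line order. The names `split_lines` and `apply_rcs_delta` are hypothetical. It works on bytes and builds a new list in one pass rather than editing the list in place, so the cost is linear in the file size plus the delta size.

```python
def split_lines(data):
    """Split bytes into lines on b"\n" only, keeping each terminator."""
    parts = data.split(b"\n")
    lines = [p + b"\n" for p in parts[:-1]]
    if parts[-1]:
        # Final line without a trailing newline.
        lines.append(parts[-1])
    return lines


def apply_rcs_delta(lines, delta):
    """Apply an RCS-style delta (assumed `aN M` / `dN M` commands)
    to a list of lines and return the resulting new list."""
    script = split_lines(delta)
    out = []
    src = 0  # 0-based index of the next unread line in `lines`
    i = 0
    while i < len(script):
        cmd = script[i]
        i += 1
        op = cmd[:1]
        start, count = (int(x) for x in cmd[1:].split())
        if op == b"d":
            # Copy the untouched lines before the deleted range, then skip it.
            out.extend(lines[src:start - 1])
            src = start - 1 + count
        elif op == b"a":
            # Copy everything up to and including original line `start`,
            # then append the `count` new lines that follow the command.
            out.extend(lines[src:start])
            src = max(src, start)
            out.extend(script[i:i + count])
            i += count
        else:
            raise ValueError("unknown delta command: %r" % cmd)
    # Copy whatever remains after the last command.
    out.extend(lines[src:])
    return out


old = split_lines(b"one\ntwo\nthree\nfour\n")
new = apply_rcs_delta(old, b"d2 1\na2 1\nTWO\na4 1\nfive\n")
```

Because all line numbers refer to the original text, a single forward pass avoids the repeated shifting that `del lines[a:b]` and slice insertion would cause on a large list.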

The strings have embedded @'s doubled as an escape sequence. Is there an
efficient way to convert these back to single @'s?
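
As an illustration of the unescaping step, the built-in `replace` method may already be enough. It scans left to right and replaces non-overlapping matches, so each doubled pair collapses to one `@`. The helper name `unescape_at` is hypothetical, and the sketch assumes the parser hands over the raw, still-escaped bytes.

```python
def unescape_at(data):
    """Collapse each doubled b"@@" back to a single b"@"."""
    return data.replace(b"@@", b"@")


sample = unescape_at(b"user@@example.com and @@@@")
```

Doing this once on the whole string, before splitting into lines, keeps it to a single pass over the data.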

After each diff is applied, I need to convert the array of lines back into
a string, generate a SHA-1 over it, compress it with zlib, and
finally write it to disk.

The 10GB of data is Mozilla CVS when fully expanded.

Thanks for any tips on how to do this.

Jon Smirl
jonsmirl at gmail.com



