[Python-Dev] Cost-Free Slice into FromString constructors--Long

Raymond Hettinger rhettinger at ewtllc.com
Thu May 25 17:33:35 CEST 2006


Runar Petursson wrote:

> We've been talking this week about ideas for speeding up the parsing 
> of Longs coming out of files or network.  The use case is having a 
> large string with embedded longs and parsing them into real longs.  One 
> approach would be to use a simple slice:
> long(mystring[x:y])
>
> an expensive operation in a tight loop.  The proposed solution is to 
> add further keyword arguments to Long (such as):
>
> long(mystring, base=10, start=x, end=y)
>
> The start/end would allow for negative indexes, as slices do, but 
> otherwise simply limit the scope of the parsing.  There are other 
> solutions, using buffer-like objects and such, but this seems like a 
> simple win for anyone parsing a lot of text.  I implemented it in a 
> branch  runar-longslice-branch, but it would need to be updated with 
> Tim's latest improvements to long.  Then you may ask, why not do it 
> for everything else parsing from string--to which I say it should.  
> Thoughts?


-1.  This is a somewhat specialized parsing application and should not be 
allowed to muck up an otherwise simple, general-purpose API.   
Micro-optimizations do not warrant API changes.   Certainly, it should 
not be propagated to everything that can parse from a string.  Also, you 
are likely to find that the cost of varargs and kwargs will offset the 
slicing gains.

I think the whole notion is off base.  The int(mystring[x:y]) situation 
is only important when you're doing it many times.  Often, you'll be 
doing other conversions as well.  So, you would be better off creating a 
text version of the struct module that would enable you to extract and 
convert many elements from a long record stored as text.   Alternatively, 
you could expose a function styled after fscanf().  IOW, better to 
provide a general-purpose text parsing tool than to muck up the whole 
language for one specialized application.
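
A rough sketch of what a struct-like tool for text records might look 
like, in pure Python.  The TextStruct name, its (name, start, end, 
converter) field layout, and the sample record are all hypothetical, 
made up for illustration; no such module exists in the standard library.  
A real version would presumably be written in C so that the per-field 
slices never become Python objects.

```python
class TextStruct:
    """Hypothetical fixed-width text record parser (illustrative only)."""

    def __init__(self, fields):
        # Each field is (name, start, end, converter); the slice objects
        # are built once so they can be reused for every record.
        self._fields = [
            (name, slice(start, end), conv)
            for name, start, end, conv in fields
        ]

    def unpack(self, record):
        # Extract and convert every field of one record in a single call.
        return {name: conv(record[sl]) for name, sl, conv in self._fields}


# Assumed layout: 6-char integer id, 8-char float price, 4-char code.
record_format = TextStruct([
    ("id", 0, 6, int),
    ("price", 6, 14, float),
    ("code", 14, 18, str.strip),
])

line = "000042   19.99AB  "
result = record_format.unpack(line)
```

The point of this shape is that the format is described once and then 
applied to many records, so int() and the other constructors keep their 
existing signatures.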


Raymond

