Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Nov 1 03:10:50 EDT 2011


On Mon, 31 Oct 2011 22:12:26 -0400, Dave Angel wrote:

> I would claim that a well-written (in C) translate function, without
> using the delete option, should be much quicker than any python loop,
> even if it does copy the data.

I think you are selling short the speed of the Python interpreter. Even 
for short strings, it's faster to iterate over a string in Python 3 than 
to copy it with translate:

>>> from timeit import Timer
>>> t1 = Timer('for c in text: pass', 'text = "abcd"')
>>> t2 = Timer('text.translate(mapping)', 
...     'text = "abcd"; mapping = "".maketrans("", "")')
>>> min(t1.repeat())
0.450606107711792
>>> min(t2.repeat())
0.9279451370239258
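
The session above only times a four-character string. To repeat the comparison at other lengths, something like the following rough sketch could be used. The helper name `compare` and the lengths tried are made up for illustration, the `globals=` argument to `Timer` assumes Python 3.5 or later, and the numbers it prints will depend on the machine and interpreter version:

```python
from timeit import Timer

def compare(text, number=10000, repeat=3):
    """Return (iterate_seconds, translate_seconds), best of `repeat` runs.

    Hypothetical helper, not part of the original session.
    """
    # An empty mapping, so translate() copies the string unchanged.
    env = {"text": text, "mapping": str.maketrans("", "")}
    iterate = Timer("for c in text: pass", globals=env)
    translate = Timer("text.translate(mapping)", globals=env)
    return (min(iterate.repeat(repeat, number)),
            min(translate.repeat(repeat, number)))

# Try a few string lengths; the lengths are arbitrary.
for length in (4, 400, 40000):
    it, tr = compare("abcd" * (length // 4), number=100)
    print(length, it, tr)
```
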


> Incidentally, on the Pentium family,
> there's a machine instruction for that, to do the whole loop in one
> instruction (with rep prefix).

I'm pretty sure that there isn't a machine instruction for copying an 
entire terabyte of data in one step. Since the OP explicitly said he was 
checking text up to a TB in size, whatever solution is used has to scale 
well.
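
As an illustration of what "scales well" might mean here, the following is a minimal sketch that reads the data in fixed-size chunks, so memory use stays bounded however large the input is. It is only one possible approach, not something proposed earlier in the thread. The function name `has_disallowed_bytes`, the 1 MiB chunk size, and the use of a compiled `re` pattern are all assumptions, and the allowed set is taken literally from the subject line (bytes 32-127 plus CR, LF and Tab):

```python
import io
import re

# Matches any byte that is NOT in 32-127 and is not CR, LF or Tab.
_DISALLOWED = re.compile(rb"[^\x20-\x7f\r\n\t]")

def has_disallowed_bytes(stream, chunk_size=1 << 20):
    """Return True if the binary stream contains any disallowed byte.

    Hypothetical helper: `stream` is any object with a read(n) method
    returning bytes, such as a file opened with mode 'rb'.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of input reached without finding an offending byte.
            return False
        if _DISALLOWED.search(chunk):
            # Stop at the first offending byte; the rest is never read.
            return True

print(has_disallowed_bytes(io.BytesIO(b"plain text\r\n\tok")))
print(has_disallowed_bytes(io.BytesIO(b"bad \x00 byte")))
```

Because each byte is tested on its own, a chunk boundary cannot split a match, so no overlap between chunks is needed.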



-- 
Steven


