Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Dave Angel d at davea.name
Mon Oct 31 18:08:00 EDT 2011


On 10/31/2011 05:47 PM, Dave Angel wrote:
> On 10/31/2011 03:54 PM, python at bdurham.com wrote:
>> Wondering if there's a fast/efficient built-in way to determine
>> if a string has non-ASCII chars outside the range ASCII 32-127,
>> CR, LF, or Tab?
>>
>> I know I can look at the chars of a string individually and
>> compare them against a set of legal chars using standard Python
>> code (and this works fine), but I will be working with some very
>> large files in the 100's Gb to several Tb size range so I'd
>> thought I'd check to see if there was a built-in in C that might
>> handle this type of check more efficiently.
>>
>> Does this sound like a use case for cython or pypy?
>>
>> Thanks,
>> Malcolm
>>
> How about doing a .replace() method call, with all those characters 
> turning into '', and then see if there's anything left?
>
>
>
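I was wrong once again.  But a simple combination of the translate() and 
split() methods might do it.  Here I'm suggesting that the translation 
table replace every valid character with a space, so that split() can use 
its default behavior of splitting on whitespace.  If split() then returns 
an empty list, the string held only valid characters; anything left over 
is made up of invalid ones.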
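
A minimal sketch of that idea in Python 3, working on bytes since the 
data comes from files.  The names ALLOWED, TABLE, has_invalid_bytes and 
stream_has_invalid_bytes are made up for illustration, and I am assuming 
that "32-127" is meant inclusively.  One wrinkle: the default split() 
also treats vertical tab and form feed as whitespace, so the table maps 
every invalid byte to a non-whitespace marker instead of leaving it alone.

```python
# Assumption: valid bytes are ASCII 32-127 inclusive, plus CR, LF and Tab.
ALLOWED = bytes(range(32, 128)) + b"\r\n\t"

# 256-entry translation table: valid bytes become a space, every other
# byte becomes "X" so that invalid whitespace (e.g. \x0b, \x0c) still
# survives the default split().
TABLE = bytes(0x20 if b in ALLOWED else ord("X") for b in range(256))


def has_invalid_bytes(data):
    """Return True if data contains any byte outside ALLOWED."""
    # After translation, only invalid bytes are non-whitespace, so split()
    # returns a non-empty list exactly when an invalid byte was present.
    return bool(data.translate(TABLE).split())


def stream_has_invalid_bytes(stream, chunk_size=1 << 20):
    """Check a binary file object chunk by chunk; stop at the first hit."""
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        if has_invalid_bytes(chunk):
            return True
    return False
```

For a very large file, open it with open(path, "rb") and pass the file 
object to stream_has_invalid_bytes(), so that only one chunk is in memory 
at a time.  Whether this is fast enough for terabyte-sized inputs would 
have to be measured.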

-- 

DaveA



