Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Duncan Booth duncan.booth at invalid.invalid
Tue Nov 1 14:54:09 EDT 2011


Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:

> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
> MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128))
> 
> # Untested
> def is_ascii_text(text):
>     for c in text:
>         n = ord(c)
>         if n >= len(MASK) or MASK[n] == '\0': return False
>     return True
> 
> 
> Optimizing it is left as an exercise :)
> 

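# Untested
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
# MASK[n] is True when chr(n) is an allowed character
MASK = [chr(n) in LEGAL for n in range(128)]

def is_ascii_text(text):
    try:
        # all() stops at the first disallowed character; any ord(c) >= 128
        # falls off the end of MASK and raises IndexError
        return all(MASK[ord(c)] for c in text)
    except IndexError:
        return False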
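The subject line asks for a more built-in way. The sketch below is an alternative that does not come from the post. It assumes Python 3 and the same allowed set as above (ASCII 32-127 plus \n, \r, \t, \f). It hands the loop to a frozenset or to a single precompiled regex. The names LEGAL_SET, is_ascii_text_set, _ILLEGAL and is_ascii_text_re are hypothetical.

```python
import re

# Hypothetical names; same allowed set as the post: chr(32)..chr(127) plus \n \r \t \f
LEGAL_SET = frozenset(chr(n) for n in range(32, 128)) | frozenset('\n\r\t\f')

def is_ascii_text_set(text):
    """Return True if every character of text is in LEGAL_SET."""
    # issuperset() accepts any iterable, so the string can be passed directly
    return LEGAL_SET.issuperset(text)

# One character class matching anything outside the allowed set
_ILLEGAL = re.compile(r'[^\x20-\x7f\n\r\t\f]')

def is_ascii_text_re(text):
    """Return True if text contains no disallowed character."""
    # search() stops at the first match, i.e. the first disallowed character
    return _ILLEGAL.search(text) is None
```

Both should agree with is_ascii_text above. For example, "Hello\r\n\tworld" passes and "caf\u00e9" fails. Which one is fastest depends on the input and the interpreter, so time it on representative data before choosing.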


-- 
Duncan Booth http://kupuguy.blogspot.com



More information about the Python-list mailing list