regular expression, unicode

Simon Strobl Simon.Strobl at gmail.com
Wed Apr 29 07:44:12 EDT 2009


Hello,

Why can't I use this pattern

good = re.compile("^[A-ZÄÖÜ].*")

in Python 3? According to the documentation, patterns may be Unicode
strings.

I get this error message:

Traceback (most recent call last):
  File "./get.py", line 8, in <module>
    for line in sys.stdin:
  File "/usr/lib64/python3.0/io.py", line 1734, in __next__
    line = self.readline()
  File "/usr/lib64/python3.0/io.py", line 1808, in readline
    while self._read_chunk():
  File "/usr/lib64/python3.0/io.py", line 1557, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "/usr/lib64/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

Simon
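
Note that the traceback ends in the `for line in sys.stdin:` loop and in the UTF-8 decoder, not in `re.compile`, so the pattern itself may not be what fails. One possible explanation, which is only an assumption here since the post does not say how the input is encoded, is that the input is not valid UTF-8 (for example Latin-1). A minimal sketch of reading a byte stream with an explicit encoding and applying the pattern from the post; the helper name `matching_lines` and the sample data are hypothetical:

```python
import io
import re

# Pattern from the post: a line starting with an uppercase ASCII letter or Ä, Ö, Ü.
good = re.compile("^[A-ZÄÖÜ].*")

def matching_lines(raw, encoding="latin-1"):
    """Decode a binary stream with an explicit encoding and yield matching lines.

    The default encoding is an assumption; the post does not state the
    encoding of the input.
    """
    text = io.TextIOWrapper(raw, encoding=encoding)
    for line in text:
        if good.match(line):
            yield line.rstrip("\n")

# Stand-in for sys.stdin.buffer: Latin-1 bytes that are not valid UTF-8.
sample = io.BytesIO("Äpfel\nbirnen\nÜbung\n".encode("latin-1"))
result = list(matching_lines(sample))
print(result)
```

With real input, `sys.stdin.buffer` could be passed in place of `sample`. If the data turns out to be UTF-8 after all, the same helper can be called with `encoding="utf-8"`.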


