regular expression, unicode

MRAB google at mrabarnett.plus.com
Wed Apr 29 19:39:12 EDT 2009


Simon Strobl wrote:
> Hello,
> 
> why can't I use this pattern
> 
> good = re.compile("^[A-ZÄÖÜ].*")
> 
> in Python 3? According to the documentation, patterns may be Unicode
> strings.
> 
> I get this error message:
> 
> Traceback (most recent call last):
>   File "./get.py", line 8, in <module>
>     for line in sys.stdin:
>   File "/usr/lib64/python3.0/io.py", line 1734, in __next__
>     line = self.readline()
>   File "/usr/lib64/python3.0/io.py", line 1808, in readline
>     while self._read_chunk():
>   File "/usr/lib64/python3.0/io.py", line 1557, in _read_chunk
>     self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
>   File "/usr/lib64/python3.0/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
> invalid data
> 
In Python 3, .py files are assumed to be encoded in UTF-8 unless declared
otherwise by a comment on the first or second line of the file, such as:

# -*- coding: cp1252 -*-

You need to check what encoding your editor is using (if possible use
UTF-8).
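
One way to check is to look at the raw bytes. The following is only a
minimal sketch: the helper name is_valid_utf8 is made up for illustration,
and cp1252 is just an assumed example of a non-UTF-8 editor encoding. It
encodes the line from your script both ways and tests whether each result
decodes as UTF-8.

```python
import re


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The line from the original script, as text.
pattern_source = 'good = re.compile("^[A-ZÄÖÜ].*")\n'

# The same line as an editor might save it in two different encodings.
as_utf8 = pattern_source.encode("utf-8")
as_cp1252 = pattern_source.encode("cp1252")  # assumed example encoding

print(is_valid_utf8(as_utf8))
print(is_valid_utf8(as_cp1252))

# Once the text is decoded correctly, the pattern itself is fine.
good = re.compile("^[A-ZÄÖÜ].*")
print(bool(good.match("Über")))
```

To test a real file, read it with open(path, "rb").read() and pass the
result to the helper. The same check can be applied to the bytes fed to the
script on standard input, since the traceback shows the failure while
reading sys.stdin.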


