python3, regular expression and bytes text

Eko palypse ekopalypse at gmail.com
Sat Oct 12 15:57:06 EDT 2019


> You cannot. First, \w in re.LOCALE works only when the text is encoded 
> with the locale encoding (cp1252 in your case). Second, re.LOCALE 
> supports only 8-bit charsets. So even if you set the utf-8 locale, it 
> would not help.
> 
> Regular expressions with re.LOCALE are slow. It may be more efficient to 
> decode the text and use a Unicode regular expression.

Thank you. I guess I'm convinced to always decode everything (re pattern and text) from UTF-8 to str internally and then do the re search. But then I would need to figure out the correct position in the original bytes, hmm. Some ongoing investigation is needed, I guess.
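
A minimal sketch of that idea, assuming the data is valid UTF-8. The helper name search_bytes is hypothetical (it is not part of the re module). It decodes both pattern and data, searches with a Unicode regular expression, and converts the character offsets of the match back to byte offsets by re-encoding the text in front of the match:

```python
import re


def search_bytes(pattern: bytes, data: bytes, encoding: str = "utf-8"):
    """Search data with a Unicode regex; return (start, end) as byte offsets.

    Assumption: both pattern and data are valid in the given encoding.
    """
    text = data.decode(encoding)
    match = re.search(pattern.decode(encoding), text)
    if match is None:
        return None
    # match.start() counts characters, not bytes. Re-encoding the prefix
    # gives the number of bytes that precede the match in the original data.
    start = len(text[:match.start()].encode(encoding))
    end = start + len(match.group().encode(encoding))
    return start, end


data = "größe wörter".encode("utf-8")
span = search_bytes(rb"w\w+", data)
print(span, data[span[0]:span[1]])
```

Re-encoding the prefix costs time proportional to the match position, so for many matches in a large buffer it would probably be better to walk the text once and keep a running byte count.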

Thx
Eren


