ascii-unicode replacement
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Thu Apr 5 15:57:59 EDT 2007
On Thu, 05 Apr 2007 14:28:20 -0300, Andrea Valle <andrea.valle at unito.it>
wrote:
> I scripted some text files with another language which cannot handle
> unicode.
> As I need special character in the resulting text files (IPA
> extension), my idea was to define some special ascii sequences in the
> text files, open the text files in Python, replace the special
> sequences with unicode and encode in utf8. I made some tests with
> the console and everything seemed fine.
>
> But my script keeps on raising exceptions related to encoding.
You are mixing unicode objects and byte strings throughout the script.
I prefer to use standard file objects, with explicit decoding/encoding
right where the data is being read/written, rather than the codecs.open
approach.
Your source file is ASCII, right? So read it using the builtin open()
+ read(). You get a byte string. Convert it to unicode right there, using
read_text.decode("ascii"). You have unicode now.
Do all the processing and replacements in Unicode.
At the LAST stage, encode the text using .encode("utf-8") just before
writing the output file.
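The steps above could look roughly like the sketch below. The placeholder
sequences ({schwa}, {esh}), the function names and the file paths are made up
for illustration; substitute whatever ASCII sequences your other tool
actually emits. Files are opened in binary mode so that decoding and encoding
happen explicitly, exactly once each.

```python
# Hypothetical ASCII placeholders mapped to IPA characters.
# These names are assumptions, not taken from the original script.
REPLACEMENTS = {
    u"{schwa}": u"\u0259",  # LATIN SMALL LETTER SCHWA
    u"{esh}": u"\u0283",    # LATIN SMALL LETTER ESH
}


def replace_placeholders(text):
    """Replace every placeholder in a unicode string with its IPA character."""
    for placeholder, char in REPLACEMENTS.items():
        text = text.replace(placeholder, char)
    return text


def convert(src_path, dst_path):
    # Read raw bytes and decode to unicode right away.
    with open(src_path, "rb") as src:
        text = src.read().decode("ascii")

    # All processing happens on unicode.
    text = replace_placeholders(text)

    # Encode only at the last stage, just before writing.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
```

Calling convert("input.txt", "output.txt") would then read the ASCII file and
write a UTF-8 file with the placeholders replaced. If the input ever contains
a non-ASCII byte, the decode step raises UnicodeDecodeError at the point of
reading, which makes the problem easy to locate.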
--
Gabriel Genellina