ascii-unicode replacement

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Apr 5 15:57:59 EDT 2007


On Thu, 05 Apr 2007 14:28:20 -0300, Andrea Valle <andrea.valle at unito.it>  
wrote:

> I scripted some text files with another language which cannot handle
> unicode.
> As I need special characters in the resulting text files (IPA
> extension), my idea was to define some special ascii sequences in the
> text files, open the text files in Python, replace the special
> sequences with unicode and encode in utf8. I made some tests in the
> console and everything seemed fine.
>
> But my script keeps on raising exceptions related to encoding.

You are mixing unicode objects and byte strings throughout the script.

I prefer to use standard file objects, with explicit decoding/encoding  
right where the data is being read/written, rather than the codecs.open  
approach.

Your source file is ASCII, right? So read it using the builtin open()  
+ read(). You get a byte string. Convert it to unicode right there, using  
read_text.decode("ascii"), where read_text is whatever name holds the  
string you just read. You have unicode now.
Do all the processing and replacements in Unicode.
At the LAST stage, encode the text using .encode("utf-8") just before  
writing the output file.
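
The steps above could look roughly like this. It is only a sketch: the  
convert() function, the REPLACEMENTS table and the {schwa}/{esh}  
placeholders are made up for illustration, since I don't know which  
sequences you actually chose. The files are opened in binary mode so that  
read() returns raw bytes and all decoding/encoding stays explicit.

```python
# Hypothetical ASCII placeholders mapped to IPA characters (illustrative only;
# substitute the sequences your other program really emits).
REPLACEMENTS = {
    u"{schwa}": u"\u0259",  # LATIN SMALL LETTER SCHWA
    u"{esh}": u"\u0283",    # LATIN SMALL LETTER ESH
}


def convert(src_path, dst_path):
    # Read raw bytes; binary mode avoids any implicit decoding.
    with open(src_path, "rb") as f:
        raw = f.read()

    # Decode once, right after reading. This raises UnicodeDecodeError
    # if the input is not pure ASCII, which is a useful early check.
    text = raw.decode("ascii")

    # Do all processing on Unicode text.
    for placeholder, char in REPLACEMENTS.items():
        text = text.replace(placeholder, char)

    # Encode once, at the last stage, just before writing.
    with open(dst_path, "wb") as f:
        f.write(text.encode("utf-8"))
```

Calling convert("in.txt", "out.txt") (file names are placeholders too)  
would then write the UTF-8 version of the input with the sequences replaced.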

-- 
Gabriel Genellina



