8-bit cleanliness

Rafael Cordones Marcos rcm at bcnartdirecte.com
Sun Jun 11 13:08:45 EDT 2000


On Sun, Jun 11, 2000 at 06:54:53PM +0200, Thomas Wouters wrote:
> On Sun, Jun 11, 2000 at 05:55:36PM +0200, Rafael Cordones Marcos wrote:
> 
> > I just started using Python a few days ago because I got fed up with so much convoluted Perl code. ;)
> > Anyway, I have to read some text files and process the words appearing in them. I have discovered, to
> > my surprise, that accented characters (á, à, ...) get replaced by a 4-character code. Is there any class/module/option
> > available to read *text* files with non-English characters in them?
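For readers on Python 3, here is a minimal sketch of reading a text file with an explicit encoding so that accented words come through intact. It assumes the files were saved as Latin-1 (ISO-8859-1), which matches the \341 byte for 'á' that appears further down in this thread; the function name read_words, the ENCODING constant and the sample words are hypothetical.

```python
import os
import tempfile

# Assumption: the input files were saved as Latin-1 (ISO-8859-1), where
# byte 0xE1 (octal 341) is 'á'. Use whatever encoding your files really use.
ENCODING = "latin-1"


def read_words(path, encoding=ENCODING):
    """Return the whitespace-separated words of a text file (hypothetical helper)."""
    # An explicit encoding makes decoding independent of the current locale.
    with open(path, encoding=encoding) as f:
        return f.read().split()


# Write a small sample file so the sketch runs on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding=ENCODING) as f:
    f.write("\xe1rbol \xe0gil\n")   # the text "árbol àgil"

try:
    words = read_words(path)
finally:
    os.remove(path)

print(len(words))                        # 2
print(words == ["\xe1rbol", "\xe0gil"])  # True: the accents survive intact
```

If the encoding argument is left out, Python 3 falls back to the locale's preferred encoding, which is exactly the kind of environment-dependent behaviour the explicit argument avoids.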

Thanks for the quick response!! And sorry for the tone of my email, but I had spent
several days working full time on this, only to conclude that Python was mangling
accented characters... which it turns out it was not doing at all!!

What happened is that I had added print statements to follow some data structures around.
I had printed a hash (a Python dictionary), and the keys in that hash appeared as \XXX codes.
But if I print the key itself, everything is fine!
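The effect Rafa describes can be reproduced in Python 3 with a short sketch: printing a container formats each element with repr(), while printing a single string uses str(). Python 3's repr() no longer escapes printable accented characters, so this sketch uses a tab to show the escaping and ascii() to imitate what the old 8-bit strings looked like; the names record, key and word are made up for the example.

```python
# Python 3 sketch. print() on a container formats each element with repr();
# print() on a single string uses str(), which is the text itself.
record = {"tab\there": 1}

print(record)          # {'tab\there': 1}  <- repr() escapes the tab
key = next(iter(record))
print(len(key))        # 8: the key holds one real tab, not a backslash and a 't'

# Python 3's repr() leaves printable accents alone; ascii() escapes them,
# which mimics what Python 1.5/2.0 printed for 8-bit strings (hex here, not octal).
word = "\xe1rbol"      # the same text as "árbol"
print(ascii(word))     # '\xe1rbol'
print(len(word))       # 5: the escape stands for a single character
```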

Back to code!

Rafa


> Those non-ASCII characters do not get replaced by that 4-character code;
> those non-ASCII characters *are* that 4-character code ;-) If you use repr()
> on strings, non-printable characters are written as octal escapes, so that
> they can be reproduced reliably:
> 
> >>> s = "áááárgh"
> >>> s			# which is the same as repr(s), in interactive mode
> '\341\341\341\341rgh'
> 
> >>> print s
> áááárgh
> 
> \341 is the 'accurate' representation of the 'á' character: it always
> stands for the same byte value, regardless of your font settings.
> How it is displayed depends on your font or your locale settings, depending
> on what you use to view it ;-)
> 
> Just treat your strings as data, as you should anyway, and all will end up
> fine. Just be sure not to use 'repr()' (or ``) when you really mean 'print'
> or 'str()'.
> 
> -- 
> Thomas Wouters <thomas at xs4all.net>
> 
> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
> 
> -- 
> http://www.python.org/mailman/listinfo/python-list
> 
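Thomas's point that the escape *is* the character can be checked numerically. Python 2 strings were byte strings, so bytes is the closest Python 3 equivalent; this minimal sketch assumes Latin-1 as the encoding that maps byte 0xE1 to 'á', and the names raw and text are illustrative only.

```python
# Python 3 sketch: an 8-bit string from 2000 corresponds to bytes today.
# Latin-1 is assumed as the encoding that maps byte 0xE1 to 'á'.
raw = b"\341\341\341\341rgh"   # octal escapes, exactly as in the repr() above

print(0o341, 0xE1)             # 225 225: octal 341 and hex E1 are the same byte
print(raw[0])                  # 225
print(repr(raw))               # b'\xe1\xe1\xe1\xe1rgh'  (Python 3 shows hex)

text = raw.decode("latin-1")   # interpret the bytes as Latin-1 text
print(len(raw), len(text))     # 7 7: one byte per accented character
print(text.encode("latin-1") == raw)   # True: the round trip is lossless
```

Only the notation of the escape changed between versions (octal then, hex now); the underlying byte, 225, is the same.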

-- 
Linux! The Choice of a GNU Generation! -> http://www.debian.org
Unix IS user-friendly, it just chooses its friends very carefully.




More information about the Python-list mailing list