Character encodings and codecs

Alex Martelli aleax at aleax.it
Sat Feb 1 15:15:45 EST 2003


Grumfish wrote:
   ...
> So I would have to read it in byte by byte and manually check when I
> can make a break. There is no Python module that would make this
> easier. I thought that's what the codec module does but I can't really
> understand it.
> 
> The specific project I'm working on now would require reading EUC-JP,
> storing characters internally as Unicode, and writing UTF-8.

The codecs module does exactly this, as long as you have an EUC-JP
codec installed of course -- just use codecs.open to open your files,
specifying each file's encoding -- data in memory is Unicode,
and you can read line by line, for example (with the readline method).
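
For instance, here is a minimal sketch of that approach. The function
name transcode and the file paths are made up for illustration, and it
assumes a codec registered under the name "euc_jp" is available (either
installed separately or shipped with your Python version):

```python
import codecs

def transcode(src_path, dst_path, src_encoding="euc_jp", dst_encoding="utf-8"):
    """Copy src_path to dst_path line by line, converting the encoding.

    Hypothetical helper: returns the number of lines copied.
    """
    # codecs.open decodes on read and encodes on write, so everything
    # handled in between is Unicode text, never raw bytes.
    src = codecs.open(src_path, "r", encoding=src_encoding)
    dst = codecs.open(dst_path, "w", encoding=dst_encoding)
    count = 0
    try:
        while True:
            line = src.readline()   # one decoded (Unicode) line
            if not line:            # empty string means end of file
                break
            dst.write(line)         # encoded as UTF-8 when written
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

You would call it as transcode("input.txt", "output.txt"), with the
file names being placeholders. The codec takes care of multi-byte
character boundaries, so there is no need to inspect the input byte
by byte.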

http://www.python.jp/Zope/download/JapaneseCodecs
I think has Japanese codecs available for download, but I'm
not sure because the instructions are in (I believe) Japanese,
and I cannot read that language.


Alex




