Proper use of the codecs module.

Andrew andrew at invalid.invalid
Fri Aug 16 10:02:08 EDT 2013


I have a mixed binary/text file[0], and the text portions use a radically
nonstandard character set. I want to read them easily given information
about the character encoding and an offset for the beginning of a string. 

The descriptions of the codecs module and codecs.register() in particular
seem to suggest that this is already supported in the standard library.
However, I can't find any examples of its proper use. Most people who use
the module seem to want to read UTF-encoded files in Python 2.x.[1] I would like to
know how to correctly set up a new codec for reading files that have
nonstandard encodings. 
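
To make the question concrete, here is a minimal sketch of what registering
such a codec might look like. The codec name "snes_text" and the byte table
are made up for illustration (the real table is not given here), and only
the stateless encode/decode pair is filled in, so treat it as a starting
point under those assumptions rather than a reference implementation.

```python
import codecs

CODEC_NAME = "snes_text"  # hypothetical name, not a real registered codec

# Hypothetical table: byte value -> character. Invented for illustration.
DECODE_TABLE = {0x01: "A", 0x02: "B", 0x03: "C", 0x10: " "}
ENCODE_TABLE = {ch: byte for byte, ch in DECODE_TABLE.items()}


def decode(data, errors="strict"):
    """Stateless decoder: returns (text, number_of_bytes_consumed)."""
    data = bytes(data)
    out = []
    for i, byte in enumerate(data):
        ch = DECODE_TABLE.get(byte)
        if ch is None:
            if errors == "strict":
                raise UnicodeDecodeError(
                    CODEC_NAME, data, i, i + 1, "byte not in table")
            ch = "\ufffd"  # simplification: any other mode substitutes U+FFFD
        out.append(ch)
    return "".join(out), len(data)


def encode(text, errors="strict"):
    """Stateless encoder: returns (bytes, number_of_characters_consumed)."""
    out = bytearray()
    for i, ch in enumerate(text):
        byte = ENCODE_TABLE.get(ch)
        if byte is None:
            # Simplification: unmappable characters always raise here.
            raise UnicodeEncodeError(
                CODEC_NAME, text, i, i + 1, "character not in table")
        out.append(byte)
    return bytes(out), len(text)


def search(name):
    """Search function for codecs.register(); returns None for other names."""
    if name.replace("-", "_") == CODEC_NAME:
        return codecs.CodecInfo(encode=encode, decode=decode, name=CODEC_NAME)
    return None


codecs.register(search)

decoded = b"\x01\x02\x03".decode("snes_text")
encoded = "AB".encode("snes_text")
```

Once the search function is registered, the name can be passed to
bytes.decode() and str.encode() like any built-in encoding. Using it with
open(..., encoding=...) would additionally need incremental or stream
classes in the CodecInfo, which this sketch leaves out.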

I have two other related questions: 

How does seek() work on a file opened in text mode? Does it seek to a
character offset or to a byte offset? I need the latter behavior. If I
can't get it I will have to find a different approach. 
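
One possible different approach, sketched under the assumption that the
file is opened in binary mode ("rb") rather than text mode: there, seek()
takes a plain byte offset, whereas the io documentation only supports text
mode seeks to 0 or to a value previously returned by tell(). The helper
name read_bytes_at and the sample data are hypothetical.

```python
import os
import tempfile


def read_bytes_at(path, offset, length):
    """Return `length` raw bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:  # "rb": no decoding, no newline translation
        f.seek(offset)           # absolute byte offset from the start
        return f.read(length)


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02HELLO\xff")
    demo_path = tmp.name

chunk = read_bytes_at(demo_path, 3, 5)
os.remove(demo_path)
```

The bytes that come back can then be decoded separately, which keeps the
offset arithmetic independent of the character encoding.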

The files I'm working with use a nonstandard end-of-string character in the
same fashion as C null-terminated strings. Is there a built-in function that
will read a file "from seek position until seeing EOS character X"? The
methods I see for this online seem to amount to reading one character at a
time and checking manually, which seems inefficient to me. 
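
For comparison, a sketch of a chunked alternative to the
one-character-at-a-time loop: read a block at a time and let bytes.find()
locate the terminator. It assumes a binary, seekable file object and a
single-byte terminator; read_until and the 0xFF terminator in the example
are hypothetical, not something the standard library provides.

```python
import io


def read_until(f, terminator, chunk_size=4096):
    """Read from f's current position up to, but not including, the
    single-byte `terminator`. Leaves f positioned just after the
    terminator. Returns everything up to EOF if it is never found."""
    start = f.tell()
    buf = bytearray()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return bytes(buf)  # hit EOF without seeing the terminator
        idx = chunk.find(terminator)
        if idx != -1:
            buf += chunk[:idx]
            # Rewind to just past the terminator so the next read
            # starts at the following string.
            f.seek(start + len(buf) + len(terminator))
            return bytes(buf)
        buf += chunk


rom = io.BytesIO(b"\x00\x00HELLO\xffWORLD\xff")
rom.seek(2)
first = read_until(rom, b"\xff")
second = read_until(rom, b"\xff")
```

For a file that fits comfortably in memory, reading it once into a bytes
object and calling data.index(terminator, offset) would do the same job
with less bookkeeping.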


[0] The file is an SNES ROM dump, but I don't think that matters. 
[1] I'm using Python 3, if it's relevant. 

-- 

Andrew


