overriding character escapes during file input

David J Birnbaum djbpitt+python at pitt.edu
Sat Sep 2 23:33:18 EDT 2006


Dear Python-list,

I need to read a Unicode (utf-8) file that contains text like:
> blah \fR40\fC blah
I get my input and then process it with something like:
> import codecs
> import sys
>
> inputFile = codecs.open(sys.argv[1], 'r', 'utf-8')
>
> for line in inputFile:
>     pass  # per-line processing goes here
When Python encounters the "\f" substring in an input line, it wants to 
treat it as an escape sequence representing a form-feed control 
character, which means that it gets interpreted as (or, from my 
perspective, translated to) "\x0c". Were I entering this string myself 
within my program code, I could use a raw string (r"\f") to avoid this 
translation, but I don't know how to do this when I am reading a line 
from a file. If all I cared about was getting my code to work, I could 
simply let the translation take place and then undo it within my 
program, but, as Humpty Dumpty said, "it's a question of which is to be 
master," and I would prefer to coerce Python into reading the line the 
way I want it to be read, rather than let it do as it pleases and then 
clean up afterwards.
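
One way to narrow this down is to check what actually arrives from the
file. The sketch below is not taken from the program described above. It
assumes a sample file that contains a literal backslash followed by "f",
and the names read_lines and demo are hypothetical. It also uses io.open,
which needs a newer Python than 2.4; there,
codecs.open(path, 'r', 'utf-8') plays the same role. Escape sequences
such as \f are processed only in string literals in source code, so the
line read from the file should still contain the backslash.

```python
import io
import os
import tempfile


def read_lines(path):
    """Return the lines of a UTF-8 text file as Unicode strings."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        return [line.rstrip(u'\n') for line in handle]


def demo():
    # Write the sample bytes exactly as they would sit on disk:
    # a backslash (0x5c) followed by the letter f, not a form feed.
    sample = b'blah \\fR40\\fC blah\n'
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'wb') as handle:
            handle.write(sample)
        return read_lines(path)[0]
    finally:
        os.remove(path)


line = demo()
print(repr(line))                 # shows doubled backslashes if they survived
has_form_feed = u'\x0c' in line   # True only if a form feed was produced
```

If repr(line) shows "\\f" and has_form_feed is False, the reading step has
left the text alone, and any "\x0c" seen later is more likely to come from
a non-raw string literal or pattern elsewhere in the program.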

Can anyone advise?

In case it matters, I'm using ActivePython 2.4 under Windows XP.

Thanks,

David
djbpitt+python at pitt.edu


