overriding character escapes during file input

John Machin sjmachin at lexicon.net
Sun Sep 3 04:44:11 EDT 2006


John Machin wrote:
> David J Birnbaum wrote:
> > Dear Python-list,
> >
> > I need to read a Unicode (utf-8) file that contains text like:
> > > blah \fR40\fC blah
> > I get my input and then process it with something like:
> > > inputFile = codecs.open(sys.argv[1],'r', 'utf-8')
> > >
> > > for line in inputFile:
> > When Python encounters the "\f" substring in an input line, it wants to
> > treat it as an escape sequence representing a form-feed control
> > character,
>
> Even if it were as sentient as "wanting" to muck about with the input,
> it doesn't. Those escape sequences are interpreted by the compiler, and
> in other functions (e.g. re.compile) but *not* when reading a text
> file.
>
> Example:
> |>>> guff = r"blah \fR40\fC blah"
> |>>> print repr(guff)
> |'blah \\fR40\\fC blah'
> |>>> # above is ASCII so it is automatically also UTF8
>
> Comment: It contains backslash followed by 'f' ...
>
> |... fname = "guff.utf8"
> |>>> f = open(fname, "w")
> |>>> f.write(guff)
> |>>> f.close()
> |>>> import codecs
> |>>> f = codecs.open(fname,'r', 'utf-8')
> |>>> guff2 = f.read()
> |>>> print guff2 == guff
> |True
> No interpretation of the r"\f" has been done.
>
> > which means that it gets interpreted as (or, from my
> > perspective, translated to) "\x0c". Were I entering this string myself
> > within my program code, I could use a raw string (r"\f") to avoid this
> > translation, but I don't know how to do this when I am reading a line
> > from a file.
>
> What I suggest you do is:
>    print repr(open('yourfile', 'r').read())
> [or at least one of the offending lines]
> and inspect it closely. You may find (1) that the file has formfeeds in
> it or (2) it has r"\f" in it and you were mistaken about the
> interpretation or (3) something else.
>
> If you maintain (3) is the case, then make up a small example file,
> show a dump of it using print repr(.....) as above, plus the (short)
> code where you decode it and dump the result.
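
[A minimal sketch of the check suggested above, written for Python 3
rather than the Python 2 of the original session; that choice is an
assumption, as is the scratch file created with tempfile. It writes the
sample text, dumps the raw bytes, and tests for case (1), a real
form feed, versus case (2), a backslash followed by "f".]

```python
import os
import tempfile

# Raw string: a literal backslash followed by 'f', as in the sample line.
guff = r"blah \fR40\fC blah"

# Hypothetical scratch file, used only for this illustration.
fd, fname = tempfile.mkstemp(suffix=".utf8")
os.close(fd)

try:
    with open(fname, "w", encoding="utf-8") as f:
        f.write(guff)

    # Read the undecoded bytes so nothing can hide behind a codec.
    with open(fname, "rb") as f:
        raw = f.read()

    # Read again through the UTF-8 decoder, as the original program does.
    with open(fname, "r", encoding="utf-8") as f:
        guff2 = f.read()
finally:
    os.remove(fname)

has_formfeed = b"\x0c" in raw      # case (1): a real form-feed byte
has_backslash_f = b"\\f" in raw    # case (2): backslash followed by 'f'

print(repr(raw))
print(guff2 == guff)
print(has_formfeed, has_backslash_f)
```

[If has_formfeed comes out true for the real input file, the form feeds
were already in the data before Python read it.]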
=========================================================
On 3/09/2006 3:53 PM, David J Birnbaum wrote in e-mail:
> Dear John,
>
> Thank you for the quick response. Ultimately I need to remap the "f" in
> "\f" to something else, so I worked around the problem by doing the
> remapping first, and I'm now getting the desired result.
>

Please reply on-list.

How could you read the file to remap an "f" if you were getting '\x0c'
when you tried to read it? Are we to assume that it was case (2) i.e.
not a Python problem?
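
To make the question concrete, here is a rough sketch (Python 3 syntax)
of why a remapping can only succeed in case (2). The helper name
remap_backslash_f and the replacement letter "g" are hypothetical; the
thread does not say what the "f" was actually remapped to.

```python
def remap_backslash_f(text, new_letter="g"):
    """Hypothetical helper: replace backslash + 'f' with backslash + new_letter."""
    return text.replace("\\f", "\\" + new_letter)


# Case (2): the data holds two characters, a backslash and an 'f'.
from_file = r"blah \fR40\fC blah"

# Case (1): the data holds real form feeds. The compiler interprets \f
# here because this is a non-raw literal in source code.
with_formfeed = "blah \fR40\fC blah"

remapped = remap_backslash_f(from_file)
untouched = remap_backslash_f(with_formfeed)

print(repr(remapped))
print(untouched == with_formfeed)
```

With real form feeds in the data there is no "f" left to find, so the
replace is a no-op.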

Cheers,
John



