Reading \n unescaped from a file

Rob Hills rhills at medimorphosis.com.au
Thu Sep 3 12:24:49 EDT 2015


Hi Chris,

On 03/09/15 06:10, Chris Angelico wrote:
> On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhills at medimorphosis.com.au> wrote:
>> My mapping file contents look like this:
>>
>> \r = \\n
>> “ = "
> Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
> likely to have to cope with a whole pile of other mojibake if that
> happens :(

Yeah, tell me about it!!!

> Technically, what's happening is that your "\r" is literally a
> backslash followed by the letter r; the transformation of backslash
> sequences into single characters is part of Python source code
> parsing. (Incidentally, why do you want to change a carriage return
> into backslash-n? Seems odd.)
>
> Probably the easiest solution would be a simple and naive replace(),
> looking for some very specific strings and ignoring everything else.
> Easy to do, but potentially confusing down the track if someone tries
> something fancy :)
>
> # trim any trailing comments
> line = line.split('#')[0].strip()
> # repeat this for as many backslash escapes as you want to handle
> line = line.replace(r"\r", "\r")
>
> Be aware that this, while simple, is NOT capable of handling escaped
> backslashes. In Python, "\\r" comes out the same as r"\r", but with
> this parser, it would come out the same as "\\\r". But it might be
> sufficient for you.

Thanks for the explanation, which has helped me understand the problem.
I also tried your approach, but wound up with output data that somehow
had every single character escaped :-(
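
For reference, the escaped-backslash limitation of the replace() approach
can be avoided by decoding all escapes in a single pass. What follows is
only a minimal sketch, not code from this thread: the names ESCAPES,
unescape and parse_mapping_line are hypothetical, and it assumes each
mapping line has the form "source = target" with single spaces around
the equals sign.

```python
import re

# Hypothetical table of the escapes a mapping file is allowed to use.
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}

def unescape(text):
    """Turn backslash escapes read from a file into the characters they name."""
    # A single pass over the string: an escaped backslash is consumed as a
    # pair, so it is never re-read as the start of another escape.
    # Unknown escapes are left exactly as they were.
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), text)

def parse_mapping_line(line):
    """Split 'source = target' into an unescaped (source, target) pair."""
    source, _, target = line.partition(" = ")
    return unescape(source.strip()), unescape(target.strip())
```

With this, the line "\r = \\n" from the mapping file would yield a real
carriage return as the source and a literal backslash followed by the
letter n as the target.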

I've since decided I was being too obsessive trying to load *everything*
from my mapping file and have simply hard-coded my two escaped character
replacements for now and moved on to more important problems (i.e. the
Windoze character soup that comprises my data and which I have to clean
up!).
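
A minimal sketch of what such hard-coding might look like. The names
REPLACEMENTS and clean are hypothetical, and the two entries are simply
the ones from the mapping excerpt quoted above, which may not be the
exact pair that ended up hard-coded.

```python
# Hypothetical hard-coded table standing in for the mapping file.
REPLACEMENTS = [
    # carriage return -> literal backslash followed by the letter n
    ("\r", "\\n"),
    # a UTF-8 left double quote (U+201C) misread as code page 1252
    ("\u00e2\u20ac\u0153", '"'),
]

def clean(text):
    """Apply each hard-coded replacement in order."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

Because the escapes live in Python source here, the interpreter turns
"\r" into a real carriage return itself, so no unescaping of file
contents is needed.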

Thanks again,

-- 
Rob Hills
Waikiki, Western Australia



