Raw strings as input from File?

Dave Angel davea at ieee.org
Wed Dec 2 00:39:50 EST 2009


rzed wrote:
> utabintarbo <utabintarbo at gmail.com> wrote in
> news:adc6c455-5616-471a-8b39-d7fdad2179e4 at m33g2000vbi.googlegroups.com:
>
>   
>> I have a log file with full Windows paths on a line. eg:
>> K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
>> \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
>> 1259006416 
>>
>> As I try to pull in the line and process it, python changes the
>> "\10" to a "\x08". This is before I can do anything with it. Is
>> there a way to specify that incoming lines (say, when using
>> .readlines() ) should be treated as raw strings?
>>
>> TIA
>>     
>
> Despite all the ragging you're getting, it is a pretty flakey thing 
>   
The OP blamed .readlines(), which does *not* behave this way, so he 
probably deserved what you call "ragging."  Backslash escapes are 
processed only in string literals, which are in code, not in data files.

In any case, there's a big difference between surprising (to you) and 
flakey.
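
A minimal sketch of that distinction, for anyone who wants to check it.  The log line is made up (modeled on the OP's example) and a temporary file stands in for the real log.  Text read back with readlines() keeps its backslash, while the same characters typed as a non-raw literal are converted when the code is compiled.

```python
import os
import tempfile

# Hypothetical log line modeled on the OP's example; the r prefix keeps
# the backslashes literal in this source file.
line = r"K:\A\B\C\10xx\somerandomfilename.ext"

# Write the line to a temporary file that stands in for the real log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write(line + "\n")
    path = f.name

try:
    with open(path) as f:
        lines_read = f.readlines()
finally:
    os.remove(path)

# Data read from the file: backslash, '1', '0' are still three characters.
from_file = lines_read[0].rstrip("\n")

# The same text typed as an ordinary (non-raw) literal: \10 becomes
# chr(8) before the program ever runs.
from_literal = "C\10xx"

print(repr(from_file))
print(repr(from_literal))
```

If the "\x08" only shows up when the line is pasted into code or the interactive shell, the r'' prefix (or doubled backslashes) on the literal is the fix; the file-reading code needs no change.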
> that Python does in this context:
> (from a python shell)
>   
> >>> x = '\1'
> >>> x
> '\x01'
>
> >>> x = '\10'
> >>> x
> '\x08'
>
> If you are pasting your string as a literal, then maybe it does the 
> same. It still seems weird to me. I can accept that '\1' means x01, 
> but \10 seems to be expanded to \010 and then translated from octal 
> to get to x08. That's just strange. I'm sure it's documented 
> somewhere, but it's not easy to search for.
>
>   
Check in the help for "escape Strings".  It's documented (in version 2.6, 
anyway) in a nice chart: a backslash followed by one to three octal 
digits is interpreted as an octal character code.  I don't like it much 
either, but it's inherited from C, which has worked that way for 30+ years.

Online, see
http://www.python.org/doc/2.6.4/reference/lexical_analysis.html
and look in section 2.4.1 for the chart.
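
To illustrate the octal rule, here is a small sketch (the list of examples is mine, not taken from the docs).  The escape consumes at most three octal digits and stops early at the first character that is not an octal digit.

```python
# Each pair: (literal containing an octal escape, the same string spelled out)
examples = [
    ("\1", chr(0o1)),        # one octal digit
    ("\10", chr(0o10)),      # octal 10 is decimal 8
    ("\70", "8"),            # octal 70 is decimal 56, the character '8'
    ("\101", "A"),           # octal 101 is decimal 65
    ("\1011", "A1"),         # only the first three digits are consumed
    ("\18", "\x01" + "8"),   # 8 is not an octal digit, so the escape is just \1
]

for literal, spelled_out in examples:
    assert literal == spelled_out
    print(repr(literal))
```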
> Oh, and this:
>   
> >>> '\7'
> '\x07'
>
> >>> '\70'
> '8'
> ... is really odd.
>
>   
Octal 70 is hex 38 (or decimal 56), which is the character '8'.
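
That arithmetic can be checked directly; a tiny sketch using only builtins:

```python
code = int("70", 8)   # parse "70" as base 8
as_hex = hex(code)    # the same value written in hex
char = chr(code)      # the character with that code

assert code == 56
assert as_hex == "0x38"
assert char == "8" == "\70"
```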

DaveA


