[Tutor] Escape characters in strings

dman dsh8290@rit.edu
Sun, 6 Jan 2002 16:06:00 -0500


On Sat, Jan 05, 2002 at 09:22:12PM -0500, Grimmtooth wrote:
| 
| > | Example:
| > |
| > | 12/13 14:04:53 2> S:<STX>L.TG23  Y1           12007030050020C100021340
| > | 12/13 14:04:53 2> S:12888888881003030000000001249000212011213140434210
| > | 12/13 14:04:53 2> S:10125414C200344012888888881=0512101543211234567800
| > | 12/13 14:04:53 2> S:1     700000000740   0242C00000000000000EN00420107
| > | 12/13 14:04:53 2> S:6O02450U300001000\300009990\000000000999\950U30000
| > | 12/13 14:04:53 2> S:1000\300002500\000000000250<ETX>.
| >
| > Is this from a log file?  How are you reading the data?  Can you show
| > us the code?
| 
| 1) Yes and no.
| 
| 2) (a) either the user pastes this data into a wxWindows Text Control then I
| GetValue() it

I can't test this out since I don't know enough of wx to whip up a
window with a text control.  I expect that the text control would give
you back each character unscathed.

| or else it is read from a logfile directly using readlines()
| into an array.

Technically it is a 'list', but anyway, it works for me.  I created a
file containing

-----
foo\300bar
----

$ python2.1
Python 2.1.1+ (#1, Dec 30 2001, 17:17:30) 
[GCC 2.95.4 20011223 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> f = open( "data" , "r" )
>>> print f.readlines()
['foo\\300bar\n']
>>> 


The "extra" backslash appears because printing a list shows the repr() of
each element; the doubled backslash stands for a single real backslash in
the string.
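For anyone on Python 3, here is a minimal sketch of the same experiment.  The scratch file name and the use of a temporary directory are my own choices so the snippet runs anywhere.  It writes foo\300bar (with a real backslash) to a file and reads it back with readlines():

```python
import os
import tempfile

# Write the text foo\300bar, with a real backslash, to a scratch file.
# The r prefix keeps this *source* literal from turning \300 into one character.
path = os.path.join(tempfile.mkdtemp(), "data")
with open(path, "w") as f:
    f.write(r"foo\300bar" + "\n")

# Read it back the same way as in the session above.
with open(path) as f:
    lines = f.readlines()

print(lines)              # list display uses repr(), so the backslash appears doubled
print(lines[0], end="")   # the string itself contains a single backslash
print(len(lines[0]))      # 11: "foo" + backslash + "300" + "bar" + newline
```

The backslash comes back from the file as an ordinary character; nothing in readlines() interprets it.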

| In the current case, I've hardcoded the text in using a """
| triple-quoted string.

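This is going to be problematic: the escape sequences in any string
literal in Python source are interpreted when the code is compiled, so
"\300" inside a """...""" string becomes a single character.  Be sure to
use a 'raw' string (r"""...""") in your Python source, or put the test
data into a file.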
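A minimal Python 3 sketch of the raw triple-quoted approach, using two lines of the sample log from your message as hard-coded test data.  Without the r prefix, \300 and \000 would each collapse into a single character at compile time:

```python
# Test data pasted into source: the r prefix keeps every backslash literal,
# exactly as it would be if the same text were read from a log file.
LOG = r"""12/13 14:04:53 2> S:6O02450U300001000\300009990\000000000999\950U30000
12/13 14:04:53 2> S:1000\300002500\000000000250<ETX>."""

lines = LOG.splitlines()
for line in lines:
    print(line)

# Each "\300" is still four characters: a backslash followed by three digits.
print(LOG.count("\\300"))
```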

| The problem occurs no matter which way it goes.

You've tested all three data input methods?  Can you make my code
above fail?

| 3) I could, but it would be a lot of code. Let me paraphrase it:
| 
|    first, get code from text control or log file.
| 
|    then, using a function Stitch(), I locate all lines beginning with S:
| after the 2> by first slicing [] them out then concatenating them together
| with '+'. This returns one continuous string.
| 
| 	for i in s:
| 		x = i[18:-1]    # first 18 are time stamp, last is CR. Trim both.
| 
| 		if len(x) > 0:
| 			if x[0] == 'S':
| 				snd.append(x)
| 
|    then, I pass the string to a Parser class.  This in turn slices out
| particular parts of the long string and assigns them to additional classes,
| each designed to parse the particular data it is passed.
| 
| class BaseElement:
| 	def	__init__(self, parent):
| 		pass
| 
| 	def Show(self):
| 		pass
| 
| class AdditionalAmounts(BaseElement):
| 
| 	data	=	None
| 
| 	def __init__(self, parent, txt):
| 		self.data, parent.remainder	=	lllvar(txt)
| 
| 
| def llvar(data):
| 	len	=	int(data[:2])
| 	return(data[2:len+2], data[len+2:])
| 
| 
| The BaseElement class is an empty class, defined to give me a None if I
| don't override the __init__ and Show methods. Assuming that I've sliced off
| everything prior to the part of interest already (which I have), the above
| code fragments are the ones that get processed. First, the string is passed
| to AdditionalAmounts.__init__().  __init__() calls llvar() to separate the
| data from the remainder of the log entry (an llvar is a data field that
| is preceded by two digits indicating how many bytes the data contains). The
| actual processing of the data is where I end up tripping over the
| escaped \ characters.

I don't see anything in the code above that would cause Python to
interpret the escape sequences in the string you have.
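As a sanity check, here is a Python 3 sketch reconstructed from your paraphrase.  The name stitch(), the decision to drop the "S:" marker, and the sample lines are all my assumptions; llvar() follows your version but uses 'length' instead of shadowing the built-in len().  Slicing and joining copy the backslash through untouched:

```python
# Hypothetical sample log: one "S:" payload split across two lines, plus a
# non-"S:" line that should be skipped. Raw strings keep the backslashes literal.
LOG_LINES = [
    r"12/13 14:04:53 2> S:08AB\30" + "\n",
    "12/13 14:04:53 2> R:ignored\n",
    "12/13 14:04:53 2> S:0CDREST\n",
]

def stitch(lines):
    """Join the payloads of all 'S:' lines (assumed to drop the 'S:' marker)."""
    parts = []
    for line in lines:
        x = line[18:-1]            # skip the 18-char timestamp prefix and the newline
        if x.startswith("S:"):
            parts.append(x[2:])
    return "".join(parts)

def llvar(data):
    """Split off a field whose length is given by its first two digits."""
    length = int(data[:2])         # named 'length' to avoid shadowing len()
    return data[2:length + 2], data[length + 2:]

stitched = stitch(LOG_LINES)
field, rest = llvar(stitched)
print(repr(stitched))   # '08AB\\300CDREST'
print(field, rest)      # AB\300CD REST
```

Note that the \300 sequence even survives being split across two log lines and rejoined.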

| > If you do want to eval() the string you can replace( "\\" , "\\\\" )
| > to escape all backslashes first.
| 
| Doesn't work, that was the first thing I tried. For some reason
| Python's not finding the '\' because it thinks '\300' is a single
| character rather than four.
| 
| Even 'print x' will hide the \ in this case.

I can achieve this with 

>>> print '\300'
À
>>> 

because that string is part of the code, and the compiler turns \300 into a
single character.  However, if I change the code to

>>> print '\\300'
\300
>>> print r'\300'
\300
>>> 

I get the desired result.
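The same point explains why replace( "\\" , "\\\\" ) appeared to do nothing: once the compiler has turned \300 into one character, there is no backslash left for replace() to find.  A small Python 3 sketch (variable names are illustrative):

```python
cooked = "foo\300bar"    # escape processed at compile time: 7 characters
raw = r"foo\300bar"      # backslash kept: 10 characters

# replace() can only double a backslash that is actually there.
print(cooked.replace("\\", "\\\\") == cooked)   # True: nothing to replace
print(raw.replace("\\", "\\\\"))                # foo\\300bar

# The single character in the cooked string is code point 0o300 (192).
print(ord(cooked[3]), cooked[3] == chr(0o300))  # 192 True
```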

HTH,
-D


PS.  It would be quite useless if the interpreter were constantly
     evaluating the string data you are working with according to its
     source code rules.
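To illustrate the PS in Python 3: a string that arrives at run time keeps its backslashes until the program explicitly asks for escapes to be interpreted, for example with the standard unicode_escape codec.  The variable names here are only illustrative:

```python
# Text obtained at run time is never re-parsed with source-code escape rules.
data = r"foo\300bar"          # stands in for a line read from a file or widget
print(len(data))              # 10: the backslash and digits are separate characters

# Only an explicit request interprets the escapes, e.g. the unicode_escape codec.
decoded = data.encode("ascii").decode("unicode_escape")
print(len(decoded), decoded[3] == chr(0o300))   # 7 True
```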
