[Tutor] quoting and escaping

Jon Crump jjcrump at myuw.net
Wed Jan 14 23:14:15 CET 2009


Kent, Steve, Marty, et alia,

On Tue, 13 Jan 2009, Kent Johnson wrote:
> Where does this come from? It looks like the string representation of
> a dict. Can you get the actual dict? Perhaps there is a better way to
> do whatever you are doing?


It does look like that, doesn't it? But it's actually a JavaScript object 
in a file, Itinerary.js. This file is in the format expected by the Simile 
Timeline software (if you're interested, you can see the working web 
application at 
<http://home.myuw.net/jjcrump/Timelines/itinerary/JohnItinerary.html>). 
The data for this application was prepared by means of a python script 
that you all have been helping me with from time to time. The script isn't 
pretty and bears everywhere the marks of inexperience, but I've learned a 
lot doing it and it works.

Having created the js file, I found that I might wish to edit the data in 
a variety of ways, so I thought, well, it looks like a python dictionary, 
maybe I can make it act like one. simplejson.loads() was the obvious thing 
to try, but since the data format isn't really json that didn't work. The 
new Date() object declarations needed to be modified to look like 
datetime.date() instances.
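
To make the failure concrete, here is a minimal sketch. The fragment is an invented sample, not the real contents of Itinerary.js, and it uses the standard-library json module (which grew out of simplejson) rather than simplejson itself:

```python
import json

# Hypothetical fragment in the style of a Timeline data file;
# not taken from the real Itinerary.js.
sample = '{"start": new Date(1216,9,18), "title": "Newark"}'

def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

print(try_parse(sample))                  # the Date constructor is not JSON
print(try_parse('{"title": "Newark"}'))   # plain JSON parses fine
```

The parser stops at the bare word "new", which is why the Date() declarations have to be rewritten before anything else can be done with the text.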

Marty Walsh very helpfully reminded me that re.sub() can take a function 
as its replacement argument, so once I got the regex string right, it was 
easy to turn new Date(1216,9,18) into "date(1216, 10, 18)" (JavaScript 
months count from zero, hence 9 becomes 10).

In experimenting at the interpreter command line I was confused by the 
fact that the interpreter was interpreting the escaped quotation marks in 
my string fragment rather than rendering the escape character. Steve 
Willoughby's kind message explained the business of 'raw' strings which 
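put me on the right track to understanding what was happening. In the 
event, however, the escapes were not a problem: escape processing applies 
only to string literals in source code, so file.read() returns the file's 
characters, backslashes included, exactly as they are.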
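
A small sketch of the distinction at work here, written for illustration only (the scratch file and the sample text are made up): backslashes typed in an ordinary literal are consumed by the parser, a raw literal keeps them, and text read back from a file comes through with its backslashes untouched.

```python
import os
import tempfile

# In source code, the parser processes escapes inside an ordinary literal...
literal = 'say \"hi\"'   # backslashes are consumed: say "hi"
# ...while a raw literal keeps them.
raw = r'say \"hi\"'      # backslashes are kept: say \"hi\"

# Text read from a file never goes through that parsing step.
# Write the raw text to a scratch file and read it back.
fd, path = tempfile.mkstemp(suffix='.js')
with os.fdopen(fd, 'w') as f:
    f.write(raw)
with open(path) as f:
    from_file = f.read()
os.remove(path)

print(len(literal), len(raw), len(from_file))
print(from_file == raw)
```

What the interpreter echoes back at the prompt is the repr() of the string, which is why the escapes seem to come and go there; print shows the actual characters.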

So now I can put the pieces together and do this:

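>>> import re
>>>
>>> itin = open('/PathToFile/Itinerary.js', 'r')
>>>
>>> # matches e.g. new Date(1216,9,18); the raw string keeps the backslashes
>>> jdate = re.compile(r'new Date\((\d{4}),(\d{1,2}),(\d{1,2})\)')
>>>
>>> def repldate(match):
...     y, m, d = map(int, match.groups())
...     # JavaScript months are zero-based, hence m+1
...     return '"date(%04d, %02d, %02d)"' % (y, m+1, d)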
...
>>> print type(eval(jdate.sub(repldate, itin.read())))
<type 'dict'>

And that's what I wanted all along.

I'm mindful of the security implications of eval() and understand that 
it's bad form, but since this processing is entirely internal I don't 
think that'll be a problem. I agree that there's probably a better way of 
doing things, especially in the generation of Itinerary.js in the first 
place. Creating in python a data structure for another language by means 
of string formatting seemed like a kluge to me even at the time, but being 
the tyro that I am, I didn't know how else to proceed.
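
For what it's worth, one possible way around eval() is ast.literal_eval() (available from Python 2.6), which accepts only literals and refuses anything else. The sketch below is a guess at how that might look, not a tested replacement for the script: the sample text, the 'events', 'start' and 'title' keys, and the helper names to_tuple and load_itinerary are all assumptions made for illustration. It rewrites each new Date() as a tuple and builds real datetime.date objects afterwards.

```python
import ast
import datetime
import re

# Hypothetical sample in the style described above; the key names are
# assumptions and this is not the real Itinerary.js.
sample = "{'events': [{'start': new Date(1216,9,18), 'title': 'Newark'}]}"

JS_DATE = re.compile(r'new Date\((\d{4}),(\d{1,2}),(\d{1,2})\)')

def to_tuple(match):
    """Rewrite new Date(y,m,d) as a (y, m, d) tuple literal."""
    y, m, d = map(int, match.groups())
    return '(%d, %d, %d)' % (y, m + 1, d)  # JavaScript months start at 0

def load_itinerary(text):
    """Parse the rewritten text safely, then build real date objects."""
    data = ast.literal_eval(JS_DATE.sub(to_tuple, text))
    for event in data['events']:
        event['start'] = datetime.date(*event['start'])
    return data

itinerary = load_itinerary(sample)
print(itinerary['events'][0]['start'])
```

Unlike eval(), literal_eval() raises an error if the text contains a function call or any other expression, so a stray bit of code in the data file cannot be executed by accident.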

The data file in question is at 
<http://home.myuw.net/jjcrump/Timelines/itinerary/Itinerary.js> and the 
Python used for the original text processing is in the same directory at 
itin-processing.py. It's a big ugly mess full of naive notes to myself; if 
anyone cares to look at it and offer any comments or advice I could not be 
anything but grateful. I've learned so much from you guys, and thank you 
Kent for following up, but, given my long-windedness, aren't you sorry you 
asked? ;-)

Thanks again,
Jon

