[Tutor] quoting and escaping
Jon Crump
jjcrump at myuw.net
Wed Jan 14 23:14:15 CET 2009
Kent, Steve, Marty, et alia,
On Tue, 13 Jan 2009, Kent Johnson wrote:
> Where does this come from? It looks like the string representation of
> a dict. Can you get the actual dict? Perhaps there is a better way to
> do whatever you are doing?
It does look like that, doesn't it, but it's actually a JavaScript object
in a file, Itinerary.js. This file is in the format expected by the Simile
Timeline software (if you're interested, you can see the working web
application at
<http://home.myuw.net/jjcrump/Timelines/itinerary/JohnItinerary.html>).
The data for this application was prepared by means of a python script
that you all have been helping me with from time to time. The script isn't
pretty and bears everywhere the marks of inexperience, but I've learned a
lot doing it and it works.
Having created the js file, I found that I might wish to edit the data in
a variety of ways, so I thought, well, it looks like a Python dictionary,
maybe I can make it act like one. simplejson.loads() was the obvious thing
to try, but since the data format isn't really JSON (the new Date()
constructors aren't valid JSON), that didn't work. The new Date() object
declarations needed to be modified to look like datetime.date() instances.
Marty Walsh very helpfully reminded me that the sub() method of a compiled
pattern can take a function as its first argument (for the module-level
re.sub() it is the second), so once I got the regex string right, it was
easy to turn new Date(1216,9,18) into "date(1216, 10, 18)".
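
For anyone following along, here is a minimal sketch of the two ways of
passing a replacement function. The names shift_month and text are made up
for illustration and the sample string is not taken from Itinerary.js.

```python
import re

PATTERN = r'new Date\((\d{4}),(\d{1,2}),(\d{1,2})\)'

def shift_month(match):
    """Build the replacement text from the three captured groups."""
    year, month, day = map(int, match.groups())
    # JavaScript months are zero-based, so 9 means October.
    return 'date(%d, %d, %d)' % (year, month + 1, day)

text = 'start: new Date(1216,9,18)'   # hypothetical sample input

# Module-level function: the replacement is the SECOND argument.
via_module = re.sub(PATTERN, shift_month, text)

# Method of a compiled pattern: the replacement is the FIRST argument.
via_compiled = re.compile(PATTERN).sub(shift_month, text)

print(via_module)
```

Both calls should give the same result; the compiled form is just handier
when the same pattern is reused.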
In experimenting at the interpreter command line I was confused by the
fact that the interpreter was interpreting the escaped quotation marks in
my string fragment rather than rendering the escape character. Steve
Willoughby's kind message explained the business of 'raw' strings which
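put me on the right track to understanding what was happening. In the
event, however, the escapes were not a problem: backslash escapes are only
interpreted in string literals typed into source code or the interpreter,
so file.read() hands back the file's characters exactly as they are.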
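
To make the distinction concrete, here is a small sketch comparing an
ordinary literal, a raw literal, and the same text read back from a file.
The temporary file and the sample text are assumptions for the
demonstration, not anything from the real data.

```python
import os
import tempfile

literal = 'say \"hi\"'    # the parser consumes the backslashes
raw = r'say \"hi\"'       # a raw literal keeps them as characters

# The raw version is two characters longer: the two backslashes.
assert len(raw) == len(literal) + 2

# Write the backslashed text to a temporary file and read it back.
fd, path = tempfile.mkstemp(suffix='.js')
with os.fdopen(fd, 'w') as handle:
    handle.write(raw)
with open(path) as handle:
    from_file = handle.read()
os.remove(path)

# Reading a file does no escape processing: the backslashes survive.
assert from_file == raw
print(repr(from_file))
```

The doubled backslashes in the repr() output are only how the interpreter
displays a single backslash; the string itself holds one.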
So now I can put the pieces together and do this:
>>> import re
>>> itin = file('/PathToFile/Itinerary.js', 'r')
>>>
>>> jdate = re.compile(r'new Date\((\d{4}),(\d{1,2}),(\d{1,2})\)')
>>>
>>> def repldate(match):
...     y, m, d = map(int, match.groups())
...     # JavaScript months are zero-based, hence m+1
...     return '"date(%04d, %02d, %02d)"' % (y, m+1, d)
...
>>> print type(eval(jdate.sub(repldate, itin.read())))
<type 'dict'>
And that's what I wanted all along.
I'm mindful of the security implications of eval() and understand that
it's bad form, but since this processing is entirely internal I don't
think that'll be a problem. I agree that there's probably a better way of
doing things, especially in the generation of Itinerary.js in the first
place. Creating in python a data structure for another language by means
of string formatting seemed like a kluge to me even at the time, but being
the tyro that I am, I didn't know how else to proceed.
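
One possible way around eval(), sketched under assumptions: since the
dates end up as quoted strings, the converted text should contain only
literals, and ast.literal_eval() (available from Python 2.6) accepts those
while refusing anything executable. The helper names, the ISO-style date
format, and the sample data below are all hypothetical; I have not checked
this against the real Itinerary.js.

```python
import ast
import datetime
import re

JS_DATE = re.compile(r'new Date\((\d{4}),(\d{1,2}),(\d{1,2})\)')

def quote_date(match):
    """Replace a JavaScript Date constructor with a quoted ISO-style date."""
    year, month, day = map(int, match.groups())
    # JavaScript months are zero-based, so add one.
    return '"%04d-%02d-%02d"' % (year, month + 1, day)

def load_itinerary(js_text):
    """Parse the converted text; only plain literals are accepted."""
    return ast.literal_eval(JS_DATE.sub(quote_date, js_text))

def parse_date(text):
    """Turn a 'YYYY-MM-DD' string into a datetime.date."""
    year, month, day = map(int, text.split('-'))
    return datetime.date(year, month, day)

# Hypothetical sample; the real file's layout may differ.
sample = "{'events': [{'start': new Date(1216,9,18), 'title': 'Newark'}]}"
data = load_itinerary(sample)
start = parse_date(data['events'][0]['start'])
print(start)
```

If the file held anything other than literals, literal_eval() would raise
an error rather than run it, which seems a safer failure than eval()'s.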
The data file in question is at
<http://home.myuw.net/jjcrump/Timelines/itinerary/Itinerary.js> and the
python used for the original text processing is in the same directory at
itin-processing.py. It's a big ugly mess full of naive notes to myself; if
anyone cares to look at it and offer any comments or advice, I could not be
anything but grateful. I've learned so much from you guys, and thank you
Kent for following up, but, given my long-windedness, aren't you sorry you
asked? ;-)
Thanks again,
Jon