Eval (was Re: Question about using python as a scripting language)
skip at pobox.com
Wed Aug 9 10:56:45 EDT 2006
Brendon> Turns out that the website in question stores its data in the
Brendon> format of a Python list
Brendon> (http://quotes.nasdaq.com/quote.dll?page=nasdaq100, search the
Brendon> source for "var table_body"). So, the part of my code that
Brendon> extracts the data looks something like this:
...
Brendon> return eval(data[pos1+len(START_MARKER):END_MARKER])
Brendon> My question is: what's the safe way to do this?
At the top level the lines look like a Python list. On a line-by-line basis
they also have consistent structure. Read it line-by-line, parse the lines
(using regular expressions or whatever), then append the parsed values to a
list, something like (untested):
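import re
import urllib

symbolinfo = []
# One table row: two quoted strings, five bare values, two quoted strings.
# Adjacent raw strings are concatenated into a single pattern.
sympat = re.compile(
    r'\['
    r'"(?P<sym>[^"]+)",'
    r' *"(?P<name>[^"]+)",'
    r' *(?P<n1>[^,]+),'
    r' *(?P<n2>[^,]+),'
    r' *(?P<n3>[^,]+),'
    r' *(?P<n4>[^,]+),'
    r' *(?P<n5>[^,]+),'
    r' *"(?P<s1>[^"]*)",'
    r' *"(?P<s2>[^"]*)"'
    r'\]')
for line in urllib.urlopen("http://..."):
    mat = sympat.match(line)
    if mat is not None:
        # Keep each matching row as a dict keyed by group name.
        symbolinfo.append(mat.groupdict())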
The regular expression is fairly fragile, but that's okay. If their format
changed from a list of ten elements to a list of eight or twelve elements,
you'd probably be interested in knowing about that ASAP, and a strict
pattern simply stops matching when that happens. eval() probably wouldn't
fail unless they completely butchered the table syntax.
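To make the "fragile is okay" point concrete, here is a minimal sketch that
runs the same kind of pattern over two made-up rows. The sample rows, the
parse_lines() helper and the rejected list are assumptions for illustration,
not real quote data or part of the original suggestion. The second row has
fewer fields, so it fails to match and gets reported instead of being
accepted silently.

```python
import re

# Pattern for one row: two quoted strings, five bare values, two quoted strings.
SYMPAT = re.compile(
    r'\['
    r'"(?P<sym>[^"]+)",'
    r' *"(?P<name>[^"]+)",'
    r' *(?P<n1>[^,]+),'
    r' *(?P<n2>[^,]+),'
    r' *(?P<n3>[^,]+),'
    r' *(?P<n4>[^,]+),'
    r' *(?P<n5>[^,]+),'
    r' *"(?P<s1>[^"]*)",'
    r' *"(?P<s2>[^"]*)"'
    r'\]')


def parse_lines(lines):
    """Return (parsed, rejected).

    parsed holds one dict per matching row; rejected holds the raw text of
    lines that look like rows (start with '[') but do not match the pattern.
    """
    parsed, rejected = [], []
    for line in lines:
        line = line.strip()
        mat = SYMPAT.match(line)
        if mat is not None:
            parsed.append(mat.groupdict())
        elif line.startswith('['):
            rejected.append(line)
    return parsed, rejected


# Hypothetical sample rows, invented for this sketch.
SAMPLE = [
    '["ABCD", "Example Corp", 10.5, 0.25, 2.4, 1000, 11.0, "N", "x"],',
    '["EFGH", "Other Inc", 20.0, -0.5, "N"],',  # fewer fields: format changed
]

parsed, rejected = parse_lines(SAMPLE)
if rejected:
    print("format may have changed; unmatched rows:", rejected)
```

All captured values come back as strings, so the numeric fields would still
need float() or int() applied afterwards.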
With a small amount of input massaging, you could do this more cleanly with
the csv module. That's left as an exercise for the reader.
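One possible take on that exercise, as a rough sketch only. It assumes each
row sits on its own line as [ ... ] with an optional trailing comma, and that
the fields are comma-separated with double-quoted strings. The massage() and
parse_rows() helpers, the expected_fields default of nine (matching the
number of groups in the pattern above) and the sample rows are all
assumptions; adjust them to whatever the real table looks like.

```python
import csv


def massage(line):
    """Strip the surrounding brackets from one row, or return None."""
    line = line.strip()
    if line.endswith(','):
        line = line[:-1]
    if line.startswith('[') and line.endswith(']'):
        return line[1:-1]
    return None


def parse_rows(lines, expected_fields=9):
    """Parse bracketed rows with csv.reader, checking the field count."""
    cleaned = [m for m in (massage(l) for l in lines) if m is not None]
    rows = []
    # skipinitialspace lets quoted fields follow ", " rather than just ",".
    for row in csv.reader(cleaned, skipinitialspace=True):
        if len(row) != expected_fields:
            raise ValueError("expected %d fields, got %d: %r"
                             % (expected_fields, len(row), row))
        rows.append(row)
    return rows


# Hypothetical sample rows, invented for this sketch.
SAMPLE = [
    'var table_body = [',
    '["ABCD", "Example Corp, Inc.", 10.5, 0.25, 2.4, 1000, 11.0, "N", "x"],',
    '["EFGH", "Other Inc", 20.0, -0.5, 1.1, 2000, 21.0, "N", ""]',
    '];',
]

rows = parse_rows(SAMPLE)
```

Raising on an unexpected field count keeps the same early warning as the
strict regular expression, and the csv module copes with commas inside
quoted names.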
Skip