How do I parse this ? regexp ?

Jeremy Bowers jerf at jerf.org
Wed Apr 27 13:49:31 EDT 2005


On Wed, 27 Apr 2005 07:56:11 -0700, serpent17 at gmail.com wrote:

> Hello all,
> 
> I have this line of numbers:
> 
> 
> 04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
> 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
> -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]
> 
> 
> repeated several times in a text file and I would like each element to be
> part of a vector. how do I do this ? I am not very capable in using regexp
> as you can see.

I think, based on the responses you've gotten so far, that perhaps you
aren't being clear enough.

Some starter questions:

* Is that all on one line in your file?
* Are there ever variable numbers of the [] fields?
* What do you mean by "vectors"?

If the line format is stable (the number of fields never varies), and
especially if that is all on one line, then given that you are not
familiar with regexps I wouldn't muck about with them. (Even for me it
would be a borderline call.) Instead, follow along with the session below;
it will probably help, though since I don't know precisely what you're
asking I can't give a complete solution:

Python 2.3.5 (#1, Mar  3 2005, 17:32:12) 
[GCC 3.4.3  (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]"
>>> x.split(',', 2)
['04242005 18:20:42-0.000002', ' 271.1748608', ' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]']
>>> splitted = x.split(',', 2)
>>> splitted[2]
' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]'
>>> import re
>>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>>> if safetyChecker.match(splitted[2]):
...     eval(splitted[2], {}, {})
... 
([-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125,
0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375])
>>> splitted[0].split()
['04242005', '18:20:42-0.000002']
>>> splitted[0].split()[1].split('-')
['18:20:42', '0.000002']
>>> 
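
The steps in that session could be rolled into one function along these
lines. This is only a sketch: parse_line, SAFE_CHARS and SAMPLE are names
made up for illustration, and the meaning I give the fields (date, time,
offset, value) is a guess, since the original question doesn't say what
they are. It also assumes the '-' after the time is a separator and not a
minus sign.

```python
import re

# Same whitelist as in the session: digits, brackets, minus, comma, dot, space.
SAFE_CHARS = re.compile(r"^[-\[\]0-9,. ]*$")

SAMPLE = ("04242005 18:20:42-0.000002, 271.1748608, "
          "[-4.119873046875, 3.4332275390625, 105.062255859375], "
          "[0.093780517578125, 0.041015625, -0.960662841796875], "
          "[0.01556396484375, 0.01220703125, 0.01068115234375]")


def parse_line(line):
    """Hypothetical helper: split one record into its pieces.

    Returns (date, time, offset, value, vectors); the field names are guesses.
    """
    # Split off the first two comma-separated fields; the rest is the [] part.
    head, value, rest = line.strip().split(',', 2)
    date, time_and_offset = head.split()
    # Assumption: the first '-' separates the time from the number after it.
    time, offset = time_and_offset.split('-', 1)
    rest = rest.strip()
    # Refuse to eval anything containing characters outside the whitelist.
    if not SAFE_CHARS.match(rest):
        raise ValueError("unexpected characters in %r" % rest)
    vectors = eval(rest, {}, {})
    return date, time, float(offset), float(value), vectors


date, time, offset, value, vectors = parse_line(SAMPLE)
```

To handle the whole file you would call parse_line on each line as you
iterate over the open file, assuming each record really does sit on one
line.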


I'd like to STRONGLY EMPHASIZE that "eval" is very dangerous if you can't
trust the source; *any* Python code will be run. That is why I am extra
paranoid and double-check that the expression contains only the characters
listed in that simple regex. (Anyone who can construct a malicious string
out of those characters will get my sincere admiration.) You may do as you
please, of course, but I believe it is not helpful to suggest security
holes on comp.lang.python :-) Still, the coincidence that this part of your
data, which is also the most challenging to parse, exactly matches Python
syntax is too good to pass up.
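
If you are on a Python that has ast.literal_eval (it was added in 2.6, so
it is not available in 2.3.5), it is worth considering in place of "eval"
plus a character whitelist, because it only accepts literals such as
numbers, lists and tuples and refuses everything else. A minimal sketch,
where parse_vectors is a made-up name and the input is assumed to be the
bracketed tail of one record:

```python
import ast


def parse_vectors(text):
    """Hypothetical helper: parse '[a, b], [c, d]' without running code."""
    # strip() first: older versions of literal_eval reject leading spaces.
    return ast.literal_eval(text.strip())


vectors = parse_vectors(" [0.093780517578125, 0.041015625, -0.960662841796875],"
                        " [0.01556396484375, 0.01220703125, 0.01068115234375]")

# Anything that is not a plain literal is rejected instead of being executed.
try:
    parse_vectors("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

This swaps a different technique in for the one in the session, so treat it
as an alternative and not as what the session does.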

This should give you some good ideas; if you post more detailed questions
we can probably be of more help.



