Distributions, RE-verb and the like

bearophileHUGS at lycos.com
Thu Dec 29 15:25:44 EST 2005


Psyco is finished now: it works on x86 under Windows, the new Macs,
many Linux boxes, etc., and it's quite useful, so maybe it could be
added to the standard Python distribution.

PyChecker (and the similar tools that work differently) is very useful
too, and it's pure Python, so maybe it (or something similar) could be
added to the standard distribution as well.

--------------------

Regular expressions can be useful, but:
- They look really unpythonic.
- Their syntax is difficult to remember.
- It's not easy to understand and debug REs written by other people;
comments help only a little.
- They can hide bugs (because of their low readability).
- They mix their operators/syntax with the data (this is also their
advantage). This can create problems, and it makes them less general,
because you have to avoid mistaking syntax elements for data.
- Python already has syntax to define structures, loops, etc., so
another syntax can be seen as a kind of duplication.
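A small illustration of the data/syntax mixing problem, using Python's
standard re module: a "." meant as a literal character is read as an
operator unless it is escaped with re.escape(). The strings are just
made-up examples.

```python
import re

# '.' is an operator in RE syntax, so the literal text "1.5" used as a
# pattern also matches "125": the data is mistaken for syntax.
unescaped = re.compile("1.5")

# re.escape() turns the data back into literal text ("1\.5").
escaped = re.compile(re.escape("1.5"))

for text in ["1.5", "125"]:
    print(text, bool(unescaped.fullmatch(text)), bool(escaped.fullmatch(text)))
```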

Such things go against many points of the Zen of Python. So I'd like a
more pythonic syntax: easy to remember, easy to read and debug, with
data and operators fully separated. The "reverb" library already does
something like that:
http://home.earthlink.net/~jasonrandharper/reverb.py

It compiles REs written like this:

hexdigit = set(digits, char('a').to('f'), char('A').to('F'))
pat = RE((text('$') | text('0x') | text('0X')) + required(hexdigit, max=8) - followedBy(hexdigit))
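To show roughly how such a library can work, here is a minimal sketch
(not reverb's actual code): pattern objects overload +, | and - and
build an ordinary re pattern string, escaping plain strings so that
data never leaks into the syntax. The names Pat, text, charset and
repeated are hypothetical, and "a - b" is assumed to mean "a, not
followed by b", as the followedBy() example suggests.

```python
import re


class Pat:
    """A piece of pattern that carries its own regex source (hypothetical)."""

    def __init__(self, regex):
        self.regex = regex

    def __add__(self, other):
        # Sequence: match self, then other.
        return Pat(self.regex + as_pat(other).regex)

    def __or__(self, other):
        # Alternation, wrapped in a non-capturing group.
        return Pat("(?:" + self.regex + "|" + as_pat(other).regex + ")")

    def __sub__(self, other):
        # "a - b": match a, provided it is not followed by b.
        return Pat(self.regex + "(?!" + as_pat(other).regex + ")")

    def compile(self, flags=0):
        return re.compile(self.regex, flags)


def as_pat(x):
    # Plain strings are data, never syntax, so they are always escaped.
    return x if isinstance(x, Pat) else Pat(re.escape(x))


def text(s):
    return as_pat(s)


def charset(*ranges):
    # Each range is a (low, high) pair of single characters.
    body = "".join(re.escape(lo) + "-" + re.escape(hi) for lo, hi in ranges)
    return Pat("[" + body + "]")


def repeated(p, lo, hi):
    # Between lo and hi repetitions of p.
    return Pat("(?:%s){%d,%d}" % (p.regex, lo, hi))


hexdigit = charset(("0", "9"), ("a", "f"), ("A", "F"))
pat = (text("$") | text("0x") | text("0X")) + repeated(hexdigit, 1, 8) - hexdigit
print(pat.regex)
print(pat.compile().findall("a = 0x1F, b = $ff, c = 0XDEADBEEF1"))
```

Since + and - have the same precedence and associate to the left, the
"not followed by" check ends up after the whole sequence, which is what
the hex-number pattern needs.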

I have already "improved" reverb a little, but I think a better and
simpler syntax can be invented by people more expert in REs than me.
Here are some alternative syntax possibilities. I don't like them, and
most of them are impossible, silly, stupid, etc., but I am sure a good
syntax can be invented.

Standard RE:
pat.pattern: (?:\$|0x|0X)[\da-fA-F]{1,8}(?![\da-fA-F])
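For comparison, the same standard RE can be written with re.VERBOSE,
where whitespace is ignored and "#" starts a comment, which helps a
little. This sketch uses a-f ranges, since hex digits stop at f; the
sample string is made up.

```python
import re

compact = re.compile(r"(?:\$|0x|0X)[\da-fA-F]{1,8}(?![\da-fA-F])")

# The same pattern in verbose mode, with one comment per piece.
verbose = re.compile(r"""
    (?: \$ | 0x | 0X )     # prefix: '$', '0x' or '0X'
    [\da-fA-F]{1,8}        # one to eight hex digits
    (?! [\da-fA-F] )       # not followed by a ninth hex digit
""", re.VERBOSE)

sample = "a = 0x1F, b = $ff, c = 0XDEADBEEF1"
print(compact.findall(sample))
print(verbose.findall(sample))
```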

hexdigit = set(digits, chrint('a','f'), chrint('A','F'))
pat = RE((text('$') | text('0x') | text('0X')) + repeated(hexdigit, 1, 8) - followedBy(hexdigit))

hexdigit = alt(digits, chrint('a','f'), chrint('A','F'))
pat = optional("-") + alt('$', '0x', '0X') + times(hexdigit, 1, 8) - hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = optional("-") + VR('$', '0x', '0X') + times(hexdigit, 1, 8) - hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", min=0) + VR('$', '0x', '0X') + VR(hexdigit, min=1, max=8) - hexdigit

hexdigit = VR(VR(digits) | interval('a', 'f') | interval('A', 'F'))
pat = VR("-", 0) + VR(VR('$') | VR('0x') | VR('0X')) + VR(hexdigit, 1, 8) - hexdigit

hexdigit = Alt(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", 0) + Alt('$', '0x', '0X') + VR(hexdigit, 1, 8) - hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = Optional("-") + Alternative('$', '0x', '0X') + RE(hexdigit, 1, 8) - hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-", 0) + Alternative('$', '0x', '0X') + RE(hexdigit, 1, 8) - hexdigit

hexdigit = RE([digits, Interval('a', 'f'), Interval('A', 'F')])  # the first argument is flattened
pat = RE("-", 0) + RE(['$', '0x', '0X']) + RE(hexdigit, 1, 8) - hexdigit

hexdigit = RE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-").repeat(0) + RE('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = VRE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = VRE("-").repeat(0) + VRE('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").repeat(0) + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Optional("-") + Vre('$', '0x', '0X') + Repeat(hexdigit, 1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f')).ignorecase()
hexnum = Vre("-").optional() + Vre('$', '0x').ignorecase() + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Alternative(Digits, Interval('a', 'f')).ignorecase()
hexnum = Optional("-") + Alternative('$', '0x').ignorecase() + Repeat(hexdigit, 1, 8) - hexdigit
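Several of the variants above apply .ignorecase() to just one piece of
the pattern. With the re module in Python 3.6 or later, one possible way
to compile that is a scoped inline flag, (?i:...), which turns on
case-insensitive matching only inside the group. The ignorecase() helper
below is hypothetical, and the leading "-?" stands for Optional("-").

```python
import re


def ignorecase(fragment):
    # Hypothetical helper: only this fragment ignores case, via a
    # scoped inline flag (supported by re since Python 3.6).
    return "(?i:" + fragment + ")"


hexdigit = ignorecase("[0-9a-f]")
prefix = r"(?:\$|" + ignorecase("0x") + ")"

# Optional minus, prefix, 1 to 8 hex digits, not followed by another one.
hexnum = re.compile("-?" + prefix + hexdigit + "{1,8}(?!" + hexdigit + ")")

for s in ["-0XdeadBEEF", "$FF", "0xG1"]:
    print(s, bool(hexnum.fullmatch(s)))
```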


I think that once the best syntax is found, implementing a better
reverb-like module isn't much work (my modified version of reverb is
only about 130 lines of code).

Bye,
bearophile




More information about the Python-list mailing list