Please help... with re
Alex Martelli
alex at magenta.com
Wed Jul 26 17:43:43 EDT 2000
"Olivier Dagenais" <olivierS.dagenaisP at canadaA.comM> wrote in message
news:bzHf5.48352$1h3.670995 at news20.bellglobal.com...
[snip]
> A - stream your input character by character
> B - when you encounter a space, add all "buffered" characters to the list
> C - if you encounter a quote, ignore rule B until you hit another quote
> D - if you hit a backslash, ignore rule C for the next character
> E - once you run out of characters, add all "buffered" characters to the
> list
Nice, clean approach. Performance is an open question, since the re engine
is coded in C while this FSM would be coded in Python, but it's worth
a try, I think. Here's a rather straightforward coding of it -- it
can no doubt be coded more elegantly by making the FSM explicit; this
version relies far too much on making its checks in a specific order,
and on 'continue' statements to avoid too-deep nesting... still, here
it comes, coded off-the-cuff:
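def splitaline(line):
    result=[]
    curtok=[]
    insidequote=0
    literalnext=0
    for c in line:
        # rule D: the character after a backslash is taken literally
        if literalnext:
            curtok.append(c)
            literalnext=0
            continue
        if c=='\\':
            literalnext=1
            continue
        # rule C: inside quotes, only a closing quote ends the token
        if insidequote:
            if c=='"':
                result.append(''.join(curtok))
                curtok=[]
                insidequote=0
            else:
                curtok.append(c)
            continue
        if c=='"':
            insidequote=1
        elif c==' ':
            # rule B: a space ends the token; skip empty tokens, such as
            # the one at the space right after a closing quote
            if curtok:
                result.append(''.join(curtok))
                curtok=[]
        else:
            curtok.append(c)
    # rule E: flush whatever is still buffered
    if curtok:
        result.append(''.join(curtok))
    return result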
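
As a rough sketch of what "making the FSM explicit" could look like, here is a table-driven variant of the same rules A-E. The state and action names, and the function name split_line_fsm, are made up for illustration. It also assumes that empty tokens from repeated spaces should be dropped, which is a guess based on the expected list quoted further down.

```python
# State and action names are illustrative; they do not come from the post.
PLAIN, QUOTED, ESC_PLAIN, ESC_QUOTED = range(4)
SKIP, KEEP, FLUSH, FLUSH_ALWAYS = range(4)

QUOTE, BACKSLASH, SPACE, OTHER = '"', '\\', ' ', None

# state -> {character (or OTHER as the default): (action, next state)}
TRANSITIONS = {
    PLAIN: {
        QUOTE: (SKIP, QUOTED),
        BACKSLASH: (SKIP, ESC_PLAIN),
        SPACE: (FLUSH, PLAIN),
        OTHER: (KEEP, PLAIN),
    },
    QUOTED: {
        QUOTE: (FLUSH_ALWAYS, PLAIN),   # closing quote ends the token
        BACKSLASH: (SKIP, ESC_QUOTED),
        OTHER: (KEEP, QUOTED),          # spaces are kept inside quotes
    },
    # After a backslash, any character is kept literally (rule D).
    ESC_PLAIN: {OTHER: (KEEP, PLAIN)},
    ESC_QUOTED: {OTHER: (KEEP, QUOTED)},
}

def split_line_fsm(line):
    tokens = []
    current = []
    state = PLAIN
    for ch in line:
        row = TRANSITIONS[state]
        action, state = row.get(ch, row[OTHER])
        if action == KEEP:
            current.append(ch)
        elif action == FLUSH_ALWAYS or (action == FLUSH and current):
            tokens.append(''.join(current))
            current = []
    if current:                         # rule E: flush the last token
        tokens.append(''.join(current))
    return tokens

example = r'This is an "example of a \"splitted\" text " by my monster.'
print(split_line_fsm(example))
# ['This', 'is', 'an', 'example of a "splitted" text ', 'by', 'my', 'monster.']
```

All the order-dependence now lives in the table, so adding a state (say, single quotes) should only mean adding rows. Note the trailing period stays attached to 'monster.', since nothing in rules A-E strips punctuation.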
> I made an horrid 68 lines monster to split a string to a list of
> substrings
Well, at least this halves it:-).
> based on following example:
>
> This is an "example of a \"splitted\" text " by my monster.
>
> results to this list:
>
> [ 'This' , 'is' , 'an' , 'example of a "splitted" text ' , 'by' , 'my' ,
> 'monster' ]
>
> But the stuff is too slow to parse the lines of giant log files.
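
For giant log files it may also be worth timing a character loop against a pattern that lets the re engine do the scanning. What follows is only a sketch under assumptions: the names TOKEN, UNESCAPE and split_line_re are invented here, the pattern assumes quotes are balanced and delimit tokens, and it has not been measured, so whether it is actually faster would need checking on real data.

```python
import re

# A token is either a double-quoted run (group 1) or a bare run of
# characters other than space and quote (group 2). In both, a backslash
# escapes the character that follows it.
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|((?:\\.|[^ "\\])+)')
UNESCAPE = re.compile(r'\\(.)')

def split_line_re(line):
    tokens = []
    for quoted, bare in TOKEN.findall(line):
        # Exactly one group is non-empty, except for an empty "" token.
        tokens.append(UNESCAPE.sub(r'\1', quoted or bare))
    return tokens

example = r'This is an "example of a \"splitted\" text " by my monster.'
print(split_line_re(example))
# ['This', 'is', 'an', 'example of a "splitted" text ', 'by', 'my', 'monster.']
```

On malformed input (an unterminated quote, say) this will not behave like a character-by-character state machine, so it is a candidate for comparison rather than a drop-in replacement.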
Alex