Parsing text

Thu May 7 10:09:36 EDT 2009

iainemsley <iainemsley <at> googlemail.com> writes:

> 
> Hi,
> I'm trying to write a fairly basic text parser to split up scenes and
> acts in plays to put them into XML. I've managed to get the text split
> into the blocks of scenes and acts and returned correctly but I'm
> trying to refine this and get the relevant scene number when the split
> is made but I keep getting a NoneType error trying to read the block
> inside the for loop and nothing is being returned. I'd be grateful for
> some suggestions as to how to get this working.
> 
> for scene in text.split('Scene'):
>     num = re.compile("^\s\[0-9, i{1,4}, v]", re.I)
>     textNum = num.match(scene)
>     if textNum:
>         print textNum
>     else:
>         print "No scene number"
>     m = '<div type="scene>'
>     m += scene
>     m += '<\div>'
>     print m
> 
> Thanks, Iain

Are you trying to match Roman numerals? As others have said, it is difficult to
suggest anything specific without seeing a sample of the input to your program.
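
That said, a few things in the posted code stand out regardless of the input.
In the pattern, "\[" is an escaped bracket, so the regex looks for a literal
"[" rather than opening a character class, and match() will return None on
ordinary text. The opening tag is also missing its closing quote, and the
closing tag should be "</div>" rather than "<\div>". Below is a minimal sketch
of one way to capture the number at the same time as the split. It assumes
(this is a guess, since the input was not posted) that each heading looks like
"Scene 2" or "Scene IV". The names SCENE_HEADING, split_scenes and to_xml, and
the "n" attribute, are made up for illustration, and the code is written for
Python 3 rather than the Python 2 of the original post.

```python
import re
from xml.sax.saxutils import escape

# Assumption: headings look like "Scene 2" or "Scene IV".
# The single capturing group holds the Arabic or Roman scene number.
SCENE_HEADING = re.compile(r"\bScene\s+([0-9]+|[ivxlc]+)\b", re.IGNORECASE)


def split_scenes(text):
    """Return a list of (scene_number, scene_body) pairs."""
    # With one capturing group, re.split returns
    # [preamble, number1, body1, number2, body2, ...].
    parts = SCENE_HEADING.split(text)
    return list(zip(parts[1::2], parts[2::2]))


def to_xml(text):
    """Wrap each scene in a <div>, carrying its number in a hypothetical n attribute."""
    chunks = []
    for number, body in split_scenes(text):
        # escape() protects &, < and > in the play text.
        chunks.append('<div type="scene" n="%s">%s</div>' % (number, escape(body.strip())))
    return "\n".join(chunks)
```

Calling to_xml() on the whole play text returns one <div> per heading found.
Anything before the first heading (an act title, for example) is dropped by
this sketch, so it would need extending if acts have to be kept as well.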

You may want to look at pyparsing (http://pyparsing.wikispaces.com/) to parse
the text file without hand-writing regular expressions.

Regards,
Suraj