Parsing text

Wed May 6 14:55:21 EDT 2009

iainemsley wrote:
> Hi,
> I'm trying to write a fairly basic text parser that splits plays into
> scenes and acts so I can put them into XML. I've managed to split the
> text into blocks of scenes and acts and return them correctly. Now I'm
> trying to refine this to get the relevant scene number when the split
> is made, but I keep getting a NoneType error when I try to read the
> block inside the for loop, and nothing is returned. I'd be grateful for
> some suggestions as to how to get this working.
> 
> for scene in text.split('Scene'):
>     num = re.compile("^\s\[0-9, i{1,4}, v]", re.I)
>     textNum = num.match(scene)
>     if textNum:
>         print textNum
>     else:
>         print "No scene number"
>     m = '<div type="scene>'
>     m += scene
>     m += '<\div>'
>     print m
> 
The problem is with your regular expression. Unfortunately, I can't tell
what you're trying to match. Could you provide some examples of the
scene numbers?
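
In the meantime, here is a guess at what is going wrong. In the pattern
as posted, "\[" matches a literal "[" character instead of opening a
character class, so match() returns None for practically every block.
The markup strings also have two typos: the type attribute is missing
its closing quote, and the closing tag is written '<\div>' instead of
'</div>'.

The sketch below is one possible fix, not a definitive one. It assumes
that each scene heading looks like "Scene 1", "Scene IV" or "Scene iii",
that is, arabic digits or roman numerals built from i, v and x. The names
SCENE_NUM, scene_number and wrap_scenes, and the "n" attribute, are made
up for illustration.

```python
import re
from xml.sax.saxutils import escape

# Assumption: split('Scene') has removed the word "Scene", so each block
# starts with optional whitespace followed by the number ("1", "IV", "iii").
SCENE_NUM = re.compile(r"^\s*(\d+|[ivx]+)\b", re.IGNORECASE)


def scene_number(block):
    """Return the leading scene number of a block as text, or None."""
    found = SCENE_NUM.match(block)
    return found.group(1) if found else None


def wrap_scenes(text):
    """Split text on 'Scene' and wrap each block in a <div> element."""
    divs = []
    # Skip the first piece: it is whatever came before the first heading.
    for block in text.split("Scene")[1:]:
        number = scene_number(block)
        label = number if number is not None else "unknown"
        # escape() protects any "&" or "<" in the play text.
        # The "n" attribute is a hypothetical place to store the number.
        divs.append('<div type="scene" n="%s">%s</div>' % (label, escape(block)))
    return divs
```

Calling wrap_scenes(text) returns a list of strings, one per scene, which
you can print or join. Two caveats: the word "Scene" anywhere in the
dialogue will also cause a split, and a block that begins with the word
"I" will be read as roman numeral one. Real examples of your headings
would let the pattern be tightened.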