re question

Daniel Schüle uval at rz.uni-karlsruhe.de
Fri Jun 23 09:23:57 EDT 2006


Hello re gurus,

I wrote this pattern to extract the "name" and the "content" of a VHDL
package. I know that the file is valid VHDL, so strictly speaking there is
no need to validate anything after the 'end' token is found, but since that
part works fine I don't want to touch it.

This is the pattern:

pattern = re.compile(
    r'^\s*package\s+(?P<name>\w+)\s+is\s+(?P<content>.*?)\s+end(\s+package)?(\s+(?P=name))?\s*;',
    re.DOTALL | re.MULTILINE | re.IGNORECASE)

The problem is that
    package TEST is xyz end;
works, but
    package TEST123 is xyz end;
fails.
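
For anyone who wants to reproduce this, here is a minimal sketch that runs
the pattern against both sample lines and prints what was captured. The
helper name try_match is made up for illustration, and the result may well
differ between Python versions.

```python
import re

# Same pattern as in the post, copied unchanged.
pattern = re.compile(
    r'^\s*package\s+(?P<name>\w+)\s+is\s+(?P<content>.*?)\s+end(\s+package)?(\s+(?P=name))?\s*;',
    re.DOTALL | re.MULTILINE | re.IGNORECASE)


def try_match(text):
    """Hypothetical helper: return (name, content) on a match, else None."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group('name'), m.group('content')


samples = [
    "    package TEST is xyz end;",
    "    package TEST123 is xyz end;",
]
for text in samples:
    # Print the raw input next to whatever the pattern captured.
    print(repr(text), '->', try_match(text))
```

Printing the captured groups rather than just a yes/no makes it easier to
see whether "name" or "content" is the part that goes wrong.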

\w is supposed to match [a-zA-Z0-9_], so I don't understand why digits and
underscores make the pattern fail.
(I have a slight suspicion that it may be a re bug.)
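
As a sanity check of that assumption, here is a minimal sketch that tests
\w+ on its own, outside the main pattern. The sample identifiers are made up
for illustration. Note that in Python 3, \w in a str pattern also matches
non-ASCII word characters unless re.ASCII is given; for plain ASCII
identifiers like these that should make no difference.

```python
import re

# \w+ in isolation, with the same case-insensitive flag as the main pattern.
word = re.compile(r'\w+', re.IGNORECASE)

for ident in ("TEST", "TEST123", "TEST_123"):
    # fullmatch succeeds only if \w+ consumes the whole identifier.
    m = word.fullmatch(ident)
    print(ident, '->', m is not None)
```

If all three identifiers match here, \w itself is probably not the culprit
and the problem lies elsewhere in the larger pattern.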

I also tried this pattern, with the same results:

pattern = re.compile(
    r'^\s*package\s+(?P<name>.+?)\s+is\s+(?P<content>.*?)\s+end(\s+package)?(\s+(?P=name))?\s*;',
    re.DOTALL | re.MULTILINE | re.IGNORECASE)

Something must be wrong with (?P<name>\w+) inside the main pattern.

Thanks in advance.

--
Daniel 




