file_name_fixer.py
Fredrik Lundh
fredrik at pythonware.com
Wed Jan 25 04:33:23 EST 2006
Steven D'Aprano wrote:
> > or you can use a more well-suited function:
> >
> > # replace runs of _ and . with a single character
> > newname = re.sub("_+", "_", newname)
> > newname = re.sub("\.+", ".", newname)
>
> You know, I really must sit down and learn how to use
> reg exes one of these days. But somehow, every time I
> try, I get the feeling that the work required to learn
> to use them effectively is infinitely greater than the
> work required to re-invent the wheel every time.
here's all you need to understand the code above:
. ^ $ * + ? ( ) [] { } | \ are reserved characters
all other characters match themselves
reserved characters must be escaped to match themselves;
to match a dot, use \. (in python source, write "\\." or r"\." so
that the RE engine sees \.)
+ means match one or more of the preceding item
so _+ matches one or more underscores, and \.+ matches
one or more dots
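
a minimal sketch of the rules above, assuming Python 3 and the standard
re module. the sample strings and variable names are made up for
illustration:

```python
import re

# an unescaped dot is a reserved character: it matches any character
# (except a newline, by default)
any_char = re.findall(".", "a.b")

# an escaped dot matches only a literal dot; "\\." in python source
# is the two-character pattern \. by the time the RE engine sees it
literal_dot = re.findall("\\.", "a.b")

# + repeats the preceding item one or more times
underscore_runs = re.findall("_+", "a_b___c")
dot_runs = re.findall("\\.+", "a..b.c")

print(any_char)         # ['a', '.', 'b']
print(literal_dot)      # ['.']
print(underscore_runs)  # ['_', '___']
print(dot_runs)         # ['..', '.']
```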
re.sub(pattern, replacement, text) replaces all matches for
the given pattern in text with the given replacement string
so re.sub("_+", "_", newname) replaces runs of underscores with
a single underscore.
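
put together, the two calls from the quoted code could be wrapped like
this. collapse_runs is a hypothetical helper name, not something from
the original script, and the file name is just a sample:

```python
import re

def collapse_runs(name):
    """Replace runs of _ and runs of . with a single character each.

    Hypothetical helper wrapping the two re.sub calls quoted above.
    """
    name = re.sub("_+", "_", name)    # runs of underscores -> one underscore
    name = re.sub("\\.+", ".", name)  # runs of dots -> one dot
    return name

print(collapse_runs("my__file...name.txt"))  # my_file.name.txt
```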
> > or, slightly more obscure:
> >
> > newname = re.sub("([_.])\\1+", "\\1", newname)
>
> _Slightly_?
this introduces three new concepts:
[ ] defines a set of characters
so [_.] will match either _ or .
( ) defines a group of matched characters.
\\1 (which the RE engine sees as \1) refers to the first group
this can be used both in the pattern and in the replacement
string
so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
of either a . or an _ followed by one or more copies of itself, with
a single instance of itself.
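
a small sketch of the one-pass version, again with a hypothetical
helper name and made-up inputs. note that \1 must repeat the same
character the group matched, so an underscore next to a dot is left
alone:

```python
import re

def collapse_repeats(name):
    """Collapse a run of identical _ or . characters into one.

    Hypothetical helper; ([_.]) captures one character and \\1+
    requires one or more further copies of that same character.
    """
    return re.sub("([_.])\\1+", "\\1", name)

print(collapse_repeats("my__file...name.txt"))  # my_file.name.txt
print(collapse_repeats("a__..b"))               # a_.b
print(collapse_repeats("a_._b"))                # a_._b (nothing repeats)
```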
(using r-strings lets you remove some of the extra backslashes, btw)
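
for comparison, a sketch of the same pattern written both ways; the
variable names and the sample string are only for illustration:

```python
import re

# in a raw string, a backslash is kept as-is, so no doubling is needed
plain_pattern = "([_.])\\1+"
raw_pattern = r"([_.])\1+"

# both spellings produce exactly the same pattern text
same_pattern = (plain_pattern == raw_pattern)

# the replacement string can be written as a raw string too
result = re.sub(r"([_.])\1+", r"\1", "a__b..c")

print(same_pattern)  # True
print(result)        # a_b.c
```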
</F>