Regular expressions

rusi rustompmody at gmail.com
Wed Dec 28 02:05:36 EST 2011


On Dec 27, 10:01 am, Fredrik Tolf <fred... at dolda2000.com> wrote:
> On Mon, 26 Dec 2011, mauricel... at acm.org wrote:
> > I've tried
>
> > re.sub('@\S\s[1-9]:[A-N]:[0-9]', '@\S\s',
> >        '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:')
>
> > but it does not seem to work.
>
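> Indeed, for several reasons. First of all, backslash sequences in an
> ordinary string literal go through Python's string-escape processing.
> "\S" and "\s" happen to survive because Python does not recognize them as
> escapes, but "\1" or "\b" would not, so write either "\\S" or r"\S" (the
> r, for raw, turns off backslash escapes).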
>
> Second, when you use only "\S", that matches a single non-space character,
> not several; you'll need to quantify them. "\S*" will match zero or more,
> "\S+" will match one or more, "\S?" will match zero or one, and there are
> a couple of other possibilities as well (see the manual for details). In
> this case, you probably want to use "+" for most of those.
>
> Third, you're not marking the groups that you want to use in the
> replacement. Since you want to retain the entire string before the space,
> and the numeric element, you'll want to enclose them in parentheses to
> mark them as groups.
>
> Fourth, your replacement string is entirely wacky. You don't use sequences
> such as "\S" and "\s" to refer back to groups in the original text, but
> numbered references, which refer back to parenthesized groups in the order
> they appear in the regex. In accordance with what you seemed to want, you
> should probably use "@\1/\2" in your case ("\1" refers back to the first
> parenthesized group, which would be the first "\S+" part, and "\2" to the
> second group, which should be the "[1-9]+" part; the at-mark and slash
> are inserted as they are into the result string).
>
> Fifth, you'll probably want to match the last colon as well, so that it
> is not retained in the result string.
>
> All in all, you will probably want to use something like this to correct
> that regex:
>
> re.sub(r'@(\S+)\s([1-9]+):[A-N]+:[0-9]+:', r'@\1/\2',
>         '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:')
>
> Also, you may be interested to know that you can use "\d" instead of
> "[0-9]".
>
> --
>
> Fredrik Tolf
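
To make the points above concrete, here is a minimal sketch that can be pasted into an interpreter. It uses only the standard re module; the variable names (header, one, run, fixed, fixed_d) are mine and not from the thread.

```python
import re

header = '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:'

# A raw string and a doubled backslash spell the same two-character pattern.
assert r'\S' == '\\S'

# A bare \S matches exactly one non-space character; \S+ matches a run of them.
one = re.match(r'@\S', header).group()
run = re.match(r'@\S+', header).group()

# Parentheses mark groups; \1 and \2 in the replacement refer back to them.
pattern = r'@(\S+)\s([1-9]+):[A-N]+:[0-9]+:'
fixed = re.sub(pattern, r'@\1/\2', header)

# \d stands in for [0-9] on this input (for str patterns in Python 3,
# \d also accepts other Unicode decimal digits).
fixed_d = re.sub(r'@(\S+)\s([1-9]+):[A-N]+:\d+:', r'@\1/\2', header)

print(one)    # the at-mark plus a single character
print(run)    # everything up to the space
print(fixed)  # header with " 1:N:0:" rewritten as "/1"
```

With this input, fixed comes out as '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103/1', and fixed_d is identical.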

For practical 'get-your-hands-dirty' experience, look at:

Python-specific: http://kodos.sourceforge.net/
Online: http://gskinner.com/RegExr/
Emacs-specific: re-builder and regex-tool http://bc.tech.coop/blog/071103.html
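
If none of those tools is at hand, the plain interpreter goes a long way. Below is a rough sketch of the kind of feedback they give (what matched, and what each group captured). The helper name try_regex is made up for illustration; it is not part of re or of any of the tools listed.

```python
import re

def try_regex(pattern, text):
    """Return (matched_text, groups) for the first match, or None if none."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.groups()

header = '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:'

# Three groups after the space: the read number, the letter flag, the digits.
print(try_regex(r'\s([1-9]+):([A-N]+):(\d+):', header))

# No lowercase letters follow the space, so this pattern does not match.
print(try_regex(r'\s[a-z]+', header))
```

Tweaking the pattern and re-running shows quickly which part of it stops matching.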


