[Tutor] Regex ^$ not behaving as expected

Thu Dec 8 11:58:40 EST 2016

Hi Edmund,

For each of the cases that surprise you, next time, can you also say
what you expected to see?  That can help us see where the confusion
lies; as it stands, if we have the same mental model as what's
happening in Python, then the results look correct to us.  :P

I can guess at what you were expecting.  For example, a variation of
one of your cases would be:

>>> print(re.sub(r'^AAA', "aaa", s, re.MULTILINE))
aaacBBB
AAAdBBB

where we would have expected both occurrences of 'AAA' to be replaced
by their lowercase equivalents, but instead only the first occurrence
seems to match.
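To see why only the first 'AAA' is touched, it helps to compare where
'^' can match with and without re.MULTILINE.  This sketch assumes s is
"AAAcBBB\nAAAdBBB", which is what the output above suggests; the
original string isn't quoted in this message.

```python
import re

# Assumed input, reconstructed from the output above.
s = "AAAcBBB\nAAAdBBB"

# Without MULTILINE, ^ anchors only at the very start of the string.
first_only = re.findall(r'^AAA', s)

# With MULTILINE, ^ also anchors right after each newline.
every_line = re.findall(r'^AAA', s, re.MULTILINE)

print(first_only)  # ['AAA']
print(every_line)  # ['AAA', 'AAA']
```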

If that's your expectation as well, then yes, we agree: that looks
weird.  Let's look at the documentation for re.sub.

    https://docs.python.org/3/library/re.html#re.sub

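... Oh!  It takes several optional parameters: not only 'flags', but
'count' as well.  And 'count' comes first: the signature is
re.sub(pattern, repl, string, count=0, flags=0), so a fourth
positional argument is taken as the count.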
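Here's a small sketch of what 'count' does on its own.  It uses an
unanchored pattern so both 'AAA's are candidates, passes 'count' by
keyword so it can't be mistaken for anything else, and again assumes
s is "AAAcBBB\nAAAdBBB".

```python
import re

s = "AAAcBBB\nAAAdBBB"  # assumed input, as above

# count caps the number of replacements; 0 (the default) means "all of them".
one = re.sub(r'AAA', "aaa", s, count=1)
all_of_them = re.sub(r'AAA', "aaa", s, count=0)

print(repr(one))          # 'aaacBBB\nAAAdBBB'
print(repr(all_of_them))  # 'aaacBBB\naaadBBB'
```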

Perhaps we've accidentally passed re.MULTILINE as the 'count'.  Let's
pass it explicitly as the 'flags' argument instead.  Does that make a
difference?

>>> print(re.sub(r'^AAA', "aaa", s, flags=re.MULTILINE))
aaacBBB
aaadBBB

Yes!  OK, that result looks more reasonable, so I think that's where
the problem is.
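Besides passing 'flags' by keyword, there are a couple of other standard
ways to attach MULTILINE that avoid the positional-argument trap
entirely: compiling the pattern with the flag, or writing the flag
inline as (?m) at the start of the pattern.  All three should give the
same result (assuming the same s as above):

```python
import re

s = "AAAcBBB\nAAAdBBB"  # assumed input, as above

# 1. Pass flags by keyword so it can't slide into the count slot.
by_keyword = re.sub(r'^AAA', "aaa", s, flags=re.MULTILINE)

# 2. Compile once with the flag; Pattern.sub() takes no flags argument at all.
pattern = re.compile(r'^AAA', re.MULTILINE)
compiled = pattern.sub("aaa", s)

# 3. Put the flag inside the pattern itself with the inline (?m) group.
inline = re.sub(r'(?m)^AAA', "aaa", s)

print(by_keyword == compiled == inline == "aaacBBB\naaadBBB")  # True
```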

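I was briefly confused about why passing a flag as the count didn't
raise a TypeError.  The answer is that re.MULTILINE is just an integer
under the hood (its value is 8), so re.sub accepted it as count=8.
With no flags passed, '^' could only match at the very start of the
string, so only one replacement happened.  So it's not a bug in the
standard library, just an easy mistake to make, and a good reason to
always pass 'flags' by keyword.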
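A quick way to check what actually happened: re.MULTILINE is an int
(an enum.IntFlag member in Python 3.6 and later), so the original
positional call behaves exactly like count=8 with no flags.  This
sketch assumes the same s as above.  Some newer Python versions emit a
DeprecationWarning when 'count' is passed positionally; the sketch
silences it so the comparison stays clean.

```python
import re
import warnings

s = "AAAcBBB\nAAAdBBB"  # assumed input, as above

# The flag really is an integer, so it's a legal value for count.
print(isinstance(re.MULTILINE, int), int(re.MULTILINE))  # True 8

# The original call: the fourth positional argument lands in count.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    positional = re.sub(r'^AAA', "aaa", s, re.MULTILINE)

# The same thing spelled out explicitly: count=8, flags left at 0.
accidental = re.sub(r'^AAA', "aaa", s, count=re.MULTILINE)

print(positional == accidental)  # True
print(repr(accidental))          # 'aaacBBB\nAAAdBBB'
```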