Problem with -3 switch

John Machin sjmachin at lexicon.net
Mon Jan 12 18:03:20 EST 2009


On Jan 13, 6:12 am, Christian Heimes <li... at cheimes.de> wrote:
> > I say again, show me a case of working 2.5 code where prepending u to
> > an ASCII string constant that is intended to be used in a text context
> > is actually worth the keystrokes.
>
> Eventually you'll learn it the hard way. *sigh*

And the hard way involves fire and brimstone, together with weeping,
wailing and gnashing of teeth, correct? Hmmm, let's see. Let's take
Carl's example of the sinner who didn't decode the input: """ Someone
found that urllib.open() returns a bytes object in Python 3.0, which
messed him up since in 2.x he was running regexp searches on the
output.  If he had been taking care to use only unicode objects in 2.x
(in this case, by explicitly decoding the output) then it wouldn't
have been an issue. """

3.0 says:
| >>> re.search("foo", b"barfooble")
| Traceback (most recent call last):
|   File "<stdin>", line 1, in <module>
|   File "C:\python30\lib\re.py", line 157, in search
|     return _compile(pattern, flags).search(string)
| TypeError: can't use a string pattern on a bytes-like object

The problem is diagnosed at the point of occurrence with a 99%-OK
exception message. Why only 99%? Because it only vaguely hints at the
other way out, which is to make the pattern bytes as well:
| >>> re.search(b"foo", b"barfooble")
| <_sre.SRE_Match object at 0x00FACD78>

Obvious solution (repent and decode):
| >>> re.search("foo", b"barfooble".decode('ascii'))
| <_sre.SRE_Match object at 0x00FD86B0>
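
For anyone who wants the two ways out side by side, here is a minimal sketch for 3.x. The name `data` is a hypothetical stand-in for whatever bytes came back from the network, and the 'ascii' codec is an assumption; use whatever encoding the source actually declares.

```python
import re

# Hypothetical stand-in for bytes read from a network response.
data = b"barfooble"

# Option 1: stay in bytes throughout, so the pattern must be bytes too.
m_bytes = re.search(b"foo", data)

# Option 2: decode once at the boundary, then work with text patterns.
# 'ascii' is an assumption here; real input needs its real encoding.
m_text = re.search("foo", data.decode("ascii"))

# Mixing a text pattern with bytes input is what raises the TypeError.
try:
    re.search("foo", data)
except TypeError as exc:
    mixed_error = str(exc)
```

Option 2 is usually the better fit when the rest of the program handles text, since the decode happens once at the point where the bytes enter.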

This is "messed him up"? One can get equally "messed up" when, AFAICT,
one is doing the right thing in 2.X. For example, one is digging XML
documents out of a ZIP file (str in 2.X, bytes in 3.X).
ElementTree.parse() requires a file, so in 2.X one uses
cStringIO.StringIO. 2to3 changes that to io.StringIO [quite
reasonable; no easy way of knowing if BytesIO would be better;
StringIO more probable]. 3.X barfs on the io.StringIO(xml_bytes) with
a reasonable message "TypeError: can't write bytes to text stream"
[momentarily puzzling -- write? Oh yeah, it happens in
self.write(initial_value)] so there's a need to set up a BYTESIO
that's conditional on Python version -- not a big deal at all.
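
A rough sketch of that conditional BYTESIO, for the record. The helper name parse_xml_member and its arguments are made up for illustration; only the version check and the two imports are the point.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Pick an in-memory file class that accepts the raw member data:
# bytes in 3.x, str in 2.x.
if sys.version_info[0] >= 3:
    from io import BytesIO as BYTESIO
else:
    from cStringIO import StringIO as BYTESIO


def parse_xml_member(zip_source, member_name):
    """Hypothetical helper: parse one XML member of a ZIP archive.

    zip_source is a path or a seekable file-like object.
    """
    zf = zipfile.ZipFile(zip_source)
    try:
        xml_bytes = zf.read(member_name)  # raw, undecoded member data
    finally:
        zf.close()
    # ElementTree.parse() wants a file, so wrap the raw data.
    return ET.parse(BYTESIO(xml_bytes))
```

The parser gets the undecoded data and works out the encoding from the XML declaration itself, which is why a bytes stream is the right wrapper here rather than a text one.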

I see no /Inferno/ here :-)



