[Python-3000] Should int() and float() accept bytes?

Tue Apr 15 03:58:23 CEST 2008

This is a repeat of a question that came up on the "Decimal(unicode)" thread
a little while ago.  I think it needs an answer, so I'm reposting it in its
own thread.  I couldn't find any other previous discussion of this; apologies
if I'm rehashing old issues.

Currently, int() and float() accept bytes instances.  For example:

>>> int(bytes([49, 50, 51]))
123
[40381 refs]
>>> int(b'123')
123
[40381 refs]

Philosophically, this seems wrong:  it's not clear why bytes([49, 50, 51])
should represent an integer, or even which integer it should represent; if
it's intended that the bytes sequence be thought of as an ASCII string,
then really it should be explicitly decoded as such first:

>>> int(b'123'.decode('ascii'))
123
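
To make the "which integer?" point concrete, here is a rough sketch of three
plausible readings of the same three bytes.  The helper name ascii_int is
made up for illustration, and int.from_bytes is an assumption on my part: it
exists only in later Python 3 releases, not in the interpreter used for the
sessions above.

```python
def ascii_int(data):
    """Hypothetical helper: read the bytes as ASCII decimal text."""
    # Decode explicitly, so non-ASCII input fails loudly here.
    return int(data.decode('ascii'))

raw = bytes([49, 50, 51])

as_text = ascii_int(raw)                   # the digits '1', '2', '3'
as_big = int.from_bytes(raw, 'big')        # base-256, most significant byte first
as_little = int.from_bytes(raw, 'little')  # base-256, least significant byte first

print(as_text, as_big, as_little)
```

All three are defensible interpretations, which is the argument for making
the caller spell out the one they mean.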

On the other hand, there's at least some sense in which bytes already
acts as a sort of poor-man's string: witness bytes.lower and friends.
Maybe practicality beats purity here?
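
As a small illustration of that string-like behaviour (the header line below
is an invented example, not taken from any of the failing tests), these bytes
methods treat the data as ASCII text and leave other byte values alone:

```python
# Invented sample input, in the style of a protocol header line.
data = b'Content-Length: 42'

# Case conversion only touches the ASCII letters.
lowered = data.lower()

# Splitting and stripping work on bytes much as they do on str.
name, _, value = data.partition(b':')
digits_ok = value.strip().isdigit()

# A byte outside the ASCII range is passed through unchanged.
non_ascii = b'\xc9'.lower()

print(lowered, digits_ok, non_ascii)
```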

What do people think about changing the int() and float() constructors so
that they don't accept bytes?

I experimented with removing int(bytes) and int(bytearray) support in
longobject.c's long_new and in PyNumber_Long in abstract.c, to see how much
breakage occurred.   The results:

11 tests failed:
    test_email test_httplib test_io test_mimetools test_pickle
    test_pickletools test_random test_smtplib test_sqlite test_tarfile
    test_uu

(random.py needed some patching to get the test-suite to
run in the first place.)

None of the breakage looks particularly serious or difficult to fix. I
haven't tried removing float(bytes) support yet.

See also

http://bugs.python.org/issue2483

Mark