getattr/setattr still ASCII-only, not Unicode - blows up SGMLlib from BeautifulSoup

John Machin sjmachin at lexicon.net
Thu Mar 13 17:35:46 EDT 2008


On Mar 14, 5:38 am, John Nagle <na... at animats.com> wrote:
>    Just noticed, again, that getattr/setattr are ASCII-only, and don't support
> Unicode.
>
>    SGMLlib blows up because of this when faced with a Unicode end tag:
>
>         File "/usr/local/lib/python2.5/sgmllib.py", line 353, in finish_endtag
>         method = getattr(self, 'end_' + tag)
>         UnicodeEncodeError: 'ascii' codec can't encode character u'\xae'
>         in position 46: ordinal not in range(128)
>
> Should attributes be restricted to ASCII, or is this a bug?
>
>                                         John Nagle

Identifiers are restricted -- see section 2.3 (Identifiers and
keywords) of the Reference Manual. In effect, an identifier must match
r'[A-Za-z_][A-Za-z0-9_]*\Z'. Since you can't write obj.nonASCIIname
in your code, it makes sense that the equivalent usage via setattr and
getattr is not available either.
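
As a rough illustration of that pattern, here is a minimal sketch that
checks a few candidate names against it. The helper name
is_ascii_identifier is made up for this example, and the check ignores
keywords, which the pattern alone does not exclude.

```python
import re

# The pattern quoted above: an ASCII letter or underscore, followed by
# any number of ASCII letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def is_ascii_identifier(name):
    """Hypothetical helper: True if name matches the ASCII identifier pattern."""
    return IDENTIFIER_RE.match(name) is not None


# A few sample names, including a non-ASCII tag like the one in the traceback.
candidates = ['end_p', '_x1', '42', u'end_\xae', '']
results = [(name, is_ascii_identifier(name)) for name in candidates]
```

Names such as 'end_p' and '_x1' pass; '42', the empty string, and
anything containing a non-ASCII character do not.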

However, other than forcing unicode to str (which is where the ASCII
encoding error above comes from), setattr and getattr seem not to care
what you use:

>>> class O(object):
...     pass
...
>>> o = O()
>>> setattr(o, '42', 'universe')
>>> getattr(o, '42')
'universe'
>>> # doesn't even need to be ASCII
>>> setattr(o, '\xff', 'notA-Za-z etc')
>>> getattr(o, '\xff')
'notA-Za-z etc'
>>>
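
If the goal is just to keep a lookup like the one in the traceback from
blowing up, one possible approach is to treat the encoding failure the
same as a missing attribute. The following is only a sketch under that
assumption, not a patch to sgmllib; lookup_handler and Handler are
hypothetical names invented for the example.

```python
def lookup_handler(obj, prefix, tag, default=None):
    """Hypothetical helper: return the attribute named prefix + tag, or default."""
    try:
        return getattr(obj, prefix + tag, default)
    except UnicodeEncodeError:
        # Python 2 raises this when the name is a unicode string that
        # cannot be encoded as ASCII; treat it like a missing attribute.
        return default


class Handler(object):
    """Stand-in for a parser class with end_<tag> methods (assumption)."""

    def end_p(self):
        return 'closed p'


h = Handler()
found = lookup_handler(h, 'end_', 'p')          # the bound method end_p
missing = lookup_handler(h, 'end_', u'x\xae')   # no such handler, so default
```

The caller can then fall back to whatever it does for unknown tags
whenever the result is the default.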

Cheers,
John


