[Web-SIG] simplejson 2.0.0 released, much faster.

Arnar Birgisson arnarbi at gmail.com
Sat Sep 27 23:31:50 CEST 2008


On Sat, Sep 27, 2008 at 23:24, Bob Ippolito <bob at redivi.com> wrote:
> On Sat, Sep 27, 2008 at 2:10 PM, Arnar Birgisson <arnarbi at gmail.com> wrote:
>> On Sat, Sep 27, 2008 at 22:13, Bob Ippolito <bob at redivi.com> wrote:
>>> If you give it unicode input, it will decode to unicode. Basically it
>>> scans through the str until it finds non-ASCII, escape, or end quote.
>>> If it finds the end quote first it will just allocate a new string
>>> with exactly that data, which is super fast since it's just an alloc
>>> and copy.
>>>
>>> It will of course always decode everything containing non-ASCII
>>> characters or any escape sequences to unicode. It is not currently
>>> configurable. It was done for performance, but also does produce nicer
>>> looking repr output because you don't have so many 'u' characters to
>>> look at :) Given the way str works in Python 2.x it should not be an
>>> incompatible change except for doctests... and I guess code that
>>> explicitly checks for unicode and doesn't know what to do with str,
>>> but that would be weird.
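The string fast path described in the quoted paragraph above can be modeled roughly as follows. This is a hypothetical Python 3 sketch, not simplejson's actual code. `bytes` stands in for the Python 2.x `str`, Python 3 `str` stands in for `unicode`, and the function name `scan_string` and its `(value, end)` return convention are assumptions made for illustration. The slow path simply hands the string literal to the standard library `json.loads` instead of reproducing a full escape decoder.

```python
import json

QUOTE = 0x22      # ord('"')
BACKSLASH = 0x5C  # ord('\\')


def scan_string(doc: bytes, start: int):
    """Hypothetical sketch: decode the JSON string whose opening quote
    is at doc[start - 1].

    Returns (value, end), where end is the index just past the closing
    quote. value is bytes (the 2.x "str" analogue) when the fast path
    applies, otherwise str (the 2.x "unicode" analogue).
    """
    i = start
    n = len(doc)
    # Fast path: scan until a closing quote, a backslash, or a non-ASCII byte.
    while i < n:
        b = doc[i]
        if b == QUOTE:
            # Pure ASCII with no escapes: a single slice (alloc + copy).
            return doc[start:i], i + 1
        if b == BACKSLASH or b >= 0x80:
            break
        i += 1
    # Slow path: find the closing quote, skipping escaped characters.
    # UTF-8 continuation bytes are >= 0x80, so they never look like '"' or '\\'.
    while i < n:
        b = doc[i]
        if b == BACKSLASH:
            i += 2  # skip the backslash and the character it escapes
            continue
        if b == QUOTE:
            # Let the standard decoder handle escapes; the result is text.
            literal = doc[start - 1:i + 1].decode("utf-8")
            return json.loads(literal), i + 1
        i += 1
    raise ValueError("unterminated string starting at index %d" % (start - 1))
```

For example, `scan_string(b'"hello" rest', 1)` takes the fast path and yields bytes, while a literal containing an escape sequence or a non-ASCII byte falls through to the slow path and comes back as text, mirroring the behavior Bob describes.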
>>
>> The reason I asked was because I've had problems even with pure-ASCII
>> strs when mixed with unicode objects in some DB-API drivers, working
>> with filesystems on OS X and others. A "solution" was to have
>> everything in unicode.
>
> I've never seen a pure ASCII str cause problems. I've seen pure ASCII
> unicode cause problems in stupid ways though, because not all Python C
> code that handles text can handle unicode. Dumb stuff like this bites
> me all the time in Genshi templates (where all string literals are
> unicode):
>
>>>> datetime.datetime.now().strftime(u'%Y')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> TypeError: strftime() argument 1 must be str, not unicode
>
> Any operation involving a str and unicode should up-convert to
> unicode, and regardless of the defaultencoding a pure ASCII str will
> properly get handled (at least in Python 2.5; I don't remember what
> 2.4 did)... e.g. ''.join(['', u'foo']) returns u'foo'

Right, I can't remember the exact details now - this was in my last job :)

>> That means the string given to simplejson to decode will be a
>> unicode string anyway, so in that case there's no problem :)
>
> If you can prove that there is an actual problem I'm sure I can come
> up with a flag that would ensure unicode, but the implementation would
> probably just be a translation of the input document to unicode before
> decoding ;)
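The "ensure unicode" flag Bob mentions, implemented by translating the input document to unicode before decoding, could be sketched as a thin wrapper. The name `loads_unicode` and its `encoding` parameter are hypothetical, and the standard library `json` module stands in for simplejson. In Python 3 the stdlib decoder already returns `str` for every JSON string, so this sketch only shows the shape of the "translate first, then decode" approach rather than a behavior change.

```python
import json


def loads_unicode(doc, encoding="utf-8"):
    """Hypothetical 'ensure unicode' option: convert the whole document
    to text up front, then parse it with the regular decoder."""
    if isinstance(doc, (bytes, bytearray)):
        # Translate the raw input to text before decoding, as suggested.
        doc = doc.decode(encoding)
    return json.loads(doc)
```

For example, `loads_unicode(b'{"name": "Arnar"}')` returns a dict whose keys and values are all text strings.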

Well, don't worry about it :)

cheers,
Arnar

