byte count unicode string

John Machin sjmachin at lexicon.net
Wed Sep 20 06:12:54 EDT 2006


willie wrote:
> John Machin:
>
>  >You are confusing the hell out of yourself. You say that your web app
>  >deals only with UTF-8 strings. Where do you get "the unicode string"
>  >from??? If name is a utf-8 string, as your comment says, then len(name)
>  >is all you need!!!
>
>
> # I'll go ahead and concede defeat since you appear to be on the
> # verge of a heart attack :)
> # I can see that I lack clarity so I don't blame you.

All you have to do is use terminology like "Python str object, encoded
in utf-8" and "Python unicode object".
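
To make the two terms concrete, here is a minimal sketch. It is written for Python 3, where the type this thread calls a "unicode object" is named str and the "str object, encoded in utf-8" is named bytes. The variable names are only illustrative.

```python
# Python 3 sketch: `str` here is the thread's "Python unicode object",
# and `bytes` is the thread's "Python str object, encoded in utf-8".
text = "\u2708"                 # one code point: U+2708 AIRPLANE
encoded = text.encode("utf-8")  # the same character as UTF-8 bytes

# len() counts different things for the two types.
print(type(text).__name__, len(text))        # counts code points
print(type(encoded).__name__, len(encoded))  # counts bytes
```

For this character the first len() gives 1 and the second gives 3, which is why it matters which of the two objects you are holding.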

>
> # By UTF-8 string, I mean a unicode object with UTF-8 encoding:

There is no such animal as a "unicode object with UTF-8 encoding".
Don't make up terminology as you go.
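
One way to see why: an encoding only comes into play when the text is turned into bytes, and the same text can be turned into several different byte strings. A rough sketch, again in Python 3 syntax (str standing in for the unicode type of this thread); the sizes dict is just a name chosen for the example.

```python
# The text object itself carries no encoding; only its encoded forms do.
text = "\u2708"

sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)          # bytes in that particular encoding
    assert data.decode(codec) == text  # decoding recovers the same text
    sizes[codec] = len(data)

print(sizes)
```

The one text object yields byte strings of different lengths depending on the codec, so "UTF-8" describes an encoded byte string, never the unicode object.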

>
> >>> type(ustr)
> <type 'unicode'>
> >>> repr(ustr)
> "u'\\u2708'"

Sigh. I suppose we have to infer that "ustr" is the same as the "name"
that you were getting as post data.  Is that correct?

>
> # The database API expects unicode objects:
> # A template query, then a variable number of values.
> # Perhaps I'm a victim of arbitrary design decisions :)

And the database will encode those unicode objects as utf-8, silently
truncating any that are too long -- just as Duncan feared? "Arbitrary"
is not the word for it. 
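
If the limit really is counted in bytes after the UTF-8 encoding (an assumption; nothing in this thread confirms how the database counts), the length could be checked before the value is handed over. A hedged sketch in Python 3 syntax; utf8_len, truncate_utf8 and max_bytes are hypothetical names, not part of any database API.

```python
def utf8_len(text):
    """Return how many bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text, max_bytes):
    """Trim `text` so its UTF-8 form fits in `max_bytes` bytes.

    Slicing the bytes can cut a multi-byte character in half;
    errors="ignore" drops that trailing fragment when decoding.
    """
    encoded = text.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")


name = "\u2708" * 4   # 4 code points, 3 bytes each in UTF-8
limit = 10            # hypothetical column size in bytes

print(len(name), utf8_len(name))
if utf8_len(name) > limit:
    name = truncate_utf8(name, limit)
print(len(name), utf8_len(name))
```

Whether to truncate or to reject an over-long value is a design decision for the application; the point is only that the check has to be made on the encoded length, not on len() of the unicode object.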

Good luck! 

Cheers,
John



