encoding problems (é and è)
John Machin
sjmachin at lexicon.net
Fri Mar 24 13:49:19 EST 2006
On 24/03/2006 11:44 PM, Peter Otten wrote:
> John Machin wrote:
>
>
>>0x00d0: ord('D'), # Ð
>>0x00f0: ord('o'), # ð
>>Icelandic capital eth becomes D, OK; but the small letter becomes o!!!
>
>
> I see information flow from Iceland is a bit better than from Armenia :-)
No information flow needed. Capital letter BLAH -> D and small letter
BLAH -> o should trigger one's palpable nonsense detector for *any* BLAH.
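That kind of check can be automated. The following is only a minimal sketch, assuming a table in the style quoted above (integer code points mapped to integer code points, as used with unicode.translate / str.translate). The name case_mismatches and the two-entry table are made up for illustration and are not taken from the original code.

```python
# Hypothetical fragment of a translate() table, using the two entries quoted above.
table = {
    0x00D0: ord('D'),  # Ð
    0x00F0: ord('o'),  # ð
}

def case_mismatches(table):
    """Return (upper, lower) key pairs whose replacements differ by more than case."""
    problems = []
    for code, target in table.items():
        upper = chr(code)
        lower = upper.lower()
        # Only look at letters whose single-character lowercase form is also a key.
        if lower == upper or len(lower) != 1 or ord(lower) not in table:
            continue
        # Compare the two replacements case-insensitively.
        if chr(target).lower() != chr(table[ord(lower)]).lower():
            problems.append((upper, lower))
    return problems

print(case_mismatches(table))
```

With the two entries above, the eth pair is reported; changing the second value to ord('d') makes the report empty.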
>
>
>>Some of the transformations are a little unfortunate :-(
>
>
> The OP, as you pointed out in your first post in this thread, has more
> pressing problems with his normalization approach.
>
> Lastly, even if all went well, turning a list of French addresses into an
> ascii-uppercase graveyard would be a sad thing to do...
Oh indeed. Not only sad, but incredibly stupid. I fervently hope and
trust that such a normalisation is intended only for fuzzy matching
purposes. I can't imagine that anyone would contemplate writing the
output to storage for any reason other than logging or for regression
testing. Update it back to the database? Do you know anyone who would do
that??
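To make the distinction concrete, here is a rough sketch of a normalisation used purely as a comparison key, with the original text left untouched. It assumes the standard unicodedata module rather than a hand-written table; match_key and the sample address are invented for illustration.

```python
import unicodedata

def match_key(text):
    """Build an uppercase ASCII key for comparison only; the input is not modified."""
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything still outside ASCII, then uppercase.
    return stripped.encode("ascii", "ignore").decode("ascii").upper()

original = "12 allée des Érables"   # this is what stays in the database
key = match_key(original)           # this is used only for fuzzy matching
```

Letters with no decomposition, such as ð, are silently dropped by this sketch, so a real matcher would still need explicit (and sane) table entries for them.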