encoding problems (é and è)

John Machin sjmachin at lexicon.net
Fri Mar 24 13:49:19 EST 2006


On 24/03/2006 11:44 PM, Peter Otten wrote:
> John Machin wrote:
> 
> 
>>0x00d0: ord('D'), # Ð
>>0x00f0: ord('o'), # ð
>>Icelandic capital eth becomes D, OK; but the small letter becomes o!!!
> 
> 
> I see information flow from Iceland is a bit better than from Armenia :-)

No information flow needed. Capital letter BLAH -> D and small letter 
BLAH -> o should trigger one's palpable nonsense detector for *any* BLAH.
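
For what it's worth, that detector can be automated. A rough sketch in Python 3 syntax, assuming the table is a dict mapping code point to code point as in the excerpt quoted above. The names `table` and `case_mismatches` are made up for illustration, and the É/é entries are added only as a well-behaved contrast; they are not taken from the original table.

```python
def case_mismatches(table):
    """Return (capital, small) pairs whose mappings disagree on the base letter."""
    bad = []
    for cp, target in sorted(table.items()):
        upper = chr(cp)
        lower = upper.lower()
        # Skip entries that are not capitals with a single-character
        # lowercase partner that is also present in the table.
        if lower == upper or len(lower) != 1 or ord(lower) not in table:
            continue
        # The two mappings should be the same letter, ignoring case.
        if chr(table[ord(lower)]).lower() != chr(target).lower():
            bad.append((upper, lower))
    return bad


table = {
    0x00d0: ord('D'),  # Ð (from the quoted excerpt)
    0x00f0: ord('o'),  # ð (from the quoted excerpt)
    0x00c9: ord('E'),  # É (illustrative assumption)
    0x00e9: ord('e'),  # é (illustrative assumption)
}

print(case_mismatches(table))  # [('Ð', 'ð')]
```

It only catches capital/small pairs that disagree with each other; a pair that is consistently wrong would pass unnoticed.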

> 
> 
>>Some of the transformations are a little unfortunate :-(
> 
> 
> The OP, as you pointed out in your first post in this thread, has more
> pressing problems with his normalization approach. 
> 
> Lastly, even if all went well, turning a list of French addresses into an
> ascii-uppercase graveyard would be a sad thing to do...

Oh indeed. Not only sad, but incredibly stupid. I fervently hope and 
trust that such a normalisation is intended only for fuzzy matching 
purposes. I can't imagine that anyone would contemplate writing the 
output to storage for any reason other than logging or for regression 
testing. Update it back to the database? Do you know anyone who would do 
that??
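
To illustrate what "for matching only" might look like: a minimal sketch in Python 3 syntax that builds a throwaway key and leaves the original text alone. It uses Unicode NFKD decomposition from the standard `unicodedata` module rather than the hand-built translation table under discussion, so it is a substitute technique, not the OP's code. The function name `match_key` and the sample addresses are invented for the example.

```python
import unicodedata


def match_key(text):
    """Build an ASCII uppercase key for comparison only, never for storage."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining accents that decomposition split off.
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is discarded (see the caveat below).
    return stripped.encode('ascii', 'ignore').decode('ascii').upper()


# Hypothetical sample data; the stored strings keep their accents.
addresses = ['12 rue de la Méditerranée', '3 allée des Près']

index = {}
for addr in addresses:
    index.setdefault(match_key(addr), []).append(addr)

print(index.get(match_key('12 RUE DE LA MEDITERRANEE')))
# ['12 rue de la Méditerranée']
```

Caveat: letters with no decomposition, such as Ð and ð, are silently dropped by the `'ignore'` step, so a real matching key would still need explicit mappings for them.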



