Newbie question about text encoding

Laura Creighton lac at openend.se
Tue Feb 24 14:45:54 EST 2015


In a message of Tue, 24 Feb 2015 12:13:24 -0500, Dave Angel writes:
>With a sample of one string, how did you read "all his strings".  And 
>with one non-ASCII code in that single string, how did you know that 
>'latin1' was the only encoding that included a reasonable character at 
>that encoding?

Ah, 2 strings.  And I did not promise that latin1 was
the only encoding that included a reasonable character at
that position.  I only promised that it was one that did.
And, given the nature of the data, I was pretty sure that
this was the one he wanted.  If it did not work, he
would come back and complain.
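To make the point concrete (my own sketch, not from the thread, with a
made-up sample byte string): the same non-ASCII byte decodes to a
reasonable character under several single-byte encodings, while utf-8
rejects it outright.

```python
# Hypothetical sample: "Malmö" encoded as a single-byte encoding,
# so the ö is the lone byte 0xF6.
raw = b"Malm\xf6"

# 0xF6 is a printable letter in all of these single-byte encodings,
# so each of them is a plausible guess -- latin1 is just one of them.
print(raw.decode("latin1"))   # Malmö
print(raw.decode("latin2"))   # Malmö
print(raw.decode("cp1252"))   # Malmö

# A lone 0xF6 byte is not a valid utf-8 sequence, so utf-8 fails loudly.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid utf-8:", exc)
```

That is why a single non-ASCII byte cannot pin down *the* encoding; it
can only rule some out (utf-8 here) and leave several candidates in.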

>See http://support.esri.com/cn/knowledgebase/techarticles/detail/21106
>
>according to that page, starting at ArcGIS 10.2.1, the default sets the 
>code page to UTF-8 (UNICODE) in the shapefile (.DBF)

Who cares?  In Europe, among Europeans, we are used to seeing
Latin1 or Latin2.

>My guess is that this is only appropriate for users who use only locally 
>created data.  Since the OP's data is apparently old (if it were current 
>versions, it'd have been utf-8), who knows how consistent the encoding is.

I do.  Very much so.  The idea that the whole world loves utf-8 is
nonsense.  Most of Europe was using latin1, latin2, etc. before
Unicode was invented and will, as far as I know, continue to use them.
Oldness is an indication that latin1 is more likely to be the encoding
than utf-8.

Your guess is that latin1 is only used for locally created data.

My data is that we in Western Europe have this format pretty much all
of the time, everywhere, unless you are only doing local
encodings (in which case you would use utf-8).

Laura



More information about the Python-list mailing list