unicode and dbf files

John Machin sjmachin at lexicon.net
Mon Oct 26 15:21:45 EDT 2009


On Oct 27, 3:22 am, Ethan Furman <et... at stoneleaf.us> wrote:
> John Machin wrote:
> > On Oct 24, 4:14 am, Ethan Furman <et... at stoneleaf.us> wrote:
>
> >>John Machin wrote:
>
> >>>On Oct 23, 3:03 pm, Ethan Furman <et... at stoneleaf.us> wrote:
>
> >>>>John Machin wrote:
>
> >>>>>On Oct 23, 7:28 am, Ethan Furman <et... at stoneleaf.us> wrote:
>
> > Try this:
> >http://webhelp.esri.com/arcpad/8.0/referenceguide/
>
> Wow.  Question, though:  all those codepages mapping to 437 and 850 --
> are they really all the same?

437 and 850 *are* codepages. You mean "all those language driver IDs
mapping to codepages 437 and 850". A codepage merely gives an
encoding. An LDID is like a locale; it includes other things besides
the encoding. That's why many Western European languages map to the
same codepage, first 437 then later 850 then 1252 when Windows came
along.
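
To make the LDID/codepage distinction concrete, here is a minimal sketch of a table keyed by LDID, where several IDs land on the same Python codec. The name LDID_TO_CODEC and the helper ldids_by_codec are made up for illustration, and the handful of entries is only a sample as I read the published tables; check each one against the reference before relying on it.

```python
import codecs
from collections import defaultdict

# Hypothetical sample table: LDID byte -> (Python codec name, description).
# The entries are assumptions for illustration; verify them against the
# reference table before use.
LDID_TO_CODEC = {
    0x01: ("cp437", "US MS-DOS"),
    0x02: ("cp850", "International MS-DOS"),
    0x03: ("cp1252", "Windows ANSI"),
    0x0D: ("cp437", "French MS-DOS"),
    0x0E: ("cp850", "French MS-DOS"),
    0x0F: ("cp437", "German MS-DOS"),
    0x10: ("cp850", "German MS-DOS"),
}


def ldids_by_codec(table):
    """Group LDIDs by the canonical name of the codec they map to."""
    groups = defaultdict(list)
    for ldid, (codec_name, _description) in sorted(table.items()):
        # codecs.lookup() normalises aliases and raises LookupError
        # if Python has no such codec.
        groups[codecs.lookup(codec_name).name].append(ldid)
    return dict(groups)


if __name__ == "__main__":
    for codec_name, ldids in sorted(ldids_by_codec(LDID_TO_CODEC).items()):
        print(codec_name, [hex(ldid) for ldid in ldids])
```

Many LDIDs, few encodings: the description half of each tuple is where the locale-like information lives, and the codec name is all that decoding needs.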

> >>     '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy
>
> > Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
> > not alone. I suggest that you omit Kamenicky until someone actually
> > wants it.
>
> Yeah, I noticed that.  Tentative plan was to implement it myself (more
> for practice than anything else), and also to be able to raise a more
> specific error ("Kamenicky not currently supported" or some such).

The error idea is fine, but I don't get the "implement it yourself for
practice" bit ... practice what? You plan a long and fruitful career
implementing codecs for YAGNI codepages?
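
For the error idea, something along these lines would do. It is a sketch only: UnsupportedCodepage and resolve_codec are hypothetical names, not anything from your module, and it leans on codecs.lookup() raising LookupError for an encoding that Python doesn't provide.

```python
import codecs


class UnsupportedCodepage(LookupError):
    """Hypothetical error: the table names a codec that Python lacks."""


def resolve_codec(codec_name, description):
    """Return the canonical codec name, or raise a specific error."""
    try:
        return codecs.lookup(codec_name).name
    except LookupError:
        raise UnsupportedCodepage(
            "%s (%s) not currently supported" % (description, codec_name)
        ) from None


if __name__ == "__main__":
    print(resolve_codec("cp437", "US MS-DOS"))
    try:
        resolve_codec("cp895", "Kamenicky (Czech) MS-DOS")
    except UnsupportedCodepage as exc:
        print(exc)
```

That way the entry can stay in the table and the caller still gets a message that names the codepage, with no codec to write.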
>
> >>     '\x7b' : ('iso2022_jp', 'Japanese Windows'),        # wag
>
> > Try cp936.
>
> You mean 932?

Yes.
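
A quick way to see why the name matters: the three encodings disagree on the bytes for the same text. The sample string is my own choice, nothing from your table.

```python
# The same Japanese text under the codecs discussed above.
text = "日本語"

# cp932 is the Windows Japanese codepage (a Shift-JIS superset).
cp932_bytes = text.encode("cp932")

# iso2022_jp is a 7-bit encoding that switches character sets
# with escape sequences.
iso2022_bytes = text.encode("iso2022_jp")

# cp936 is Simplified Chinese (GBK); decoding Japanese cp932 bytes
# with it does not give the original text back.
misdecoded = cp932_bytes.decode("cp936", errors="replace")

if __name__ == "__main__":
    print(cp932_bytes)
    print(iso2022_bytes)
    print(misdecoded == text)
```
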

> Very helpful indeed.  Many thanks for reviewing and correcting.

You're welcome.

> Learning to deal with unicode is proving more difficult for me than
> learning Python was to begin with!  ;D

?? As far as I can tell, the topic has been about mapping from
something like a locale to the name of an encoding, i.e. all about the
pre-Unicode mishmash and nothing to do with dealing with unicode ...

BTW, what are you planning to do with an LDID of 0x00?

Cheers,

John


