unicode and dbf files

Ethan Furman ethan at stoneleaf.us
Fri Oct 23 13:14:59 EDT 2009


John Machin wrote:
> On Oct 23, 3:03 pm, Ethan Furman <et... at stoneleaf.us> wrote:
> 
>>John Machin wrote:
>>
>>>On Oct 23, 7:28 am, Ethan Furman <et... at stoneleaf.us> wrote:
>>
>>>>Greetings, all!
>>
>>>>I would like to add unicode support to my dbf project.  The dbf header
>>>>has a one-byte field to hold the encoding of the file.  For example,
>>>>\x01 is code-page 437 MS-DOS.
>>
>>>>My google-fu is apparently not up to the task of locating a complete
>>>>resource that has a list of the 256 possible values and their
>>>>corresponding code pages.
>>
>>>What makes you imagine that all 256 possible values are mapped to code
>>>pages?
>>
>>I'm just wanting to make sure I have whatever is available, and
>>preferably standard.  :D
>>
>>
>>>>So far I have found this, plus variations: http://support.microsoft.com/kb/129631
>>
>>>>Does anyone know of anything more complete?
>>
>>>That is for VFP3. Try the VFP9 equivalent.
>>
>>>dBase 5,5,6,7 use others which are not defined in publicly available
>>>dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
>>>source: ESRI support site.
>>
>>Well, a couple hours later and still not more than I started with.
>>Thanks for trying, though!
> 
> 
> Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
> keywords and you couldn't come up with anything??

Perhaps "nothing new" would have been a better description.  I'd already
seen the clicketyclick site (good info there), and all I found at ESRI
were folks trying to figure it out, plus one link to a list that was no
different from the VFP3 list (or perhaps the list just did not give the
hex values; either way, it was of no use to me).

I looked at dbase.com, but came up empty-handed there (not surprising, 
since they are a commercial company).

I searched some more on Microsoft's site in the VFP9 section, and was 
able to find the code page section this time.  Sadly, it only added 
about seven codes.

At any rate, here is what I have come up with so far.  Any corrections 
and/or additions greatly appreciated.

code_pages = {
     '\x01' : ('cp437', 'U.S. MS-DOS'),
     '\x02' : ('cp850', 'International MS-DOS'),
     '\x03' : ('cp1252', 'Windows ANSI'),
     '\x04' : ('mac_roman', 'Standard Macintosh'),
     '\x64' : ('cp852', 'Eastern European MS-DOS'),
     '\x65' : ('cp866', 'Russian MS-DOS'),
     '\x66' : ('cp865', 'Nordic MS-DOS'),
     '\x67' : ('cp861', 'Icelandic MS-DOS'),
     '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy
     '\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),      # iffy
     '\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
     '\x6b' : ('cp857', 'Turkish MS-DOS'),

     '\x78' : ('cp950',                  # Windows code page 950
               'Traditional Chinese (Hong Kong SAR, Taiwan) Windows'),
     '\x79' : ('cp949', 'Korean Windows'),               # code page 949
     '\x7a' : ('cp936',                  # Windows code page 936
               'Chinese Simplified (PRC, Singapore) Windows'),
     '\x7b' : ('cp932', 'Japanese Windows'),             # code page 932
     '\x7c' : ('cp874', 'Thai Windows'),                 # code page 874
     '\x7d' : ('cp1255', 'Hebrew Windows'),
     '\x7e' : ('cp1256', 'Arabic Windows'),
     '\xc8' : ('cp1250', 'Eastern European Windows'),
     '\xc9' : ('cp1251', 'Russian Windows'),
     '\xca' : ('cp1254', 'Turkish Windows'),
     '\xcb' : ('cp1253', 'Greek Windows'),
     '\x96' : ('mac_cyrillic', 'Russian Macintosh'),
     '\x97' : ('mac_latin2', 'Macintosh EE'),
     '\x98' : ('mac_greek', 'Greek Macintosh') }
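
For what it's worth, here is a minimal sketch (Python 3, so the keys are
bytes rather than str) of how a table like this might be used.  It assumes
the language driver ID is the byte at offset 29 of the 32-byte dbf header;
sample_code_pages, unknown_codecs and codec_for_header are names made up
for this illustration only.

```python
import codecs

# Hypothetical excerpt of the table above, keyed by bytes for Python 3.
sample_code_pages = {
    b'\x02': ('cp850', 'International MS-DOS'),
    b'\x03': ('cp1252', 'Windows ANSI'),
    b'\x65': ('cp866', 'Russian MS-DOS'),
}

# Assumption: the language driver ID is the byte at offset 29 of the header.
LDID_OFFSET = 29
HEADER_LENGTH = 32


def unknown_codecs(table):
    """Return the codec names in `table` that this Python cannot look up."""
    missing = set()
    for codec_name, _description in table.values():
        try:
            codecs.lookup(codec_name)
        except LookupError:
            missing.add(codec_name)
    return sorted(missing)


def codec_for_header(header, table, default='ascii'):
    """Map the header's language driver ID to a codec name, else `default`."""
    if len(header) < HEADER_LENGTH:
        raise ValueError('dbf header must be at least 32 bytes')
    ldid = header[LDID_OFFSET:LDID_OFFSET + 1]   # slicing keeps it as bytes
    codec_name, _description = table.get(ldid, (default, 'unknown'))
    return codec_name


# Build a fake header with language driver ID 0x65 and decode a field with it.
fake_header = bytearray(HEADER_LENGTH)
fake_header[LDID_OFFSET] = 0x65
codec = codec_for_header(bytes(fake_header), sample_code_pages)
text = b'\x8f\xe0\xa8\xa2\xa5\xe2'.decode(codec)
```

Running something like unknown_codecs over the full table should be a quick
way to spot entries (the ones marked iffy, for instance) whose codec names
the installed Python does not recognize.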

~Ethan~


