Replace high-bit characters in file.

Mikke M. plopp at dummyaddress.doh
Sun Mar 31 19:32:23 EST 2002


> > # This is not tested, of course
> > file_contents = open('dbasefile', 'rb').read()
> > file_contents = unicode(file_contents, 'latin-1').encode('cp850')
> > # file_contents is now a string containing the contents of the
> > # file in cp850 format. You can write that string to a file, if you
> > # want.
> > open('dbasefile.out', 'wb').write(file_contents)
> >
> > Is that what you wanted?
>
> Probably not. If a (dBase) file is a binary thing, it might well be
> that you modify the non-text parts of it. You really have to
> understand the structure of the file, and apply the transformation
> only to the text fragments.

The routine above ran into trouble, exiting with "UnicodeError: charmap
encoding error: character maps to <undefined>".

I could solve that with:
    file_contents = unicode(file_contents, 'latin-1').encode('cp850', 'replace')
But that would, as you say, be dangerous, since it may also alter some of
the non-text parts.
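A small illustration of why the whole-file conversion fails and why 'replace' only hides the problem. This is a Python 3 sketch (the snippet above is Python 2, where unicode(s, 'latin-1') plays the role of s.decode('latin-1')). The helper name latin1_to_cp850 is made up for the example. Binary bytes in the 0x80-0x9F range decode under latin-1 to control characters that cp850 has no slot for, which produces exactly the "character maps to <undefined>" error. With 'replace' such bytes silently become '?', and bytes that do map are rewritten too, so header numbers and other binary data get changed.

```python
def latin1_to_cp850(raw, errors="strict"):
    """Decode bytes as latin-1 and re-encode them as cp850 (Python 3)."""
    return raw.decode("latin-1").encode("cp850", errors)

# A text byte: 0xE5 is 'å' in latin-1 and lives at a different byte in cp850.
print(latin1_to_cp850(b"\xe5"))            # b'\x86'

# A binary byte such as 0x80 decodes to the control character U+0080,
# which cp850 cannot encode, so strict encoding raises.
try:
    latin1_to_cp850(b"\x80")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)           # strict: character maps to <undefined>

# 'replace' makes the error go away, but the byte is silently turned into '?'.
print(latin1_to_cp850(b"\x80", "replace"))  # b'?'
```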

However, it set me on the right track, so I edited the dbf module I'm using
(http://www.fiby.at/dbfpy/index.html) so that when it writes a text field to
the dBase file, the text is encoded in cp850. This way only the relevant
parts of the file are changed.
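The actual fix above lives inside dbfpy, and its API is not reproduced here. As a rough standalone illustration of the same idea (re-encode only the text fields and leave every header and non-text byte alone), here is a stdlib-only Python 3 sketch. It assumes the classic dBase III layout: a 32-byte header holding the record count, header length and record length at offset 4, 32-byte field descriptors terminated by 0x0D, and fixed-length records that each start with a one-byte deletion flag. It treats only type 'C' fields as text, ignores memo (.dbt) files, and does not touch any code-page marker that some dBase variants keep in the header. The function name recode_character_fields is hypothetical.

```python
import struct

def recode_character_fields(data, src="latin-1", dst="cp850"):
    """Return a copy of a dBase III style .dbf image in which only the
    character ('C') fields are re-encoded from src to dst."""
    out = bytearray(data)
    # Header: record count (uint32), header length (uint16), record length (uint16).
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Walk the 32-byte field descriptors until the 0x0D terminator,
    # recording each field's type, offset within the record, and length.
    fields = []
    offset = 1  # byte 0 of every record is the deletion flag
    pos = 32
    while data[pos] != 0x0D:
        ftype = chr(data[pos + 11])
        flen = data[pos + 16]
        fields.append((ftype, offset, flen))
        offset += flen
        pos += 32

    # Re-encode only the character fields; all other bytes are copied untouched.
    for i in range(n_records):
        rec_start = header_len + i * record_len
        for ftype, foff, flen in fields:
            if ftype != "C":
                continue
            start = rec_start + foff
            raw = data[start:start + flen]
            # Both code pages are single-byte, so the field length is preserved;
            # characters cp850 lacks become '?' instead of raising.
            out[start:start + flen] = raw.decode(src).encode(dst, "replace")
    return bytes(out)
```

Usage would mirror the quoted snippet: read 'dbasefile' in binary mode, pass the bytes through recode_character_fields, and write the result to 'dbasefile.out'. Unlike the whole-file conversion, header bytes and numeric fields keep their original values.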


Thanks guys!!

/Mikke