right adjusted strings containing umlauts
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Aug 9 21:29:52 EDT 2013
On Thu, 08 Aug 2013 17:24:49 +0200, Kurt Mueller wrote:
> What do I do, when input_strings/output_list has other codings like
> iso-8859-1?
When reading from a text file, honour some sort of encoding cookie at the
top (or bottom) of the file, like Emacs and Vim use, or a BOM. If there
is no encoding cookie, assume UTF-8.
When reading from stdin, assume UTF-8.
Otherwise, make it the caller's responsibility to specify the encoding if
they wish to use something else.
Pseudo-code:
    encoding = None
    if command line arguments include '--encoding':
        encoding = --encoding argument
    if encoding is None:
        if input file is stdin:
            encoding = 'utf-8'
        else:
            open file as binary
            if first 2-4 bytes look like a BOM:
                encoding = one of UTF-8 or UTF-16 or UTF-32
            else:
                read first two lines
                if either looks like an encoding cookie:
                    encoding = cookie
                # optionally check the end of the file as well
            close file
    if encoding is None:
        encoding = 'utf-8'
    read from file using encoding
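
For anyone who wants something runnable, here is a rough Python 3 sketch of the pseudo-code above. It is one possible reading, not a tested library: the names sniff_encoding and read_text are made up for illustration, the encoding parameter stands in for the value of a --encoding option, and the cookie pattern is a loose approximation of what Emacs ("coding: latin-1") and Vim ("fileencoding=latin-1") accept. It only looks at the top of the file, and it will be fooled by ordinary text that happens to contain something like "coding: xyz" in the first two lines.

```python
import codecs
import re
import sys

# Known BOMs. The UTF-32 entries come first because the UTF-32-LE BOM
# starts with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

# Loose cookie pattern (an assumption, not an official grammar).
_COOKIE = re.compile(br'coding[:=]\s*([-\w.]+)')


def sniff_encoding(head, default='utf-8'):
    """Guess an encoding from the first bytes of a file (hypothetical helper)."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM: look for a cookie in the first two lines only.
    for line in head.splitlines()[:2]:
        match = _COOKIE.search(line)
        if match:
            name = match.group(1).decode('ascii')
            try:
                codecs.lookup(name)  # skip names Python does not know
            except LookupError:
                continue
            return name
    return default


def read_text(path=None, encoding=None):
    """Return decoded text from path, or from stdin when path is None."""
    if path is None:
        # stdin: use the caller's encoding if given, otherwise assume UTF-8.
        return sys.stdin.buffer.read().decode(encoding or 'utf-8')
    with open(path, 'rb') as f:
        data = f.read()
    if encoding is None:
        encoding = sniff_encoding(data[:1024])
    return data.decode(encoding)
```

With this, read_text('names.txt') would pick up a BOM or cookie if there is one, and read_text('names.txt', encoding='iso-8859-1') lets the caller override the guess. The 'utf-8-sig', 'utf-16' and 'utf-32' codecs strip the BOM while decoding, so it does not end up in the returned string.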
--
Steven