Help opening and reading text files in ISO-8859-1

Henry Lebowzki hzi at uol.com.br
Sat Jan 12 00:24:41 EST 2002


Hi-

    Basically, I have a text file of German words that looks like
this:

das Appartement, Appartements  5
das Auge, Augen    6
das Bad, Bäder    5
das Bein, Beine    6
das Beispiel, Beispiele   6
das Buch, Bücher    4
das Büro, Büros    7
das Café, Cafés    4
das Camping    9
das Dach, Dächer    5

    (the numbers are chapter numbers)
    When I open it with a little Python script (which I pasted below), I get
this weird output:

['das Appartement', ' Appartements\011\0115\012']
['das Auge', ' Augen\011\011\011\0116\012']
['das Bad', ' B\344der\011\011\011\0115\012']
['das Bein', ' Beine\011\011\011\0116\012']
['das Beispiel', ' Beispiele\011\011\0116\012']
['das Buch', ' B\374cher\011\011\011\0114\012']
['das B\374ro', ' B\374ros\011\011\011\0117\012']
['das Caf\351', ' Caf\351s\011\011\011\0114\012']

    In particular, three things bother me the most:
    1) Where are my umlauts? ("Büro", not "B\374ro"; same with "Café", etc.)
    2) Does the output >have< to be with those horrible brackets?
    3) There's no buffering, despite the fact that I set buffering = 10 (see
code below); the output just scrolls too fast to read.

    So you see, it's not the output I wished for.

    The code I used was:

import re
filename = raw_input ('Enter file name: ')
file = open (filename, 'r', 10)
allLines = file.readlines()
file.close()
for eachLine in allLines:
    string = re.split ('[,]' , eachLine)
    print string

wortmatch = '(\w$)(\w$)'

    I've tried
        print string.encode('iso-8859-1')
    but it won't work.
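
    A minimal sketch of one way to get readable output, assuming a modern
Python 3 interpreter rather than the Python 2 used in the script above. The
\344-style sequences are octal escapes for single ISO-8859-1 bytes (\344 is
ä, \374 is ü, \351 is é); they appear because printing a list shows the repr
of each string, which is also where the brackets come from. The names
parse_line, format_entry and read_vocabulary are made up for illustration,
and the output format is only a guess at what is wanted.

```python
def parse_line(line):
    """Split one line such as 'das Bad, Bäder<TAB><TAB>5' into its parts."""
    # The chapter number is the last whitespace-separated field.
    words, chapter = line.rsplit(None, 1)
    # The word forms (singular, optional plural) are separated by commas.
    forms = [form.strip() for form in words.split(",")]
    return forms, int(chapter)


def format_entry(forms, chapter):
    """Build a plain string, so printing it shows no brackets or escapes."""
    return "%s (chapter %d)" % (" / ".join(forms), chapter)


def read_vocabulary(path, encoding="iso-8859-1"):
    """Read the file, decoding its bytes as ISO-8859-1, skipping blank lines."""
    with open(path, encoding=encoding) as handle:
        return [parse_line(line) for line in handle if line.strip()]
```

    With these, something like
for forms, chapter in read_vocabulary(filename): print(format_entry(forms, chapter))
prints one plain line per entry. Printing the string itself, rather than
the list that holds it, is what avoids the brackets and the escapes.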


    Can you help me? I feel this is a fairly common problem for __all__
those people whose alphabet is not covered in the ASCII charset. And the
documentation is not good regarding this issue.
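
    On the third point, the buffering argument of open() sets the size of
the buffer used for reading the file; it does not pause what is printed to
the screen. A rough sketch of pausing every few lines instead, again
assuming Python 3. The names paginate and show_paged and the default page
size of 20 are illustrative assumptions only.

```python
def paginate(lines, page_size=20):
    """Yield successive lists holding at most page_size lines each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page = []
    for line in lines:
        page.append(line)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        # Whatever is left over forms a final, shorter page.
        yield page


def show_paged(lines, page_size=20):
    """Print the lines one page at a time, waiting for Enter after each page."""
    for page in paginate(lines, page_size):
        for line in page:
            print(line)
        input("-- press Enter to continue --")
```

    Piping the script's output through a pager such as more would do the
same job without any extra code.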

    TIA,
    Regs
    HL