Help opening and reading text files in ISO-8859-1
Henry Lebowzki
hzi at uol.com.br
Sat Jan 12 00:24:41 EST 2002
Hi-
Basically, I have this text file with words in German, that looks like
this:
das Appartement, Appartements 5
das Auge, Augen 6
das Bad, Bäder 5
das Bein, Beine 6
das Beispiel, Beispiele 6
das Buch, Bücher 4
das Büro, Büros 7
das Café, Cafés 4
das Camping 9
das Dach, Dächer 5
(the numbers are chapter numbers)
When I open it with a little Python script (which I pasted below), I get
this weird output:
['das Appartement', ' Appartements\011\0115\012']
['das Auge', ' Augen\011\011\011\0116\012']
['das Bad', ' B\344der\011\011\011\0115\012']
['das Bein', ' Beine\011\011\011\0116\012']
['das Beispiel', ' Beispiele\011\011\0116\012']
['das Buch', ' B\374cher\011\011\011\0114\012']
['das B\374ro', ' B\374ros\011\011\011\0117\012']
['das Caf\351', ' Caf\351s\011\011\011\0114\012']
In particular, three things bother me the most:
1) Where are my umlauts ("Büro", not "B\374ro"; same with "Café", etc.)?
2) Does the output >have< to be with those horrible brackets?
3) There's no buffering, despite the fact that I set buffering = 10 (see
code below); the output just scrolllls too fast to read.
So you see, it's not the output I wished for.
The code I used was:
import re
filename = raw_input ('Enter file name: ')
file = open (filename, 'r', 10)
allLines = file.readlines()
file.close()
for eachLine in allLines:
    string = re.split ('[,]' , eachLine)
    print string
wortmatch = '(\w$)(\w$)'
I've tried
print string.encode('iso-8859-1')
but it won't work.
Can you help me? I feel this is a fairly common problem for __all__ those
people whose alphabet is not covered in the ASCII charset. And the
documentation is not good regarding this issue.
TIA,
Regs
HL
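
For readers who land on this question today, here is a minimal sketch of one way to get the wished-for output. It assumes a current Python 3 interpreter rather than the Python of 2002, and the names parse_line and read_vocabulary are made up for illustration. Two observations sit behind it. First, the escapes in the output above are not corruption: printing a list shows the repr of each element, and \344, \374 and \351 are the octal byte values of ä, ü and é in ISO-8859-1, so printing the fields themselves (not the list) avoids both the escapes and the brackets. Second, the third argument to open() sets the size of the I/O buffer used when reading the file; it does not pause or page what is printed to the screen.

```python
import os
import tempfile


def parse_line(line):
    """Split one vocabulary line into (singular, plural, chapter).

    The chapter number is the last whitespace-separated token; the
    plural is whatever follows the comma (empty if there is no comma).
    """
    words, chapter = line.rstrip().rsplit(None, 1)
    singular, _, plural = words.partition(",")
    return singular.strip(), plural.strip(), int(chapter)


def read_vocabulary(path, encoding="iso-8859-1"):
    """Read the file as text, decoding its bytes with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return [parse_line(line) for line in handle if line.strip()]


# Demo: write two sample lines as ISO-8859-1 bytes, then read them back.
sample = "das Bad, Bäder\t\t\t5\ndas Camping\t\t\t9\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample.encode("iso-8859-1"))
entries = read_vocabulary(tmp.name)
os.remove(tmp.name)

# Print each field, not the list, so no brackets or escapes appear.
for singular, plural, chapter in entries:
    print(singular, plural, chapter)
```

To read long output at a comfortable pace, piping the script through a pager such as more or less is one option; that is separate from anything open() controls.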