Opening and reading randomly text file with umlauts and accents truncates output - unicode how?

synthespian synthespian at uol.com.br
Thu Jan 31 23:11:16 EST 2002


Hi -

I have this text file with some German nouns. The point is to display 
each noun in the singular form (the first word after das, below) and 
to ask the user to input the correct form of the definite article 
(der, die, or das - the list below is just a sample).

Sample:

das Appartement, Appartements
das Auge, Augen
das Bad, Bäder
das Bein, Beine
das Beispiel, Beispiele
das Buch, Bücher
das Büro, Büros
das Café, Cafés

So then I wrote this little script (at the end of this message).
There may be a more elegant solution, but it does what I want it to do, 
__except__ for the fact that when it comes to words with accents or 
umlauts (Café, Büro), the output from
print m.group(2)
is truncated! Like "Caf" or "B".
I'm using python 1.5.2 (Debian), but before you shout "Godammit you 
acid-head, don't you know better than using 1.5.2?!" I'd like to know 
precisely __how__ I am to use Unicode support in this code (yes, I 
acknowledge I have to upgrade, I'll do it __tonight__, but please answer 
the Unicode part, if you can).
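
The truncation described above is consistent with \w matching only 
ASCII letters, digits and underscore, so the match stops at the first 
accented character. What follows is a minimal sketch in modern Python 3 
(an assumption; the script in this message targets 1.5.2), where 
re.ASCII imitates the old behaviour and a plain str pattern is 
Unicode-aware. The names ascii_pattern and unicode_pattern are made up 
for illustration, and the lines are copied from the sample list.

```python
import re

# Sample lines copied from the word list in this message.
lines = ["das Büro, Büros", "das Café, Cafés", "das Buch, Bücher"]

# re.ASCII restricts \w to [a-zA-Z0-9_], imitating a regex engine
# that does not treat umlauts or accented letters as word characters.
ascii_pattern = re.compile(r"^(der|die|das)\s(\w+)", re.ASCII)

# For str patterns, Python 3 makes \w Unicode-aware by default.
unicode_pattern = re.compile(r"^(der|die|das)\s(\w+)")

truncated = [ascii_pattern.search(line).group(2) for line in lines]
complete = [unicode_pattern.search(line).group(2) for line in lines]

print(truncated)  # ['B', 'Caf', 'Buch']
print(complete)   # ['Büro', 'Café', 'Buch']
```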

TIA to all the fine people out there,
Spread the Love
synthespian at uol.com.br

#!/usr/bin/env python


import re
from random import randint


#filename = raw_input ('Enter file name: ')
#file = open(filename, 'r')

file = open('/home/xxxxxx/yyyy/shortwort.txt', 'r')

allLines = file.readlines()

file.close()

listSize =  len(allLines)

listLine = randint (0, listSize-1) # randint includes both ends; 0 is the first line

p = re.compile('^((?:der|die|das)(\s\w+))') # group(2) is now set for der and die too

print listLine, "\n"

print allLines[listLine], "\n"

m = p.search(allLines[listLine])

print m.group(0),"m.group(0)\n"
print m.group(1),"m.group(1)\n"
print m.group(2),"m.group(2)\n" # This is the word in singular, w/out definite article
print m.group(1)[0:3], "\n" # Not elegant, but does the trick: print only the article

answerString = m.group(1)[0:3]
print answerString, "This is the answer string\n"


print 'What is the definite article related to: ', m.group(2), '?\n'
antwort = raw_input('Answer: ')

if antwort == answerString:
    print "Richtig!"
else:
    print "You're soooo wrong, dude!"
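
For comparison, here is a hedged sketch of the same parsing step in 
Python 3 (an assumption, not the 1.5.2 code above). It captures the 
article and the noun in separate groups, so the article does not have 
to be sliced out of a longer match. The helper name parse_entry is 
hypothetical, and opening the file with an explicit encoding appears 
only as a comment, because the real encoding of the word list is not 
stated in this message.

```python
import random
import re

# Article and singular noun are captured in separate groups;
# \w is Unicode-aware for str patterns in Python 3.
ENTRY = re.compile(r"^(der|die|das)\s+(\w+)")


def parse_entry(line):
    """Return (article, singular) for one word-list line, or None."""
    m = ENTRY.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


# With a real file one might read it like this (the encoding is a guess):
#     with open(path, encoding="latin-1") as f:
#         all_lines = f.readlines()
all_lines = ["das Bad, Bäder\n", "das Büro, Büros\n", "das Café, Cafés\n"]

# random.choice can pick any element, including the first line.
article, singular = parse_entry(random.choice(all_lines))
print("What is the definite article related to:", singular, "?")
```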









