Unicode string handling problem (revised)

Richard Schulman raschulmanxx at verizon.net
Tue Sep 5 21:35:46 EDT 2006


The appended program fragment works correctly with an ASCII input
file. But the file I actually want to process is Unicode (UTF-16
encoding). This file must be Unicode rather than ASCII or Latin-1
because it contains mixed Chinese and English characters.

When I run the program on the UTF-16 file, I get an attribute_count
of zero. This is incorrect for the input file, which should give a
value of fifteen or sixteen. In other words, the count() method isn't
recognizing the '",' character pairs to be counted in the line read.

Here's the program:

in_file = open("c:\\pythonapps\\in-graf1.my","rU")
try:
    # Skip the first line; make the second available for processing
    in_file.readline()
    in_line = in_file.readline()
    attribute_count = in_line.count('",')
    print attribute_count
finally:
    in_file.close()

Any suggestions?

Richard Schulman
(delete 'xx' characters for email reply)
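
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: open() in "rU" mode returns raw bytes, and in UTF-16 every
ASCII character occupies two bytes, one of them zero. The bytes for
'"' and ',' are then never adjacent, so count('",') finds nothing.
Decoding the file while reading it should avoid this. The example
below assumes a Python version that provides io.open (2.6 or later);
on older Python 2 releases, codecs.open accepts a similar encoding
argument. The sample text and the temporary file are hypothetical
stand-ins for the real input file.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for the real input file:
# a header line, then a line mixing English and Chinese fields.
sample = u'header line\n"name","\u4e2d\u6587","value"\n'

# Write the sample to a temporary file encoded as UTF-16.
fd, path = tempfile.mkstemp(suffix=".my")
os.close(fd)
out_file = io.open(path, "w", encoding="utf-16")
try:
    out_file.write(sample)
finally:
    out_file.close()

# Undecoded read: the quote and comma bytes are separated by a zero
# byte, so the two-byte pattern is never found.
raw_file = open(path, "rb")
try:
    raw_count = raw_file.read().count(b'",')
finally:
    raw_file.close()

# Decoded read: io.open converts UTF-16 to Unicode text, so the
# Unicode pattern matches as expected.
in_file = io.open(path, "r", encoding="utf-16")
try:
    # Skip the first line; make the second available for processing
    in_file.readline()
    in_line = in_file.readline()
    attribute_count = in_line.count(u'",')
finally:
    in_file.close()

os.remove(path)

print(raw_count)        # 0
print(attribute_count)  # 2
```

The "utf-16" codec expects a byte order mark at the start of the
file. If the real file has none, "utf-16-le" or "utf-16-be" would
need to be named explicitly instead.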


