Unicode -> String problem

John Machin machin_john_888 at hotmail.com
Tue Jul 10 00:59:32 EDT 2001


Jay Parlar <jparlar at home.com> wrote in message news:<mailman.994625054.7131.python-list at python.org>...
> I'm having a problem converting unicode text to string type with str(). 
[snip] 
>  
> I haven't found anything that will explicitly do what I want, namely,
> completely remove any unconvertible unicode characters. I 
> have to be able to parse this text afterwards, 
> using a lot of Python's string functions,
[snip]

Are you completely sure that the offending characters are totally
meaningless in your application? u'\u00A0' is a no-break space;
stripping it out while leaving normal spaces (u'\u0020') in place is
probably not a good idea, because the words it separated would then
run together. What other non-ASCII characters do you have?
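
To illustrate the concern, here is a minimal sketch comparing the two approaches. The sample text is made up, and the 'ignore' error handler is only one possible way of dropping characters, not necessarily what you should end up using. It is written so that it also runs on a current Python 3 interpreter, where encode() returns a bytes object.

```python
# Hypothetical sample: two words joined by a no-break space, plus a euro sign.
text = u'total:\u00a0100\u20ac'

# Dropping every non-ASCII character glues the two words together.
dropped = text.encode('ascii', 'ignore')

# Mapping the no-break space to an ordinary space first keeps the word
# boundary; only the euro sign is silently discarded.
mapped = text.replace(u'\u00a0', u' ').encode('ascii', 'ignore')

print(repr(dropped))
print(repr(mapped))
```

The second form keeps "total:" and "100" as separate tokens, which matters if the text is parsed by splitting on whitespace afterwards.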

Are you sure it's Unicode? Is \xA0 exactly what you are seeing, or are
you seeing \u00A0 and reporting it to us as \xA0?

Which "lot of Python's string functions" do you plan to use? Note that
8-bit strings and Unicode strings support a large number of methods
with the same names and, as far as possible, the same behaviour -- see
section 2.1.5.1 of the Python Library Reference manual. The re module
also supports the same functions and methods on both types.
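
A rough sketch of what that looks like in practice, assuming a current Python 3 interpreter, where the 8-bit type is spelled bytes and the Unicode type is str. The "name = value" sample line is invented for illustration.

```python
import re

# Hypothetical sample line, once as 8-bit data and once as Unicode text.
byte_text = b'name = value'
uni_text = u'name = value'

# The same method names work on both types; only the argument type differs.
byte_key = byte_text.split(b'=')[0].strip()
uni_key = uni_text.split(u'=')[0].strip()

# The re module accepts both, provided pattern and subject are the same type.
byte_match = re.search(br'(\w+)\s*=\s*(\w+)', byte_text)
uni_match = re.search(r'(\w+)\s*=\s*(\w+)', uni_text)

print(byte_key, byte_match.group(2))
print(uni_key, uni_match.group(2))
```

So the parsing code may not need a conversion step at all; it can often work on the Unicode text directly.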


