right curly quote and unicode

Leo Kislov leo.kislov at gmail.com
Wed Oct 18 17:44:49 EDT 2006


On 10/17/06, TiNo <tinodb at gmail.com> wrote:
> Hi all,
>
> I am trying to compare my Itunes Library xml to the actual files on my
> computer.
> As the xml file is in UTF-8 encoding, I decided to do the comparison of the
> filenames in that encoding.
> It all works, except with one file. It is named 'The Chemical
> Brothers-Elektrobank-04 - Don't Stop the Rock (Electronic Battle Weapon
> Version).mp3'. It goes wrong with the apostrophe in Don't. That is actually
> not an apostrophe, but ASCII char 180: ´

It's actually Unicode code point 180 (U+00B4), not ASCII. ASCII characters
are in the 0..127 range.
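
A quick check of that range, as a minimal sketch. It is written for Python 3, where chr() returns a Unicode character; is_ascii is a hypothetical helper name, not a library function:

```python
import unicodedata

def is_ascii(ch):
    """Return True if the character's code point is in the ASCII range 0..127."""
    return ord(ch) < 128

ch = chr(180)                 # code point 180, i.e. U+00B4
print(unicodedata.name(ch))   # ACUTE ACCENT
print(is_ascii(ch))           # False
print(is_ascii("'"))          # True: the plain apostrophe is code point 39
```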

> In the Itunes library it is encoded as: Don%E2%80%99t

Looks like a UTF-8 encoded string that was then URL-encoded (percent-escaped).
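
To see the two layers, here is a small sketch, assuming Python 3 (urllib.parse.unquote_to_bytes does not exist in Python 2). It undoes the percent-escaping first and the UTF-8 encoding second:

```python
import unicodedata
from urllib.parse import unquote_to_bytes

quoted = "Don%E2%80%99t"
raw = unquote_to_bytes(quoted)   # layer 1: percent-escapes -> bytes
text = raw.decode("utf-8")       # layer 2: UTF-8 bytes -> text

print(repr(raw))                   # b'Don\xe2\x80\x99t'
print(ascii(text))                 # 'Don\u2019t'
print(unicodedata.name(text[3]))   # RIGHT SINGLE QUOTATION MARK
```

So the character stored in the library is U+2019, which is neither a plain apostrophe nor code point 180.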

> I do some conversions with both the library path names and the folder
> path names. Here is the code:
> (in the comment I display how the Don't part looks. I got this using print
> repr(filename))
> -------------------------------------------------------------
> #Once I have the filenames from the library I clean them using the following
> code (as filenames are in the format '
> file://localhost/m:/music/track%20name.mp3')
>
> filename = urlparse.urlparse(filename)[2][1:]  # u'Don%E2%80%99t' ; side
> question, anybody who knows a way to do this in a more fashionable way?
> filename = urllib.unquote (filename) # u'Don\xe2\x80\x99t'

This doesn't work for me in Python 2.4: unquote expects a str, not
unicode. So it should be:

filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')
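
For comparison, a rough Python 3 equivalent of the whole URL-to-path step. This is an assumption on my part, since the code above is Python 2, and library_url_to_path is a made-up name. In Python 3, unquote takes text and decodes the escaped bytes as UTF-8 by default:

```python
from urllib.parse import urlparse, unquote

def library_url_to_path(url):
    """Turn a file:// URL from the library into a decoded path string."""
    path = urlparse(url).path[1:]           # drop the leading '/', like [2][1:]
    return unquote(path, encoding="utf-8")  # percent-escapes -> UTF-8 -> text

print(library_url_to_path("file://localhost/m:/music/track%20name.mp3"))
# m:/music/track name.mp3
```

Using the .path attribute instead of the [2] index may also be the "more fashionable" spelling asked about above.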


> filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
>
> I get the files in my music folder with the os.walk method and then
> I do:
>
> filename = os.path.normpath(os.path.join (root,name))  # 'Don\x92t'
> filename = unicode(filename,'latin1') # u'Don\x92t'
> filename = filename.encode('utf-8') # 'Don\xc2\x92t'
> filename = unicode(filename,'latin1') # u'Don\xc2\x92t'
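
For what it's worth, the byte \x92 in 'Don\x92t' looks like the Windows code page 1252 encoding of the same curly quote, which would explain why decoding it as latin1 never matches the library name. This is a guess about the file system encoding on your machine, not something I can see from here. A sketch in Python 3 syntax:

```python
raw = b"Don\x92t"   # bytes as shown by repr() above

as_cp1252 = raw.decode("cp1252")   # 0x92 maps to U+2019
as_latin1 = raw.decode("latin1")   # 0x92 maps to U+0092, a control character

print(ascii(as_cp1252))   # 'Don\u2019t'
print(ascii(as_latin1))   # 'Don\x92t'
print(as_cp1252 == b"Don\xe2\x80\x99t".decode("utf-8"))   # True
```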

This looks like calling random methods with random parameters :)
Python can return unicode file names right away; you just need to
pass the input parameters as unicode strings:

>>> os.listdir(u"/")
[u'alarm', u'ARCSOFT' ...]

So in your case you need to make sure the start directory parameter
for the os.walk function is unicode.
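
Something along these lines, as a sketch: it decodes a byte-string start directory with the file system encoding before walking, so every name comes back as unicode. walk_unicode is a hypothetical helper; the code is written to run on Python 3 as well, where a str start directory already gives str names.

```python
import os
import sys

def walk_unicode(start):
    """Yield normalized file paths under start as unicode strings."""
    if isinstance(start, bytes):
        # A byte-string start directory would give byte-string names back
        start = start.decode(sys.getfilesystemencoding())
    for root, dirs, files in os.walk(start):
        for name in files:
            yield os.path.normpath(os.path.join(root, name))
```

The paths it yields can then be compared directly with the decoded names from the library, with no further encode or decode calls.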


