Unicode File Names

John Machin sjmachin at lexicon.net
Thu Oct 16 22:18:48 EDT 2008


On Oct 17, 12:52 pm, Jordan <jordan.tayl... at gmail.com> wrote:
> On Oct 16, 9:20 pm, John Machin <sjmac... at lexicon.net> wrote:
>
>
>
> > On Oct 17, 11:43 am, Jordan <jordan.tayl... at gmail.com> wrote:
>
> > > I've got a bunch of files with Japanese characters in their names and
> > > os.listdir() replaces those characters with ?'s. I'm trying to open
> > > the files several steps later, and obviously Python isn't going to
> > > find '01-????.jpg' (formerly '01-ひらがな.jpg') because it doesn't exist.
> > > I'm not sure where in the process I'm able to stop that from
> > > happening. Thanks.
>
> > The Fine Manual says:
> > """
> > listdir( path)
>
> > Return a list containing the names of the entries in the directory.
> > The list is in arbitrary order. It does not include the special
> > entries '.' and '..' even if they are present in the directory.
> > Availability: Macintosh, Unix, Windows.
> > Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a
> > Unicode object, the result will be a list of Unicode objects.
> > """
>
> > Are you unsure whether your version of Python is 2.3 or later?
>
> My interpreter says:
> *** Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32
> bit (Intel)] on win32. ***
>
> When it says "if path is a Unicode object...", does that mean the path
> name must have a Unicode char?

If path is a Unicode [should read unicode] object of length > 0, then
*all* characters in path are by definition unicode characters.

Where are you getting your path from? If you are doing
os.listdir(r'c:\test') then do os.listdir(ur'c:\test') instead. If you
are getting it from the command line or somehow else as a variable,
instead of os.listdir(path), try os.listdir(unicode(path)). If that
fails with a message like "UnicodeDecodeError: 'ascii' codec can't
decode .....", then you'll need something like
os.listdir(unicode(path, encoding='cp1252')) # cp1252 being the most
likely suspect :)
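
The decode-then-list step above could be sketched roughly as follows. This
is only an illustration: the helper name listdir_unicode is hypothetical,
and cp1252 is just the guess made above, not a known fact about your
system. It is written to run on Python 3, where bytes is the byte-string
type; Python 2.5 has no bytes name, so you would test for str there
instead.

```python
import os


def listdir_unicode(path, encoding='cp1252'):
    """List a directory so that the returned names are text, not bytes.

    Hypothetical helper. `encoding` is an assumption about how a
    byte-string path was encoded; cp1252 is only a guess.
    """
    # A byte-string path is decoded explicitly, rather than relying on
    # an implicit ASCII conversion that may raise UnicodeDecodeError.
    if isinstance(path, bytes):
        path = path.decode(encoding)
    # Given a text (Unicode) path, os.listdir returns text names.
    return os.listdir(path)
```

The names that come back can then be joined to the same text path with
os.path.join() and handed to open() several steps later, with no '?'
substitution along the way.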

I strongly suggest that you read this:
   http://www.amk.ca/python/howto/unicode
which contains lots of useful information, including an answer to your
original question.


