[Tutor] os.rename anomaly in Python 2.3 on Windows XP

Tim Golden mail at timgolden.me.uk
Tue Oct 9 11:33:38 CEST 2007


Tony Cappellini wrote:
> Using Windows XP, SP2 and Python 2.3
> 
> I've written a script which walks through a bunch of directories and
> replaces characters which are typically illegal in filenames with an
> '_' character.

[...]

> When my script encounters a directory with the unwanted characters,
> it's easy to detect them and filter them out. The next step is to
> rename the file to get rid of the problem characters.

[...]

> However, recently when I called os.rename(oldname, newname) an OS
> exception was thrown with "Illegal filename". I was able to narrow it
> down to oldname being the cause of the problem.
> Some of the characters showed up as ? in the Python strings.
> 
> Oddly enough, os.rename() cannot perform the renaming of the
> directories, but I can do this manually in File Explorer or even in a
> CMD console using "rename"
> 
> So what is os.rename() actually calling on a Windows system, that
> won't allow me to rename dirs with illegal characters?


Well, the simple answer to that is (cut-and-pasted and snipped a bit)
from the posixmodule.c source:

	if (unicode_file_names()) {
...
	result = MoveFileW(PyUnicode_AsUnicode(o1),
		PyUnicode_AsUnicode(o2));
...
	result = MoveFileA(p1, p2);


So it calls MoveFileW when given two unicode filenames, or
MoveFileA when given two non-unicode filenames. So... are you
calling os.rename with unicode or non-unicode filenames?
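
One quick way to answer that question is to check the type of each
name just before the rename. This is only a sketch: is_unicode_name
is a made-up helper, not part of the os module, and the try/except
is there so the same check works on a Python where every str is
already unicode.

```python
# Hypothetical helper: reports whether a filename is a unicode string.
try:
    text_type = unicode   # Python 2: unicode is distinct from str
except NameError:
    text_type = str       # Python 3: every str is already unicode


def is_unicode_name(name):
    """Return True if name is a unicode string, False for a byte string."""
    return isinstance(name, text_type)
```

If this returns False for the names your script passes to os.rename,
any character that didn't survive the conversion to a byte string has
probably already been turned into "?".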

If you're using, say, os.walk or os.listdir to walk your tree,
pass it a unicode path to start with, and the filenames coming
back will also be unicode. Try this, for example:

<code>
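import os, sys

#
# Create a file whose name contains a non-ascii char
#
filename = u"abc\u0123.txt"
open (filename, "w").close ()

# A unicode path makes os.listdir return unicode names;
# encode with "replace" so the console can always print them
for i in os.listdir (u"."):
   print i.encode (sys.stdout.encoding, "replace")

# Swap the non-ascii char for "?" and then for "_"
new_filename = unicode (filename.encode ("ascii", "replace").replace ("?", "_"))
os.rename (filename, new_filename)

for i in os.listdir (u"."):
   print i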

</code>

The initial listing shows the filename with a
question mark filling in for the non-ascii char.
The file is then renamed with that char replaced
by "_", so the final listing can print the name
without any explicit encoding.

I think this is what you're after.

TJG

