Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Fri Jun 7 04:10:47 EDT 2013


On 7/6/2013 10:42 AM, Michael Weylandt wrote:
>
> os.rename( filepath_bytes filepath.encode('utf-8')
> Missing comma, which is, after all, just a matter of syntax so it can't matter, right?
>
I doubted that os.rename's arguments had to be comma-separated,
but then I read the docs:

os.rename(src, dst) <http://docs.python.org/2/library/os.html#os.rename>

    Rename the file or directory src to dst. If dst is a directory,
    OSError <http://docs.python.org/2/library/exceptions.html#exceptions.OSError>
    will be raised. On Unix, if dst exists and is a file, it will be
    replaced silently if the user has permission. The operation may fail
    on some Unix flavors if src and dst are on different filesystems. If
    successful, the renaming will be an atomic operation (this is a
    POSIX requirement). On Windows, if dst already exists, OSError
    <http://docs.python.org/2/library/exceptions.html#exceptions.OSError>
    will be raised even if it is a file; there may be no way to
    implement an atomic rename when dst names an existing file.

    Availability: Unix, Windows.

Indeed it has to be:

os.rename( filepath_bytes, filepath.encode('utf-8') )

'mv source target' didn't require commas, so I thought it was safe to assume that os.rename didn't either.
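For comparison with the shell's 'mv source target': in Python, os.rename is an ordinary function call, so its two arguments are comma-separated and every opening parenthesis needs its closing one. A minimal sketch, assuming a throwaway temporary directory and made-up file names rather than the real apps directory:

```python
import os
import tempfile

# Work in a throwaway directory so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)                           # directory path as bytes
    src = os.path.join(tmp_b, b'old_name.txt')         # hypothetical names
    dst = os.path.join(tmp_b, 'νέο.txt'.encode('utf-8'))
    open(src, 'wb').close()                            # create an empty file

    # Two arguments, separated by a comma, with balanced parentheses
    os.rename(src, dst)

    names = sorted(os.listdir(tmp_b))
    print(names)
```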


I'm happy to announce that after correcting many idiotic errors (commas, missing colons, variable declarations), this surrogate error is the last one I get.
I still don't understand what "surrogate" means; in everyday English it means "replacement".
Here is the code:


import os
import pymysql

#========================================================
# Collect filenames of the path dir as bytes
filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )

# Iterate over all filenames in the path dir
for filename in filename_bytes:
    # Compute 'path/to/filename' in bytes (the variable filename, not the literal b'filename')
    filepath_bytes = b'/home/nikos/public_html/data/apps/' + filename
    try:
        # Already valid UTF-8: nothing to rename
        filepath = filepath_bytes.decode('utf-8')
    except UnicodeDecodeError:
        try:
            filepath = filepath_bytes.decode('iso-8859-7')

            # Rename current filename from greek bytes => utf-8 bytes
            os.rename( filepath_bytes, filepath.encode('utf-8') )
        except UnicodeDecodeError:
            print( 'I give up! This filename is unreadable!' )


#========================================================
# Get filenames of the apps directory as unicode
# (cur, host and lastvisit are assumed to be defined elsewhere in files.py)
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
        data = cur.fetchone()        # filename is unique, so there should be only one row

        if not data:
            # First time for file; primary key is automatic, hit is defaulted
            cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
    except pymysql.ProgrammingError as e:
        print( repr(e) )


#========================================================
# Build a set of the filenames found in the path dir
filenames = set( os.listdir( '/home/nikos/public_html/data/apps/' ) )

# Delete spurious
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
# (each row returned by fetchall() is a tuple, so unpack its single column)
for (filename,) in data:
    if filename not in filenames:
        cur.execute('''DELETE FROM files WHERE url = %s''', (filename,) )
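The first pass of the script can be wrapped in a small function and exercised on a temporary directory. This is a minimal sketch, assuming (as the code above does) that every name that is not valid UTF-8 is ISO-8859-7, and assuming a Linux filesystem that stores names as raw bytes; fix_greek_filenames and the sample name 'Ξένια.txt' are hypothetical. It also refuses to overwrite an existing target, since the docs quoted above say os.rename replaces it silently on Unix.

```python
import os
import tempfile

def fix_greek_filenames(dir_bytes, legacy='iso-8859-7'):
    """Rename entries of dir_bytes whose names are not valid UTF-8,
    assuming those names are encoded in `legacy`. Returns the new names."""
    renamed = []
    for name in os.listdir(dir_bytes):           # bytes path in, bytes names out
        try:
            name.decode('utf-8')                  # already UTF-8: leave it alone
            continue
        except UnicodeDecodeError:
            pass
        try:
            text = name.decode(legacy)
        except UnicodeDecodeError:
            print('Unreadable filename, skipped:', name)
            continue
        new_name = text.encode('utf-8')
        # Join the directory with the variable `name`, not the literal b'filename'
        src = os.path.join(dir_bytes, name)
        dst = os.path.join(dir_bytes, new_name)
        if os.path.exists(dst):                   # don't silently overwrite on Unix
            print('Target already exists, skipped:', new_name)
            continue
        os.rename(src, dst)
        renamed.append(new_name)
    return renamed

# Usage on a throwaway directory with one legacy-encoded and one UTF-8 name
with tempfile.TemporaryDirectory() as tmp:
    d = os.fsencode(tmp)
    open(os.path.join(d, 'Ξένια.txt'.encode('iso-8859-7')), 'wb').close()
    open(os.path.join(d, b'ok.txt'), 'wb').close()
    renamed = fix_greek_filenames(d)
    after = sorted(os.listdir(d))
```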



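The two database passes can likewise be reduced to a set comparison between the names on disk and the url column. A sketch only: it uses the standard library's sqlite3 in place of PyMySQL so it runs without a MySQL server (sqlite3 uses ? placeholders where PyMySQL uses %s), the files table is cut down to a single url column, and sync_files_table is a hypothetical name. Query parameters are passed as a sequence even for one value, e.g. (name,), and fetchall() returns each row as a tuple.

```python
import sqlite3

def sync_files_table(conn, names_on_disk):
    """Insert names missing from `files` and delete rows whose file is gone.
    Returns (inserted, deleted) as sorted lists."""
    cur = conn.cursor()
    cur.execute('SELECT url FROM files')
    in_db = {url for (url,) in cur.fetchall()}    # each row is a 1-tuple
    on_disk = set(names_on_disk)

    inserted = sorted(on_disk - in_db)
    deleted = sorted(in_db - on_disk)
    for name in inserted:
        # Parameters are always a sequence, even for a single value
        cur.execute('INSERT INTO files (url) VALUES (?)', (name,))
    for name in deleted:
        cur.execute('DELETE FROM files WHERE url = ?', (name,))
    conn.commit()
    return inserted, deleted

# Usage with an in-memory database and hypothetical file names
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE files (url TEXT PRIMARY KEY)')
conn.execute("INSERT INTO files (url) VALUES ('gone.exe')")
inserted, deleted = sync_files_table(conn, ['Ξένια.txt', 'setup.exe'])
remaining = sorted(url for (url,) in conn.execute('SELECT url FROM files'))
```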
=================================

[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 88, in <module>
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]     cur.execute('''SELECT url FROM files WHERE url = %s''', filename )
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173]     query = query.encode(charset)
[Fri Jun 07 11:08:17 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcce' in position 35: surrogates not allowed
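"Surrogate" in this message is a Unicode term, not the everyday word. When os.listdir() is given a str path, Python 3 decodes each name with the filesystem encoding (presumably UTF-8 on this server) and, following PEP 383, turns every byte it cannot decode into a lone surrogate code point in the range U+DC80 to U+DCFF. The byte 0xCE, which is 'Ξ' in ISO-8859-7, becomes '\udcce', the character in the traceback. Such a string cannot be encoded as strict UTF-8, which is what PyMySQL's query.encode(charset) attempts, hence "surrogates not allowed". A minimal sketch with a hypothetical ISO-8859-7 file name, passing the codec explicitly rather than relying on the server's filesystem encoding:

```python
# Hypothetical file name saved on disk as ISO-8859-7 bytes
raw = 'Ξένια.txt'.encode('iso-8859-7')

# What os.listdir(str_path) does on a UTF-8 system (PEP 383):
# undecodable bytes become lone surrogates U+DC80..U+DCFF
as_str = raw.decode('utf-8', 'surrogateescape')
print(ascii(as_str))

# A strict UTF-8 encode, as in PyMySQL's query.encode(charset), rejects them
try:
    as_str.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print(exc)

# The original bytes are recoverable and can be decoded with the right codec
restored = as_str.encode('utf-8', 'surrogateescape')
fixed = restored.decode('iso-8859-7')
print(fixed)
```

In other words, the surrogates disappear once the files really are renamed to UTF-8 (or the names are handled as bytes throughout).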



-- 
Webhost <http://superhost.gr>&& Weblog <http://psariastonafro.wordpress.com>

