Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Fri Jun 7 14:52:24 EDT 2013


On Friday, 7 June 2013 5:29:25 PM UTC+3, MRAB wrote:

> This is a worse way of doing it because the ISO-8859-7 encoding has 1
> byte per codepoint, meaning that it's more 'tolerant' (if that's the 
> word) of errors. A sequence of bytes that is actually UTF-8 can be
> decoded as ISO-8859-7, giving gibberish.

> UTF-8 is less tolerant, and it's the encoding that ideally you should 
> be using everywhere, so it's better to assume UTF-8 and, if it fails,  
> try ISO-8859-7 and then rename so that any names that were ISO-8859-7
> will be converted to UTF-8.
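
The difference in tolerance described above can be seen in a small sketch. The file name 'αρχείο.txt' is only an illustrative example and is not one of the real files in the directory:

```python
# Hypothetical Greek file name, used only for illustration
name = 'αρχείο.txt'

utf8_bytes = name.encode('utf-8')
greek_bytes = name.encode('iso-8859-7')

# UTF-8 bytes decoded as ISO-8859-7: no exception here, but the result is gibberish
mojibake = utf8_bytes.decode('iso-8859-7')

# ISO-8859-7 bytes decoded as UTF-8: the strict decoder rejects them
try:
    greek_bytes.decode('utf-8')
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True
```

This is why trying UTF-8 first and falling back to ISO-8859-7 is the safer order. Note that ISO-8859-7 is not tolerant of every byte either (a few byte values are unassigned), so the fallback can still fail for some names.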

Indeed, I wasn't aware of that at the time. I was under the impression that a string encoded to bytes using some charset could only be decoded back using that and only that charset. Since this is the case, here is my fix:


#========================================================
# Collect filenames of the path dir as bytes
filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in filename_bytes:
	# Build 'path/to/filename' as bytes
	filepath_bytes = b'/home/nikos/public_html/data/apps/' + filename
	flag = False
	
	try:
		# Assume current file is utf8 encoded
		filepath = filepath_bytes.decode('utf-8')
		flag = 'utf8' 
	except UnicodeDecodeError:
		try:
			# Since current filename is not utf8 encoded then it has to be greek-iso encoded
			filepath = filepath_bytes.decode('iso-8859-7')
			flag = 'greek'
		except UnicodeDecodeError:
			print( '''I give up! File name is unreadable!''' )
	
	if( flag == 'greek' )
		# Rename filename from greek bytes --> utf-8 bytes
		os.rename( filepath_bytes, filepath.encode('utf-8') )
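
The same try-UTF-8-then-Greek idea could also be written as two small functions. This is only a sketch: the names decode_name and normalize_names are made up for illustration, and it decodes just the file name rather than the whole path, which is a slight change from the loop above.

```python
import os

def decode_name(raw):
    """Return (text, encoding) for a bytes file name, trying UTF-8 first.

    Returns (None, None) when neither encoding can decode the name.
    """
    for encoding in ('utf-8', 'iso-8859-7'):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None

def normalize_names(dirpath):
    """Rename ISO-8859-7 file names inside dirpath (given as bytes) to UTF-8.

    Returns the number of files that were renamed.
    """
    renamed = 0
    for raw in os.listdir(dirpath):
        text, encoding = decode_name(raw)
        if encoding == 'iso-8859-7':
            # Only names that failed UTF-8 but decoded as Greek get renamed
            os.rename(os.path.join(dirpath, raw),
                      os.path.join(dirpath, text.encode('utf-8')))
            renamed += 1
    return renamed
```

It would be called as normalize_names(b'/home/nikos/public_html/data/apps/'). Passing the directory as bytes makes os.listdir() return bytes names, so nothing is decoded before the check.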


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
	try:
		# Check the presence of a file against the database and insert if it doesn't exist
		cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
		data = cur.fetchone()
		
		if not data:
			# First time for file; primary key is automatic, hit is defaulted 
			cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
	except pymysql.ProgrammingError as e:
		print( repr(e) )


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )
filepaths = set()

# Build a set of 'path/to/filename' based on the objects of path dir
for filename in filenames:
	filepaths.add( filename )

# Delete spurious records
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for rec in data:
	# Each row comes back as a 1-tuple, so compare its url field, not the tuple itself
	if rec[0] not in filepaths:
		cur.execute('''DELETE FROM files WHERE url = %s''', rec )
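
The check of database names against directory names can be tried without a database. This is a sketch that assumes the cursor returns each row as a tuple such as ('name',), which is what a default cursor usually does; spurious_urls is a made-up helper name.

```python
def spurious_urls(db_rows, filenames):
    """Return the urls found in db_rows that have no matching file name.

    db_rows   -- rows as returned by fetchall(), each assumed to be a 1-tuple (url,)
    filenames -- iterable of file names currently present in the directory
    """
    present = set(filenames)
    return [row[0] for row in db_rows if row[0] not in present]

# A whole row never equals a plain string, so testing the tuple itself
# against the set of names would flag every record as spurious.
row_is_found = ('a.txt',) in {'a.txt'}
```

Each url returned by spurious_urls() could then be passed to the DELETE statement.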

=============================
nikos at superhost.gr [~/www/cgi-bin]# [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 81
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]     if( flag == 'greek' )
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]                         ^
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] Premature end of script headers: files.py
-------------------------------
I don't know why that if statement raises an error.


