Changing filenames from Greeklish => Greek (subprocess complain)

nagia.retsina at gmail.com
Sun Jun 9 03:00:53 EDT 2013


Thanks Steven, I'll read them in a bit. While I read them, can you perhaps tell me what's wrong and why I am still getting decode issues?

[CODE]
# =================================================================================================================
# If user downloaded a file, thank the user !!!
# =================================================================================================================
if filename:
	#update file counter if cookie does not exist
	if not nikos:
		cur.execute('''UPDATE files SET hits = hits + 1, host = %s, lastvisit = %s WHERE url = %s''', (host, lastvisit, filename) )
	
	# Greek text: "The file %s is downloading!"
	print('''<h2><font color=blue>Το αρχείο <font color=red> %s <font color=blue>κατεβαίνει!''' % filename )
	print('''<br><img src="/data/images/thanks.gif">''')
	# Greek text: "And now Tetris until it completes :-)"
	print('''<br><br><br><h3><font color=blue>Και τώρα Tetris μέχρι να ολοκληρωθεί :-)''' )
	print('''<br><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,0,0" width="450" height="300"><param name="menu" value="false" /><param name="movie" value="http://www.fugly.com/f/1e6d8cd7b905f4e1bf72" /><param name="quality" value="high" /><embed src="http://www.fugly.com/f/1e6d8cd7b905f4e1bf72" AllowScriptAccess="always" menu="false" quality="high" width="450" height="300" name="FuglyGame" align="middle" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" /></object>''')
	
	print( '''<meta http-equiv="REFRESH" content="2;/data/apps/%s">''' % filename )
	sys.exit(0)


# =================================================================================================================
# Display download button for each file and download it on click
# =================================================================================================================
print('''<body background='/data/images/star.jpg'>
		 <center><img src='/data/images/download.gif'><br><br>
		 <table border=5 cellpadding=5 bgcolor=green>
''')


#========================================================
# Collect directory and its filenames as bytes
path = b'/home/nikos/public_html/data/apps/'
files = os.listdir( path )

for filename in files:
	# Compute 'path/to/filename'
	filepath_bytes = path + filename
	for encoding in ('utf-8', 'iso-8859-7', 'latin-1'):
		try: 
			filepath = filepath_bytes.decode( encoding )
		except UnicodeDecodeError:
			continue
        
		# Rename to something valid in UTF-8 
		if encoding != 'utf-8': 
			os.rename( filepath_bytes, filepath.encode('utf-8') )

		assert os.path.exists( filepath )
		break 
	else: 
		# This only runs if we never reached the break
		raise ValueError( 'unable to clean filename %r' % filepath_bytes ) 


#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
	try:
		# Check the presence of a file against the database and insert if it doesn't exist
		cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
		data = cur.fetchone()
		
		if not data:
			# First time for file; primary key is automatic, hit is defaulted
			print( "iam here", filename + '\n' )
			cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
	except pymysql.ProgrammingError as e:
		print( repr(e) )


#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Build a set of the bare filenames found in the path dir
filepaths = set( filenames )

# Fetch every url stored in the database
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Delete database records whose file no longer exists in the path dir
for rec in data:
	# Each row is a tuple such as (url,) with the default cursor, so compare its first column
	url = rec[0]
	if url not in filepaths:
		cur.execute('''DELETE FROM files WHERE url = %s''', (url,) )
[/CODE] 

When I try to run it, it is still erroring out:

[CODE]
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Original exception was:, referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Traceback (most recent call last):, referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 83, in <module>, referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]     assert os.path.exists( filepath ), referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/genericpath.py", line 18, in exists, referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]     os.stat(path), referer: http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-37: ordinal not in range(128), refere
[/CODE] 
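
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the directory prefix is exactly 34 characters long, so "position 34-37" would be the first four characters of a filename, and the 'ascii' codec suggests the str path is being encoded with an ASCII filesystem encoding (for example when the CGI process runs under a C/POSIX locale). The filename below is hypothetical and is written with escapes so the example does not depend on the source file's encoding.

```python
# Directory prefix taken from the script above.
prefix = '/home/nikos/public_html/data/apps/'

# Hypothetical Greek filename (four Greek letters followed by '.mp3').
sample = prefix + '\u0395\u03c5\u03c7\u03ae.mp3'

err = None
try:
    # Roughly what happens when a str path must be turned into bytes
    # and the filesystem encoding is ASCII.
    sample.encode('ascii')
except UnicodeEncodeError as e:
    err = e

print(len(prefix))  # length of the directory prefix
print(err)          # message names the first run of non-ASCII characters
```

If the real filenames start with non-ASCII characters, the reported positions would line up with the start of the filename in the same way.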

Why am I still receiving Unicode decode errors?
With your help we have written a procedure precisely to avoid this kind of decoding issue, by renaming every filename stored as Greek-encoded bytes to UTF-8-encoded bytes.

Is it the assert that fails? Do we have a logic error somewhere that I don't see?
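
A minimal sketch of one way the existence check could be made independent of the process locale, assuming the failure comes from passing a str path to os.path.exists() while the filesystem encoding is ASCII: encode the path to UTF-8 bytes explicitly before checking. The helper name path_exists_utf8 and the sample filename are hypothetical, and the demonstration uses a throwaway temporary directory rather than the real data directory.

```python
import os
import tempfile

def path_exists_utf8(path_str):
    """Hypothetical helper: check a path via its explicit UTF-8 bytes form."""
    return os.path.exists(path_str.encode('utf-8'))

# Demonstration in a throwaway directory (not the real data directory).
tmpdir = tempfile.mkdtemp()
name = '\u0395\u03c5\u03c7\u03ae.mp3'   # hypothetical Greek filename
full = os.path.join(tmpdir, name)

# Create the file through a bytes path as well, so no implicit encoding is used.
with open(full.encode('utf-8'), 'wb') as f:
    f.write(b'')

found = path_exists_utf8(full)
missing = path_exists_utf8(os.path.join(tmpdir, 'nope.mp3'))
print(found, missing)
```

In the script above, the equivalent change would be to assert on the bytes path that os.rename() was given, rather than on the decoded str.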


