Changing filenames from Greeklish => Greek (subprocess complain)

MRAB python at mrabarnett.plus.com
Sat Jun 8 12:32:45 EDT 2013


On 08/06/2013 07:49, Νικόλαος Κούρας wrote:
> On Saturday, 8 June 2013 5:52:22 a.m. UTC+3, user Cameron Simpson wrote:
>> On 07Jun2013 11:52, Νίκος Γκρ33κ <nikos.gr33k at gmail.com> wrote:
>>
>> | nikos at superhost.gr [~/www/cgi-bin]# [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 81
>> | [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]     if( flag == 'greek' )
>> | [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]                         ^
>> | [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax
>> | [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] Premature end of script headers: files.py
>> | -------------------------------
>> |
>> | I don't know why that if statement raises an error.
>>
>> Python statements that introduce a block (if, while, try, etc.) end in a colon, so:
>
> Oh, I am very sorry.
> Oh my God, I can't believe I missed a colon *again*.
>
> I have corrected this:
>
> #========================================================
> import os
>
> # Collect filenames of the path dir as bytes
> filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )
>
> for filename in filename_bytes:
> 	# Compute 'path/to/filename' into bytes
> 	filepath_bytes = b'/home/nikos/public_html/data/apps/' + filename
> 	flag = False
> 	
> 	try:
> 		# Assume the current file name is UTF-8 encoded
> 		filepath = filepath_bytes.decode('utf-8')
> 		flag = 'utf8'
> 	except UnicodeDecodeError:
> 		try:
> 			# The name is not valid UTF-8, so assume it is ISO-8859-7 (Greek)
> 			filepath = filepath_bytes.decode('iso-8859-7')
> 			flag = 'greek'
> 		except UnicodeDecodeError:
> 			print( '''I give up! File name is unreadable!''' )
> 	
> 	if flag == 'greek':
> 		# Rename filename from greek bytes --> utf-8 bytes
> 		os.rename( filepath_bytes, filepath.encode('utf-8') )
> ==================================
>
> Now everything was supposed to work, but instead I am getting this surrogate error once more.
> What is this surrogate thing?
>
> Since I use error catching and handling like 'except UnicodeDecodeError:',
> if the UTF-8 decode fails for some reason, shouldn't it leave that file alone and try the next file?
> 	try:
> 		# Assume current file is utf8 encoded
> 		filepath = filepath_bytes.decode('utf-8')
> 		flag = 'utf8'
> 	except UnicodeDecodeError:
>
> This is what it is supposed to do, correct?
>
> ==================================
> [Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 94, in <module>
> [Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]     cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
> [Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
> [Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]     query = query.encode(charset)
> [Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcce' in position 35: surrogates not allowed
>
Look at the traceback.

It says that the exception was raised by:

     query = query.encode(charset)

which was called by:

     cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )

But what is 'filename'? And what has it to do with the first code
snippet? Does the traceback have _anything_ to do with the first code
snippet?
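A minimal sketch of what "this surrogate thing" is, assuming (the traceback does not show it) that 'filename' on line 94 came from os.listdir() called with a str path. On POSIX, Python 3 decodes such names with the filesystem encoding and the 'surrogateescape' error handler (PEP 383), so every byte it cannot decode, e.g. 0xCE, turns into a lone surrogate such as '\udcce'. A strict UTF-8 encode, like the query.encode(charset) call in PyMySQL, then refuses it. The helper fix_greek_filenames is hypothetical: it lists the directory as bytes so no surrogates appear, decodes only the name, and treats "already valid UTF-8" as a heuristic for "leave it alone".

```python
import os


def show_surrogates():
    """Reproduce the 'surrogates not allowed' error with plain strings."""
    # b'\xce\xbd' is Greek small nu in UTF-8. Decoding it as ASCII with
    # 'surrogateescape' maps each undecodable byte 0xNN to U+DCNN.
    raw = b'\xce\xbd'
    text = raw.decode('ascii', 'surrogateescape')
    assert text == '\udcce\udcbd'
    # The same error handler turns the surrogates back into the original bytes.
    assert text.encode('ascii', 'surrogateescape') == raw
    try:
        text.encode('utf-8')  # strict UTF-8 rejects lone surrogates
    except UnicodeEncodeError as exc:
        print(exc)
        return exc.reason
    raise AssertionError('expected a UnicodeEncodeError')


def fix_greek_filenames(dirpath: bytes) -> list:
    """Hypothetical helper: rename ISO-8859-7 file names in dirpath to UTF-8.

    dirpath must be bytes, so os.listdir() returns raw byte names
    instead of surrogate-escaped str.
    """
    renamed = []
    for name in os.listdir(dirpath):
        # Join the directory with the name returned by os.listdir().
        old_path = os.path.join(dirpath, name)
        try:
            name.decode('utf-8')
            continue  # already valid UTF-8: leave this file alone
        except UnicodeDecodeError:
            pass
        try:
            text = name.decode('iso-8859-7')
        except UnicodeDecodeError:
            print('Cannot decode file name:', name)
            continue  # move on to the next file
        new_path = os.path.join(dirpath, text.encode('utf-8'))
        os.rename(old_path, new_path)
        renamed.append(new_path)
    return renamed
```

Listing with a bytes path avoids surrogates altogether; if a surrogate-escaped str name does turn up, os.fsencode() converts it back to the original bytes before anything tries to encode it as UTF-8.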



