Changing filenames from Greeklish => Greek (subprocess complain)

MRAB python at mrabarnett.plus.com
Fri Jun 7 10:29:25 EDT 2013


On 07/06/2013 12:53, Νικόλαος Κούρας wrote:
[snip]
>
> #========================================================
> # Collect filenames of the path dir as bytes
> greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )
>
> for filename in greek_filenames:
> 	# Compute 'path/to/filename' in bytes
> 	greek_path = b'/home/nikos/public_html/data/apps/' + filename
> 	try:

Trying ISO-8859-7 first is the worse way round. ISO-8859-7 has 1 byte
per codepoint and almost every byte value is assigned, meaning that it's
more 'tolerant' (if that's the word) of errors. A sequence of bytes that
is actually UTF-8 will usually decode as ISO-8859-7 without complaint,
giving gibberish.
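
A quick way to see the difference, as a small sketch (the name used
here is just an example I picked, not one of your files):

```python
# Example name chosen for illustration only; it is not taken from the
# directory in the quoted code.
name = 'Ελλάδα'

utf8_bytes = name.encode('utf-8')
iso_bytes = name.encode('iso-8859-7')

# UTF-8 bytes decoded as ISO-8859-7: no exception is raised, but every
# byte becomes its own (wrong) character.
mojibake = utf8_bytes.decode('iso-8859-7')
print(ascii(mojibake))

# ISO-8859-7 bytes decoded as UTF-8: the decoder rejects them.
try:
    iso_bytes.decode('utf-8')
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True
print(utf8_rejected)
```

The first decode "succeeds" silently, which is exactly why it can't be
used as the test.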

UTF-8 is less tolerant: bytes that aren't really UTF-8 will almost
always fail to decode. It's also the encoding that ideally you should
be using everywhere. So it's better to assume UTF-8 first and, only if
that fails, try ISO-8859-7 and then rename, so that any names that were
ISO-8859-7 will be converted to UTF-8.
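
Roughly, the order described above looks like this. It's only a
sketch, not the code from my earlier post: the helper names utf8_name
and rename_to_utf8 are made up for illustration, and it assumes every
name is either valid UTF-8 or ISO-8859-7.

```python
import os

def utf8_name(raw_name):
    """Return raw_name (bytes) as UTF-8 bytes, trying UTF-8 first."""
    try:
        # Strict check: if this succeeds, the name is already UTF-8.
        raw_name.decode('utf-8')
        return raw_name
    except UnicodeDecodeError:
        # Assumption: anything that isn't UTF-8 is ISO-8859-7. This can
        # still raise for the few bytes ISO-8859-7 leaves unassigned.
        return raw_name.decode('iso-8859-7').encode('utf-8')

def rename_to_utf8(dir_path):
    """Rename every non-UTF-8 name in dir_path (bytes) to UTF-8."""
    for raw_name in os.listdir(dir_path):
        new_name = utf8_name(raw_name)
        if new_name != raw_name:
            os.rename(os.path.join(dir_path, raw_name),
                      os.path.join(dir_path, new_name))
```

You would call it with the directory as bytes, e.g.
rename_to_utf8(b'/home/nikos/public_html/data/apps/'), so that
os.listdir() hands back the names as undecoded bytes.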

That's the reason I did it that way in the code I posted, but, yet
again, you've changed it without understanding why!

> 		filepath = greek_path.decode('iso-8859-7')
> 		
> 		# Rename current filename from greek bytes --> utf-8 bytes
> 		os.rename( greek_path, filepath.encode('utf-8') )
> 	except UnicodeDecodeError:
> 		# Since it's not a greek bytestring then it's a proper utf8 bytestring
> 		filepath = greek_path.decode('utf-8')
>
[snip]



