Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Thu Jun 6 16:39:56 EDT 2013


On Thursday, 6 June 2013 11:25:15 PM UTC+3, Lele Gaifax wrote:
> Νικόλαος Κούρας <nikos.gr33k at gmail.com> writes:
>
> > Now the error after fixing that changed to:
> >
> > [Thu Jun 06 22:13:49 2013] [error] [client 79.103.41.173]     filename = fullpath.replace( '/home/nikos/public_html/data/apps/', '' )
> > [Thu Jun 06 22:13:49 2013] [error] [client 79.103.41.173] TypeError: expected bytes, bytearray or buffer compatible object
> >
> > MRAB has told me that I need to open those paths and filenames as bytestreams and not as unicode strings.
>
> Yes, that way the function will return a list of bytes
> instances. Knowing that, consider the following example, that should
> ring a bell:
>
>     $ python3
>     Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 09:59:04)
>     [GCC 4.7.2] on linux
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> path = b"some/path"
>     >>> path.replace('some', '')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: expected bytes, bytearray or buffer compatible object
>     >>> path.replace(b'some', b'')
>     b'/path'
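
The same rule seems to hold for os.listdir() itself: the type of the argument decides the type of the names that come back. A minimal sketch, assuming a throwaway temporary directory and a made-up file name (neither is the real apps directory):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, created only for this demonstration
    open(os.path.join(tmp, 'example.txt'), 'w').close()

    names_str = os.listdir(tmp)                  # str argument   -> list of str
    names_bytes = os.listdir(os.fsencode(tmp))   # bytes argument -> list of bytes

print(names_str)
print(names_bytes)
```

So once the directory is listed with a bytes path, every later operation on those names (replace, concatenation, rename) has to use bytes as well.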

Ah yes, very logical, I should have thought of that.
Thanks, here is what I have up until now, with many corrections.


#========================================================
import os

# Get filenames of the apps directory as bytestrings
path = os.listdir( b'/home/nikos/public_html/data/apps/' )

# iterate over all filenames in the apps directory
for filename in path:
	try:
		# Is this name encoded in utf-8?
		filename.decode('utf-8')
	except UnicodeDecodeError:
		# Decoding from UTF-8 failed, which means that the name is not valid utf-8

		# It appears that this filename is encoded in greek-iso, so decode from that and re-encode to utf-8
		new_filename = filename.decode('iso-8859-7').encode('utf-8')

		# rename filename from greek bytestrings --> utf-8 bytestrings
		# (concatenate the variables themselves, not the literals b'filename' / b'new_filename')
		old_path = b'/home/nikos/public_html/data/apps/' + filename
		new_path = b'/home/nikos/public_html/data/apps/' + new_filename
		os.rename( old_path, new_path )
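
The decode-or-fallback step can also be pulled out into a small function, so it can be tried on plain bytes before any file is touched. This is only a sketch: the names to_utf8_name and rename_to_utf8 are made up for illustration, and it assumes every name that is not valid UTF-8 really is ISO-8859-7.

```python
import os

def to_utf8_name(name, fallback='iso-8859-7'):
    """Return name (bytes) unchanged if it is valid UTF-8, else re-encode it from fallback."""
    try:
        name.decode('utf-8')
        return name
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in the fallback encoding
        return name.decode(fallback).encode('utf-8')

def rename_to_utf8(directory):
    """Rename every non-UTF-8 entry of directory (a bytes path); return the (old, new) pairs."""
    renamed = []
    for name in os.listdir(directory):
        new_name = to_utf8_name(name)
        if new_name != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append((name, new_name))
    return renamed
```

Calling rename_to_utf8(b'/home/nikos/public_html/data/apps/') would then do the same job as the loop above. Note that it does not check whether the new name already exists in the directory.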


#========================================================
# Get filenames of the apps directory as unicode
path = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in path:
	try:
		# Check the presence of a file against the database and insert if it doesn't exist
		cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
		data = cur.fetchone()        #URL is unique, so should only be one
		
		if not data:
			# First time for file; primary key is automatic, hit is defaulted 
			cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
	except pymysql.ProgrammingError as e:
		print( repr(e) )


#========================================================
# Empty set that will be filled in with the filenames of path dir
# (the url column holds the bare filenames inserted above, so compare against those)
urls = set()

# Build a set of filenames based on the objects of path dir
for filename in path:
	urls.add( filename )

# Delete spurious
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's urls against path's urls
# (each row comes back as a 1-tuple, so unpack it before comparing)
for (url,) in data:
	if url not in urls:
		cur.execute('''DELETE FROM files WHERE url = %s''', (url,) )
==================================
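
The insert and delete steps can also be written as two set differences, which makes them easy to try without touching the real database. A rough sketch only: it uses an in-memory sqlite3 table as a stand-in for the MySQL one (so the placeholder is ? instead of %s), the table has just the url column, and sync_files plus the file names are made up for the example.

```python
import sqlite3

def sync_files(cur, names):
    """Make the files table match names: insert missing urls, delete stale ones."""
    cur.execute('SELECT url FROM files')
    known = {row[0] for row in cur.fetchall()}   # each row is a 1-tuple
    current = set(names)

    for url in current - known:                  # on disk but not in the table
        cur.execute('INSERT INTO files (url) VALUES (?)', (url,))
    for url in known - current:                  # in the table but gone from disk
        cur.execute('DELETE FROM files WHERE url = ?', (url,))

# Stand-in database with one stale row (hypothetical data)
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('CREATE TABLE files (url TEXT UNIQUE)')
cur.execute("INSERT INTO files (url) VALUES ('old.exe')")

sync_files(cur, ['a.zip', 'b.zip'])

cur.execute('SELECT url FROM files ORDER BY url')
remaining = [row[0] for row in cur.fetchall()]
print(remaining)
```

With pymysql the same function should work after swapping the ? placeholders for %s and adding the host and lastvisit values to the INSERT.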

I think it's ready! But I want to hear from you before I try it! :)


