files.py (encoding error)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Mon Jun 10 07:22:51 EDT 2013


This all happened when I used FileZilla to upload files with Greek filenames
to my remote Linux server, with PuTTY as the SSH client, using Greek ISO as
the locale encoding setting, because Windows 8 uses that by default.

Everything works when the filenames in the directory are English.
If I rename an English filename to a Greek one, I get the error shown
at the end of my post.

I know you guys know Linux, and there is a good chance you know Python
too, so you can help me out.

Thank you.


[CODE]
#====================
# Collect directory and its filenames as bytes
path = b'/home/nikos/public_html/data/apps/'
files = os.listdir( path )

for filename in files:
    # Compute 'path/to/filename'
    filepath_bytes = path + filename
    for encoding in ('utf-8', 'iso-8859-7', 'latin-1'):
        try:
            filepath = filepath_bytes.decode( encoding )
        except UnicodeDecodeError:
            continue

        # Rename to something valid in UTF-8
        if encoding != 'utf-8':
            os.rename( filepath_bytes, filepath.encode('utf-8') )

        assert os.path.exists( filepath )
        break
    else:
        # This only runs if we never reached the break
        raise ValueError( 'unable to clean filename %r' % filepath_bytes )


#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
        data = cur.fetchone()

        if not data:
            # First time for file; primary key is automatic, hit is defaulted
            print( "iam here", filename + '\n' )
            cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
    except pymysql.ProgrammingError as e:
        print( repr(e) )


#========================================================
# Collect filenames of the path dir as strings
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )
filepaths = set()

# Build a set of 'path/to/filename' based on the objects of path dir
for filename in filenames:
    filepaths.add( filename )

# Delete spurious
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for rec in data:
    if rec not in filepaths:
        cur.execute('''DELETE FROM files WHERE url = %s''', rec )
[/CODE]
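
To make the first loop above easier to try in isolation, here is a minimal sketch of the same decode-with-fallback idea as a pure function that touches no files. The function name decode_filename and the sample filename are made up for illustration and are not part of files.py.

```python
def decode_filename(raw, encodings=('utf-8', 'iso-8859-7', 'latin-1')):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Unreachable while 'latin-1' is in the list: it accepts every byte.
    raise ValueError('unable to decode filename %r' % raw)


# Hypothetical Greek filename, encoded the two ways it might arrive on disk.
greek = 'Ευχή.mp3'
name_iso = greek.encode('iso-8859-7')
name_utf8 = greek.encode('utf-8')

print(decode_filename(name_iso))
print(decode_filename(name_utf8))
```

Because latin-1 maps every possible byte to a character, the final raise (like the else branch in the loop above) can never trigger as long as latin-1 stays in the list.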

When I try to run the above, I get:

[CODE]
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Original exception was:, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] Traceback (most recent call last):, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 83, in <module>, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]     assert os.path.exists( filepath ), referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/genericpath.py", line 18, in exists, referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173]     os.stat(path), referer:http://superhost.gr/
[Sun Jun 09 09:37:51 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-37: ordinal not in range(128), refere
[/CODE]

Why am I still receiving Unicode decode errors?
I have written a procedure just to avoid decoding issues, which renames all
Greek-encoded (bytes) filenames to UTF-8 bytes.
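
For reference, here is a small sketch of how a message like the one in the log can come about. It assumes (the log does not confirm this) that the script runs with an ASCII filesystem encoding, so a str path must be encoded with the 'ascii' codec before it reaches os.stat(). The filename 'Ευχή.mp3' is hypothetical. Positions 34-37 in the log are the first four characters after the 34-character directory prefix. The sketch also shows that a path kept as bytes is not encoded implicitly.

```python
import os
import sys
import tempfile

# Hypothetical Greek filename; the real one is not shown in the post.
greek_name = 'Ευχή.mp3'
full_path = '/home/nikos/public_html/data/apps/' + greek_name

# Encoding the str path with the ASCII codec fails on the Greek letters.
try:
    full_path.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
print(ascii_error)

# The encoding Python uses to turn str paths into bytes for system calls.
print(sys.getfilesystemencoding())

# A bytes path needs no implicit str-to-bytes encoding step.
workdir = os.fsencode(tempfile.mkdtemp())
path_bytes = os.path.join(workdir, greek_name.encode('utf-8'))
with open(path_bytes, 'wb'):
    pass
exists_as_bytes = os.path.exists(path_bytes)
print(exists_as_bytes)

# Clean up the temporary file and directory.
os.remove(path_bytes)
os.rmdir(workdir)
```

If that assumption holds for the CGI environment, checking the renamed file with its UTF-8 bytes path instead of the decoded str would avoid the implicit ASCII encoding.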

Can you help please?
-- 
Webhost <http://superhost.gr>&& Weblog <http://psariastonafro.wordpress.com>