Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Tue Jun 4 12:57:54 EDT 2013


On Tuesday, 4 June 2013 6:07:19 PM UTC+3, Michael Torrie wrote:
> On 06/04/2013 08:18 AM, Νικόλαος Κούρας wrote:
> > No, brackets are all there. Just tried:
> >
> > # Compute a set of current fullpaths
> > fullpaths = set()
> > path = "/home/nikos/www/data/apps/"
> >
> > for root, dirs, files in os.walk(path):
> > 	for fullpath in files:
> > 		fullpaths.add( os.path.join(root, fullpath) )
> > 		print (fullpath )
> > 		print (fullpath.encode('iso-8859-7').decode('latin-1') )
>                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This is wrong.  You are converting unicode to iso-8859-7 bytes, then
> trying to convert those bytes back to unicode by pretending they are
> latin-1 bytes.  Even if this worked it will generate garbage.
>
> > What are these 'surrogate' things?
>
> It means that when you tried to decode greek bytes using latin-1, there
> were some invalid unicode letters created (which is expected, since the
> bytes are not latin-1, they are iso-8859-7!).
>
> If you want the browser to use a particular encoding scheme (utf-8),
> then you have to print out an HTTP header before you start printing your
> other HTML data:
>
> print("Content-Type: text/html;charset=UTF-8\r\n")
> print("\r\n")
>
> print("html data goes here")

Thanks for the clear explanation of encode and decode; I have never understood it more clearly.
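
To make sure I have understood it, here is a minimal sketch of the round trip (Python 3; the sample word is only an illustration taken from one of my filenames, and the variable names are my own):

```python
# Sample text: a Greek word held as a Python 3 str (Unicode).
text = "Ευχή"

# encode(): str -> bytes in the named encoding.
greek_bytes = text.encode("iso-8859-7")

# decode() with the SAME encoding restores the original text.
restored = greek_bytes.decode("iso-8859-7")

# decode() with a DIFFERENT encoding does not fail for latin-1
# (every byte maps to some character), but the result is garbage.
garbled = greek_bytes.decode("latin-1")

print(restored == text)   # is the matching round trip lossless?
print(garbled == text)    # does the mismatched decode give the text back?
```

So the encoding passed to decode() has to be the one the bytes were actually written in.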

And yes, of course I know that a header must be printed before any other output. Here is how I have it:

-------------------------------------------------
print( '''Content-type: text/html; charset=utf-8\n''' )

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
	for fullpath in files:
		fullpaths.add( os.path.join(root, fullpath) )


Your Unicode explanation is clear, but here we do not have to deal with the files' contents but rather with the filenames themselves.

root@nikos [~]# ls -l /home/nikos/www/data/apps/
total 368548
drwxr-xr-x 2 nikos nikos     4096 Jun  4 14:49 ./
drwxr-xr-x 6 nikos nikos     4096 May 26 21:13 ../
-rwxr-xr-x 1 nikos nikos 13157283 Mar 17 12:57 100\ Mythoi\ tou\ Aiswpou.pdf*
-rwxr-xr-x 1 nikos nikos 29524686 Mar 11 18:17 Anekdotologio.exe*
-rw-r--r-- 1 nikos nikos 42413964 Jun  2 20:29 Battleship.exe
-rw-r--r-- 1 nikos nikos   236032 Jun  4 14:10 \323\352\335\370\357\365\ \335\355\341\355\ \341\361\351\350\354\374.exe
-rwxr-xr-x 1 nikos nikos 66896732 Mar 17 13:13 Kosmas\ o\ Aitwlos\ -\ Profiteies.pdf*
-rw-r--r-- 1 nikos nikos 51819750 Jun  2 20:04 Luxor\ Evolved.exe
-rw-r--r-- 1 nikos nikos 60571648 Jun  2 14:59 Monopoly.exe
-rw-r--r-- 1 nikos nikos  3511233 Jun  4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3
-rwxr-xr-x 1 nikos nikos  1788164 Mar 14 11:31 Online\ Movie\ Player.zip*
-rw-r--r-- 1 nikos nikos  5277287 Jun  1 18:35 O\ Nomos\ tou\ Merfy\ v1-2-3.zip
-rwxr-xr-x 1 nikos nikos 16383001 Jun 22  2010 Orthodoxo\ Imerologio.exe*
-rw-r--r-- 1 nikos nikos  6084806 Jun  1 18:22 Pac-Man.exe
-rw-r--r-- 1 nikos nikos 25476584 Jun  2 19:50 Scrabble.exe
-rwxr-xr-x 1 nikos nikos 49141166 Mar 17 12:48 To\ 1o\ mou\ vivlio\ gia\ to\ skaki.pdf*
-rwxr-xr-x 1 nikos nikos  3298310 Mar 17 12:45 Vivlos\ gia\ Atheofovous.pdf*
-rw-r--r-- 1 nikos nikos  1764864 May 29 21:50 V-Radio\ v2.4.msi
root@nikos [~]#
-------------------------------------------------

As you can see, the subdirectory 'apps' contains filenames in both English and Greek letters.

Are both kinds Unicode? Are the names of the actual files also stored as byte streams, much like the contents inside them?
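
A minimal sketch of how this could be checked, assuming Python 3 on Linux, a UTF-8 locale, and that the Greek names were created in ISO-8859-7 (my guess from the octal escapes in the listing; the variable names are only for illustration):

```python
# Raw bytes of one filename, copied from the octal escapes printed by ls.
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# On Linux a filename is just bytes. Under a UTF-8 locale Python 3 decodes
# names with the 'surrogateescape' handler, so each byte that is not valid
# UTF-8 turns into a lone surrogate (U+DC80..U+DCFF) instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# Such a str cannot be encoded as strict UTF-8 ...
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... but the original bytes can be recovered and then decoded with the
# encoding that was (presumably) used when the file was named.
recovered = name.encode("utf-8", "surrogateescape")
greek = recovered.decode("iso-8859-7")

print(strict_ok)
print(greek)
```

For names returned by os.walk, os.fsencode(name) performs the same str-to-bytes step using the filesystem encoding.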

If they are Unicode, then I really see no trouble when trying to:

cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )

but this is what I'm still getting:


-------------------------------------------------

root@nikos [~]# [Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59]   File "files.py", line 72
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59]     data = cur.fetchone()        #URL is unique, so should only be one
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59]        ^
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] SyntaxError: invalid syntax
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] Premature end of script headers: files.py
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] File does not exist: /home/nikos/public_html/500.shtml
-------------------------------------------------

What is the problem in your opinion, Michael, since everything is encoded in utf-8?

Why does the cur.execute fail?
		cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )
		data = cur.fetchone()        #URL is unique, so should only be one


