Unicode in cgi-script with apache2

Sat Aug 16 18:49:47 EDT 2014

Hi Peter,

Your code seems interesting.

I've tried using sys.stdout (in a slightly different form) but it gave 
the same error.

I also read about people who fixed the error by changing the server's
locale to en_US.UTF-8. The people who posted these fixes also said that
you can only use en_US.UTF-8 (and not, e.g., nl_BE.UTF8)... Anyway, it
didn't work for me. And I find it a dirty fix, because I don't want to
use a US locale...
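A locale-independent alternative, sketched here as an assumption rather than what was actually tried: name the encoding explicitly on both sides, so open() decodes the file as UTF-8 and a UTF-8 io.TextIOWrapper encodes the output, whatever locale Apache hands the script. The `charset=utf-8` header and the `serve_html` helper are hypothetical additions; the path is the one from the original script.

```python
#!/usr/bin/env python3
# Sketch: make both ends of the text I/O explicitly UTF-8 so the server's
# locale (often plain C/ASCII under Apache) no longer matters.
import io
import os
import sys


def serve_html(path, out):
    """Hypothetical helper: write CGI headers plus the UTF-8 file at `path` to `out`."""
    print("Content-Type: text/html; charset=utf-8", file=out)  # assumes UTF-8 content
    print("Cache-Control: no-cache, must-revalidate", file=out)
    print("Expires: Sat, 26 Jul 1997 05:00:00 GMT", file=out)
    print("", file=out)
    # encoding= overrides the locale default that produced the 'ascii' error.
    with open(path, encoding="utf-8") as f:
        for line in f:
            print(line, end="", file=out)


if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    # Apache sets GATEWAY_INTERFACE for CGI requests; the guard keeps the
    # module importable (and testable) outside Apache.
    sys.stdout.flush()
    # Encode output as UTF-8 regardless of sys.stdout's locale-derived encoding.
    out = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    serve_html("/var/www/cgi-data/index.html", out)
    out.flush()
```

Passing the output stream in as a parameter keeps the helper independent of sys.stdout, which is also what makes it easy to exercise against an in-memory buffer.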

Please excuse me for not trying out your specific solutions. I've already
started to implement WSGI over CGI. See my previous message...

Regards

On 16-08-14 at 13:17, Peter Otten wrote:
> Dominique Ramaekers wrote:
>
>> I've got a little script:
>>
>> #!/usr/bin/env python3
>> print("Content-Type: text/html")
>> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
>> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
>> print("")
>> f = open("/var/www/cgi-data/index.html", "r")
>> for line in f:
>>       print(line,end='')
>>
>> If I run the script in the terminal, it nicely prints the webpage
>> 'index.html'.
>>
>> If I access the script through a web browser, Apache gives an error:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 1791: ordinal not in range(128)
>>
>> I've spent a whole afternoon reading forums and blogs, but I still don't
>> have a solution.
>>
>> Can anyone help me?
> If the input and output encodings are the same, you can avoid the byte-to-text
> (and subsequent text-to-byte conversion) and serve the binary contents of
> the index.html file directly:
>
> #!/usr/bin/env python3
> import sys
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> sys.stdout.flush()
> with open("/var/www/cgi-data/index.html", "rb") as f:
>      for line in f:
>          sys.stdout.buffer.write(line)
>
> The flush() is necessary to write pending data before accessing the low-level
> stdout.buffer. Instead of the loop you can use any of these:
>
> sys.stdout.buffer.write(f.read()) # not for huge files, but should be OK for
>                                    # typical HTML file sizes
> sys.stdout.buffer.writelines(f)
> shutil.copyfileobj(f, sys.stdout.buffer) # needs "import shutil"; show off
>                                           # your knowledge of the stdlib ;)
>
>
> Alternatively you could choose an encoding via the locale:
>
> #!/usr/bin/env python3
> import locale
> locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> with open("/var/www/cgi-data/index.html") as f:
>      for line in f:
>          print(line, end='')
>
> Python should then use UTF-8 as the default for I/O, and the resulting
> script looks more familiar.
>
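A caveat on the locale route, based on general CPython 3 behaviour rather than anything tested in this thread: calling locale.setlocale() inside the script changes the default encoding that open() picks, but sys.stdout's encoding is fixed when the interpreter starts. print() can therefore still fail under the plain C/ASCII locale that Apache often gives CGI scripts, which could explain why the locale change did not help. Python 3.7 and later coerce the C locale to UTF-8 (PEP 538), which makes this error less likely on newer systems. The small diagnostic sketch below reports what a given setup actually uses when requested through Apache; the report labels are arbitrary.

```python
#!/usr/bin/env python3
# Diagnostic sketch: report the encodings this CGI process actually uses.
import locale
import sys


def encoding_report():
    """Pair a descriptive label with each encoding relevant to the error."""
    return [
        # Default for open() without encoding=; normally follows the current LC_CTYPE.
        ("open() default", locale.getpreferredencoding(False)),
        # Chosen at interpreter start-up; a later setlocale() leaves it alone.
        ("sys.stdout", getattr(sys.stdout, "encoding", None)),
        ("filesystem", sys.getfilesystemencoding()),
    ]


if __name__ == "__main__":
    print("Content-Type: text/plain")
    print("")
    for label, encoding in encoding_report():
        print("{}: {}".format(label, encoding))
```

If "sys.stdout" reports an ASCII codec (for example ANSI_X3.4-1968), print() cannot emit the 0xc3 sequences from index.html, and either the binary approach or an explicit UTF-8 wrapper is needed.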