Unicode in cgi-script with apache2

wxjmfauth at gmail.com wxjmfauth at gmail.com
Sun Aug 17 04:08:50 EDT 2014


On Friday, 15 August 2014 at 20:10:25 UTC+2, Dominique Ramaekers wrote:
> Hi,
> 
> I've got a little script:
> 
> #!/usr/bin/env python3
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate")    # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> f = open("/var/www/cgi-data/index.html", "r")
> for line in f:
>      print(line,end='')
> 
> If I run the script in the terminal, it nicely prints the webpage
> 'index.html'.
> 
> If I access the script through a web browser, Apache gives an error:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1791: ordinal not in range(128)
> 
> I've done a whole afternoon of reading on fora and blogs, but I don't have a
> solution.
> 
> Can anyone help me?
> 
%%%%%%%%%

Your (typical) problem is not Unicode, the OS or Python.

The origin of the problem is of a different nature and has to
be understood more "globally".

Your job is a succession of single processes/steps,

input -> process -> output,

which you are driving with an engine: Python. A process
may be: reading a file, sending the output to a terminal, CGI,
sending to "Apache", ...
What happens is that the coding of the characters of the
input/output of each process is, or may be, different, and
this may lead to conflicts.
The rule of the game is to use the engine (Python, in
this case) to ensure that the coding of the output of one
process matches the coding of the input of the next process.
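Applied to the quoted script, the rule could look like the sketch
below. It is only a sketch: the name send_page and the two constants
are mine, and it assumes that index.html was saved as utf-8 (the byte
0xc3 in the error message is consistent with that, but does not prove
it). Both codings are named explicitly, so nothing depends on the
defaults Python picks up from the environment.

```python
import io

SOURCE_ENCODING = "utf-8"   # assumption: the coding index.html was saved in
OUTPUT_ENCODING = "utf-8"   # assumption: the coding announced to the browser


def send_page(path, out):
    """Write the HTTP headers and the page at `path` to the binary stream `out`."""
    # Wrap the byte stream so every str is encoded with a coding we chose,
    # not with whatever default the environment suggests.
    text_out = io.TextIOWrapper(out, encoding=OUTPUT_ENCODING, newline="\n")
    # Tell the next process (the browser) which coding the bytes are in.
    text_out.write("Content-Type: text/html; charset=%s\n" % OUTPUT_ENCODING)
    text_out.write("\n")
    # Decode the input with the coding of the file, not the default one.
    with open(path, "r", encoding=SOURCE_ENCODING) as f:
        for line in f:
            text_out.write(line)
    text_out.flush()
    text_out.detach()  # leave the underlying byte stream open for the caller
```

In the CGI script itself, the call would be something like
send_page("/var/www/cgi-data/index.html", sys.stdout.buffer),
after an "import sys".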

Schematically:

input1 -> process1 -> output1 -> [if necessary, "transcode"] ->
input2 -> process2 -> output2 -> ....
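The "transcode" step of the scheme can be sketched as follows. The
function name transcode and the sample text are mine, chosen only for
illustration: the bytes of one process are decoded with the coding
they were written in, then encoded again with the coding the next
process expects.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one coding to another, going through str."""
    return data.decode(source_encoding).encode(target_encoding)


# Hypothetical example: output1 is in latin-1, input2 expects utf-8.
latin1_bytes = "ao\u00fbt".encode("latin-1")        # b'ao\xfbt'
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes)                                   # b'ao\xc3\xbbt'
```

The characters are the same before and after; only their coding,
that is the bytes, has changed.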

Attempting to find a "coding common denominator" (e.g. by
tweaking the platform coding or the engine coding) may fail,
because by nature a single process may require a specific
coding, e.g. utf-8 for the "Apache input", which can be different
from the coding required or used by another process.

When you say, "I read a file, display it and it just works",
what it means in reality is:
The output coding of the "read a file" process corresponds
to the input coding of the "display" process. It works because,
by chance, the output and input codings match in a transparent
way.
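A small illustration of this "by chance": the same bytes decode
without complaint under one coding and fail under another. The helper
try_decode is mine, and I am assuming here that the terminal session
used utf-8 while the CGI environment fell back to ascii (the latter
is what the quoted error message reports).

```python
def try_decode(data, encoding):
    """Return the decoded text, or a short note saying why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


raw = "\u00e9".encode("utf-8")                # b'\xc3\xa9', lead byte is 0xc3

terminal_result = try_decode(raw, "utf-8")    # the codings match
cgi_result = try_decode(raw, "ascii")         # the codings do not match

print(terminal_result)
print(cgi_result)
```

Nothing in the file has changed between the two calls; only the
coding assumed by the reading process has.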

Keep in mind that the coding of characters is a task in its own
right. It is independent of any programming language, any platform ...

jmf


