[Web-SIG] The write callable (vs. file-like object)

Alan Kennedy py-web-sig at xhaus.com
Tue Aug 31 19:35:43 CEST 2004


[Phillip J. Eby]
 > If Python currently had a "byte array" type, we'd be using that instead
 > of strings.  Direct writing of Unicode isn't intended to ever be
 > directly supported by the standard, although in principle you could
 > create some kind of "encoding middleware" that sits directly atop the
 > application.  (An application or framework written to it would
 > technically not be WSGI-compliant.)
 >
 > I guess I need to add something about byte arrays to the spec,
 > especially since Java/Jython may have this issue today (i.e. strings are
 > Unicode, but for HTTP a byte array is needed).

Hmmm: looking under Jython's covers, I don't think there is a problem with 
binary strings.

org.python.core.PyFile implements the write method for *binary* data by 
transcoding the Unicode string with the 
java.lang.String.getBytes(int,int,byte[],int) method (which is 
deprecated because it does not properly convert Unicode characters into bytes).

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(int,%20int,%20byte[],%20int)

The javadoc says: "Copies characters from this string into the 
destination byte array. Each byte receives the 8 low-order bits of the 
corresponding character. The eight high-order bits of each character are 
not copied and do not participate in the transfer in any way."

AFAICT, this is not a problem because (I'm presuming) Jython stores 
binary data as one byte per string character, i.e. in the character's low 
byte. So the transcoding above is fine when you're dealing with bytes 
rather than actual characters.
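
A minimal sketch of the low-byte argument, assuming (as presumed above) that a binary string holds one byte value 0..255 per char. The class LowByteCopy and its methods bytesToBinaryString and lowBytes are hypothetical names; lowBytes makes the same deprecated getBytes(int,int,byte[],int) call as PyFile's binary branch. Byte values round-trip unchanged, while a real character above U+00FF silently loses its high byte.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Jython.
final class LowByteCopy {

    // Pack raw bytes into a String, one char per byte (char value 0..255),
    // i.e. the "binary string" representation presumed in the text.
    static String bytesToBinaryString(byte[] data) {
        char[] chars = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            chars[i] = (char) (data[i] & 0xFF);
        }
        return new String(chars);
    }

    // Same call as PyFile's binary branch: copy only the low 8 bits of each char.
    @SuppressWarnings("deprecation")
    static byte[] lowBytes(String s) {
        byte[] buf = new byte[s.length()];
        s.getBytes(0, s.length(), buf, 0);
        return buf;
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
        byte[] back = lowBytes(bytesToBinaryString(original));
        System.out.println(Arrays.equals(original, back)); // true: lossless for bytes

        // A genuine character above U+00FF loses its high byte: U+20AC becomes 0xAC.
        byte[] euro = lowBytes("\u20AC");
        System.out.printf("0x%02X%n", euro[0] & 0xFF); // 0xAC
    }
}
```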

When the output is *character* data (i.e. the "if (binary)" test below 
is false), the java.lang.String.getBytes() method is used instead; it 
transcodes properly to bytes using the "platform's default charset", 
which is set at JVM startup time.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes()
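
To illustrate the charset dependence, here is a small sketch (the class DefaultCharsetDemo and its encodedLength helper are hypothetical). It uses String.getBytes(Charset) and StandardCharsets, which postdate the JDK 1.4.2 docs linked above, so it assumes Java 7 or later. The charset printed first will vary by platform and JDK version.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: how the bytes written for character data depend on the charset.
final class DefaultCharsetDemo {

    // Number of bytes produced when encoding s with the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00E9"; // four chars; the last is U+00E9 (e with acute accent)

        // The no-argument getBytes() encodes with the JVM's default charset.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()))); // true

        // The same four characters become a different number of bytes per charset.
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 5
    }
}
```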

If anyone is interested, here is the code for the 
PyFile.getBytes(String) method, called by PyFile.write().

protected byte[] getBytes(String s)
   {
   // Yes, I known the method is depricated, but it is the fastest
   // way of converting between between byte[] and String
   if (binary)
     {
     byte[] buf = new byte[s.length()];
     s.getBytes(0, s.length(), buf, 0);
     return buf;
     }
   else
     return s.getBytes();
   }
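
The quoted method is not runnable on its own, since it depends on PyFile's binary field. A minimal self-contained harness, assuming a hypothetical PyFileSketch class that holds only that flag, reproduces the same two branches (brace style modernized):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical stand-in for org.python.core.PyFile: only the binary flag and getBytes().
final class PyFileSketch {
    private final boolean binary;

    PyFileSketch(boolean binary) {
        this.binary = binary;
    }

    @SuppressWarnings("deprecation")
    protected byte[] getBytes(String s) {
        if (binary) {
            // Binary mode: keep only the low 8 bits of each char.
            byte[] buf = new byte[s.length()];
            s.getBytes(0, s.length(), buf, 0);
            return buf;
        }
        // Text mode: encode with the JVM's default charset.
        return s.getBytes();
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // a single char, U+00E9
        byte[] bin = new PyFileSketch(true).getBytes(s);
        byte[] txt = new PyFileSketch(false).getBytes(s);
        System.out.println(Arrays.toString(bin)); // [-23], i.e. the byte 0xE9
        System.out.println(Arrays.equals(txt, s.getBytes(Charset.defaultCharset()))); // true
    }
}
```

In binary mode U+00E9 becomes the single byte 0xE9 on every platform; in text mode the result is whatever the default charset produces (two bytes under UTF-8, one under ISO-8859-1).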

So, I think all is well here: Jython knows how to properly manage byte 
strings vs. Python strings.

Regards,

Alan.

P.S. The spelling mistakes in the code comments above are verbatim from 
the jython 2.1 codebase. All other speeling misteaks are my own ;-)