[Python-Dev] Unicode input issues

M.-A. Lemburg mal@lemburg.com
Mon, 10 Apr 2000 23:00:53 +0200


Guido van Rossum wrote:
> 
> > > Since you're calling methods on the underlying file object anyway,
> > > can't you avoid buffering by calling the *corresponding* underlying
> > > method and doing the conversion on that?
> >
> > The problem here is that Unicode has far more line
> > break characters than plain ASCII. The underlying API would
> > break on ASCII lines (or even worse on those CRLF sequences
> > defined by the C lib), not the ones I need for Unicode.
> 
> Hm, can't we just use \n for now?
> 
> > BTW, I think that we may need a new Codec class layer
> > here: .readline() et al. are all text based methods,
> > while the Codec base classes clearly work on all kinds of
> > binary and text data.
> 
> Not sure what you mean here.  Can you explain through an example?

Well, the line concept is really only applicable to text
data. Binary data doesn't have lines and e.g. a ZIP codec
(probably) couldn't implement this kind of method.
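
To illustrate the line break point quoted above, here is a minimal
sketch. It assumes present-day Python string methods rather than the
codec API under discussion, and the sample string is made up. Splitting
on "\n" alone, or splitting the encoded bytes, misses the separators
that Unicode-aware splitting finds:

```python
# Sketch only: the sample text is hypothetical and mixes several
# Unicode line separators (U+2028, U+2029, U+0085) with a plain "\n".
text = "one\ntwo\u2028three\u2029four\x85five"

# Splitting on "\n" alone sees just one break.
newline_only = text.split("\n")

# str.splitlines() also honours the Unicode line separators.
unicode_lines = text.splitlines()

# A byte-level split on the encoded data breaks only on ASCII line
# endings, so the Unicode separators go unnoticed.
byte_lines = text.encode("utf-8").splitlines()

print(len(newline_only), len(unicode_lines), len(byte_lines))
```

This is why line splitting would have to happen after decoding, on the
Unicode data, rather than in the underlying byte-oriented file method.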

As it turns out, only the .writelines() method needs to know
what kinds of input/output data objects are used (and then
only to be able to specify a joining separator).
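
A rough sketch of that dependency, assuming present-day Python and a
hypothetical helper named write_lines (not the actual codec method):
the only type-specific piece is the empty separator used for joining,
which can be derived from the first item.

```python
import io


def write_lines(stream, lines):
    """Hypothetical helper: join the lines and write them in one call."""
    lines = list(lines)
    if not lines:
        return
    # An empty slice gives an empty separator of the same type as the
    # data: '' for text, b'' for binary.
    separator = lines[0][:0]
    stream.write(separator.join(lines))


text_out = io.StringIO()
write_lines(text_out, ["first\n", "second\n"])

binary_out = io.BytesIO()
write_lines(binary_out, [b"first\n", b"second\n"])
```

Everything else in the helper is independent of whether the stream
carries text or binary data.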

I'll just leave things as they are for now: quite shallow
w.r.t. the class hierarchy.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/