Prothon should not borrow Python strings!

Wed May 26 06:43:31 EDT 2004

[trying to migrate to prothon-user]

> Mark Hahn wrote:
> 
>>Wow, thanks, this is some really great stuff.  I'm going to have
>>to go off and study up on it.
>>
>>This may be a stupid question, but couldn't I have many "types"
>>of strings and some be 8-bits, some 16-bits, and some 32-bits?
>>Couldn't normal method overloading handle the type conversion?
>>Why is there all this confusion?  Isn't this what object-centric
>>computing is designed for?

First, you could have different string types, but that wouldn't help 
with the API question. The goal is that every time you work with "human 
language strings" you work with Cantonese as easily as English, which 
means (at the very least) character ordinals up to 2**16 in any API 
dealing with text.

Second, having multiple types of strings complicates things, just as 
having multiple types of anything does.

Third, the only difference between 16-bit and 32-bit strings is memory 
usage: shouldn't that be an implementation detail rather than a choice 
each programmer needs to make?
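
A rough sketch of the first and third points, written in present-day 
Python 3 purely as an illustration (that choice, and the sample text, 
are my assumptions, not anything Prothon defines): a Chinese character 
has an ordinal that does not fit in 8 bits, and the same text stored at 
16 or 32 bits per code unit is the same text, only bigger.

```python
# Hypothetical sample text: three ASCII letters plus two Chinese characters.
text = "abc\u4e2d\u6587"

# The ordinal of a Chinese character is above 255 but below 2**16.
assert 0xFF < ord("\u4e2d") < 2**16

# Encode without a byte order mark so only the code units are counted.
as_16 = text.encode("utf-16-le")
as_32 = text.encode("utf-32-le")

# Both forms decode back to identical characters; only the size differs.
assert as_16.decode("utf-16-le") == as_32.decode("utf-32-le") == text
print(len(text), len(as_16), len(as_32))  # 5 10 20
```

Nothing in the program's logic depends on which width was used, which 
is why it looks like an implementation detail to me.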

Michael Geary wrote:
> In fact, there are several different encodings for Unicode strings: UTF-8,
> UTF-16, and UTF-32. UTF-16 and UTF-32 each come in big-endian and
> little-endian variations, or a Byte Order Mark (BOM) at the beginning of the
> string can be used to tell you which it is.
> 
> UTF-8 is pretty nice for a lot of purposes. It includes the 7-bit ASCII
> character set unchanged and avoids the endian problems. You could specify
> that Prothon source code uses UTF-8, although you'd still want to support
> the other UTFs for data.

Right, but the question of which encoding is used for the data in a 
file is orthogonal to the question of how many runtime string types 
there should be in Prothon.
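
To make that concrete, here is a small sketch in present-day Python 3 
(again an assumption chosen for illustration, with made-up sample 
data): three different file encodings all decode to one and the same 
runtime string type.

```python
# Hypothetical sample data mixing ASCII, an accented letter, and Chinese.
text = "caf\u00e9 \u4e2d\u6587"

# The same text as it might be stored in three differently encoded files.
stored = {
    "utf-8": text.encode("utf-8"),
    "utf-16": text.encode("utf-16"),  # the codec prepends a BOM
    "utf-32": text.encode("utf-32"),  # the codec prepends a BOM
}

# Decoding happens at the I/O boundary; afterwards there is one string type.
decoded = {name: data.decode(name) for name, data in stored.items()}
assert all(value == text for value in decoded.values())
assert all(type(value) is str for value in decoded.values())
```

The bytes on disk differ in all three cases, but the program only ever 
sees one kind of string.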

  Paul Prescod