Prothon should not borrow Python strings!

Mark Hahn mark at hahnca.com
Mon May 24 14:24:12 EDT 2004


> From: Paul Prescod 
> 
> I skimmed the tutorial and something alarmed me.
> 
> "Strings are a powerful data type in Prothon. Unlike many languages,
> they can be of unlimited size (constrained only by memory size) and
> can hold any arbitrary data, even binary data such as photos and
> movies. They are of course also good for their traditional role of
> storing and manipulating text."
> 
> This view of strings is about a decade out of date with modern
> programming practice. From the programmer's point of view, a string
> should be a list of characters. Characters are logical objects that
> have properties defined by Unicode. This is the model used by Java,
> JavaScript, XML and C#.
> 
> Characters are an extremely important logical concept for human beings
> (computers are supposed to serve human beings!) and they need
> first-class representation. It is an accident of history that the
> language you grew up with has so few characters that they can have a
> one-to-one correspondence with bytes.
> 
> I can understand why you might be afraid to tackle all of Unicode for
> version 1.0. Don't bother. All you need to do today to avoid the dead
> end is DO NOT ALLOW BINARY DATA IN STRINGS. Have a binary data type.
> Have a character string type. Give them a common "prototype" if you
> wish. Let them share methods. But keep them separate in your code. The
> result of reading a file is a binary data string. The result of
> parsing an XML file is a character string. These are as different as
> the bits that represent an integer in a particular file format and a
> logical integer.
> 
> Even if your character data type is today limited to characters
> between 0 and 255, you can easily extend that later. But once you have
> megabytes of code that makes no distinction between characters and
> bytes it will be too late. It would be like trying to tease apart
> integers and floats after having treated them as indistinguishable
> (which brings me to my next post).

This is very timely.  I would like to resolve issues like this by July
and that deadline is coming up very fast.

We have had discussions on the Prothon mailing list about how to handle
Unicode properly but no one pointed this out.  It makes perfect sense to
me.  
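
To make the distinction concrete, here is a minimal sketch of the
separation Paul describes. It is written in Python 3, whose bytes and
str types happen to work this way; it is only an illustration, not
Prothon code or a proposed Prothon API, and the sample text and the
helper name mixing_is_rejected are made up for the example.

```python
# Illustrative sketch only: Python 3's bytes/str split, not Prothon code.

# Binary data, as it would come back from reading a file.
raw = "na\u00efve caf\u00e9".encode("utf-8")

# An explicit decode step turns the bytes into a character string.
text = raw.decode("utf-8")

byte_count = len(raw)    # counts bytes
char_count = len(text)   # counts characters (Unicode code points)


def mixing_is_rejected():
    """Return True if adding a character string to binary data fails."""
    try:
        text + raw
    except TypeError:
        return True
    return False


print(type(raw).__name__, byte_count)
print(type(text).__name__, char_count)
print(mixing_is_rejected())
```

The two lengths differ because the accented characters each take more
than one byte in UTF-8, and the two types refuse to be combined without
an explicit conversion.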

Is there any dynamic language that already does this right for us to
steal from, or is this new territory?  I know for sure that I don't want
to steal Java's streams.  I remember hating them with a passion.





More information about the Python-list mailing list