Windows vs. Linux

Gerhard Fiedler gelists at gmail.com
Wed Aug 2 18:46:23 EDT 2006


On 2006-08-02 17:36:06, Sybren Stuvel wrote:

> IMO it's too bad that "they" chose \r\n as the standard. Having two
> bytes as the end of line marker makes sense on typewriters and
> similarly operating printing equipment. 

I may well be mistaken, but I think at the time they set that standard, 
such equipment was still in use. So it may have been a consideration.

> Nowadays, I think having a single byte as the EOL maker is quite a bit
> clearer. 

Rather than thinking in bytes and the like when inserting an EOL marker, 
inserting an abstract EOL marker (which low-level code then translates 
to the appropriate byte sequence as needed) is probably the less archaic 
way to do that :)
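
As a minimal sketch of that idea, assuming Python 3's io module (which 
postdates this thread): the program only ever writes "\n", and the text 
layer translates it to a byte sequence on output and back again on input. 
The "\r\n" target is chosen explicitly here so the result does not depend 
on the platform.

```python
import io

# Write using the abstract "\n" marker only; newline="\r\n" asks the
# text layer to translate each "\n" to the two-byte sequence on output.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
writer.write("first line\nsecond line\n")
writer.flush()
encoded = raw.getvalue()

# Read the same bytes back; newline=None enables universal newlines,
# so "\r\n" is translated back to "\n" on input.
reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="ascii", newline=None)
decoded = reader.read()

print(encoded)        # b'first line\r\nsecond line\r\n'
print(repr(decoded))  # 'first line\nsecond line\n'
```

With newline=None on the writing side instead, "\n" would be translated 
to whatever os.linesep is on the running system.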

> On the other hand, with the use of UTF-8 encodings and the like, the
> byte-to-character mapping is gone anyway, so perhaps I should just get
> used to it ;-)

Yes :)  "Bytes" is definitely getting too low-level. Especially with 
higher-level languages like Python... there are not many byte manipulation 
facilities anyway. The language is at a much higher level, and in that 
sense the classic strings are a bit out of line, it seems.

>> Just as for MS there are good reasons not to "fix" the backslash now
> 
> Which are those reasons, except for backward compatability?

I don't know how many reasons you need besides backward compatibility, but 
all the DOS (still around!) and Windows apps that would break... ?!?  I 
think breaking that compatibility would be more expensive than the whole 
Y2k bug story. And don't be fooled... you may run a Linux system, but you'd 
pay your share of that bill anyway.

> Less FAQs in this group about people putting tabs, newlines and other
> characters in their filenames because they forget to escape their
> backslashes?

Or forget to use raw strings. (If you don't want it to be escaped, please 
say so :) 
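
A short illustration of the difference, using a made-up Windows path 
(the file name is hypothetical and nothing is opened):

```python
# In an ordinary literal, \t and \n are escape sequences (tab, newline).
escaped = "C:\temp\new.txt"

# A raw literal keeps every backslash exactly as written.
raw = r"C:\temp\new.txt"

# Doubling the backslashes is the equivalent non-raw spelling.
doubled = "C:\\temp\\new.txt"

print(repr(escaped))   # 'C:\temp\new.txt'
print(raw == doubled)  # True
```

The repr of the first string shows a tab and a newline where the 
backslashes were meant to be, which is exactly the FAQ case.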

But similar to what I wrote above about the EOL thing, I think that the 
whole backslash escape character story is not quite well chosen. In a way, 
this is a mere C compatibility pain in the neck... (Of course there are 
implementation and efficiency reasons, mainly because Python is based on C 
APIs, but all that is as arbitrary as the selection of the backslash as 
path separator.) 

There could be other solutions (in Python, I mean). Only accept raw strings 
in APIs that deal with paths? Force coders to create paths as objects, in a 
portable way, maybe by removing the possibility to create paths from 
strings that span more than one level of the path? Or introduce a Unicode 
character that means "portable path separator"? Or whatever... :)
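
One way the "paths as objects" idea can look in practice is the pathlib 
module, which was added to the standard library in Python 3.4, well after 
this thread. The sketch below is only an illustration with hypothetical 
component names; it uses the "pure" path classes, so it never touches the 
file system and behaves the same on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical components, one path level per string.
parts = ("projects", "demo", "readme.txt")

# Each flavour joins the components with its own separator.
windows_path = PureWindowsPath(*parts)
posix_path = PurePosixPath(*parts)

# The "/" operator appends one level at a time.
same_path = PurePosixPath("projects") / "demo" / "readme.txt"

print(windows_path)             # projects\demo\readme.txt
print(posix_path)               # projects/demo/readme.txt
print(windows_path.as_posix())  # projects/demo/readme.txt
```

Note that pathlib still accepts multi-level strings, so it does not 
enforce the stricter "one level per string" rule suggested above.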

> Strings and filenames are usually tightly coupled in any program
> handing files, though.

Yes, and that's IMO something from way down in the implementation depths. 
While file names and paths are strings, not every string is a valid and 
useful file name or path. This shows that using strings for file names and 
paths has a long tradition (coming from low-level languages like C), but 
IMO is not quite appropriate at a higher abstraction level. 

> Almost every programming language I know of uses it as the escape
> character, except for perhaps VB Script and the likes. Not sure about
> the different assembly languages, though.

There are so many languages... and I know so few of them... 
http://en.wikipedia.org/wiki/Category:Programming_languages

Now it may be predominant (I still think it's mostly present in languages 
that are in some way influenced by C), but in the '70s?

IIRC, Pascal uses '^' for a similar purpose (not quite the same, but 
similar). This form is still in ample use in documentation to mean 
"Ctrl-<char>"; probably much more common than the backslash notation.

> Sure. I've talked more about this specific subject in this thread than
> in the rest of my life ;-)

There's a first for everything :)

> I think cooperation and uniformity can be a very good thing. On the other
> hand, Microsoft want the software written for their platform to stay on
> their platform. That's probably one of the major reasons to remain
> incompatible with other systems.

Probably. But even if I'd had a say there (and I hate switching between 
separator characters just as much as the next guy, and possibly do so more 
than you given that I work on a Windows system, with slashes in repository 
paths and URIs), I'm not sure I'd make the jump away from the backslash as 
path separator. That's just breaking too much code. You don't want to have 
all these curses directed at you...

Gerhard



