Windows vs. Linux
Gerhard Fiedler
gelists at gmail.com
Wed Aug 2 18:46:23 EDT 2006
On 2006-08-02 17:36:06, Sybren Stuvel wrote:
> IMO it's too bad that "they" chose \r\n as the standard. Having two
> bytes as the end of line marker makes sense on typewriters and
> similarly operating printing equipment.
I may well be mistaken, but I think at the time they set that standard,
such equipment was still in use. So it may have been a consideration.
> Nowadays, I think having a single byte as the EOL marker is quite a bit
> clearer.
Rather than thinking in bytes and the like when inserting an EOL marker,
actually inserting an EOL marker (which low-level code then translates
to the appropriate byte sequence as needed) is probably the less archaic
way to do that :)
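A minimal sketch of that idea in Python: the code writes only the abstract
"\n" marker, and the text layer emits whatever byte sequence is asked for.
The helper name encode_lines and the sample lines are made up for
illustration; io.TextIOWrapper and its newline argument are from the
standard library.

```python
import io

def encode_lines(lines, newline):
    """Write lines using the abstract "\\n" marker and return the raw bytes."""
    buf = io.BytesIO()
    # newline controls how "\n" is translated on output:
    # "\n" leaves it alone, "\r\n" expands it to two bytes.
    text = io.TextIOWrapper(buf, encoding="ascii", newline=newline)
    for line in lines:
        text.write(line + "\n")
    text.flush()
    data = buf.getvalue()
    text.detach()  # release buf so the wrapper does not close it
    return data

unix_bytes = encode_lines(["one", "two"], "\n")
dos_bytes = encode_lines(["one", "two"], "\r\n")
print(unix_bytes)  # b'one\ntwo\n'
print(dos_bytes)   # b'one\r\ntwo\r\n'
```

The calling code is identical in both cases; only the translation setting
differs.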
> On the other hand, with the use of UTF-8 encodings and the like, the
> byte-to-character mapping is gone anyway, so perhaps I should just get
> used to it ;-)
Yes :) "Bytes" is definitely getting too low level. Especially with higher
level languages like Python... there are not many byte manipulation
facilities anyway. The language is at a much higher level, and in that
sense the classic strings are a bit out of line, it seems.
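To illustrate the point about the byte-to-character mapping, here is a small
sketch in present-day Python 3, where text and bytes are separate types. The
sample word is arbitrary.

```python
# "\u00ef" is the single character i-with-diaeresis.
text = "na\u00efve"
encoded = text.encode("utf-8")

print(len(text))     # 5 characters
print(len(encoded))  # 6 bytes: the non-ASCII character takes two bytes
```

Counting characters and counting bytes give different answers as soon as the
text leaves ASCII.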
>> Just as for MS there are good reasons not to "fix" the backslash now
>
> Which are those reasons, except for backward compatibility?
I don't know how many reasons you need besides backward compatibility, but
all the DOS (still around!) and Windows apps that would break... ?!? I
think breaking that compatibility would be more expensive than the whole
Y2k bug story. And don't be fooled... you may run a Linux system, but you'd
pay your share of that bill anyway.
> Less FAQs in this group about people putting tabs, newlines and other
> characters in their filenames because they forget to escape their
> backslashes?
Or forget to use raw strings. (If you don't want it to be escaped, please
say so :)
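A short sketch of the pitfall and the two usual ways around it. The file name
is hypothetical and nothing is opened on disk.

```python
# Regular literal: \n and \t are read as newline and tab, so the
# "path" no longer contains the intended backslashes.
escaped = "C:\new\table.txt"

# Raw literal: backslashes are kept as written.
raw = r"C:\new\table.txt"

# Regular literal with each backslash doubled: same result as raw.
doubled = "C:\\new\\table.txt"

print(repr(escaped))
print(repr(raw))
print(raw == doubled)  # True
```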
But similar to what I wrote above about the EOL thing, I think that the
whole backslash escape character story is not quite well-chosen. In a way,
this is a mere C compatibility pain in the neck... (Of course there are
implementation and efficiency reasons, mainly because Python is based on C
APIs, but all that is as arbitrary as the selection of the backslash as
path separator.)
There could be other solutions (in Python, I mean). Only accept raw strings
in APIs that deal with paths? Force coders to create paths as objects, in a
portable way, maybe by removing the possibility to create paths from
strings that are more than one level in the path? Or introduce a Unicode
character that means "portable path separator"? Or whatever... :)
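As one possible shape for the "paths as objects" idea, here is a sketch using
pathlib, which entered the Python standard library (in 3.4) well after this
message was written. The path components are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Build the path from components; no separator appears in the code.
parts = ["projects", "demo", "readme.txt"]  # hypothetical components

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(str(posix_path))          # projects/demo/readme.txt
print(str(windows_path))        # projects\demo\readme.txt
print(windows_path.as_posix())  # projects/demo/readme.txt
```

The "pure" classes only manipulate the path and never touch the file system,
so both flavours can be tried on any platform.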
> Strings and filenames are usually tightly coupled in any program
> handling files, though.
Yes, and that's IMO something from way below in the implementation depths.
While file names and paths are strings, not every string is a valid and
useful file name or path. This shows that using strings for file names and
paths has tradition (coming from low level languages like C), but IMO is
not quite appropriate for a higher abstraction level.
> Almost every programming language I know of uses it as the escape
> character, except for perhaps VB Script and the likes. Not sure about
> the different assembly languages, though.
There are so many languages... and I know so few of them...
http://en.wikipedia.org/wiki/Category:Programming_languages
Now it may be predominant (I still think it's mostly present in languages
that are in some way influenced by C), but in the '70s?
IIRC, Pascal uses '^' for a similar purpose (not quite the same, but
similar). This form is still in ample use in documentation to mean
"Ctrl-<char>"; probably much more common than the backslash notation.
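For reference, the "Ctrl-<char>" caret notation used in documentation maps
onto ASCII control codes by flipping one bit. A small sketch follows; the
function name caret_to_char is made up here, and it covers only the ASCII
convention, not any particular Pascal dialect.

```python
def caret_to_char(letter):
    """Return the control character written as '^' + letter in caret notation."""
    # Flipping bit 0x40 maps '@'..'_' onto codes 0..31 and '?' onto DEL.
    return chr(ord(letter.upper()) ^ 0x40)

bell = caret_to_char("G")  # ^G, the bell character (code 7)
cr = caret_to_char("M")    # ^M, carriage return
lf = caret_to_char("J")    # ^J, line feed

print(repr(bell))
print(repr(cr + lf))  # ^M^J is the "\r\n" pair
```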
> Sure. I've talked more about this specific subject in this thread than
> in the rest of my life ;-)
There's a first for everything :)
> I think cooperation and uniformity can be a very good thing. On the other
> hand, Microsoft want the software written for their platform to stay on
> their platform. That's probably one of the major reasons to remain
> incompatible with other systems.
Probably. But even if I had a say there (and I hate switching between
separator characters just as much as the next guy, and possibly do so more
than you given that I work on a Windows system, with slashes in repository
paths and URIs), I'm not sure I'd make the jump away from the backslash as
path separator. That's just breaking too much code. You don't want to have
all these curses directed at you...
Gerhard