Mandis Quotes (aka retiring """ and ''')

Andrew Dalke adalke at mindspring.com
Tue Oct 5 00:18:34 EDT 2004


Russell Nelson wrote:
> You are cruel .... and vicious.

Ummm, okay.  I would say it's due to too many years
working with unforgiving computers and reading standards
meant for computers.

>>I was going to say it precludes reading in a huge block
>>of bytes (>1GB in size) and quoting it because you'll need
>>to buffer everything in memory.  Then I remembered string
>>concatenation.  Process 1MB at a time.
> 
> 
> Sure, they're not hard to parse.  LALR(1).

It isn't the lookahead I was worried about, it was
the requirement to keep a lot of data in memory
before being able to work on it.
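
To make the chunked approach concrete, here is a minimal sketch in modern Python. It does not use the proposed quote syntax; it assumes ordinary bytes literals and relies on the compiler joining adjacent literals. The function name quote_in_chunks and the 1MB default are illustrative only.

```python
import io

CHUNK = 1024 * 1024  # 1MB at a time, as suggested in the quoted text


def quote_in_chunks(src, dst, chunk_size=CHUNK):
    """Write the bytes read from src to dst as adjacent bytes literals.

    Only one chunk is held in memory at a time.  Adjacent literals inside
    the parentheses are joined by Python, so the output evaluates to the
    original data.  (Hypothetical helper, not part of any proposal.)
    """
    # Start with an empty literal so that empty input still yields b"".
    dst.write("(b''\n")
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        # repr() escapes quotes, newlines and non-printable bytes.
        dst.write(repr(block))
        dst.write("\n")
    dst.write(")\n")


if __name__ == "__main__":
    out = io.StringIO()
    quote_in_chunks(io.BytesIO(b"spam 'n' eggs\n" * 3), out, chunk_size=8)
    print(out.getvalue())
```

The writer never needs the whole block in memory; whether a reader of the quoted form can avoid buffering is a separate question, and is the one raised above.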

> No question but that an editor should be helpful.

Right.  Though as Jeff Epler pointed out, that helpful
editor could even work with the current Python syntax.
There's nothing to say that what the user sees on
the screen must match the representation on disk.  Leo,
and of course THE show that.

> This is more of a text editor problem than anything else.  In THE,
> when you select something, the invisible characters get rendered
> visibly.  Other editors can do similar things.  When we get to a 100%
> Unicode world, they'll have to do something.  Same thing for Unicode
> glyphs that get presented identically.

Which is why I gave an example of 7 or so different
characters which can be considered whitespace.  I could
have added the combining character to the list, or the
flags to switch direction (as with a mix of English and
Hebrew).  Will all those be shown as different characters?
Or some other way?
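
As a rough illustration of how many invisible or near-invisible characters an editor would have to render distinctly, the sketch below uses Python's standard unicodedata module to print the code point, general category and name of a few of them. The particular sample list is my own choice, not the list from the earlier message.

```python
import unicodedata

# A handful of characters that render as blank space or as nothing at all.
SAMPLES = [
    "\u0020",  # ordinary space
    "\u00a0",  # no-break space
    "\u2003",  # em space
    "\u2009",  # thin space
    "\u3000",  # ideographic space
    "\u0301",  # a combining character
    "\u200f",  # a direction-switching mark
]


def describe(ch):
    """Return 'U+XXXX <category> <name>' for a single character."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```

The category column separates space separators (Zs) from combining marks (Mn) and format controls (Cf), which is one possible basis for deciding how each should be displayed.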

To bring it around to THE/HUMANE.  Suppose I have the
unicode character \N{SECTION SIGN}.  That's the paragraph
symbol.  I believe THE uses it to indicate the end of paragraph
during a LEAP search.  How then do I search for that
character embedded in THE?

The problem is that any character you use to represent
one of the otherwise hidden characters may itself be the
target of a search.  Given, too, the difficulty of actually
typing the SECTION SIGN character, it's likely easier to
search based on the unicode name rather than the actual
character as typed via the keyboard.  Perhaps the better
solution is that a LEAP search show the underlying unicode
name rather than the glyph.  But that would depend on the
keyboard mode, because on a US keyboard I would like to
be able to search for "Göteborg" by typing "Goteborg"
(Noah Spurrier's "Unicode Hammer" approach) while a Swede
would prefer to type the ö directly and not have o and
ö match the same letter.
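
Both ideas can be sketched with Python's unicodedata module. This is only an approximation under my own assumptions: the accent folding below decomposes with NFKD and drops combining marks, which is in the spirit of the "Unicode Hammer" but is not claimed to be Noah Spurrier's code, and the names fold and matches are made up for the example.

```python
import unicodedata


def fold(text):
    """Strip accents: decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(haystack, needle, fold_accents=True):
    """Substring test, with or without accent folding.

    fold_accents=True is the US-keyboard behaviour (o matches ö);
    fold_accents=False is the stricter behaviour a Swedish typist
    would likely want.  Strict mode still normalizes to NFC so that
    precomposed and decomposed spellings of ö compare equal.
    """
    if fold_accents:
        return fold(needle) in fold(haystack)
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)


# Searching by name instead of by typed glyph:
section = unicodedata.lookup("SECTION SIGN")

if __name__ == "__main__":
    print(matches("Göteborg", "Goteborg"))                      # folded
    print(matches("Göteborg", "Goteborg", fold_accents=False))  # strict
    print(matches("see " + section + " 4", section))
```

Which mode is the default would have to follow the keyboard mode, as described above.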

Hmm... And as I recall THE already needs to know the keyboard
layout because of its LEAP key emulation via shift-space
keypresses.  Because the shift key stays down it needs to
know that * and 8 are on the same key, while a Swedish
keyboard has ( and 8 on the same key.  So maybe there's
already work done along this route?  And it would need
to know about the Alt-Gr key for some keyboards.  Grrr!

Tangenting here, the THE docs talk about doing a LEAP
search forwards.  When it fails the computer beeps.  The
docs are pretty emphatic about the beep, saying that it
needs to be used by blind people.  But what about deaf
people?  Wouldn't a screen flash be more appropriate for
that case?  I also couldn't figure out why a search
failure causes the search to abort.  In EMACS when the
failure occurs I can backspace in case I made a typo
at the failure point.

				Andrew
				dalke at dalkescientific.com


