for / while else doesn't make sense

Marko Rauhamaa marko at pacujo.net
Thu May 26 05:05:08 EDT 2016


Rustom Mody <rustompmody at gmail.com>:

> On Wednesday, May 25, 2016 at 4:18:02 PM UTC+5:30, Marko Rauhamaa wrote:
>> Christopher Reimer:
>> 
>> > Back in the early 1980's, I grew up on 8-bit processors and latin-1 was
>> > all we had for ASCII.
>> 
>> You really were very advanced. According to <URL:
>> https://en.wikipedia.org/wiki/ISO/IEC_8859-1#History>, ISO 8859-1 was
>> standardized in 1985. "Eight-bit-cleanness" became a thing in the early
>> 1990's.
>
> [...]
>
> Thanks to this (sub)thread I've added a new section: "Lemma: 7=8"
> here http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

A related anecdote from maybe 1990: I worked in a project team. We had
designed a data encoding format that made use of 8-bit character strings
(SunOS 4, Sparc, C). One morning a coworker stated that the standard
library's strcmp() seemed to be buggy. He quickly solved the problem by
writing his own strcmp().

I found it surprising that a function as elementary as strcmp() could go
wrong, so I took a look at its disassembly. It turned out Sun engineers
had heavily optimized the function. In particular, if both strings were
32-bit-aligned, the loop was carried out using clever 32-bit integer
operations.

Only they had made a mistake. Their algorithm checked these bits of an
integer result:

   31            24              16               8               0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ^             ^               ^               ^

While they *should* have checked these positions:

   31            24              16               8               0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  ^               ^               ^               ^

As a result of their bug, every fourth position of the string had its
high-order bit ignored by strcmp(). In particular, '\200' was treated as
an end-of-string marker.
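The post doesn't show Sun's code, so the following C sketch is a hypothetical reconstruction. It assumes the well-known "magic bits" zero-byte test found in some older C library string routines (constant 0x7efefeff), whose hole bits are exactly the positions in the first diagram: 31, 24, 16 and 8. Adding the constant makes every nonzero byte carry into the hole just above it, so a hole that stays unchanged signals a zero byte. For the top byte, however, the hole sits at bit 31 and only sees the carry out of bits 24 to 30, so a top byte of 0x80 looks the same as 0x00. On big-endian SPARC the top byte of an aligned word is the first of its four characters. The names `MAGIC_BITS`, `HOLE_MASK`, `load_be32` and `word_has_zero_buggy` are illustrative.

```c
#include <stdint.h>

/* Hypothetical reconstruction: the constants are an assumption, not Sun's code. */
#define MAGIC_BITS 0x7efefeffu  /* 1-bits let carries ripple upward */
#define HOLE_MASK  0x81010100u  /* ~MAGIC_BITS: the holes at bits 31, 24, 16, 8 */

/* Load four characters the way a big-endian CPU such as SPARC would:
   the first character lands in the most significant byte. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Nonzero if the word seems to contain a zero byte. A hole bit that is
   unchanged after the addition received no carry, so the byte below it
   was zero. Flaw: bit 31 only sees the carry out of bits 24..30, so a
   top byte of 0x80 ('\200') is reported as a terminator. */
int word_has_zero_buggy(uint32_t w)
{
    uint32_t sum = w + MAGIC_BITS;          /* wraps modulo 2^32 */
    return ((sum ^ ~w) & HOLE_MASK) != 0;   /* 1 where a hole did not change */
}
```

With these definitions, the word loaded from "\200ABC" is flagged as containing a terminator, while "A\200BC" and "ABCD" are not. That is the every-fourth-position pattern described above.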

The fix was obvious: check bit 32. However, 32-bit integers don't have a
bit 32, which explains the oversight. Luckily, the 33rd bit was readily
available in the CPU's carry flag, so the optimization could be salvaged
easily.
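Portable C has no access to the carry flag, but the 33rd bit can be emulated by doing the addition in 64 bits. This sketch assumes the fix also raises the constant's top byte from 0x7e to 0xfe (giving 0xfefefeff), so that a nonzero top byte carries all the way out into bit 32; the holes are then bits 32, 24, 16 and 8, as in the second diagram. The names are illustrative, not Sun's.

```c
#include <stdint.h>

/* Hypothetical fix: constants assumed for illustration. */
#define MAGIC_BITS_FIXED 0xfefefeffu      /* top byte is now 0xfe */
#define HOLE_MASK_FIXED  0x101010100ull   /* holes at bits 32, 24, 16 and 8 */

/* Nonzero if and only if the word contains a zero byte. The 64-bit sum
   keeps the carry out of bit 31 in bit 32, standing in for the CPU's
   carry flag. */
int word_has_zero_fixed(uint32_t w)
{
    uint64_t wide = w;
    uint64_t sum = wide + MAGIC_BITS_FIXED;          /* bit 32 = carry out */
    return ((sum ^ ~wide) & HOLE_MASK_FIXED) != 0;   /* unchanged holes */
}
```

If every byte is nonzero, each one produces a carry that fills the hole above it; if some byte is zero, the lowest such byte receives a carry in but produces none, leaving its hole unchanged. So this version has neither false positives nor false negatives, and "\200ABC" (0x80414243) is no longer mistaken for a terminator.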

I sent a complimentary report to Sun Microsystems' customer service. I
got an email back stating we were out of support and they wouldn't be
talking to us. I thought, ok, their loss, and we went happily forward
with our naïve, two-line strcmp() replacement.
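The coworker's replacement isn't shown; a byte-at-a-time version along these lines fits the description, and the name `naive_strcmp` is hypothetical. Comparing through unsigned char matters for 8-bit data: the C standard defines strcmp()'s result by the first differing pair of characters interpreted as unsigned char, so '\200' sorts after plain ASCII and only '\0' ends a string.

```c
/* Hypothetical byte-at-a-time strcmp(): walk both strings until they
   differ or the first one ends, then compare the bytes as unsigned char. */
int naive_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *p == *q) { p++; q++; }
    return *p - *q;   /* promoted to int, so the sign is meaningful */
}
```

It is much slower than a word-at-a-time loop on long strings, but it has no alignment cases and no hole bits to get wrong.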

Some three months later, the same customer service rep sent another
email confirming the finding and thanking us for reporting it.


Marko



More information about the Python-list mailing list