how to avoid leading white spaces

Sat Jun 4 15:44:56 EDT 2011

On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:

>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries
> 
> But is there any need for the Boyer-Moore algorithm to
> operate on characters?
> 
> Seems to me you could just as well chop the UTF-16 up
> into bytes and apply Boyer-Moore to them, and it would
> work about as well.

No, because that won't care about alignment. E.g. on a big-endian
architecture, if you search for '\u2345' in the string '\u0123\u4567', it
will find a match (at an offset of 1 byte).