[python-ldap] Performance improvement for schema.tokenizer.split_tokens

Michael Ströder michael at stroeder.com
Sat Feb 18 10:26:19 EST 2017


Christian Heimes wrote:
> I see no problem with the IPR notice. I added a note to the doc string
> of fast_split_tokens.py.

Please repeat it here for the mailing list archive, because I don't know what will happen
to your GitHub repo in the future (python-ldap is much older than GitHub). Even if the
mailing list archive on python.org is ever abandoned, I still have a local copy.

> In the meantime a user complained that my tokenizer silently ignores
> invalid schema lines.

Please provide the additional test-cases here beforehand, so they can be added to
Tests/t_ldap_schema_tokenizer.py.
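
To make that concrete, here is a rough sketch of the kind of test I have in mind. That
split_tokens() should raise ValueError on such input is my assumption about the desired
behaviour, not what the released tokenizer does today, and the one-argument call may need
adjusting to the actual signature:

# Rough sketch only: hypothetical invalid-input cases for
# Tests/t_ldap_schema_tokenizer.py.
import unittest

from ldap.schema.tokenizer import split_tokens

TESTCASES_INVALID = (
    "( 1.2.3.4 NAME 'foo' DESC 'unterminated",  # unterminated quoted string
    "( 1.2.3.4 NAME 'foo' ) trailing residue",  # residue after the closing paren
)

class TestInvalidSchemaLines(unittest.TestCase):

    def test_invalid_lines_rejected(self):
        for schema_line in TESTCASES_INVALID:
            with self.assertRaises(ValueError):
                split_tokens(schema_line)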

> The last version does some rudimentary checks and
> refuses additional residue. It also handles one of the two special
> Oracle cases from your test file.

I see you're coming close to my own (unpublished) attempts at working around some
schema bugs. But is it still that much faster, then? ;-)

For the record, the ABNF is here:

https://tools.ietf.org/html/rfc4512#section-4.1
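
Just to illustrate the general shape of that grammar (bare tokens, parentheses,
single-quoted strings), a naive regex-based sketch could look roughly like the following.
It is not the full RFC 4512 ABNF (no \27 / \5C escape handling, no error checking) and
not how split_tokens() is actually implemented:

import re

# Naive illustration only: whitespace-separated bare words, parentheses as
# single tokens, single-quoted strings with the surrounding quotes stripped.
SCHEMA_TOKEN_RE = re.compile(r"\(|\)|'[^']*'|[^\s()']+")

def naive_split_tokens(schema_desc):
    tokens = []
    for tok in SCHEMA_TOKEN_RE.findall(schema_desc):
        if tok.startswith("'"):
            tok = tok[1:-1]  # strip the surrounding single quotes
        tokens.append(tok)
    return tokens

# naive_split_tokens("( 2.5.4.3 NAME 'cn' SUP name )")
# -> ['(', '2.5.4.3', 'NAME', 'cn', 'SUP', 'name', ')']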

I'm inclined to favor speed over supporting these ancient and seriously broken Oracle
test-cases (see also https://tools.ietf.org/html/draft-thomson-postel-was-wrong).

And while you're at it:

TESTCASES_ESCAPED_QUOTES = (
    # raw string, so the backslash-escaped quote actually reaches the tokenizer
    # (in a plain string literal "\'" collapses to a bare quote)
    (r"BLUBBER DI 'BLU\'BB ER' DA 'BLAH' ", ["BLUBBER", "DI", "BLU'BB ER", "DA", "BLAH"]),
)
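
Continuing with the tuple above, the corresponding test method could look roughly like
this; that split_tokens() unescapes \' is of course exactly what the case is meant to
check, and the one-argument call is again an assumption about the signature:

import unittest

from ldap.schema.tokenizer import split_tokens

class TestEscapedQuotes(unittest.TestCase):

    def test_escaped_quotes(self):
        # Uses TESTCASES_ESCAPED_QUOTES as defined above.
        for schema_line, expected_tokens in TESTCASES_ESCAPED_QUOTES:
            self.assertEqual(split_tokens(schema_line), expected_tokens)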

Ciao, Michael.


