[Cryptography-dev] hash.SHA256 cpu expensive per byte versus byte-string?

Frank Siebenlist frank.siebenlist at gmail.com
Thu Jul 14 12:01:24 EDT 2016


> The perf by chunk is a consequence of how SHA256 works.


I politely disagree...

Having chunks smaller or larger than SHA256's 64-byte block size
doesn't seem to affect the timing results in any noticeable way.
If an update() call doesn't fill up SHA256's block-size buffer, it
simply returns, and there is only the overhead of the function call.
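
This point can be checked with a short hashlib sketch that times update()
across chunk sizes straddling the 64-byte block size; the chunk sizes and
blob size here are illustrative, not the ones measured in this thread:

```python
# Micro-benchmark sketch: is the cost per update() call or per 64-byte block?
import hashlib
import time

data = b"x" * (1 << 20)  # 1 MiB test blob (illustrative size)

def hash_in_chunks(blob, chunk_size):
    """Hash `blob` by feeding update() fixed-size chunks."""
    h = hashlib.sha256()
    for i in range(0, len(blob), chunk_size):
        h.update(blob[i:i + chunk_size])
    return h.hexdigest()

# Chunk sizes below, at, and above SHA-256's 64-byte block size.
for size in (32, 64, 128):
    start = time.perf_counter()
    hash_in_chunks(data, size)
    elapsed = time.perf_counter() - start
    print(f"chunk={size:>3}: {elapsed:.4f}s")
```

If the per-call overhead dominates, halving the number of calls should
roughly halve the time, with no jump at the 64-byte boundary.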

Unless I misunderstand the inner workings...

Regards, Frank.



On Thu, Jul 14, 2016 at 6:57 AM, lvh <_ at lvh.io> wrote:
> Hi Frank,
>
>> On Jul 14, 2016, at 12:23 AM, Frank Siebenlist <frank.siebenlist at gmail.com> wrote:
>>
>> Python's native hashing module (hashlib), shows similar results:
>> - about the same time when passed the 8MB blob in one go
>> (probably expected as both use openssl)
>> - substantial overhead when looping over small chunks (up to 100 times)
>> - except that it's about 6 times faster per single byte..
>
> The perf by chunk is a consequence of how SHA256 works. The higher perf for many calls is a consequence of extension modules vs cffi.
>
> lvh
>
>> n: 8388608
>> b:  1.958238124847412
>> b2:  1.0818939208984375
>> b8:  0.2987058162689209
>> b32:  0.10640311241149902
>> b64:  0.06242084503173828
>> b128:  0.04123806953430176
>> b256:  0.03258681297302246
>> 8388608  bs:  0.02389383316040039
>>
>> Guess hashlib uses some better optimization on the C calls (?).
>>
>> This is my last update on this observation.
>> The conclusion is "so be it": using bigger chunks for hashing gives
>> (much) better performance.
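
That conclusion can be illustrated with a minimal sketch (the blob contents
and chunk sizes are made up for illustration): chunking changes only the
speed, never the resulting digest:

```python
# Sketch: the SHA-256 digest is independent of how the input is chunked.
import hashlib

blob = bytes(range(256)) * 1000  # arbitrary test data

def sha256_chunked(data, chunk):
    h = hashlib.sha256()
    for i in range(0, len(data), chunk):
        h.update(data[i:i + chunk])
    return h.hexdigest()

one_shot = hashlib.sha256(blob).hexdigest()
assert sha256_chunked(blob, 1) == one_shot       # many calls: slow, same digest
assert sha256_chunked(blob, 65536) == one_shot   # few calls: fast, same digest
```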
>>
>> -Frank.
>>
>> On Tue, Jul 12, 2016 at 10:49 AM, Frank Siebenlist
>> <frank.siebenlist at gmail.com> wrote:
>>> After I sent my message yesterday evening, I was also wondering about
>>> that 512-bit (64-byte) block size of sha256, and whether it would add
>>> to the observed slowness.
>>> The following output shows time as a function of byte-chunk size
>>> (1, 2, 8, 32, 64, 128, 256 bytes):
>>>
>>> b:  12.111763954162598
>>> b2:  5.806451082229614
>>> b8:  1.4664850234985352
>>> b32:  0.37551307678222656
>>> b64:  0.20229697227478027
>>> b128:  0.11141395568847656
>>> b256:  0.06758689880371094
>>> 8388608  bs:  0.020879030227661133
>>>
>>> Time seems to go down linearly as the chunk size increases, and there
>>> is no perceived "speed boost" when we cross the 64-byte threshold.
>>> Time seems to be related only to the number of Python-to-C calls.
>>>
>>> And again, I can understand that the overhead is proportional to the
>>> number of Python-to-C calls, but it's the factor of 500 (2-3 orders
>>> of magnitude) that (unpleasantly) surprised me. It forces you to
>>> optimize the byte-string size passed to update() when you have many
>>> bytes to hash. For example, if you read from a file or socket, don't
>>> call update() one byte at a time as you read from the stream; fill up
>>> a (big) buffer first and pass that buffer.
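
A minimal sketch of the stream advice above; the 64 KiB buffer size is an
illustrative choice, not something prescribed in the thread:

```python
# Sketch: hash a stream with large buffered reads instead of per-byte updates.
import hashlib
import io

def sha256_of_stream(stream, bufsize=64 * 1024):
    h = hashlib.sha256()
    while True:
        chunk = stream.read(bufsize)
        if not chunk:  # EOF
            break
        h.update(chunk)
    return h.hexdigest()

payload = b"hello world" * 10000
digest = sha256_of_stream(io.BytesIO(payload))
assert digest == hashlib.sha256(payload).hexdigest()
```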
>>>
>>> -Frank.
>>>
>>> PS. I haven't looked at the sha256 C code, but I can imagine that
>>> when you pass update() one byte at a time, it fills up some 64-byte
>>> buffer, and once that buffer is full, it churns/hashes that block.
>>> Adding a byte to the buffer is all low-level fast code in C, while
>>> the churning would use significantly more CPU cycles... hard to
>>> fathom that you would see much slower performance when you pass a
>>> single byte at a time in C...
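
The buffering imagined in the PS can be modeled in a few lines of Python;
this class is a hypothetical stand-in for illustration, not OpenSSL's
actual implementation:

```python
# Toy model: update() stages bytes and only "compresses" full 64-byte blocks.
class BlockBufferModel:
    BLOCK = 64  # SHA-256 block size in bytes

    def __init__(self):
        self.buf = b""
        self.blocks_compressed = 0  # stand-in for compression-function calls

    def update(self, data):
        self.buf += data
        while len(self.buf) >= self.BLOCK:
            self.buf = self.buf[self.BLOCK:]
            self.blocks_compressed += 1  # the expensive churn happens here

m = BlockBufferModel()
for _ in range(130):
    m.update(b"x")  # one byte at a time
print(m.blocks_compressed, len(m.buf))  # 2 full blocks compressed, 2 bytes staged
```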
>>>
>>>
>>> On Tue, Jul 12, 2016 at 8:07 AM, lvh <_ at lvh.io> wrote:
>>>> Hi,
>>>>
>>>>> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist <frank.siebenlist at gmail.com> wrote:
>>>>
>>>> <snipsnip>
>>>>
>>>>> I understand that there may be a few more object-creations and casts involved in the looping, but 500 times slower… that was an unexpected surprise.
>>>>
>>>> As expected. You get both massively increased C-call overhead and the worst case, because you don't hit a full block until every 512/8 == 64 updates. Alas, openssl speed doesn't distinguish between the same message size delivered in different chunk sizes, but you can at least clearly see the performance multiplier for larger messages.
>>>>
>>>> lvh
>>>> _______________________________________________
>>>> Cryptography-dev mailing list
>>>> Cryptography-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/cryptography-dev
>
>
>

