[Ironpython-users] Hashing a directory is magnitudes slower than in cPython

Curt Hagenlocher curt at hagenlocher.org
Tue Feb 25 17:58:52 CET 2014


"Basically, there's a mismatch between what .NET provides and what Python
needs for perfect compatibility."

Yes. I think I remember implementing this and that's exactly the problem I
ran into. I think we looked into incorporating a modified version of the
BCL code directly into IronPython, but at least in those days, that was a
pretty hard thing to get done. We ran into a similar issue when
implementing the compression API.

You could get around the problem in the client code with an "if
sys.platform == 'cli'" and then use the .NET classes directly.
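
A minimal sketch of that workaround, assuming the goal is a single MD5 over the contents of several files. The function name md5_of_files and the chunk_size parameter are made up for illustration; the 'cli' branch uses the standard System.Security.Cryptography calls (MD5.Create, TransformBlock, TransformFinalBlock) and has not been measured against the IronPython hashlib implementation:

```python
import hashlib
import sys


def md5_of_files(paths, chunk_size=65536):
    """Return the hex MD5 digest of the concatenated contents of `paths`."""
    if sys.platform == 'cli':
        # IronPython: feed the .NET hasher incrementally instead of hashlib.
        from System import Array, BitConverter, Byte
        from System.IO import File
        from System.Security.Cryptography import MD5

        hasher = MD5.Create()
        buf = Array.CreateInstance(Byte, chunk_size)
        for path in paths:
            stream = File.OpenRead(path)
            try:
                while True:
                    count = stream.Read(buf, 0, chunk_size)
                    if count == 0:
                        break
                    # Hash only the bytes just read; no output copy is needed.
                    hasher.TransformBlock(buf, 0, count, None, 0)
            finally:
                stream.Close()
        # Finalize with an empty block; no more data can be added afterwards.
        hasher.TransformFinalBlock(buf, 0, 0)
        return BitConverter.ToString(hasher.Hash).Replace('-', '').lower()

    # CPython and other implementations: hashlib is already incremental.
    digest = hashlib.md5()
    for path in paths:
        with open(path, 'rb') as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()
```

This only covers the "feed data, finalize once" pattern, so it avoids the copy() and repeated digest() cases that make the general fix hard.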



On Tue, Feb 25, 2014 at 8:53 AM, Jeff Hardy <jdhardy at gmail.com> wrote:

> On Tue, Feb 25, 2014 at 12:38 PM, Markus Schaber <m.schaber at codesys.com>
> wrote:
> > Hi,
> >
> > A coworker just consulted me on a performance problem of IronPython vs.
> > CPython.
> >
> > ... snip ...
> >
> > On a closer look, there's the additional (and IMHO much worse) problem
> > that the update() method seems not to work incrementally:
> >
> > private void update(IList<byte> newBytes) {
> >     byte[] updatedBytes = new byte[_bytes.Length + newBytes.Count];
> >     Array.Copy(_bytes, updatedBytes, _bytes.Length);
> >     newBytes.CopyTo(updatedBytes, _bytes.Length);
> >     _bytes = updatedBytes;
> >     _hash = GetHasher().ComputeHash(_bytes);
> > }
> >
> > In our use case, this means that every file which is read leads to a
> > reallocation, a copy, and a recalculation of the MD5 sum of all the data
> > read so far. This is suboptimal from both a memory and a performance
> > perspective.
> >
> > I'm not an expert on the .NET crypto APIs, but I guess there should be
> > some incremental API available there which could be exploited.
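
To put a rough number on the cost described above, here is a small sketch that only counts bytes fed to the hash function. The helper name bytes_hashed and the example sizes are made up for illustration; nothing here was measured on IronPython:

```python
def bytes_hashed(chunk_sizes, incremental):
    """Count the bytes fed to the hash function over a series of update() calls."""
    total = 0
    buffered = 0
    for size in chunk_sizes:
        buffered += size
        # Incremental: hash only the new chunk.
        # Re-hash-everything: hash all data accumulated so far.
        total += size if incremental else buffered
    return total


MIB = 1024 * 1024
sizes = [MIB] * 1000  # e.g. 1000 files of 1 MiB each

rehash_mib = bytes_hashed(sizes, incremental=False) // MIB  # 500500
linear_mib = bytes_hashed(sizes, incremental=True) // MIB   # 1000
```

For n equal-sized updates the re-hashing variant processes n(n+1)/2 chunks instead of n, which is the quadratic growth in question.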
>
> http://ironpython.codeplex.com/workitem/34022
>
> I've also CC'd Emmanuel Chomarat, who was investigating a fix for
> this. Unfortunately I don't think there's an easy solution based on
> how the .NET APIs are constructed. Quoting from Emmanuel's email to me
> a while back:
>
> "I am now using TransformBlock / TransformFinalBlock to compute the
> current hash with linear complexity (whereas before it was n**2),
> but I am still facing an issue.
> First, we need a copy operation. This is not possible because we
> cannot share the hash instance between two objects in .NET; the only
> way to make it consistent with what Python does is to keep a
> copy of the full data in memory in order to create a new instance
> from that data when copy is called.
> Second, digest can be called several times in Python, with new data
> added to the hash in between. In C#, as soon as TransformFinalBlock
> has been called once, we cannot add any more data to the stream.
> Once again I have been able to use the same technique as before,
> but at a memory cost plus a computation cost if we call
> digest/hexdigest several times.
>
> I don't see any way to escape this problem with the MS API, which
> does not expose internal state the way the underlying md5 lib in
> Python does."
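
For reference, these are the two hashlib behaviours in question, shown as a short sketch in plain Python (the byte strings are arbitrary sample data):

```python
import hashlib

h = hashlib.md5()
h.update(b"first file")

snapshot = h.copy()        # independent clone of the running hash state
early = h.hexdigest()      # digest so far; the object is not finalized

h.update(b"second file")   # more data can still be added after a digest
late = h.hexdigest()

# The clone is unaffected by updates made to the original after the copy.
assert snapshot.hexdigest() == early
assert late == hashlib.md5(b"first filesecond file").hexdigest()
```

Both copy() and update-after-digest rely on access to the running internal state, which is what the quoted email says the .NET API does not expose.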
>
> Basically, there's a mismatch between what .NET provides and what
> Python needs for perfect compatibility. Keeping all data in memory is
> not desirable, but neither is failing some operations. And I would
> *really* prefer not to have to reimplement all of the cryptographic
> hash functions Python has.
>
> One option is to default to not buffering and failing on certain
> operations, and offer a constructor flag that enables buffering to
> allow the otherwise-impossible operations. Not my favourite idea, but
> workable.
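
A rough sketch of how that option could behave, written in plain Python so it can be tried anywhere. The class name OneShotMd5 and the buffered flag are hypothetical, and hashlib.md5 merely stands in for a backend that can only be finalized once; this is not the IronPython implementation:

```python
import hashlib


class OneShotMd5(object):
    """Hypothetical model of a hash whose backend can be finalized only once."""

    def __init__(self, data=b'', buffered=False):
        self._backend = hashlib.md5()  # stand-in for a finalize-once hasher
        self._buffer = bytearray() if buffered else None
        self._finalized = None
        self.update(data)

    def update(self, data):
        if self._finalized is not None:
            if self._buffer is None:
                raise ValueError('update() after digest() requires buffered=True')
            # Rebuild a fresh backend from the retained data (costs memory and time).
            self._backend = hashlib.md5(bytes(self._buffer))
            self._finalized = None
        if self._buffer is not None:
            self._buffer.extend(data)
        self._backend.update(data)

    def digest(self):
        # Finalize at most once per backend instance and cache the result.
        if self._finalized is None:
            self._finalized = self._backend.digest()
        return self._finalized

    def copy(self):
        if self._buffer is None:
            raise ValueError('copy() requires buffered=True')
        return OneShotMd5(bytes(self._buffer), buffered=True)
```

With the default buffered=False, update() is linear and uses constant memory, but copy() and update() after digest() raise; buffered=True allows both at the cost of keeping all data in memory.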
>
> - Jeff