[issue1610654] cgi.py multipart/form-data

Wed Oct 15 02:45:52 CEST 2014

Rishi added the comment:

I have recreated the patch (issue1610654_1.patch); it performs about the same as the earlier patch.

Serhiy,
I agree that we cannot use handmade buffering here without seeking ahead.
I believe, however, that we can optimize for streams that are buffered and non-seekable.
The cgi module's default file object is the BufferedReader underlying sys.stdin (sys.stdin.buffer), so the solution is fairly generic.

I have removed the handmade buffering, and the patch does not create a Buffered* object either.
We rely on the user to supply the buffered object. The sys.stdin that the cgi module uses already has a decent buffer underneath, which
works well on Apache.
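For illustration, a user who wants a larger buffer than the default could open stdin's file descriptor as a BufferedReader explicitly. This is a minimal sketch; the helper name buffered_stdin and the 1 MiB default are assumptions, not part of the patch or of the cgi module.

```python
import io
import sys


def buffered_stdin(fd=None, buffer_size=1 << 20) -> io.BufferedReader:
    """Open fd (stdin by default) as a binary BufferedReader with a larger buffer.

    Hypothetical helper for illustration only.
    """
    if fd is None:
        fd = sys.stdin.fileno()
    # Binary read mode with buffering > 1 yields an io.BufferedReader of that size.
    # closefd=False so closing the reader leaves the descriptor itself open.
    return open(fd, "rb", buffering=buffer_size, closefd=False)
```

The returned reader could then be passed as the fp argument of cgi.FieldStorage. It keeps its own buffer, independent of sys.stdin.buffer, so a script should read the request body through only one of the two.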

The attached patch does not seek, nor does it read ahead; it only looks ahead.
As Antoine suggested, it peeks at the buffer and uses a fast lookup to determine whether the buffer contains a boundary.
It moves forward only when it is certain that the entire peeked buffer lies before the next boundary.
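A minimal sketch of the peek-based idea, assuming the stream provides peek() and read() like io.BufferedReader. This is not the code from issue1610654_1.patch; the function name copy_until and the choice to consume the delimiter itself are assumptions made to keep the example short. Because a boundary can straddle the end of the peeked buffer, the sketch holds back the last len(delim) - 1 bytes in memory. It never consumes anything past the end of the delimiter, so no seeking is needed.

```python
import io


def copy_until(stream, delim, out):
    """Copy bytes from a peekable stream to out until delim is seen.

    Hypothetical helper. The delimiter is consumed but not written.
    Returns True if the delimiter was found, False if EOF came first.
    Nothing past the end of the delimiter is consumed from the stream.
    """
    if not delim:
        raise ValueError("delim must be non-empty")
    keep = len(delim) - 1  # longest delimiter prefix that can sit at a chunk edge
    tail = b""  # consumed bytes held back in case they start the delimiter
    while True:
        chunk = stream.peek(1)  # currently buffered bytes; does not advance
        if not chunk:  # EOF: whatever was held back is plain data
            out.write(tail)
            return False
        window = tail + chunk
        idx = window.find(delim)  # one bytes.find over the whole chunk
        if idx != -1:
            # Consume only up to the end of the delimiter; the rest stays buffered.
            stream.read(idx + len(delim) - len(tail))
            out.write(window[:idx])
            return True
        # No delimiter here: all but the last `keep` bytes are safe to emit.
        safe = max(len(window) - keep, 0)
        out.write(window[:safe])
        stream.read(len(chunk))  # consume exactly what was peeked
        tail = window[safe:]


if __name__ == "__main__":
    # A tiny buffer forces the delimiter to straddle two peeks.
    src = io.BufferedReader(io.BytesIO(b"hello world--BOUNDARYrest"), buffer_size=8)
    dst = io.BytesIO()
    print(copy_until(src, b"--BOUNDARY", dst), dst.getvalue(), src.read())
    # prints: True b'hello world' b'rest'
```

The per-chunk cost is one peek, one search and one read, regardless of how many newlines the chunk contains.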

The issue is that the current implementation works with lines rather than chunks.
In my observation, even when a savvy user wraps sys.stdin in a large BufferedReader, the current implementation shows little to no
performance improvement for large files. It does not fix the bug reported in this issue either.
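For contrast, a simplified, hypothetical line-oriented loop is sketched below. It is not the cgi module's actual code (which also handles the CRLF that precedes a boundary); it only illustrates that line-based work grows with the number of lines rather than the number of bytes, since every line costs one readline() call and one comparison in Python.

```python
import io


def copy_until_boundary_line(stream, boundary, out, max_line=1 << 16):
    """Copy lines to out until a line equal to boundary (line ending ignored).

    Hypothetical helper. Returns (found, iterations); iterations counts
    readline() calls so the per-line cost is visible.
    """
    iterations = 0
    while True:
        line = stream.readline(max_line)  # at most max_line bytes per call
        iterations += 1
        if not line:  # EOF before the boundary
            return False, iterations
        if line.rstrip(b"\r\n") == boundary:
            return True, iterations
        out.write(line)


if __name__ == "__main__":
    src = io.BytesIO(b"x\n" * 1000 + b"--B\nrest")
    dst = io.BytesIO()
    print(copy_until_boundary_line(src, b"--B", dst))  # prints: (True, 1001)
```

Here 2 KB of data already needs 1001 loop iterations, whereas a chunk-based scan over the same bytes would typically need only a handful.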
In extreme cases like Chui's, the difference is 53 s versus 0.7 s, and even in less extreme cases with larger files the patch
is 3 times faster than the current implementation.
I have tested this on Apache2 server where the sys.stdin is buffered.

----------
Added file: http://bugs.python.org/file36927/issue1610654_1.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1610654>
_______________________________________