[issue30034] csv reader chokes on bad quoting in large files

Keith Erskine report at bugs.python.org
Mon Apr 10 21:11:59 EDT 2017


Keith Erskine added the comment:

As you say, David, however much we would like the world to stick to a given CSV standard, the reality is that people don't, which is all the more reason for making the csv reader flexible and forgiving.

The csv module can and should be used for more than just "comma-separated-values" files.  I use it for all sorts of different delimited files, and it works very well.  Pandas uses it, as I'm sure do many other packages.  It's such a good module, it would be a pity to restrict its scope to just Excel-related scenarios.  Parsing delimited files is undoubtedly complex, and painfully slow if done with pure Python, so the more that can be done in C the better.
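To illustrate the kind of non-comma usage I mean, here is a minimal sketch using only the documented `delimiter` and `quotechar` parameters of `csv.reader`. The sample data is made up for the example.

```python
import csv
import io

# Pipe-delimited data with a single-quoted field containing the delimiter.
pipe_data = "id|name|note\n1|Ada|'likes | pipes'\n"
pipe_rows = list(csv.reader(io.StringIO(pipe_data), delimiter="|", quotechar="'"))

# Tab-delimited data, read with the same reader and a different delimiter.
tab_data = "x\ty\n1\t2\n"
tab_rows = list(csv.reader(io.StringIO(tab_data), delimiter="\t"))

print(pipe_rows)
print(tab_rows)
```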

I'm no C programmer, but my guesstimate is that the coding changes I'm proposing are relatively modest.  In the IN_QUOTED_FIELD section (https://github.com/python/cpython/blob/master/Modules/_csv.c#L690), it would mean checking for newline characters, and ending the record there, if the new "multiline" attribute is False (and probably when "strict" is False too).  Of course there is more to this change than just that, but I'm guessing not that much more.
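A rough sketch of the behaviour in question, in pure Python.  The "multiline" attribute is only a proposal and does not exist in the csv module; the helper `read_single_line_records` below is a hypothetical stand-in that approximates "multiline=False" by parsing each physical line on its own, which is exactly the slow approach I would rather see done in C.

```python
import csv
import io

# The second line has a stray opening quote that is never closed.
data = 'a|b|c\n1|"oops|3\n4|5|6\n7|8|9\n'

# Current behaviour: the unclosed quote swallows every following line
# into one field, up to the end of the file.
default_rows = list(csv.reader(io.StringIO(data), delimiter="|"))


def read_single_line_records(lines, **fmtparams):
    """Hypothetical helper: parse each physical line as its own record,
    so a bad quote can damage at most one row."""
    for line in lines:
        # Strip the line terminator so it does not end up inside a field.
        stripped = line.rstrip("\r\n")
        # A one-element list is a valid iterable of lines for csv.reader.
        for row in csv.reader([stripped], **fmtparams):
            yield row


single_line_rows = list(read_single_line_records(io.StringIO(data), delimiter="|"))

print(default_rows)
print(single_line_rows)
```

With the default reader, everything after the stray quote lands in a single field; with the per-line helper, only the damaged row is affected and the rest of the file parses normally.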

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30034>
_______________________________________

