[Csv] skipfinalspace

Tue Oct 21 09:21:50 CEST 2008

On Mon, Oct 20, 2008 at 00:48, John Machin <sjmachin at lexicon.net> wrote:

> Tom Brown wrote:
>
>> (Continuing thread started at
>> http://mail.python.org/pipermail/csv/2008-October/000688.html)
>>
>> On Sun, Oct 19, 2008 at 16:46, Andrew McNamara
>> <andrewm at object-craft.com.au> wrote:
>>
>>     >I downloaded the 2.6 source tar ball, but is it too late for new
>>     >features to get into versions <3?
>>
>>    Yep.
>>
>>     >How would you feel about adding the following tests to
>>     >Lib/test/test_csv.py and getting them to pass?
>>     >
>>     >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>>     >"skipinitialspace: When True, whitespace immediately following the
>>     >delimiter is ignored."
>>     >but my tests show whitespace at the start of any field is ignored,
>>     >including the first field.
>>
>>    I suspect (but I haven't checked) that it means "after the delimiter
>>    and before any quoted field" (or some variation on that).
>>
>> I agree that whitespace after the delimiter and before any quoted field is
>> skipped. Also whitespace after the start of the line and before any quoted
>> field is skipped.
>>
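For reference, a minimal Python 3 sketch of the behaviour described above. The sample line is made up for illustration; it only exercises the existing skipinitialspace parameter of csv.reader.

```python
import csv

# Hypothetical sample line: leading spaces before the first field, after
# each delimiter, and before a quoted field; trailing space on the last field.
sample = [' a, b, "x y",c \n']

# Default dialect: leading spaces are kept as part of each field.
plain = next(csv.reader(sample))

# skipinitialspace=True: spaces at the start of every field are dropped,
# including the first field, but trailing spaces are left alone.
skipped = next(csv.reader(sample, skipinitialspace=True))

print(plain)
print(skipped)
```

Note that the trailing space on the last field survives in both cases, which is the gap a skipfinalspace option would cover.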
>>
>>    All of the "dialect" parameters are there to allow parsing of a specific
>>    common form of CSV file. Because there is no formal definition of the
>>    format, the module simply aims to parse (and produce the same result)
>>    as common applications such as Excel and Access. Changing the behaviour
>>    in any non-backwards compatible way is sure to get screams of anguish
>>    from many users. Even when the behaviour appears to be a bug, you can
>>    be sure people are counting on it working like that.
>>
>>
>> skipinitialspace defaults to false and by the same logic skipfinalspace
>> should default to false to preserve compatibility with the csv module in
>> 2.6. On the other hand, the switch to version 3 is as good a time as any to
>> break backwards compatibility to adopt something that works better for new
>> users.
>>
>
> Read Andrew's lips: They don't want "better", they want "the same as MS".

okay.

>
>
>> Based on my experience parsing several hundred csv generated by many
>> different people I think it would be nice to at least have a dialect that is
>> excel + skipinitialspace=True + skipfinalspace=True.
>>
>
> Based on my experience extracting data from innumerable csv files (and
> infinite varieties thereof),

Wow, that is a _lot_ of files :-P

> spreadsheet files, and database tables, in 99.99% of cases one should
> automatically apply the following transformations to each text field:
>   * strip leading whitespace
>   * strip trailing whitespace
>   * replace embedded runs of whitespace by a single space
> and one needs to ensure that the definition of whitespace includes the
> no-break space (NBSP) character.
>
> As this "space normalisation" is needed for all input sources, the csv
> module is IMHO the wrong place to put it. A string method would be a better
> idea.

I agree that strip() and something like re.sub(r"\s+", " ", field) are handy. If
99.99% of csv readers should be applying these fixes to every field,
perhaps there should be an easy-to-enable option to apply them. Why force almost
everyone to discover that they need the transformations and wrap a line of code
around the csv reader?
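
To make the "line of code around the csv reader" concrete, here is a rough sketch of such a wrapper. The names normalise_space and normalised_reader are hypothetical, not part of the csv module, and the sketch assumes Python 3, where \s in a str pattern also matches the no-break space.

```python
import csv
import re

# Runs of whitespace; for str patterns in Python 3 this includes NBSP (U+00A0).
_WS_RUN = re.compile(r"\s+")


def normalise_space(text):
    """Collapse runs of whitespace to one space, then strip both ends."""
    return _WS_RUN.sub(" ", text).strip()


def normalised_reader(lines, **kwargs):
    """Hypothetical wrapper: yield rows from csv.reader with every field
    space-normalised. Keyword arguments are passed through to csv.reader."""
    for row in csv.reader(lines, **kwargs):
        yield [normalise_space(field) for field in row]


if __name__ == "__main__":
    for row in normalised_reader(["  a\u00a0 b ,c   d ,e\n"]):
        print(row)
```

This keeps the normalisation outside the parser, as John suggests, while still costing each user only one extra call.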