'r' vs 'rb' in csv (was Re: Python SHA-1 as a method for unique file identification ? [help!])

Andrew McNamara andrewm at object-craft.com.au
Mon Jun 26 21:00:36 EDT 2006


>> On a semi-related note, I have a database on Linux that imports from a
>> Macintosh CSV file.  The 'csv' module says to always open files in
>> binary mode, but this didn't work in my case: I had to open it as 'rU'
>> (text with universal newlines) or 'csv' misparsed it.  I'd like the
>> program to be portable to Windows and Mac.  Is there a way around this?
>>  Will I really burn in hell for using 'rU'?
>
>Yes, you will burn in hell for using any old kludge that gets results 
>(by accident) instead of reading the manual to find a principled solution:
>
>"""
>lineterminator
>The string used to terminate lines in the CSV file. It defaults to '\r\n'.
>"""
>
>In the case of a Mac CSV file, '\r' is probably required.

Unfortunately, the documentation is misleading in this case:
"lineterminator" is only used for output. The parser ignores it when
reading.
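
A minimal sketch of that asymmetry, assuming a current Python 3 csv module
(the sample rows and variable names are made up for illustration). The
writer honours "lineterminator", while the reader parses the same data
correctly even when handed a deliberately wrong value:

```python
import csv
import io

rows = [["a", "b"], ["c", "d"]]

# Writing: lineterminator is honoured, so each record ends in a bare '\r'.
buf = io.StringIO(newline="")
csv.writer(buf, lineterminator="\r").writerows(rows)
written = buf.getvalue()

# Reading: pass a deliberately wrong lineterminator. The reader ignores it
# and recognises the '\r' line endings on its own.
lines = written.splitlines(keepends=True)
parsed = list(csv.reader(lines, lineterminator="\n"))

print(repr(written))
print(parsed)
```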

The documentation specifies that the file should be opened in binary mode,
because the CSV parser has its own idea of "universal newlines". The
complicating factor is that newlines can appear quoted inside a field:
using universal newlines, these "quoted newlines" would be damaged
(because the file layer's newline translation is unaware of the quoting
conventions).
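
To illustrate the damage, here is a rough sketch assuming Python 3, where
opening with newline="" plays the role that binary mode plays in the advice
above, and newline=None is the universal-newline translation. The sample
bytes and the parse() helper are hypothetical:

```python
import csv
import io

# One header row and one record whose second field contains a quoted '\r\n'.
raw = b'id,note\r\n1,"first\r\nsecond"\r\n'

def parse(newline):
    """Parse `raw` through a text layer using the given newline mode."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return list(csv.reader(text))

untranslated = parse("")    # line endings reach the csv parser untouched
translated = parse(None)    # universal newlines: '\r\n' becomes '\n' first

print(untranslated[1])
print(translated[1])
```

Both modes find the same records, but only the untranslated one returns the
field with its original '\r\n' intact.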

If your data file contains no quoted newlines (they're rare, but if you
need them, you need them), then opening the file in "universal newline"
mode should be harmless (and in this case, is the right thing to do).
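
For the harmless case, a small sketch under the same Python 3 assumptions
(the old-Mac-style sample data and the parse() helper are invented for
illustration): with '\r' line endings and no quoted newlines, both modes
give the same rows.

```python
import csv
import io

# Old-Mac-style data: records end in a bare '\r', no quoted newlines.
raw = b"id,name\r1,alice\r2,bob\r"

def parse(newline):
    """Parse `raw` through a text layer using the given newline mode."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return list(csv.reader(text))

universal = parse(None)   # universal newlines, the Python 3 analogue of 'rU'
untranslated = parse("")  # no translation, what the csv docs recommend

print(universal)
print(universal == untranslated)
```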

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


