School Management System in Python

Tim Chase python.list at tim.thechases.com
Wed Jul 5 20:24:30 EDT 2017


On 2017-07-06 11:47, Gregory Ewing wrote:
> The only reason I can think of to want to use tsv instead
> of csv is that you can sometimes get away without having
> to quote things that would need quoting in csv. But that's
> not an issue in Python, since the csv module takes care of
> all of that for you.

I work with thousands of CSV/TSV data files from dozens-to-hundreds
of sources (clients and service providers) and have never encountered
a 0x09-as-data needing to be escaped.  So my big reason for
preferring TSV is that when people say "TSV", I can work with it
without a second thought.

On the other hand, with "CSV", sometimes it's comma-delimited as it
says on the tin.  But sometimes it's pipe- or semicolon-delimited
while still carrying the ".csv" extension.  And sometimes only a
subset of values is quoted. Sometimes all the values are quoted.
Sometimes numeric values are quoted to distinguish between
numeric-looking-string and numeric-value.  Sometimes escaping is done
with backslashes before the quote-as-value character. Sometimes
escaping is done with doubling-up the quoting-character.  Sometimes
CR(0x0D) and/or NL(0x0A) characters are allowed within quoted values;
sometimes they're invalid.  Usually fields are quoted with
double-quotes; but sometimes they're single-quoted values.  Or
sometimes they're either, depending on the data (much like Python's
REPL prints string representations).
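
To make two of those variations concrete, here is a minimal sketch
(the sample data and the read_rows helper are made up for
illustration) in which the same three values arrive once with
doubled-up quotes and once with backslash escaping. The csv module
reads both, but only after being told which convention applies:

```python
import csv
import io

# Two hypothetical one-record files carrying the same values; they
# differ only in how an embedded double-quote is escaped.
doubled = 'id,quote,note\n1,"She said ""hi""",ok\n'
backslashed = 'id,quote,note\n1,"She said \\"hi\\"",ok\n'

def read_rows(text, **fmt):
    # newline="" leaves line endings for the csv module to interpret
    return list(csv.reader(io.StringIO(text, newline=""), **fmt))

# Default "excel" dialect: a doubled quote inside quotes is a literal quote.
rows_doubled = read_rows(doubled)

# Backslash convention: has to be requested explicitly.
rows_backslashed = read_rows(backslashed, doublequote=False, escapechar="\\")

assert rows_doubled == rows_backslashed
```

Neither file announces which convention it uses, so the caller has
to know (or guess) before parsing.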

And while, yes, Python's csv module handles most of these with no
issues thanks to the "dialects" concept, I still have to determine
the dialect—sometimes by sniffing, sometimes by customer/vendor
specification—but it's not nearly as trivial as

  import csv

  # Python 3: open in text mode with newline="" (Python 2 used "rb")
  with open("file.txt", newline="") as fp:
    for row in csv.DictReader(fp, delimiter='\t'):
      process(row)

because there's the intermediate muddling of dialect determination or
specification.
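
For comparison, the sniffing route might look roughly like the
following. This is only a sketch: read_with_sniffed_dialect, the
candidate delimiter list, and the sample data are assumptions made
for illustration. csv.Sniffer is a heuristic and raises csv.Error
when it cannot settle on a delimiter, so real inputs usually need a
fallback to a vendor-specified dialect.

```python
import csv
import io

def read_with_sniffed_dialect(text, candidates=",;|\t"):
    # Guess the dialect from the start of the data, limited to the
    # delimiters we expect to see; raises csv.Error if undecidable.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    return dialect, rows

# Hypothetical "CSV" file that is actually semicolon-delimited.
sample = "name;city;age\nAda;London;36\nBob;Dallas;40\nCy;Oslo;50\n"

dialect, rows = read_with_sniffed_dialect(sample)
print(repr(dialect.delimiter))
print(rows[1])
```

That is the extra step the TSV version never needs.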

And that said, I have a particular longing for a world in which
people actually used the US/RS/GS/FS (Unit/Record/Group/File
separators; AKA 0x1f-0x1c) as defined in ASCII for exactly this
purpose.  Sigh.  :-)
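
As a rough sketch of what that world could look like (dumps and
loads are hypothetical helper names, and only the unit and record
separators are used here): fields are joined with US and records
with RS, so commas, quotes, tabs and newlines in the data need no
quoting or escaping at all. The one assumption is that the values
never contain 0x1f or 0x1e themselves.

```python
US = "\x1f"  # Unit Separator: between fields
RS = "\x1e"  # Record Separator: between records

def dumps(rows):
    # Assumes no field contains US or RS; nothing else needs escaping.
    return RS.join(US.join(fields) for fields in rows)

def loads(text):
    # Empty input is treated as "no records" in this sketch.
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]

rows = [
    ["id", "note"],
    ["1", 'commas, "quotes",\ttabs and\nnewlines'],
]

encoded = dumps(rows)
assert loads(encoded) == rows
```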

-tkc