[Csv] What's our status?

Skip Montanaro skip at pobox.com
Thu Feb 27 18:15:57 CET 2003


    Cliff> data,2003/02/27,08:51:00
    Cliff> data,2003/02/27,08:52:00
    Cliff> data,2003/02/27,08:53:00
    Cliff> data,2003/02/27,08:54:00

    Cliff> In this case it is difficult to know whether ",", "/" or ":" is the
    Cliff> delimiter.  It's not entirely unreasonable to use a "preferred"
    Cliff> list of delimiters but it's not entirely safe either ;) In fact,
    Cliff> the current implementation will resort to a preferred list in
    Cliff> this example and return , as the delimiter.  However, given the
    Cliff> following:

    Cliff> 2003/02/27,08:51:00
    Cliff> data,2003/02/27,08:52:00
    Cliff> 08:53:00
    Cliff> data,2003/02/27,08:54:00

    Cliff> It would most likely (without testing) return ":" as the
    Cliff> delimiter as it occurs equally consistently with "/", but is
    Cliff> higher in the preferred list.  This is wrong as the delimiter is
    Cliff> clearly ",".  That being said, I would simply consider this file
    Cliff> as being unsniffable as it has no real pattern.
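
For reference, here is a minimal sketch that tallies how often each candidate
occurs on each line of Cliff's second sample.  The candidate set ",/:" and
the helper name per_line_counts are assumptions for illustration only, not
the sniffer's actual preferred list or code.

```python
# Hypothetical candidate set; the real sniffer's preferred list may differ.
CANDIDATES = ",/:"

SAMPLE = [
    "2003/02/27,08:51:00",
    "data,2003/02/27,08:52:00",
    "08:53:00",
    "data,2003/02/27,08:54:00",
]

def per_line_counts(lines, candidates=CANDIDATES):
    """Map each candidate delimiter to its occurrence count on every line."""
    return {d: [line.count(d) for line in lines] for d in candidates}

counts = per_line_counts(SAMPLE)
for delim in CANDIDATES:
    print(repr(delim), counts[delim])
```

Only ":" shows up the same number of times on every line, which is the sort
of thing that would draw a consistency-based test toward it.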

How about this?  A candidate delimiter is preferred if two occurrences of it
enclose other candidate delimiters.  Conversely, a candidate delimiter whose
occurrences enclose only alphanumeric characters is deemed "less worthy".
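
A rough sketch of that idea, nothing more.  The candidate set, the +1/-1
weighting, and the names enclosure_score and guess_delimiter are all made up
for illustration; none of this is checked-in sniffer code.

```python
# Hypothetical candidate set, for illustration only.
CANDIDATES = ",/:"

def enclosure_score(lines, delim, candidates=CANDIDATES):
    """Score delim by what sits between consecutive occurrences of it."""
    others = set(candidates) - {delim}
    score = 0
    for line in lines:
        # Interior fields are the ones with delim on both sides.
        for field in line.split(delim)[1:-1]:
            if any(ch in others for ch in field):
                score += 1   # encloses another candidate: preferred
            elif field.isalnum():
                score -= 1   # encloses only alphanumerics: "less worthy"
    return score

def guess_delimiter(lines, candidates=CANDIDATES):
    """Return the best-scoring candidate (first one wins on a tie)."""
    return max(candidates, key=lambda d: enclosure_score(lines, d, candidates))

regular = [
    "data,2003/02/27,08:51:00",
    "data,2003/02/27,08:52:00",
    "data,2003/02/27,08:53:00",
    "data,2003/02/27,08:54:00",
]
ragged = [
    "2003/02/27,08:51:00",
    "data,2003/02/27,08:52:00",
    "08:53:00",
    "data,2003/02/27,08:54:00",
]
print(guess_delimiter(regular), guess_delimiter(ragged))
```

With this particular weighting both of Cliff's samples come out as ",", since
the only fields enclosed by "/" or ":" are bare digits.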


    Cliff> BTW, I'm +1 on Skip's suggestion to make the utils a package
    Cliff> (csv.utils) and will check it into CVS as such.  Anyone object?

Nope.  Sorry I didn't get around to checking in the version you posted
yesterday.

Skip


