[Csv] Re: [PEP305] Python 2.3: a small change request in CSV module

Dave Cole djc at object-craft.com.au
Sat May 17 09:31:47 EDT 2003


>>>>> "Skip" == Skip Montanaro <skip at pobox.com> writes:

>>> I'll leave Dave and Andrew to comment on the possibility of
>>> admitting a multiple-character delimiter string, as that will
>>> affect their C code.

Bernard> Are they monitoring this ng as well, or should I repost
Bernard> elsewhere?  Notice I am not asking for a multichar delimiter
Bernard> but for multiple alternate single-char separators.

Skip> As I mentioned in my original note, the best place for this
Skip> discussion is csv at mail.mojam.com.  I'm sure Dave and Andrew are
Skip> there.  I don't know how regularly they monitor c.l.py.

I usually read c.l.py every day (or at least skim the subjects), but I
haven't for over a week, since our ISP's news server died.  Dunno why
they are taking so long to fix it...

Bernard> Also, if this was supported directly in reader(), the
Bernard> file-like argument would not necessarily have to be seekable,
Bernard> it could conceivably just use the first read data chunk for
Bernard> the guess-work as well as for further parsing of the first
Bernard> rows.

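One of the suggestions I made early on in the csv development was to
allow the sniffer and reader to operate on iterable data sources.
It turns out the sniffer itself doesn't need to accept an iterable:
you can buffer the first few lines, hand them to the sniffer as a
string, and then feed those same lines to the reader.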

With the following (completely untested) you could sniff and read an
input source while only reading it once.

    import csv
    import sys

    class SniffedInput:
        def __init__(self, fp):
            self.fp = fp
            self.sample = []
            self.end_of_input = False
            # Buffer up to 20 lines for the sniffer to examine.
            for i in range(20):
                line = fp.readline()
                if not line:
                    # Input ran out while we were still sampling.
                    self.end_of_input = True
                    break
                self.sample.append(line)
            self.dialect = csv.Sniffer().sniff(''.join(self.sample))

        def __iter__(self):
            return self

        def next(self):
            # Replay the buffered lines before touching fp again.
            if self.sample:
                return self.sample.pop(0)
            if self.end_of_input:
                raise StopIteration
            line = self.fp.readline()
            if not line:
                raise StopIteration
            return line

        # Later Pythons call __next__ rather than next.
        __next__ = next

    inp = SniffedInput(sys.stdin)
    for rec in csv.reader(inp, dialect=inp.dialect):
        process(rec)    # process() is whatever you do with each row

Skip> Not necessarily.  It depends on how the file is accessed.  I
Skip> believe it's treated as an iterator, in which case you wind up
Skip> having to read several records, pass them off to the sniffer,
Skip> set your dialect, reprocess the lines you've already read, then
Skip> process the remaining unread lines in the file.  This would be
Skip> more tedious from C than from Python.
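
From Python the sequence Skip describes is fairly short.  Here is a
rough sketch of it as a generator, assuming a current Python (3.x) with
itertools available.  The names sniff_and_read and sample_size are made
up for illustration and are not part of the csv module.

```python
import csv
import io
import itertools


def sniff_and_read(lines, sample_size=20):
    """Yield parsed rows from any iterable of lines, reading it once.

    sniff_and_read and sample_size are hypothetical names, not csv API.
    """
    lines = iter(lines)
    # Read several lines up front and keep them for the sniffer.
    sample = list(itertools.islice(lines, sample_size))
    if not sample:
        return
    # Pass the buffered lines to the sniffer to pick a dialect.
    dialect = csv.Sniffer().sniff(''.join(sample))
    # Reprocess the buffered lines, then carry on with the unread ones.
    replay = itertools.chain(sample, lines)
    for row in csv.reader(replay, dialect=dialect):
        yield row


# Usage: the input only has to be iterable, not seekable.
data = io.StringIO("a;b;c\n1;2;3\n4;5;6\n")
for rec in sniff_and_read(data, sample_size=2):
    print(rec)
```

Because the buffered lines are chained in front of the rest of the
input, nothing is read twice and nothing needs to seek.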

Bernard> I hope this could be deemed a common enough usage to grant
Bernard> inclusion in the standard module.

Does the above satisfy your needs?

Should something like that be placed into the csv module?

- Dave

-- 
http://www.object-craft.com.au





