throwing exceptions from csv.DictReader or even csv.reader

Peter Otten __peter__ at web.de
Mon Jul 5 13:32:50 EDT 2010


Tim wrote:

> CSV is a very common format for publishing data as a form of primitive
> integration. It's an annoyingly brittle approach, so I'd like to
> ensure that I capture errors as soon as possible, so that I can get
> the upstream processes fixed, or at worst put in some correction
> mechanisms and avoid getting polluted data into my analyses.
> 
> A symptom of several types of errors is that the number of fields
> being interpreted varies over a file (e.g. from wrongly embedded quote
> strings or mishandled embedded newlines). My preferred approach would
> be to get DictReader to throw an exception when encountering such
> oddities, but at the moment it seems to try to patch over the error
> and fill in the blanks for short lines, or ignore long lines. I know
> that I can use the restval parameter and then check for what's been
> parsed when I get my results back, but this seems brittle as whatever
> I use for restval could legitimately be in the data.
> 
> Is there any way to get csv.DictReader to throw an exception on such
> simple line errors, or am I going to have to use csv.reader and
> explicitly check for the number of fields read in on each line?
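
For reference, the restval check described above would look roughly like
this (an untested sketch; _SENTINEL and checked_dictreader are only
illustrative names). Passing a unique object() as restval sidesteps the
"sentinel could occur in the data" problem, but it only flags short rows
after DictReader has already padded them, and overlong rows still land
silently under the restkey key:

import csv

_SENTINEL = object()  # a plain string restval could collide with real data

def checked_dictreader(f, **kw):
    for row in csv.DictReader(f, restval=_SENTINEL, **kw):
        if _SENTINEL in row.values():  # some field was filled in, not parsed
            raise ValueError("short row: %r" % (row,))
        yield row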

I think you have to use csv.reader. Untested:

import csv

def DictReader(f, fieldnames=None, *args, **kw):
    # Wrap csv.reader so that rows with the wrong number of fields raise
    # instead of being silently padded or truncated.
    reader = csv.reader(f, *args, **kw)
    if fieldnames is None:
        fieldnames = next(reader)  # first row supplies the field names
    for row in reader:
        if row:  # skip completely blank lines
            if len(fieldnames) != len(row):
                raise ValueError("line %d: expected %d fields, got %d"
                                 % (reader.line_num, len(fieldnames), len(row)))
            yield dict(zip(fieldnames, row))
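
Usage is the same as with the stock csv.DictReader, but a malformed line
now aborts the run instead of slipping into the analysis (a quick sketch;
"data.csv" is only a placeholder name, and binary mode assumes the 2.x
csv module):

with open("data.csv", "rb") as f:
    try:
        for record in DictReader(f):
            print(record)  # stand-in for the real per-row processing
    except ValueError as e:
        print("giving up, malformed input: %s" % e)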

Peter


