[issue44677] CSV sniffing falsely detects space as a delimiter

Piotr Tokarski report at bugs.python.org
Tue Jul 20 03:36:23 EDT 2021


Piotr Tokarski <pt12lol at gmail.com> added the comment:

Test sample:

```
import csv
from io import StringIO


def csv_text():
    # Pipe-delimited sample with stray single quotes next to spaces
    # in rows 2 and 3.
    return StringIO("a|b\nc| 'd\ne|' f")


with csv_text() as input_file:
    print('The following text is going to be parsed:')
    print(input_file.read())
    print()


with csv_text() as input_file:
    dialect_params = [
        'delimiter',
        'quotechar',
        'escapechar',
        'lineterminator',
        'quoting',
        'doublequote',
        'skipinitialspace'
    ]
    dialect = csv.Sniffer().sniff(input_file.read())
    print('The following dialect has been detected:')
    for dialect_param in dialect_params:
        print(f'- {dialect_param}: {getattr(dialect, dialect_param)!r}')
    print()


with csv_text() as input_file:
    print('Parsed csv text:')
    for entry in csv.reader(input_file, dialect=dialect):
        print(f'- {entry}')
    print()
```

Actual output:

```
The following text is going to be parsed:
a|b
c| 'd
e|' f

The following dialect has been detected:
- delimiter: ' '
- quotechar: "'"
- escapechar: None
- lineterminator: '\r\n'
- quoting: 0
- doublequote: False
- skipinitialspace: False

Parsed csv text:
- ['a|b']
- ['c|', 'd\ne|', 'f']

```

Expected output:

```
The following text is going to be parsed:
a|b
c| 'd
e|' f

The following dialect has been detected:
- delimiter: '|'
- quotechar: '"'
- escapechar: None
- lineterminator: '\r\n'
- quoting: 0
- doublequote: False
- skipinitialspace: False

Parsed csv text:
- ['a', 'b']
- ['c', " 'd"]
- ['e', "' f"]

```
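
Possible workaround until the sniffer handles this case: a minimal sketch, assuming the delimiter is known in advance to be `|`. It skips `csv.Sniffer` and passes the delimiter to `csv.reader` directly, leaving the default quotechar `"` in effect. The helper name `parse_pipe_separated` is hypothetical and used only for illustration. This does not fix the detection itself.

```python
import csv
from io import StringIO

SAMPLE = "a|b\nc| 'd\ne|' f"


def parse_pipe_separated(text):
    """Parse text as '|'-delimited CSV without sniffing the dialect."""
    # Assumption: the delimiter is known to be '|'. The default quotechar
    # '"' is kept, so the single quotes in the sample are ordinary data.
    return list(csv.reader(StringIO(text), delimiter='|'))


rows = parse_pipe_separated(SAMPLE)
for row in rows:
    print(f'- {row}')
```

With the sample above, this yields the three rows listed under "Parsed csv text" in the expected output.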

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44677>
_______________________________________

