[New-bugs-announce] [issue44677] CSV sniffing falsely detects space as a delimiter

Piotr Tokarski report at bugs.python.org
Mon Jul 19 13:40:33 EDT 2021


New submission from Piotr Tokarski <pt12lol at gmail.com>:

Consider the following CSV content: "a|b\nc| 'd\ne|' f". The real delimiter in this case is the '|' character, but ' ' is sniffed instead. A verbose example is attached.

The problem lies in the csv.py file, in the following code:

```
        matches = []
        for restr in (r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
                      r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?P<delim>[^\w\n"\'])(?P<space> ?)',   #  ".*?",
                      r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)',   # ,".*?"
                      r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'):                            #  ".*?" (no delim, no space)
            regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
            matches = regexp.findall(data)
            if matches:
                break
```

For the content above, the first pattern already matches, which makes matches non-empty, so further processing proceeds with the delimiter falsely set to ' '.
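
A minimal sketch of the reproduction, applying only the first pattern from the csv.py excerpt to the sample content. The variable names data and restr are chosen here for illustration. Because of re.DOTALL, `.*?` can span the newline, so the space after "c|" is captured as the delimiter, the quote on the second line pairs with the quote on the third line, and the space before "f" satisfies the trailing (?P=delim).

```python
import csv
import re

# Sample from the report; the intended delimiter is '|'.
data = "a|b\nc| 'd\ne|' f"

# First pattern from the csv.py excerpt above, copied verbatim.
restr = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
regexp = re.compile(restr, re.DOTALL | re.MULTILINE)

# Each tuple holds the groups in order: (delim, space, quote).
matches = regexp.findall(data)
print(matches)

# On an affected version this is expected to show ' ' rather than '|'.
print(repr(csv.Sniffer().sniff(data).delimiter))
```

The single match carries ' ' in the delim group, so the loop breaks on the first pattern and the later patterns are never tried.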

----------
components: Library (Lib)
messages: 397821
nosy: pt12lol
priority: normal
severity: normal
status: open
title: CSV sniffing falsely detects space as a delimiter
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44677>
_______________________________________

