[issue40762] Writing bytes using CSV module results in b prefixed strings

Mon May 25 07:57:30 EDT 2020

Sidhant Bansal <sidhbansal at gmail.com> added the comment:

Hi Remi,

Currently a code like this:
```
with open("abc.csv", "w", encoding='utf-8') as f:
    data = [b'\x41']
    w = csv.writer(f)
    w.writerow(data)
with open("abc.csv", "r") as f:
    rows = csv.reader(f)
    for row in rows:
        print(row[0]) # prints b'A'
```
Is able to write the string "b'A'" in a CSV file. You are correct that the ideal way should indeed be to decode the byte first.

However if a user does not decode the byte then the CSV module calls the str() method on the byte object as you said, but in real-life that b-prefixed string is just not readable by another program in an easy way (they will need to first chop off the b-prefix and single quotes around the string) and has turned out to be a pain point in one of the pandas issue I referred to in my first message.

Also I am not sure if you have taken a look at my PR, but my approach to fix this problem does NOT involve guessing the encoding scheme used, instead we simply use the encoding scheme that the user provided when they open the file object. So if you open it with `open("abc.csv", "w", encoding="latin1")` then it will try to decode the byte using "latin1". Incase it fails to decode using that, then it will throw a UnicodeDecodeError (So there is no unknowing file corruption, a UnicodeDecode error is thrown when this happens). You can refer to the tests + NEWS.d in the PR to confirm the same.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40762>
_______________________________________