[Numpy-discussion] bug in genfromtxt for python 3.2

Wed Mar 30 13:37:45 EDT 2011

Hi,

On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
> On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Mon, Mar 28, 2011 at 11:29 PM,  <josef.pktd at gmail.com> wrote:
>>> numpy/lib/test_io.py    only uses StringIO in the test, no actual csv file
>>>
>>> If I give the filename then I get a TypeError: Can't convert 'bytes'
>>> object to str implicitly
>>>
>>>
>>> from the statsmodels mailing list example
>>>
>>>>>>> data = recfromtxt(open('./star98.csv', "U"), delimiter=",", skip_header=1, dtype=float)
>>>> Traceback (most recent call last):
>>>>  File "<pyshell#30>", line 1, in <module>
>>>>    data = recfromtxt(open('./star98.csv', "U"), delimiter=",",
>>>> skip_header=1, dtype=float)
>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>> line 1633, in recfromtxt
>>>>    output = genfromtxt(fname, **kwargs)
>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>> line 1181, in genfromtxt
>>>>    first_values = split_line(first_line)
>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
>>>> line 206, in _delimited_splitter
>>>>    line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
>>>> TypeError: Can't convert 'bytes' object to str implicitly
>>
>> Is the right fix for this to open a 'filename' passed to genfromtxt,
>> as 'binary' (bytes)?
>>
>> If so I will submit a pull request with a fix and a test,
>
> Seems to work and is what was intended I think, see Pauli's
> changes/notes in commit 0f2e7db0.
>
> This is ticket #1607 by the way.

Thanks for making a ticket.  I've submitted a pull request for the fix
and linked to it from the ticket.

The reason I asked whether this was the correct fix is the following.

Imagine I'm working with a non-Latin default encoding, and I've opened a file:

fobj = open('my_nonlatin.txt', 'rt')

in Python 3.2.  That file might contain numbers and non-Latin text.  I
can't pass it into 'genfromtxt' because that gives me the error
above.  I can pass it in as binary, but then I'll get garbled text.
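
To make the mismatch concrete, here is a minimal sketch that needs no
numpy.  The helper 'split_like_iotools' is hypothetical; it only
imitates the line from _iotools.py shown in the traceback, on the
assumption that the comment marker and the strip characters are bytes:

```python
# Hypothetical stand-in for the failing line in the traceback:
#   line.split(self.comments)[0].strip(asbytes(" \r\n"))
# Assumption: comments and strip characters are bytes, as asbytes() implies.
def split_like_iotools(line, comments=b"#"):
    return line.split(comments)[0].strip(b" \r\n")

text_line = "1,2,3 # comment\n"    # what a file opened in text mode yields
bytes_line = b"1,2,3 # comment\n"  # what a file opened in binary mode yields

# A str line cannot be split on a bytes separator in Python 3.
try:
    split_like_iotools(text_line)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# A bytes line goes through without trouble.
binary_fields = split_like_iotools(bytes_line).split(b",")
```

The exact wording of the TypeError depends on the Python version, but
mixing str and bytes in 'split' fails in every Python 3 release.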

Should those functions also allow unicode-providing files (perhaps
with binary as default for speed)?
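
As a rough sketch of what I mean (not a proposal for the actual numpy
code), a reader could accept either kind of file object and decode
bytes lines on the way in.  The names 'iter_text_lines' and
'parse_rows', and the 'encoding' parameter with its utf-8 default, are
all made up for this illustration:

```python
import io

def iter_text_lines(fobj, encoding="utf-8"):
    """Yield str lines from a file object opened in text or binary mode."""
    for line in fobj:
        if isinstance(line, bytes):
            # Binary input: decode explicitly rather than relying on a default.
            line = line.decode(encoding)
        yield line

def parse_rows(fobj, delimiter=",", encoding="utf-8"):
    """Split each non-empty line into a list of str fields."""
    rows = []
    for line in iter_text_lines(fobj, encoding):
        line = line.strip(" \r\n")
        if line:
            rows.append(line.split(delimiter))
    return rows

# Numbers mixed with Greek text, supplied once as text and once as bytes.
content = "1.5,\u03b1\u03b2\u03b3\n2.5,\u03b4\u03b5\u03b6\n"
rows_text = parse_rows(io.StringIO(content))
rows_bytes = parse_rows(io.BytesIO(content.encode("utf-8")))
```

Both calls give the same rows here, so the non-Latin text survives
whichever way the file was opened, as long as the stated encoding
matches the file.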

See you,

Matthew