[Numpy-discussion] bug in genfromtxt for python 3.2
Matthew Brett
matthew.brett at gmail.com
Wed Mar 30 13:37:45 EDT 2011
Hi,
On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
> On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Mon, Mar 28, 2011 at 11:29 PM, <josef.pktd at gmail.com> wrote:
>>> numpy/lib/test_io.py only uses StringIO in the test, no actual csv file
>>>
>>> If I give the filename then I get a TypeError: Can't convert 'bytes'
>>> object to str implicitly
>>>
>>>
>>> from the statsmodels mailing list example
>>>
>>>>>>> data = recfromtxt(open('./star98.csv', "U"), delimiter=",", skip_header=1, dtype=float)
>>>> Traceback (most recent call last):
>>>> File "<pyshell#30>", line 1, in <module>
>>>> data = recfromtxt(open('./star98.csv', "U"), delimiter=",",
>>>> skip_header=1, dtype=float)
>>>> File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>> line 1633, in recfromtxt
>>>> output = genfromtxt(fname, **kwargs)
>>>> File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>> line 1181, in genfromtxt
>>>> first_values = split_line(first_line)
>>>> File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
>>>> line 206, in _delimited_splitter
>>>> line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
>>>> TypeError: Can't convert 'bytes' object to str implicitly
>>
>> Is the right fix for this to open a 'filename' passed to genfromtxt,
>> as 'binary' (bytes)?
>>
>> If so I will submit a pull request with a fix and a test,
>
> Seems to work and is what was intended I think, see Pauli's
> changes/notes in commit 0f2e7db0.
>
> This is ticket #1607 by the way.
Thanks for making a ticket. I've submitted a pull request for the fix
and linked to it from the ticket.
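
Roughly, the idea is the following. This is only a sketch of the approach, not the code in the pull request; `open_for_parsing` and `first_fields` are made-up names for illustration, and the splitting line just mirrors the one in the traceback above.

```python
import os
import tempfile


def open_for_parsing(fname):
    """Hypothetical helper: open filenames as bytes, pass file objects through."""
    if isinstance(fname, str):
        return open(fname, "rb")
    return fname


def first_fields(fname, delimiter=b",", comments=b"#"):
    """Split the first line the way a bytes-based splitter would."""
    fobj = open_for_parsing(fname)
    try:
        line = fobj.readline()
    finally:
        if fobj is not fname:  # only close what we opened ourselves
            fobj.close()
    return line.split(comments)[0].strip(b" \r\n").split(delimiter)


# Write a small CSV to a temporary file and read it back by filename.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1.0,2.0,3.0\n")
path = tmp.name
try:
    fields = first_fields(path)
finally:
    os.remove(path)
```

With the file opened as bytes, the bytes-based delimiter and comment markers match the type of the line, so the split goes through.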
The reason I asked whether this was the correct fix:

Imagine I'm working with a non-Latin default encoding, and I've opened a
file in Python 3.2:

fobj = open('my_nonlatin.txt', 'rt')

The file might contain numbers and non-Latin text. I can't pass fobj
to 'genfromtxt', because it raises the TypeError shown above. I can pass
the file in as binary, but then I'll get garbled text.
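
To make the two failure modes concrete, here is a small stand-alone sketch. It does not call numpy; `split_bytes_line` is a hypothetical stand-in that copies the splitting line from the traceback, and the sample line and UTF-8 encoding are assumptions for the example.

```python
def split_bytes_line(line, delimiter=b",", comments=b"#"):
    """Stand-in for a splitter that assumes bytes input throughout."""
    return line.split(comments)[0].strip(b" \r\n").split(delimiter)


# What a text-mode file hands back: a str line.
text_line = "1.5,2.5,Москва\n"

# Case 1: a str line meets bytes markers and the split raises TypeError.
try:
    split_bytes_line(text_line)
    raised = False
except TypeError:
    raised = True

# Case 2: the same line read as bytes splits fine, but the text field
# comes back as undecoded UTF-8 bytes rather than readable text.
fields = split_bytes_line(text_line.encode("utf-8"))
label = fields[2]
```

The numeric fields survive either way once they are bytes; it is only the text column that needs to be decoded with the right encoding to be readable again.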
Should those functions also allow unicode-providing files (perhaps
with binary as default for speed)?
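
One possible shape for that, purely as a sketch: accept lines of either type and decode bytes on the way in. `iter_fields` and its `encoding` argument are hypothetical names for this example, not an existing numpy interface, and this version decodes everything to str rather than keeping a bytes fast path.

```python
import io


def iter_fields(fobj, delimiter=",", comments="#", encoding="utf-8"):
    """Yield split fields from a file object that gives str or bytes lines."""
    for line in fobj:
        if isinstance(line, bytes):
            # Binary-mode input: decode with the caller's encoding.
            line = line.decode(encoding)
        line = line.split(comments)[0].strip()
        if line:
            yield line.split(delimiter)


sample = "1.5,Москва\n2.5,Αθήνα\n"

# Same content, once as a text-mode file and once as a binary-mode file.
rows_text = list(iter_fields(io.StringIO(sample)))
rows_bytes = list(iter_fields(io.BytesIO(sample.encode("utf-8"))))
```

Both inputs give the same rows, so the caller would not need to care how the file was opened, as long as the encoding is known for the binary case.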
See you,
Matthew