[Numpy-discussion] problem converting to matrix from Unicode input string
Basilisk96
basilisk96 at gmail.com
Fri Oct 5 00:13:34 EDT 2007
Hello all,
I have the following function, with print statements inserted for
debugging:
import numpy
def file2mat(inFile, sep=None, T=True):
    try:
        input = inFile.readlines()
        print "input=%s" % input
    except:
        raise
    finally:
        inFile.close()
    data = [line.split(sep) for line in input]
    print "data=%s" % data
    if T==True:
        return numpy.mat(data).astype(numpy.float64).T
    else:
        return numpy.mat(data).astype(numpy.float64)
which is then tested as follows:
>>> s = "-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"
>>> u = unicode(s)
>>> from cStringIO import StringIO
>>> file2mat(StringIO(s))
input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'],
['0.500', '0.500']]
matrix([[-0.5, 0.5, -0.5, 0.5],
[-0.5, -0.5, 0.5, 0.5]])
This is the expected result matrix... But now:
>>> file2mat(StringIO(u))
input=['-\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n',
'\x000\x00.\x005\x000\x000\x00 \x00-\x000\x00.\x005\x000\x000\x00\n',
'\x00-\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00\n',
'\x000\x00.\x005\x000\x000\x00 \x000\x00.\x005\x000\x000\x00']
data=[['-\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'],
['\x000\x00.\x005\x000\x000\x00', '\x00-\x000\x00.\x005\x000\x000\x00'],
['\x00-\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00'],
['\x000\x00.\x005\x000\x000\x00', '\x000\x00.\x005\x000\x000\x00']]
Traceback (most recent call last):
...
ValueError: invalid literal for float(): -
When I explicitly convert the input with str(), I get the expected
result:
>>> file2mat(StringIO(str(u)))
input=['-0.500 -0.500\n', '0.500 -0.500\n', '-0.500 0.500\n', '0.500 0.500']
data=[['-0.500', '-0.500'], ['0.500', '-0.500'], ['-0.500', '0.500'],
['0.500', '0.500']]
matrix([[-0.5, 0.5, -0.5, 0.5],
[-0.5, -0.5, 0.5, 0.5]])
Any suggestions on how to improve my code?
Is this a Unicode issue, numpy issue, or both?
The input string can come from an ASCII file or a GUI text control. In
the case of a GUI, the control returns a Unicode string, so for now I
am converting it with str(), but that seems like a hack.
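
A small sketch of what the debug output may be showing, and of an explicit alternative to str(). The NUL bytes after each character look consistent with the string being read as little-endian UTF-16 bytes rather than as text; whether that is really what cStringIO does with a unicode object on this build is an assumption, not something confirmed here. The helper name to_ascii_bytes is hypothetical.

```python
# Sketch only; to_ascii_bytes is a hypothetical helper, not part of any library.

text = u"-0.500 -0.500\n0.500 -0.500"

# Encoding as little-endian UTF-16 puts a NUL byte after every ASCII
# character, the same pattern that appears in the debug output.
raw = text.encode("utf-16-le")
first_field = raw[:12]  # the six characters of '-0.500', two bytes each


def to_ascii_bytes(value):
    """Return value as an ASCII byte string, encoding only if it is text."""
    if isinstance(value, bytes):
        return value
    # Raises UnicodeEncodeError if the text contains non-ASCII characters.
    return value.encode("ascii")
```

Encoding explicitly makes the intent visible at the call site and fails loudly on non-ASCII input, instead of relying on the default codec that str() uses.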
BTW, the reason that I am using the astype() method of numpy.matrix is
that I get a "ValueError: setting an array element with a sequence"
when trying to use
    return numpy.mat(data, numpy.float64)
in the above function.
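
One possible way around both the astype() step and the string-type question is to convert every field with float() before building the matrix. The sketch below assumes the whole input is available as one string (byte or Unicode); parse_rows and transpose are hypothetical names, and whether this avoids the ValueError on a given NumPy version has not been checked here. It uses only the standard library; the resulting list of float rows can be passed to numpy.mat() as a final step.

```python
def parse_rows(text, sep=None):
    """Split text into lines and fields, converting every field to float."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([float(field) for field in line.split(sep)])
    return rows


def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


s = u"-0.500 -0.500\n0.500 -0.500\n-0.500 0.500\n0.500 0.500"
rows = parse_rows(s)    # one list of floats per input line
cols = transpose(rows)  # same layout as the transposed matrix in the post
```

Because float() accepts both byte and Unicode strings of digits, the same function serves the file case and the GUI case without a str() conversion.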
Thank you,
-Basilisk96