[Numpy-discussion] Unite a Rectangular Unicode Array into one newline-separated string

Cristi Constantin darkgl0w at yahoo.com
Fri Jun 12 03:06:18 EDT 2009


Good day.

I am trying to join a rectangular Unicode array into a single newline-separated string: the characters of each row are merged into one line, and consecutive rows are separated by '\n'.

For example:

import numpy as np
a = np.arange(12).reshape(3,4)
a = np.asarray(a,'U')

# Method 1 that works:
'\n'.join([ ''.join([j.encode('utf8') for j in i]) for i in a ])
# Gives '0123\n4567\n8911' (no trailing '\n', since join only puts it between rows)
# This is VERY slow.

# Method 2 that works:
''.join ( np.hstack( np.hstack( (i,np.array([u'\n'],'U')) ) for i in a)).encode('utf8')
# Prints '0123\n4567\n8911\n'
# This is faster, but still quite slow.
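
One possible way to avoid the per-character Python loop, sketched here in Python 3 syntax, is to reinterpret each row of single characters as one fixed-width string and join only the rows. The helper name rows_to_text and the sample array are made up for illustration. The sketch assumes a 2-D array of dtype 'U1', and NumPy drops trailing NUL characters from each row string.

```python
import numpy as np

def rows_to_text(a):
    """Join a 2-D array of single characters into one newline-separated str."""
    # Force a C-contiguous, native-byte-order array with one character per item.
    a = np.ascontiguousarray(a, dtype="U1")
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    ncols = a.shape[1]
    # Reinterpret each row of ncols one-character items as a single
    # ncols-character string; the result has shape (nrows, 1).
    row_strings = a.view("U%d" % ncols).ravel()
    # Only one Python-level join over the rows remains.
    return "\n".join(row_strings)

# Hypothetical sample data, including two non-ASCII characters.
a = np.array([list("0123"), list("4567"), list("89\u00a9\u00e9")])
text = rows_to_text(a)          # str, rows separated by '\n'
encoded = text.encode("utf8")   # bytes, as required for non-ASCII content
```

Like Method 1, this puts no '\n' after the last row; append one if the trailing newline of Method 2 is wanted.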

The result must be encoded as UTF-8, because the array can contain characters that the ASCII codec cannot encode.

I tried:
a.astype(str)
# But in some cases this raises UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)

I also tried:
a.tostring()
# But this returns '0\x00\x00\x001\x00\x00\x002\x00\x00\x003\x00\x00\x004\x00\x00\x005\x00\x00\x006\x00\x00\x007\x00\x00\x008\x00\x00\x009\x00\x00\x001\x00\x00\x001\x00\x00\x00'
# ... and I have no idea what to do with this value.
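
The byte string above looks like the raw array buffer: each character takes 4 bytes, least significant byte first, which matches UTF-32 little-endian. Under that assumption, a rough sketch (Python 3 syntax, where tobytes() is the current name for tostring(), and with a made-up sample array) of decoding the buffer back into text could look like this:

```python
import numpy as np

# Hypothetical 3x4 array of single characters (dtype 'U1').
a = np.array([list("0123"), list("4567"), list("89\u00a9\u00e9")])

# Force little-endian storage so the codec below matches on any machine.
raw = a.astype("<U1").tobytes()

# 4 bytes per character, so the whole buffer decodes as UTF-32 little-endian.
flat = raw.decode("utf-32-le")          # all rows run together, no separators

# Cut the flat string back into rows and join them with newlines.
ncols = a.shape[1]
text = "\n".join(flat[i:i + ncols] for i in range(0, len(flat), ncols))

encoded = text.encode("utf8")           # final UTF-8 bytes
```

For arrays whose items are wider than one character, the buffer also holds NUL padding, which would have to be removed after decoding.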

Can anyone suggest a faster way to convert this Unicode array into a string?
Thank you very much.




      