no struct.pack for unicode strings?

Radovan Garabik spam at melkor.dnp.fmph.uniba.sk
Fri Oct 18 04:13:13 EDT 2002


Martin v. Loewis <martin at v.loewis.de> wrote:
 : Radovan Garabik <spam at melkor.dnp.fmph.uniba.sk> writes:

 :> Why is there no format to pack/unpack unicode strings?
 :> Or am I missing something?

 : It's not entirely clear what struct.pack should do with a Unicode
 : object: UTF-8, UTF-16 (big or little endian, with or without BOM),
 : UTF-32 (big or little endian, with or without BOM), system encoding,
 : ...

It should pack them as raw Py_UNICODE data. At least that is what
I'd need.
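
For illustration, a rough sketch of what such packing could look like when
done by hand from Python. It assumes the C side reads 4-byte code units in
native byte order (a "wide" Py_UNICODE build); for 2-byte units UTF-16 would
be swapped in. pack_unicode and unpack_unicode are hypothetical helper names,
not part of the struct module, and the length-prefixed layout is only one
possible choice.

```python
import struct
import sys

# Assumption: 4-byte code units, native byte order, no BOM.
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"

def pack_unicode(text):
    """Hypothetical helper: a 4-byte count followed by raw code units."""
    raw = text.encode(CODEC)
    count = len(raw) // 4          # number of code points
    # "=" selects native byte order with standard sizes and no padding.
    return struct.pack("=I%ds" % len(raw), count, raw)

def unpack_unicode(buf, offset=0):
    """Return (text, offset of the next record)."""
    (count,) = struct.unpack_from("=I", buf, offset)
    start = offset + 4
    end = start + 4 * count
    return buf[start:end].decode(CODEC), end
```

A file written this way is tied to the byte order and code unit width of the
machine that wrote it, which is part of why the choice is left to the caller.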
 

 : Hence, no packing is provided.

 :> My application needs to struct.pack unicode strings, to save them into a
 :> file which can then be read by a C extension module where I need to
 :> access characters of the string (as Py_UNICODE).

 : To save Unicode in a file, I recommend encoding it as UTF-8 and
 : using PyUnicode_DecodeUTF8 in your extension module to restore the
 : Unicode object.
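
The writing half of that suggestion might look roughly like this on the
Python side. This is only a sketch: the length-prefixed record layout and
the names pack_utf8 and unpack_utf8 are assumptions made for the example,
not something proposed in the thread.

```python
import struct

def pack_utf8(text):
    """Hypothetical helper: 4-byte little-endian byte length, then UTF-8."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data

def unpack_utf8(buf, offset=0):
    """Return (text, offset of the next record)."""
    (size,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + size
    return buf[start:end].decode("utf-8"), end
```

The length prefix counts bytes, not characters, so a reader can skip a
record without decoding it.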

This is exactly what I am trying to avoid, since I need to quickly loop
over the strings (it is a dictionary index) written in the file - hence
the C extension module.
I am afraid that using PyUnicode_DecodeUTF8 (or anything that creates a 
PyObject) would impose a big speed penalty.


-- 
 -----------------------------------------------------------
| Radovan Garabík  http://melkor.dnp.fmph.uniba.sk/~garabik |
| __..--^^^--..__         garabik @ fmph . uniba . sk       |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


