[IronPython] Binary files and byte strings

Jonathan Jacobs korpse-ironpython at kaydash.za.net
Tue Dec 20 16:06:26 CET 2005


Hi,

I have a CPython script that parses specific data files and lets me
manipulate them, relying mostly on the struct module. IronPython doesn't
seem to have an implementation of struct (yet?), so I used PyPy's
implementation and discovered that IronPython's sys module doesn't
define a "byteorder" attribute, which was easily worked around.
I then ran into the problem that IronPython's file objects don't
implement a "tell" method, so I added one that simply returned
stream.Position.
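For context, here is a minimal CPython sketch of the kind of struct-based parsing described above. The header layout and the `read_header` name are hypothetical. With an explicit byte-order prefix such as `<`, the layout is fully specified by the format string rather than by the host's native byte order, and the parser relies on `tell()` to report offsets.

```python
import io
import struct

# Hypothetical record header: 4-byte magic, little-endian uint16 version,
# little-endian uint32 record count. '<' means little-endian, standard
# sizes, no padding, so HEADER.size is 4 + 2 + 4 = 10.
HEADER = struct.Struct("<4sHI")

def read_header(f):
    start = f.tell()                      # offset where the header begins
    raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("truncated header at offset %d" % start)
    magic, version, count = HEADER.unpack(raw)
    return start, magic, version, count

# Usage: an in-memory stand-in for open(path, "rb").
data = b"DATA" + struct.pack("<HI", 2, 7) + b"payload"
f = io.BytesIO(data)
print(read_header(f))  # (0, b'DATA', 2, 7)
print(f.tell())        # 10
```

A parser like this only works if `tell()` reports the position of the next unread byte, which is exactly what broke below.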

After this effort I fired up my script and was bombarded by all manner
of assertion failures telling me that the file I was parsing was *not*
in a valid format. I double-checked by running the script on the same
file under CPython, which worked without a hitch. After some debugging
it looked like the underlying stream was being repositioned by the
reader (perhaps due to buffering?), which left stream.Position unusable
as a tell() value.
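The symptom is consistent with read-ahead buffering: a buffered reader pulls more bytes from the underlying stream than the caller has consumed, so the raw stream position runs ahead of the logical position. The sketch below uses a hypothetical `ReadAheadReader` class, not IronPython's actual reader, to show that a correct `tell()` has to subtract the bytes still sitting in the buffer.

```python
import io

class ReadAheadReader:
    """Hypothetical buffered reader that fetches fixed-size chunks."""

    def __init__(self, stream, chunk_size=16):
        self._stream = stream
        self._chunk_size = chunk_size
        self._buffer = b""  # bytes read from the stream but not yet returned

    def read(self, n):
        # Refill from the underlying stream until n bytes are buffered or EOF.
        while len(self._buffer) < n:
            chunk = self._stream.read(self._chunk_size)
            if not chunk:
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data

    def tell(self):
        # The underlying stream is ahead by however many bytes are still
        # buffered, so the logical position must subtract them.
        return self._stream.tell() - len(self._buffer)

raw = io.BytesIO(bytes(range(40)))
reader = ReadAheadReader(raw)
reader.read(4)
print(raw.tell())     # 16: the raw position has jumped a whole chunk
print(reader.tell())  # 4: the position the parser actually expects
```

CPython's `io.BufferedReader` performs the same adjustment internally, which is why `tell()` on a file opened in `"rb"` mode reports the logical position rather than the raw one.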
Grokking the PythonFile class showed that binary-mode files were
implemented using a StreamReader (as opposed to a NewLineReader for
text-mode files), which meant that the data was being decoded as text.
That is not particularly useful for binary files and really only serves
to mangle the data into an unusable mess.
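A short illustration of why decoding binary data as text is destructive. The sketch uses Python's UTF-8 codec with `errors="replace"` as a stand-in for whatever decoding the reader performs; that choice of codec is an assumption, since the encoding involved is not stated here.

```python
# Arbitrary binary data: a NUL, two bytes that are invalid as UTF-8, and "A".
data = bytes([0x00, 0xFF, 0x80, 0x41])

# Decode as UTF-8 text, replacing invalid sequences with U+FFFD,
# then encode the text again.
text = data.decode("utf-8", errors="replace")
mangled = text.encode("utf-8")

print(text == "\x00\ufffd\ufffdA")  # True: both invalid bytes became U+FFFD
print(mangled == data)               # False: the original bytes are gone
```

Once the two invalid bytes have been replaced there is no way to recover them, so any format check on the re-encoded data fails.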
In the end I opted to use stream.Read directly to get the original
data out as a byte[], and StringOps.FromByteArray (after making it a
public function, as I couldn't find any other way to turn my byte[]
into a byte-string) to hand this data back to the user in a form they
could use.
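A byte[] to byte-string conversion is lossless if each byte becomes exactly one character whose code point equals the byte value. A minimal Python sketch of that idea follows; the helper names are hypothetical, and this is not a description of what `StringOps.FromByteArray` does internally.

```python
def bytes_to_bytestring(buf):
    """Hypothetical helper: map each byte 0-255 to the character with that code point."""
    return "".join(chr(b) for b in buf)

def bytestring_to_bytes(s):
    """Inverse mapping; raises ValueError if any character is above U+00FF."""
    return bytes(ord(c) for c in s)

buf = bytes(range(256))                      # every possible byte value
s = bytes_to_bytestring(buf)
assert len(s) == len(buf)                    # one character per byte
assert s == buf.decode("latin-1")            # same as a Latin-1 decode
assert bytestring_to_bytes(s) == buf         # lossless round trip
```

The Latin-1 comparison shows why this mapping is a common convention: Latin-1 assigns bytes 0-255 to code points U+0000 to U+00FF one-to-one.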

Now, I'm not sure whether I missed something here, but reading binary
files (I'm not sure about writing; I'm too scared to try) seems to be
rather broken. Another thing that struck me was how tightly IronPython's
"str" type is tied to .NET's string type. I don't know whether there is
some magic deeper down to deal with this, but Python's "str" type is a
byte-string and "unicode" is actual text, whereas .NET's "string" type
is designed to represent text as a sequence of Unicode characters.
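The distinction is easy to see by comparing lengths. This sketch uses Python 3, where `bytes` plays the role the post assigns to "str" and `str` plays the role of "unicode":

```python
# The same two-character text, as text and as UTF-8 bytes.
text = "h\u00e9"                  # "hé": 2 characters
encoded = text.encode("utf-8")    # b'h\xc3\xa9': 3 bytes

print(len(text))     # 2
print(len(encoded))  # 3
# A byte-string counts bytes and a text string counts characters, so the
# two types answer different questions about the same data.
```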

Hopefully I've said something right.
--
Jonathan

When you meet a master swordsman,
show him your sword.
When you meet a man who is not a poet,
do not show him your poem.
                 -- Rinzai, ninth century Zen master



