[Python-Dev] adding Construct to the standard library?

tomer filiba tomerfiliba at gmail.com
Tue Apr 18 17:25:38 CEST 2006


>
> Indeed, I wish I had known about this a year ago; it would have saved me a
> lot of work.  Of course it probably didn't exist a year ago...  :(


well, yeah. many people need "parsing-abilities", but they resort to ad-hoc
parsers using struct/some ad-hoc implementation of their own. there clearly
is a need for a generic, strong, and extensible parsing/building mechanism.

> Well, declarative is less flexible.  OTOH declarative is nice in the way it
> is more readable and allows more optimisations.
>

i don't think "less flexible" is the term. it's certainly different, but if
you need something specific, you can always subclass a construct on your
own. other than that, being declarative means easy to
read/write/maintain/debug/upgrade (to a newer version of the library).

>   IMHO, at least in theory Construct could have small but fast C extension
> to take care of the encoding and decoding, which is the critical path.
> Everything else, like the declaration part, can be python, as it is usually
> done once on application startup.
>
well, i expected the encodings package to have a str.encode("bin") and
str.decode("bin")... for some reason there's no such codec. it's a pity.
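
for illustration, here is a minimal sketch of what such a codec could do, assuming "bin" means expanding every byte into eight '0'/'1' characters. the helper names encode_bin and decode_bin are hypothetical; they are not part of the encodings package or of Construct.

```python
def encode_bin(data: bytes) -> str:
    """Expand each byte into eight '0'/'1' characters, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def decode_bin(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters back into bytes (inverse of encode_bin)."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


bits = encode_bin(b"\x05\xff")      # one character per bit
restored = decode_bin(bits)         # round-trips to the original bytes
```

note that the expanded form takes eight characters per input byte, which is the memory cost a bit-oriented stream pays.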

> This is a very nice library indeed.  But the number one feature that I need
> in something like this would be to use C.  That's because of my application
> specific requirements, where i have observed that repeatedly using
> struct.pack/unpack and reading bytes from a stream represents a
> considerable CPU overhead, whereas the same thing in C would be ultra fast.
>
well, you must have the notion of a "stream", i.e., go back and forth, be
able to read/write bits/bytes at arbitrary locations, etc. i thought of
moving the library to pyrex, and compiling it, but the number of critical
parts is very small -- basically only the Repeater class could be improved
by writing it in C.
i mean, most of the time is consumed at creating objects in the objects
tree, etc. for example, the Struct class simply iterates over the nested
constructs and parses each of them in sequence. doing a pythonic
iteration or a C-level iteration over a pythonic object is practically the
same.
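
to make the Struct point concrete, here is a minimal sketch of that kind of iteration. the class names (ByteField, ShortField, SimpleStruct) are made up for illustration and are not Construct's actual API; the only assumption is that every nested object has a name and a parse(stream) method.

```python
import io
import struct


class ByteField:
    """Hypothetical field: one unsigned byte."""
    def __init__(self, name):
        self.name = name

    def parse(self, stream):
        return struct.unpack("B", stream.read(1))[0]


class ShortField:
    """Hypothetical field: one big-endian unsigned 16-bit integer."""
    def __init__(self, name):
        self.name = name

    def parse(self, stream):
        return struct.unpack(">H", stream.read(2))[0]


class SimpleStruct:
    """Hypothetical container: parses its nested fields one after another."""
    def __init__(self, name, *subcons):
        self.name = name
        self.subcons = subcons

    def parse(self, stream):
        result = {}
        for sub in self.subcons:          # plain iteration over Python objects
            result[sub.name] = sub.parse(stream)
        return result


header = SimpleStruct("header", ByteField("version"), ShortField("length"))
parsed = header.parse(io.BytesIO(b"\x01\x00\x05"))
```

the loop body only dispatches to sub.parse, which is itself a python method, so moving the loop alone to C would still call back into python for every field.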

> If you agree to go down this path I might even be able to volunteer some of
> my time to help, but it's not my decision.
>
well, mainly i'm looking for ideas. just moving it to C wouldn't be too
helpful.
some ideas i have:
* making the stream work with bytes instead of bits, so that memory
consumption would decrease 8-fold... but then parsing unaligned fields
(either by size or position) is gonna be a headache
* unifying the context tree with the parsing/building tree, to create less
objects on the fly (but it has some issues)
* using lambda functions for meta expressions, instead of eval(string) --
perhaps it's faster, but lambda is getting deprecated by python3k :(
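
the last idea can be sketched as follows, assuming a meta expression is something that computes a value (say, a repeat count) from already-parsed fields held in a context dict. the names context, eval_meta and call_meta are hypothetical and only illustrate the two styles; this is not how Construct itself is written.

```python
# Hypothetical context: values parsed so far, keyed by field name.
context = {"length": 3}


def eval_meta(expression: str, ctx: dict):
    """String style: the expression text is evaluated on every call."""
    return eval(expression, {}, ctx)


def call_meta(function, ctx: dict):
    """Callable style: the function body was compiled once, at definition time."""
    return function(ctx)


count_from_string = eval_meta("length * 2", context)
count_from_lambda = call_meta(lambda ctx: ctx["length"] * 2, context)
```

both styles give the same answer; which one is faster is best measured with timeit on real declarations rather than guessed. for what it's worth, lambda did remain part of the language in python 3.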

apart from that, i rely on inheritance in many places. if some classes are
written in C and some in python, i'm not sure how it could work (can a C
class inherit a pythonic one? would it be easy to extend?). and, that means
users would have to compile the C sources, while now all they have to do is
extract a zip file. and then i'd have to write makefiles, and maintain those
also... it's getting dirty. i like the painless "unzip-and-use"
installation.

so if you have ideas, i'd be happy to hear those.
thanks,

-tomer

On 4/18/06, Gustavo Carneiro <gjcarneiro at gmail.com> wrote:
>
> > why include Construct?
> > * the struct module is very nice, but very limited and non-pythonic as
> > well
> > * pure python (no platform/security issues)
> >
>
>   IMHO this is a drawback.  More on this below.
>
> > * lots of people need to parse and build binary data structures, it's not
> > an esoteric library
> > * license: public domain
> > * quite a large user base for such a short time (proves the need of the
> > community)
> >
>
>   Indeed, I wish I had known about this a year ago; it would have saved me
> a lot of work.  Of course it probably didn't exist a year ago...  :(
>
>
> > * easy to use and extend (follows the componentization pattern)
> > * declarative: you don't need to write executable code for most cases
> >
>
>   Well, declarative is less flexible.  OTOH declarative is nice in the way
> it is more readable and allows more optimisations.
>
> > why not:
> > * the code is (very) young. stable and all, but less than a month on the
> > loose.
> > * new features may still be added / existing ones may be changed in a
> > non-backwards-compatible manner
> >
> > so why am i saying this now, instead of waiting a few months for it to
> > mature?
> > well, i wanted to get feedback. those of you who have seen/used the
> > library, please tell me what you think:
> > * is it suitable for a standard library?
> >
> > * what more features would you want?
> > * any changes you think are necessary?
> >
>
>   This is a very nice library indeed.  But the number one feature that I
> need in something like this would be to use C.  That's because of my
> application specific requirements, where i have observed that repeatedly
> using struct.pack/unpack and reading bytes from a stream represents a
> considerable CPU overhead, whereas the same thing in C would be ultra fast.
>
>   IMHO, at least in theory Construct could have small but fast C extension
> to take care of the encoding and decoding, which is the critical path.
> Everything else, like the declaration part, can be python, as it is usually
> done once on application startup.
>
>   If you agree to go down this path I might even be able to volunteer some
> of my time to help, but it's not my decision.
>
>   Best regards.
>