[PythonCAD] seek to entity

Art Haas ahaas at airmail.net
Mon Oct 6 18:30:35 EDT 2003


On Mon, Oct 06, 2003 at 03:16:59PM -0500, Eric Wilhelm wrote:
> 
> [ ... snip ... ]
> >The one thing my current design does do is cache the objects after it
> >reads them, so there is a price in memory to be paid. My thinking was
> >that it would be good to save them so that multiple calls of
> >getObjects() would not require scanning the file again. Time will tell
> >if this choice is a good path to take.
> 
> IMO, it isn't worth the memory to cache the entire object.  
> 
> I'm still just guessing that you have a plan to load the data into another 
> structure before it shows up onscreen (is this correct?)  If this is the 
> case, there will never be another call to getObjects() and if you are doing 
> some kind of selective read, it is reading all of the data before you have a 
> chance to skip the data which you don't want to load.
> 

I've not really thought about how I want to get the data into PythonCAD
just yet. One idea is to write out a temporary file in PythonCAD's
format and read it in. Another idea is to write out the XML data
directly into PythonCAD.

I didn't see re-using the object data for the purpose I had in mind, but
I thought someone else may want to do that, so I coded things as you see
them at present. It may end up not being worthwhile to do this, and if
someone wants to cache the info they'll have to do it from their code
and not from the Dwg interface. See my comments below ...
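
If a caller does want to reuse the data, the caching can live in the calling code rather than in the Dwg interface. Below is a minimal sketch of that split. It assumes only a reader object with a getObjects() method, the name used in this thread. CachingReader and FakeDwg are hypothetical stand-ins and not part of the library.

```python
class CachingReader:
    """Wrap a reader and keep the result of its first getObjects() call.

    The cache lives in the caller's code, not in the Dwg interface.
    """

    def __init__(self, reader):
        self._reader = reader
        self._cache = None

    def getObjects(self):
        # Only the first call scans the file; later calls reuse the list.
        if self._cache is None:
            self._cache = list(self._reader.getObjects())
        return self._cache


class FakeDwg:
    """Hypothetical stand-in for a Dwg reader that counts file scans."""

    def __init__(self, entities):
        self._entities = entities
        self.scans = 0

    def getObjects(self):
        self.scans += 1
        return list(self._entities)


dwg = FakeDwg(["circle", "line", "arc"])
cached = CachingReader(dwg)
first = cached.getObjects()
second = cached.getObjects()
print(first, dwg.scans)  # ['circle', 'line', 'arc'] 1
```

Callers that never repeat the call simply use the reader directly and never pay for the cache.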

> From my reading of your code, it seems that the getObjects() in dwgbase.py 
> makes a call to r15_read_objects() (let's say we're just talking about r15 
> files for now).  
> 
> r15_read_objects() then calls a few other functions and passes 
> (_handle,_omap,_cmap) to read_objects(), which then runs through the list of 
> objkeys.  Among other things, this function seeks to the "_last_loc" in the 
> filehandle and gets an entity handle.  It then goes on to process the entity 
> and load it into the return list with _objlist.append(_ent)
> 

That sounds about right. The '_last_loc' is the offset in the file where
the entity data resides - this value comes from the object map section.
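
The eager pattern described above can be sketched roughly as follows: for each entry in the object map, seek to its offset, decode the entity, and append it to one list. The record layout (1-byte type, 2-byte length, payload) and the type codes are invented for illustration. Real DWG object data is bit-coded, and read_record/build_file are hypothetical helpers, not the library's functions.

```python
import io
import struct

# Toy type codes for this example only; not real DWG type codes.
CIRCLE, LINE = 1, 2


def read_record(fh, offset):
    """Decode one toy record at `offset`: 1-byte type, 2-byte length, payload."""
    fh.seek(offset)
    etype, size = struct.unpack("<BH", fh.read(3))
    return etype, fh.read(size)


def read_objects(fh, omap):
    """Eagerly decode every object listed in `omap` (handle -> file offset)."""
    objlist = []
    for handle in sorted(omap):
        last_loc = omap[handle]  # offset taken from the object map
        etype, data = read_record(fh, last_loc)
        objlist.append((handle, etype, data))
    # Every decoded entity is held in memory before anything is returned.
    return objlist


def build_file(records):
    """Pack (handle, type, payload) triples into a toy file and object map."""
    buf = io.BytesIO()
    omap = {}
    for handle, etype, payload in records:
        omap[handle] = buf.tell()
        buf.write(struct.pack("<BH", etype, len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf, omap


fh, omap = build_file([(1, CIRCLE, b"circle"), (2, LINE, b"line")])
objs = read_objects(fh, omap)
print(objs)  # [(1, 1, b'circle'), (2, 2, b'line')]
```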

> In the interest of reducing the amount of memory used and also reducing the 
> amount of duplicated code in the library, I suggest that the calling program 
> be allowed to fly a little closer to the raw data (but with a unified 
> interface like the one provided by dwgbase.py.)
> 
> In my example, I showed a function call which initialized the entity list.  
> This would perform many of the functions of your r15_read_objects() function 
> (everything that has to do with calculating offsets and mapping the 
> file: cmap, omap, sections, etc.). The other task would be to set the 
> _last_loc variable to the beginning of the object list.
> 
> So, the program would call dwg.getObjectInit() and then have a loop of:
> while enthandle, type = dwg.getNextObj()
> 
> I guess that getNextObj() would basically do what happens inside of your 
> "for _obj in _objkeys" loop (inside of read_objects()).  The speed benefit of 
> this is that you save a loop, but you might also save yourself some code, and 
> you will save memory as well.

That loop would end up being moved into a function that reads the
entities one at a time. Looking at the code with that idea in mind, a
small cleanup would be to make the extended entity data reading its own
function as well.
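
In Python, a generator is one natural way to express the proposed getObjectInit()/getNextObj() pair. Creating the generator does the setup, and each next() call returns one entity. The loop ends cleanly when the entities run out, with no sentinel value needed. The sketch below also splits extended entity data reading into its own function. The toy record layout (type, length, payload, then a length-prefixed EED block) and all function names here are assumptions for illustration, not the library's API.

```python
import io
import struct


def read_eed(fh):
    """Read the extended entity data block as a separate step.

    Toy layout (an assumption): 2-byte length followed by that many bytes.
    """
    (size,) = struct.unpack("<H", fh.read(2))
    return fh.read(size)


def read_entity(fh, offset):
    """Decode one toy entity at `offset`: type, length, payload, then EED."""
    fh.seek(offset)
    etype, size = struct.unpack("<BH", fh.read(3))
    payload = fh.read(size)
    return etype, payload, read_eed(fh)


def iter_objects(fh, omap):
    """Yield (handle, type, payload, eed) one entity at a time.

    Only the current entity is held here; the caller decides whether to
    keep it, convert it, or drop it before the next one is read.
    """
    for handle in sorted(omap):
        etype, payload, eed = read_entity(fh, omap[handle])
        yield handle, etype, payload, eed


# Build a small in-memory file with two entities.
buf = io.BytesIO()
omap = {}
for handle, etype, payload, eed in [(7, 1, b"c1", b""), (8, 1, b"c2", b"xd")]:
    omap[handle] = buf.tell()
    buf.write(struct.pack("<BH", etype, len(payload)) + payload)
    buf.write(struct.pack("<H", len(eed)) + eed)

objects = iter_objects(buf, omap)  # roughly getObjectInit()
for handle, etype, payload, eed in objects:  # each step is roughly getNextObj()
    print(handle, etype, payload, eed)
```

Because each entity is read with its own seek, the generator can be restarted simply by creating a new one.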

> With my benchmark 9.8MB pile of circles (250000 of them), it takes about 3 
> minutes and 292MB of ram to call objs = dwg.getObjects() and I would still 
> have to make my way through that list and load all of this info into another 
> data structure before the dwg object goes out of scope and the memory gets 
> freed.

That is a lot of memory - more than my machine has in both real and
swap. Ugh ...
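
Some rough arithmetic on the benchmark figures shows the scale of the overhead. The sketch treats one MB as 10^6 bytes. With binary megabytes the per-circle numbers become about 1225 and 41 bytes, but the ratio is unchanged. The 292MB figure presumably covers the whole process, so the in-memory number is an upper bound on what each cached circle costs.

```python
def per_entity_bytes(total_mb, count, mb=10**6):
    """Average bytes per entity, treating one MB as `mb` bytes."""
    return total_mb * mb / count


circles = 250_000
in_memory = per_entity_bytes(292, circles)  # RAM reported for getObjects()
on_disk = per_entity_bytes(9.8, circles)  # size of the DWG file
print(round(in_memory), round(on_disk, 1), round(in_memory / on_disk, 1))
# 1168 39.2 29.8
```

In other words, each circle appears to take roughly 30 times more space once it is cached as a Python object than it does in the file.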

I'll look at modifying the R15 reader to do entity-at-a-time reading,
and we'll see what comes out. The R13/R14 reader I'll leave alone until we
get something that looks reasonable, then I'll change that one as well.
It will probably take a day or two for the new code to appear.
 
> I'll try to look at the code some more and see if I can come up with a good 
> way to do it, but my feeling is that you will be better served by providing a 
> consistent interface to what is happening in the loop of read_objects().  
> This will make the whole thing scalable and the efficiency can be gained in 
> the code which lies on the next layer above the library (at the price of 
> repeated function calls rather than possibly huge amounts of duplicated 
> memory.)
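
Moving the efficiency into the layer above the library could look something like this. The caller walks the entities one at a time, reads only each entity's type first, skips the ones it does not want without decoding their payload, and copies the rest into its own structure. This is a minimal sketch under assumptions: the toy record layout, the type codes, and the names iter_selected/load_circles are all hypothetical. A real DWG reader would have to decode the object's size and type from its bit-coded header.

```python
import io
import struct

# Toy type codes for this example only.
CIRCLE, LINE = 1, 2


def iter_selected(fh, omap, wanted):
    """Yield (handle, payload) only for entities whose type is in `wanted`.

    The type is read before the payload, so unwanted entities are skipped
    without their payload ever being read.
    """
    for handle in sorted(omap):
        fh.seek(omap[handle])
        etype, size = struct.unpack("<BH", fh.read(3))
        if etype not in wanted:
            continue  # skip without decoding the payload
        yield handle, fh.read(size)


def load_circles(fh, omap):
    """Caller-side loop: copy wanted entities into the caller's own dict."""
    drawing = {}
    for handle, payload in iter_selected(fh, omap, {CIRCLE}):
        drawing[handle] = payload.decode("ascii")
    return drawing


def build_file(records):
    """Pack (handle, type, payload) triples into a toy file and object map."""
    buf = io.BytesIO()
    omap = {}
    for handle, etype, payload in records:
        omap[handle] = buf.tell()
        buf.write(struct.pack("<BH", etype, len(payload)) + payload)
    buf.seek(0)
    return buf, omap


fh, omap = build_file([(1, CIRCLE, b"r=5"), (2, LINE, b"line"), (3, CIRCLE, b"r=7")])
drawing = load_circles(fh, omap)
print(drawing)  # {1: 'r=5', 3: 'r=7'}
```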

Let me know what you come up with. Thanks for the feedback and
suggestions, and when I've got a new version of the code to send out
I'll post a mail to the mailing list.

Art

-- 
Man once surrendering his reason, has no remaining guard against absurdities
the most monstrous, and like a ship without rudder, is the sport of every wind.

-Thomas Jefferson to James Smith, 1822


