[PythonCAD] seek to entity

Eric Wilhelm ewilhelm at sbcglobal.net
Mon Oct 6 16:16:59 EDT 2003


> The following was supposedly scribed by
> Art Haas
> on Monday 06 October 2003 10:21 am:

>The one thing my current design does do is cache the objects after it
>reads them, so there is a price in memory to be paid. My thinking was
>that it would be good to save them so that multiple calls of
>getObjects() would not require scanning the file again. Time will tell
>if this choice is a good path to take.

IMO, it isn't worth the memory to cache the entire object.  

I'm still just guessing that you plan to load the data into another 
structure before it shows up onscreen (is this correct?).  If so, there will 
never be a second call to getObjects().  And if the caller wants some kind of 
selective read, getObjects() has already read all of the data before the 
caller has a chance to skip the parts it doesn't want to load.

From my reading of your code, it seems that the getObjects() in dwgbase.py 
makes a call to r15_read_objects() (let's say we're just talking about r15 
files for now).  

r15_read_objects() then calls a few other functions and passes 
(_handle,_omap,_cmap) to read_objects(), which then runs through the list of 
objkeys.  Among other things, this function seeks to the "_last_loc" in the 
filehandle and gets an entity handle.  It then goes on to process the entity 
and load it into the return list with _objlist.append(_ent).

In the interest of reducing the amount of memory used and also reducing the 
amount of duplicated code in the library, I suggest that the calling program 
be allowed to fly a little closer to the raw data (but with a unified 
interface like the one provided by dwgbase.py.)

In my example, I showed a function call which initialized the entity list.  
This would perform many of the functions of your r15_read_objects() function 
(everything that has to do with calculating offsets and the mapping of the 
file: cmap, omap, sections, etc.).  The other task would be to set the 
_last_loc variable to the beginning of the object list.

So, the program would call dwg.getObjectInit() and then loop, in pseudocode:
    while enthandle, type = dwg.getNextObj()

I guess that getNextObj() would basically do what happens inside of your 
"for _obj in _objkeys" loop (inside of read_objects()).  The speed benefit is 
that the caller no longer makes a second pass over a returned list; you might 
also save yourself some code, and you will save memory as well.
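
Roughly, the calling loop could look like the sketch below.  Everything in it 
is an assumption for illustration: SketchReader, the 6-byte record layout and 
the offsets list are made up and are not the real DWG format or the existing 
dwgbase.py code.  Only the two method names come from the proposal above.

```python
import io
import struct

# Hypothetical record layout, for illustration only (NOT the real DWG format):
# a 4-byte little-endian handle followed by a 2-byte type code.
RECORD = struct.Struct("<IH")


class SketchReader:
    """Stand-in for a dwg reader; not PythonCAD's actual API."""

    def __init__(self, fileobj, offsets):
        self._fh = fileobj
        self._offsets = offsets  # plays the role of the object map
        self._index = 0

    def getObjectInit(self):
        # Point the cursor at the first entry of the object map.
        self._index = 0

    def getNextObj(self):
        # Return (handle, type) for the next object, or None when exhausted.
        if self._index >= len(self._offsets):
            return None
        self._fh.seek(self._offsets[self._index])
        self._index += 1
        return RECORD.unpack(self._fh.read(RECORD.size))


def make_demo():
    # Build a tiny in-memory "file" holding three fake records.
    records = [(10, 1), (11, 2), (12, 1)]
    data = b"".join(RECORD.pack(h, t) for h, t in records)
    offsets = [i * RECORD.size for i in range(len(records))]
    return SketchReader(io.BytesIO(data), offsets)


dwg = make_demo()
dwg.getObjectInit()
seen = []
while True:
    item = dwg.getNextObj()
    if item is None:
        break
    enthandle, enttype = item
    seen.append((enthandle, enttype))  # the caller decides what to keep
```

Only one record is held by the reader at a time; whatever gets kept is up to 
the calling program.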

With my benchmark 9.8MB pile of circles (250000 of them), it takes about 3 
minutes and 292MB of RAM to call objs = dwg.getObjects().  I would still 
have to make my way through that list and load all of this info into another 
data structure before the dwg object goes out of scope and the memory gets 
freed.

I'll try to look at the code some more and see if I can come up with a good 
way to do it, but my feeling is that you will be better served by providing a 
consistent interface to what is happening in the loop of read_objects().  
This will make the whole thing scalable and the efficiency can be gained in 
the code which lies on the next layer above the library (at the price of 
repeated function calls rather than possibly huge amounts of duplicated 
memory.)
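
As an example of what that next layer might do, here is a small sketch of a 
selective read built on a getNextObj()-style call.  The (handle, type) tuple 
shape, the CIRCLE type code and fake_get_next() are all hypothetical stand-ins, 
not anything in the current library.

```python
from typing import Callable, Iterator, Optional, Tuple

Entity = Tuple[int, int]  # (handle, type code); a hypothetical shape


def iter_objects(get_next: Callable[[], Optional[Entity]]) -> Iterator[Entity]:
    """Wrap a getNextObj()-style callable in a generator."""
    while True:
        item = get_next()
        if item is None:
            return
        yield item


# Stand-in data source so the sketch runs without a real file.
_records = iter([(10, 1), (11, 2), (12, 1)])


def fake_get_next() -> Optional[Entity]:
    return next(_records, None)


CIRCLE = 1  # made-up type code, for illustration only

# Keep only the handles of the entities the caller cares about.
circles = [handle for handle, etype in iter_objects(fake_get_next)
           if etype == CIRCLE]
```

The cost is one function call per entity, but nothing the caller skips ever 
ends up in a list.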

--Eric
-- 
"...our schools have been scientifically designed to 
prevent overeducation from happening."
                                        --William Troy Harris



