[Tutor] little something in the way of file parsing

Erik Price erikprice@mac.com
Fri, 19 Jul 2002 16:25:19 -0400


On Friday, July 19, 2002, at 02:49  PM, Sean 'Shaleh' Perry wrote:

> One stanza after another.  As preparation for a tool that would allow me
> to do better things than grep on it I wrote the following bit of python.
>
> I am posting this because it shows some of the powers python has for rapid
> coding and parsing.  Thought some of the lurkers might enjoy it.  Hope
> someone learns something from it.


See, now that's exactly what helps me learn, right there.  I really like 
to read source code written by others, because you can sort of imagine 
what's going through their head as you follow the path of execution 
through the program.  I had never even thought of making an empty class 
definition (and I still don't feel right about it), but I learned about 
the setdefault() method of dictionaries, and a few other things.

The problem with reading source code is usually that it's just way too 
big to be able to just sit down and digest.  Really, I think it's the 
best way to learn (short of actually writing code), but usually I 
download some cool looking program like Zope or Xerces or something like 
that and I just get overwhelmed because the code extends over so many 
files and you don't know where half the references are coming from, et 
cetera.  This is the kind of thing that people can learn from.

I also found it interesting that you didn't use regular expressions 
throughout the whole thing.  Normally when I think "parse", my mind goes 
"regex" and I immediately think "Perl" (yeah I like that language too, 
even though that might make me unpopular on this list ;).  But in this 
case, you didn't need them -- the file's structure was well-organized 
and you were able to use slices of line-strings and split() to grab the 
important parts of each line and place it into a meaningful attribute of 
the "package" object instance.  In fact, in my head I was wondering if 
this isn't a perfect application for an XML file, although there seems 
to be a bit more work involved in defining the structure of an XML 
file....
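
To make the split()-instead-of-regex point concrete, here is a minimal 
sketch of that style of parsing.  The "Tag: value" stanza layout, the 
sample data, and the parse_stanzas() name are my own assumptions for 
illustration, not Sean's actual file or code.

```python
# Hypothetical stanza format: "Tag: value" lines, with a blank line
# between stanzas.  The real file may be laid out differently.
SAMPLE = """\
Package: foo
Version: 1.0
Maintainer: Someone <someone@example.org>

Package: bar
Version: 2.3
"""

def parse_stanzas(text):
    """Return a list of dicts, one per stanza, using only string methods."""
    stanzas = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        tag, value = line.split(":", 1)
        current[tag.strip()] = value.strip()
    if current:
        # The last stanza may not be followed by a blank line.
        stanzas.append(current)
    return stanzas

packages = parse_stanzas(SAMPLE)
print(packages[0]["Package"], packages[1]["Version"])
```

Because every line has the same shape, split(":", 1) does all the work a 
regular expression would otherwise do here.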

> # note I use the readline() idiom because there is currently 10 thousand
> # plus entries in the file which equates to some 90,000 lines.
>
> while 1:
>     line = fd.readline()

I assume that this comment means that you use readline() as opposed to 
readlines()?  In other words, what are you actually saying in this 
comment (or rather, why did you feel the need to clarify why you chose 
readline())?
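
For anyone else following along, here is how I understand the difference 
between the two, as a small sketch.  The io.StringIO object is only a 
stand-in for the real file so the example runs on its own, and the sample 
lines are made up.

```python
import io

# Stand-in for the real file; io.StringIO behaves like an open text file.
fd = io.StringIO("Package: foo\nVersion: 1.0\n\nPackage: bar\n")

# readline() idiom: only one line is held in memory at a time.
count = 0
while True:
    line = fd.readline()
    if not line:
        # An empty string means end of file (a blank line is "\n", not "").
        break
    count += 1

# readlines() reads the whole file and builds a list of every line at once.
fd.seek(0)
all_lines = fd.readlines()

# Iterating over the file object directly is also line-at-a-time.
fd.seek(0)
lazy_count = sum(1 for _ in fd)

print(count, len(all_lines), lazy_count)
```

With some 90,000 lines the readlines() list is probably still manageable, 
but the line-at-a-time loop keeps memory use flat no matter how large the 
file grows.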

>
>     setattr(package, tag, value)
>

It's strange to me to see an empty class definition and then this 
function used to set attributes on its instances.  For you, it was convenient to use 
the Package class to wrap up your data into "packages" (that's a pun 
actually), but aren't class definitions usually used specifically for 
their behaviors?

In other words, I can see that this script works perfectly well for 
your needs, so nothing else needs to be said.  But 
for the sake of my understanding, if a professor of OO programming were 
to come along, wouldn't he suggest that you define some methods of the 
Package class to do some of the work for you, rather than the 
setattr()?  I'm curious because this is a big difference between Python 
and a language like Java.  You get more flexibility with Python, but it 
seems almost like it's too much of a shortcut.
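
Here is a rough sketch of the two styles I have in mind.  The first half 
mirrors the empty-class-plus-setattr() approach as I understand it; the 
PackageWithMethods class, its method names, and the tag normalization are 
purely hypothetical, just one way the "professor" version might look.

```python
class Package:
    """Empty class used as a plain attribute container."""
    pass

pkg = Package()
setattr(pkg, "name", "foo")       # same effect as: pkg.name = "foo"
setattr(pkg, "version", "1.0")    # the attribute name can come from data

class PackageWithMethods:
    """Hypothetical alternative that keeps the behavior inside the class."""

    def __init__(self):
        self.fields = {}

    def set_field(self, tag, value):
        # Normalize the tag so "Version" and "version" land in one place.
        self.fields[tag.lower()] = value.strip()

    def get_field(self, tag, default=None):
        return self.fields.get(tag.lower(), default)

pkg2 = PackageWithMethods()
pkg2.set_field("Version", " 1.0 ")

print(pkg.name, pkg.version, pkg2.get_field("version"))
```

The setattr() version is shorter; the method version gives one place to 
validate or clean up values, which is the trade-off I am asking about.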

> At this point I have a list of package classes and several dictionaries
> holding lists of these packages.  There is only one instance of the actual
> package in memory though, the rest are references handled by python's
> garbage collector.  Most handy.

Is that a specific behavior of list.append() or is that the way that 
references are passed in all Python data structures (by reference, not 
by value)?
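
A small experiment along these lines seems to show it is not special to 
list.append(): containers and plain names all hold references to the same 
object.  The Package class, the "utils" key, and the variable names below 
are made up for the test.

```python
class Package:
    pass

pkg = Package()
pkg.name = "foo"

all_packages = []
by_section = {}

# Both containers store a reference to the same object, not a copy.
all_packages.append(pkg)
by_section.setdefault("utils", []).append(pkg)

# Plain assignment behaves the same way: alias is another name for pkg.
alias = pkg
alias.name = "renamed"

# The change is visible through every reference.
print(all_packages[0].name, by_section["utils"][0].name)
print(all_packages[0] is by_section["utils"][0])
```

The "is" operator compares identity, so it is a handy way to check 
whether two references point at one object.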

Thanks for posting this, Sean.


Erik