Class data being zapped by method

Wed Aug 9 00:57:07 EDT 2006

Kevin M wrote:
> Inline
>
> > 1.) Why are you removing the .pyc file?
>
> After I had run the script once and subsequently changed the class
> file, I would run the script again, and it would use the pyc file from
> the older revision of the script. I got frustrated with having to
> manually delete the pyc file before rerunning the script after every
> edit, so I built it in.

Huh? Something funny is going on here. (1) You should never need to
delete a pyc file. (2) A script run from the command line doesn't create
a pyc file; only imported modules get one. So: What platform are you
running it on? What version of Python? Are you running the script from
the command line, or in an IDE? Which IDE?
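
For anyone who wants to poke at this, here is a minimal sketch of where the
byte-compiled file for a module ends up. It assumes a current CPython 3,
where the cache lives in a __pycache__ directory (older 2.x versions wrote
the pyc next to the source). The module name testclass.py and the helper
compile_and_locate are made up for illustration. By default the interpreter
compares the source's modification time with the one recorded in the pyc and
recompiles a stale one on import, which is why deleting it by hand should not
be necessary.

```python
import importlib.util
import os
import py_compile
import tempfile


def compile_and_locate(source_text):
    """Write source_text to a temporary module, byte-compile it, and
    return (source_path, pyc_path)."""
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "testclass.py")  # hypothetical module name
    with open(src, "w") as f:
        f.write(source_text)
    # py_compile.compile returns the path of the byte-compiled file.
    pyc = py_compile.compile(src)
    return src, pyc


src_path, pyc_path = compile_and_locate("VALUE = 1\n")
print("source:", src_path)
print("cached:", pyc_path)
# The default location is the same one the import system would use.
print("expected:", importlib.util.cache_from_source(src_path))
```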

>
> > 2.) Reading lines from a file is better done like so:
> >
> > arrLines = open('datafiles/'+filename+'.tabdata').readlines()
> >
> > and the 'r' flag is the default, you can omit it.
>
> I know. In fact, this was the original code. However, I have read in
> many places that if the file is *massive*, which is true in my case, it
> is far more efficient to use the line-by-line implicit method I used.
> On a decent machine it doesn't really make a noticeable difference, but
> some of the ".tabdata" files I'm parsing are > 20MB plain text, so I
> figured that warranted the alternative approach.

Firstly, 20MB is not massive.
Secondly, the problem with large files is keeping a list like arrLines
hanging about when you only need one line at a time.
If you insist on keeping the list, get it in one hit using readlines(),
instead of assembling it yourself manually.
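
To make the distinction concrete, here is a rough sketch of the two
approaches. The function names and the tab-separated demo data are
assumptions for illustration, not the original poster's code. The first
function holds only the current line in memory; the second builds the whole
list in one call.

```python
import os
import tempfile


def count_fields_streaming(path):
    """Process one line at a time; only the current line is kept around."""
    total = 0
    with open(path) as f:
        for line in f:
            total += len(line.rstrip("\n").split("\t"))
    return total


def read_all_lines(path):
    """Get the whole list in one hit, if a list is really what you need."""
    with open(path) as f:
        return f.readlines()


# Small stand-in for a .tabdata file (hypothetical contents).
fd, demo_path = tempfile.mkstemp(suffix=".tabdata")
with os.fdopen(fd, "w") as f:
    f.write("a\tb\tc\n1\t2\t3\n")

field_total = count_fields_streaming(demo_path)
all_lines = read_all_lines(demo_path)
os.remove(demo_path)
```

If each line can be handled and then forgotten, the streaming form is the one
to use; if the list is needed anyway, readlines() does the same job as a
manual append loop with less code.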

>
>
> John M, I realized what you meant about splitting the code between the
> class and the processing file. At first it seemed intuitive, but
> stepping back, it doesn't really make sense that a test would be able
> to analyze and take an inventory of *itself*. I think I'm going to
> reorganize the code such that the Test class does nothing but provide a
> barebones data structure with which to work.
>
> And regarding the file separation, it's totally personal preference. It
> scales well, at least.

One class per file does *not* scale well, even when you're not being
driven silly by having to flick backwards & forwards furiously between
files :-)

>
> Another Aside --
>
> Does anybody see any major bottlenecks in the code? I'd like to be able
> to speed up the program considerably. Psyco was really no help.
>

1. Get it refactored and working correctly first. "except IndexError:
pass" certainly won't help you win the Grand Prix; how many of those
were you doing per 20MB of data?
2. How long does it take to run? How long would you prefer it to take?
How many lines x how many tests == 20 MB?
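
As a starting point for measuring rather than guessing, here is a small
sketch using the standard timeit module. The two helper functions are
hypothetical stand-ins, since the original parsing code is not shown here:
one swallows IndexError on short rows, the other checks the length first.
Raising and catching an exception tends to cost more than a simple length
test in CPython, but the numbers on your machine and data are what count.

```python
import timeit


def third_field_try(fields):
    """Hypothetical: rely on the exception when the row is too short."""
    try:
        return fields[2]
    except IndexError:
        return None


def third_field_check(fields):
    """Hypothetical: test the length instead of raising."""
    return fields[2] if len(fields) > 2 else None


short_row = ["a", "b"]  # a row that would trigger the IndexError path

t_try = timeit.timeit(lambda: third_field_try(short_row), number=100000)
t_check = timeit.timeit(lambda: third_field_check(short_row), number=100000)
print("try/except: %.3fs  length check: %.3fs" % (t_try, t_check))
```

For the whole program, running it under the standard profiler
(python -m cProfile -s cumulative yourscript.py, with your own script name)
will show where the time actually goes.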

Cheers,
John