Need help with a program

D'Arcy J.M. Cain darcy at druid.net
Thu Jan 28 10:44:37 EST 2010


On Thu, 28 Jan 2010 07:07:04 -0800 (PST)
evilweasel <karthikramaswamy88 at gmail.com> wrote:
> I am a newbie to python, and I would be grateful if someone could

Welcome.

> point out the mistake in my program. Basically, I have a huge text
> file similar to the format below:

You don't say how it isn't working.  As a first step you should read
http://catb.org/~esr/faqs/smart-questions.html.

> The text is nothing but DNA sequences, and there is a number next to
> it. What I will have to do is, ignore those lines that have 0 in it,

Your code doesn't completely ignore them.  See below.

> and print all other lines (excluding the number) in a new text file
> (in a particular format called as FASTA format). This is the program I
> wrote for that:
> 
> seq1 = []
> list1 = []
> lister = []
> listers = []
> listers1 = []
> a = []
> d = []
> i = 0
> j = 0
> num = 0

This seems like an awful lot of variables for such a simple task.

> 
> file1 = open(sys.argv[1], 'r')
> for line in file1:

This is good.  You aren't trying to load the whole file into memory at
once.  If the file is huge as you say then that would have been bad.  I
would have made one small optimization that saves one assignment and
one extra variable.

  for line in open(sys.argv[1], 'r'):

>     if not line.startswith('\n'):
>         seq1 = line.split()
>         if len(seq1) == 0:
>             continue

The startswith('\n') test is redundant and perhaps not even correct at
the end of the file: it assumes that the last line ends with a newline.
Look at what '\n'.split() gives you and see if you can't improve the
above code.

Another small optimization - "if not seq1" is better than
"if len(seq1) == 0".
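
To make the point about split() concrete, here is a minimal sketch.  The
sample lines are made up for illustration and are not taken from your
file.  It shows that split() with no arguments returns an empty list for
blank or whitespace-only lines, with or without a trailing newline, so a
single truth test covers every case.

```python
# str.split() with no arguments splits on runs of whitespace and drops
# empty fields, so blank lines come back as an empty list.
samples = ['\n', '   \n', '', 'ACGT 5\n', 'ACGT 5']  # made-up lines

kept = []
for line in samples:
    seq1 = line.split()
    if not seq1:        # an empty list is false, no len() needed
        continue
    kept.append(seq1)

print(kept)
```

Only the two lines with real data survive, whether or not they end with
a newline.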

> 
>         a = seq1[0]
>     	list1.append(a)

Aha!  I may have found your bug.  Are you mixing tabs and spaces?
Don't do that.  Either always use spaces or always use tabs.  My
suggestion is to use spaces and choose a short indent such as three or
even two but that's a religious issue.

> 
>     	d = seq1[1]
>     	lister.append(d)

You can also do "a, d = seq1".  Of course you must be sure that you
have two fields.  Perhaps that's guaranteed for your input but a quick
sanity test wouldn't hurt here.
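
A sketch of what such a sanity test might look like, assuming each good
line has exactly two whitespace-separated fields.  The function name
parse_line and the sample strings are hypothetical, just for
illustration.

```python
def parse_line(line):
    """Return (sequence, count) as strings, or None for a bad line."""
    fields = line.split()
    if len(fields) != 2:    # blank line, missing count, or extra junk
        return None
    a, d = fields           # safe to unpack now
    return a, d

print(parse_line('ACGT 5\n'))
print(parse_line('ACGT\n'))
```

Returning None is only one choice.  You could just as well raise an
error if a malformed line means the input file is broken.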

However, I don't understand all of the above.  It may also be a source
of problems.  You say the files are huge.  Are you filling up memory
here?  You did the smart thing reading the file but you lose it here.
In any case, see below.

> b = len(lister)
> for j in range(0, b):

Go look up zip().
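
As a rough illustration, zip() walks two lists in step so you don't need
an index at all.  The list contents here are made up, and I am assuming
list1 and lister line up one to one as they do in your script.

```python
list1 = ['ACGT', 'TTGA', 'GGCC']    # made-up sequences
lister = ['5', '0', '12']           # made-up counts, still strings

pairs = list(zip(list1, lister))    # list() only so we can inspect it
for seq, count in pairs:
    print(seq, count)
```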

>     if lister[j] == 0:

I think that you will find that lister[j] is "0", not 0.
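
A quick check of that, using a made-up input line.  Whatever comes out
of split() is a string, and a string never compares equal to an integer.

```python
d = 'ACGT 0'.split()[1]     # made-up line, second field

is_zero_as_int = (d == 0)           # what the script tests
is_zero_as_str = (d == '0')         # compare as a string instead
is_zero_converted = (int(d) == 0)   # or convert first

print(repr(d), is_zero_as_int, is_zero_as_str, is_zero_converted)
```

The first comparison is always False, so nothing is ever appended to
listers.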

>         listers.append(j)
>     else:
>         listers1.append(j)

Why are you collecting the input?  Just toss the '0' ones and write the
other lines directly to the output.
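
Something along these lines, as a sketch only.  You didn't say what the
FASTA header for each sequence should be, so the ">seqN" naming below is
my assumption, as is the function name write_fasta.  The sample data is
made up and uses io.StringIO in place of real files so that it runs on
its own.

```python
import io

def write_fasta(lines, out):
    """Copy sequences with a non-zero count from lines to out."""
    written = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue                # blank or malformed line
        seq, count = fields
        if count == '0':            # the count is a string, not an int
            continue
        written += 1
        # Header naming is an assumption, adjust to what you need.
        out.write('>seq%d\n%s\n' % (written, seq))
    return written

sample = io.StringIO('ACGT 5\n\nTTGA 0\nGGCC 12')   # made-up input
result = io.StringIO()
n = write_fasta(sample, result)
print(result.getvalue(), end='')
```

With real files you would pass the open input file and an output file
opened with 'w' instead of the StringIO objects.  Nothing is stored
beyond the current line, so the size of the input doesn't matter.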

Hope this helps with this script and in further understanding the power
and simplicity of Python.  Good luck.

-- 
D'Arcy J.M. Cain <darcy at druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


