Trying to run a program within a Python script on multiple output files created by the same script

Unaiza Batool ubatool at ufl.edu
Mon Apr 24 15:16:34 EDT 2017


On Monday, April 24, 2017 at 2:44:38 PM UTC-4, Peter Otten wrote:
> ubatool at ufl.edu wrote:
> 
> > I'm writing a script that takes two command line options, a file
> > containing barcodes and a file containing sequences. I've managed to
> > create output files for each barcode with the matching and corresponding
> > sequences in it.
> > 
> > For the next part of my script, I'm trying to create more output files by
> > using the output files created earlier as input files. So my first output
> > files contain sequences in a simple format. Now each needs to be
> > converted to fasta, then have the mafft and quicktree commands run on it.
> > 
> > However, this part only converts one of the original output files into
> > fasta, running mafft and quicktree on that one.
> 
> You need to make the conversions while you're in the 'for barcode ...' loop.
> To make that as simple as possible put the code for one conversion into a 
> function, e.g.
> 
> > cmd = "mafft %s > %s" % (fastafname,mafftfname)
> > sys.stderr.write("command: %s\n" % cmd)
> > os.system(cmd)
> > sys.stderr.write("command done\n")
> 
> may become (all code untested)
> 
> def fasta_to_mafft(infile, outfile):
>     cmd = "mafft %s > %s" % (infile, outfile)
>     print >> sys.stderr, "running", cmd
>     os.system(cmd)
> 
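
A possible variant of the helper above, sketched under the assumption of Python 3.5 or later (the thread itself uses Python 2). subprocess.run can send the command's standard output straight to a file, so no shell redirection is involved, and a failing command raises an error instead of passing silently. The name run_to_file is hypothetical; that mafft writes its alignment to standard output is taken from the "mafft %s > %s" command quoted above.

```python
import subprocess
import sys


def run_to_file(args, outfile):
    """Run a command (a list of arguments) and write its stdout to outfile."""
    sys.stderr.write("running %s > %s\n" % (" ".join(args), outfile))
    with open(outfile, "w") as out:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(args, stdout=out, check=True)


def fasta_to_mafft(infile, outfile):
    # Same effect as the shell command: mafft infile > outfile
    run_to_file(["mafft", infile], outfile)
```

Passing the arguments as a list also means file names containing spaces need no extra quoting.
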
> > #!/usr/bin/python
> > 
> > import sys
> > import os
> > 
> > 
> > fname           = sys.argv[2]
> > barcodefname  = sys.argv[1]
> > 
> > barcodefile = open(barcodefname, "r")
> > for barcode in barcodefile:
> >         barcode = barcode.strip()
> >         print "barcode: %s" %  barcode
> >         infname = "%s.%s" % (fname,barcode)
> >         inf = open(infname, "w")
> >         handle1 = open(fname, "r")
> >         for lines in handle1:
> >                 seqid = lines[0:3]
> >                 i = 4
> >                 potential_barcode = lines[i:(i+len(barcode))]
> >                 if potential_barcode == barcode:
> >                         outseq = lines[i+len(barcode):]
> >                         sys.stdout.write(outseq)
> >                         inf.write("%s %s%s" % (seqid,barcode,outseq))
> >         handle1.close()
> >         inf.close()
> 
> At this point infname has the correct value, the file of that name exists, 
> so you can perform your conversions
> 
>           simple_to_fasta(infname, infname + ".fasta")
>           fasta_to_mafft(infname + ".fasta", infname + ".mafft")
>           ...

I'm confused here, as the script gives an error saying simple_to_fasta and fasta_to_mafft are not defined. How do I combine the infile and outfile parameters with the conversion code? You said it should go in the 'for barcode' loop. Or should it just go after infname has its correct value, or just before each command where the new output file is needed?
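
For reference, the "not defined" message is a NameError: Python only knows a function once its def statement has been executed, and simple_to_fasta is not shown anywhere in the thread, so it has to be written and placed above the loop that calls it. The following is a minimal sketch, assuming Python 3. The body of simple_to_fasta is adapted from the "convert simple to fasta" part of the quoted script, and convert_all is a hypothetical driver standing in for the 'for barcode' loop.

```python
def simple_to_fasta(infile, outfile):
    """Convert 'seqid sequence' lines into FASTA records."""
    with open(infile, "r") as handle, open(outfile, "w") as outf:
        for line in handle:
            linearr = line.split()
            if len(linearr) < 2:
                continue  # skip blank or incomplete lines
            outf.write(">%s\n%s\n" % (linearr[0], linearr[1]))


def convert_all(fname, barcodes):
    """Hypothetical driver: one conversion per barcode, inside the loop."""
    fasta_names = []
    for barcode in barcodes:
        infname = "%s.%s" % (fname, barcode)
        # ... the code that writes infname goes here, as in the script ...
        fastafname = infname + ".fasta"
        # infname is complete at this point, so it can be converted now
        simple_to_fasta(infname, fastafname)
        fasta_names.append(fastafname)
    return fasta_names
```

The infile and outfile parameters are simply the names the function uses internally; the call inside the loop supplies the actual file names for the current barcode.
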
> 
> Using functions has the big advantage that you can test them individually 
> before putting them all together, and you can replace one implementation 
> with a better one without touching the rest of your code.
> 
> > barcodefile.close()
> > 
> > infname    = "%s.%s" %(fname,barcode)
> > fastafname = infname + ".fasta"
> > mafftfname = fastafname + ".mafft"
> > stfname    = infname + ".stock"
> > 
> > 
> > # convert simple to fasta #
> > for file in infname
> >         handle = open(infname, "r")
> >         outf   = open(fastafname, "w")
> >         for line in handle:
> >                 linearr = line.split()
> >                 seqids = linearr[0]
> >                 seq   = linearr[1]
> >                 outf.write(">%s\n%s\n" % (seqids,seq))
> >         handle.close()
> >         outf.close()
> > 
> > 
> > # align using mafft #
> > 
> > 
> > cmd = "mafft %s > %s" % (fastafname,mafftfname)
> > sys.stderr.write("command: %s\n" % cmd)
> > os.system(cmd)
> > sys.stderr.write("command done\n")
> > 
> > 
> > #convert fasta alignment to stockholm
> > # fasta_to_stockholm seq.data.txt.fasta.mafft > TEST.stockholm
> > cmd = "fasta_to_stockholm %s > %s" % (mafftfname, stfname)
> > sys.stderr.write("command: %s\n" % cmd)
> > os.system(cmd)
> > sys.stderr.write("command done\n")
> > 
> > 
> > #run quicktree to get distance matrix
> > # quicktree -out m TEST.stockholm
> > cmd = "quicktree -out m %s" % stfname
> > sys.stderr.write("Command: %s\n" % cmd)
> > os.system(cmd)
> > sys.stderr.write("command done\n")
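
The three external commands quoted above could be chained per barcode file along these lines. This is only a sketch, assuming Python 3.5 or later and file names without spaces or shell metacharacters. The function names pipeline_commands and run_pipeline are hypothetical; the command strings and file-name suffixes are copied from the quoted script.

```python
import subprocess
import sys


def pipeline_commands(infname):
    """Return the shell commands for one per-barcode file, in order."""
    fastafname = infname + ".fasta"
    mafftfname = fastafname + ".mafft"
    stfname = infname + ".stock"
    return [
        "mafft %s > %s" % (fastafname, mafftfname),
        "fasta_to_stockholm %s > %s" % (mafftfname, stfname),
        "quicktree -out m %s" % stfname,
    ]


def run_pipeline(infname):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in pipeline_commands(infname):
        sys.stderr.write("command: %s\n" % cmd)
        # shell=True keeps the '>' redirection; check=True raises on failure
        subprocess.run(cmd, shell=True, check=True)
        sys.stderr.write("command done\n")
```

Calling run_pipeline(infname) inside the 'for barcode' loop, after the fasta file for that barcode has been written, would process every per-barcode file rather than only the last one.
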



