[Tutor] python pipeline

jarod_v6 at libero.it jarod_v6 at libero.it
Mon Sep 1 16:58:48 CEST 2014


Dear all,
I am trying to write a pipeline that starts from a CSV file in which I list
the names and paths of my files.
example.csv
Name,FASTQ1,FASTQ2,DIRECTORY
sampleA,A_R1_.fastq.gz,A_R2_.fastq.gz,108,~/FASTQ/
sampleB,B_R1_.fastq.gz,B_R2_.fastq.gz,112,~/FASTQ/
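
A minimal sketch of reading such a file with the standard csv module, assuming Python 3 and that every data row has the five fields shown above. The header names only four columns, so the fourth field gets the hypothetical name `extra` here; `Sample` and `read_samples` are illustrative names too, not part of the original script.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type; "extra" stands for the unnamed fourth field.
Sample = namedtuple("Sample", "name fastq1 fastq2 extra directory")

def read_samples(handle):
    """Return one Sample per data row, skipping the header line."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [Sample(*row) for row in reader if row]

# In-memory copy of the example rows, so the sketch runs without a file.
example = io.StringIO(
    "Name,FASTQ1,FASTQ2,DIRECTORY\n"
    "sampleA,A_R1_.fastq.gz,A_R2_.fastq.gz,108,~/FASTQ/\n"
    "sampleB,B_R1_.fastq.gz,B_R2_.fastq.gz,112,~/FASTQ/\n"
)
samples = read_samples(example)
print(samples[0].name, samples[0].directory)
```

With a real file, `read_samples(open("example.csv"))` would take the place of the in-memory text.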



For each sample in that list I need to run three different scripts, which
depend on one another. So I need to run the first one and, only when it has
finished, start the second and then the third.
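
One way to get this ordering is `subprocess.check_call`, which waits for a command to finish and raises an error if it fails, so a later step never starts after a failed one. The sketch below assumes Python 3; `run_in_order` is a hypothetical helper, and the three tiny Python commands are stand-ins for the real scripts.

```python
import subprocess
import sys

def run_in_order(commands):
    """Run each command in turn; stop at the first one that fails."""
    for cmd in commands:
        # check_call blocks until the command exits and raises
        # CalledProcessError on a non-zero exit status.
        subprocess.check_call(cmd)

# Stand-in steps: three tiny Python commands instead of the real scripts.
steps = [
    [sys.executable, "-c", "print('step 1')"],
    [sys.executable, "-c", "print('step 2')"],
    [sys.executable, "-c", "print('step 3')"],
]
run_in_order(steps)
```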
One of the problems is that each script writes its output only to the
directory from which I launch the program, so I need to create the
directories myself. I set the output directory, and I want to obtain this
folder layout:
.
├── sampleA
│   ├── ref.txt
│   └── second
└── sampleB
    ├── ref.txt
    └── second
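
The per-sample folders in this layout could be created along the following lines. This is only a sketch: `make_sample_dirs` is a hypothetical helper, and a temporary directory stands in for the real output directory.

```python
import os
import tempfile

def make_sample_dirs(root, names):
    """Create root/<name> for every sample name and return the paths."""
    paths = []
    for name in names:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
        paths.append(path)
    return paths

# A temporary directory stands in for the real output directory.
root = tempfile.mkdtemp()
created = make_sample_dirs(root, ["sampleA", "sampleB"])
print(sorted(os.listdir(root)))
```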
I have problems with how to move between the different folders and how to use
subprocess to execute everything.
Any ideas on how I can do this?
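
For tools that write only to the directory they are launched from, the `cwd` argument of the subprocess functions may be enough, with no need to change directory in the script itself. A minimal sketch, assuming Python 3; `run_in_dir` is a hypothetical helper, and the command that writes ref.txt is a stand-in for a real script.

```python
import os
import subprocess
import sys
import tempfile

def run_in_dir(cmd, directory):
    """Run cmd with directory as its working directory and wait for it."""
    # cwd changes the directory only for the child process; the Python
    # script itself stays where it was started.
    subprocess.check_call(cmd, cwd=directory)

# Stand-in command that writes ref.txt into its current directory.
workdir = tempfile.mkdtemp()
write_ref = [sys.executable, "-c", "open('ref.txt', 'w').write('done')"]
run_in_dir(write_ref, workdir)
print(os.listdir(workdir))
```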

import os

# Path to the STAR binary used by every command below
STAR = "~/software/STAR_2.3.0e.Linux_x86_64_static/STAR"

def Staralign(file, pos):
    global Path, Read1, Read2, Nome, label, pipe
    Read1 = []
    Read2 = []
    Nome = []
    Path = []
    label = []
    pipe = []
    with open(file) as p:
        for i in p:
            lines = i.rstrip("\n").split(",")
            # skip the header and blank lines; the indices follow the
            # five-field rows of example.csv
            if lines[0] and lines[0] != "Name":
                Nome.append(lines[0])
                Read1.append(lines[1])
                Read2.append(lines[2])
                Path.append(lines[4])
    dizionario = {}
    with open("toRun.sh", "w") as out:
        out.write("#!/bin/bash\n")
        for i in range(len(Nome)):
            dx = Path[i] + "/" + Read1[i]
            sn = Path[i] + "/" + Read2[i]
            # one output directory per sample
            if not os.path.exists(pos + "/" + Nome[i]):
                os.makedirs(pos + "/" + Nome[i])
            # first pass: align against the reference genome
            step_1 = ("%s --genomeDir /home/sbsuser/databases/Starhg19/GenomeDir/"
                      " --runMode alignReads --readFilesIn %s %s"
                      " --runThreadN 12 --readFilesCommand zcat" % (STAR, dx, sn))
            # build a second genome index from the splice junctions
            step_2 = ("%s --runMode genomeGenerate --genomeDir $PWD/hg19_second/"
                      " --genomeFastaFiles ~/databases/bowtie2Database/hg19.fa"
                      " --sjdbFileChrStartEnd $PWD/SJ.out.tab"
                      " --sjdbOverhang 49 --runThreadN 12" % STAR)
            # second pass: align against the new index
            step_3 = ("%s --genomeDir $PWD/hg19_second/GenomeDir/"
                      " --runMode alignReads --readFilesIn %s %s"
                      " --runThreadN 12 --readFilesCommand zcat" % (STAR, dx, sn))
            out.write("cd " + pos + "\n\n")
            out.write(step_1 + "\n\n")
            out.write("cd $PWD/hg19_second/\n\n")
            out.write(step_2 + "\n\n")
            pipe.append(step_1)
            out.write("cd ..\n\n")
            out.write(step_3 + "\n\n")
            # setdefault needs a key; here the sample name maps to its
            # first alignment command
            dizionario.setdefault(Nome[i], step_1)
#    return Nome, Path, Read1, Read2

This is the function I wrote, but this way I am only able to write a bash
script.
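
One detail matters when moving from a generated bash script to running the commands from Python: without a shell, `~` is not expanded, so paths such as ~/FASTQ/ have to go through `os.path.expanduser` first. The sketch below builds one alignReads command as an argument list that could be handed to `subprocess.check_call`. The `star_align_args` helper is hypothetical; the flags and paths are the ones used in the function above, and the FASTQ names come from example.csv.

```python
import os

def star_align_args(star, genome_dir, read1, read2, threads=12):
    """Build the argument list for one alignReads call."""
    return [
        os.path.expanduser(star),
        "--genomeDir", os.path.expanduser(genome_dir),
        "--runMode", "alignReads",
        "--readFilesIn", os.path.expanduser(read1), os.path.expanduser(read2),
        "--runThreadN", str(threads),
        "--readFilesCommand", "zcat",
    ]

args = star_align_args(
    "~/software/STAR_2.3.0e.Linux_x86_64_static/STAR",
    "/home/sbsuser/databases/Starhg19/GenomeDir/",
    "~/FASTQ/A_R1_.fastq.gz",
    "~/FASTQ/A_R2_.fastq.gz",
)
print(" ".join(args))
```

An argument list also avoids quoting problems if a path ever contains spaces.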








