match, concatenate based on filename

Patrick Maupin pmaupin at gmail.com
Thu Nov 3 23:08:42 EDT 2011


On Nov 3, 9:55 pm, Matt <macma... at gmail.com> wrote:
> Hi All,
>
> I am trying to concatenate several hundred files based on their filenames. Filenames are like this:
>
> Q1.HOMOblast.fasta
> Q1.mus.blast.fasta
> Q1.query.fasta
> Q2.HOMOblast.fasta
> Q2.mus.blast.fasta
> Q2.query.fasta
> ...
> Q1223.HOMOblast.fasta
> Q1223.mus.blast.fasta
> Q1223.query.fasta
>
> All the Q1 files should be concatenated together into a single file, Q1.concat.fasta; the Q2 files go together, the Q3 files, and so on.
>
> I envision something like
>
> for file in os.listdir("/home/matthew/Desktop/pero.ngs/fasta/final/"):
>         if file.startswith("Q%i"):
>            concatenate...
>
> But I can't figure out how to iterate this process over Q-numbers 1-1223
>
> Any help appreciated.
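A minimal sketch of the per-number loop envisioned above, assuming every file sits in one directory and the numbers run from 1 to 1223 (gaps are skipped). "Q%i" only becomes "Q1", "Q2", ... once the number is formatted in with `%`. The pattern should also include the trailing dot, because a bare startswith("Q1") also matches Q10, Q11, Q123 and so on. The function name concat_numbered is hypothetical.

```python
import os


def concat_numbered(mydir, first=1, last=1223):
    """Concatenate Q<n>.*.fasta into Q<n>.concat.fasta for each n in [first, last]."""
    names = sorted(os.listdir(mydir))  # list the directory once, before writing output
    for n in range(first, last + 1):
        # Include the dot so that "Q1." does not also match Q10, Q11, ...
        prefix = "Q%i." % n
        parts = [f for f in names
                 if f.startswith(prefix) and not f.endswith(".concat.fasta")]
        if not parts:
            continue  # no files for this number
        with open(os.path.join(mydir, "Q%i.concat.fasta" % n), "w") as outf:
            for fname in parts:
                with open(os.path.join(mydir, fname)) as inf:
                    outf.write(inf.read())
```

This rescans the full listing once per number. The grouping approach below avoids that repeated work by building every group in a single pass.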

I haven't tested this, so it may have a typo or something, but it's often
much cleaner to gather your information, massage it, and then use it,
than to gather it and use it in one go.


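import os
from collections import defaultdict

mydir = "/home/matthew/Desktop/pero.ngs/fasta/final/"

# Group filenames by prefix (the part before the first '.'), e.g. 'Q1'
filedict = defaultdict(list)

for fname in sorted(os.listdir(mydir)):
    if (fname.startswith('Q') and '.' in fname
            and not fname.endswith('.concat.fasta')):  # skip earlier output
        filedict[fname[:fname.find('.')]].append(fname)

# Write each group's files, in sorted order, to <prefix>.concat.fasta
for prefix, fnames in filedict.items():
    # print(prefix, fnames)
    with open(os.path.join(mydir, prefix + '.concat.fasta'), 'w') as outf:
        for fname in fnames:
            with open(os.path.join(mydir, fname)) as inf:
                outf.write(inf.read())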
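The same gather-then-use idea can also be sketched with itertools.groupby in place of the defaultdict. Sorting keeps all names that share a "Q<n>." prefix next to each other, so each group appears exactly once. In this variant, shutil.copyfileobj copies each input in chunks instead of reading it fully into memory. This is a sketch under the same single-directory assumption; concat_groups and prefix_of are hypothetical names.

```python
import os
import shutil
from itertools import groupby


def prefix_of(fname):
    """Return the part of the filename before the first '.', e.g. 'Q12'."""
    return fname.split('.', 1)[0]


def concat_groups(mydir):
    """Write each Q<n> group into Q<n>.concat.fasta; return the prefixes written."""
    fnames = sorted(f for f in os.listdir(mydir)
                    if f.startswith('Q') and '.' in f
                    and not f.endswith('.concat.fasta'))
    written = []
    # Names sharing a 'Q<n>.' prefix are adjacent after sorting, so groupby
    # yields each prefix once (it only merges *consecutive* equal keys).
    for prefix, group in groupby(fnames, key=prefix_of):
        with open(os.path.join(mydir, prefix + '.concat.fasta'), 'wb') as outf:
            for fname in group:
                with open(os.path.join(mydir, fname), 'rb') as inf:
                    shutil.copyfileobj(inf, outf)  # chunked copy
        written.append(prefix)
    return written
```

The defaultdict version does not care about input order. The groupby version depends on the sort, so the sorted() call is essential here rather than cosmetic.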

HTH,
Pat



More information about the Python-list mailing list