Issue values dictionary

alex23 wuwei23 at
Tue Jun 4 23:17:03 EDT 2013

On Jun 5, 12:41 pm, claire morandin <claire.moran... at> wrote:
> But I have a problem storing all the size lengths in the value size, as it always comes back with the last entry.
> Could anyone explain to me what I am doing wrong and how I should set the values for each dictionary?

Your code has two for loops, one that reads ERCC.txt into a dict, and
one that reads blast.txt into a dict. The first assigns to
`transcript`, the second to `blasttranscript`. When the loops are
finished, you're using the _last_ value set for both `transcript` and
`blasttranscript`. So, really, you want _three_ loops: two to load the
files into dicts, then another to compare the two of them. If the
transcripts in blast.txt are guaranteed to be a subset of ERCC.txt,
then you could get away with two loops:

    # convenience function for splitting lines into values
    def get_transcript_and_size(line):
        columns = line.strip().split()
        return columns[0].strip(), int(columns[1].strip())

    # read in blast_file
    blast_transcripts = {}
    with open('transcript_blast.txt') as blast_file:
        # this is a context manager; it'll close the file when the
        # block is done
        for line in blast_file:
            blasttranscript, blastsize = get_transcript_and_size(line)
            blast_transcripts[blasttranscript] = blastsize

    # read in ERCC and compare to blast
    with open('transcript_ERCC.txt') as ercc_file, \
         open('Not_sequenced_ERCC_transcript.txt', 'w') as unknown_transcript, \
         open('transcript_out.txt', 'w') as out_file:
        # this is called a _nested_ context manager, and requires 2.7+ or 3.1+
        for line in ercc_file:
            ercctranscript, erccsize = get_transcript_and_size(line)
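            if ercctranscript not in blast_transcripts:
                unknown_transcript.write(ercctranscript + '\n')
            else:
                # only compare sizes for transcripts that blast actually found
                is_ninety_percent = (
                    blast_transcripts[ercctranscript] >= 0.9 * erccsize)
                # .write() behaves the same on Python 2 and 3
                out_file.write('%s %s\n' % (ercctranscript, is_ninety_percent))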
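
To see why the original version only ever reported the last entry, and
what the three-loop variant might look like when blast.txt is not
guaranteed to be a subset of ERCC.txt, here is a minimal sketch. It
assumes Python 3 and uses in-memory sample lines instead of the real
files; the names `load_sizes` and `compare` and the sample transcript
IDs are made up for illustration.

```python
def load_sizes(lines):
    """Loops 1 and 2: map each transcript name to its size."""
    sizes = {}
    for line in lines:
        columns = line.split()
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        sizes[columns[0]] = int(columns[1])
    return sizes


def compare(ercc_sizes, blast_sizes, threshold=0.9):
    """Loop 3: check every ERCC transcript against the blast results.

    Returns (missing, results): the ERCC transcripts absent from blast,
    and a dict mapping each remaining transcript to True/False.
    """
    missing = []
    results = {}
    for name, ercc_size in ercc_sizes.items():
        if name not in blast_sizes:
            missing.append(name)
        else:
            results[name] = blast_sizes[name] >= threshold * ercc_size
    return missing, results


# Hypothetical sample data standing in for the two input files.
ercc_lines = ["ERCC-A 1000", "ERCC-B 500", "ERCC-C 200"]
blast_lines = ["ERCC-A 950", "ERCC-B 100", "ERCC-X 300"]

# The pitfall: once a loop has finished, its variables hold only the
# values from the final iteration; nothing earlier survives.
for line in ercc_lines:
    transcript, size = line.split()
last_seen = (transcript, size)

# The fix: keep every entry in a dict, then compare in a separate loop.
ercc_sizes = load_sizes(ercc_lines)
blast_sizes = load_sizes(blast_lines)
missing, results = compare(ercc_sizes, blast_sizes)

# With both dicts loaded, blast entries unknown to ERCC are easy to spot.
extra = sorted(set(blast_sizes) - set(ercc_sizes))
```

Loading both files first costs a little more memory than the two-loop
version, but it lets you check the data in either direction.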

I've cleaned up your code a bit, using more similar naming schemes and
the same open/write procedures for all file access. Generally, any
time you're repeating code, you should stick it into a function and
use that instead, like the `get_transcript_and_size` func. If the
columns in your two files are separated by tabs, or always by the same
number of spaces, you can simplify this even further by using the csv module.
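
As a rough sketch of what that could look like, assuming Python 3 and
tab-separated columns (change `delimiter` otherwise): `csv.reader`
accepts any iterable of lines, so the same function works on an open
file or, as here, on an in-memory sample. The function name
`read_sizes` and the sample rows are hypothetical.

```python
import csv
import io


def read_sizes(lines, delimiter="\t"):
    """Build a {transcript: size} dict from delimited lines."""
    sizes = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or short rows
        sizes[row[0].strip()] = int(row[1])
    return sizes


# Hypothetical sample standing in for transcript_blast.txt.
sample = io.StringIO("ERCC-A\t950\nERCC-B\t100\n")
blast_transcripts = read_sizes(sample)
```

With a real file, pass in the file object itself; the csv
documentation recommends opening it with `newline=''`.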

Hope this helps.
