[Tutor] learning curve

Daniel Klose perlmunky at googlemail.com
Mon Jan 29 17:32:59 CET 2007


Thanks Kent,

I now have:

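import sys, os

# structDir and seqDir are defined as in the earlier script quoted below
target = sys.argv[1]

seqDictionary = {}  # with Python 2.5, defaultdict from collections could be used instead
structureArray = [0, 0, 0]  # counts of '-', 'H' and 'E'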

# This takes the place of the standard Perl $DIR,$file.
# It would be shorter to call os.path.join once and store the result in a variable.
if (os.path.isfile(os.path.join(structDir, target)) and
        os.path.isfile(os.path.join(seqDir, target))):
    structureHandle = open(os.path.join(structDir, target))
    # rstrip drops the trailing newline so it is not counted as a character
    structureString = structureHandle.readline().rstrip('\n')

    sequenceHandle  = open(os.path.join(seqDir, target))
    sequenceString = sequenceHandle.readline().rstrip('\n')

    if len(structureString) == len(sequenceString):
       
        for strChar, seqChar in zip(structureString, sequenceString):
            # default to zero for a new key, then increment
            seqDictionary[seqChar] = seqDictionary.get(seqChar, 0) + 1
            if strChar == '-':
                structureArray[0] += 1
            elif strChar == 'H':
                structureArray[1] += 1
            elif strChar == 'E':
                structureArray[2] += 1
            else:
                print strChar, "is not valid"
                break
else:
    print "Some data is missing!\n"
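
As the comment in the code notes, each os.path.join can be done once and the result kept in a variable. A minimal sketch of that idea, which should run on Python 2.4 and later; the helper names first_line and read_pair and the variables structPath and seqPath are made up for illustration and are not part of the script above:

```python
import os

def first_line(path):
    # Return the first line of the file without its trailing newline,
    # or None if the file does not exist.
    if not os.path.isfile(path):
        return None
    handle = open(path)
    try:
        return handle.readline().rstrip('\n')
    finally:
        handle.close()

def read_pair(structDir, seqDir, target):
    # Build each path once and reuse it.
    structPath = os.path.join(structDir, target)
    seqPath = os.path.join(seqDir, target)
    return first_line(structPath), first_line(seqPath)
```

Calling read_pair(structDir, seqDir, target) returns the two first lines as a tuple, with None in place of any file that is missing, so the "Some data is missing!" check becomes a test for None.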

I want a dictionary of lists because, for each key in the dictionary,
I want to keep tabs on the associated structure.  For example:

dictionary['A'] = [0, 0, 0]

list x element = 'A'
list y element = '-'

then dictionary['A'][0] = 1

print dictionary['A']
: [1, 0, 0]

I thought that a dictionary would be the best way (it is how I have
done it in Perl and Java).  I am searching Google but having limited
success.
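
One possible way to build such a dictionary of lists on Python 2.4 is dict.setdefault(). This is only a sketch: the function name count_structures and the lookup table SLOT are invented for illustration, and raising ValueError on an unknown symbol is an assumption (the script above prints a message and breaks instead).

```python
# Map each structure symbol to its slot in the per-letter counts list.
SLOT = {'-': 0, 'H': 1, 'E': 2}

def count_structures(structureString, sequenceString):
    # Returns a dict mapping each sequence character to [n_dash, n_H, n_E].
    counts = {}
    for strChar, seqChar in zip(structureString, sequenceString):
        if strChar not in SLOT:
            raise ValueError("%r is not valid" % strChar)
        # setdefault stores a fresh [0, 0, 0] the first time a key is seen
        # and returns the list already stored otherwise.
        counts.setdefault(seqChar, [0, 0, 0])[SLOT[strChar]] += 1
    return counts
```

For example, count_structures('-HH', 'AAC') gives {'A': [1, 1, 0], 'C': [0, 1, 0]}. A new list is created for each key, so the counts for one letter never share storage with another.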

*Do you folks bottom post or top post?  The users of the Perl list are
sensitive about this stuff!

I am only running Python 2.4, and the system admin doesn't like me, so I
won't ask him to upgrade it.

Kent Johnson wrote:
> Daniel Klose wrote:
>> Hi all,
>>
>> All I would like to do is take a file and count the number of times a
>> letter occurs in it.  It so happens that the letters are amino acids.
>> There are also some other checks in the script but these are not a
>> concern just yet.
>>
>> What I would like to do is create a dictionary of arrays.
>> In Perl (my current scripting language of choice) I would simply put:
>>  ${$dictionary{$key}}[$element] += 1
>> I have no idea how to create this kind of structure in Python.
>
> I don't speak Perl much but it looks like you have a dict whose values
> are lists. That is not quite the same as what you have below, which is
> a dict whose values are integers.
>>
>> Also I have a while loop.  If this were Perl, rather than using
>> i = 0 followed by while(i < len(x)):
>> I would write: for (my $i = 0; $i < @array; $i++) {}.  I have found the
>> range function but I am not sure how to use it properly.
>
> You could use
>   for i in range(len(strArray)):
> but this is not good usage; better to iterate over strArray directly.
>
>> What I would like to do is create an index that allows me to access the
>> same element in two arrays (lists) of identical size.
>
> You can use the zip() function to process two lists in parallel:
> for x, y in zip(xlist, ylist):
>   # x is an element from xlist
>   # y is the corresponding element from ylist
>>
>> I have pasted in my current code below; I would be very grateful if you
>> could help me trim up this code.
>> #!/usr/bin/python
>>
>> import sys, os
>>
>> structDir = '/home/danny/dataset/structure/'
>> seqDir   = '/home/danny/dataset/sequence/'
>>
>> target = sys.argv[1]
>>
>> seqFile = seqDir      + target
>> strFile = structDir   + target
>
> os.path.join() would be more idiomatic here, though what you have works.
>>
>> seqDictionary = {}
>>
>> if (os.path.isfile(seqFile) and os.path.isfile(strFile)):
>>     structureHandle = open(strFile)
>>     structureString = structureHandle.readline()
>>     sequenceHandle  = open(seqFile)
>>     sequenceString = sequenceHandle.readline()
>>     strArray = list(structureString)
>>     seqArray = list(sequenceString)
>
> You don't have to convert to lists; strings are already sequences.
>>
>>     if len(strArray) == len(seqArray):
>>         print "Length match\n"
>>         i = 0
>>         while(i < len(strArray)):
>>             if seqDictionary.has_key(seqArray[i]):
>>                 seqDictionary[seqArray[i]] += 1
>>             else:
>>                 seqDictionary[seqArray[i]] = 1
>>             i += 1
>
> The idiomatic way to iterate over sequenceString is just
>   for c in sequenceString:
>
> You don't seem to be using strArray except to get the length. Maybe
> this is where you need zip()? For example you could say
>   for structChr, seqChr in zip(structureString, sequenceString):
>
> An alternative to your conditional with has_key() is to use dict.get()
> with a default value:
>   seqDictionary[c] = seqDictionary.get(c, 0) + 1
>
> so the whole loop becomes just
>   for c in sequenceString:
>     seqDictionary[c] = seqDictionary.get(c, 0) + 1
>
> In Python 2.5 you can use defaultdict to create a dict with a default
> value of 0:
> from collections import defaultdict
> seqDictionary = defaultdict(int)
>
> then in the loop you can say
>     seqDictionary[c] += 1
>
> Kent
>
>
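
The defaultdict that Kent mentions can also hold lists, which would cover the dictionary-of-lists case on Python 2.5 or later (it is not available on 2.4). A small sketch under that assumption; the factory name new_counts is made up for illustration:

```python
from collections import defaultdict

def new_counts():
    # Factory called by defaultdict whenever a missing key is accessed.
    return [0, 0, 0]

dictionary = defaultdict(new_counts)
dictionary['A'][0] += 1   # 'A' paired with '-'
dictionary['A'][1] += 1   # 'A' paired with 'H'
dictionary['C'][2] += 1   # 'C' paired with 'E'
```

After these three updates, dictionary['A'] is [1, 1, 0] and dictionary['C'] is [0, 0, 1]. The factory returns a new list on every call, so no two keys share the same list.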



