No subject

Shashank Singh shashank.sunny.singh at gmail.com
Sat May 1 02:31:56 EDT 2010


Here is my quick take on it, using re:

import re
strings = ["1 ALA Helix Sheet Helix Coil",
           "2 ALA Coil Coil Coil Sheet",
           "3 ALA Helix Sheet Coil Turn",
           "4 ALA Helix Sheet Helix Sheet"]

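# Capture a word that follows a space and whose text appears again later in the line.
regex = re.compile(r" (.+?\b)(?=.*\1)")

for s in strings:
  # Distinct words that occur more than once in the line.
  moreThanOnce = list(set(regex.findall(s)))
  count = len(moreThanOnce)
  if count == 1: print(moreThanOnce[0])
  elif count == 2: print("doubtful")
  else: print("error")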

Although this is short, it's probably not the most efficient.
A more verbose but more efficient version would be:

for s in strings:
  # Skip the index and residue name; keep only the structure columns.
  l = s.split()[2:]
  counts = {}
  for ss in l:
    if ss in counts: counts[ss] += 1
    else: counts[ss] = 1
  # Structures that occur at least twice in this line.
  filtered = [ss for ss in counts if counts[ss] >= 2]
  filteredCount = len(filtered)
  if filteredCount == 1:
    print(filtered[0])
  elif filteredCount > 1:
    print("doubtful")
  else:
    print("error")
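
The same counting idea can also be sketched with collections.Counter from
the standard library. This is only a sketch under the assumptions above
(structure names start at the third column); classify and samples are
names made up for this example, not anything from the original post.

```python
from collections import Counter

def classify(line):
    # Skip the index and residue name; keep only the structure columns.
    structures = line.split()[2:]
    counts = Counter(structures)
    # Structures that occur at least twice in this line.
    repeated = [name for name, n in counts.items() if n >= 2]
    if len(repeated) == 1:
        return repeated[0]
    elif len(repeated) > 1:
        return "doubtful"
    return "error"

# Sample lines taken from the question.
samples = ["1 ALA Helix Sheet Helix Coil",
           "2 ALA Coil Coil Coil Sheet",
           "3 ALA Helix Sheet Coil Turn",
           "4 ALA Helix Sheet Helix Sheet"]

results = [classify(s) for s in samples]
for r in results:
    print(r)
```

To run it on the file from the question, the samples list could be replaced
by iterating over open("CalcSecondary4.txt"), assuming every line has the
same layout.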

HTH

On Sat, May 1, 2010 at 9:03 AM, mannu jha <mannu_0523 at rediffmail.com> wrote:

> Dear all,
>
> I am trying my problem in this way:
>
> import re
> expr = re.compile("Helix Helix| Sheet Sheet| Turn Turn| Coil Coil")
> f = open("CalcSecondary4.txt")
> for line in f:
>     if expr.search(line):
>         print(line)
>
> but this prints only those lines in which Helix, Sheet, Turn or Coil
> occurs twice. Kindly suggest how I should modify it so that whatever
> secondary structure occurs two or more times in a line is written as the
> final secondary structure, and if two secondary structures each occur
> twice in one line, like:
>
> 4 ALA Helix Sheet Helix Sheet
>
> then it should write "doubtful", and otherwise it should write "error".
>
> Thanks,
>
>
> Dear all,
>
> I have a file like:
>
> 1 ALA Helix Sheet Helix Coil
> 2 ALA Coil Coil Coil Sheet
> 3 ALA Helix Sheet Coil Turn
>
> What I want is a Python program that, for each line, writes whatever
> secondary structure occurs two or more times as the final secondary
> structure, and if two secondary structures each occur twice in one line,
> like:
>
> 4 ALA Helix Sheet Helix Sheet
>
> then it should write "doubtful", and otherwise it should write "error".
>
> Thanks,


-- 
Regards
Shashank Singh
Senior Undergraduate, Department of Computer Science and Engineering
Indian Institute of Technology Bombay
shashank.sunny.singh at gmail.com
http://www.cse.iitb.ac.in/~shashanksingh