Need a specific sort of string modification. Can someone help?

Mitya Sirenef msirenef at lightbird.net
Sun Jan 6 01:32:46 EST 2013


On 01/05/2013 03:35 AM, Sia wrote:
> I have strings such as:
 >
 > tA.-2AG.-2AG,-2ag
 > or
 > .+3ACG.+5CAACG.+3ACG.+3ACG
 >
 > The plus and minus signs are always followed by a number (say, i). I 
want python to find each single plus or minus, remove the sign, the 
number after it and remove i characters after that. So the two strings 
above become:
 >
 > tA..,
 > and
 > ...
 >
 > How can I do that?
 > Thanks.


I think it's a bit cleaner and nicer to do something similar to
itertools.takewhile but takewhile 'eats' a single next value.
I was actually doing some stuff that also needed this. I wonder if
there's a more elegant, robust way to do this?

Here's what I got for now:


class BIterator(object):
     """Iterator with 'buffered' takewhile."""

     def __init__(self, seq):
         self.seq        = iter(seq)
         self.buffer     = []
         self.end_marker = object()
         self.last       = None

     def consume(self, n):
         for _ in range(n): self.next()

     def next(self):
         val = self.buffer.pop() if self.buffer else next(self.seq, 
self.end_marker)
         self.last = val
         return val

     def takewhile(self, test):
         lst = []
         while True:
             val = self.next()
             if val is self.end_marker:
                 return lst
             elif test(val):
                 lst.append(val)
             else:
                 self.buffer.append(val)
                 return lst

     def joined_takewhile(self, test):
         return ''.join(self.takewhile(test))

     def done(self):
         return bool(self.last is self.end_marker)


s = ".+3ACG.+5CAACG.+3ACG.+3ACG"
not_plusminus = lambda x: x not in "+-"
isdigit       = lambda x: x.isdigit()

def process(s):
     lst = []
     s   = BIterator(s)

     while True:
         lst.extend(s.takewhile(not_plusminus))
         if s.done(): break
         s.next()
         n = int(s.joined_takewhile(isdigit))
         s.consume(n)

     return ''.join(lst)


print(process(s))


Obviously it assumes the input is well-formed, but the logic would be
very easy to change to, for example, check for s.done() after each step.

  - mitya



-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/




More information about the Python-list mailing list