Strings in Python

Shawn Milo Shawn at Milochik.com
Thu Feb 8 11:58:49 EST 2007


On 2/8/07, Gary Herron <gherron at islandtraining.com> wrote:
> Johny wrote:
> > Playing a little more with strings, I found out that the string.find
> > function provides the position of
> > the first occurrence of the substring in the string.
> > Is there a way to find the positions of all occurrences of the substring?
> > To explain more,
> > let's suppose
> >
> > mystring='12341'
> > import string
> >
> >
> > >>> string.find(mystring, '1')
> > 0
> >
> > But I need to find the position of the other '1' in mystring too.
> > Is it possible?
> > Or must I use regex?
> > Thanks for help
> > L
> >
> >
> You could use a regular expression.  The re module has a function
> "findall" that does what you want.
>
> Also, if you read the documentation for the string find method, you'll find:
>
> S.find(sub [,start [,end]]) -> int
>
> Return the lowest index in S where substring sub is found,
> such that sub is contained within s[start,end].  Optional
> arguments start and end are interpreted as in slice notation.
>
> Return -1 on failure.
>
> So put your find in a loop, starting the search one past the previously
> found occurrence.
>
>   i = string.find(mystring, '1', i+1)
>
> Gary Herron
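
Gary's loop suggestion can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names find_all and find_all_re are made up here, and it uses the str.find method instead of the string module function. Since re.findall returns the matched text rather than positions, the regex variant uses re.finditer and reads each match's start().

```python
import re


def find_all(haystack, needle):
    """Return every index at which needle starts in haystack (overlaps included)."""
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        # Resume the search one past the previous hit.
        i = haystack.find(needle, i + 1)
    return positions


def find_all_re(haystack, needle):
    """Regex variant: start index of each non-overlapping match."""
    # re.escape makes the needle match literally, not as a pattern.
    return [m.start() for m in re.finditer(re.escape(needle), haystack)]


mystring = '12341'
print(find_all(mystring, '1'))     # [0, 4]
print(find_all_re(mystring, '1'))  # [0, 4]
```

The two differ when occurrences overlap: the find loop reports both positions of 'aa' in 'aaa', while finditer reports only the first.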

Speaking of regex examples, that's basically what I did in the script
below, which James Kim and I worked on yesterday and this morning as a
result of his thread.

It matches a regex rather than a fixed string, then loops through each
match and does something with it. I hope this helps. I submitted it to
the list for recommendations on how to make it more Pythonic, but at
least it works.

Here are the most important, stripped-down pieces:

#! /usr/bin/python

import re

# match a comma-delimited date in this format: ,05/MAR/2006,
regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

# infile is an open file object (see the full script below)
for line in infile:
    matches = regex.findall(line)
    for someDate in matches:
        # placeholder: build the replacement string here
        newDate = someDate
        line = line.replace(someDate, newDate)


Here is the full script:

#! /usr/bin/python

import re

# month abbreviation -> month number
month = {'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,
         'JUL':7,'AUG':8,'SEP':9,'OCT':10,'NOV':11,'DEC':12}
infile = open('TVA-0316', 'r')
outfile = open('tmp.out', 'w')

def formatDatePart(x):
    "take a number and transform it into a two-character string, zero padded"
    x = str(x)
    while len(x) < 2:
        x = "0" + x
    return x

# match a comma-delimited date such as ,05/MAR/2006,
regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

for line in infile:
    matches = regex.findall(line)
    for someDate in matches:
        # someDate looks like ",05/MAR/2006,"; slice out each part
        dayNum = formatDatePart(someDate[1:3])
        monthNum = formatDatePart(month[someDate[4:7]])
        yearNum = formatDatePart(someDate[8:12])

        newDate = ",%s-%s-%s," % (yearNum, monthNum, dayNum)
        line = line.replace(someDate, newDate)

    # write the converted line
    outfile.write(line)

# close both files so the output is flushed to disk
infile.close()
outfile.close()
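
On the "more Pythonic" question, one possible direction is to let the regex capture the date parts and pass a function to sub(), so the slicing and the separate replace() call go away. This is a sketch under my own assumptions, not a tested replacement for the script: the names MONTHS, DATE_RE, to_iso and convert_line are invented for illustration, and leaving an unknown month abbreviation untouched is a choice made here (the original script would raise KeyError instead).

```python
import re

# Month abbreviation -> month number, same mapping as the script's dict.
MONTHS = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

# Capture groups take the place of the hard-coded slice offsets.
DATE_RE = re.compile(r",(\d{2})/([A-Z]{3})/(\d{4}),")


def to_iso(match):
    """Turn one ',DD/MON/YYYY,' match into ',YYYY-MM-DD,'."""
    day, mon, year = match.groups()
    if mon not in MONTHS:
        # Assumption: leave unrecognised month codes unchanged.
        return match.group(0)
    # %02d zero-pads the month number to two digits.
    return ",%s-%02d-%s," % (year, MONTHS[mon], day)


def convert_line(line):
    """Rewrite every matching date in a single line of text."""
    return DATE_RE.sub(to_iso, line)


print(convert_line("x,05/MAR/2006,y"))  # x,2006-03-05,y
```

Like the findall version, this still requires a comma on both sides of each date, so two dates that share a single comma between them would only have the first one converted.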


