[Tutor] regex problem

Liam Clarke cyresse at gmail.com
Wed Jan 5 05:02:06 CET 2005


Hi Michael, 

Is a non-regex way any help? I can think of a way that uses string methods:


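space = " "

stringStuff = "Stuff  with   multiple    spaces"
indexN = 0
ranges = []
while True:
    try:
        # Find the next space at or after indexN.
        indexN = stringStuff.index(space, indexN)
    except ValueError:
        # No spaces left: collapse each recorded run, working from the
        # right so the earlier offsets stay valid.
        ranges.reverse()
        for (low, high) in ranges:
            stringStuff = stringStuff[:low] + space + stringStuff[high:]
        break
    # Walk indexT to the first character after this run of spaces.
    indexT = indexN + 1
    while stringStuff[indexT:indexT + 1] == space:
        indexT += 1
    if indexT - indexN > 1:
        # Two or more spaces in a row: remember where the run is.
        ranges.append((indexN, indexT))
    indexN = indexT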
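
If the index bookkeeping feels like too much, here is a shorter sketch of
the same idea using only str.replace. The function name collapse_spaces is
made up for this example, and it touches only the space character, so
newlines and tabs are left alone:

```python
def collapse_spaces(text):
    """Return text with every run of two or more spaces reduced to one."""
    # Each pass shortens every remaining run; stop once no double space is left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


print(collapse_spaces("Stuff  with   multiple    spaces"))
```

It rescans the string on every pass, which should be fine for lines of an
email but is not something I'd reach for on very large inputs.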

HTH
Liam Clarke
             


On Tue, 4 Jan 2005 15:39:18 -0800, Michael Powe <michael at trollope.org> wrote:
> Hello,
> 
> I'm having erratic results with a regex.  I'm hoping someone can
> pinpoint the problem.
> 
> This function removes HTML formatting codes from a text email that is
> poorly exported -- it is supposed to be a text version of an HTML
> mailing, but it's basically just a text version of the HTML page.  I'm
> not after anything elaborate, but it has gotten to be a bit of an
> itch.  ;-)
> 
> def parseFile(inFile) :
>     import re
>     bSpace = re.compile("^ ")
>     multiSpace = re.compile(r"\s\s+")
>     nbsp = re.compile(r"&nbsp;")
>     HTMLRegEx = re.compile(
>         r"(&lt;|<)/?((!--.*--)|(STYLE.*STYLE)|(P|BR|b|STRONG))/?(&gt;|>)", re.I)
> 
>     f = open(inFile,"r")
>     lines = f.readlines()
>     newLines = []
>     for line in lines :
>         line = HTMLRegEx.sub(' ',line)
>         line = bSpace.sub('',line)
>         line = nbsp.sub(' ',line)
>         line = multiSpace.sub(' ',line)
>         newLines.append(line)
>     f.close()
>     return newLines
> 
> Now, the main issue I'm looking at is with the multiSpace regex.  When
> applied, this removes some blank lines but not others.  I don't want
> it to remove any blank lines, just contiguous multiple spaces in a
> line.
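
A likely cause, as far as I can tell: \s matches the newline as well as the
space, and readlines() keeps the trailing "\n" on each line. So a line that
is just "\n" has only one whitespace character and survives, while a line
with spaces before the newline matches \s\s+ as a whole and loses its
newline. The small sketch below compares the original pattern with a
spaces-only pattern; the variable names and the sample lines are invented
for the illustration:

```python
import re

multi_space_any = re.compile(r"\s\s+")    # pattern from the function above
multi_space_only = re.compile(r" {2,}")   # assumed alternative: spaces only

# Sample lines as readlines() would return them, newline included.
lines = ["first\n", "  \n", "\n", "second  line\n"]

# With \s, the "  \n" line collapses to a single space and its newline is gone.
print([multi_space_any.sub(" ", line) for line in lines])

# With a literal space, every newline is kept.
print([multi_space_only.sub(" ", line) for line in lines])
```

If tabs should be collapsed too, a character class such as [ \t] in place of
the literal space would be one way to do it without matching "\n".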
> 
> BTB, this also illustrates a difference between python and perl -- in
> perl, i can change "line" and it automatically changes the entry in
> the array; this doesn't work in python.  A bit annoying, actually.
> ;-)
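
On the loop variable: assigning to "line" only rebinds the name, it does not
write back into the list. Two common ways to get the Perl-like effect are to
assign through the index or to build a new list. A small sketch with made-up
sample data:

```python
lines = ["  one\n", "  two\n"]

# Rebinding the loop variable leaves the list untouched.
for line in lines:
    line = line.lstrip()
unchanged = list(lines)

# Assigning through the index writes the new string back into the list.
for i, line in enumerate(lines):
    lines[i] = line.lstrip()

# A list comprehension builds a fresh list in one step.
stripped = [line.lstrip() for line in ["  one\n", "  two\n"]]

print(unchanged, lines, stripped)
```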
> 
> Thanks for any help.  If there's a better way to do this, I'm open to
> suggestions on that regard, too.
> 
> mp


-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'

