Changing every other instance of <B> in a file

Carel Fellinger cfelling at iae.nl
Tue Mar 27 10:20:09 EST 2001


Lars Klæboe <larskl at klassekampen.no> wrote:
...
> 	Blablabla <B> talktalk <B> blabla blabla balbalblabla

> The resulting file.html (html)

> 	Blablabla <B> talktalk </B> blabla blabla balbalblabla

> As you can tell, every other instance of <B> is to be changed into </B>.

Let's hope those B-tags aren't nested; if they aren't, the following might work:

###define an input
file = """
<BB> talktalk<BB> talktalk
<BB> talktalk
  <CC> blabla blabla balbalblabla
    <B> talktalk
    <B> blabla blabla balbalblabla
  <CC> talktalk
<BB> blabla blabla balbalblabla
"""
###define the appropriate html equivalent
result = """
<BB> talktalk</BB> talktalk
<BB> talktalk
  <CC> blabla blabla balbalblabla
    <B> talktalk
    </B> blabla blabla balbalblabla
  </CC> talktalk
</BB> blabla blabla balbalblabla
"""

import re

### re.sub accepts a function instead of a plain replacement string,
### but we need a function with state, so let's use a callable class instead
class Change:
    def __init__(self):
        # maps each tag to the text its next occurrence should be replaced by
        self.dict = {}

    def __call__(self, matchobj):
        key = matchobj.group()
        val = self.dict.get(key, key)
        if key == val:
            # this occurrence stays an opening tag; the next one becomes </...>
            self.dict[key] = val[:1] + '/' + val[1:]
        else:
            # this occurrence becomes the closing tag; the next one opens again
            self.dict[key] = key
        return val


assert result == re.sub(r'(<[^>]+>)', Change(), file)
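
If you'd rather not write a class, a closure can carry the state just as
well. What follows is only a minimal sketch under the same assumption that
the tags aren't nested. The name close_every_other is made up for
illustration, and the pattern is tweaked (my own addition) so that tags
already starting with '/' are left alone. It counts the occurrences of each
tag and closes every even-numbered one:

```python
import re
from collections import defaultdict


def close_every_other(text):
    """Turn every second occurrence of each tag into its closing form."""
    seen = defaultdict(int)  # tag -> how many times we have met it so far

    def replace(matchobj):
        tag = matchobj.group()
        seen[tag] += 1
        if seen[tag] % 2:
            # odd occurrence: keep it as the opening tag
            return tag
        # even occurrence: insert '/' right after the '<'
        return '</' + tag[1:]

    # assumption: only tags that do not already start with '/' are counted
    return re.sub(r'<[^>/][^>]*>', replace, text)


line = "Blablabla <B> talktalk <B> blabla blabla balbalblabla"
print(close_every_other(line))
```

Applied to the line from the original question, this should print the line
with the second <B> turned into </B>. Each call starts with a fresh counter,
so separate files don't influence each other.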
-- 
groetjes, carel


