Clean "Durty" strings

rzed rzantow at gmail.com
Mon Apr 2 19:19:02 EDT 2007


"Diez B. Roggisch" <deets at nospam.web.de> wrote in
news:57cddlF2bsqdbU2 at mid.uni-berlin.de: 

>> 
>> If the OP is constrained to standard libraries, then it may be
>> a question of defining what should be done more clearly. The
>> extraneous spaces can be removed by tokenizing the string and
>> rejoining the tokens. Replacing portions of a string with
>> equivalents is standard stuff. It might be preferable to create
>> a function that will accept lists of from and to strings and
>> translate the entire string by successively applying the
>> replacements. From what I've seen so far, that would be all the
>> OP needs for this task. It might take a half-dozen lines of
>> code, plus the from/to table definition. 
> 
> The OP had <br> tags in his text, and cleaning those up takes _more_
> than half a dozen lines of code, because your simple
> replacement approach won't help here:
> 
> <br>foo <br> bar </br>
> 
> Which is perfectly legal HTML, but nasty to parse.
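
Diez's point can be made concrete with a short sketch (not code from the thread). It contrasts a plain `str.replace` of `<br>` with a tag-aware pass built on the standard library's `html.parser.HTMLParser`. With `convert_charrefs=True`, that parser also decodes entities such as `&eacute;`. The names `TextExtractor` and `strip_tags` are hypothetical, and the sketch assumes Python 3.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags; every tag is discarded."""

    def __init__(self):
        # convert_charrefs=True decodes references like &eacute; and &quot;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text the parser is still buffering
    # Collapse whitespace the same way as ' '.join(s.split()).
    return ' '.join(''.join(parser.parts).split())


sample = '<br>foo <br> bar </br>'
print(repr(sample.replace('<br>', '')))  # 'foo  bar </br>'  (the </br> survives)
print(strip_tags(sample))                # foo bar
```

The parser treats `<br>` and `</br>` alike as tags, so neither leaks into the text. A bare `replace` only removes the exact spelling it was given.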

Well, as I said, given the input the OP supplied, it's not even 
necessary to parse it. It isn't clear what the true desired 
operation is, but this seems to meet the criteria given:

<code -- the string 's' is wrapped nastily, but ...>
s = """\
bonne mentalit&eacute; mec!:) \n                        <br>bon 
pour
info moi je suis un serial posteur arceleur dictateur ^^*
\n                        <br>mais pour avoir des resultats 
probant il
faut pas faire les mariolles, comme le &quot;fondateur&quot; de 
bvs
krew \n
mais pour avoir des resultats probant il faut pas faire les 
mariolles,
comme le &quot;fondateur&quot; de bvs krew \n"""

# Each entry in fromlist is replaced by the entry at the same position in tolist.
fromlist = ['<br>', '&eacute;', '&quot;']
tolist   = ['',     'é',        '"'     ]


def withReplacements(s, flist, tlist):
    # Apply the replacements one after another, in list order.
    for f, t in zip(flist, tlist):
        s = s.replace(f, t)
    return s

# Collapse every run of whitespace (newlines included) to a single space,
# then apply the from/to table.
print(withReplacements(' '.join(s.split()), fromlist, tolist))

</code>
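
Applying the replacements one after another means later entries see the output of earlier ones. Once the table grows, its order starts to matter. The sketch below assumes the table is extended with a hypothetical `&amp;` entry, which the table above does not have. `withReplacements` is restated with `zip` so the block runs on its own.

```python
def withReplacements(s, flist, tlist):
    # Same idea as in the post: apply each from -> to pair in list order.
    for f, t in zip(flist, tlist):
        s = s.replace(f, t)
    return s


# '&amp;quot;' is how the literal text '&quot;' is written in HTML.
text = 'say &amp;quot;hi&amp;quot;'

# Decoding &amp; first creates new '&quot;' sequences, which then get decoded too.
print(withReplacements(text, ['&amp;', '&quot;'], ['&', '"']))  # say "hi"

# Decoding &amp; last leaves the literal '&quot;' text intact.
print(withReplacements(text, ['&quot;', '&amp;'], ['"', '&']))  # say &quot;hi&quot;
```

Putting `&amp;` last in the table gives the intended single-decoding result. So does decoding everything in one pass.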

If the question is about efficiency or robustness or generality, 
then that's another set of issues, but that's for the 1.1 version 
to handle. 
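
For a more general version, one option (not something proposed in the thread) is to let the standard library decode the entities instead of maintaining the table by hand. `html.unescape` (Python 3.4+) handles named and numeric character references. The function `clean` and the `BR_TAG` pattern are hypothetical names. The sketch targets only `<br>` variants and leaves other tags alone.

```python
import html
import re

# Matches <br>, <br/>, <br /> and </br>, in any letter case.
BR_TAG = re.compile(r'</?br\s*/?>', re.IGNORECASE)


def clean(text):
    # Replace line-break tags with a space so adjacent words stay separate.
    text = BR_TAG.sub(' ', text)
    # Decode all character references (&eacute;, &quot;, &#233;, ...) in one pass.
    text = html.unescape(text)
    # Collapse runs of whitespace, as in the post.
    return ' '.join(text.split())


print(clean('bonne mentalit&eacute; mec!:) \n <br>bon pour'))  # bonne mentalité mec!:) bon pour
print(clean('<br>foo <br> bar </br>'))                         # foo bar
```

Tags are stripped before entities are decoded. An escaped `&lt;br&gt;` in the input therefore survives as the literal text `<br>` instead of being removed.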

-- 
rzed



