Splitting on a word

qwweeeit at yahoo.it qwweeeit at yahoo.it
Wed Jul 13 09:19:54 EDT 2005


Hi all,
I am writing a script to display (and print)
the web references hidden in html files as:
'<a href="web reference"> underlined reference</a>'
While optimizing my code, I found that an essential step is
splitting on a word (in this case 'href').

Is there some alternative (more pythonic...) to the following?

# SplitMultichar.py

import re

# string s simulating an html file
s = ('ffy: ytrty <a href="www.python.org">python</a> fyt '
     '<A HREF="wwwx">wx</A>  dtrtf')
# match 'href' as a whole word, ignoring case
p = re.compile(r'\bhref\b', re.I)

lHref = p.findall(s)  # lHref = ['href', 'HREF']
# for normal html files the lHref list has more elements
#     (more web references)

c = '~'  # char to be used as delimiter
# c = chr(127)  # char to be used as delimiter
for i in lHref:
    s = s.replace(i, c)

# s = 'ffy: ytrty <a ~="www.python.org">python</a> fyt <A ~="wwwx">wx</A>  dtrtf'

# named 'parts' to avoid shadowing the built-in 'list'
parts = s.split(c)
# parts = ['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ',
#          '="wwwx">wx</A>  dtrtf']
#=-----------------------------------------------------
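
One possible alternative, given only as a sketch: a compiled pattern has its own split() method, so the findall/replace/split sequence could perhaps be collapsed into a single call, with no delimiter character needed. The sample string is the same as above; the name parts is just my choice for the result.

```python
import re

# Same sample string as in the script above
s = ('ffy: ytrty <a href="www.python.org">python</a> fyt '
     '<A HREF="wwwx">wx</A>  dtrtf')

# Match 'href' as a whole word, ignoring case
p = re.compile(r'\bhref\b', re.I)

# Split directly on every match of the pattern.
# The pattern has no capturing groups, so the matched words are dropped.
parts = p.split(s)
print(parts)
```

Unlike s.replace(i, c), this only splits where the pattern itself matches, so a longer word that merely contains 'href' would be left alone.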

If you save the original s string to xxx.html, any browser
can display it.
To be safe, I choose chr(127) as the delimiter (the commented-out
line above), which is surely not present in the html file.
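
The assumption that chr(127) never occurs in the file could also be checked instead of trusted. Below is a minimal sketch of that idea; split_on_word is a hypothetical helper name of mine, not part of the script above, and it refuses to run if the delimiter is already in the text.

```python
import re

def split_on_word(text, word, delim=chr(127)):
    """Hypothetical helper: split text on whole-word, case-insensitive
    matches of word, going through a one-character delimiter."""
    # Refuse to continue if the delimiter already occurs in the text,
    # otherwise the final split would cut in the wrong places.
    if delim in text:
        raise ValueError('delimiter already present in text')
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    # A function is used as the replacement so that delim is inserted literally
    marked = pattern.sub(lambda m: delim, text)
    return marked.split(delim)

print(split_on_word('<a href="www.python.org">python</a>', 'href'))
```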
Bye.



