Splitting on a word
Steven D'Aprano
steve at REMOVETHIScyber.com.au
Wed Jul 13 10:12:15 EDT 2005
On Wed, 13 Jul 2005 06:19:54 -0700, qwweeeit wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code,
[red rag to bull]
Because it was too slow? Or just to prove what a macho programmer you are?
Is your code even working yet? If it isn't working, you shouldn't be
trying to optimize buggy code.
> I found that an essential step is:
> splitting on a word (in this case 'href').
Then just do it:
py> '<a href="web reference"> underlined reference</a>'.split('href')
['<a ', '="web reference"> underlined reference</a>']
If you are concerned about case issues, you can either convert the
entire HTML file to lowercase, or you might write a case-insensitive
regular expression to replace any "href" regardless of case with the
lowercase version.
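As a rough sketch of the regular-expression route, assuming the standard re module (the sample string and variable names here are made up for illustration), something like this should do. Splitting with a case-insensitive pattern also avoids lowercasing the URLs and link text, which converting the whole file would do.

```python
import re

# Hypothetical sample: mixed-case attribute names as they might appear in real HTML.
html = '<A HREF="http://example.com/Page">One</A> <a href="other.html">Two</a>'

# Option 1: split on "href" regardless of case; the rest of the text is untouched.
parts = re.split(r'href', html, flags=re.IGNORECASE)

# Option 2: rewrite any spelling of "href" to lowercase, then use plain str.split.
normalized = re.sub(r'href', 'href', html, flags=re.IGNORECASE)
parts_again = normalized.split('href')

print(parts)
```

Both options give the same three pieces for this sample. Note that either one will also split on "href" if it happens to occur inside a URL or in the link text.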
[snip]
> To be sure as delimiter I choose chr(127)
> which surely is not present in the html file.
I wouldn't bet my life on that. I've found some weird characters in HTML
files.
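If you do want a one-character delimiter, it is cheap to check the assumption instead of trusting it. A minimal sketch, where pick_delimiter and its candidate list are hypothetical names of my own, not anything from your script:

```python
def pick_delimiter(text, candidates=(chr(127), chr(0), chr(1))):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("every candidate delimiter occurs in the text")

# Hypothetical sample that already contains chr(127), so that candidate is rejected.
html = 'weird\x7fdata <a href="x">y</a>'
delim = pick_delimiter(html)
print(repr(delim))
```

The function raises an error rather than silently picking a character that is already in the file.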
--
Steven.