Splitting on a word

Steven D'Aprano steve at REMOVETHIScyber.com.au
Wed Jul 13 10:12:15 EDT 2005


On Wed, 13 Jul 2005 06:19:54 -0700, qwweeeit wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code, 

[red rag to bull]
Because it was too slow? Or just to prove what a macho programmer you are?

Is your code even working yet? If it isn't working, you shouldn't be
trying to optimize buggy code.


> I found that an essential step is:
> splitting on a word (in this case 'href').

Then just do it:

py> '<a href="web reference"> underlined reference</a>'.split('href')
['<a ', '="web reference"> underlined reference</a>']

If you are concerned about case issues, you can either convert the
entire HTML file to lowercase, or you might write a case-insensitive
regular expression to replace any "href" regardless of case with the
lowercase version.
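
Both approaches can be sketched roughly as follows. This is only an illustration: the sample string and the variable names are made up, and a compiled pattern is used so that it does not depend on re.split() accepting a flags argument.

```python
import re

# Made-up sample with upper-case tags, to show the case problem.
html = '<A HREF="web reference"> underlined reference</A>'

# Approach 1: lowercase the whole document, then split as before.
# Note that this also lowercases the link text and the URLs.
parts_lower = html.lower().split('href')

# Approach 2: a case-insensitive regular expression.
href_re = re.compile('href', re.IGNORECASE)

# Either normalise every spelling of "href" to lowercase and split...
normalised = href_re.sub('href', html)
parts_normalised = normalised.split('href')

# ...or split on the pattern directly and leave the rest untouched.
parts_ci = href_re.split(html)
```

The second approach leaves everything except the word "href" alone, which matters if the case of the URLs or the link text needs to survive.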

[snip]

> To be sure as delimiter I choose chr(127)
> which surely is not present in the html file.

I wouldn't bet my life on that. I've found some weird characters in HTML
files.
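
If you do keep the delimiter trick, it costs almost nothing to check the assumption instead of trusting it. A minimal sketch, where split_on_word is a hypothetical helper and not anything from your script:

```python
def split_on_word(text, word, sentinel=chr(127)):
    """Split text on word by way of a sentinel character.

    Hypothetical helper, for illustration only. It refuses to run
    if the sentinel already occurs in the text, since the split
    would then happen in the wrong places.
    """
    if sentinel in text:
        raise ValueError('sentinel %r already present in text' % sentinel)
    return text.replace(word, sentinel).split(sentinel)


sample = '<a href="web reference"> underlined reference</a>'
pieces = split_on_word(sample, 'href')
```

Of course, when the check passes, the result is the same list that sample.split('href') gives you, so the sentinel buys nothing over splitting on the word directly.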


-- 
Steven.



