[Tutor] String matching?

Tue Dec 7 19:12:36 CET 2004

Hi all, 

It's a messy situation - basically, the HTML is from a tool used at
work, which was, ahem, 'maintained' by someone who loved the flashy
bits. Like animated gifs of the bad CG baby from Ally McBeal playing
soccer with itself. So, yeah, instead of using href links, he used
Java applets. 272 of them.

Which takes approx. 2-5 min to open in Mozilla on all the older work
boxes. So someone who knows HTML has taken over, and I'm helping out.
So, I have to email changes back and forth, because heaven forbid
they install Python. I mean, they've already got Perl, Java, and
Visual Basic sitting there. Luckily the 'hackers' are waiting for
Python.

So, all I want to do is rip the URLs from the applets, and replace them with plain links.

>test2 = test.replace('=\n', '')
>test2 = test2.replace('=3D"', '="')

Thanks, Kent, for some reason I totally missed that.
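
For the archive, a rough sketch of how those two replaces behave. The
sample string and the name Nav.class are made up for illustration, not
taken from the real page. The stray '=' at line ends and the '=3D' look
like quoted-printable encoding, so the standard library's quopri module
is shown as an alternative, on the assumption that the file really is
quoted-printable. Written for a current Python 3, not the Python of 2004.

```python
import quopri

# Hypothetical sample mimicking the damage described in this thread.
test = ('<app=\n'
        'let code=3D"Nav.class" width=3D"100">\n'
        '<param name=3D"url" value=3D"page1.html">\n'
        '</applet>')

# Kent's two replaces: drop the soft line breaks, then restore '=' before quotes.
test2 = test.replace('=\n', '')
test2 = test2.replace('=3D"', '="')

# Alternative, assuming the text is quoted-printable: let quopri decode it.
decoded = quopri.decodestring(test.encode('ascii')).decode('ascii')

print(test2)
print(decoded == test2)
```

If the file turns out not to be quoted-printable, the two plain
replaces are the safer choice, since they only touch the patterns named.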

And thanks for the re, hopefully I won't have to use it, but it gives
me a starting point to poke the re module from.
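
As a possible starting point for that poking, a minimal sketch of
turning applets into links with re, run on text that has already been
cleaned up. Everything specific here is an assumption: the sample HTML
is invented, and the param names "label" and "url" are guesses at what
the real applets might use.

```python
import re

# Hypothetical cleaned-up HTML; param names "label" and "url" are guesses.
html = '''<applet
        code="Nav.class" width="100" height="20">
<param name="label" value="Page One">
<param name="url" value="page1.html">
</applet>
<p>other stuff</p>
<APPLET code="Nav.class">
<param name="label" value="Page Two">
<param name="url" value="page2.html">
</APPLET>'''

# \s+ matches any run of whitespace, newlines included, so both
# "<applet code" and "<applet\n        code" are found.
# DOTALL lets '.' cross newlines; '.*?' is non-greedy, so each match
# stops at the first closing </applet>.
applet_re = re.compile(r'<applet\s+code.*?</applet>',
                       re.IGNORECASE | re.DOTALL)
param_re = re.compile(r'<param\s+name="([^"]*)"\s+value="([^"]*)"',
                      re.IGNORECASE)

def applet_to_link(match):
    # Collect the name/value pairs of one applet into a dict.
    params = dict(param_re.findall(match.group(0)))
    return '<a href="%s">%s</a>' % (params.get('url', '#'),
                                    params.get('label', 'link'))

links = applet_re.sub(applet_to_link, html)
print(links)
```

Regular expressions are brittle on real-world HTML, so this is only
worth trusting on markup as regular as a generated page tends to be.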

Regards,

Liam Clarke
On Tue, 07 Dec 2004 09:03:45 -0500, orbitz <orbitz at ezabel.com> wrote:
> Instead of copying and pasting and then just doing a simple match, why
> not use urllib2 to download the html and then run through it with HTMLParser?
> 
> 
> 
> Liam Clarke wrote:
> 
> >Hi all,
> >
> >I have a large amount of HTML through which a previous person has
> >liberally sprinkled a huge number of applets instead of HTML links,
> >and opening it kills my browser.
> >
> >So, want to go through and replace all applets with nice simple links,
> >and want to use Python to find the applet, extract a name and an URL,
> >and create the link.
> >
> >My problem is, somewhere in my copying and pasting into the text file
> >that the HTML currently resides in, it got all messed up, it would
> >seem, and there's a bunch of strange '=' all through it. (Someone said
> >that the code had been generated in Frontpage. Is that a good thing or
> >bad thing?)
> >
> >So, I want to search for <applet code=, but it may be in the file as
> >
> ><app=
> >let
> > code
> >
> >or <applet
> >        code
> >
> >or <ap=
> >plet
> >
> >etc. etc. (Full example of yuck here
> >http://www.rafb.net/paste/results/WcKPCy64.html)
> >
> >So, I want to write a search that will match <applet code and
> ><app=\nlet code (etc. etc.) without having to strip the file of '='
> >and '\n'.
> >
> >I was thinking the re module is for this sort of stuff? Truth is, I
> >wouldn't know where to begin with it, it seems somewhat powerful.
> >
> >Or, there's a much easier way, which I'm missing totally. If there is,
> >I'd be very grateful for pointers.
> >
> >Thanks for any help you can offer.
> >
> >Liam Clarke
> >
> >
> >
> 
> 

-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.