[Tutor] simple regex question

Alan Gauld alan.gauld at yahoo.co.uk
Sun May 1 19:08:14 EDT 2016


On 01/05/16 20:04, bruce wrote:
> Hey all..
>
> Yeah, the sample I'm dealing with is html.. I'm doing some "complex"
> extraction, and i'm modifying the text to make it easier/more robust..
>
> So, in this case, the ability to generate the line is what's needed
> for the test..
>

But as Peter explained, HTML has no concept of a "line". Extracting a
line from HTML depends entirely on how the author formatted the original
file. If you read it from a web server, the server may rearrange the
content (while maintaining the HTML), thus breaking your code.
The same applies if it gets sent via email or some other mechanism.
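
To illustrate the point, here is a minimal sketch. The sample HTML, the
class name and the first_match() helper are all made up for the
demonstration; they are not taken from your data. It shows a line-based
regex that works on one layout and silently finds nothing once the same
element is reflowed.

```python
import re

# Hypothetical sample: the same HTML element, laid out two different ways.
one_line = '<td class="name">Intro to Python</td>'
reflowed = '<td\n class="name">Intro\n to Python\n</td>'

# A pattern written against the first layout only.
pattern = re.compile(r'<td class="name">(.*)</td>')


def first_match(text):
    """Return the captured text from the first matching line, or None."""
    for line in text.splitlines():
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None


print(first_match(one_line))   # Intro to Python
print(first_match(reflowed))   # None
```

Both strings mean the same thing to a browser, but only the first one
survives the line-by-line approach.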

What you really want will be defined by the tags within which it lives.
And that's what a parser does - finds tags and extracts the content.
A regex can only do that for a very limited set of inputs, and it certainly
can't guarantee a "line" of output. Even if it seems to work today it
could fail completely next week even if the original HTML doesn't change.
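
As a rough sketch of the parser approach, the following uses the standard
library's html.parser module. The TagTextExtractor class, the extract()
helper and the sample HTML are my own illustration, not something from
your code, and a real page may well need a more forgiving parser. The idea
is simply to collect the text inside a given tag, wherever the line
breaks happen to fall.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.parts = []     # text fragments of the element being read
        self.results = []   # one entry per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace so layout differences disappear.
                self.results.append(" ".join("".join(self.parts).split()))
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract(html, tag):
    """Return the whitespace-normalised text of each <tag> element."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample: the same markup in two different layouts.
one_line = '<tr><td class="name">Intro to Python</td></tr>'
reflowed = '<tr>\n  <td\n class="name">Intro\n   to Python\n</td>\n</tr>'

print(extract(one_line, "td"))   # ['Intro to Python']
print(extract(reflowed, "td"))   # ['Intro to Python']
```

Both layouts give the same result because the extraction is driven by the
tags, not by where the newlines are.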

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


