Extract email address from Java script in html source using python

VanguardLH V at nguard.LH
Sun May 24 02:17:18 EDT 2015


Steve Hayes wrote:

> On Sat, 23 May 2015 19:01:55 +1000, Chris Angelico <rosuav at gmail.com>
> wrote:
> 
>>On Sat, May 23, 2015 at 4:46 PM, savitha devi <savithad8 at gmail.com> wrote:
>>> I am developing a web scraper code using HTMLParser. I need to extract
>>> text/email address from java script with in the HTMLCode.I am beginner level
>>> in python coding and totally lost here. Need some help on this. The java
>>> script code is as below:
>>>
>>> <script type='text/javascript'>
>>>  //<!--
>>>  document.getElementById('cloak48218').innerHTML = '';
>>>  var prefix = 'ma' + 'il' + 'to';
>>>  var path = 'hr' + 'ef' + '=';
>>>  var addy48218 = 'info' + '@';
>>>  addy48218 = addy48218 + 'tsv-neuried' + '.' +
>>> 'de';
>>>  document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' +
>>> prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
>>>  //-->
>>
>>This is deliberately being done to prevent scripted usage. What
>>exactly are you needing to do this for?
> 
> To sell addresses to spammers, of course.

The boob that uses this javascripted obfuscation (by slicing up the URL
across variables and using concatenation within a variable) hasn't a
clue that the javascript, or the user clicking on the URL, will still
have to eventually go to the destination, so it will still get blocked.
Duh!  Nothing is actually cloaked by the javascript (it's just another
means of building up the <A> tag) and the URL string (even if it used a
decimal value instead of dotted IP notation) still has to connect to
somewhere, and that gets detected and blocked.  Slicing up a URL across
variables
and concatenation within a variable is a child's ploy to obfuscate.
Apparently savitha can't even distinguish between an e-mail address and
a URL string.
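
To show how little the concatenation hides, here is a minimal sketch in
Python.  It assumes the string pieces are plain single-quoted literals,
as in the quoted snippet; a real page may encode them differently.  The
names ASSIGN, TOKEN and extract_cloaked_addresses are made up for this
example and come from no library.

```python
import re

# Script body copied from the quoted message, without the "> " quoting.
SCRIPT = """
 var prefix = 'ma' + 'il' + 'to';
 var path = 'hr' + 'ef' + '=';
 var addy48218 = 'info' + '@';
 addy48218 = addy48218 + 'tsv-neuried' + '.' +
'de';
"""

# "var addyNNN = ...;" or "addyNNN = ...;" (the expression may span lines).
ASSIGN = re.compile(r"(?:var\s+)?(addy\d+)\s*=\s*([^;]*);")
# Either a single-quoted literal or a bare identifier; "+" is skipped.
TOKEN = re.compile(r"'([^']*)'|(\w+)")


def extract_cloaked_addresses(script):
    """Replay the string concatenations assigned to each addyNNN variable."""
    values = {}
    for name, expr in ASSIGN.findall(script):
        parts = []
        for literal, ident in TOKEN.findall(expr):
            if ident:
                # Reference to a variable built earlier (unknown ones -> "").
                parts.append(values.get(ident, ""))
            else:
                parts.append(literal)
        values[name] = "".join(parts)
    return values


if __name__ == "__main__":
    print(extract_cloaked_addresses(SCRIPT))
```

The script text itself would have to come from somewhere else first,
for example from the data an HTMLParser subclass collects between the
<script> tags; that part is not shown here.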



More information about the Python-list mailing list