passing external data to web forms

Tim Chase python.list at tim.thechases.com
Mon May 31 17:21:29 EDT 2010


On 05/31/2010 12:16 PM, M L wrote:
> Specifically, I'm needing to login to a particular website,
> and which username I use varies based on it being specified
> in the email.

Are you in control of this email generation (that is, can you 
generate an email with an HTML form within, or is this email 
coming from a 3rd-party)?

Creating an HTML-email with a form to submit the 
username/password isn't a flawless solution because many folks 
(self included) configure their mail-reader to only display 
plain-text and ignore HTML components.

> My real problem is how to get the username and password
> passed to the browser.

If you can't create an HTML form in the source email, then it 
depends on what the web-server is expecting -- a GET (bad for 
users, good for you) or a POST (good web practice, but a pain for 
you).  If the web login form is maldesigned and uses a GET 
submission, you can just parse the email body for the fields and 
generate a link of the form

   http://example.com/login?user=jsmith&pass=SeKrEt
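
A minimal sketch of building such a link, assuming Python 3 (in
Python 2, urlencode lives in urllib itself). The helper name
login_link and the field names "user" and "pass" are made up to
match the example URL above; a real site may expect other names.

```python
from urllib.parse import urlencode

def login_link(base_url, username, password):
    """Return base_url with the credentials appended as a query string."""
    # urlencode() percent-escapes characters such as '&', '=' and spaces,
    # so odd characters in a password cannot break the link.
    query = urlencode({'user': username, 'pass': password})
    return '%s?%s' % (base_url, query)

print(login_link('http://example.com/login', 'jsmith', 'SeKrEt'))
# -> http://example.com/login?user=jsmith&pass=SeKrEt
```

The resulting string can be handed straight to webbrowser.open().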

However, if the website expects a POST to log in (good design, so 
credentials don't get saved in history or bookmarked 
accidentally), then you have to either

1) do some dark browser-specific hackery, perhaps with a bit of 
urllib magic to kludge the session into an active browser.  Not a 
particularly inviting solution to implement.

2) generate a temporary HTML file with the prepopulated form in 
it, point the browser at that page and either (2a) have the user 
click on the [submit] button, and/or (2b) have a little 
JavaScript that clicks the [submit] button (or calls 
form.submit() more likely) after the temp-page is loaded.  I'd do 
both in the event the user has JS turned off in their browser 
(again, that'd be me, thanks to NoScript).  This temporary HTML 
file could be scraped (via urllib) from the login url itself, or 
hard-coded if you expect it to be of the same format for each 
website.
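
For the "scraped from the login url" variant in (2), here is a
rough sketch of pulling the form details out of a login page
using only the standard library, assuming Python 3. The names
LoginFormParser and describe_login_form are hypothetical, and the
sample HTML is invented; fetching the page (e.g. with
urllib.request.urlopen) is left out so the example runs offline.

```python
from html.parser import HTMLParser

class LoginFormParser(HTMLParser):
    """Collect the action, method and input names of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = 'get'   # HTML default when no method is given
        self.inputs = []      # (name, type) pairs in document order
        self._in_form = False
        self._done = False    # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_form = True
            self.action = attrs.get('action')
            self.method = (attrs.get('method') or 'get').lower()
        elif tag == 'input' and self._in_form and attrs.get('name'):
            # unnamed inputs (such as a bare submit button) are skipped
            self.inputs.append((attrs['name'], attrs.get('type', 'text')))

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True

def describe_login_form(html_text):
    parser = LoginFormParser()
    parser.feed(html_text)
    return parser.action, parser.method, parser.inputs

sample = """
<html><body>
<form action="/login" method="POST">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log In">
</form>
</body></html>
"""
print(describe_login_form(sample))
# -> ('/login', 'post', [('username', 'text'), ('password', 'password')])
```

A relative action such as "/login" would still need to be
resolved against the page's URL (urllib.parse.urljoin does that)
before it goes into the temporary form.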

My crack at it looks something like
##########################
from sys import exit, stderr
from tempfile import NamedTemporaryFile
import email
import imaplib
import os
import re
import time
import urllib
import webbrowser

url_re = re.compile(r'\burl:\s*(http://.*)')
user_re = re.compile(r'\buser(?:name)?:\s*(.*)')
pass_re = re.compile(r'\bpass(?:word)?:\s*(.*)')

class MissingField(Exception): pass

IMAP = imaplib.IMAP4_SSL

def get_email(host, username, password):
   # ...
   for message in messages_I_care_about:
       yield message

def fields(msg):
   url_m = url_re.search(msg)
   if not url_m: raise MissingField("No URL")
   user_m = user_re.search(msg)
   if not user_m: raise MissingField("No username")
   pass_m = pass_re.search(msg)
   if not pass_m: raise MissingField("No password")
   return [m.group(1).strip() for m in (url_m, user_m, pass_m)]

def create_temp_html(url, username, password):
   # escape() ships with both Python 2 and 3; the extra mapping also
   # escapes double quotes so values are safe inside value="..."
   from xml.sax.saxutils import escape
   quote = {'"': '&quot;'}
   f = NamedTemporaryFile(mode='w', suffix='.html')
   # HTML hard-coded here, but could be
   # scraped from the site, parsed with BeautifulSoup
   # searched for the form/uname/pwd values
   # and more programmatically generated
   # but this is the lazy version you get for free ;-)
   f.write("""<html>
   <head>
   <title>Some Title</title>
   </head>
   <body onload="document.getElementById('f').submit();">
    Hang on...time to log in...
    <form id="f" method="POST" action="%s">
     <input type="text" value="%s" name="username"><br>
     <input type="password" value="%s" name="password"><br>
     <input type="submit" value="Log In">
    </form>
   </body>
   </html>
""" % (
     escape(url, quote),
     # HTML-escape rather than URL-quote: the browser does the
     # URL-encoding itself when it submits the form
     escape(username, quote),
     escape(password, quote),
     )
   )
   f.flush()
   return f

if __name__ == "__main__":
   HOST = 'mail.example.com'
   USER = 'email_username@example.com'
   PASS = 'SecretEmailPassword'
   EXPECTED_SENDER = 'somebody@example.net'

   for message in get_email(HOST, USER, PASS):
     msg = email.message_from_string(message)
     # if you don't want to limit the sender
     # delete/comment the next 3 lines
     if EXPECTED_SENDER not in msg['from'].lower():
       print("Unexpected sender...ignoring %r" % msg['subject'])
       continue

     for part in msg.walk():
       # you may not want to skip HTML portions or other
       # MIME-types like attachments, but whatever
       if part.get_content_type() != 'text/plain': continue

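       try:
         # parse this text/plain part, not the whole message:
         # msg.get_payload() is a list for multipart mail
         url, username, password = fields(part.get_payload())
         print("%s %s %s" % (url, username, password))
       except MissingField as e:
         print(e)
         continue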
       f = create_temp_html(url, username, password)
       stderr.write(
         "Opening %r in %s\n" %
         (f.name, webbrowser.get().basename.title()))
       webbrowser.open(f.name)
       time.sleep(30) # wait for the browser to load the file
       # otherwise this .close() will delete it
       # before the web-browser could open it
       f.close()
##########################
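
As a quick check of the parsing step, here is a small sketch that
runs the three regexps from the listing against an invented
message body, next to a looser set. The "_loose" patterns and the
extract() helper are hypothetical, not part of the listing; they
only illustrate one way the patterns might be relaxed (ignoring
case and accepting https).

```python
import re

# The same patterns as in the listing above.
url_re = re.compile(r'\burl:\s*(http://.*)')
user_re = re.compile(r'\buser(?:name)?:\s*(.*)')
pass_re = re.compile(r'\bpass(?:word)?:\s*(.*)')

# Looser, hypothetical variants: ignore case and also accept https.
url_re_loose = re.compile(r'\burl:\s*(https?://\S+)', re.IGNORECASE)
user_re_loose = re.compile(r'\buser(?:name)?:\s*(.*)', re.IGNORECASE)
pass_re_loose = re.compile(r'\bpass(?:word)?:\s*(.*)', re.IGNORECASE)

def extract(body, patterns):
    """Return the stripped first group of each pattern, or None if absent."""
    found = []
    for pattern in patterns:
        m = pattern.search(body)
        found.append(m.group(1).strip() if m else None)
    return found

body = """Hi,
URL: https://example.com/login
Username: jsmith
Password: SeKrEt
"""

print(extract(body, (url_re, user_re, pass_re)))
# -> [None, None, None]
print(extract(body, (url_re_loose, user_re_loose, pass_re_loose)))
# -> ['https://example.com/login', 'jsmith', 'SeKrEt']
```

The strict patterns miss this body only because of the capital
letters and the https scheme, which is the sort of thing worth
checking against a real sample of the incoming mail.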

Adjust the regexps to find your URL/uname/pwd as desired, and 
write the get_email() iterator so that it finds all the messages 
in your inbox that match your criteria (such as "not already 
seen", "has XYZ in the subject", etc.).
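
One possible shape for that get_email() iterator, as a sketch
only, assuming Python 3 and an IMAP server that accepts SSL
connections. The search_criteria() helper, the "criteria" and
"connect" parameters, and the choice of the INBOX folder are
assumptions added for illustration; "connect" exists so a
stand-in connection can be used when trying the code without a
real server.

```python
import imaplib

def search_criteria(subject=None, unseen_only=True):
    """Build an IMAP SEARCH string such as '(UNSEEN SUBJECT "XYZ")'."""
    parts = []
    if unseen_only:
        parts.append('UNSEEN')
    if subject:
        parts.append('SUBJECT "%s"' % subject.replace('"', r'\"'))
    return '(%s)' % ' '.join(parts or ['ALL'])

def get_email(host, username, password, criteria='(UNSEEN)',
              connect=imaplib.IMAP4_SSL):
    """Yield the raw text of each message matching criteria."""
    conn = connect(host)
    try:
        conn.login(username, password)
        conn.select('INBOX')
        typ, data = conn.search(None, criteria)
        # data[0] is a space-separated list of message numbers
        for num in (data[0] or b'').split():
            typ, msg_data = conn.fetch(num, '(RFC822)')
            # msg_data[0] is a (header, raw_bytes) tuple; decoding as
            # UTF-8 is a simplification so that the result can go to
            # email.message_from_string()
            yield msg_data[0][1].decode('utf-8', 'replace')
    finally:
        conn.logout()

print(search_criteria('XYZ'))
# -> (UNSEEN SUBJECT "XYZ")
```

Fetching with '(RFC822)' normally marks a message as seen on the
server, so an UNSEEN search should not return it a second time.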

I'm not 100% sure of my JavaScript in the body's onload handler, 
but you can tweak that if you have JS enabled and want to tinker 
with it for auto-login.

-tkc






