urllib ??

Alan Kennedy alanmk at hotmail.com
Mon Nov 4 09:49:08 EST 2002


Rainer wrote:
> 
> hi alan
> this is the problem, maybe i put it the wrong way.
> i DO NOT get the results from the server. the thing that is stored in
> 'results', is the 'query-webpage' with the values filled in, not the
> results page.

OK, I think I see your problem now.

You're submitting data to the server almost correctly, but the server is
just returning the input form to you, as if you had not filled out all
of the fields correctly.

And I think that this is one of the things you are doing wrong: not
filling in the fields correctly. The problem lines are these:

    d['.cgifields']='ContAllHits'
    d['.cgifields']='SortByIde'
    d['.cgifields']='StructAllHits'
    d['.cgifields']='Map2Mismatch'
    d['.cgifields']='Dbase3d'

HTTP allows for multiple values for the same field in a POST request.
However, Python dictionaries do NOT: a Python dict can hold only one
value for each key. Try the following at an interactive prompt:

>>> d = {}
>>> d['.cgifields'] = 'ContAllHits'
>>> d
{'.cgifields': 'ContAllHits'}
>>> d['.cgifields'] = 'SortByIde'
>>> d
{'.cgifields': 'SortByIde'}

You're expecting it to maintain multiple values for the key
".cgifields", but it doesn't: the value gets overwritten each time,
meaning that you are submitting POST data that only includes the value
'.cgifields'='Dbase3d' (the last assignment in the list), and failing to
supply values for the (presumably necessary) 'ContAllHits', 'SortByIde',
'StructAllHits', 'Map2Mismatch'.
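
To see what actually ends up in the request body, here is a minimal
sketch of the overwriting effect. It assumes a current Python, where
urlencode lives in urllib.parse (your script uses the older urllib
module), and the loop is only my shorthand for your five assignments.

```python
from urllib.parse import urlencode

# Repeat the five assignments from the original script in a loop.
d = {}
for name in ['ContAllHits', 'SortByIde', 'StructAllHits',
             'Map2Mismatch', 'Dbase3d']:
    d['.cgifields'] = name  # each assignment replaces the previous value

encoded = urlencode(d)
print(encoded)  # .cgifields=Dbase3d
```

Only the last assignment survives, so the server sees a single
'.cgifields' field.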

The solution is to pass a list of tuples to "urlencode" instead of a
dictionary, like so:

#=-----------------------------------------------------

import urllib

def query2_polyphen():

    # A list of (name, value) tuples, so that '.cgifields' can repeat.
    d=[
    ('InputQuerySequence', '>GGG \n sequence'),
    ('Var1', 'C'),
    ('InputQueryPosition', '282'),
    ('Var2', 'Y'),
    ('Dbase3d', 'pqs'),
    ('SortByIde', '1'),
    ('Map2Mismatch', '0'),
    ('StructAllHits', '0'),
    ('ContAllHits', '0'),
    ('MinHitLen', '100'),
    ('MinHitIde', '0.5'),
    ('MaxHitGaps', '20'),
    ('ContThresh', '6'),
    ('.cgifields', 'ContAllHits'),
    ('.cgifields', 'SortByIde'),
    ('.cgifields', 'StructAllHits'),
    ('.cgifields', 'Map2Mismatch'),
    ('.cgifields', 'Dbase3d'),
    ]
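    query=urllib.urlencode(d)
    url = 'http://tux.embl-heidelberg.de/ramensky/polyphen.cgi'
    # Passing the encoded data as the second argument makes this a POST.
    results = urllib.urlopen(url,query).read()
    # Save the returned page, closing the file explicitly.
    f = open('rainer.html','w')
    f.write(results)
    f.close()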
#=-----------------------------------------------------
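
If you want to convince yourself that the repeated field survives
encoding, the following sketch may help. It again assumes a current
Python (urllib.parse), uses a shortened field list purely for
illustration, and also shows an alternative I believe is equivalent: a
dictionary whose values are lists, encoded with doseq=True.

```python
from urllib.parse import urlencode, parse_qs

# Shortened, illustrative field list; the real form has more fields.
pairs = [
    ('Dbase3d', 'pqs'),
    ('.cgifields', 'ContAllHits'),
    ('.cgifields', 'SortByIde'),
]
encoded_pairs = urlencode(pairs)

# Alternative: keep a dict, but store a list of values per key and
# ask urlencode to emit one name=value pair per list element.
as_lists = {'Dbase3d': ['pqs'], '.cgifields': ['ContAllHits', 'SortByIde']}
encoded_lists = urlencode(as_lists, doseq=True)

# Decode again to check that both values of '.cgifields' are present.
decoded = parse_qs(encoded_pairs)
print(encoded_pairs)
print(decoded['.cgifields'])
```

Either form should do; the list of tuples has the advantage that you
control the order of the fields exactly.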

However, I also notice, by manually submitting your form data using a
browser, that the data you are submitting give rise to an error. I don't
know enough about molecular biology to know what the error is, but I
suggest that you get your submitted data right manually before trying to
submit it automatically.

Also, you might send an email to the provider of the polyphen service
asking them to somehow make it possible for automated submission scripts
to recognise when a query has failed, rather than simply sending the
form again. A return HTTP status value in the 4xx series might be
suitable, e.g. of the "404 - file not found" variety.
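
If the service did return a 4xx status, a script could detect the
failure roughly as sketched below. This is only an illustration under
assumptions: it is written for a current Python (urllib.request), the
function name submit_query and its opener parameter are my own
inventions, and the polyphen server does not behave this way today.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def submit_query(url, pairs, opener=urllib.request.urlopen):
    """POST (name, value) pairs; return the body or raise ValueError on 4xx.

    'opener' is a hypothetical hook so the function can be exercised
    without a network connection.
    """
    data = urlencode(pairs).encode('ascii')
    try:
        response = opener(url, data)
    except urllib.error.HTTPError as err:
        if 400 <= err.code < 500:
            # The server rejected the submitted data.
            raise ValueError('server rejected the query: HTTP %d' % err.code)
        raise
    return response.read()
```

With the default opener, urlopen raises HTTPError for 4xx and 5xx
replies, so a rejected query would no longer look like a successful
page fetch.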

HTH,

alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan


