[Tutor] Performance Issue

Alan Gauld alan.gauld at yahoo.co.uk
Wed Oct 17 20:12:28 EDT 2018


On 17/10/18 22:25, Stephen Smith wrote:
> I have written a screen scraping program that watches a clock (on the app's
> server) and at 7:00:00 AM dashes to make a reservation on line. It works
> fine. However, i have spent time trying to improve its performance. I am
> using selenium, with chrome driver. 

When doing performance tuning the first thing to answer
is what "improved performance" means. For example, in a word processor,
improving the speed at which an input character appears on screen by 10%
is unlikely to be a worthwhile exercise. But improving the time taken to
do a global search/replace by 10% might well be worthwhile.

So what do you want to improve about an app that spends most
of its time waiting for a change on a remote server (presumably
by polling)? Is it the speed/frequency of polling? The speed of reading
the response? The speed of processing the response?

And knowing what you want to improve, have you measured it to
see where the time is being spent? Is it in the client request? The
transmission to the server? The server processing? The transmission
from the server? The reading of that response? Or the processing
of that response? You need to time each of those phases accurately
to find out which bits are worth improving.
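
As a minimal sketch of that kind of measurement (not the original
program), the snippet below times named phases with
time.perf_counter(). The timed() helper and the phase names are
hypothetical, and the sleep() calls are stand-ins for the real steps
(the Selenium find, the click, the wait for the new iframe).

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
timings = {}

@contextmanager
def timed(phase):
    """Add the duration of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed

# Stand-ins for the real work; replace each sleep with the actual step.
with timed("find"):
    time.sleep(0.01)
with timed("click and wait"):
    time.sleep(0.05)

# Report the slowest phase first.
for phase, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phase:15s} {seconds:.3f}s")
```

Repeating each phase several times and comparing the totals gives a
steadier picture than a single run, since network times vary a lot.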

> Here is what i have learned. I have tried various methods to find (by
> link_text, by_xpath, etc.) and click on the element in question (shown
> below). When i find the element with no click, the find process takes about
> .02 seconds. When i find it with a click (i need to select the element and
> move to the next iframe) it takes over a second. I get these same results no
> matter which find_element_by variation i use and i get the same times in
> headless or normal mode.
> 
> Here is my theory - finding the element is relatively simple in the html
> already loaded into my machine - hence .02 seconds. However, when i click on
> the element, processing goes out to the server which does some stuff and i
> get a new iframe displayed, all of which takes time. 

Absolutely. Network access is likely to be measured in tenths of a
second rather than hundredths. And processing the request may
well entail a server database call (which may itself be on a separate
machine from the web server, with a corresponding LAN message delay).
Then there's the creation and transmission of the HTML (unless your
server provides an API with JSON responses, but then you don't
need clicks etc!) And iframes make that worse since every iframe
effectively gets treated as a separate HTML document.

Then when your client receives the data it has to reparse
the html into a document structure before performing the search.


> concluded that perhaps I can't take a big chunk of that time out

You probably can, but only if you have access to the server
code and the network infrastructure and deep enough pockets
for a server upgrade or a new proxy server. Assuming that's
not the case then no, you need to look at other options.

But your first step has to be to measure the various stages
of the request. If the problem lies in the transmission
time across the network there is probably not much you
can do. If it's in the database access (trickier to measure
if you don't have the server code - you need to create
some simultaneous equations using multiple test scenarios)
then you might be able to construct better queries (e.g. look
at a different page or only query the target iframe).
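
One simple way to read the "simultaneous equations" idea, offered as
an assumption rather than a recipe: time a request that should not
touch the database (say, a static file) and one that does, and
subtract. This assumes both share the same network and web server
overhead. The function name and the sample timings below are
hypothetical, not measurements.

```python
from statistics import median

def estimate_backend_time(static_times, dynamic_times):
    """Estimate server-side processing time as the difference between
    the median dynamic-page time and the median static-resource time.

    Model (an assumption):
        static  = network + web server
        dynamic = network + web server + backend
    """
    return median(dynamic_times) - median(static_times)

# Hypothetical round-trip timings in seconds, for illustration only.
static_times = [0.12, 0.11, 0.13]
dynamic_times = [1.05, 1.10, 1.02]

backend = estimate_backend_time(static_times, dynamic_times)
print(f"estimated backend time: {backend:.2f}s")
```

Medians are used rather than means so that one slow outlier does not
skew the estimate.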

> considered something other than selenium, but since i think the problem lies
> on the server side, not sure it is worth the time.

It depends on the nature of the page. The best solution,
by far, is not to do web scraping. It's always the worst
case solution and to be avoided if at all possible. Try
to find an API with JSON or XML responses.

Also, are you sure you need to use the clock on the page?
Isn't the server clock adequate? In which case the
server's time should be in the Date header of every HTTP
response, so there's no need to scrape the clock at all...
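
A minimal sketch of reading that header with only the standard
library. The function names are hypothetical, and it assumes the
server actually sends a Date header and that its clock is the one
the booking logic uses. Note the header only has one-second
resolution.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

def clock_offset(date_header, local_now):
    """Return server time minus local time, in seconds."""
    server_now = parsedate_to_datetime(date_header)
    return (server_now - local_now).total_seconds()

def fetch_date_header(url):
    """Send a HEAD request and return the Date header, or None."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return response.headers.get("Date")

# Offline demonstration with a fixed header value and local time.
local = datetime(2018, 10, 17, 11, 0, 0, tzinfo=timezone.utc)
print(clock_offset("Wed, 17 Oct 2018 11:00:02 GMT", local))
```

Against a live server you would pass fetch_date_header(url) and
datetime.now(timezone.utc) instead; the offset then includes network
delay, so treat it as approximate.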

Finally, I think there is an active Selenium discussion
forum so you could try there for more ideas.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



