[Tutor] Performance Issue

Alan Gauld alan.gauld at yahoo.co.uk
Thu Oct 18 06:53:13 EDT 2018


   Cc'ing list. Please use reply-all on responses to tutor.
   If you have no control over the server, e.g. access to logs etc., then the
   best you can do is record the time just before sending the request and
   immediately after you get the reply. That part is outside your control. If
   the remaining time is worth optimising then look to your code.
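
   For example, a minimal (untested) sketch of that timing, reusing the br
   driver and day_to_book variable from your own code:

       import time

       start = time.perf_counter()    # just before sending the request
       br.find_element_by_link_text(str(day_to_book)).click()
       elapsed = time.perf_counter() - start    # immediately after the reply
       print("click round trip: %.3fs" % elapsed)

   Everything inside that window is network plus server time; anything
   outside it is yours to optimise.
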
   As to the server time, it should be in the HTTP headers, so you don't need
   to parse the HTML; just read the headers. Much faster.
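
   For instance, a rough sketch with the requests library (the URL here is
   just a placeholder):

       import requests
       from email.utils import parsedate_to_datetime

       resp = requests.head("https://example.com/booking")  # placeholder URL
       server_time = parsedate_to_datetime(resp.headers["Date"])
       print(server_time)

   A HEAD request returns only the headers, so there is no page body to
   download or parse at all.
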
   HTH,
   Alan g.
   On 18 Oct 2018 10:40 am, User2002 <user2002 at comcast.net> wrote:

     Thank you for your thoughtful reply. There are some good ideas in there
     for me.

     I have asked and there either is no API or they do not want outsiders to
     have access to it, so I think that is a dead end.

     For my purposes, improved performance focuses on significantly reducing
     the time required to successfully execute the
     br.find_element_by_link_text(str(day_to_book)).click() command. If my
     overall time to successfully book a reservation is 1.8 seconds and
     roughly 2 seconds is spent on this single instruction, then a .5 second
     improvement represents a 25% reduction. I have a second command (same
     type, just on the next iframe) that is similarly slow, so fixing both
     could represent a 50% reduction. Given the demand for the reservations
     (there are literally hundreds of people out there pounding their
     keyboards/clicking on fields) every second counts. (With an automated
     ability to book a reservation, I am probably faster than anyone's
     ability to click on a field, wait for a reply, reposition the cursor,
     click again, etc., but I am at the point where this has become something
     I am fully invested in and would like to take as far as I can.)

     Most of your ideas center around the notion of knowing more about where
     the time delay occurs in the processing steps that occur outside of my
     world - communication back and forth to the server, etc. I must confess,
     I have no idea how to do this. How can I measure what goes on outside my
     machine and measure the component parts? If you have an idea in this
     area or could refer me to where I could go to read and learn, I'd be
     very grateful.

     Finally, regarding your notion of web scraping, server clock, etc.:
     literally the only thing I 'scrape' is the server time, to ensure I
     click on the date field at exactly 7:00:00. Once I get to that point, I
     click on a date field, then I click on a time field and I am done - no
     scraping occurs once it reaches 7:00:00. So I am not sure there are
     improvements to be made in that area.

     -----Original Message-----
     From: Tutor <tutor-bounces+user2002=comcast.net at python.org> On Behalf Of
     Alan Gauld via Tutor
     Sent: Wednesday, October 17, 2018 8:12 PM
     To: tutor at python.org
     Subject: Re: [Tutor] Performance Issue

     On 17/10/18 22:25, Stephen Smith wrote:
     > I have written a screen scraping program that watches a clock (on the
     > app's server) and at 7:00:00 AM dashes to make a reservation on line.
     > It works fine. However, I have spent time trying to improve its
     > performance. I am using selenium, with chrome driver.

     When doing performance tuning, the first thing to answer is what
     improved performance means. For example, in a word processor, improving
     the speed at which an input character appears on screen by 10% is
     unlikely to be a worthwhile exercise. But improving the time taken to do
     a global search/replace by 10% might well be worthwhile.

     So what do you want to improve about an app that spends most of its time
     waiting for a change on a remote server (presumably by polling)? Is it
     the speed/frequency of polling? The speed of reading the response? The
     speed of processing the response?

     And knowing what you want to improve, have you measured it to see where
     the time is being spent? Is it in the client request? The transmission
     to the server? The server processing? The transmission from the server?
     The reading of that response? Or the processing of that response? You
     need to time each of those phases accurately to find out which bits are
     worth improving.
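
     As a rough, untested illustration of how you might split those phases
     from the client side with just the standard library (the host and path
     are placeholders; server processing time can only be inferred, not
     measured directly from the client):

         import http.client
         import time

         t0 = time.perf_counter()
         conn = http.client.HTTPSConnection("example.com")  # placeholder
         conn.connect()                    # TCP + TLS setup
         t1 = time.perf_counter()
         conn.request("GET", "/booking")   # send the request
         t2 = time.perf_counter()
         resp = conn.getresponse()         # server work + network latency
         t3 = time.perf_counter()
         body = resp.read()                # download the response
         t4 = time.perf_counter()

         print("connect %.3fs, send %.3fs, wait %.3fs, read %.3fs"
               % (t1 - t0, t2 - t1, t3 - t2, t4 - t3))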

     > Here is what I have learned. I have tried various methods to find (by
     > link_text, by_xpath, etc.) and click on the element in question (shown
     > below). When I find the element with no click, the find process takes
     > about .02 seconds. When I find it with a click (I need to select the
     > element and move to the next iframe) it takes over a second. I get
     > these same results no matter which find_element_by variation I use,
     > and I get the same times in headless or normal mode.
     >
     > Here is my theory - finding the element is relatively simple in the
     > HTML already loaded into my machine - hence .02 seconds. However, when
     > I click on the element, processing goes out to the server, which does
     > some stuff, and I get a new iframe displayed, all of which takes time.

     Absolutely. Network access is likely to be measured in tenths of a
     second rather than hundredths. And processing the request may well
     entail a server database call (which may itself be on a separate machine
     from the web server, with a corresponding LAN message delay), then
     there's the creation and transmission of the HTML (unless your server
     provides an API with JSON responses - but then you don't need clicks
     etc!). And iframes make that worse, since every iframe effectively gets
     treated as a separate HTML document.

     Then when your client receives the data it has to reparse the HTML into
     a document structure before performing the search.
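
     You can get a feel for that client-side parse cost yourself; a small
     sketch, assuming BeautifulSoup is installed and br is your Selenium
     driver:

         import time
         from bs4 import BeautifulSoup

         html = br.page_source    # the already-downloaded document
         start = time.perf_counter()
         soup = BeautifulSoup(html, "html.parser")  # rebuild the tree
         print("parse: %.3fs" % (time.perf_counter() - start))

     If that number is tiny compared to the full click-to-response time, the
     delay is almost certainly in the network and server stages.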

     > concluded that perhaps I can't take a big chunk of that time out

     You probably can, but only if you have access to the server code and
     the network infrastructure, and deep enough pockets for a server upgrade
     or a new proxy server. Assuming that's not the case then no, you need to
     look at other options.

     But your first step has to be to measure the various stages of the
     request. If the problem lies in the transmission time across the network
     there is probably not much you can do. If it's in the database access
     (trickier to measure if you don't have the server code - you need to
     create some simultaneous equations using multiple test scenarios) then
     you might be able to construct better queries (e.g. look at a different
     page or only query the target iframe).
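
     A hedged sketch of that last idea - fetching the target iframe's
     document directly rather than the whole page (the URL is a placeholder,
     and whether this works at all depends on the site's cookies and
     authentication):

         import requests
         from bs4 import BeautifulSoup
         from urllib.parse import urljoin

         page = requests.get("https://example.com/booking")  # placeholder
         soup = BeautifulSoup(page.text, "html.parser")
         src = soup.find("iframe")["src"]   # may be a relative URL
         frame = requests.get(urljoin(page.url, src))  # iframe document only
         print(len(frame.text), "bytes of iframe HTML")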

     > considered something other than selenium, but since I think the
     > problem lies on the server side, not sure it is worth the time.

     It depends on the nature of the page. The best solution, by far, is not
     to do web scraping. It's always the worst-case solution and to be
     avoided if at all possible. Try to find an API with JSON or XML
     responses.
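
     Purely to show the contrast, a hypothetical JSON booking call (the
     endpoint and payload are invented for illustration):

         import requests

         resp = requests.post(
             "https://example.com/api/reservations",  # hypothetical endpoint
             json={"date": "2018-10-18", "time": "07:00:00"},
         )
         print(resp.json())   # structured data - nothing to parse or click

     One request, no HTML to reparse, no simulated clicks.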

     Also, are you sure you need to use the clock on the page? Isn't the
     server clock adequate? In which case the server time should be in every
     response header, so there's no need for web scraping at all...

     Finally, I think there is an active Selenium discussion forum, so you
     could try there for more ideas.

     --
     Alan G
     Author of the Learn to Program web site
     http://www.alan-g.me.uk/
     http://www.amazon.com/author/alan_gauld
     Follow my photo-blog on Flickr at:
     http://www.flickr.com/photos/alangauldphotos

     _______________________________________________
     Tutor maillist  -  Tutor at python.org
     To unsubscribe or change subscription options:
     https://mail.python.org/mailman/listinfo/tutor

