Help with stale exception Python

iverson.zhou at gmail.com
Mon Dec 7 03:03:34 EST 2015


I'm new to Python and programming. I've been learning it for 3 weeks now but have hit a lot of obstacles along the way. I found some of your insights very useful as a starter, but I have come across many more complicated challenges that aren't very intuitive.

For example, I'm trying to scrape a website (accessed through my university library, which has full access, so it goes through a proxy) using Selenium, because the site is very heavily JavaScript driven. There is a button that lets the user navigate to the next company page. My script finds the elements of interest on each page, writes them to a CSV file, clicks through to the next page, and repeats. I have a couple of problems I need some help with. First, the only element I'm really interested in is the company website (which isn't always there), but when it is there its location can change (see http://pasteboard.co/2GOHkbAD.png and http://pasteboard.co/2GOK2NBT.png) depending on the number of elements at the parent level. I'm using driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")

hoping to capture all the information (e.g. phone, email, website) and do some cleansing later on. However, it appears that not all the web elements are captured and written to the CSV from each page using this method. Some pages were written to the file but some were missing. I couldn't figure out why.
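One way to stop depending on where the website link sits among its siblings is to select it by what it is rather than by its position. The following is only a sketch: it assumes the company website is rendered as an `<a>` element whose `href` starts with `http` somewhere inside the `wrapper` container, which the screenshots do not confirm. The helper name `website_links` is made up for illustration, and links pointing back to the site itself (or to the library proxy) would probably need to be filtered out as well.

```python
# "xpath" is the string value of selenium's By.XPATH, so this helper needs
# no selenium import and can be exercised with any driver-like object.
def website_links(driver, container_id="wrapper"):
    """Return the href of every http(s) link inside the given container.

    Assumption: the company website is an <a href="http..."> element
    somewhere below the element with id=container_id.
    """
    xpath = "//*[@id='%s']//a[starts-with(@href, 'http')]" % container_id
    hrefs = []
    for anchor in driver.find_elements("xpath", xpath):
        href = anchor.get_attribute("href")
        if href:  # skip anchors whose href could not be read
            hrefs.append(href)
    return hrefs
```

Because the XPath uses `//a` relative to the container instead of a chain of `div[n]` steps, it keeps matching when the number of sibling elements changes, and it simply returns an empty list on pages where no website is shown.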

A second problem, which is more complicated and has been driving me nuts, is that the DOM changes when the page content changes, and elements are destroyed and/or possibly recreated after driver.find_element_by_id('detail-pagination-next-btn').click()
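Since element objects found before the click refer to the old DOM, one common pattern is to copy everything needed into plain strings first, click once per page, and look everything up again afterwards. Below is a rough sketch of that loop, not a drop-in fix: `scrape_pages`, `max_pages` and the `wait_for_reload` hook are hypothetical names introduced here, and the strings "xpath" and "id" are the values of selenium's `By.XPATH` and `By.ID`.

```python
def scrape_pages(driver, rows_xpath, next_button_id, max_pages,
                 wait_for_reload=None):
    """Collect the text of matching elements from up to max_pages pages."""
    pages = []
    for page_number in range(max_pages):
        # Find the elements fresh on every page and copy their text into
        # plain strings right away; strings cannot go stale.
        elements = driver.find_elements("xpath", rows_xpath)
        pages.append([element.text for element in elements])
        if page_number == max_pages - 1:
            break
        # Re-find the button as well: the button object from the previous
        # page belongs to a DOM that may no longer exist.
        buttons = driver.find_elements("id", next_button_id)
        if not buttons:
            break  # no next button: assume this was the last page
        buttons[0].click()
        if wait_for_reload is not None:
            # Optional hook that blocks until the new page has rendered.
            wait_for_reload(driver, elements)
    return pages
```

The click happens once per page, after the loop over the elements has finished, so no element is touched again after the DOM has been replaced. With real Selenium, the hook could be something like `lambda d, old: WebDriverWait(d, 30).until(EC.staleness_of(old[0]))` when `old` is non-empty, which waits until an element from the previous page has actually gone away.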

I have tried countless methods (e.g. explicit and implicit waits) but the stale element error still persists; once an element reference is stale, it seems to stay stale.
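That observation matches how a stale reference behaves: waiting does not revive an old element object, so the lookup itself has to be repeated. A minimal sketch of a retry helper built on that idea follows. `retry_on` is a hypothetical helper, not part of Selenium, and the exception type is passed in so the helper stays independent of any library.

```python
import time


def retry_on(exc_type, action, attempts=3, delay=0.5):
    """Call action() until it succeeds, retrying only when exc_type is raised.

    action must locate the element again on every call; retrying with an
    element object found earlier would fail the same way each time.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            time.sleep(delay)


# Possible use with Selenium (assumes an existing `driver`):
# from selenium.common.exceptions import StaleElementReferenceException
# retry_on(StaleElementReferenceException,
#          lambda: driver.find_element("id", "detail-pagination-next-btn").click())
```

The important part is that the lambda calls `find_element` itself, so each attempt works with a reference to the current DOM.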

Has anyone come up with a solution to this, and what is the best way to deal with DOM tree changes?

Your help is much appreciated. My code is attached:


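import csv
import time

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is a Selenium WebDriver that has already opened the page.
with open('C:/Python34/email.csv', 'w') as f:
    z = csv.writer(f, delimiter='\t', lineterminator='\n')
    while True: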
        row = []
        for link in driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]"):
            try:
                row.append(str(link.text))
                z.writerow([link.text])  # a list, so the text stays in one field
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                time.sleep(10)
                c=driver.find_element_by_id('detail-pagination-next-btn')
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                c.click()
                time.sleep(10)
                continue
            except StaleElementReferenceException as e:
                c=driver.find_element_by_id('detail-pagination-next-btn')
                for link in driver.find_elements_by_xpath("//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]"):
                    row.append(str(link.text))
                    z.writerow([link.text])  # a list, so the text stays in one field
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                    time.sleep(10)
                    c=driver.find_element_by_id('detail-pagination-next-btn')
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]/span')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.XPATH,'//*[@id="detail-pagination-next-btn"]')))
                    WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.ID,'detail-pagination-next-btn')))
                    WebDriverWait(driver, 50).until(EC.presence_of_all_elements_located((By.XPATH,"//*[@id='wrapper']/div[2]/div[2]/div/div[2]/div[1]/div[3]/div[1]/div[2]/div/div[2]/div/div[2]/div/div[position() = 1 or position() = 2 or position() = 3]")))
                    c.click()
                    time.sleep(10)
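
For the cleansing step, the captured strings could be sorted into phone, email and website columns before writing one row per company. This is only a sketch: the regular expressions are rough guesses rather than validated patterns, `classify` and `to_row` are hypothetical helpers, and the sample values are invented. It also shows that `csv.writer.writerow` takes a sequence of fields, so a list is passed rather than a bare string (a bare string is written one character per column).

```python
import csv
import io
import re

# Rough patterns for illustration only; real data may need stricter rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
WEBSITE_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def classify(text):
    """Guess whether a captured string is an email, website or phone."""
    text = text.strip()
    if EMAIL_RE.match(text):
        return "email"
    if WEBSITE_RE.match(text):
        return "website"
    if PHONE_RE.match(text):
        return "phone"
    return "other"


def to_row(texts):
    """Build a [phone, email, website] row; missing values stay empty."""
    found = {"phone": "", "email": "", "website": ""}
    for text in texts:
        kind = classify(text)
        if kind in found and not found[kind]:
            found[kind] = text.strip()
    return [found["phone"], found["email"], found["website"]]


# An in-memory buffer stands in for the real file in this example.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(to_row(["+44 20 7946 0000", "info@example.com", "www.example.com"]))
writer.writerow(to_row(["info@example.org"]))  # company without phone or website
```

Keeping a fixed column order means a company with no website still produces a row, with that column left empty, which makes missing pages easier to spot in the output.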


much appreciated
Iverson


