Unable to Debug

Peter Otten __peter__ at web.de
Mon Jan 2 04:37:45 EST 2017


Aritra Bhattacharjee wrote:

> I am new to python programming. I wrote a code to search for the product
> names on a page of snapdeal.com .
> 
> Code:
> import urllib.request
> from bs4 import BeautifulSoup as BS
> 
> url =
> 'https://www.snapdeal.com/products/electronics-headphones?sort=plrty'
> 
> response = urllib.request.urlopen(url).read()
> soup = BS(response, 'html.parser')
> 
> #for dataprice in soup.find_all('span', class_="lfloat product-price"):
> #    print(dataprice)
> product_name={}
> i=0
> for title in soup.find_all('p', class_="product-title"):
>         product_name[i]=title.string
>         i += 1
> for i in range(1,21):
>     print(product_name[i])
> 
> 
> Output:
> Traceback (most recent call last):
> Motorola Pulse 2 Over Ear Wired Headphones With Mic (Black)
> Bose SoundLink On-Ear Bluetooth Headphones - Black & Blue
> Sony MDR-ZX110A Headphones Without Mic (White)
> Philips SBCHL140/98 Over Ear Headphone Without Mic
>   File "C:/Users/Aritra Bhattacharjee/PycharmProjects/PythonWebModules/Web
>   Scraper.py", line 17, in <module>
> Intex Desire BT Over Ear Wired With Mic Headphone Black
>     print(product_name[i])
> JBL T450 On Ear Wired Headphones With Mic Black
> KeyError: 20

For the script above, the output should end with the KeyError line above.

[snip]

> The Output shows some error that I could not understand why...Thanks in
> Advance

You define a dict product_name and then successively add entries with

> i=0
> for title in soup.find_all('p', class_="product-title"):
>         product_name[i]=title.string
>         i += 1

This means that the first key will be 0, the second will be 1, and so on up 
to 19. In the for loop you are looking for the keys 1 to 20,

> for i in range(1,21):
>     print(product_name[i])

so the first match will not be printed, and when you ask for 
product_name[20] you will get the KeyError and the traceback.
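
To see the mismatch without fetching the page, here is a minimal sketch. The
titles list is a made-up stand-in for the 20 scraped names (an assumption,
not real data); the dict is filled the same way as in your script.

```python
# Hypothetical stand-in for the 20 titles scraped from the page.
titles = ["Product %d" % n for n in range(20)]

# Fill the dict exactly like the original script does.
product_name = {}
i = 0
for title in titles:
    product_name[i] = title
    i += 1

# Keys the print loop asks for that were never stored.
missing = [k for k in range(1, 21) if k not in product_name]
# Keys that were stored but are never asked for.
skipped = [k for k in product_name if k not in range(1, 21)]

print("smallest key:", min(product_name), "largest key:", max(product_name))
print("requested but missing:", missing)
print("stored but never printed:", skipped)
```

The missing key is the one that triggers the KeyError, and the skipped key is
the first product that never shows up in the output.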

How to fix your code? You could rewrite the for loop as

for i in range(len(product_name)):
    print(product_name[i])

which, because of len(), will work for pages that feature more or fewer than
20 products. If you plan to remove products from the dict you have to cope
with missing indices. One way to do this is to iterate over the values alone:

for name in product_name.values():
    print(name)
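
As a rough illustration of the missing-index problem, the sketch below uses
three invented product names (assumptions, not taken from the page), deletes
one entry, and then tries both loops.

```python
# Made-up names standing in for scraped titles.
product_name = {0: "Headphone A", 1: "Headphone B", 2: "Headphone C"}

# Remove one product; key 1 is now a gap.
del product_name[1]

# The range(len(...)) loop now asks for key 1, which no longer exists.
failed_key = None
try:
    for i in range(len(product_name)):
        print(product_name[i])
except KeyError as err:
    failed_key = err.args[0]
    print("KeyError for key", failed_key)

# Iterating over the values never looks up a key, so gaps do not matter.
names = list(product_name.values())
for name in names:
    print(name)
```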

If you want to preserve the order on the HTML page you can iterate over the
sorted key/value pairs:

for i, name in sorted(product_name.items()):
    print(i, name)
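
A small sketch of the sorted variant, again with invented names (assumptions
for illustration only). The entries are inserted out of order on purpose and
one is deleted, to show that sorting the items restores the page order and
tolerates gaps.

```python
# Made-up names; keys are the positions on the page.
product_name = {}
product_name[2] = "Headphone C"
product_name[0] = "Headphone A"
product_name[1] = "Headphone B"

# Remove one product, leaving a gap at key 1.
del product_name[1]

# items() yields (key, value) tuples; sorting them orders by key first.
ordered = sorted(product_name.items())
for i, name in ordered:
    print(i, name)
```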



