Web scraping help / better way to do it?

Matt matt at centralkaos.com
Mon Jan 18 17:00:33 EST 2016


I'm a beginner Python user (3.5) trying to scrape this page and get the ladder:
www.afl.com.au/ladder . It's dynamic content, so I used lynx -dump to save it
as a text file and am parsing that.

Here is the code:

# Read the text file produced by lynx -dump
with open('c:/temp/afl2.txt', 'r') as infile:
    f = infile.read()

# Split the text into a list of tokens
afl_list = f.split(' ')

# Team abbreviations to search for
search_list = ['FRE', 'WCE', 'HAW', 'SYD', 'RICH', 'WB', 'ADEL', 'NMFC',
'PORT', 'GEEL', 'GWS', 'COLL', 'MELB', 'STK', 'ESS', 'GCFC', 'BL', 'CARL']

def build_ladder():
    for l in search_list:
        output_num = afl_list.index(l)
        list_pos = output_num -1
        ladder_pos = afl_list[list_pos]
        print(ladder_pos + ' ' + '-' + ' ' + l)

build_ladder()


Which outputs this:

1 - FRE
2 - WCE
3 - HAW
4 - SYD
5 - RICH
6 - WB
7 - ADEL
8 - NMFC
9 - PORT
10 - GEEL
* - GWS
12 - COLL
13 - MELB
14 - STK
15 - ESS
16 - GCFC
17 - BL
18 - CARL

Notice that number 11 is missing: list.index() returns the first match, so my
script picks up a "GWS" that appears earlier in the page, where the preceding
token is "*" rather than a ladder position.  What is the best way to skip that
one (and get the "GWS" lower down in the text file), or am I better off
approaching the problem in a different way?
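
A minimal sketch of one way to skip the stray match, assuming that in the ladder rows the position number is the token immediately before the team abbreviation (which the output above suggests). Instead of taking the first occurrence of each team, it walks over adjacent token pairs and only accepts a team when the preceding token is a number. The function signature, the `sample` text, and the use of `split()` on any whitespace (rather than `split(' ')`) are assumptions for illustration, not taken from the real page dump.

```python
# Team abbreviations, as in the original script.
TEAMS = ['FRE', 'WCE', 'HAW', 'SYD', 'RICH', 'WB', 'ADEL', 'NMFC',
         'PORT', 'GEEL', 'GWS', 'COLL', 'MELB', 'STK', 'ESS', 'GCFC',
         'BL', 'CARL']


def build_ladder(text, teams=TEAMS):
    """Return {team: position}, ignoring teams not preceded by a number."""
    tokens = text.split()  # split on any run of whitespace
    ladder = {}
    # Look at each (previous token, current token) pair in order.
    for prev, cur in zip(tokens, tokens[1:]):
        # Accept a team only if the token before it is all digits,
        # and keep the first such occurrence.
        if cur in teams and prev.isdigit() and cur not in ladder:
            ladder[cur] = int(prev)
    return ladder


# Hypothetical stand-in for the lynx dump: a stray "GWS" in a bullet
# list, followed by two ladder rows.
sample = "* GWS Giants news\n 1 FRE 22 17\n 11 GWS 22 11\n"

ladder = build_ladder(sample, ['FRE', 'GWS'])
for team, pos in sorted(ladder.items(), key=lambda item: item[1]):
    print(str(pos) + ' - ' + team)
```

With the real file, `build_ladder(open('c:/temp/afl2.txt').read())` would be the equivalent call. If the dump puts anything between the position and the abbreviation, the pair check would need adjusting.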


TIA

Matt






