Best way to clean up list items?

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Mon May 2 14:27:17 EDT 2016


DFS writes:

> On 5/2/2016 12:57 PM, Jussi Piitulainen wrote:
>> DFS writes:
>>
>>> Have: list1 = ['\r\n   Item 1  ','  Item 2  ','\r\n  ']
>>> Want: list1 = ['Item 1','Item 2']

. .

>> Funny-looking data you have.
>
> I know - sadly, it's actual data:
>
> --------------------------------------------------------------------
> from lxml import html
> import requests
>
> webpage =
> "http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=TN&wqhqn=2&qc=Nashville&rg=30&qhqn=restaurant&sb=zipdisc&ap=2"
>
> page  = requests.get(webpage)
> tree  = html.fromstring(page.content)
> addr1 = tree.xpath('//span[@class="text3"]/text()')
> print 'Addresses: ', addr1
> --------------------------------------------------------------------
>
> I couldn't figure out a better way to extract it from the HTML (maybe
> XML and DOM?)

I should have guessed :) But now I'm a bit worried about those spaces
inside your items. Can it happen that item text is split into strings in
the middle? Then the above sanitation does the wrong thing.

If someone has the right solution, I'm watching, too.



More information about the Python-list mailing list