re beginner

John Machin sjmachin at lexicon.net
Sun Jun 4 19:01:46 EDT 2006


On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
> SuperHik wrote:
>> hi all,
>>
>> I'm trying to understand regex for the first time, and it would be 
>> very helpful to get an example. I have an old(er) script with the 
>> following task - takes a string I copy-pasted and which always has the 
>> same format:
>>
>>  >>> print stuff
>> Yellow hat    2    Blue shirt    1
>> White socks    4    Green pants    1
>> Blue bag    4    Nice perfume    3
>> Wrist watch    7    Mobile phone    4
>> Wireless cord!    2    Building tools    3
>> One for the money    7    Two for the show    4
>>
>>  >>> stuff
>> 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\nBlue 
>> bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\nWireless 
>> cord!\t2\tBuilding tools\t3\nOne for the money\t7\tTwo for the show\t4'
>>
>> I want to put items from stuff into a dict like this:
>>  >>> print mydict
>> {'Wireless cord!': 2, 'Green pants': 1, 'Blue shirt': 1, 'White 
>> socks': 4, 'Mobile phone': 4, 'Two for the show': 4, 'One for the 
>> money': 7, 'Blue bag': 4, 'Wrist watch': 7, 'Nice perfume': 3, 'Yellow 
>> hat': 2, 'Building tools': 3}
>>
>> Here's how I did it:
>>  >>> def putindict(items):
>> ...     items = items.replace('\n', '\t')
>> ...     items = items.split('\t')
>> ...     d = {}
>> ...     for x in xrange( len(items) ):
>> ...         if not items[x].isdigit(): d[items[x]] = int(items[x+1])
>> ...     return d
>>  >>>
>>  >>> mydict = putindict(stuff)
>>
>>
>> I was wondering whether there is a better way to do it using the re
>> module, perhaps even avoiding this for loop?
> 
> There are better ways. One of them avoids the for loop, and even the re 
> module:
> 
> def to_dict(items):
>     items = items.replace('\t', '\n').split('\n')

In case there are leading/trailing spaces on the keys:

items = [x.strip() for x in items.replace('\t', '\n').split('\n')]

>     return dict(zip(items[::2], map(int, items[1::2])))
> 
> HTH

Fantastic -- at least for the OP's carefully copied-and-pasted input.
Meanwhile, back in the real world, there might be problems: runs of several 
tabs used for 'prettiness' instead of a single tab, non-integer values, and 
so on. In that case, a loop that validates as it goes and can report the 
position and contents of any invalid input might be better.
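
A rough sketch of what such a validating loop might look like. The name
to_dict_checked and the (line, field, message) error tuples are made up for
illustration, and treating a run of tabs as one separator is an assumption
about the input, not something the OP specified:

```python
import re

def to_dict_checked(text):
    """Parse tab-separated name/count pairs, collecting problems as it goes.

    Returns (result, errors), where errors is a list of
    (line number, field number, message) tuples, both numbered from 1.
    """
    result = {}
    errors = []
    for line_no, line in enumerate(text.split('\n'), 1):
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: one or more consecutive tabs form a single separator.
        fields = [f.strip() for f in re.split(r'\t+', line.strip())]
        if len(fields) % 2:
            errors.append((line_no, len(fields),
                           'odd number of fields: %r' % (fields,)))
            continue
        for i in range(0, len(fields), 2):
            key, value = fields[i], fields[i + 1]
            try:
                count = int(value)
            except ValueError:
                errors.append((line_no, i + 2, 'not an integer: %r' % value))
                continue
            if key in result:
                errors.append((line_no, i + 1, 'duplicate key: %r' % key))
                continue
            result[key] = count
    return result, errors
```

Called as mydict, problems = to_dict_checked(stuff), it should give the same
dict as before for the OP's clean input, with problems left empty. Whether
duplicates should be errors or should overwrite silently is a design choice;
the sketch reports them.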


