Newbie with sort text file question

Behrang Dadsetan ben at dadsetan.com
Mon Jul 14 17:51:42 EDT 2003


Hi Stuart,

You could try a regular expression with the re module. I posted an example
for you in the newsgroup.

Regards, Ben.
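A minimal sketch of the regular-expression idea, written for Python 3. It is not the example Ben posted in the newsgroup. It assumes the fruit name is the run of characters at the start of the line up to the first underscore or whitespace. The pattern FRUIT_RE and the helper fruit_name are hypothetical names chosen for illustration.

```python
import re

# Assumed pattern: the fruit name is the leading run of characters up to
# (not including) the first underscore or whitespace. re.match anchors the
# search at the start of the string.
FRUIT_RE = re.compile(r'[^_\s]+')

def fruit_name(line):
    """Return the leading fruit name of line, or None if there is none."""
    match = FRUIT_RE.match(line)
    return match.group(0) if match else None

# Raw strings keep the double backslashes exactly as they appear in the file.
for line in [r'banana_c \\yellow', r'apple   \\green', r'orange_b \\yellow']:
    print(fruit_name(line))  # prints banana, apple, orange on separate lines
```

Because the pattern stops at either an underscore or a space, "banana_c" and a bare "apple" both yield just the fruit name.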

stuartc wrote:
> Hi Bengt:
> 
> Thank you. Your code worked perfectly based on the text file I
> provided.
> 
> Unfortunately for me, my real text file has one slight variation that
> I did not account for: the fruit name does not always have an "_"
> after it. For example, apple below does not have an "_" attached to
> it.
> 
> banana_c \\yellow
> apple   \\green
> orange_b \\yellow
> 
> 
> This variation in my text file caused a problem with the program.
> Here's the error.
> 
> Traceback (most recent call last):
>   File "G:/Python22/Sort_Fruit.py", line 47, in ?
>     for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
> ValueError: unpack list of wrong size
> 
> I tried to debug and fix this variation, but I wasn't able to. I did
> notice that your split divides each line in the file into two fields
> as long as the fruit name is followed by an "_". If the fruit name
> does not have an "_", the line is not split at all. I think this is
> related to the problem, but I couldn't figure out how to fix it.
> 
> Any help will be greatly appreciated. Thanks.
> 
> - Stuart
> 
> 
> 
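Stuart's diagnosis points at the cause. line.split('_', 1) returns a two-element list only when an underscore is present. Otherwise it returns a one-element list, which cannot be unpacked into (fruit, dummyvar). The following is a hedged Python 3 sketch of a regex-free workaround using only str methods. The helper name fruit_of is hypothetical, and the sketch assumes blank lines have already been skipped, as Bengt's script does with `if line.strip()`.

```python
# One line in the format of Stuart's original sample, one from his real file.
with_underscore = r'banana_c \\yellow'
without_underscore = r'apple   \\green'

# split('_', 1) only splits when an underscore is present, so the second
# line yields a one-element list; unpacking that into (fruit, dummyvar)
# is what raised "ValueError: unpack list of wrong size".
print(len(with_underscore.split('_', 1)))     # 2
print(len(without_underscore.split('_', 1)))  # 1

def fruit_of(line):
    """Return the fruit name whether or not an underscore follows it."""
    # First whitespace-separated field ("banana_c" or "apple"), then keep
    # only the part before the first underscore. str.partition always
    # returns a 3-tuple, so this never fails when "_" is missing.
    first_field = line.split()[0]
    return first_field.partition('_')[0]

print(fruit_of(with_underscore))     # banana
print(fruit_of(without_underscore))  # apple
```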
> bokr at oz.net (Bengt Richter) wrote in message news:<beq357$thj$0 at 216.39.172.122>...
> 
>>On 12 Jul 2003 12:46:51 -0700, stuart_clemons at us.ibm.com (stuartc) wrote:
>>
>>
>>>Hi:
>>>
>>>I'm not a total newbie, but I'm pretty green.  I need to sort a text
>>>file and then get a total for the number of occurrences of one part of
>>>the string. Hopefully, this will explain it better:
>>>
>>>Here's the text file: 
>>>
>>>banana_c \\yellow
>>>apple_a \\green
>>>orange_b \\yellow
>>>banana_d \\green
>>>orange_a \\orange
>>>apple_w \\yellow
>>>banana_e \\green
>>>orange_x \\yellow
>>>orange_y \\orange
>>>
>>>I would like two output files:
>>>
>>>1) Sorted like this, by the fruit name (the name before the underscore)
>>>
>>>apple_a \\green
>>>apple_w \\yellow
>>>banana_c \\yellow
>>>banana_d \\green
>>>banana_e \\green
>>>orange_a \\orange
>>>orange_b \\yellow
>>>orange_x \\yellow
>>>orange_y \\orange
>>>
>>>2) Then summarized like this, ordered with the highest occurrences
>>>first:
>>>
>>>orange occurs 4
>>>banana occurs 3
>>>apple occurs 2
>>>
>>>Total occurances is 9
>>>
>>>Thanks for any help !
>>
>>===< stuartc.py >========================================================
>>import StringIO
>>textf = StringIO.StringIO(r"""
>>banana_c \\yellow
>>apple_a \\green
>>orange_b \\yellow
>>banana_d \\green
>>orange_a \\orange
>>apple_w \\yellow
>>banana_e \\green
>>orange_x \\yellow
>>orange_y \\orange
>>""")
>>
>># I would like two output files:
>># (actually two files ?? Ok)
>>
>># 1) Sorted like this, by the fruit name (the name before the underscore)
>>
>>fruitlist = [line.split('_',1) for line in textf if line.strip()]
>>fruitlist.sort()
>>
>># apple_a \\green
>># apple_w \\yellow
>># banana_c \\yellow
>># banana_d \\green
>># banana_e \\green
>># orange_a \\orange
>># orange_b \\yellow
>># orange_x \\yellow
>># orange_y \\orange
>>
>>outfile_1 = StringIO.StringIO()
>>outfile_1.write(''.join(['_'.join(pair) for pair in fruitlist]))
>>
>># 2) Then summarized like this, ordered with the highest occurrences
>># first:
>>
>># orange occurs 4
>># banana occurs 3
>># apple occurs 2
>>
>>outfile_2 = StringIO.StringIO()
>>fruitfreq = {}
>>for fruit, dummyvar in fruitlist: fruitfreq[fruit] = fruitfreq.get(fruit, 0)+1
>>fruitfreqlist = [(occ,name) for name,occ in fruitfreq.items()]
>>fruitfreqlist.sort()
>>fruitfreqlist.reverse()
>>outfile_2.write('\n'.join(['%s occurs %s'%(name,occ) for occ,name in fruitfreqlist]+['']))
>>
>># Total occurances is 9
>>print >> outfile_2,"Total occurances [sic] is [sic] %s" % reduce(int.__add__, fruitfreq.values())
>>
>>## show results
>>print '\nFile 1:\n------------\n%s------------' % outfile_1.getvalue()
>>print '\nFile 2:\n------------\n%s------------' % outfile_2.getvalue()
>>=========================================================================
>>executed:
>>
>>[15:52] C:\pywk\clp>stuartc.py
>>
>>File 1:
>>------------
>>apple_a \\green
>>apple_w \\yellow
>>banana_c \\yellow
>>banana_d \\green
>>banana_e \\green
>>orange_a \\orange
>>orange_b \\yellow
>>orange_x \\yellow
>>orange_y \\orange
>>------------
>>
>>File 2:
>>------------
>>orange occurs 4
>>banana occurs 3
>>apple occurs 2
>>Total occurances [sic] is [sic] 9
>>------------
>>
>>Is that what you wanted?
>>
>>Regards,
>>Bengt Richter
> 
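For readers on a current Python, here is a hedged Python 3 sketch of the whole task that tolerates names with and without an "_" suffix. It is not Bengt's script. It swaps in collections.Counter for the hand-built dictionary, io.StringIO for the old StringIO module, sum() for reduce(), and a sort key for the reversed (count, name) list. As a result, ties are broken alphabetically rather than in reverse alphabetical order. The sample is Stuart's original list with apple_a replaced by a bare "apple" to mimic his real file. The names SAMPLE, fruit_of and summarize are hypothetical. The sketch returns both "files" as strings instead of writing to disk.

```python
import io
from collections import Counter

# Assumed sample: Stuart's data with one name lacking the "_" suffix.
# Raw string so the double backslashes appear literally, as in his file.
SAMPLE = r"""
banana_c \\yellow
apple   \\green
orange_b \\yellow
banana_d \\green
orange_a \\orange
apple_w \\yellow
banana_e \\green
orange_x \\yellow
orange_y \\orange
"""

def fruit_of(line):
    # First whitespace-separated field, truncated at the first underscore.
    return line.split()[0].partition('_')[0]

def summarize(textf):
    """Return (sorted_text, summary_text) built from the lines of textf."""
    # Skip blank lines, as the original script does with line.strip().
    lines = [line.rstrip('\n') for line in textf if line.strip()]

    # File 1: sort by fruit name, then by the whole line for a fixed order.
    lines.sort(key=lambda line: (fruit_of(line), line))
    sorted_text = ''.join(line + '\n' for line in lines)

    # File 2: count per fruit, highest count first, ties alphabetical.
    counts = Counter(fruit_of(line) for line in lines)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    summary = [f'{name} occurs {occ}' for name, occ in ranked]
    summary.append(f'Total occurrences is {sum(counts.values())}')
    summary_text = '\n'.join(summary) + '\n'
    return sorted_text, summary_text

if __name__ == '__main__':
    file1, file2 = summarize(io.StringIO(SAMPLE))
    print(file1, end='')
    print(file2, end='')
```

With this sample the summary reports orange 4, banana 3 and apple 2, with a total of 9, which matches the output Stuart asked for. To produce real output files, the two returned strings could be written with open(path, 'w').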





More information about the Python-list mailing list