[Tutor] String Attribute

ltc.hotspot at gmail.com
Fri Jul 31 16:39:46 CEST 2015

Hi Alan,

Here is the revised code:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
    addresses.add(line4)
    count = count + 1 
    print addresses
print "There were", count, "lines in the file with From as the first word"

The code produces the following output:

In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['stephen.marquard at uct.ac.za'])
set(['stephen.marquard at uct.ac.za'])
set(['louis at media.berkeley.edu'])
set(['louis at media.berkeley.edu'])
set(['zqian at umich.edu'])
set(['zqian at umich.edu'])
set(['rjlowe at iupui.edu'])
set(['rjlowe at iupui.edu'])
set(['zqian at umich.edu'])
set(['zqian at umich.edu'])
set(['rjlowe at iupui.edu'])
set(['rjlowe at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['gsilver at umich.edu'])
set(['gsilver at umich.edu'])
set(['gsilver at umich.edu'])
set(['gsilver at umich.edu'])
set(['zqian at umich.edu'])
set(['zqian at umich.edu'])
set(['gsilver at umich.edu'])
set(['gsilver at umich.edu'])
set(['wagnermr at iupui.edu'])
set(['wagnermr at iupui.edu'])
set(['zqian at umich.edu'])
set(['zqian at umich.edu'])
set(['antranig at caret.cam.ac.uk'])
set(['antranig at caret.cam.ac.uk'])
set(['gopal.ramasammycook at gmail.com'])
set(['gopal.ramasammycook at gmail.com'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['david.horwitz at uct.ac.za'])
set(['stephen.marquard at uct.ac.za'])
set(['stephen.marquard at uct.ac.za'])
set(['louis at media.berkeley.edu'])
set(['louis at media.berkeley.edu'])
set(['louis at media.berkeley.edu'])
set(['louis at media.berkeley.edu'])
set(['ray at media.berkeley.edu'])
set(['ray at media.berkeley.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
set(['cwen at iupui.edu'])
There were 54 lines in the file with From as the first word
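
The paired lines in this output suggest that startswith('From') matches two lines per message: the "From " envelope line and the "From:" header line. That is an assumption drawn from the duplicated output, not checked against the file. A minimal sketch, using a few hypothetical sample lines (SAMPLE_LINES) and a hypothetical helper (sender_addresses) in place of mbox-short.txt, shows the difference a trailing space in the prefix makes:

```python
# Hypothetical sample lines standing in for mbox-short.txt.
# Each message has a "From " envelope line and a "From:" header line.
SAMPLE_LINES = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: test\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From: bob@example.com\n",
]


def sender_addresses(lines, prefix):
    """Return the second word of every line that starts with prefix."""
    found = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        words = line.split()   # split() also drops the trailing newline
        found.append(words[1])
    return found


# 'From' matches both kinds of line, so every address appears twice.
print(sender_addresses(SAMPLE_LINES, "From"))
# ['alice@example.com', 'alice@example.com', 'bob@example.com', 'bob@example.com']

# 'From ' (with a trailing space) matches only the envelope lines.
print(sender_addresses(SAMPLE_LINES, "From "))
# ['alice@example.com', 'bob@example.com']
```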

Question no. 1: Is there a built-in set function that removes duplicates from the data?

In [18]: dir (set)
Out[18]:
['__and__',
 '__class__',
 '__cmp__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']
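
None of these methods is a separate de-duplication step, because a set never stores the same value twice: calling add() with a value that is already present leaves the set unchanged. A minimal sketch with hypothetical example.com addresses:

```python
# Hypothetical addresses, with a repeat, as they might come out of the loop.
raw = ["alice@example.com", "bob@example.com", "alice@example.com"]

addresses = set()
for address in raw:
    addresses.add(address)    # adding a value already present is a no-op

print(len(addresses))         # 2

# Building a set straight from a list gives the same result.
print(set(raw) == addresses)  # True

# update() adds every item from another iterable in one call.
addresses.update(["carol@example.com", "bob@example.com"])
print(len(addresses))         # 3
```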

Question no. 2: Why is there no built-in append function for set?
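
append() belongs to lists, which keep their items in order, so "add to the end" has a meaning there. A set has no order, so its equivalent is add(). If you would rather use a list, one common pattern (a sketch, not the only way; unique_in_order is a hypothetical name) is to append an item only when it is not already present, which also keeps the addresses in first-seen order:

```python
def unique_in_order(items):
    """Return items without repeats, keeping first-seen order."""
    result = []
    for item in items:
        # 'in' on a list scans every element, so this is fine for short
        # lists; for large inputs a set makes the membership test faster.
        if item not in result:
            result.append(item)
    return result


# Hypothetical addresses for illustration.
print(unique_in_order(["bob@example.com", "alice@example.com",
                       "bob@example.com"]))
# ['bob@example.com', 'alice@example.com']
```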

Question no. 3: If all else fails (i.e., append and set), is my only option to slice the data set?
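
Slicing on its own does not remove duplicates; it only selects a range of positions. If you wanted to avoid a set, one alternative (a swapped-in technique, not something suggested in this thread) is to sort the list so that equal addresses sit next to each other, then keep each item that differs from the one before it. The slice ordered[1:] is used here only to pair each item with its predecessor; unique_sorted is a hypothetical name:

```python
def unique_sorted(items):
    """Return the distinct items in sorted order without using a set."""
    ordered = sorted(items)
    if not ordered:
        return []
    result = [ordered[0]]
    # zip() pairs each item with the next one via a slice offset by one.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev:
            result.append(cur)
    return result


print(unique_sorted(["bob@example.com", "alice@example.com",
                     "bob@example.com"]))
# ['alice@example.com', 'bob@example.com']
```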

Regards,

Hal

From: Alan Gauld
Sent: Friday, July 31, 2015 2:00 AM
To: Tutor at python.org


On 31/07/15 01:25, ltc.hotspot at gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>      if not line.startswith('From'): continue
>      line2 = line.strip()
>      line3 = line2.split()
>      line4 = line3[1]
>      print line4
>      count = count + 1
> print "There were", count, "lines in the file with From as the first word"
>
> Question: How do I remove the duplicates:

OK, you now have the original code working. Well done.
To remove the duplicates you need to collect the addresses
rather than printing them. Since you want the addresses
to be unique you can use a set.

You do that by first creating an empty set above
the loop (not inside it); let's call it addresses:

addresses = set()

Then replace your print statement with the set add()
method:

addresses.add(line4)

This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
Then, after the loop, print the set. You can do that directly
or, for more control over layout, write another for
loop that prints each address individually.

print addresses

or

for address in addresses:
    print address   # plus any formatting you want

You can also sort the addresses by calling the
sorted() function before printing:

print sorted(addresses)
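
Putting those steps together, a minimal sketch of the whole loop follows. It assumes the same mbox-short.txt layout as above and, as a judgment call rather than part of the advice above, matches 'From ' with a trailing space so each message is counted once. The function name unique_senders and the sample lines are hypothetical; the function takes any iterable of lines, so it can be tried without the file, and it is written to run under both Python 2 and Python 3.

```python
def unique_senders(lines):
    """Collect sender addresses from lines starting with 'From '.

    Returns (count, addresses): the number of matching lines and
    the set of distinct addresses.
    """
    addresses = set()            # created once, before the loop
    count = 0
    for line in lines:
        if not line.startswith('From '):
            continue
        words = line.split()
        addresses.add(words[1])  # repeats are ignored by the set
        count = count + 1
    return count, addresses


# Hypothetical sample lines; with the real file, pass open('mbox-short.txt').
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From alice@example.com Fri Jan  4 16:10:39 2008\n",
]
count, addresses = unique_senders(sample)
for address in sorted(addresses):   # printed once, after the loop
    print(address)
print("There were %d lines in the file with From as the first word" % count)
# alice@example.com
# bob@example.com
# There were 3 lines in the file with From as the first word
```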

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
