[Tutor] String Attribute

Ltc Hotspot ltc.hotspot at gmail.com
Sun Aug 2 03:20:50 CEST 2015


Hi Emile,

I made a mistake: I incorrectly assumed that the difference between 54 lines
of output and 27 lines of output was the result of removing duplicate email
addresses, e.g., gsilver at umich.edu, gsilver at umich.edu, cwen at iupui.edu,
cwen at iupui.edu


Apparently, this is not the case and I was wrong :(
The solution to the problem is in the desired line output:

stephen.marquard at uct.ac.za
louis at media.berkeley.edu
zqian at umich.edu
rjlowe at iupui.edu
zqian at umich.edu
rjlowe at iupui.edu
cwen at iupui.edu
cwen at iupui.edu
gsilver at umich.edu
gsilver at umich.edu
zqian at umich.edu
gsilver at umich.edu
wagnermr at iupui.edu
zqian at umich.edu
antranig at caret.cam.ac.uk
gopal.ramasammycook at gmail.com
david.horwitz at uct.ac.za
david.horwitz at uct.ac.za
david.horwitz at uct.ac.za
david.horwitz at uct.ac.za
stephen.marquard at uct.ac.za
louis at media.berkeley.edu
louis at media.berkeley.edu
ray at media.berkeley.edu
cwen at iupui.edu
cwen at iupui.edu
cwen at iupui.edu
There were 27 lines in the file with From as the first word
The desired output is one address per line, not the printed form of a set.

Latest output:
set(['stephen.marquard at uct.ac.za', 'louis at media.berkeley.edu',
'zqian at umich.edu', 'rjlowe at iupui.edu', 'cwen at iupui.edu',
'gsilver at umich.edu', 'wagnermr at iupui.edu', 'antranig at caret.cam.ac.uk',
'gopal.ramasammycook at gmail.com', 'david.horwitz at uct.ac.za',
'ray at media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word
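
A minimal sketch of why this output cannot match the desired listing: a set
keeps each address only once, while count goes up for every matching line.
The sample lines and the names unique and in_order are made up for
illustration (they are not read from mbox-short.txt), and the syntax is
Python 3 rather than the Python 2 used in the code below.

```python
# Hypothetical sample: each message contributes a "From " line
# and a "From:" line, as Emile describes for the real file.
sample = [
    "From cwen@iupui.edu Fri Jan  4 11:37:30 2008",
    "From: cwen@iupui.edu",
    "From gsilver@umich.edu Fri Jan  4 11:11:03 2008",
    "From: gsilver@umich.edu",
    "From cwen@iupui.edu Fri Jan  4 11:35:08 2008",
    "From: cwen@iupui.edu",
]

unique = set()    # collapses repeated addresses
in_order = []     # keeps every occurrence, in file order
count = 0
for line in sample:
    if line.startswith("From"):    # matches both "From " and "From:"
        addr = line.split()[1]
        unique.add(addr)
        in_order.append(addr)
        count += 1

print(sorted(unique))          # each address appears once
print(len(in_order), count)    # every matching line is counted
```

With this sample the set ends up with two entries while count reaches six,
so the size of the set and the value of count are unrelated.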

Latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"
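
A possible corrected version, sketched under two assumptions: the test is
changed to startswith('From ') as Emile suggests in the reply quoted below,
and a list replaces the set so that repeated addresses are kept in order.
The function name from_addresses and the sample lines are hypothetical, and
the code uses Python 3 print syntax.

```python
def from_addresses(lines):
    """Return the address from every line that starts with 'From '."""
    found = []
    for line in lines:
        # The trailing space excludes the "From:" header lines.
        if line.startswith("From "):
            tokens = line.split()
            if len(tokens) > 1:    # skip a bare "From " line with no address
                found.append(tokens[1])
    return found


# Illustrative stand-in for the file contents.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008",
    "From: louis@media.berkeley.edu",
]

addresses = from_addresses(sample)
for addr in addresses:
    print(addr)
print("There were", len(addresses), "lines in the file with From as the first word")
```

To run it against a real file, an open file object can be passed in place
of the sample list, for example from_addresses(open(fname)).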

Regards,
Hal

On Sat, Aug 1, 2015 at 5:45 PM, Emile van Sebille <emile at fenx.com> wrote:

> On 8/1/2015 4:07 PM, Ltc Hotspot wrote:
>
>> Hi Alan,
>>
>> Question1: The output result is an address or line?
>>
>
> It's a set actually.  Ready to be further processed I imagine.  Or to
> print out line by line if desired.
>
>> Question2: Why are there 54 lines as compared to 27 lines in the desired
>> output?
>>
>
> Because there are 54 lines that start with 'From'.
>
> As I noted in looking at your source data, for each email there's a
> 'From ' and a 'From:' -- you'd get the right answer checking only for
> startswith('From ')
>
> Emile
>
>> Here is the latest revised code:
>> fname = raw_input("Enter file name: ")
>> if len(fname) < 1 : fname = "mbox-short.txt"
>> fh = open(fname)
>> count = 0
>> addresses = set()
>> for line in fh:
>>      if line.startswith('From'):
>>          line2 = line.strip()
>>          line3 = line2.split()
>>          line4 = line3[1]
>>          addresses.add(line4)
>>          count = count + 1
>> print addresses
>> print "There were", count, "lines in the file with From as the first word"
>>
>> The output result:
>> set(['stephen.marquard at uct.ac.za', 'louis at media.berkeley.edu',
>> 'zqian at umich.edu', 'rjlowe at iupui.edu', 'cwen at iupui.edu',
>> 'gsilver at umich.edu', 'wagnermr at iupui.edu', 'antranig at caret.cam.ac.uk',
>> 'gopal.ramasammycook at gmail.com', 'david.horwitz at uct.ac.za',
>> 'ray at media.berkeley.edu']) ← Mismatch
>> There were 54 lines in the file with From as the first word
>>
>>
>> The desired output result:
>> stephen.marquard at uct.ac.za
>> louis at media.berkeley.edu
>> zqian at umich.edu
>> rjlowe at iupui.edu
>> zqian at umich.edu
>> rjlowe at iupui.edu
>> cwen at iupui.edu
>> cwen at iupui.edu
>> gsilver at umich.edu
>> gsilver at umich.edu
>> zqian at umich.edu
>> gsilver at umich.edu
>> wagnermr at iupui.edu
>> zqian at umich.edu
>> antranig at caret.cam.ac.uk
>> gopal.ramasammycook at gmail.com
>> david.horwitz at uct.ac.za
>> david.horwitz at uct.ac.za
>> david.horwitz at uct.ac.za
>> david.horwitz at uct.ac.za
>> stephen.marquard at uct.ac.za
>> louis at media.berkeley.edu
>> louis at media.berkeley.edu
>> ray at media.berkeley.edu
>> cwen at iupui.edu
>> cwen at iupui.edu
>> cwen at iupui.edu
>> There were 27 lines in the file with From as the first word
>>
>> Regards,
>> Hal
>>
>> On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <alan.gauld at btinternet.com>
>> wrote:
>>
>>> On 01/08/15 19:48, Ltc Hotspot wrote:
>>>
>>>> There is an indent message in the revised code.
>>>> Question: Where should I indent the code line for the loop?
>>>>
>>> Do you understand the role of indentation in Python?
>>> Everything in the indented block is part of the structure,
>>> so you need to indent everything that should be executed
>>> as part of the logical block.
>>>
>>>> fname = raw_input("Enter file name: ")
>>>> if len(fname) < 1 : fname = "mbox-short.txt"
>>>> fh = open(fname)
>>>> count = 0
>>>> addresses = set()
>>>> for line in fh:
>>>>       if line.startswith('From'):
>>>>       line2 = line.strip()
>>>>       line3 = line2.split()
>>>>       line4 = line3[1]
>>>>       addresses.add(line)
>>>>       count = count + 1
>>>>
>>>>
>>> Everything after the if line should be indented an extra level
>>> because you only want to do those things if the line
>>> startswith From.
>>>
>>> And note that, as I suspected, you are adding the whole line
>>> to the set when you should only be adding the address
>>> (i.e. line4). This would be more obvious if you had
>>> used meaningful variable names such as:
>>>
>>>      strippedLine = line.strip()
>>>      tokens = strippedLine.split()
>>>      addr = tokens[1]
>>>      addresses.add(addr)
>>>
>>> PS.
>>> Could you please delete the extra lines from your messages.
>>> Some people pay by the byte and don't want to receive kilobytes
>>> of stuff they have already seen multiple times.
>>>
>>>
>>> --
>>> Alan G
>>> Author of the Learn to Program web site
>>> http://www.alan-g.me.uk/
>>> http://www.amazon.com/author/alan_gauld
>>> Follow my photo-blog on Flickr at:
>>> http://www.flickr.com/photos/alangauldphotos
>>>
>>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>>

