Mail extraction problem (something's wrong with split methods)

Luka Milkovic luka.milkovic at public.srce.hr
Sat Sep 11 12:21:59 EDT 2004


Hello,

I have a small problem. Although it is small, I find it extremely difficult
to describe, but I'll try.
I have written a program that extracts certain portions of my received
e-mail. The content of the e-mail is predictable: it contains one very
long list of numbers that looks something like this:

[34234,35435,657789,6756735,12312378,09678567,23424]

Of course, I cannot manipulate my mail while connected to the POP3 server,
so I decided to transfer the mail locally, write it to a file, and then
manipulate it. Another problem is that the e-mails contain a lot of extra
output, garbage characters, and all sorts of nasty things. Somehow I managed
to solve that (downloading the e-mail and extracting the interesting parts),
and here is how (I'll only show the "interesting parts" part):

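import re

# "mail" is assumed to be the already-open local file holding the message
temp = [mail.read()]
enc_txt = "\n".join(temp)
# the list of numbers sits between ", '[" and "]', " in the saved text
begin = enc_txt.find(", '[") + len(", '[")
ending = enc_txt.find("]', ")

enc_txt2 = enc_txt[begin:ending]
mail.close()
# join the wrapped lines into one string, then split on commas
lines = enc_txt2.splitlines()
enc_txt3 = ' '.join([line.strip() for line in lines])
split = re.split(",", enc_txt3)
enc = [int(elem) for elem in split]   # the ValueError is raised here
enc = map(int, split)                 # same conversion, written with map()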

And this code works! But there is a problem. When the list of numbers is
longer than 350 bytes, at the 350th position I don't get a number; instead I
get some quotes, commas, and other strange things. When the list is longer
than 700 bytes, the problem occurs twice (strictly speaking, the program
never gets that far because the interpreter complains, but there are two
mistakes of this type). Is there something I'm missing? Can the split
methods handle more than 350 bytes of text? What is actually happening?
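
As a quick sanity check on the 350-byte question, here is a minimal sketch that splits a comma-separated string far longer than 350 bytes. The numbers are made up for illustration and do not come from the mail in question. It suggests that the split methods themselves have no such length limit, so the stray quotes more likely come from the text being split.

```python
# Hypothetical data: 500 made-up numbers, far more than 350 bytes of text.
numbers = list(range(1000, 1500))
text = ",".join(str(n) for n in numbers)

# Split on commas and convert every piece back to an integer.
parts = text.split(",")
decoded = [int(p) for p in parts]

print(len(text) > 350)      # the input is well past the 350-byte mark
print(decoded == numbers)   # every number survives the round trip
```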

To make it clearer (because I think you will not understand it
completely), I could upload the errors, but the log is large, so I'll
shorten it.

[6964, 7086, 3211, 7522, 9472, 3265, 3610, 104, 9729, 6706, 8035, 5439,
7142, 360, 677, 1667, 1382, 9417, 4493, 8289, 9613, 3470, 889, 1021, 3381,
3480, 2483, 6579, 8928, 3240, 4437, 5908, 2290, 9587, 866, 202, 859, 2184,
8328, ..........] - the list of numbers, 705 bytes long.

When I run the program (with a "print split" statement inside my code, to
see what's going on), I get:

['6964', ' 7086', ' 3211', ' 7522', ' 9472', ' 3265', ' 3610', ' 104', '
9729', ' 6706', ' 8035', ' 5439', ' 7142', ' 360', ' 677', ' 1667', '
1382', ' 9417', ' 4493', ' 8289', ' 9613', ' 3470', ' 889', ' 1021', '
3381', ' 3480', ' 2483', ' 6579', ' 8928', ' 3240', ' 4437', ' 5908', '
2290', ' 9587', ' 866', ' 202', ' 859', ' 2184', ' 8328', .....  " 6730'",
" '", ' 6793'...... , " '", " '6573", ' 869'...]

  File "OTPAenc_dec.py", line 258, in decr
    enc = [int(elem) for elem in split]               
ValueError: invalid literal for int(): 6730'
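
The traceback can be reproduced in isolation. In the sketch below, `saved` is a hypothetical stand-in for the extracted text, built on the unconfirmed assumption that the file holds the printed form of a list of lines, so that each line boundary leaves quote-and-comma debris between the numbers. Under that assumption, picking out runs of digits with `re.findall` is one possible way around the debris; it is not necessarily the right fix for the real file.

```python
import re

# Hypothetical stand-in for the extracted text: the middle of the string
# mimics the quote-and-comma debris seen in the printed list above.
saved = "6964, 7086, 6730', ', 6793, 869"

# Splitting on commas keeps the quote characters inside some tokens.
tokens = saved.split(",")

try:
    enc = [int(tok) for tok in tokens]
except ValueError as exc:
    print("split-based parse failed:", exc)

# Taking every run of digits ignores quotes, commas, and spaces.
enc = [int(num) for num in re.findall(r"\d+", saved)]
print(enc)
```

Note that this approach would also accept digits that appear in unrelated debris, so it only makes sense when the extracted text is known to contain nothing but the number list.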

Please help me; any help will be appreciated.

Thanks in advance.

Sorry for my bad English and my bad expression style; I really don't know
how to explain it more thoroughly.

 

 


