[Tutor] Better way to remove lines from a list?

boB Stepp robertvstepp at gmail.com
Tue May 12 14:47:46 EDT 2020


I have a test file with the following contents:

ADR;TYPE=HOME:;;11601 Southridge Dr;Little Rock;AR;72212-1733;US;11601 Sout
  hridge Dr\nLittle Rock\, AR 72212-1733\nUS
ADR;TYPE=WORK:;;1912 Green Mountain Dr;Little Rock;AR;72212;US;1912 Green M
  ountain Dr\nLittle Rock\, AR 72212\nUS
  more meaningless stuff
  even more meaningless stuff
ADR:100;;4700 E McCain Blvd;North Little Rock;AR;72117;US;4700 E McCain Blv
  d\n100\nNorth Little Rock\, AR 72117\nUS

I wish to remove, from each line starting with "ADR", everything from the
last semicolon to the EOL *and* any following lines that continue this
duplicated address.  As far as I can tell, in my actual vCard file every
such continuation line starts with a single space, whereas a new legitimate
vCard property line always has a character in the first column.

I have a solution that works relying on these file-specific facts.  After
reading the file into a list using readlines() I have this function to do
this processing:

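def clean_address(vCard):
     """Strip the duplicated trailing address from each ADR line.

     Note: continuation lines are removed from vCard in place.
     """
     cleaned_vCard = []
     for index, line in enumerate(vCard):
         clean_line = line
         if line.startswith("ADR"):
             # Keep everything before the last semicolon (this also drops the newline).
             clean_line = line.rpartition(";")[0]
             # Drop continuation lines (leading space); the bounds check
             # avoids an IndexError when the ADR block ends the file.
             while index + 1 < len(vCard) and vCard[index + 1].startswith(" "):
                 vCard.pop(index + 1)
         cleaned_vCard.append(clean_line)
     return cleaned_vCard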

In the inner while loop I wanted to do the equivalent of saying "advance
the outer for loop while staying inside the while loop".  If I were
able to do this I would not need to modify the vCard list in place.  I
tried to find a way to do this with ideas of next() or .__next__(), but I
could not discover online how to access the for loop's iterator.  I feel
sure there is a better way to do what I want to accomplish, possibly
completely altering the logic of my function or doing something along my
above speculations.
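
One possible way to "advance the loop from inside the loop" is to create the
iterator explicitly with iter() and pull lines from it with next(), instead
of letting the for statement hide it.  The following is only a sketch under
the same file-specific assumptions as above (continuation lines start with a
space); the name clean_address_iter is hypothetical, chosen for illustration.

```python
def clean_address_iter(vcard_lines):
    """Return a cleaned copy of vcard_lines; the input list is not modified."""
    cleaned = []
    lines = iter(vcard_lines)      # explicit iterator that we advance by hand
    line = next(lines, None)       # None signals that the input is exhausted
    while line is not None:
        if line.startswith("ADR"):
            # Keep everything before the last semicolon (drops the newline too,
            # matching the behavior of the original function).
            cleaned.append(line.rpartition(";")[0])
            # Skip every continuation line that follows this ADR line.
            line = next(lines, None)
            while line is not None and line.startswith(" "):
                line = next(lines, None)
            # 'line' now holds the first non-continuation line (or None),
            # so go around again without fetching another one.
            continue
        cleaned.append(line)
        line = next(lines, None)
    return cleaned
```

Because a skipped-over line that turns out not to be a continuation still has
to be processed, the sketch keeps the "current line" in a variable rather than
using a plain for loop over the iterator.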

The other thing that bothers me is the fragility of my approach.  I am
relying on two things that I am sure are not true for a general export of a
Google vCard:  (1) What if I have an exceptionally long legitimate address
that cannot be encompassed on a single line starting with "ADR"?  In this
case my function as written would not yield a correct address.  (2) I am
relying on illegitimate address duplicates starting on following lines
beginning with a single space.  For my particular vCard file I don't think
these will affect me, but I would like to make this more robust just
because it is the right thing to do.  But at the moment I don't see how.
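
A possibly more robust approach, sketched here under stated assumptions, is
to work on logical lines rather than physical ones.  The vCard specification
(RFC 6350) folds long lines by inserting a line break followed by a single
space or tab, and defines ADR as having seven semicolon-separated components.
Assuming the duplicated address is always an extra eighth component, one could
first unfold the file and then keep only the first seven components.  The
names unfold, trim_adr and clean_vcard are hypothetical, and the sketch does
not handle grouped properties (e.g. "item1.ADR"), colons inside quoted
parameter values, or an escaped backslash directly before a semicolon.

```python
import re

# RFC 6350 ADR components: PO box, extended address, street,
# locality, region, postal code, country.
ADR_COMPONENTS = 7


def unfold(physical_lines):
    """Join folded continuation lines (leading space or tab) onto the previous line."""
    logical = []
    for raw in physical_lines:
        line = raw.rstrip("\r\n")
        if line[:1] in (" ", "\t") and logical:
            logical[-1] += line[1:]    # drop exactly one folding character
        else:
            logical.append(line)
    return logical


def trim_adr(logical_line):
    """Keep only the first seven components of an ADR value (assumption: extras are duplicates)."""
    name, sep, value = logical_line.partition(":")
    if not sep or name.split(";")[0].upper() != "ADR":
        return logical_line            # not an ADR property; leave untouched
    # Split on semicolons that are not escaped with a backslash.
    parts = re.split(r"(?<!\\);", value)
    return name + ":" + ";".join(parts[:ADR_COMPONENTS])


def clean_vcard(physical_lines):
    """Return unfolded logical lines (without newlines) with ADR values trimmed."""
    return [trim_adr(line) for line in unfold(physical_lines)]
```

With this ordering, a legitimate address that is itself folded across several
physical lines is reassembled before anything is cut, and the trimming no
longer depends on where the fold happens to fall.  Note that the result is a
list of unfolded lines without trailing newlines, so they would need to be
re-terminated (and optionally re-folded) when written back out.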

And for a rhetorical question:  Why can't I just make myself write the
quick, obvious, but flawed program that would have had me done with this
on Sunday?


-- 
Wishing you only the best,

boB Stepp

