[Tutor] Better way to remove lines from a list?
boB Stepp
robertvstepp at gmail.com
Tue May 12 14:47:46 EDT 2020
I have a test file with the following contents:
ADR;TYPE=HOME:;;11601 Southridge Dr;Little Rock;AR;72212-1733;US;11601 Sout
 hridge Dr\nLittle Rock\, AR 72212-1733\nUS
ADR;TYPE=WORK:;;1912 Green Mountain Dr;Little Rock;AR;72212;US;1912 Green M
 ountain Dr\nLittle Rock\, AR 72212\nUS
more meaningless stuff
even more meaningless stuff
ADR:100;;4700 E McCain Blvd;North Little Rock;AR;72117;US;4700 E McCain Blv
 d\n100\nNorth Little Rock\, AR 72117\nUS
I wish to remove the part of each line starting with "ADR" from the last
semicolon to the EOL *and* any following lines that continue this
duplicated address. As far as I can tell, in my actual vCard file every
such continuation line starts with a single space, whereas a new legitimate
vCard property line always has a character in the first column.
I have a solution that works relying on these file-specific facts. After
reading the file into a list using readlines() I have this function to do
this processing:
def clean_address(vCard):
    cleaned_vCard = []
    for index, line in enumerate(vCard):
        clean_line = line
        if line.startswith("ADR"):
            clean_line = line.rpartition(";")[0]
            while True:
                if vCard[index + 1].startswith(" "):
                    vCard.pop(index + 1)
                else:
                    break
        cleaned_vCard.append(clean_line)
    return cleaned_vCard
In the inner while loop I wanted to do the equivalent of saying "advance
the outer for loop while staying inside the while loop". If I were
able to do this I would not need to modify the vCard list in place. I
tried to find a way to do this with ideas of next() or .__next__(), but I
could not discover online how to access the for loop's iterator. I feel
sure there is a better way to do what I want to accomplish, possibly
completely altering the logic of my function or doing something along my
above speculations.
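
One possible way to "advance the outer loop from inside the inner loop" is to create the iterator explicitly with iter() and pull lines from it with next() in both places. The following is only a sketch under that assumption, not a tested replacement: the name clean_address_iter and the variable names are hypothetical, and it keeps the same file-specific rule that continuation lines start with a single space.

```python
def clean_address_iter(lines):
    """Hypothetical variant of clean_address() that never modifies `lines`."""
    cleaned = []
    line_iter = iter(lines)        # one iterator shared by both loops
    line = next(line_iter, None)   # None signals that the input is exhausted
    while line is not None:
        if line.startswith("ADR"):
            # Keep everything before the last semicolon, as in the original.
            cleaned.append(line.rpartition(";")[0])
            line = next(line_iter, None)
            # Skip the continuation lines of the duplicated address.
            while line is not None and line.startswith(" "):
                line = next(line_iter, None)
        else:
            cleaned.append(line)
            line = next(line_iter, None)
    return cleaned
```

Because next(line_iter, None) returns None at the end of the input instead of raising, an "ADR" line or a continuation line at the very end of the file does not need special handling, and the list passed in is left untouched.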
The other thing that bothers me is the fragility of my approach. I am
relying on two things that I am sure are not true for a general export of a
Google vCard: (1) What if I have an exceptionally long legitimate address
that cannot be encompassed on a single line starting with "ADR"? In this
case my function as written would not yield a correct address. (2) I am
relying on illegitimate address duplicates starting on following lines
beginning with a single space. For my particular vCard file I don't think
these will affect me, but I would like to make this more robust just
because it is the right thing to do. But at the moment I don't see how.
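
A possibly more robust direction, sketched here under stated assumptions: as far as I understand the vCard specification's line-folding rule, a long logical line may be split by inserting a line break followed by a single space or tab, so the continuation lines can be joined back ("unfolded") before any processing. The sketch below does that first and only then drops the last semicolon-separated field of each "ADR" line. The names unfold and strip_adr_duplicate are hypothetical, and the code assumes that every "ADR" line carries the duplicated address as its final field and that this field contains no escaped semicolon.

```python
def unfold(lines):
    """Join continuation lines (leading space or tab) onto the previous line."""
    logical = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line[:1] in (" ", "\t") and logical:
            # Drop the single folding character and append the rest.
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


def strip_adr_duplicate(lines):
    """Remove the last semicolon-separated field from each unfolded ADR line."""
    cleaned = []
    for line in unfold(lines):
        if line.startswith("ADR"):
            # Assumption: the duplicated address is always the final field.
            line = line.rpartition(";")[0]
        cleaned.append(line)
    return cleaned
```

With the lines unfolded first, it no longer matters whether the fold falls inside the duplicate or inside a long legitimate address. Note that the returned lines have no trailing newlines and are not re-folded, so a writer would need to add line endings back.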
And for a rhetorical question: Why can't I just make myself write the
quick, obvious, but flawed program that would have had me done with this Sunday?
--
Wishing you only the best,
boB Stepp