Extract lines from file, add to new files

avi.e.gross at gmail.com avi.e.gross at gmail.com
Fri Jan 12 09:42:18 EST 2024


If the data in the input file is exactly as described and consists of
alternating lines containing a name and email address, or perhaps an
optional blank line, then many solutions are possible using many tools
including python programs.

But is the solution a good one for some purpose? The two output files may
end up being out of sync for all kinds of reasons. One of many "errors" can
happen if several lines in a row lack an "@", or if a person's name
contains one, for example. What if someone supplied more than one email
address with a comma separator? This may not be expected but could cause
problems.

Some of the other tools mentioned would not care and would produce garbage.
Grep, as an example, could be run twice, once asking for lines with an "@"
and once for lines without. In this case, that would be trivial. Blank
lines, or ones with just whitespace, might need another pass to be omitted.
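
To make the risk concrete, here is a minimal Python sketch of that two-pass
idea. The function name and the sample lines are made up for illustration,
and the only test applied is whether a line contains an "@":

```python
def split_like_grep(lines):
    """Mimic two grep passes: lines with an '@' and lines without."""
    # Drop blank or whitespace-only lines first (the extra pass).
    kept = [line.strip() for line in lines if line.strip()]
    emails = [line for line in kept if "@" in line]
    names = [line for line in kept if "@" not in line]
    return names, emails


# Invented sample: "Hobbes" has no address following it.
sample = [
    "Calvin", "calvin@example.com", "",
    "Hobbes", "Susie", "susie@example.com",
]
names, emails = split_like_grep(sample)
print(names)
print(emails)
print(len(names) == len(emails))
```

With this sample the two lists come back with different lengths, so the
third name would end up paired with the wrong line, or with nothing, in the
output files.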

But a real challenge would be to parse the file in a language like Python
and find all VALID stretches in the data and construct a data structure
containing either a valid name or something specific like "ANONYMOUS"
alongside an email address. Each entry may be written out as soon as it is
considered valid, or collected in something like a list. You can do further
processing if you want the results in some order, or want to remove
duplicates or bad email addresses and so on. In that scenario, the two files
would be written out at the end.
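
One possible shape for that, as a rough sketch rather than a finished
program: the names parse_records and write_outputs are hypothetical, the
validity test is only "contains an @", and the policy that a name with no
address following it is dropped is my assumption, not a requirement from
the original question.

```python
import io


def parse_records(lines, placeholder="ANONYMOUS"):
    """Collect (name, address) pairs; the placeholder fills in a missing name."""
    records = []
    pending_name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank and whitespace-only lines
        if "@" in line:
            # Treat commas as separators in case several addresses share a line.
            for address in line.split(","):
                address = address.strip()
                if address:
                    records.append((pending_name or placeholder, address))
            pending_name = None
        else:
            # A second name in a row replaces the first, which had no address.
            pending_name = line
    return records


def write_outputs(records, names_out, emails_out):
    """Write both outputs from the same list so they cannot drift apart."""
    for name, address in records:
        names_out.write(name + "\n")
        emails_out.write(address + "\n")


sample = [
    "Calvin", "calvin@example.com", "",
    "Hobbes", "Susie", "susie@example.com, susie@example.org",
    "orphan@example.com",
]
records = parse_records(sample)

# StringIO objects stand in for the two real output files here.
names_out, emails_out = io.StringIO(), io.StringIO()
write_outputs(records, names_out, emails_out)
print(records)
```

Since both outputs are produced from one list of pairs, sorting or removing
duplicates can be done on that list before anything is written.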

Python can do the above while some of the other tools mentioned are not
really designed for it. Further, many of the tools are not generally
available everywhere.

Another question is why it makes sense to produce two output files whose
contents are not linked and would not be easy to edit and keep synchronized,
such as when removing or adding entries. There are many ways to save the
data that might be more robust for many purposes. It looks like the
application intended is a sort of form letter merge where individual emails
will be sent that contain a personalized greeting. Unless that application
has already been written, there are many other ways that make sense. One
obvious one is to save the data in a database as columns in a table. Another
is to write one file with entries easily parsed out such as:

NAME: name | EMAIL: email

Whatever the exact design, receiving software could parse that out as needed
by the simpler act of reading one line at a time.
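
As an illustration only, writing and reading such lines might look like the
sketch below. The helper names are hypothetical, and it assumes an email
address never contains the " | " separator:

```python
def format_entry(name, email):
    """Build one line in the 'NAME: name | EMAIL: email' layout."""
    return f"NAME: {name} | EMAIL: {email}"


def parse_entry(line):
    """Split one such line back into a (name, email) pair."""
    # Split on the last separator so a name containing " | " still works.
    parts = line.strip().rsplit(" | ", 1)
    if len(parts) != 2:
        raise ValueError(f"not a NAME/EMAIL line: {line!r}")
    name_part, email_part = parts
    if not name_part.startswith("NAME: ") or not email_part.startswith("EMAIL: "):
        raise ValueError(f"not a NAME/EMAIL line: {line!r}")
    return name_part[len("NAME: "):], email_part[len("EMAIL: "):]


line = format_entry("Calvin", "calvin@example.com")
print(line)
print(parse_entry(line))
```

A line that does not fit the layout raises ValueError, so a damaged file is
noticed rather than silently misread.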

And, of course, there are endless storage formats such as a CSV file or
serializing your list of objects to a file so that the next program can load
them in and operate from memory on all the ones it wants. The two-file
solution may seem simpler but harks back to how some computing was done in
early days, when a list of objects might be handled by having multiple arrays
with each containing one aspect of the object, and updating required
remembering to touch each array the same way. That can still be a useful
technique when some operations done in a vectorized manner might be
faster than on an array of objects, but is more often a sign of poor code.
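
A small sketch of the CSV variant, using Python's standard csv module. The
column names and sample entries are invented, and an in-memory buffer stands
in for a real file (which would be opened with newline=""):

```python
import csv
import io

records = [
    ("Calvin", "calvin@example.com"),
    ("Hobbes", "hobbes@example.com"),
]

# Write a header row followed by one row per (name, email) pair.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "email"])
writer.writerows(records)

# The receiving program reads each row back as one linked record.
buffer.seek(0)
loaded = [(row["name"], row["email"]) for row in csv.DictReader(buffer)]

# Removing an entry touches one list, not two parallel ones.
remaining = [pair for pair in loaded if pair[1] != "hobbes@example.com"]
print(loaded)
print(remaining)
```

Because each name travels with its address, there is no second array to
remember to update.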






-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
Behalf Of Grizzy Adams via Python-list
Sent: Friday, January 12, 2024 1:59 AM
To: Rich Shepard via Python-list <python-list at python.org>; Rich Shepard
<rshepard at appl-ecosys.com>
Subject: Re: Extract lines from file, add to new files

Thursday, January 11, 2024  at 10:44, Rich Shepard via Python-list wrote:
Re: Extract lines from file, add to (at least in part)

>On Thu, 11 Jan 2024, MRAB via Python-list wrote:

>> From the look of it:
>> 1. If the line is empty, ignore it.
>> 2. If the line contains "@", it's an email address.
>> 3. Otherwise, it's a name.

If that is it all? a simple Grep would do (and save on the blank line)
-- 
https://mail.python.org/mailman/listinfo/python-list


