Extract lines from file, add to new files

avi.e.gross at gmail.com avi.e.gross at gmail.com
Sat Feb 3 11:33:23 EST 2024


Thomas,

I have been thinking about the concept of being stingy with information as
this is a fairly common occurrence when people ask for help. They often ask
for what they think they want while people like us keep asking why they want
that and perhaps offer guidance on how to get closer to what they NEED or a
better way.

In retrospect, Rich did give all the info he thought he needed. It boiled
down to saying that he wants to distribute data into two files in such a way
that finding an item in file A then lets him find the corresponding item in
file B. He was not worried about how to make the files or what to do with
the info afterward. He had those covered and was missing what he considered
a central piece. And, it seems he programs in multiple languages and
environments as needed and is not exactly a newbie. He just wanted a way to
implement his overall design.

We threw many solutions and ideas at him but some of us (like me) also got
frustrated as some ideas were not received due to one objection or another
that had not been mentioned earlier when it was not seen as important.

I particularly notice a disconnect some of us had. Was this supposed to be a
search that read only as much as needed to find something and stopped
reading, or a sort of filter that returned zero or more matches and went to
the end, or perhaps something that read entire files and swallowed them into
data structures in memory and then searched and found corresponding entries,
or maybe something else?

All the above approaches could work, but some designs fit less well than
others. For example, some files are too large to read into memory at once.
We, as programmers, often consciously or unconsciously look at many factors
to try to zoom in on what approaches we might use. Being given minimal
amounts of info can be frustrating. We worry about making a silly design.
But the OP may want something minimal and not worry as long as it is fairly
easy to program and works.

We could have suggested something very simple like:

Open both files A and B
In a loop get a line from each. If the line from A is a match, do something
with the current line from B.
If you are getting only one, exit the loop.
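
As a rough sketch of the steps above in Python (the function name
find_matches, the first_only flag and the sample data are all my own
inventions for illustration, not anything Rich described), it might look
something like this. It takes any two sources of lines, so two open file
objects would work as well as the lists used here:

```python
def find_matches(lines_a, lines_b, wanted, first_only=True):
    """Walk two line sources in lockstep and return the B lines whose
    corresponding A line equals `wanted` (after stripping whitespace)."""
    matches = []
    # zip() stops at the shorter input, so a short file B is not detected.
    for line_a, line_b in zip(lines_a, lines_b):
        if line_a.strip() == wanted:
            matches.append(line_b.strip())
            if first_only:
                break  # stop reading as soon as one match is found
    return matches


# Hypothetical sample data standing in for files A and B.
names = ["Alice\n", "Bob\n", "Alice\n"]
emails = ["alice@example.com\n", "bob@example.com\n", "alice2@example.com\n"]

print(find_matches(names, emails, "Bob"))
print(find_matches(names, emails, "Alice", first_only=False))
```

With first_only=True this behaves like the search that stops reading early;
with first_only=False it acts as the filter that runs to the end.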

Or, if willing, we could have suggested another file format, such as a
CSV, for which the algorithm is similar but needs only one file, as in:

Open file A
Read a line in a loop
Split it in parts
If the party of the first part matches something, use the party of the
second part
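
A minimal sketch of that single-file version, assuming a two-column CSV
like the one Rich showed (the function name lookup and the sample rows are
hypothetical), could use the standard csv module to do the splitting:

```python
import csv
import io


def lookup(csv_lines, wanted):
    """Return the second field of the first row whose first field
    equals `wanted`, or None if no row matches."""
    for row in csv.reader(csv_lines):
        # Skip blank or short rows rather than failing on them.
        if len(row) >= 2 and row[0] == wanted:
            return row[1]
    return None


# Hypothetical in-memory stand-in for an open CSV file.
sample = io.StringIO('"name1","address1"\n"name2","address2"\n')
print(lookup(sample, "name2"))
```

Letting csv.reader do the split means quoted fields containing commas are
handled, which a plain split(',') would get wrong.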

Or, of course, suggest they read the entire file into a list of lines or a
data.frame and use some tools that search all of it and produce results.
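
Here is one way the read-everything approach might be sketched using only
the standard library (load_rows and search are names I made up, and the
two-column layout is an assumption). A pandas DataFrame would be the nearer
analogue of a data.frame, but a plain list of tuples shows the idea:

```python
import csv


def load_rows(csv_lines):
    """Read every row into memory as (first_field, second_field) tuples."""
    return [tuple(row[:2]) for row in csv.reader(csv_lines) if len(row) >= 2]


def search(rows, predicate):
    """Return the second field of every row whose first field satisfies
    `predicate`; the result may hold zero, one or many entries."""
    return [second for first, second in rows if predicate(first)]


# Hypothetical sample data standing in for the lines of a CSV file.
lines = ['"name1","address1"\n', '"name2","address2"\n']
rows = load_rows(lines)
print(search(rows, lambda name: name == "name1"))
print(search(rows, lambda name: name.startswith("name")))
```

Once the rows are in memory, any number of searches can be run without
rereading the file, at the cost of holding all of it at once.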

I find I personally now often lean toward the latter approach but ages ago
when memory and CPU were considerations and maybe garbage collection was not
automatic, ...


-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
Behalf Of Thomas Passin via Python-list
Sent: Wednesday, January 31, 2024 7:25 AM
To: python-list at python.org
Subject: Re: Extract lines from file, add to new files

On 1/30/2024 11:25 PM, avi.e.gross at gmail.com wrote:
> Thomas, on some points we may see it differently.

I'm mostly going by what the OP originally asked for back on Jan 11. 
He's been too stingy with information since then to be worth spending 
much time on, IMHO.

> Some formats can be done simply but are maybe better done in somewhat
> standard ways.
> 
> Some of what the OP has is already tables in a database and that can
> trivially be exported into a CSV file or other formats like your TSV file
> and more. They can also import from there. As I mentioned, many
> spreadsheets and all kinds of statistical programs tend to support some
> formats making it quite flexible.
> 
> Python has all kinds of functionality, such as in the pandas module, to
> read in a CSV or write it out. And once you have the data structure in
> memory, all kinds of queries and changes can be made fairly
> straightforwardly. As one example, Rich has mentioned wanting finer
> control in selecting who gets some version of the email based on concepts
> like market segmentation. He already may have info like the STATE (as in
> Arizona) in his database. He might at some point enlarge his schema so
> each entry is placed in one or more categories and thus his CSV, once
> imported, can do the usual tasks of selecting various rows and columns or
> doing joins or whatever.
> 
> Mind you, another architecture could place quite a bit of work completely
> on the back end and he could send SQL queries to the database from python
> and get back his results into python which would then make the email
> messages and pass them on to other functionality to deliver. This would
> remove any need for files and just rely on the DB.
> 
> There are, as usual, too many choices and not necessarily one best answer.
> Of course if this was a major product that would be heavily used, sure,
> you could tweak and optimize. As it is, Rich is getting a chance to
> improve his python skills no matter which way he goes.
> 
> 
> 
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org>
> On Behalf Of Thomas Passin via Python-list
> Sent: Tuesday, January 30, 2024 10:37 PM
> To: python-list at python.org
> Subject: Re: Extract lines from file, add to new files
> 
> On 1/30/2024 12:21 PM, Rich Shepard via Python-list wrote:
>> On Tue, 30 Jan 2024, Thomas Passin via Python-list wrote:
>>
>>> Fine, my toy example will still be applicable. But, you know, you
>>> haven't told us enough to give you help. Do you want to replace text
>>> from values in a file? That's been covered. Do you want to send the
>>> messages using those libraries? You haven't said what you don't know
>>> how to do. Something else? What is it that you want to do that you
>>> don't know how?
>>
>> Thomas,
>>
>> For 30 years I've used a bash script using mailx to send messages to a
>> list of recipients. They have no salutation to personalize each one.
>> Since I want to add that personalized salutation I decided to write a
>> python script to replace the bash script.
>>
>> I have collected 11 docs explaining the smtplib and email modules and
>> providing example scripts to apply them to send multiple individual
>> messages with salutations and attachments.
> 
> If I had a script that's been working for 30 years, I'd probably just
> use Python to do the personalizing and let the rest of the bash script
> do the rest, like it always has.  The Python program would pipe or send
> the personalized messages to the rest of the bash program. Something in
> that ballpark, anyway.
> 
>> Today I'm going to be reading these. They each recommend using .csv input
>> files for names and addresses. My first search is learning whether I can
>> write a single .csv file such as:
>> "name1","address1"
>> "name2","address2"
>> which I believe will work; and by inserting at the top of the message
>> block
>> Hi, {yourname}
>> the name in the .csv file will replace the bracketed place holder
> If the file contents are going to be people's names and email addresses,
> I would just tab separate them and split each line on the tab.  Names
> aren't going to include tabs so that would be safe.  Email addresses
> might theoretically include a tab inside a quoted name but that would be
> extremely obscure and unlikely.  No need for CSV, it would just add
> complexity.
> 
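> # f is assumed to be an already-open text file of "name<TAB>address" lines
> data = f.readlines()
> for line in data:
>       # Drop the trailing newline before splitting; blank lines give ('', '')
>       name, addr = line.rstrip('\n').split('\t') if line.strip() else ('', '')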
> 
>> Still much to learn and the batch of downloaded PDF files should educate
>> me.
>>
>> Regards,
>>
>> Rich
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


