Extract lines from file, add to new files

dn PythonList at DancesWithMice.info
Sat Jan 13 17:01:12 EST 2024


On 12/01/24 08:53, Rich Shepard via Python-list wrote:
> On Thu, 11 Jan 2024, Piergiorgio Sartor via Python-list wrote:
> 
>> Why not to use bash script for all?
> 
> Piergiorgio,
> 
> That's certainly a possibility, and may well be better than python for this
> task.

(sitting in a meeting with little to occupy my mind, whilst tidying my 
email InBox I came back to this conversation)


On the bare description of the task, I might agree with sticking to 
BASH. The OP did say that the output from this will become input to a 
sed/mailx task!
(which, we trust, does not involve spamming innocent folk)

However, that task could also be accomplished in Python. So, unless 
there is an existing script (perhaps), why one would choose to do half 
in Python and half in BASH (or...) is an open question.
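For instance, the mailx half could be handled by the standard library's 
email package. A minimal sketch, assuming hypothetical sender, subject 
and body values; it only builds each message, and the sending step via 
smtplib (with an assumed SMTP server on localhost) is left commented out:

```python
from email.message import EmailMessage


def build_message(name: str, address: str, sender: str) -> EmailMessage:
    """Build (but do not send) one personalised message.

    The subject and body text are hypothetical placeholders.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = address
    msg["Subject"] = f"Hello {name}"
    msg.set_content(f"Dear {name},\n\nThis is a placeholder body.\n")
    return msg


if __name__ == "__main__":
    message = build_message("Calvin", "calvin@example.com", "sender@example.org")
    print(message["To"], "|", message["Subject"])
    # To actually send (assumes an SMTP server on localhost):
    # import smtplib
    # with smtplib.SMTP("localhost") as smtp:
    #     smtp.send_message(message)
```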


Because this is a Python forum, do the whole thing in one mode: our mode!

Previous suggestions involved identifying a line by its content.
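Roughly, that approach looks like the following sketch, which assumes 
that any line containing "@" is an email address and every other 
non-blank line is a name (the function name is hypothetical). Note that 
it produces two separate lists, related only by position:

```python
def split_by_content(lines):
    """Sort lines into names and addresses by what they contain.

    Assumption: a line containing "@" is an email address;
    every other non-blank line is a name.
    """
    names, addresses = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank separator lines
        if "@" in text:
            addresses.append(text)
        else:
            names.append(text)
    return names, addresses


sample = ["Calvin\n", "calvin@example.com\n", "\n", "Hobbs\n", "hobbs@some.com\n"]
names, addresses = split_by_content(sample)
print(names)      # ['Calvin', 'Hobbs']
print(addresses)  # ['calvin@example.com', 'hobbs@some.com']
```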

One could use a neat state-transition solution.
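A state-transition version might look like this sketch. It assumes the 
non-blank lines strictly alternate name, email, name, email; the State 
enum and pair_by_state() are hypothetical names. Keeping each name and 
address together as a pair avoids the two-list problem:

```python
from enum import Enum, auto


class State(Enum):
    EXPECT_NAME = auto()
    EXPECT_EMAIL = auto()


def pair_by_state(lines):
    """Yield (name, email) pairs using a two-state machine.

    Assumption: non-blank lines alternate name, email, name, email...
    """
    state = State.EXPECT_NAME
    name = None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines do not change state
        if state is State.EXPECT_NAME:
            name = text
            state = State.EXPECT_EMAIL
        else:
            yield name, text
            state = State.EXPECT_NAME
    # Ending while still waiting for an address means the data is malformed
    if state is State.EXPECT_EMAIL:
        raise ValueError(f"name {name!r} has no following email address")


sample = ["Calvin\n", "calvin@example.com\n", "Hobbs\n", "hobbs@some.com\n"]
for person, email in pair_by_state(sample):
    print(person, email)
```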

However, there is no need to treat the input data as lines at all, 
because of the concept of "white-space", well-utilised by some of 
Python's built-in string methods: str.split() with no argument splits 
on any run of whitespace, newlines included. See code-sample, below.
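A quick illustration of that behaviour, using made-up sample text. 
Note the caveat it exposes: a name containing a space becomes two 
tokens, which would upset any name/email pairing:

```python
raw = "Calvin\ncalvin@example.com\n\n  Hobbs\thobbs@some.com\n"

# With no argument, str.split() splits on any run of whitespace
# (spaces, tabs, newlines) and discards empty strings.
tokens = raw.split()
print(tokens)
# ['Calvin', 'calvin@example.com', 'Hobbs', 'hobbs@some.com']

# Caveat: a multi-word name is broken into separate tokens.
print("Calvin Smith\ncalvin@example.com".split())
# ['Calvin', 'Smith', 'calvin@example.com']
```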

As mentioned before, splitting the one file (data-items related by 
serial progression) into two quite-separate data-constructs (in this 
case: the person's name in one file and the person's email address in 
another), which are related only 'across', i.e. line-by-line, is an 
architectural error/horror. Such would be hard to maintain, and over 
time its integrity would be impossible to guarantee. Assuming this is 
not a one-off exercise, see elsewhere for advice to store the captured 
data in some more useful format, e.g. JSON, CSV, or even in MongoDB or 
an RDBMS.
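As a sketch of keeping each name with its address, here are CSV and 
JSON versions using only the standard library. The function names are 
hypothetical, and an in-memory buffer stands in for a real file such as 
a hypothetical "contacts.csv":

```python
import csv
import io
import json

pairs = [("Calvin", "calvin@example.com"), ("Hobbs", "hobbs@some.com")]


def pairs_to_csv(pairs, stream):
    """Write one row per person, keeping name and address together."""
    writer = csv.writer(stream)
    writer.writerow(["name", "email"])  # header row
    writer.writerows(pairs)


def pairs_to_json(pairs):
    """Serialise as a list of objects, one per person."""
    return json.dumps([{"name": n, "email": e} for n, e in pairs], indent=2)


# With a real file: open("contacts.csv", "w", newline="") as stream
buffer = io.StringIO()
pairs_to_csv(pairs, buffer)
print(buffer.getvalue())
print(pairs_to_json(pairs))
```

Either way, a single record holds both fields, so the two can never 
drift out of step the way parallel files can.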


****** code

""" PythonExperiments:rich.py
    Demonstrate string extraction.
"""

__author__ = "dn, IT&T Consultant"
__python__ = "3.12"
__created__ = "PyCharm, 14 Jan 2024"
__copyright__ = "Copyright © 2024~"
__license__ = "GNU General Public License v3.0"

# third-party, not in the PSL: pip install more-itertools
import more_itertools as it

DATA_FILE = "rich_data_file"
READ_ONLY = "r"
AS_PAIRS = 2
STRICT_PAIRING = True    # raise ValueError if the token count is odd


if __name__ == "__main__":
    print("\nCommencing execution\n")

    with open( DATA_FILE, READ_ONLY, ) as df:
        data = df.read()

    # split() with no argument breaks on any whitespace, newlines included
    data_as_list = data.split()
    paired_data = it.chunked( data_as_list, AS_PAIRS, strict=STRICT_PAIRING, )

    for name, email_address in paired_data:
        # replace this with email-function
        # and/or with storage-function
        print( name, email_address, )

    print("\nTerminating")

****** sample output

Calvin calvin at example.com
Hobbs hobbs at some.com
...

******
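Since more_itertools is a third-party package, here is a 
standard-library-only sketch of the pairing step (pair_up() is a 
hypothetical name). It raises ValueError on an odd token count, much as 
strict=True does for chunked(). On Python 3.12, itertools.batched() is 
another option, although its strict= parameter only arrived in 3.13:

```python
def pair_up(tokens):
    """Pair consecutive tokens as (name, email), stdlib only.

    Raises ValueError if the tokens cannot be split evenly into pairs.
    """
    if len(tokens) % 2:
        raise ValueError("odd number of tokens: a name or address is missing")
    # Even-indexed tokens are names, odd-indexed tokens are addresses
    return list(zip(tokens[0::2], tokens[1::2]))


tokens = "Calvin calvin@example.com Hobbs hobbs@some.com".split()
for name, email_address in pair_up(tokens):
    print(name, email_address)
```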

-- 
Regards,
=dn

