[Tutor] Pointers Towards Appropriate Python Methods

David L Neil PyTutor at DancesWithMice.info
Sun Sep 29 15:13:36 EDT 2019


On 30/09/19 7:28 AM, Stephen P. Molnar wrote:
> First, let me state that this is not a homework problem.   

You are retired, but not working at home - on 'home work'?


> I happen to
> be a retired Research Chemist

Is that a confession?


> whose rather meager programming skills are
> in FORTRAN.

With such a pedigree, can you do any wrong?


> I have managed to write, and with help from the list debug, a very short 
> Python script to extract a column of data from an ASCII file:
> 
> #!/usr/bin/env python3
> # -*- coding: utf-8 -*-
> """
> 
> Created on Tue Sep 24 07:51:11 2019
> 
> """
> import numpy as np
> 
> fileName = []
> 
> name = input("Enter Molecule ID: ")
> 
> name_in = name+'_apo-1acl.RMSD'
> 
> data = np.genfromtxt(name_in, usecols=(3), dtype=None, skip_header=8, 
> skip_footer=1, encoding=None)
> 
> 
> I have uploaded the script and an example of the input file to my 
> Dropbox account in order to avoid scrambling of the file format.
> 
> https://www.dropbox.com/sh/xwsv17vkh48tsaa/AAAfIe0miWrrk49hqZCkxe-aa?dl=0
> 
> My problem is that I have a large number of data files that I wish to 
> process for input to several other different Python scripts that I use 
> as part of my Computational Chemistry research program.   I've also 
> uploaded a bash script that illustrates what I want to do in Python.
> 
> At this point what I would like are pointers towards Python methods for 
> processing a large number of data files. I'm not asking anyone to write 
> the code for me.
> 
> Thanks in advance.


Have I understood you correctly? You have (sensibly) constructed a 
processor which works on a single file, and now want to expand its scope 
to process a series of similarly-formatted files?
(Or, alternatively, are the various files in different formats?)

One of the (many) beauties of the Python eco-system is that it comes 
with "batteries included" (or pip-installable) covering an extremely 
wide variety of tasks. In this case, there is no need to separate 
'Python work' from 'file-system/BASH work' - it can ALL be done in 
Python!

Rather than devolving the file system work to BASH, perhaps review 
"pathlib" from the "PSL" (Python Standard Library - 
https://docs.python.org/3/library/pathlib.html).

For example, if the files to be processed are collected into a single 
directory (or 'directory tree'), pathlib will accept the (top-level) 
directory name; its "iterdir" method then iterates through every entry 
in that directory (and "glob"/"rglob" will match a filename pattern, 
descending into sub-directories if required). Code this into a loop (or 
a Python "generator") and the already-coded process can be applied to 
each file in turn. This saves (a) BASH code, and (b) the "command-line 
interface" between BASH and Python.
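
As a rough sketch (the directory name and the "*.RMSD" pattern below are 
my assumptions; the genfromtxt call is copied from your script):

import numpy as np
from pathlib import Path

# top-level directory holding the .RMSD files - adjust to suit
data_dir = Path("RMSD_data")

# glob("*.RMSD") matches every such file in the directory;
# rglob("*.RMSD") would also descend into sub-directories
for rmsd_file in sorted(data_dir.glob("*.RMSD")):
    # the same extraction as the single-file script, applied file-by-file
    data = np.genfromtxt(rmsd_file, usecols=3, dtype=None,
                         skip_header=8, skip_footer=1, encoding=None)
    print(rmsd_file.stem, data)   # ...or pass `data` on to the next stage

Each pass through the loop replaces one invocation of the single-file 
script, so the BASH wrapper is no longer needed.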


At the risk of causing cognitive-overload, may I also suggest reading 
(some on-line articles/book-chapters) about "logging". If you plan to 
follow the FORTRAN tradition of long-running batch programs, then this 
is an ideal way to record progress, results, and errors. (IMHO logging 
is sadly under-rated, but then much code these days is neither "batch" 
nor server-oriented.)
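
A minimal sketch, using the PSL's "logging" module (the log-file name 
and message format here are placeholders):

import logging

logging.basicConfig(filename="rmsd_batch.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

logging.info("Batch run started")                            # progress
logging.warning("Skipped a malformed file: %s", "example.RMSD")  # errors

Dropping calls like these into the file-processing loop gives a 
permanent record of each run, without cluttering the console.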


Apologies if I'm off on the wrong track - having (only yesterday) solved 
a long-standing issue I've had with pathlib incorrectly processing 
European-language filenames, this morning I'm re-factoring a bunch of 
programs which 'walk a directory tree' to use a common/utility 'walker' 
core - and "to a man with a hammer..."
-- 
Regards =dn

