[Tutor] Pointers Towards Appropriate Python Methods
David L Neil
PyTutor at DancesWithMice.info
Sun Sep 29 15:13:36 EDT 2019
On 30/09/19 7:28 AM, Stephen P. Molnar wrote:
> First, let me state that this is not a homework problem.
You are retired, but not working at home - on 'home work'?
> I happen to
> be a retired Research Chemist
Is that a confession?
> whose rather meager programming skills are
> in FORTRAN.
With such a pedigree, can you do any wrong?
> I have managed to write, and with help from the list debug, a very short
> Python script to extract a column of data from an ASCII file:
>
> #!/usr/bin/env python3
> # -*- coding: utf-8 -*-
> """
>
> Created on Tue Sep 24 07:51:11 2019
>
> """
> import numpy as np
>
> fileName = []
>
> name = input("Enter Molecule ID: ")
>
> name_in = name+'_apo-1acl.RMSD'
>
> data = np.genfromtxt(name_in, usecols=(3), dtype=None, skip_header=8,
> skip_footer=1, encoding=None)
>
>
> I have uploaded the script and an example of the input file to my
> Dropbox account in order to avoid scrambling of the file format.
>
> https://www.dropbox.com/sh/xwsv17vkh48tsaa/AAAfIe0miWrrk49hqZCkxe-aa?dl=0
>
> My problem is that I have a large number of data files that I wish to
> process for input to several other different Python scripts that I use
> as part of my Computational Chemistry research program. I've also
> uploaded a bash script that illustrates what I want to do in Python.
>
> At this point what I would like are pointers towards Python methods for
> processing a large number of data files. I'm not asking anyone to write
> the code for me.
>
> Thanks in advance.
Have I understood you correctly? You have (sensibly) constructed a
processor which works on a single file, and now want to expand its scope
to process a series of similarly-formatted files?
(alternatively: or are the various files in different formats?)
One of the (many) beauties of the Python eco-system is that it has
"batteries included" (or pip-include-able) enabling an extremely wide
variety of tasks. In this case, there is no need to separate 'Python
work' from 'File system/BASH work' - it can ALL be done by Python!
Rather than devolving the file system work to BASH, perhaps review
"pathlib" from the "PSL" (Python Standard Library -
https://docs.python.org/3/library/pathlib.html).
For example, if the files to be processed are collected into a single
directory, pathlib will accept the directory name and then "iterdir"
(iterate through all the entries in that directory); "glob" does the
same with pattern-matching, and "rglob" descends a whole directory
tree. Code this into a loop (or a Python "generator") and the
already-coded process can be applied serially to each file. This saves
(a) BASH code, and (b) the "command-line interface" between BASH and
Python.
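To make that concrete, here is a minimal sketch of the loop. The
directory layout, file names, and sample data below are illustrative
(they only mimic the posted .RMSD format: eight header lines, the value
of interest in the fourth column, one footer line) - substitute your
own directory and files:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

import numpy as np


def extract_column(path):
    """Pull the fourth column from one .RMSD-style file, skipping the
    header and footer rows as in the single-file script above."""
    return np.genfromtxt(path, usecols=3, dtype=None,
                         skip_header=8, skip_footer=1, encoding=None)


# Hypothetical sample data shaped like the posted file: 8 header lines,
# data rows with the interesting value in column 4, then a footer line.
SAMPLE = "\n".join(["header"] * 8
                   + ["a b c %d extra" % n for n in (10, 20, 30)]
                   + ["footer"]) + "\n"

with TemporaryDirectory() as tmp:
    top = Path(tmp)
    # Create two stand-in input files to loop over.
    for molecule in ("mol1", "mol2"):
        (top / (molecule + "_apo-1acl.RMSD")).write_text(SAMPLE)

    # glob() matches files in this directory; rglob() would walk a tree.
    results = {f.stem: extract_column(f) for f in sorted(top.glob("*.RMSD"))}

for name, column in results.items():
    print(name, column)
```

The same `for rmsd_file in directory.glob("*.RMSD"):` loop replaces the
BASH wrapper entirely - each iteration hands one Path straight to the
already-working extraction code.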
At the risk of causing cognitive-overload, may I also suggest reading
(some on-line articles/book-chapters) about "logging". If you plan to
follow the FORTRAN tradition of long-running batch programs, then this
is an ideal way to record progress, results, and errors. (IMHO logging
is sadly under-rated, but then much code these days is neither "batch"
nor server-oriented)
Apologies if I'm off on the wrong track - yesterday I solved a
long-standing issue I'd had with pathlib mis-handling European-language
filenames, and this morning I'm re-factoring a bunch of programs
which 'walk a directory tree' to use a common/utility core 'walker' -
and "to a man with a hammer..."
--
Regards =dn