How to re-write this bash script in Python?

Chris Angelico rosuav at gmail.com
Fri Jul 31 03:47:13 EDT 2015


On Fri, Jul 31, 2015 at 4:31 AM,  <sutanu.das at gmail.com> wrote:
> #!/bin/bash
>
> _maillist='pager at email.com'
> _hname=`hostname`
> _logdir=/hadoop/logs
> _dirlog=${_logdir}/directory_check.log
>
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)
>
> _hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`
>
> echo "Checking for HDFS directories:" > ${_dirlog}
> echo >> ${_dirlog}
>
> for _currdir in $_hdfsdir
> do
> hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
> done
>
> if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
> then
> echo "Verify Flume is working for all  servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
> fi

There are two basic approaches to this kind of job.

1) Go through every line of bash code and translate it into equivalent
Python code. You should then have a Python script which blindly and
naively accomplishes the same goal by the same method.

2) Start by describing what you want to accomplish, and then implement
that in Python, using algorithmic notes from the bash code.

The second option seems like a lot more work, but long-term it often
isn't, because you end up with better code. For example, bash lacks
decent timezone support, so I can well believe random832's guess that
your five-hour offset is a simulation of that; but Python can do much
better work with timezones, so you can get that actually correct.
Also, file handling, searching, and text manipulation can usually be
done more efficiently and readably in Python directly than by piping
things through grep and awk.
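
As a rough illustration of the second approach, here is a minimal sketch
of the pieces that date, awk, and grep handle in the bash version. The
function names are made up for this example. It assumes that the path is
the eighth whitespace-separated field of each `hdfs dfs -ls` line (which
is what the awk call in the script relies on), and it leaves the actual
hdfs and mail invocations out entirely.

```python
from datetime import datetime, timezone, tzinfo


def hour_path(now_utc: datetime, tz: tzinfo) -> str:
    """Return 'YYYY/MM/DD/HH' for now_utc as seen in timezone tz.

    Stands in for the four `date -d "-5 hour"` calls. Given a real
    timezone object, this follows DST changes instead of a fixed offset.
    """
    return now_utc.astimezone(tz).strftime("%Y/%m/%d/%H")


def listed_paths(ls_output: str) -> list[str]:
    """Pull the path column out of `hdfs dfs -ls` output.

    Assumption: the path is field 8, as in awk '{print $8}'. Shorter
    lines (such as error messages) are skipped.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            paths.append(fields[7])
    return paths


def missing_dirs(ls_output: str) -> list[str]:
    """Return the lines reporting a missing directory (the grep -i step)."""
    needle = "no such file or directory"
    return [line for line in ls_output.splitlines() if needle in line.lower()]
```

On Python 3.9 or later, something like
hour_path(datetime.now(timezone.utc), zoneinfo.ZoneInfo("America/Chicago"))
would give the current hour's path; the zone name there is only a guess
at what the five-hour offset stands for. The text passed to the other two
functions would come from running hdfs via the subprocess module.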

ChrisA


