Do I need a parser?

Thomas Jollans tjol at tjol.eu
Sat Jun 29 10:06:56 EDT 2019


On 29/06/2019 14:39, josé mariano wrote:
> Dear all,
>
> I'm sure that this subject has been addressed many times before on this forum, but my poor knowledge of English and of computer jargon and concepts means I am not able to find the answer I'm looking for when I search the forum.
>
> So here is my problem: I have this open source project for the scientific community where I want to duplicate an old MS-DOS application written in Fortran. I don't have the source code. The idea is to re-write the software in Python. Originally, the old application would need two input files: one config file, written in a specific format (see below), and a second one, the so-called script file, that defines the sequence of operations to be performed by the main software, also written in a specific format.

Is there any way you can get the source code? Can you track down the 
original author? Are there old backups?

That might eliminate the need for rewriting, and, in any case, would 
make it so much easier to be sure you're doing the right thing and not 
missing something.


>
> To make the transition to the new application as painless as possible for the users, because most of them have their collection of scripts (and settings) developed over the years and are not willing to learn a new script language, I would like to make the new app 100% compatible with the old input files.

Obviously. Make sure you have tests, tests, and more tests. If there's 
documentation, use it, but don't trust it.

That's assuming there are loads of old scripts, and that just continuing
to use the old program is not an option. DOSBox to the rescue?

Another option might be to write a script that parses the old files and 
converts them to something more friendly to your infrastructure, such as 
YAML config files and Python scripts. This has two benefits:

(1) a human can easily check the result. If there are some 
incompatibilities, they'll be easier to spot. If the script misses 
something, the user can add it in.

(2) it might be easier to add new features later

It is a more complex and less user-friendly solution, though.


>
> The operation of the new software would be like this: From the shell, run "my_new_software old_script_file.***". The new software would load the old_script, parse it (?), set the internal variables, load the script and run it.
>
> So, to get to my questions:
>
> - To load and read the config file I need a parser, right? Is there a parser library where we can define the syntax of the language to use? Are there better (meaning easier) ways to accomplish the same result?

You need to parse the file, obviously. Python is good for this sort of 
thing. str.split() and the re module are your friends. The format looks 
reasonably simple, so I'd just parse it into a simple data structure 
with the basic tools Python provides.
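
As a rough sketch of that approach, assuming every config line is a
keyword followed by whitespace-separated values (the sample only shows
three lines, so the real format may have cases this misses, such as
quoted strings or repeated keywords). The names parse_config and
convert are made up for illustration:

```python
def convert(token):
    """Turn a token into an int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_config(text):
    """Parse 'KEYWORD value value ...' lines into a dict of value lists.

    Assumption: one setting per line, keyword first, keywords unique.
    """
    settings = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith('.'):
            # skip blank lines and the '.....' markers seen in the sample
            continue
        keyword, *values = fields
        settings[keyword] = [convert(v) for v in values]
    return settings
```

With the sample settings file from the original post, the BURAD2 line
would come out as [4, 'SALT1', 1.0, 'KNO3']. Whether a value like
"400" should really become a number depends on what the old program
does with it, so the conversion step is a guess.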

> - For the interpretation of the script file, I don't have any clue how to do this... One important thing, the script language admits some simple control flow statements like do-while, again written using a specific syntax.

From the look of it, it's one instruction per line, and the first word 
of the line is, in some sense, a command? In this case, one way I can 
think of to run it would be an interpreter structured something like this:


class ScriptRunner:
    def __init__(self, config, script):
        self._variables = {}
        self._config = config
        self._script_lines = []
        for line in script.splitlines():
            line = line.strip()
            if not line or line.startswith('!'):
                # skip blank lines and comments
                continue
            cmd, *args = line.split()
            self._script_lines.append((cmd.lower(), args))

    def run_script(self):
        # the iterator acts as the program counter
        self._script_iter = iter(self._script_lines)
        for cmd, args in self._script_iter:
            self.dispatch(cmd, args)

    def dispatch(self, cmd, args):
        # look up the handler method for this command, e.g. _cmd_set
        method = getattr(self, f'_cmd_{cmd}', None)
        if method is None:
            raise RuntimeError(f'unknown command: {cmd}')
        method(args)

    def _cmd_set(self, args):
        # some settings take several values, e.g. "set rdcrit cond 0.5 per_min"
        varname, *values = args
        self._variables[varname] = values

    def _cmd_while(self, loop_args):
        # collect the loop body up to 'endwhile', consuming lines from
        # the shared iterator
        loop_body = []
        for cmd, args in self._script_iter:
            # MAGIC (nested loops would need extra bookkeeping here)
            if cmd == 'endwhile':
                break
            loop_body.append((cmd, args))
        else:
            raise RuntimeError('loop not ended')
        while self._condition_is_met(loop_args):
            for cmd, args in loop_body:
                self.dispatch(cmd, args)

    def _condition_is_met(self, loop_args):
        # placeholder: depends on the script language's condition syntax
        raise NotImplementedError


In any case, there are three things you need to keep track of: 
variables, where in the script you are, and what control structure 
currently "has control".

In this stub, I've used the object for holding program state (variables 
that is), a Python iterator to hold the program counter, and the Python 
stack to hold the control flow information. There are other ways you can 
do it, of course. If the language allows for things like subroutines or 
GOTO, things get a little more complicated.
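
One of those other ways, sketched under heavy assumptions: keep the
program counter as a plain integer and the control flow information in
an explicit list, so that a GOTO is just an assignment to the counter.
The commands here (set, add, while VAR lt NUMBER, endwhile, label,
goto) are invented for the example and are not taken from the real
script language; run and skip_to_endwhile are hypothetical names too.

```python
def skip_to_endwhile(lines, pc):
    """Return the index just past the 'endwhile' matching an open 'while'."""
    depth = 1
    while pc < len(lines):
        cmd = lines[pc][0]
        pc += 1
        if cmd == 'while':
            depth += 1
        elif cmd == 'endwhile':
            depth -= 1
            if depth == 0:
                return pc
    raise RuntimeError('loop not ended')


def run(script):
    """Run a toy script with an explicit program counter and loop stack."""
    lines = []
    for raw in script.splitlines():
        parts = raw.split()
        if parts and not parts[0].startswith('!'):
            lines.append((parts[0].lower(), parts[1:]))
    # pre-scan so that 'goto NAME' can jump forwards as well as backwards
    labels = {args[0]: i for i, (cmd, args) in enumerate(lines)
              if cmd == 'label'}
    variables = {}
    loop_stack = []  # indices of the 'while' lines that currently have control
    pc = 0           # index of the next line to execute
    while pc < len(lines):
        cmd, args = lines[pc]
        pc += 1
        if cmd == 'set':
            variables[args[0]] = float(args[1])
        elif cmd == 'add':
            variables[args[0]] += float(args[1])
        elif cmd == 'while':
            var, op, limit = args
            if op != 'lt':
                raise RuntimeError(f'unsupported comparison: {op}')
            if variables[var] < float(limit):
                loop_stack.append(pc - 1)
            else:
                pc = skip_to_endwhile(lines, pc)
        elif cmd == 'endwhile':
            pc = loop_stack.pop()  # jump back and re-test the condition
        elif cmd == 'goto':
            pc = labels[args[0]]
        elif cmd == 'label':
            pass
        else:
            raise RuntimeError(f'unknown command: {cmd}')
    return variables
```

Nested loops work because each 'while' pushes its own line index. A
goto that jumps into or out of a loop would leave the loop stack out of
step, so a real implementation would have to decide what that means.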

There are also tools for this kind of thing. The traditional (UNIX) 
choice would be lex & yacc, but I think there are Python parser 
libraries that you could use. Others might mention them. I'd tend to say 
they're overkill for your purpose, but what do I know.


>
> Thanks a lot for the help and sorry for the long post.
>
> Mariano
>
>    
>
> Example of a config (settings) file
> ========================
> .....
> CONDAD     -11
> BURAD2     4 SALT1 1.0 KNO3
> ELEC5      -2.0 mV 400 58 0. 0
> .....
>
>
> Example of a script
> ===================
> !Conductivity titration
> cmnd bur1 f
> set vinit 100
> set endpt 2000
> set mvinc 20
> set drftim 1
> set rdcrit cond 0.5 per_min
> set dosinc bur1 0.02 1000
> set titdir up
> titratc cond bur1
>



