EDI parsing

Paul Rubin phr-n2002b at NOSPAMnightsong.com
Sat Sep 14 02:20:47 EDT 2002


"Emile van Sebille" <emile at fenx.com> writes:
> > 2) Is there a (python) EDI to XML converter?
> 
> That wouldn't be hard.  If there's already a DTD for edi, that would
> help a lot.  ;-)

EDI syntax doesn't fit into the SGML model, so I don't think it could
be done with a DTD-driven parser.  Parsing and converting EDI is easy.

> > [You have to know that I've never written a parser in Python before.
> >  The last (big) parser I wrote was in C a few years back, with the
> >  help of lex and yacc if memory serves me right.]
> >
> > What is the best approach to writing an EDI parser in Python?

The main thing is to understand that it's not complicated.  I worked
on an EDI product years ago, and we hired several programmers with CS
backgrounds who immediately wanted to use stuff like lex and yacc on
it.  Really, EDI was designed to be processed by RPG and COBOL
programs and just doesn't have much syntactic hair.  If you find
yourself reaching for fancy compiler implementation techniques, you're
probably misunderstanding something.

Basically, an EDI document is a sequence of lines called "segments"
where the segments are sequences of "elements".  The spec describes
various document types (purchase order, invoice, etc.) and the types
of segments you expect to find in each one.  For each segment type, the
spec describes which elements you expect to find in it.  You break a
segment down into elements with a simple lexical scanner (re.split
should be more than powerful enough) and then look the segment type up
in a dictionary that says what elements should be in it.  Within
documents, there are some simple looping constructs (groups of segments
that repeat), which are no big deal.
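
To make that concrete, here is a minimal sketch of the scanner plus
dictionary idea.  Everything specific in it is an assumption for
illustration: the "~" and "*" delimiters are X12-style defaults (real
interchanges declare their own), and the HDR/ITM/END segment types and
their element names are made up, not taken from any spec.

```python
import re

# Assumed delimiters (X12-style); real interchanges declare their own.
SEGMENT_TERMINATOR = "~"
ELEMENT_SEPARATOR = "*"

# Hypothetical rule table: segment type -> names of the elements expected
# in it.  In a real system these entries are typed in from the spec.
SEGMENT_RULES = {
    "HDR": ["order_number", "order_date"],
    "ITM": ["quantity", "unit_price", "part_number"],
    "END": ["item_count"],
}


def split_segments(text):
    """Split raw text into segments, dropping empty pieces and whitespace."""
    pieces = re.split(re.escape(SEGMENT_TERMINATOR), text)
    return [p.strip() for p in pieces if p.strip()]


def parse_segment(segment):
    """Return (segment_type, {element_name: value}) for one segment."""
    fields = re.split(re.escape(ELEMENT_SEPARATOR), segment)
    seg_type, values = fields[0], fields[1:]
    names = SEGMENT_RULES.get(seg_type)
    if names is None:
        raise ValueError("unknown segment type: %r" % seg_type)
    if len(values) > len(names):
        raise ValueError("too many elements in %s segment" % seg_type)
    # Trailing elements may be omitted; zip stops at the shorter list.
    return seg_type, dict(zip(names, values))


def parse_document(text):
    """Parse every segment in the text, in order."""
    return [parse_segment(s) for s in split_segments(text)]
```

Calling parse_document("HDR*PO123*20020914~ITM*10*2.50*WIDGET-1~END*1~")
gives a list of (type, elements) pairs, one per segment.  Note that
there is no grammar anywhere, just two splits and a table lookup.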

If you want to write a general EDI parser that understands all the
documents and segments in the spec, 99% of the work is typing in all
the rules from the spec.  The code that applies them should be very
simple.  In reality, nobody uses all the docs or segments.  They pick
a few docs and use some subset of the allowable segments in specific
ways.  So if you're trying to implement EDI because you have some
customer who wants to order widgets from you by EDI, you don't need a
fully general system.  Just ask them for the spec of the docs they
want to use, and code those.
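
As a rough sketch of what "code those" can look like, the function
below handles exactly one made-up document layout: one HDR segment, a
repeating loop of ITM segments, and a closing END segment carrying the
item count.  The segment types, element positions and the input format
(a list of (type, elements) pairs that has already been split) are all
assumptions for illustration, not anything from X12 or Edifact.

```python
def build_order(segments):
    """Fold a flat list of (segment_type, elements) pairs into one order.

    Hypothetical layout: one HDR, a repeating loop of ITM, then one END
    whose single element is the item count.
    """
    order = None
    closed = False
    for seg_type, elements in segments:
        if closed:
            raise ValueError("segment after END: %s" % seg_type)
        if seg_type == "HDR":
            if order is not None:
                raise ValueError("second HDR segment")
            order = {"number": elements[0], "date": elements[1], "items": []}
        elif order is None:
            raise ValueError("%s segment before HDR" % seg_type)
        elif seg_type == "ITM":
            # The "loop": each ITM segment adds one line item.
            order["items"].append(
                {"quantity": int(elements[0]), "part": elements[2]}
            )
        elif seg_type == "END":
            if int(elements[0]) != len(order["items"]):
                raise ValueError("item count mismatch")
            closed = True
        else:
            raise ValueError("segment not used in this document: %s" % seg_type)
    if order is None or not closed:
        raise ValueError("incomplete document")
    return order
```

Anything the customer's spec doesn't use is simply rejected, which is
usually what you want: an unexpected segment means somebody is not
following the agreed subset.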

What brought on this question?  I seem to remember this subject came
up before, maybe on sci.crypt.  I don't think there's any free general
EDI software around because the big specification documents (X12,
Edifact) aren't free.  But the code framework is pretty
straightforward to write.  I did it in C in a few weeks, and in Python
it would be a lot simpler.


