[Tutor] converting EBCIDIC to ASCII

Marc Tompkins marc.tompkins at gmail.com
Sat Jul 14 01:41:59 CEST 2012


On Fri, Jul 13, 2012 at 1:28 PM, Prinn, Craig <Craig.Prinn at bhemail.com> wrote:

> The records are coming off of a mainframe, so there probably was a 2
> byte RDW or length indicator at one point. If there is a x0D x0A at the end
> would that work?
>
> Thanks
>
> Craig
>

I presume so, but (despite my bloviating about the generalities of
variable-length records) I don't actually know all that much about how
systems that use EBCDIC tend to structure their files (my "big iron" days
were in an HP 3000 shop, which DID use EBCDIC, but that was 22 years ago -
and at the time I was a database-only programmer and didn't need to worry
my little head about actual file I/O.)  By "at the end" do you mean 'at the
end of each record', or 'at the end of the file'?

If you meant 'at the end of each record', then my approach would be:
-  create an empty list called "lines"
-  read in the file (or buffer-sized chunks of it, anyway) - call it inFile
-  create recordBegin and recordEnd pointers, initialized to 0
-  search for x0D x0A (or whatever) in the stream of bytes
-  each time I find it,
   -  set the recordEnd pointer to the start of the delimiter
   -  make a copy of the bytes between recordBegin and recordEnd and append
      it to "lines"
   -  set recordBegin to just past the delimiter (recordEnd + 2 for x0D x0A)
   -  lather, rinse, repeat
-  at the end, decode each bytestream in "lines"
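
Here's a rough sketch of those steps in Python. It's only an illustration under some assumptions: the function names are made up, the whole file fits in memory, the delimiter really is the two bytes x0D x0A, and the text is in the "cp037" EBCDIC code page (your mainframe may well use a different one, e.g. "cp500"). If the records contain packed or binary fields, those two bytes could also turn up inside a record, in which case this won't split correctly.

```python
def split_records(data, delimiter=b"\r\n"):
    """Split a bytes object into records on a delimiter (hypothetical helper)."""
    lines = []
    record_begin = 0
    while True:
        # find() returns -1 when there are no more delimiters
        record_end = data.find(delimiter, record_begin)
        if record_end == -1:
            break
        lines.append(data[record_begin:record_end])
        # skip past the delimiter so it doesn't start the next record
        record_begin = record_end + len(delimiter)
    # keep any trailing bytes that were not followed by a delimiter
    if record_begin < len(data):
        lines.append(data[record_begin:])
    return lines


def decode_records(records, encoding="cp037"):
    """Decode each raw record; "cp037" is an assumed EBCDIC code page."""
    return [record.decode(encoding) for record in records]


if __name__ == "__main__":
    # Build a tiny two-record sample instead of reading a real file
    sample = "HELLO".encode("cp037") + b"\r\n" + "WORLD!".encode("cp037") + b"\r\n"
    for text in decode_records(split_records(sample)):
        print(text)
```

For a real file you'd read the bytes with open(filename, "rb") and pass them to split_records(). Splitting first and decoding afterwards matters, because decoding the whole stream as EBCDIC would turn the x0D x0A bytes into something else before you got a chance to look for them.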

If you meant 'at the end of the file', then I'm not sure it helps, and I
don't know what you'd need to move forward.

Good luck!

>  ------------------------------
>
> *From:* Marc Tompkins [mailto:marc.tompkins at gmail.com]
> *Sent:* Friday, July 13, 2012 3:30 PM
> *To:* Prinn, Craig
> *Cc:* tutor at python.org
> *Subject:* Re: [Tutor] converting EBCIDIC to ASCII
>
> On Thu, Jul 5, 2012 at 9:30 AM, Prinn, Craig <Craig.Prinn at bhemail.com>
> wrote:
>
> I am trying to convert an EBCDIC file to ASCII. When the records are
> fixed length I can convert it fine, but I have some files that are coming
> in as variable-length records. Is there a way to convert the file in
> Python? I tried using no length, but then it just reads into a fixed
> buffer size and I can’t seem to break the records properly.
>
> I know of only three varieties of variable-length-record files:
> -  Delimited - i.e. there's some special character that ends the record,
> and (perhaps) a special character that separates fields.  CSV is the
> classic example: newlines to separate records, commas to separate fields.
>
> -  Prefixed - there's a previously-agreed schema of record lengths, where
> (for example) a record that starts with "A" is 25 characters long; a "B"
> record is 136 characters long, etc.
>
> -  Sequential - record types/lengths appear in a previously-agreed order,
> such as 25 characters, 136 characters, 45 characters, etc.
>
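
For the prefixed and sequential varieties, the splitting might be sketched like this. Everything here is hypothetical: the function names, the record types, and the lengths are invented for illustration, and a real schema would come from whoever produces the file. The sketch also assumes the type code is the first byte of each record and does no checking for truncated data.

```python
def split_prefixed(data, lengths_by_type):
    """Split bytes where the first byte of each record determines its length."""
    records = []
    pos = 0
    while pos < len(data):
        record_type = data[pos:pos + 1]          # one-byte type code
        length = lengths_by_type[record_type]    # KeyError if the type is unknown
        records.append(data[pos:pos + length])
        pos += length
    return records


def split_sequential(data, lengths):
    """Split bytes into records whose lengths come in a pre-agreed order."""
    records = []
    pos = 0
    for length in lengths:
        records.append(data[pos:pos + length])
        pos += length
    return records


if __name__ == "__main__":
    # Made-up schema: "A" records are 5 bytes long, "B" records are 8.
    # The keys are EBCDIC bytes ("cp037" is an assumed code page).
    schema = {"A".encode("cp037"): 5, "B".encode("cp037"): 8}
    raw = "A1234B1234567A9999".encode("cp037")
    for record in split_prefixed(raw, schema):
        print(record.decode("cp037"))
```

Note that the schema keys are the raw EBCDIC bytes, not ASCII characters, since the lookup happens before anything is decoded.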
> For each of these types, the schema may be externally-published, or it may
> be encoded in a special record at the beginning of the file - to use an
> example near and dear to my own experience, ANSI X12 EDI files all start
> with a fixed-length "ISA" record, which among other things contains the
> element separator, repetition separator, sub-element separator, and segment
> terminator characters at zero-based positions 3, 82, 104, and 105
> respectively.  To read an X12 file, therefore, you read it in, look at
> positions 3, 82, 104, and 105, and then use that information to break up
> the rest of the file into records and fields.
>
> How you handle variable-length records depends on what kind they are, and
> how much you know about them going in.  Python is just a tool for applying
> your specialized domain knowledge - by itself, it doesn't know any more
> about your particular solution than you do.
>
> If you have more information about the structure of your files, and need
> help implementing an algorithm to deal with 'em, let us know!
>