Parsing String, Dictionary Lookups, Writing to Database Table

mensanator at aol.com mensanator at aol.com
Mon Sep 11 21:32:15 EDT 2006


Rich Shepard wrote:
> I need to learn how to process a byte stream from a form reader where each
> pair of bytes has meaning according to lookup dictionaries, then use the
> values to build an array of rows inserted into a sqlite3 database table.
>
>    Here's the context: The OMR card reader sends a stream of 69 bytes over
> the serial line; the last byte is a carriage return ('\r') indicating the
> end of record. Three pairs (in specific positions at the beginning of the
> stream) represent blanks (no value); two other pairs represent character
> strings; the values are determined from two dictionaries. The remaining 28
> pairs represent values from a third dictionary.
>
>    What I'm doing: I think that I have the first part correct; that is, the
> three blanks and two string values. The first three dictionaries are:
>
>      DATA_MAP_BLANK = {
>        chr(32)+chr(32): 0       # SP + SP
>        }
>      DATA_MAP_2 = {
>        chr(32)+chr(36): 'nat',  # SP + $
>        chr(96)+chr(32): 'eco',  #   + SP
>        chr(36)+chr(32): 'soc'   #  $ + SP
>        }
>      DATA_MAP_5 = {
>        chr(32)+chr(36): 'pro',  # SP + $
>        chr(96)+chr(32): 'neu',  #   + SP
>        chr(36)+chr(32): 'con'   #  $ + SP
>        }
>
>    I read the data into a string and split that into byte tokens, then start
> building the row to be inserted into a database table:
>
>      line = ser.readline()
>      split_line = line.split()
>
>      # then pre-pend the record number to the front of the row
>      row.join(', ', vote_id)
>
>      # extract category choice (row 2, bytes 2 and 3); look up value in dictionary
>      cat = split_line(2:4)
>      row.join(', ', DATA_MAP_2(cat) # I'm not sure this is a correct lookup
>
>      # extract position (row 5, bytes 8 and 9P; look up value in dictionary
>        pos = split_line(8:10)
>        row.join(', ', DATA_MAP_5(pos))
>
>    Is the above the most 'correct' way of extracting specific byte pairs,
> using them as dictionary keys to get values, then build a string of
> comma-and-quote values for insertion in the database table?

No, if you're going to insert into sqlite3, you don't want a
csv string, you want a list of values (see example code
below).

>
>    Then, I've no idea how to get the rest of the data parsed for use as keys
> in the last data mapping dictionary. I do not see a skip value for slicing,
> other than in Numeric Python, and I'm not yet building an array of data.
>
>    Here's the last dictionary:
>
>      DATA_MAP_7 = {
>        chr(32)+chr(16): 1.000,  # SP + DLE
>        chr(32)+chr(8): 2.000,   # SP + BS
>        chr(32)+chr(4): 3.000,   # SP + EOT
>        chr(32)+chr(2): 4.000,   # SP + STX
>        chr(32)+chr(1): 5.000,   # SP + SOH
>        chr(64)+chr(32): 6.000,  #  @ + SP
>        chr(16)+chr(32): 7.000,  # DLE + SP
>        chr(8)+chr(32): 8.000,   # BS + SP
>        chr(4)+chr(32): 9.000,   # EOT + SP
>        chr(34)+chr(8): 0.500,   #  " + BS
>        chr(34)+chr(4): 0.333,   #  " + EOT
>        chr(34)+chr(2): 0.025,   #  " + STX
>        chr(34)+chr(1): 0.200,   #  " + SOH
>        chr(66)+chr(32): 0.167,  #  B + SP
>        chr(18)+chr(32): 0.143,  # DC2 + SP
>        chr(10)+chr(32): 0.125,  # LF + SP
>        chr(6)+chr(32): 0.111    # ACK + SP
>        }
>
>    I know how I'd do all this in C, but since I'm learning python I have not
> found how best to accomplish this despite the books and online references
> I've read.

As I'm now an expert in sqlite3, here's a sample program that
may do what you need. Note, I had to create some test data.
Make sure it's correct with regards to how your input stream
works.

Also, if vote_id is autoincrement, that's one less thing your
program has to do.

import sqlite3
import random

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("""
CREATE TABLE test (vote_id integer primary key autoincrement,
                   category text,
                   position text,
                   f01 real,
                   f02 real,
                   f03 real,
                   f04 real,
                   f05 real,
                   f06 real,
                   f07 real,
                   f08 real,
                   f09 real,
                   f10 real,
                   f11 real,
                   f12 real,
                   f13 real,
                   f14 real,
                   f15 real,
                   f16 real,
                   f17 real,
                   f18 real,
                   f19 real,
                   f20 real,
                   f21 real,
                   f22 real,
                   f23 real,
                   f24 real,
                   f25 real,
                   f26 real,
                   f27 real,
                   f28 real,
                   f29 real)
""")

DATA_MAP_BLANK = {chr(32)+chr(32):0}

DATA_MAP_2     = {chr(32)+chr(36): 'nat', \
                  chr(96)+chr(32): 'eco', \
                  chr(36)+chr(32): 'soc'}

DATA_MAP_5     = {chr(32)+chr(36): 'pro', \
                  chr(96)+chr(32): 'neu', \
                  chr(36)+chr(32): 'con'}

DATA_MAP_7     = {chr(32)+chr(16): 1.000, \
                  chr(32)+chr(8):  2.000, \
                  chr(32)+chr(4):  3.000, \
                  chr(32)+chr(2):  4.000, \
                  chr(32)+chr(1):  5.000, \
                  chr(64)+chr(32): 6.000, \
                  chr(16)+chr(32): 7.000, \
                  chr(8)+chr(32):  8.000, \
                  chr(4)+chr(32):  9.000, \
                  chr(34)+chr(8):  0.500, \
                  chr(34)+chr(4):  0.333, \
                  chr(34)+chr(2):  0.025, \
                  chr(34)+chr(1):  0.200, \
                  chr(66)+chr(32): 0.167, \
                  chr(18)+chr(32): 0.143, \
                  chr(10)+chr(32): 0.125, \
                  chr(6)+chr(32):  0.111}

# create test data
DM2_key = DATA_MAP_2.keys()
DM5_key = DATA_MAP_5.keys()
DM7_key = DATA_MAP_7.keys()
DM7_key.extend(DM7_key)

for i in xrange(10):
    random.shuffle(DM2_key)
    random.shuffle(DM5_key)
    random.shuffle(DM7_key)

    test = '  ' + DM2_key[0] + '    ' + DM5_key[0]
    for i in xrange(29):
        test = test + DM7_key[i]

    values = [DATA_MAP_2[test[2:4]],DATA_MAP_5[test[8:10]]]
    for i in xrange(10,68,2):
        values.append(DATA_MAP_7[test[i:i+2]])

    cur.execute("""
INSERT INTO test ( category,
                   position,
                   f01,
                   f02,
                   f03,
                   f04,
                   f05,
                   f06,
                   f07,
                   f08,
                   f09,
                   f10,
                   f11,
                   f12,
                   f13,
                   f14,
                   f15,
                   f16,
                   f17,
                   f18,
                   f19,
                   f20,
                   f21,
                   f22,
                   f23,
                   f24,
                   f25,
                   f26,
                   f27,
                   f28,
                   f29)
VALUES
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"""
,values)


# read back what was inserted to verify
cur.execute("""SELECT * FROM test""")
d = cur.fetchall()

for i in d:
    print
    print
    print "Record: %d  Category: %s  Position: %s" % (i[0],i[1],i[2])
    print '-'*40,
    count = 0
    for j in i[3:31]:
        if (count % 6)==0: print '\n',
        print '%0.3f ' % (j),
        count += 1

##  Sample run
##
##    Record: 1  Category: soc  Position: con
##    ----------------------------------------
##    0.111  0.125  0.200  3.000  3.000  6.000
##    9.000  5.000  8.000  7.000  0.333  0.500
##    7.000  0.025  0.167  5.000  8.000  4.000
##    2.000  1.000  0.200  0.143  0.025  9.000
##    1.000  0.143  0.111  6.000
##
##    Record: 2  Category: soc  Position: neu
##    ----------------------------------------
##    0.500  0.333  5.000  1.000  0.143  2.000
##    0.500  6.000  0.200  1.000  2.000  0.333
##    8.000  0.125  0.143  0.111  0.200  9.000
##    4.000  7.000  3.000  0.167  9.000  8.000
##    7.000  0.167  0.111  4.000
##
##    Record: 3  Category: soc  Position: pro
##    ----------------------------------------
##    0.025  0.333  0.167  7.000  0.333  1.000
##    0.200  3.000  0.111  1.000  6.000  5.000
##    0.111  9.000  0.143  0.025  7.000  8.000
##    0.143  9.000  0.500  8.000  0.125  2.000
##    4.000  3.000  5.000  0.200
##
##    Record: 4  Category: eco  Position: neu
##    ----------------------------------------
##    2.000  8.000  3.000  4.000  0.125  0.167
##    0.125  6.000  0.143  0.025  0.025  0.500
##    7.000  2.000  0.111  3.000  4.000  9.000
##    0.167  0.500  0.200  8.000  0.333  1.000
##    9.000  0.111  7.000  5.000
##
##    Record: 5  Category: nat  Position: con
##    ----------------------------------------
##    9.000  0.333  1.000  7.000  9.000  0.111
##    0.200  0.333  5.000  0.500  0.500  5.000
##    2.000  8.000  2.000  0.125  4.000  4.000
##    0.025  0.200  0.125  0.111  3.000  6.000
##    7.000  8.000  0.167  0.167
##
##    Record: 6  Category: eco  Position: pro
##    ----------------------------------------
##    2.000  0.200  0.167  5.000  3.000  4.000
##    0.333  8.000  0.200  0.025  5.000  0.143
##    9.000  0.167  0.125  0.111  7.000  0.025
##    0.111  6.000  0.333  1.000  0.500  0.500
##    0.125  0.143  1.000  3.000
##
##    Record: 7  Category: soc  Position: neu
##    ----------------------------------------
##    0.333  1.000  0.500  0.167  0.111  5.000
##    9.000  0.143  6.000  0.125  4.000  6.000
##    3.000  0.143  0.125  7.000  0.200  4.000
##    2.000  0.025  0.167  7.000  8.000  3.000
##    9.000  0.025  5.000  0.111
##
##    Record: 8  Category: eco  Position: pro
##    ----------------------------------------
##    6.000  2.000  0.125  3.000  4.000  0.025
##    0.167  5.000  0.025  0.143  0.111  0.200
##    2.000  1.000  4.000  0.167  7.000  0.111
##    5.000  0.500  6.000  8.000  1.000  0.333
##    0.125  9.000  0.200  9.000
##
##    Record: 9  Category: soc  Position: pro
##    ----------------------------------------
##    3.000  0.500  0.143  7.000  5.000  6.000
##    4.000  0.333  0.025  1.000  0.200  3.000
##    9.000  8.000  6.000  0.500  0.111  1.000
##    0.333  0.167  0.025  5.000  8.000  9.000
##    0.125  0.111  0.200  2.000
##
##    Record: 10  Category: soc  Position: con
##    ----------------------------------------
##    4.000  2.000  0.025  7.000  0.500  0.333
##    5.000  6.000  0.125  0.143  0.500  0.200
##    0.167  0.167  4.000  9.000  9.000  6.000
##    0.111  0.200  0.143  0.333  1.000  2.000
##    0.125  3.000  8.000  3.000



>
> TIA,
>
> Rich
>
> --
> Richard B. Shepard, Ph.D.               |    The Environmental Permitting
> Applied Ecosystem Services, Inc.(TM)    |            Accelerator
> <http://www.appl-ecosys.com>     Voice: 503-667-4517      Fax: 503-667-8863




More information about the Python-list mailing list