[python-win32] Retrieve informations from NIST file.

Thu Apr 30 18:46:19 CEST 2009

Khalid Moulfi wrote:
>
> thanks for your quick answer.
> Here is a sample of the first line of the NIST file :
>
> 1.001:0000000245
> 1.002:3000
> 1.003:119
> 20
> 41
> 42
> 43
> 44
> 45
> 46
> 47
> 48
> 49
> 410
> 411
> 412
> 413
> 414
> 1515
> 1516
> 1517
> 1518
> 1.004:NPS
> 1.005:20081029
> 1.006:4
> 1.007:51/Live Scan
> 1.008:51/Live Scan
> 1.009:0844251404U
> 1.011:19.6850
> 1.012:19.6850
> 2.001:0000000188
> 2.002:0
> 2.003:3000
> 2.010:1005000190
> 2.019:20081029
> 2.029:0
> 2.054:Civilian
> 2.083:01NA
> 02NA
> 03NA
> 04NA
> 05NA
> 06NA
> 07NA
> 08NA
> 09NA
> 10NA
> 2.233:ÈÏæä
> 2.235:1011973400606

>
> but as the end of the line is not displayed, I send you a copy of the
> file with all the line.

That's because your file contains null bytes ('\x00').  The string you
display above shows everything up to the first null.
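
A quick way to see the nulls for yourself is to look at the repr() of
the data instead of printing it.  This is only a sketch: the bytes in
"data" are made up to stand in for the tail of your file, and it uses
bytes literals, so it assumes a reasonably recent Python.

```python
# Made-up stand-in for the tail of the file; the real contents differ.
data = b"2.235:1011973400606\x1c\x00\x00binary tail\x00\n"

# len() counts every byte, nulls included.
print(len(data))

# count() tells you how many null bytes are hiding in there.
print(data.count(b"\x00"))

# repr() shows each null as \x00 instead of silently dropping it.
print(repr(data))

# What a null-terminated display would show: everything before the first null.
first_null = data.find(b"\x00")
visible = data[:first_null]
print(repr(visible))
```

If the length you see on screen is shorter than len() reports, the
difference is the part after the first null.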

> The thing is even if I take the number of characters from let's say
> 2.001 to the end of the line I do not get the real number of characters.

What do you mean by that?  Where did the numbers come from?  The file
contains one line of 471 bytes, including the newline.  Does that agree
with either of your sources?

> My goal is to modify this first line by adding new tags (with special
> characters), removing some of them, getting the real length, and then
> writing the updated line back to the original nst file.
>
> I'll try as you said to open it with rb parameters and see.

You will have to show me your code, along with what numbers you expect. 
The file you sent is 471 bytes long, and that's exactly what I read, in
both text and binary modes:

    C:\tmp>python
    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = open('sample_1005000190.nst')
    >>> y = open('sample_1005000190.nst', 'rb')
    >>> x1 = x.read()
    >>> y1 = y.read()
    >>> len(x1)
    471
    >>> len(y1)
    471
    >>> x1.find('2.001')
    245
    >>> x1[-2:]
    '\x00\n'
    >>> y1[-2:]
    '\x00\n'
    >>> x.seek(0,0)
    >>> x2 = x.readlines()
    >>> len(x2)
    1
    >>> len(x2[0])
    471
    >>>

The "2.001" is located at byte 245, so there are 226 bytes (471 - 245)
from there to the end of the line.  However, there are zero bytes (meaning
'\x00') in this file, which might be confusing you.
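
If you want to check that kind of count yourself, subtract the offset
that find() returns from the total length.  A minimal sketch follows.
The helper name "bytes_from_tag" and the "sample" bytes are made up for
illustration; only the numbers 471 and 245 come from the session above.

```python
def bytes_from_tag(data, tag):
    """Return how many bytes lie between the start of `tag` and the end of `data`."""
    offset = data.find(tag)
    if offset < 0:
        raise ValueError("tag not found: %r" % (tag,))
    return len(data) - offset

# Made-up data, not the real file: 10 bytes of padding, then the tag.
sample = b"x" * 10 + b"2.001:0000000188\x1d2.002:0\x1c\x00\n"
print(bytes_from_tag(sample, b"2.001"))

# The same arithmetic with the numbers from the session above:
# 471 bytes in total, "2.001" found at offset 245.
print(471 - 245)
```

The count includes every null byte and the trailing newline, which is
why it will not match what you see on screen.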

You have to know something about this data format to know how to modify
it.  It looks like the file consists of two major sections, separated by
0x1C characters.  The major sections are then divided into records
separated by 0x1D characters.  Some of the records have fields in them,
separated by 0x1E, and some of those fields are split again by 0x1F.
There are 38 bytes of what look like garbage after the last field.  So,
you could parse it into records like this:

    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = open('sample_1005000190.nst','rb').read()
    >>> sections = x.split('\x1c')
    >>> len(sections)
    3
    >>> [len(k) for k in sections]
    [244, 187, 38]
    >>> rec1 = sections[0].split('\x1d')
    >>> rec2 = sections[1].split('\x1d')
    >>> len(rec1)
    11
    >>> len(rec2)
    10
    >>> rec1
    ['1.001:0000000245', '1.002:3000',
    '1.003:1\x1f19\x1e2\x1f0\x1e4\x1f1\x1e4\x1f2\x1e4\x1f3\x1e4\x1f4\x1e4\x1f5\x1e4\x1f6\x1e4\x1f7\x1e4\x1f8\x1e4\x1f9\x1e4\x1f10\x1e4\x1f11\x1e4\x1f12\x1e4\x1f13\x1e4\x1f14\x1e15\x1f15\x1e15\x1f16\x1e15\x1f17\x1e15\x1f18',
    '1.004:NPS', '1.005:20081029', '1.006:4', '1.007:51/Live Scan',
    '1.008:51/Live Scan', '1.009:0844251404U', '1.011:19.6850',
    '1.012:19.6850']
    >>> rec2
    ['2.001:0000000188', '2.002:0', '2.003:3000', '2.010:1005000190',
    '2.019:20081029', '2.029:0', '2.054:Civilian',
    '2.083:01\x1fNA\x1e02\x1fNA\x1e03\x1fNA\x1e04\x1fNA\x1e05\x1fNA\x1e06\x1fNA\x1e07\x1fNA\x1e08\x1fNA\x1e09\x1fNA\x1e10\x1fNA',
    '2.233:\xc8\xcf\xe6\xe4', '2.235:1011973400606']
    >>>
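
If you need to get at individual values, the same splitting idea can be
wrapped in a couple of small functions.  This is only a sketch: the
names "parse_section" and "split_value" are made up, the 0x1F level is
inferred from the output above rather than from any specification, and
the code assumes a Python new enough for bytes literals and partition().

```python
FS, GS, RS, US = b"\x1c", b"\x1d", b"\x1e", b"\x1f"

def parse_section(section):
    """Split one section into (tag, value) pairs, e.g. (b'1.002', b'3000')."""
    pairs = []
    for record in section.split(GS):
        # Split on the first colon only; values may contain colons themselves.
        tag, _, value = record.partition(b":")
        pairs.append((tag, value))
    return pairs

def split_value(value):
    """Split a value on 0x1E, then each piece on 0x1F, giving a list of lists."""
    return [piece.split(US) for piece in value.split(RS)]

# Shortened stand-in for the first section of the sample file.
section = GS.join([
    b"1.001:0000000245",
    b"1.002:3000",
    b"1.003:1\x1f19\x1e2\x1f0\x1e4\x1f1",
])
pairs = parse_section(section)
fields = dict(pairs)
print(fields[b"1.002"])
print(split_value(fields[b"1.003"]))
```

Keeping the pairs in a list preserves the record order, which matters
when you join everything back together.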

Here, "sections" contains the two major sections plus the 38 trailing
bytes.  "rec1" contains the records from the first section.  If you
wanted to add a "1.013" record to the first section, you could say:
    rec1.append( "1.013:Cool Beans" )
and then rebuild the file by saying:
    newsections = ['\x1d'.join(rec1), '\x1d'.join(rec2), sections[2]]
    open('newfile.nst','wb').write ('\x1c'.join(newsections) )

But that assumes there's nothing in that garbage 3rd section that needs
to be changed.
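
One more thing to watch when you rebuild.  In your sample, 1.001 holds
0000000245 and 2.001 holds 0000000188, which are exactly the section
lengths (244 and 187) plus one for the 0x1C that follows.  That suggests
the N.001 record is a byte count that has to be refreshed after any
change.  That reading is my assumption from this one file, so check it
against the format documentation.  A sketch, with a made-up helper name
"rebuild_section" and a shortened record list:

```python
GS, FS = b"\x1d", b"\x1c"

def rebuild_section(records):
    """Join records with 0x1D and refresh the leading N.001 length record.

    ASSUMPTION: the N.001 value is the section length in bytes plus one for
    the trailing 0x1C, zero-padded to 10 digits, as the sample suggests.
    """
    tag = records[0].split(b":", 1)[0]
    # The length field has a fixed width, so a placeholder of the same
    # width gives the final size before the real number is known.
    placeholder = tag + b":" + b"0" * 10
    body = GS.join([placeholder] + list(records[1:]))
    total = len(body) + 1  # +1 for the 0x1C separator after the section
    length_record = tag + b":" + ("%010d" % total).encode("ascii")
    return GS.join([length_record] + list(records[1:]))

# Shortened stand-in for rec1; the stale length is replaced on rebuild.
rec = [b"1.001:0000000000", b"1.002:3000", b"1.004:NPS"]
rec.append(b"1.013:Cool Beans")
section = rebuild_section(rec)
print(repr(section))
```

You would then join the rebuilt sections with 0x1C exactly as shown
above.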

It's just a matter of dividing the problem up into smaller problems
until the solution pops out.

-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.