[DB-SIG] dbf files and compact indices

Carl Karsten carl at personnelware.com
Sat Sep 18 13:15:22 EDT 2010


On Sat, Sep 18, 2010 at 11:16 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Carl Karsten wrote:
>>
>> On Sat, Sep 18, 2010 at 1:11 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>
>>> Does anybody have any pointers, tips, web-pages, already written routines,
>>> etc, on parsing *.cdx files?  I have found the pages on MS's site for
>>> Foxpro, but they neglect to describe the compaction algorithm used, and my
>>> Google-fu has failed to find any sites with that information.
>>>
>>> Any and all help greatly appreciated!
>>>
>>
>>
>> "Compound Index File Structure (.cdx)"
>>
>> http://msdn.microsoft.com/en-us/library/k35b9hs2%28v=VS.80%29.aspx
>>
>> which basically links to:
>> http://msdn.microsoft.com/en-us/library/s8tb8f47%28v=VS.80%29.aspx
>>
>> Is that what you need?
>
> Thanks for the link; unfortunately, I am already familiar with the page.
> What I need help with is the first sentence of the note at the bottom:
>
> Each entry consists of the record number, duplicate byte count and
> trailing byte count, all compacted. The key text is placed at the
> logical end of the node, working backwards, allowing for previous key
> entries.
>
> Here's a dump of the last interior node:
>
> -----
> node type: 2
> number of keys: 57
> free space: 1 (or 256) (and is this bits, bytes, keys, what?)
> --
> record number mask: c8 0e 40 b0
> duplicate byte count mask: 28
> trailing byte count mask: 00
> --
> bits used for record number: 178
> bits used for duplicate count: 29
> bits used for trail count: 64
> bytes used for rec num, dup count, trail count: 192
> -----
> 12 00 ff 3f 00 00 1f 1f 0e 05 05 03 01 00 c8 0e 40 b0 28 00
> b2 1d 40 c0 29 00 d0 42 40 d0 54 80 c0 43 40 a8 14 40 b8 40
> 40 c8 02 40 d0 08 00 b0 4c 80 b0 3a 40 a0 50 80 d0 3b 40 a8
> 09 40 b8 0a 80 88 3c 80 c0 2a 00 d8 21 c0 c0 3d 40 c0 4a 80
> b0 26 40 b8 2b 40 c0 2c 00 c0 41 40 b8 4d 80 c8 37 00 c0 04
> 40 c8 44 80 c0 1b 40 c8 15 80 c8 27 40 c8 16 00 a8 2d c0 c8
> 51 80 b8 2e 40 c0 1e 00 b0 17 40 b8 46 40 b0 2f 80 c8 4f 80
> a8 13 00 c8 59 00 c8 31 00 c8 1f 00 a8 3e 40 c0 22 40 a8 07
> 00 c8 23 80 d0 32 80 b0 52 80 c0 34 80 b0 20 40 b0 24 40 c0
> 47 80 c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 4e 44 45 4e 49 44 53 4f 4e 43 43 41 4d 4d 4f 4e 54 54 48
> 45 57 53 53 4c 45 4e 52 54 49 4e 45 5a 4e 4e 4d 41 47 45 45
> 49 45 42 45 52 4d 41 4e 45 57 49 4e 53 4c 41 56 45 4e 42 45
> 52 47 4b 41 56 41 4e 4a 4f 4e 45 53 49 52 49 53 48 53 54 45
> 54 4c 45 52 52 41 4e 4f 4c 53 54 45 49 4e 45 41 44 4c 45 59
> 48 41 54 48 41 57 41 59 52 49 4d 45 53 45 41 53 4f 4e 53 53
> 47 4c 41 44 53 54 4f 4e 45 55 52 52 59 4f 53 54 52 49 4e 4b
> 52 42 45 53 4f 4c 45 59 46 49 4c 45 4e 45 4e 49 53 4e 47 4c
> 55 4e 44 45 42 45 52 4c 45 4f 44 53 4f 4e 49 4e 47 4c 45 52
> 4d 41 52 45 53 54 45 43 4b 45 52 54 4f 4e 44 41 59 57 47 45
> 52 52 4e 45 49 4c 2d 53 55 4e 44 54 4f 4f 4b 53 45 59 4c 45
> 4e 44 45 4e 49 4e 55 4e 48 49 41 50 50 45 54 54 41 52 4e 41
> 48 41 4e 43 41 4c 44 57 45 4c 4c 55 54 54 52 55 43 45 4f 43
> 41 52 44 45 4c 4f 4f 4d 42 45 52 47 4e 53 45 4c 45 45 52 42
> 41 43 48 55 47 55 53 54 4e 44 45 52 53 4f 4e 41 4c 4c 41 4e
> -----
>
> The last half (roughly) consists of last names compressed together,
> while the first half consists of 57 (in this case) entries of the record
> number, duplicate byte count and trailing byte count, all compacted --
> how do I uncompact them?
>

Huh, I see what you mean.

What are you working on?

I know a few people who may have the answer, but it would help to
explain what you are working on and why.
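
In the meantime, one possible reading of your dump, offered as a guess
rather than an answer: if the hex block starts at byte 12 of the node
instead of byte 0, the fields on the MSDN page line up as free space
0x0012, record number mask 0x3fff, both count masks 0x1f, bit widths
14/5/5, and 3 bytes per entry. Under that assumption each entry would be
a 3-byte little-endian integer with the record number in the low bits,
then the duplicate count, then the trailing count. The sketch below
decodes the first four entries that way. KEY_LEN = 30 and the name
decode_entries are my own assumptions (the key length lives in the index
header, not in the node; 30 is inferred from "ALLAN" plus 25 trailing
bytes).

```python
import struct

KEY_LEN = 30  # assumption: inferred from "ALLAN" (5 bytes) + trail count 25

# Node bytes 12..35: the first line of the dump plus the next four bytes.
head = bytes.fromhex(
    "12 00 ff 3f 00 00 1f 1f 0e 05 05 03 "  # free space, masks, bit counts, entry size
    "01 00 c8 0e 40 b0 28 00 b2 1d 40 c0"   # first four compacted entries
)
# Last 17 bytes of the node: key text, filled in from the end backwards.
tail = bytes.fromhex("55 47 55 53 54 4e 44 45 52 53 4f 4e 41 4c 4c 41 4e")

# Assumed layout of node bytes 12..23, all little-endian.
(free, rec_mask, dup_mask, trail_mask,
 rec_bits, dup_bits, trail_bits, size) = struct.unpack_from("<HIBBBBBB", head, 0)


def decode_entries(head, tail, count):
    """Return (record number, key) pairs for the first `count` entries."""
    out, prev, end = [], b"", len(tail)
    for i in range(count):
        start = 12 + i * size
        raw = int.from_bytes(head[start:start + size], "little")
        recno = raw & rec_mask
        dup = (raw >> rec_bits) & dup_mask
        trail = (raw >> (rec_bits + dup_bits)) & trail_mask
        new = KEY_LEN - dup - trail        # bytes actually stored for this key
        key = prev[:dup] + tail[end - new:end]
        end -= new                         # next key sits just before this one
        out.append((recno, key.decode("ascii")))
        prev = key
    return out


print(decode_entries(head, tail, 4))
# [(1, 'ALLAN'), (14, 'ANDERSON'), (40, 'ANDERSON'), (29, 'AUGUST')]
```

The names come out in sorted order, which is at least encouraging. If the
reading holds for the rest of the node, the same loop over all 57 entries
and the full key area should walk every key, but I have only checked it
against the bytes shown here.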


-- 
Carl K


