Support for new items in set type

Prateek surekap at gmail.com
Sun Apr 22 06:35:02 EDT 2007


On Apr 22, 11:09 am, Steven D'Aprano
<s... at REMOVE.THIS.cybersource.com.au> wrote:
> On Sat, 21 Apr 2007 20:13:44 -0700, Prateek wrote:
> > I have a bit of a specialized request.
>
> > I'm reading a table of strings (specifically fixed length 36 char
> > uuids generated via uuid.uuid4() in the standard library) from a file
> > and creating a set out of it.
> > Then my program is free to make whatever modifications to this set.
>
> > When I go back to save this set, I'd like to be able to only save the
> > new items.
>
> This may be a silly question, but why? Why not just save the modified set,
> new items and old, and not mess about with complicated transactions?

I tried just that. I ignored the difficulties of calculating the
difference and simply overwrote the entire tablespace with the new
set. At about 3000 entries per file (and 3 files), along with all the
indexing, the extra I/O alone cost me about 28% in run time: I got
3000 entries committed in 53s with difference calculation, but in 68s
when writing the whole thing.
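
For illustration, the difference calculation can be as small as a set
subtraction against a snapshot taken at load time. This is only a
minimal sketch, not my actual code: the names new_items, loaded and
working are made up for the example, and it only covers added items,
not deleted ones.

```python
import uuid


def new_items(original, current):
    """Return the items in `current` that were not in `original`."""
    return current - original


# Hypothetical data standing in for the rows read from a tablespace file.
loaded = {str(uuid.uuid4()) for _ in range(5)}

# The program works on its own copy and is free to modify it.
working = set(loaded)
added = {str(uuid.uuid4()) for _ in range(2)}
working |= added

# Only these rows would need to be written out at save time.
to_save = new_items(loaded, working)
```

Keeping the snapshot costs memory for a second set, which is
acceptable here because memory is a secondary constraint.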

>
> After all, you say:
>
> > PS: Yes - I need blazing fast performance - simply pickling/unpickling
> > won't do. Memory constraints are important but definitely secondary.
> > Disk space constraints are not very important.
>
> Since disk space is not important, I think that you shouldn't care that
> you're duplicating the original items. (Although maybe I'm missing
> something.)
>
> Perhaps what you should be thinking about is writing a custom pickle-like
> module optimized for reading/writing sets quickly.

I already did this. I'm not using the pickle module at all. Since I'm
guaranteed that my sets contain a variable number of fixed-length
strings, I write a header at the start of each tablespace (using
struct.pack) recording the number of rows, and then simply save each
string one after the other without delimiters. I can do this by
issuing "".join(set_in_question) and then saving the resulting string
after the header. There are a few more things that I handle (such as
automatic tablespace overflow).
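
A rough sketch of that layout, in case it helps. The header format
here (a single little-endian unsigned 32-bit row count) and the names
pack_rows and unpack_rows are assumptions made for the example, not
the real on-disk format, and tablespace overflow is left out. It is
written for Python 3, so the joined string is encoded to bytes.

```python
import struct
import uuid

ROW_LEN = 36                   # len(str(uuid.uuid4()))
HEADER = struct.Struct("<I")   # assumed header: row count as uint32


def pack_rows(rows):
    """Serialize fixed-length strings as a count header plus raw rows."""
    rows = list(rows)
    for row in rows:
        if len(row) != ROW_LEN:
            raise ValueError("row is not %d characters long" % ROW_LEN)
    # No delimiters: every row occupies exactly ROW_LEN bytes.
    body = "".join(rows).encode("ascii")
    return HEADER.pack(len(rows)) + body


def unpack_rows(blob):
    """Rebuild the set by slicing the body at fixed offsets."""
    (count,) = HEADER.unpack_from(blob, 0)
    start = HEADER.size
    body = blob[start:start + count * ROW_LEN].decode("ascii")
    return {body[i:i + ROW_LEN] for i in range(0, len(body), ROW_LEN)}


# Round trip with hypothetical data.
original = {str(uuid.uuid4()) for _ in range(3)}
blob = pack_rows(original)
restored = unpack_rows(blob)
```

The resulting bytes can be written to a file opened in binary mode.
Reading back needs no parsing beyond the header, since each row
starts at a known offset.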

Prateek



