which datastructure for fast sorted insert?

miller.paul.w at gmail.com miller.paul.w at gmail.com
Sun May 25 20:08:39 EDT 2008


On May 25, 2:37 am, notnorweg... at yahoo.se wrote:
> im writing a webcrawler.
> after visiting a new site i want to store it in alphabetical order.
>
> so obv i want fast insert. i want to delete duplicates too.
>
> which datastructure is best for this?

I think you ought to re-examine your requirements.  Why do you need to
store in alphabetical order?  (And, what does that even mean for a web
site?)  Personally, I'd probably just use a simple dict for in-memory
storage, and use a database for external storage.  In that case,
adding a site or page to the "seen" structure is a simple matter of
doing

urls[key] = url

Dicts are basically hash tables, so this insert is amortized O(1) and
you get dupe-detection for free, provided the key is unique.  If you
want to display the contents in alphabetical order by "key," you can
do this by:

keyz = sorted (urls.keys())
for key in keyz:
    print urls[key]

The problem with this is that you pay the sorting cost every time you
want to display the results.  If that is a problem, you can maintain
an in-memory index (and rely on the database to do it for you in the
case of externally stored results).



More information about the Python-list mailing list