which datastructure for fast sorted insert?
miller.paul.w at gmail.com
miller.paul.w at gmail.com
Sun May 25 20:08:39 EDT 2008
On May 25, 2:37 am, notnorweg... at yahoo.se wrote:
> im writing a webcrawler.
> after visiting a new site i want to store it in alphabetical order.
>
> so obv i want fast insert. i want to delete duplicates too.
>
> which datastructure is best for this?
I think you ought to re-examine your requirements. Why do you need to
store in alphabetical order? (And, what does that even mean for a web
site?) Personally, I'd probably just use a simple dict for in-memory
storage, and use a database for external storage. In that case,
adding a site or page to the "seen" structure is a simple matter of
doing
urls[key] = url
Dicts are basically hash tables, so this insert is amortized O(1) and
you get dupe-detection for free, provided the key is unique. If you
want to display the contents in alphabetical order by "key," you can
do this by:
keyz = sorted (urls.keys())
for key in keyz:
print urls[key]
The problem with this is that you pay the sorting cost every time you
want to display the results. If that is a problem, you can maintain
an in-memory index (and rely on the database to do it for you in the
case of externally stored results).
More information about the Python-list
mailing list