Is there anyone familiar with pybloom (bloom filter in python)?
Xell Zhang
xellzhang at gmail.com
Sun Jul 8 14:18:36 EDT 2007
Hello,
I found the pybloom module at http://www.imperialviolet.org/pybloom.html and
am trying to use it for my crawler :)
I want to use it to store the URLs that have already been crawled. But whenever I
insert a URL string I get a warning and a wrong result.
My test code is quite simple:
from pybloom import CountedBloom
cb = CountedBloom(800000, 4)
cb.insert("AAA")
print cb.__contains__("BBB")
Warning:
E:\EclipseWorkspace\demo\src\pybloom.py:74: DeprecationWarning: 'I' format
requires 0 <= number <= 4294967295
b = [ord(x) for x in struct.pack ('I', val)]
I get this warning every time I run the code above.
The output is "1", which means "BBB" is reported as being in the set, even though
it was never inserted.
When I test with integers instead of strings, the results seem correct.
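The range in the warning can be reproduced without pybloom. Below is a minimal sketch, assuming (not confirmed from the pybloom source) that the value being packed is derived from a string hash that can be negative or wider than 32 bits. `pack_u32` is a hypothetical helper, not part of pybloom, and the negative number only stands in for such a hash:

```python
import struct

def pack_u32(val):
    """Pack val with the 'I' format after masking it into the 32-bit range."""
    # 'I' accepts only 0 <= number <= 4294967295, so keep the low 32 bits.
    return struct.pack('I', val & 0xFFFFFFFF)

# Stand-in for a string hash that came out negative (an assumption).
val = -1234567890
packed = pack_u32(val)

# The masked value round-trips as the original value modulo 2**32.
print(struct.unpack('I', packed)[0] == val % 2**32)
```

Small non-negative integers already fall inside the accepted range, which may be why the integer test behaves differently from the string test.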
I am not familiar with the arithmetic involved, so I don't know whether I wrote
something wrong.
Can anyone help me? Thanks!
--
Zhang Xiao
Junior engineer, Web development
Ethos Tech.
http://www.ethos.com.cn