[Tutor] Re: [quicky intro to vector search engines]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon Jul 21 13:37:19 2003


On Sun, 20 Jul 2003, Alexandre Ratti wrote:

> Vector search engines looked fun; I just had to give it a try :-) I
> uploaded a basic implementation to:
>
> http://www.gabuzomeu.net/alex/py/vsse/SearchEngine.zip


Hi Alexandre,


Very cool; I will have to take a look at this!



>  >>> app.search("beowulf cluster", 0.1)
> Searching in 304 files...
> ---------------------
> Beowulf-HOWTO.txt 34.56%
> SSI-UML-HOWTO.txt 26.92%
> openMosix-HOWTO.txt 18.16%
> Cluster-HOWTO.txt 12.80%
> Parallel-Processing-HOWTO.txt 11.69%
> CPU-Design-HOWTO.txt 10.59%
>
> Memory usage is quite high (about 100 MB for the PythonWin process).
> When saving the index instance to a file as a binary pickle, the file is
> quite large too (70 MB).
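
For anyone following along, here is a rough sketch of how scores like the
ones above can be computed. I'm assuming the percentages are cosine
similarities between the query vector and each document vector, as in
Maciej's article, and that the second argument to search() is a minimum
score; I haven't checked either against Alexandre's code. The names
to_vector(), cosine() and search(), and the tiny corpus, are made up for
illustration. Storing each vector as a dictionary of only its nonzero terms
is one common way to keep memory down.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a text to a sparse term-count vector (only nonzero terms stored)."""
    return Counter(text.lower().split())


def cosine(v1, v2):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * v2[term] for term, count in v1.items() if term in v2)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def search(query, docs, threshold):
    """Return (name, score) pairs with score >= threshold, best first."""
    q = to_vector(query)
    hits = [(name, cosine(q, to_vector(text))) for name, text in docs.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up miniature corpus, just to exercise the functions.
docs = {
    "a.txt": "beowulf cluster setup for a linux cluster",
    "b.txt": "modem configuration howto",
}
results = search("beowulf cluster", docs, 0.1)
for name, score in results:
    print("%s %.2f%%" % (name, score * 100))
```

A real engine would also drop stop words and stem the terms before counting;
this sketch skips both.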


I've been reading a little more about Maciej Ceglowski's work on vector
search engines; I've been collecting some of my notes here:

    http://hkn.eecs.berkeley.edu/~dyoo/python/svd/

The "Latent Semantic Analysis" technique that Maciej briefly mentions at
the end of his article is a way of compressing the vector space using
singular value decomposition (SVD).  At the moment, I don't feel
comfortable enough with the linear algebra to understand SVD, but I can
collect links pretty well.  *grin*  If I have time, I'll see if I can cook
up a wrapper module for SVDPACK.
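
As a small illustration of the compression idea, here is a minimal sketch
that keeps only the k largest singular values of a term-document matrix and
compares a query to the documents in the reduced space. It uses NumPy's
numpy.linalg.svd as a stand-in, not SVDPACK, so it only suits small dense
matrices. The matrix, the term list, and the helper names reduce_rank() and
fold_in_query() are all made up for the example; take it as a sketch under
those assumptions rather than the method from Maciej's article.

```python
import numpy as np

# Toy term-document matrix: one row per term, one column per document.
# The terms and counts are invented for illustration.
terms = ["beowulf", "cluster", "parallel", "kernel", "modem"]
A = np.array([
    [3.0, 1.0, 0.0, 0.0],   # beowulf
    [2.0, 3.0, 1.0, 0.0],   # cluster
    [1.0, 2.0, 0.0, 0.0],   # parallel
    [0.0, 0.0, 1.0, 2.0],   # kernel
    [0.0, 0.0, 0.0, 3.0],   # modem
])


def reduce_rank(matrix, k):
    """Return (U_k, s_k, Vt_k): the k largest singular triplets of matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(query_vec, U_k, s_k):
    """Project a term-space query vector into the k-dimensional space."""
    return (query_vec @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


k = 2
U_k, s_k, Vt_k = reduce_rank(A, k)

# Each document is now described by k numbers instead of len(terms).
doc_vecs = Vt_k.T

# Query containing "beowulf" and "cluster" once each.
query = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
q_k = fold_in_query(query, U_k, s_k)

scores = [cosine(q_k, d) for d in doc_vecs]
for j, score in enumerate(scores):
    print("doc %d: %.3f" % (j, score))
```

The saving comes from storing k numbers per document rather than one per
term. On a real corpus the matrix is large and sparse, which is where a
sparse solver such as SVDPACK would come in.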



Talk to you later!