Best way to index numerical data?

Liu Jin m.liu.jin at gmail.com
Fri Mar 31 23:43:26 EST 2006


>>>>> "Jack" == Jack  <jack_posemsky at yahoo.com> writes:
    > Hi I have a lot of data that is in a TEXT file which are numbers
    > does anyone have a good suggestion for indexing TEXT numbers
    > (zip codes, other codes, dollar amounts, quantities, etc). since
    > Lucene and other indexers are really optimized for Alpha
    > character indexing. What approaches are typically taken in
    > computer science for example to index text numbers..hash maps or
    > something else ??

Lucene is not optimized for alphabetic characters as such; it is
optimized for natural-language text. The assumption is that the
dictionary of distinct terms is relatively small (say, <1M words for
English) and doesn't grow linearly with the amount of text being
indexed. If your data fits this model, Lucene can efficiently index
it, no matter what the characters are.
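
To make the dictionary-size point concrete, here is a rough sketch in
plain Python. It is a toy in-memory inverted index, not Lucene, and
the sample data (the word list, the dollar amounts, the 50 zip codes)
is made up purely for illustration. It counts how many distinct terms
the index ends up holding for three kinds of input.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index: map each whitespace-separated term to the
    set of document ids that contain it. Not how Lucene stores data."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical natural-language-like data: many documents, few words.
words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
text_docs = [
    " ".join(words[(i + j) % len(words)] for j in range(5))
    for i in range(1000)
]

# Hypothetical dollar amounts where every value is unique.
amount_docs = ["%d.%02d" % (i, i % 100) for i in range(1000)]

# Hypothetical zip codes drawn from a small fixed set of 50 codes.
zip_docs = [str(10000 + i % 50) for i in range(1000)]

text_terms = len(build_index(text_docs))
amount_terms = len(build_index(amount_docs))
zip_terms = len(build_index(zip_docs))

# Dictionary size for each data set (1000 documents each).
print(text_terms, amount_terms, zip_terms)  # prints: 8 1000 50
```

In this sketch the zip codes behave like natural language (a small,
bounded dictionary), while the unique dollar amounts add one new term
per document, which is the case that does not fit the model.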

Regards,
Liu Jin


