technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)

Justin Azoff justin.azoff at gmail.com
Tue Aug 8 23:51:17 EDT 2006


Petr Jakeš wrote:
> I have a standard 12-key mobile phone keypad connected to my Linux
> machine as a I2C peripheral. I would like to write a code which allows
> the text entry to the computer using this keypad (something like T9 on
> the mobile phones)
>
> According to the http://www.yorku.ca/mack/uist01.html
> dictionary-based disambiguation is coming in the mind.
>
> With dictionary-based disambiguation, each key is pressed only once.
> For example, to enter the, the user enters 8-4-3-0. The 0 key, for
> SPACE, delimits words and terminates disambiguation of the preceding
> keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings
> (see Figure 1). The system compares the possibilities to a dictionary
> of words to guess the intended word.
>
> I would like to ask some guru here to give me the direction which
> technique (Python functionality) or which strategy to use to solve
> this riddle.
>
> Thanks for your advices and comments
>
> Regards
>
> Petr Jakes

I can think of 2 approaches to this, 1) Map the numbers to parts of a
regular expression, and then use this to search through the
dictiionary. 2) Pre-compute a copy of the dictionary converted to it's
numerical equivalent, then just match the numbers.

The basic structure you need for both of these is simple.  For the
first method you use
keys = ['','abc','def','ghi',....']

then if you have s="123321"
''.join(['[%s]' % keys[int(l)] for l in s])
will give you a string like
'[abc][def][ghi][def][abc]', which you can then use to match words...

I think the second solution would end up being faster, as long as you
have the memory - no regex work, plus, you can sort the wordlist.

The following quickly written class seems to work nicely:

import string
import bisect

letters = string.lowercase
numbers = '2223334445556667777888999'
letter_mapping = dict(zip(letters, numbers))

class phone:
    def __init__(self):
        self.read_dictionary()

    def word_as_numbers(self, word):
        nums=''
        for letter in word:
            if letter in letter_mapping:
                nums += letter_mapping[letter]
        return nums

    def read_dictionary(self):
        words = []
        for line in file("/usr/share/dict/words"):
            word = line.strip().lower()
            nums = self.word_as_numbers(word)
            words.append((nums, word))

        words.sort()
        self.dict = words

    def get_matching_words(self, number_str):
        tup = (number_str,)
        left = bisect.bisect_left(self.dict,   tup)

        for num, word in self.dict[left:]:
            if num.startswith(number_str):
                yield word
            else:
                break


It takes a second or two to read the list of words in, but matching is
instant thanks to bisect:
In [14]:%time p=phone.phone()
CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
Wall time: 1.66

In [15]:%time list(p.get_matching_words('43556'))
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.01
Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']

It seems the ruby version just posted takes a similar approach, but
uses an actual tree.. using the bisect module keeps it simple.

-- 
- Justin




More information about the Python-list mailing list