replacing Chinese chars with their spellings

Wed Apr 24 16:06:07 EDT 2002

Dan Jacobson wrote:

> Before I start learning python, here's what I want to do: I have a
> table of Hakka Chinese words and their pronunciations.  I scan a file
> and replace any Hakka ["big5" 2, 4, 6 ... byte long strings] there
> with their pronunciations.  If it were just one character [two byte]
> words I would use the "c2t" program.  Is there a template that munches
> forth in a file and replaces the longest match in a database
> before moving on?

Is the whole file in Big-5? If so, it's easiest to use iconv to
convert the file to Unicode and then open it in Python. If you also
convert your table to Unicode, you can match on whole Unicode
characters instead of on byte pairs.
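
Once both the text and the table are Unicode, the "longest match
before moving on" loop could look roughly like the sketch below. It
assumes a current Python 3, where str is Unicode and the built-in
"big5" codec can do the decoding. The function name replace_longest
and the two table entries are made up for illustration; they are not
taken from your table.

```python
def replace_longest(text, table):
    """Scan text left to right, replacing the longest table key found
    at each position; characters with no match are copied unchanged."""
    # Length of the longest word in the table (0 if the table is empty).
    max_len = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No table entry starts here: keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical two-entry table (illustrative values only).
table = {
    "\u5ba2": "hak",            # one-character word
    "\u5ba2\u5bb6": "hak-ka",   # two-character word starting with the same character
}

# Simulate Big-5 input bytes, then decode them to a Unicode string.
raw = "\u5ba2\u5bb6\u4eba".encode("big5")
text = raw.decode("big5")

result = replace_longest(text, table)
```

Here the two-character entry wins over the one-character entry, and
the third character, which is not in the table, passes through
untouched. For a real file, open(path, encoding="big5") should let
you skip the separate iconv step.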
-- 
Boudewijn Rempt | http://www.valdyas.org