New internal string format in 3.3

Roy Smith roy at panix.com
Tue Aug 21 07:48:51 EDT 2012


In article <mailman.3587.1345522727.4697.python-list at python.org>,
 Michael Torrie <torriem at gmail.com> wrote:

> > And if you want the "fudge it somehow" behavior (which is often very 
> > useful!), there's always http://pypi.python.org/pypi/Unidecode/
> 
> Sweet tip, thanks!  I often want to process text that has smart quotes,
> emdashes, etc, and convert them to plain old ascii quotes, dashes,
> ticks, etc.  This will do that for me without resorting to a bunch of
> regexes.  Bravo.

Yup, that's one of the things it's good for.  We mostly use it to help 
map search terms, e.g. if you search for "beyonce", you're probably 
expecting it to match "Beyoncé".
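A minimal sketch of that kind of search-key folding, using only the standard library's `unicodedata` in place of Unidecode (a swapped-in, cruder technique: it strips accents from Latin letters but silently drops anything with no ASCII decomposition, such as smart quotes or CJK text, which Unidecode would transliterate instead). The names `fold_to_ascii` and `search_key` are hypothetical, not from the post. With Unidecode installed, its `unidecode.unidecode()` function could stand in for `fold_to_ascii`.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (stdlib stand-in for Unidecode).

    NFKD splits "é" into "e" plus a combining acute accent; the combining
    marks are then dropped. Characters with no ASCII decomposition
    (e.g. smart quotes, CJK) are discarded by the final ASCII encode.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii")


def search_key(text: str) -> str:
    # Case-fold after accent folding so "Beyoncé" and "beyonce" compare equal.
    return fold_to_ascii(text).casefold()


print(search_key("Beyoncé"))                           # beyonce
print(search_key("beyonce") == search_key("Beyoncé"))  # True
```

Both the indexed text and the user's query go through the same `search_key`, so matching becomes a plain string comparison.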

We also special-case some weird stuff like "kesha" matching "ke$ha", but 
we have to hand-code those.
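Generic folding can't help with "ke$ha", because "$" is already ASCII and nothing maps it to "s". One way to layer hand-coded exceptions on top is a lookup table consulted after normalization. This is a hypothetical sketch (the table contents and function names are illustrative, and it only matches whole terms, not substrings):

```python
import unicodedata

# Hypothetical hand-maintained table: canonical search key -> stylized spellings.
SPECIAL_CASES = {
    "kesha": ["ke$ha"],
}

# Invert it into a lookup from stylized spelling to canonical key.
_ALIAS_TO_KEY = {
    alias: key for key, aliases in SPECIAL_CASES.items() for alias in aliases
}


def normalize(term: str) -> str:
    # Strip accents (NFKD + drop combining marks), then case-fold.
    decomposed = unicodedata.normalize("NFKD", term)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.casefold()


def search_key(term: str) -> str:
    # Generic folding first, then the hand-coded exceptions on the result.
    folded = normalize(term)
    return _ALIAS_TO_KEY.get(folded, folded)


print(search_key("Ke$ha"))  # kesha
```

Keeping the exceptions in data rather than in regexes means adding a new artist is a one-line table change.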


