convert unicode characters to visibly similar ascii characters

Jim jim.hefferon at gmail.com
Wed Jul 2 12:42:00 EDT 2008


On Jul 1, 8:42 pm, Jim <jim.heffe... at gmail.com> wrote:
> On Jul 1, 8:29 pm, John Machin <sjmac... at lexicon.net> wrote:
> > Comments on the above grep output:
> > 1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'
>
> Hmph. I'll correct that.  Thanks.
Well, maybe not.  I forgot that I got the by-hand conversions from
three different sources and that's why that character appears in two
different places.  (I thought that listing all cases for each source
was less confusing.  Arguable, for sure.)
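
For what it's worth, a repeated entry like that is easy to spot
mechanically.  This is only a rough sketch, assuming each source is kept
as its own dict; the source names and the entries are made up for
illustration and are not the real tables.

```python
from collections import defaultdict

def find_repeats(sources):
    """Return the characters that appear in more than one source table.

    `sources` maps a (hypothetical) source name to a dict of
    {unicode character: ASCII replacement}.
    """
    seen = defaultdict(list)
    for name, table in sources.items():
        for char, repl in table.items():
            seen[char].append((name, repl))
    # Keep only the characters listed by two or more sources.
    return {char: hits for char, hits in seen.items() if len(hits) > 1}

# Illustrative data only; not the actual by-hand conversions.
sources = {
    "source_a": {"\u00ad": "-"},
    "source_b": {"\u00ad": "-", "\u2019": "'"},
}
repeats = find_repeats(sources)
```

Here `repeats` holds one entry, for SOFT HYPHEN, listing both sources
that map it.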

> > 2. The idea of a soft hyphen is as a hint to a hyphenator about where
> > to insert a hyphen if one is necessary and the hyphenator is suspected
> > of acting cluelessly without the hint. IMHO, asciification should
> > substitute u'', not u'-'.
>
> Thanks also here.  I'll think about it.
Googling "soft hyphen" showed me that the question is not perfectly
clear-- some people seem to have very elaborate opinions on the
topic-- but I've gone with your suggestion.  Thank you.
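
Concretely, the change amounts to mapping SOFT HYPHEN to the empty
string.  A minimal sketch, assuming a translation table keyed by code
point; the table name and the other two entries are hypothetical, and
only the SOFT HYPHEN line reflects what was discussed here.

```python
# Hypothetical by-hand table: code point -> ASCII replacement.
ASCII_MAP = {
    0x00AD: "",   # SOFT HYPHEN: dropped, since it is only a hyphenation hint
    0x2010: "-",  # HYPHEN (illustrative entry)
    0x2019: "'",  # RIGHT SINGLE QUOTATION MARK (illustrative entry)
}

def asciify(text):
    """Replace each mapped character; unmapped characters pass through."""
    return text.translate(ASCII_MAP)

result = asciify("co\u00adoperate")
```

With this table, `result` is the plain word "cooperate", with no stray
hyphen left in the middle.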

Again, I'd appreciate additional corrections.  Not only do I speak only
ASCII :-( but I admit to entering the data while watching a basketball
game, so no doubt there are some real blunders.

Thanks,
Jim


