Handle foreign character web input

Chris Angelico rosuav at gmail.com
Wed Jul 3 17:38:22 EDT 2019


On Thu, Jul 4, 2019 at 7:08 AM Igor Korot <ikorot01 at gmail.com> wrote:
>
> Hi, Thomas,
>
> On Sat, Jun 29, 2019 at 11:06 AM Thomas Jollans <tjol at tjol.eu> wrote:
> >
> > On 28/06/2019 22:25, Tobiah wrote:
> > > A guy comes in and enters his last name as RÖnngren.
> > With a capital Ö in the middle? That's unusual.
> > >
> > > So what did the browser really give me; is it encoded
> > > in some way, like latin-1?  Does it depend on whether
> > > the name was cut and pasted from a Word doc. etc?
> > > Should I handle these internally as unicode?  Right
> > > now my database tables are latin-1 and things seem
> > > to usually work, but not always.
> >
> >
> > If your database is using latin-1, German and French names will work,
> > but Croatian and Polish names often won't. Not to mention people using
> > other writing systems.
> >
> > So Günther and François are ok, but Bolesław turns into Boles?aw and
> > don't even think about anybody called Владимир or محمد.
>
> As others pointed out, it is very easy to do transliteration, especially if
> it's not a user registration that will be done.
>
> But I would simply not do that at all - create your forms in English and
> accept English spellings only.
> Most people who use computers these days can enter a phonetic spelling
> of their first/last names (even in Chinese/Japanese/Hebrew).
>
> And all European names can be transliterated to English.
>
> Besides, as the OP said, if someone comes to him and tries
> to enter a non-English name, the OP might not even have the appropriate
> keyboard layout to input such a name. And if this is a (time-consuming) event,
> all (s)he can do is ask for a phonetic spelling.
>
> Thank you.
>
What you basically just said was "I wish all those ugly foreign names
would just go away". Honestly, that's not really an acceptable
solution; you assume that you can transliterate any name into
"English" in some perfect way, which is acceptable to everyone in the
world. And you also assume that this transformation will be completely
consistent, so you can ask someone his/her name and always get back
the same thing.

If you want to do a Latinization and accent strip for the sake of a
search, that's fine; but make sure you retain the name as people want
it to be retained. Don't be bigoted.
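
A minimal sketch of that split, assuming Python 3 and the standard-library
unicodedata module. The function name search_key and the record layout are
made up for illustration; they are not from the thread. The idea is to store
the name exactly as entered and derive a separate, lossy key that is used
only for lookups.

```python
import unicodedata


def search_key(name: str) -> str:
    """Return an accent-stripped, case-folded key for searching only.

    This is lossy on purpose and must never replace the stored name.
    """
    # NFKD splits a character such as "Ö" into "O" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


# Hypothetical record layout: the original name is kept untouched,
# and the derived key sits alongside it for search.
record = {"name": "RÖnngren", "search_key": search_key("RÖnngren")}

print(record["name"])            # RÖnngren
print(record["search_key"])      # ronngren
print(search_key("François"))    # francois
print(search_key("Bolesław"))    # bolesław
```

Note that "ł" has no Unicode decomposition, so it survives the strip; the
same goes for names in other scripts. A search that has to match "Boleslaw"
would need an extra mapping on top of this, which is one more reason to treat
the key as a search aid and not as the person's name.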

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

ChrisA


