utf - string translation

John Machin sjmachin at lexicon.net
Wed Nov 22 16:59:01 EST 2006


Dan wrote:
> Thank you for your answers.
>
> In fact, I'm just getting started with Python.

That was a good decision. Welcome!

>
> I was looking to transform a text through elementary cryptographic
> processes (Vigenère).

So why do you want to strip off accents? The history of communication
has several examples of significant differences in meaning caused by
minute differences in punctuation or accents, including one you may
have heard of: a will that could be read (in part) as either "a chacun
d'eux million francs" or "a chacun deux million francs" (the apostrophe
turns "deux", two, into "d'eux", of them), with the remainder going to
a 3rd party.
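To make the point concrete for accents, here is a sketch in modern Python 3 (not code from this thread). A common way to strip accents is to decompose with `unicodedata.normalize("NFD", ...)` and drop the combining marks; the helper name `strip_accents` and the word list are illustrative only. Four different French words collapse into one string:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Four distinct French words: coast, rating, side, rated
words = ["côte", "cote", "côté", "coté"]
stripped = {w: strip_accents(w) for w in words}
for original, plain in stripped.items():
    print(f"{original!r} -> {plain!r}")

# Once the accents are gone, all four are indistinguishable.
print(len(set(stripped.values())))  # 1
```

Whatever you encrypt after that step, the reader can no longer tell which of the four words you meant.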


> The initial text is in a file, and my system is under UTF-8 by default
> (Ubuntu)

Your system being "under UTF-8" does give you some clue, I suppose. Do
find the time to locate some data with accents and do print(repr(data))
as I suggested, to *verify* what you've got.
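A minimal sketch of that check in Python 3 (the thread predates it; the file names and the `show_raw` helper are hypothetical). Read the file in binary mode so nothing is decoded behind your back, then look at the repr of the bytes. The same word saved as UTF-8 and as Latin-1 gives visibly different bytes:

```python
import os
import tempfile

def show_raw(path: str, limit: int = 60) -> bytes:
    """Print and return the repr of the first `limit` bytes of a file, undecoded."""
    with open(path, "rb") as f:  # binary mode: no decoding, no guessing
        data = f.read(limit)
    print(repr(data))
    return data

# Hypothetical sample files: the same word saved in two different encodings.
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "sample_utf8.txt")
    latin1_path = os.path.join(tmp, "sample_latin1.txt")
    with open(utf8_path, "w", encoding="utf-8") as f:
        f.write("Vigenère")
    with open(latin1_path, "w", encoding="latin-1") as f:
        f.write("Vigenère")

    raw_utf8 = show_raw(utf8_path)      # b'Vigen\xc3\xa8re'
    raw_latin1 = show_raw(latin1_path)  # b'Vigen\xe8re'
```

Once you have decoded to `str`, note that in Python 3 `repr()` shows printable non-ASCII characters as-is, while `ascii()` escapes them, so `print(ascii(text))` is the more revealing check.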

Don't guess. Different underlying representations can look the same
when rendered on your screen. Don't rely on what sysadmins tell you.
Peculiar things can happen, e.g.

me: How is your data encoded?
them: XYZese [a language]
me: I'll try again: are you using encoding A or encoding B?
them: We've heard A mentioned; what's an encoding anyway?
[snip long explanation plus investigation of what locales [plural] had
been used when configuring their workstations and servers]
them: OK, so there's more than one way of representing XYZese on a
computer. That might explain why the government regulatory authority
for our industry is very sad [to put it mildly] about not being able to
read our monthly filings!!!
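To illustrate "different underlying representations can look the same" (a Python 3 sketch; the strings are made up for the example): "café" can be stored with a precomposed é or with a plain e followed by a combining accent. Both usually render identically, yet they compare unequal and encode to different UTF-8 bytes until normalized. The last line shows the other classic trap, UTF-8 bytes decoded with the wrong codec:

```python
import unicodedata

# Two strings that render identically on most screens.
precomposed = "caf\u00e9"  # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, U+0301

print(precomposed == decomposed)               # False
print(ascii(precomposed), ascii(decomposed))   # 'caf\xe9' 'cafe\u0301'
print(precomposed.encode("utf-8"))             # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))              # b'cafe\xcc\x81'

# Normalizing to NFC makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# UTF-8 bytes misread as Latin-1 produce mojibake.
print(precomposed.encode("utf-8").decode("latin-1"))  # cafÃ©
```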

Cheers,
John




More information about the Python-list mailing list