[Tutor] joining Indic and English Text

Evuraan evuraan at gmail.com
Mon Jun 26 02:18:35 EDT 2017


Greetings!

I have a case where I need to write lines containing both Indic and English
text to a file (and optionally to stdout):

# bash;

ml_text="മലയാളം"
en_text="Malayalam"

echo "$ml_text = $en_text" >> /tmp/output.txt
$ cat /tmp/output.txt
മലയാളം = Malayalam

That line above is what I am trying to do in Python.

When I attempt this in python3:

ml_text = u"മലയാളം"
en_text = "Malayalam"
print("{} = {}".format(ml_text, en_text))
Or,
a = subprocess.getstatusoutput("echo " + ml_text + " = " + en_text + " >> /tmp/somefile ")

I sometimes (not always, which is the strange part for now) get errors like:
UnicodeEncodeError: 'ascii' codec can't encode character '\u0d2b' in position 42: ordinal not in range(128)

Searches on that error seem to suggest adding .encode('utf-8'):

print("{} = {}".format(ml_text.encode("utf-8"), en_text))

I am afraid that would munge my output line into:
b'\xe0\xb4\xae\xe0\xb4\xb2\xe0\xb4\xaf\xe0\xb4\xbe\xe0\xb4\xb3\xe0\xb4\x82' = Malayalam
instead of the desired:
മലയാളം = Malayalam
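
To make the difference concrete, here is a minimal sketch of the file-writing
half done without the shell. It assumes the text stays a str and the file is
opened with an explicit encoding. The helper name append_line and the temporary
output path are placeholders of mine, not part of the code above:

```python
import os
import tempfile

ml_text = "മലയാളം"
en_text = "Malayalam"


def append_line(path, line):
    """Append one line of text to path, always encoded as UTF-8."""
    # An explicit encoding means the result does not depend on the locale
    # the interpreter happened to be started under.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


# Keep everything as str while formatting; no .encode() here.
line = "{} = {}".format(ml_text, en_text)

# Hypothetical output location (the shell example used /tmp/output.txt).
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
append_line(out_path, line)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding="utf-8") as fh:
    read_back = fh.read()

# Formatting a bytes object into a str embeds its b'...' repr,
# which is the munged form shown above.
munged = "{} = {}".format(ml_text.encode("utf-8"), en_text)
```

If this sketch is right, the encoding step belongs at the file boundary (the
encoding= argument to open) rather than inside the format call.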

What am I doing wrong? My locale and LANG (en_US.UTF-8) etc. seem to be set up.
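
One way I could check what Python itself sees, as a minimal sketch. It assumes
the intermittent error depends on the environment the script is started from,
which I have not confirmed. The helper can_encode is a name I made up for
illustration:

```python
import locale
import sys

ml_text = "മലയാളം"

# Encoding print() uses when writing to stdout in this process.
stdout_encoding = sys.stdout.encoding

# Locale-derived encoding; in the Python 3 versions I know of, open() falls
# back to this when no encoding= argument is given.
preferred_encoding = locale.getpreferredencoding(False)


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    if not encoding:
        return False
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


# The report is built from ASCII-safe pieces so that printing it cannot
# itself trigger the UnicodeEncodeError being investigated.
report = "stdout={} preferred={} stdout_ok={} preferred_ok={}".format(
    stdout_encoding,
    preferred_encoding,
    can_encode(ml_text, stdout_encoding),
    can_encode(ml_text, preferred_encoding),
)
print(report)
```

Running this both from my interactive shell and from wherever the failing runs
are launched should show whether either encoding comes back as ASCII in the
failing case.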


Thanks in advance!

