[BangPypers] Internationalization in Python 2.6

Dileep dileep.ds at gmail.com
Wed Nov 27 10:25:01 CET 2013


Hi,

How does internationalization work in Python 2.6?

I have an input string to the script. I do not know the encoding of the
string.

I want to write that string to an XML file.

Here I try several encodings in turn to decode the input string and
convert it to unicode.

I then use the same encoding when creating the XML string and writing it
to a file.

Is this approach fine? Or is there another way to support
internationalization when the encoding of the input string is unknown?
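
The fallback idea described above can be sketched as a loop over candidate
encodings. This is only an illustration: decode_with_fallback and its
candidate list are hypothetical names, and the snippet is written so that it
runs on both Python 2.6 and Python 3. Latin-1 maps every byte to a character,
so it never raises and has to come last; a decode that succeeds does not
prove that the guess was right.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: the order of candidates matters, because the
    first encoding that does not raise wins.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings fit the input')


# b'\xc3\x89test' is E-acute + "test" encoded as UTF-8;
# b'\xc9test' is the same text encoded as Latin-1.
utf8_text, utf8_name = decode_with_fallback(b'\xc3\x89test')
latin_text, latin_name = decode_with_fallback(b'\xc9test')
```

Both calls yield the same unicode text; only the reported encoding differs.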

I get the XML string without any issues; print xml_string works fine.

But when it is written to a file, the tag value is changed, even though I
used the same encoding that was used for decoding.

I wrote the sample code below:

import os
import codecs

from xml.dom.minidom import Document

def write_to_xml(output_string, encod_fmt):
    doc = Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    tag_key = doc.createElement('output_string')
    tag_value = output_string
    tag_key.appendChild(doc.createTextNode((tag_value)))
    root.appendChild(tag_key)
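    # toprettyxml() returns a byte string in the requested encoding.
    xml_string = doc.toprettyxml(indent=" ", encoding=encod_fmt)
    print xml_string
    fname = os.path.join('/root/output.xml')
    # The codecs writer encodes the unicode text as it is written.
    out = codecs.open(fname, 'wb', encod_fmt)
    try:
        doc.writexml(out, encoding=encod_fmt)
    finally:
        # Close explicitly so the content is flushed to disk.
        out.close()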

def convert_string(input_string):
    try:
        input_string_unicode = input_string.decode('utf-8')
        encoding = 'utf-8'
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this decode cannot fail;
        # 'iso-8859-1' is another name for the same codec, so no further
        # fallback is needed.
        input_string_unicode = input_string.decode('Latin-1')
        encoding = 'Latin-1'
    #output_string = input_string_unicode.encode(encoding)
    write_to_xml(input_string_unicode, encoding)

if __name__ == '__main__':
    input_string = raw_input()
    convert_string(input_string)


Output
---------

[root] python i18n_test.py
Étest
<?xml version="1.0" encoding="Latin-1"?>
<root>
 <output_string>
  Étest
 </output_string>
</root>

But the file content is as below.

<?xml version="1.0" encoding="Latin-1"?><root><output_string><C9>test</output_string></root>
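
One way to check whether the file content above is actually wrong is to look
at the raw bytes. The sketch below rests on an assumption, since the post does
not say which tool displayed the file: pagers and editors running in a UTF-8
terminal commonly show a byte they cannot decode as a placeholder such as
<C9>. 0xC9 is the Latin-1 encoding of É, so the file may already contain the
expected data. The snippet builds the same document in memory and runs on both
Python 2.6 and Python 3.

```python
from xml.dom.minidom import Document

# Build the same small document as the sample script.
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
tag = doc.createElement('output_string')
tag.appendChild(doc.createTextNode(u'\xc9test'))  # E-acute followed by "test"
root.appendChild(tag)

# toxml(encoding=...) returns the encoded document as a byte string.
data = doc.toxml(encoding='latin-1')

# In Latin-1, E-acute is the single byte 0xC9.
has_latin1_byte = b'>\xc9test<' in data

# Decoding with the declared encoding restores the original text.
round_trip = data.decode('latin-1')
```

If has_latin1_byte is true for the bytes read back from the file, the file is
valid Latin-1 and only the display of it differs.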

-- 
  Regards
  D.S. DIleep

