From vjktm at yahoo.com Fri Sep 8 03:06:38 2006
From: vjktm at yahoo.com (Vijaya Poudyal)
Date: Thu, 7 Sep 2006 18:06:38 -0700 (PDT)
Subject: [I18n-sig] Support for Devanagari Script
Message-ID: <20060908010638.82489.qmail@web50307.mail.yahoo.com>

Hi,

I have recently discovered the power of Python. I started by trying to implement a Sanskrit transliteration translation program. I did accomplish it but the Unicode Devanagari script is not displaying as I expect on the python interpreter output lines. The same sequence of unicode does render as expected if I write it to an html file and open it with a web browser.

The attached code does both, I cannot figure out if I am doing something wrong, or not setting up the fonts correctly in python, or python does not fully implement the unicode standard (for this script).

I hope this is the right group to ask the question. Thanks for any help.

vjktm

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/i18n-sig/attachments/20060907/8d833544/attachment.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/i18n-sig/attachments/20060907/8d833544/attachment.htm

From sjmachin at lexicon.net Fri Sep 8 05:14:11 2006
From: sjmachin at lexicon.net (John Machin)
Date: Fri, 08 Sep 2006 13:14:11 +1000
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <20060908010638.82489.qmail@web50307.mail.yahoo.com>
References: <20060908010638.82489.qmail@web50307.mail.yahoo.com>
Message-ID: <4500E003.4060405@lexicon.net>

On 8/09/2006 11:06 AM, Vijaya Poudyal wrote:
> Hi,
> I have recently discovered the power of Python. I started by trying to
> implement a Sanskrit transliteration translation program.
> I did accomplish it but the Unicode Devanagari script is not displaying as I
> expect on the python interpreter output lines. The same sequence of
> unicode does render as expected if I write it to an html file and open
> it with a web browser.
>
> The attached code does both, I cannot figure out if I am doing something
> wrong, or not setting up the fonts correctly in python, or python does
> not fully implement the unicode standard (for this script).
>
> I hope this is the right group to ask the question. Thanks for any help.
>
> vjktm
>

It's not that much to do with Python. The concept of "setting up the fonts ... in Python" is rather novel -- what do you mean?

The main determining factor is whether the stdout can render the bytestream that's thrown at it, and that depends on where you are running your script. For example, on Windows, IDLE renders your UTF16 exactly the same as Firefox, Opera and IE6 render the UTF8 in the created ex2.html. However running the script at the (DOS) command prompt will throw an exception (unless there's a Devanagari DOS codepage).

[Aside: the result from IDLE and the browsers appears (to someone knowing very little about how characters combine in Indic scripts) as one character which looks nothing like the 1st & 3rd input characters -- presumably that is expected(?)]

You will need to give more details about your environment. I know little about Unix or Linux, but I'd expect better results from throwing utf8 at the stdout, rather than utf16 -- have you tried

    print kSa.encode('utf_8')

?
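A minimal sketch of the UTF-8 versus UTF-16 comparison suggested above, written for Python 3 rather than the Python 2.4 used in this thread. It assumes that kSa holds the three code points KA + VIRAMA + SSA (U+0915 U+094D U+0937); the thread never shows the actual value, and the helper name describe is hypothetical. It lists the code points and the bytes each encoding would hand to stdout, which may help separate a data problem from a display problem.

```python
import unicodedata

# Assumption: the "kSa" conjunct is KA + VIRAMA + SSA; the original script is not shown.
kSa = "\u0915\u094d\u0937"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


def hex_bytes(data):
    """Format a bytes object as space-separated hex pairs."""
    return " ".join("%02x" % b for b in data)


for code_point, name in describe(kSa):
    print(code_point, name)

utf8_bytes = kSa.encode("utf_8")
utf16_bytes = kSa.encode("utf_16")  # starts with a 2-byte byte-order mark

print("UTF-8 :", hex_bytes(utf8_bytes))
print("UTF-16:", hex_bytes(utf16_bytes))
```

If the code points come out as expected, the string itself is probably fine, and any difference between IDLE and the browser would lie in how the display layer shapes and draws the sequence.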
HTH
Cheers
John

From andy at reportlab.com Fri Sep 8 08:49:25 2006
From: andy at reportlab.com (Andy Robinson)
Date: Fri, 08 Sep 2006 07:49:25 +0100
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <4500E003.4060405@lexicon.net>
References: <20060908010638.82489.qmail@web50307.mail.yahoo.com> <4500E003.4060405@lexicon.net>
Message-ID: <45011275.3090309@reportlab.com>

> The main determining factor is whether the stdout can render the
> bytestream that's thrown at it, and that depends on where you are
> running your script. For example, on Windows, IDLE renders your UTF16
> exactly the same as Firefox, Opera and IE6 render the UTF8 in the
> created ex2.html. However running the script at the (DOS) command prompt
> will throw an exception (unless there's a Devanagari DOS codepage).

Regrettably not all fonts have the character set you want and the DOS prompt is not a smart enough display device. However, browsers and IDLE are smart enough to switch to a 'fallback font' for characters they cannot display.

In IDLE, which uses Courier (300kb on Windows), I get:

>>> print kSa.encode('UTF-8')
works fine.

>>> print kSa.encode('UTF-16')
prints rubbish.

>>> print kSa
works too but is almost certainly converting to utf8.

From a DOS prompt, the UTF8 version prints rubbish. The command prompt font properties only give me two font choices, 'Raster Fonts' and 'Lucida Console'.

When I switch IDLE to a variety of different fonts, I still get the Devanagari character, IN THE SAME TYPEFACE, whichever font I choose.

Conclusion: DOS prompt does not have the display routines needed to handle Unicode output.

- Andy Robinson

From vjktm at yahoo.com Fri Sep 8 13:26:59 2006
From: vjktm at yahoo.com (Vijaya Poudyal)
Date: Fri, 8 Sep 2006 04:26:59 -0700 (PDT)
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <4500E003.4060405@lexicon.net>
Message-ID: <20060908112659.50494.qmail@web50306.mail.yahoo.com>

Hi John,

Thank you for the suggestions. I am working on Windows.
I did try encoding to UTF-8 but that did not help. I am new to Python, and thought that there may be a way to change the fonts used to display those characters. The reason I wanted to try a different font is that if the fonts do not contain the glyphs corresponding to the correct ligature then the characters will not render as expected.

BTW, I also tried writing a Label to a Tkinter window and that did not work either; I got the same sequence of two glyphs instead of a single glyph.

The IDLE rendering is allowed only if the correct glyph is not available in the font. I think it may also occur if consonant clusters are not handled correctly as per the Unicode standard for Devanagari (I don't know what part of the code does this after I use the print statement). The IE rendering is required if the correct glyph does exist.

Thanks for the suggestions.

vjktm

From vjktm at yahoo.com Fri Sep 8 13:31:34 2006
From: vjktm at yahoo.com (Vijaya Poudyal)
Date: Fri, 8 Sep 2006 04:31:34 -0700 (PDT)
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <45011275.3090309@reportlab.com>
Message-ID: <20060908113134.84563.qmail@web50312.mail.yahoo.com>

Andy,

Thanks for investigating this. I am using IDLE in Windows 2000 and XP. I will try the variations you tried and will report the results.

Thanks.

vjktm

From vjktm at yahoo.com Sat Sep 9 00:46:19 2006
From: vjktm at yahoo.com (Vijaya Poudyal)
Date: Fri, 8 Sep 2006 15:46:19 -0700 (PDT)
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <45011275.3090309@reportlab.com>
Message-ID: <20060908224619.80161.qmail@web50303.mail.yahoo.com>

Hi Andy,

I tried my example at the command prompt in XP and Xbash (cygwin). Python cannot decode the unicode in both cases. In IDLE I get the wrong rendering. I was not able to reproduce the "renders ... exactly the same ..." behavior you mentioned.

When you say kSa.encode('utf8') do you get "Hello" followed by two characters joined along the top or just one character (as in the html)? When I run it I get two characters and this is a wrong rendering of the code point sequence.
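The command-prompt failure reported above is consistent with a console code page that has no Devanagari characters. A hedged Python 3 sketch of that situation follows (the thread itself uses Python 2.4). It assumes kSa is U+0915 U+094D U+0937 and uses code page 437 purely as a stand-in for a typical DOS code page; the helper name printable_for is hypothetical.

```python
import sys

# Assumption: same three code points as the "kSa" conjunct discussed in the thread.
kSa = "\u0915\u094d\u0937"


def printable_for(text, encoding):
    """Encode text for a stream, escaping characters the encoding lacks."""
    return text.encode(encoding, errors="backslashreplace")


# Strict encoding to a legacy code page raises instead of printing.
try:
    kSa.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print("stdout encoding:", sys.stdout.encoding)
print("cp437 can encode kSa:", strict_ok)
print("escaped fallback:", printable_for(kSa, "cp437"))
```

The escaped fallback does not make the console draw Devanagari; it only keeps the script running and shows which code points were sent.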
vjktm

From andy at reportlab.com Sat Sep 9 01:12:10 2006
From: andy at reportlab.com (Andy Robinson)
Date: Sat, 09 Sep 2006 00:12:10 +0100
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <20060908224619.80161.qmail@web50303.mail.yahoo.com>
References: <20060908224619.80161.qmail@web50303.mail.yahoo.com>
Message-ID: <4501F8CA.7050504@reportlab.com>

Vijaya Poudyal wrote:
> Hi Andy,
> I tried my example at the command prompt in XP and Xbash (cygwin).
> Python cannot decode the unicode in both cases. In IDLE I get the wrong
> rendering. I was not able to reproduce the "renders ... exactly the same ..."
> behavior you mentioned.
>
> When you say kSa.encode('utf8') do you get "Hello" followed by two
> characters joined along the top or just one character (as in the html)?
> When I run it I get two characters and this is a wrong rendering of the
> code point sequence.

I get exactly the same thing in both, attached. I called this 'one character' in my ignorance, but I guess the characters get combined. Strangely, if I move the cursor through it with the right-arrow key, it "explodes" into 3 characters while it has the focus - these are the same ones in the unicode standards sheets for those bytes. But normally it appears as above.

Best Regards,

Andy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dev.png
Type: image/png
Size: 934 bytes
Desc: not available
Url : http://mail.python.org/pipermail/i18n-sig/attachments/20060909/14812b55/attachment.png

From vjktm at yahoo.com Sat Sep 9 03:26:13 2006
From: vjktm at yahoo.com (Vijaya Poudyal)
Date: Fri, 8 Sep 2006 18:26:13 -0700 (PDT)
Subject: [I18n-sig] Support for Devanagari Script
In-Reply-To: <4501F8CA.7050504@reportlab.com>
Message-ID: <20060909012613.89147.qmail@web50302.mail.yahoo.com>

Andy,

The attached image is the desired, correct rendition. So there is hope for my project!
Now why do you think I don't get it in IDLE (Win XP, Win 2000)? Any suggestions on what I should investigate to fix this?

The Tk "Python Shell" window shows:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
IDLE 1.1.3

Thanks,
vjktm
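
For the browser side of the comparison used throughout this thread, here is a minimal Python 3 sketch of writing the same string to a UTF-8 HTML page. The original script is not shown, so nearly everything here is an assumption: the value of the text ("Hello" followed by KA + VIRAMA + SSA), the helper name write_html, and the page markup. Only the file name ex2.html comes from the thread.

```python
import html
import os
import tempfile

# Assumption: "Hello" followed by the KA + VIRAMA + SSA conjunct.
text = "Hello \u0915\u094d\u0937"


def write_html(path, body_text):
    """Write body_text into a small HTML page encoded as UTF-8."""
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body><p>%s</p></body></html>\n" % html.escape(body_text)
    )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return path


# Written to the temp directory so the sketch does not clobber a real ex2.html.
out_path = write_html(os.path.join(tempfile.gettempdir(), "ex2.html"), text)
print("wrote", out_path)
```

Opening the resulting file in a browser shows how a shaping-aware renderer draws the sequence, which gives a reference to compare the IDLE output against.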