[Tutor] Unicode question
Kent Johnson
kent37 at tds.net
Tue Sep 11 15:49:24 CEST 2007
János Juhász wrote:
> Dear All,
>
> I would like to convert my DOS txt file into pdf with reportlab.
> The file can be seen correctly in Central European (DOS) encoding in
> Explorer.
>
> My winxp uses cp852 as default codepage.
>
> When I open the txt file in notepad and set OEM/DOS script for terminal
> fonts, it shows the file correctly.
>
> I tried to convert the file in the following way:
>
> from reportlab.platypus import *
> from reportlab.lib.styles import getSampleStyleSheet
> from reportlab.rl_config import defaultPageSize
> PAGE_HEIGHT=defaultPageSize[1]
>
> styles = getSampleStyleSheet()
>
> def MakePdfInvoice(InvoiceNum, page):
>     style = styles["Normal"]
>     PdfInv = [Spacer(0,0)]
>     PdfInv.append(Preformatted(page, styles['Normal']))
>     doc = SimpleDocTemplate(InvoiceNum)
>     doc.build(PdfInv)
>
> if __name__ == '__main__':
>     content = open('invoice01_0707.txt').readlines()
>     page = ''.join(content[:92])
>     page = unicode(page, 'Latin-1')
Why latin-1? You say the file is in cp852, so try decoding it with that
codec instead:
    page = unicode(page, 'cp852')
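To see why the codec matters, here is a minimal sketch. The sample name is only illustrative data, not taken from your invoice file, and the snippet uses str.decode() rather than unicode() so that it runs under both Python 2 and Python 3. The same bytes give different text depending on the codec:

```python
# Illustrative sample text, stored the way a cp852 DOS file would store it.
# u'\xe1' is U+00E1, LATIN SMALL LETTER A WITH ACUTE.
raw = u'J\xe1nos Juh\xe1sz'.encode('cp852')

# Wrong codec: every byte maps to *some* Latin-1 character, so no error
# is raised, but the accented letters come out as different characters.
wrong = raw.decode('latin-1')

# Right codec: the original text comes back.
right = raw.decode('cp852')

print(repr(wrong))
print(repr(right))
```

Latin-1 never fails to decode, which is why the first attempt produced funny characters instead of an exception.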
>     MakePdfInvoice('test.pdf', page)
>
> But it made funny chars somewhere.
>
> I also tried it this way:
>
> if __name__ == '__main__':
>     content = open('invoice01_0707.txt').readlines()
>     page = ''.join(content[:92])
>     page = page.encode('cp852')
Use decode() here, not encode():
    page = page.decode('cp852')
decode() goes towards Unicode
encode() goes away from Unicode
As a mnemonic I think of Unicode as pure unencoded data. (This is *not*
accurate, it is a memory aid!) Then it's easy to remember that decode()
removes encoding == convert to Unicode, encode() adds encoding ==
convert from Unicode.
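A small round trip may make the two directions concrete. This is only a sketch: the single byte 0xb5 is borrowed from the traceback as sample data, and the b'' and u'' prefixes assume Python 2.6 or later (the snippet also runs under Python 3).

```python
raw = b'\xb5'                  # a byte as read from a cp852 file

text = raw.decode('cp852')     # decode(): bytes -> Unicode
print(repr(text))              # U+00C1, LATIN CAPITAL LETTER A WITH ACUTE

back = text.encode('cp852')    # encode(): Unicode -> bytes, same codec
utf8 = text.encode('utf-8')    # encode(): Unicode -> bytes, another codec

print(repr(back))
print(repr(utf8))
```

Once the data is Unicode, you can encode it to whatever byte encoding the next step expects.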
>     MakePdfInvoice('test.pdf', page)
>
> But it raised an exception:
>     debugger.run(codeObject, __main__.__dict__, start_stepping=0)
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
>     _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 631, in run
>     exec cmd in globals, locals
>   File "D:\devel\reportlab\MakePdfInvoice.py", line 18, in ?
>     page = page.encode('cp852')
>   File "c:\Python24\lib\encodings\cp852.py", line 18, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 112: ordinal not in range(128)
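When you call encode() on a byte string (instead of a unicode object),
Python first decodes the string to Unicode using the ascii codec. That
implicit decode fails as soon as the string contains a non-ASCII byte,
which is why an encode() call ends up raising a UnicodeDecodeError.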
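The hidden step can be spelled out explicitly. The sketch below is my own illustration, not code from your script: it writes the implicit ascii decode by hand, so it behaves the same under Python 2.6+ and Python 3 (where byte strings have no encode() method at all). The single byte 0xb5 stands in for the file contents.

```python
raw = b'\xb5'  # a non-ASCII byte, like the one named in the traceback

# Calling encode('cp852') on a Python 2 byte string behaves roughly like
# the two-step expression below: the ascii decode runs first and fails.
try:
    raw.decode('ascii').encode('cp852')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The error names the 'ascii' codec, not cp852, because the failure happens before the cp852 encoder ever sees the data.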
Kent