encoding hell - any chance of salvation ?

Terry Reedy tjreedy at udel.edu
Mon Mar 7 15:12:45 EST 2011


On 3/7/2011 6:24 AM, southof40 wrote:
> Hi - I've got some code which uses array
> (http://docs.python.org/library/array.html) to store characters read
> from a file (it's not my code; it comes from here:
> http://sourceforge.net/projects/pygold/)
>
> The read is done, in GrammarReader.py,  like this ...
>
>      def readString(self, maxsize = -1):
>          result = array('u')
>          char = None
>          while True:
>              if (maxsize >= 0) and (len(result) >= maxsize):
>                  break
>              char = self.reader.read(2)
>              if (char == '') or (char == '\x00\x00'):
>                  break

                print(type(char),char) # to see what is going on

>              result.append(char)
>          return result.tounicode()
>
> ... and it raises the error "TypeError: array item must be unicode
> character" (full stack trace at bottom).
>
> The whole unicode thing is a bit strange because the input file is a
> compiled grammar and so not a text file at all (the file can be
> downloaded from here: http:///kubadev.com/share/VBScript.cgt)
>
> Can anyone make a suggestion as to the best way to allow the array
> object to accept what is in essence a binary file ?
>
> Here's the full stack trace ...
>
>>>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "pygold\Parser.py", line 100, in __init__
>      self.loadTables(filename)
>    File "pygold\Parser.py", line 365, in loadTables
>      reader = GrammarReader(filename, self.encoding)
>    File "pygold\GrammarReader.py", line 14, in __init__
>      if not self.hasValidHeader():
>    File "pygold\GrammarReader.py", line 43, in hasValidHeader
>      header = self.readString(64) ## read max 64 chars
>    File "pygold\GrammarReader.py", line 68, in readString
>      result.append(char)
> TypeError: array item must be unicode character
>
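
A guess, pending what the print above shows: read(2) probably hands back a
two-item object (two bytes, or a two-character string), while
array('u').append() accepts exactly one unicode character, hence the
TypeError. If the strings in the .cgt file are null-terminated UTF-16
little-endian (an assumption on my part, not something I have checked
against this file), one way around it is to open the file in binary mode,
collect the raw 2-byte units, and decode once at the end. A minimal sketch
for Python 3; the function name read_utf16_string is made up for
illustration and is not part of pygold:

```python
import io


def read_utf16_string(stream, maxsize=-1):
    """Read 2-byte units from a binary stream and decode them as UTF-16-LE.

    Stops at a NUL unit (b'\\x00\\x00'), at end of file, or after
    `maxsize` units when maxsize >= 0. Assumes the data is UTF-16-LE.
    """
    raw = bytearray()
    units = 0
    while maxsize < 0 or units < maxsize:
        unit = stream.read(2)
        # A short read means end of file; a NUL unit ends the string.
        if len(unit) < 2 or unit == b"\x00\x00":
            break
        raw += unit
        units += 1
    # Decode in one go so surrogate pairs (two units) are handled together.
    return raw.decode("utf-16-le")


# Hypothetical data standing in for the start of a grammar file.
data = "GOLD".encode("utf-16-le") + b"\x00\x00" + "rest".encode("utf-16-le")
header = read_utf16_string(io.BytesIO(data), 64)
```

With a real file, the stream would come from open(filename, 'rb'). Note
that a maxsize that cuts a surrogate pair in half would make the final
decode fail; whether that matters depends on the data.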


-- 
Terry Jan Reedy



