[Tutor] Re: test if file is not ascii

Sun Nov 2 17:21:41 EST 2003

On Sun, Nov 02, 2003 at 04:15:06PM +0100, Michael Janssen wrote:
> 
> I've tested the range-definition, and it works. But are newlines also
> invalid within rtf (or better exclude \n - \x0A and \r - \x0B)?

Right. They are allowed. So I came up with this solution:

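import re

# Delete \x00-\x08, \x0B (vertical tab) and \x0E-\x13; tab (\x09),
# newline (\x0A), form feed (\x0C) and carriage return (\x0D) are kept.
illegal_regx = re.compile(r'[\x00-\x08\x0B\x0E-\x13]')
read_obj = open(self.__file, 'r')
write_obj = open(self.__write_to, 'w')
while True:
    line = read_obj.read(1000)
    if not line:
        break
    # Normalise DOS (\r\n) and Mac (\r) line endings to \n.
    line = line.replace('\r\n', '\n').replace('\r', '\n')
    line = illegal_regx.sub('', line)
    write_obj.write(line)
read_obj.close()
write_obj.close()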
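The same loop can be packaged as a function. This is a minimal sketch written for Python 3, assuming file-like objects for input and output; clean_stream and its chunk_size parameter are hypothetical names, not part of any library. Besides normalising line endings and deleting the same control characters, it holds back a \r that ends a chunk, because its \n partner may arrive in the next read() and would otherwise produce an extra blank line. When opening real files in Python 3, pass newline='' to open() so the raw \r characters reach the function instead of being translated first.

```python
import io
import re

# Same character set as the regex above: \x00-\x08, \x0B, \x0E-\x13.
# Tab, newline, form feed and carriage return are not deleted.
ILLEGAL = re.compile(r'[\x00-\x08\x0B\x0E-\x13]')


def clean_stream(read_obj, write_obj, chunk_size=1000):
    """Copy read_obj to write_obj, normalising line endings to '\\n'
    and deleting the control characters matched by ILLEGAL."""
    pending = ''
    while True:
        chunk = read_obj.read(chunk_size)
        if not chunk:
            break
        chunk = pending + chunk
        # Hold back a trailing '\r': a matching '\n' may start the next chunk.
        if chunk.endswith('\r'):
            chunk, pending = chunk[:-1], '\r'
        else:
            pending = ''
        chunk = chunk.replace('\r\n', '\n').replace('\r', '\n')
        write_obj.write(ILLEGAL.sub('', chunk))
    if pending:
        # A lone '\r' at the very end of the input is still a line break.
        write_obj.write('\n')


# Usage with in-memory streams standing in for real files.
src = io.StringIO('line one\r\nline two\rbad\x07char\n')
dst = io.StringIO()
clean_stream(src, dst)
print(repr(dst.getvalue()))  # 'line one\nline two\nbadchar\n'
```

Reading fixed-size chunks rather than lines matters for old Mac files: with no \n in them, a line-based read would pull the whole file in as a single "line".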

\x09 is a tab, so I leave it alone, and \x0B is a vertical tab. I do need
to replace \r with \n. I do this with:

line = line.replace('\r', '\n')
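One caveat with a plain \r to \n replacement: DOS files end lines with \r\n, so each line break turns into \n\n. A small sketch (Python 3; normalize_newlines is a hypothetical helper name) that replaces the two-character sequence first:

```python
def normalize_newlines(text):
    r"""Turn DOS (\r\n) and old Mac (\r) line endings into Unix \n."""
    # Replace \r\n first; otherwise each \r\n would become \n\n
    # and every DOS line would gain a blank line after it.
    return text.replace('\r\n', '\n').replace('\r', '\n')


text = 'one\r\ntwo\rthree\n'
print(repr(text.replace('\r', '\n')))  # 'one\n\ntwo\nthree\n'
print(repr(normalize_newlines(text)))  # 'one\ntwo\nthree\n'
```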

A quick test:

print 'the \\r is "%s"' % ord('\r')

reports 13. Decimal 13 is hex 0D, so '\r' is the same character as \x0D.
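The same check can be run over all the whitespace control characters discussed in this thread. A short Python 3 sketch; note that Python also spells \x0B and \x0C as the escapes \v and \f:

```python
# Show the decimal and hex code of each whitespace control character.
for ch in '\t\n\v\f\r':
    print(repr(ch), ord(ch), '0x%02X' % ord(ch))
# '\t' 9 0x09
# '\n' 10 0x0A
# '\x0b' 11 0x0B
# '\x0c' 12 0x0C
# '\r' 13 0x0D
```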

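\x0C seems to be equal to " " because it prints as whitespace, but it is
really the form feed (page break) character. (The file separator is \x1C,
not \x0C.) It adds nothing to the file, so I should probably add \x0C to my
regular expression.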
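A Python 3 sketch of why form feed passes for a space, and of the deletion pattern with \x0C added. The character set is the one from the loop above plus \x0C; everything else is standard library.

```python
import re
import string

# Form feed is one of Python's whitespace characters, which is why it
# can pass for a space when printed.
print('\f' == '\x0c')               # True
print('\x0c'.isspace())             # True
print('\x0c' in string.whitespace)  # True

# The deletion pattern with \x0C added; tab, newline and CR still survive.
illegal_regx = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x13]')
print(repr(illegal_regx.sub('', 'page 1\x0cpage 2\ttab\r\n')))
# 'page 1page 2\ttab\r\n'
```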

> 
> Note that you can make the error Message more helpful:
> 
> mt = re.search('[\x00-\x19]',bb)
> if mt:
>    print "illegal character (ascii-num: %s) on position %s" \
>     % (ord(mt.group()), mt.start())
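re.search stops at the first match, so the message above reports only one bad character. A Python 3 sketch that uses finditer to report every one; report_illegal is a hypothetical helper name. The character class here is an assumption about what counts as illegal: the whole C0 control range \x00-\x1F minus tab, LF and CR. The quoted [\x00-\x19] stops at decimal 25 and also matches tab, newline and carriage return.

```python
import re

# Every C0 control character (\x00-\x1F) except tab, LF and CR.
CONTROL = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F]')


def report_illegal(text):
    """Return a (position, character code) pair for every illegal character."""
    return [(m.start(), ord(m.group())) for m in CONTROL.finditer(text)]


for pos, code in report_illegal('ok\x01then\tok\x1b'):
    print("illegal character (ascii-num: %d) on position %d" % (code, pos))
# illegal character (ascii-num: 1) on position 2
# illegal character (ascii-num: 27) on position 10
```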

Note that I simply delete the illegal control characters rather than
report them: they add no information to the file, they just somehow
creep in.
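A middle ground between silently deleting and reporting every character is to count the deletions. Pattern.subn works like sub but returns a (new_string, number_of_substitutions) pair. A Python 3 sketch; strip_and_count is a hypothetical helper name and the character set is the one from the loop above.

```python
import re

illegal_regx = re.compile(r'[\x00-\x08\x0B\x0E-\x13]')


def strip_and_count(text):
    """Delete the illegal control characters and say how many were removed."""
    return illegal_regx.subn('', text)


cleaned, removed = strip_and_count('a\x00b\x07c\td')
print(repr(cleaned), removed)  # 'abc\td' 2
```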

Thanks

Paul

-- 

************************
*Paul Tremblay         *
*phthenry at earthlink.net*
************************