piping input to an external script

Steve Howell showell30 at yahoo.com
Tue May 12 13:24:32 EDT 2009


See the suggested debugging tip inline in your program below.

On May 11, 11:04 am, "Tim Arnold" <tim.arn... at sas.com> wrote:
> Hi, I have some html files that I want to validate by using an external
> script 'validate'. The html files need a doctype header attached before
> validation. The files are in utf8 encoding. My code:
> ---------------
> import os,sys
> import codecs,subprocess
> HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
>
> filename  = 'mytest.html'
> fd = codecs.open(filename,'rb',encoding='utf8')
> s = HEADER + fd.read()

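# Try inserting lines like the ones below to see what characters are
# actually near char 66 (if the validator counts from 1, its
# "character 66" is s[65]).
print '---'
print repr(s[65])
print repr(s[66])
print repr(s[:70])
# s is already unicode (codecs.open decoded it), so encode it to see
# the UTF-8 bytes; unicode(s, encoding='utf8') would raise TypeError.
print repr(s.encode('utf8')[:70])
print '---'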

> fd.close()
>
> p = subprocess.Popen(['validate'],
>                     stdin=subprocess.PIPE,
>                     stdout=subprocess.PIPE,
>                     stderr=subprocess.STDOUT)
> validate = p.communicate(unicode(s,encoding='utf8'))
> print validate
> ---------------
>
> I get lots of lines like this:
> Error at line 1, character 66:\tillegal character number 0
> etc etc.
>

With the lines above, it's easy to see what the 66th character of "s" is.
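For a reusable version of that check, here is a minimal sketch in Python 3 syntax. It assumes the validator counts characters (not bytes) from 1 on line 1; since HEADER has no trailing newline, everything up to the file's first newline sits on line 1. The header is 63 characters long, so under that assumption column 66 is the third character after it. `describe_column` and the sample string are hypothetical.

```python
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def describe_column(text, column, context=5):
    """Return details about the character at a 1-based column of `text`.

    Assumes the validator counts columns from 1 on a single line,
    whereas Python strings are indexed from 0.
    """
    index = column - 1
    ch = text[index]
    start = max(0, index - context)
    return {
        "char": repr(ch),
        "codepoint": "U+%04X" % ord(ch),
        # A few characters on either side make the spot easier to find.
        "context": repr(text[start:index + context + 1]),
    }


# The header alone is 63 characters, so column 66 falls on the
# third character of whatever follows it.
sample = HEADER + "<html>"  # hypothetical stand-in for the file contents
print(len(HEADER))  # 63
print(describe_column(sample, 66))
```

Printing the code point alongside `repr` makes invisible characters (NULs, byte-order marks, non-breaking spaces) stand out.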

> But I can give the command in a terminal 'cat mytest.html | validate' and
> get reasonable output. My subprocess code must be wrong, but I could use
> some help to see what the problem is.
>

Your disconnect is that your program is NOT actually simulating
`cat mytest.html | validate`: it prepends HEADER and hands
communicate() decoded text instead of the file's raw bytes, so you
are comparing apples and oranges.
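For comparison, here is a minimal sketch of a faithful stand-in for `cat mytest.html | validate`, written for Python 3 (3.5+ for `subprocess.run`). It assumes `validate` is on the PATH and reads its input from stdin, as in the original post; `pipe_file` is a hypothetical helper name.

```python
import subprocess


def pipe_file(path, command=("validate",)):
    """Send a file's raw bytes to `command` on stdin, like `cat path | command`.

    The file is read in binary mode, so nothing is decoded or
    re-encoded on the way. stderr is merged into stdout, as in the
    original Popen call.
    """
    with open(path, "rb") as f:
        data = f.read()
    result = subprocess.run(
        list(command),
        input=data,  # run() sets up stdin as a pipe when input is given
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode, result.stdout
```

Usage would be `code, output = pipe_file("mytest.html")`. The output comes back as bytes; decode it only for display, e.g. `output.decode("utf-8", errors="replace")`.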

The fact that you can send mytest.html to the validate program from
the shell without a header suggests to me that the header is equally
unnecessary in your Python program, or maybe you just haven't thought
through what you're really trying to accomplish here.
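If the header really is needed, one way to add it without decoding the file is to keep everything as bytes. This is a minimal sketch in Python 3 syntax; `with_header` is a hypothetical helper, and the newline after the header is a choice made here (it shifts the file's own line numbers by exactly one), not something the validator is known to require.

```python
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def with_header(html_bytes, header=HEADER, encoding="utf-8"):
    """Prepend `header` to already-encoded HTML and return bytes.

    Only the header is encoded; the file's bytes pass through
    untouched, so the data is never decoded and re-encoded.
    """
    return header.encode(encoding) + b"\n" + html_bytes
```

The result can go straight to the child process, e.g. `p.communicate(with_header(data))` where `data` came from `open("mytest.html", "rb").read()`. Drop the `b"\n"` to match the original concatenation exactly.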




More information about the Python-list mailing list