Why are my files not in my list - os module used with sys argv

Peter Otten __peter__ at web.de
Tue Apr 19 04:16:28 EDT 2016


Steven D'Aprano wrote:

> On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> 
>> Hi
>> 
>> Why would it be that my files are not being found in this script?
> 
> You are calling the script with:
> 
> python jqxml.py samples *.xml
> 
> This does not do what you think it does: under Linux shells, the glob
> *.xml will be expanded by the shell. Fortunately, in your case, you have
> no files in the current directory matching the glob *.xml, so it is not
> expanded and the arguments your script receives are:
> 
> 
> "jqxml.py"  # sys.argv[0], the script name; not used
> 
> "samples"  # dir
> 
> "*.xml"  # mask
> 
> 
> You then call:
> 
> fileResult = filter(lambda x: x.endswith(mask), files)
> 
> which looks for file names which end with a literal string (asterisk, dot,
> x, m, l) in that order. You have no files that match that string.
> 
> At the shell prompt, enter this:
> 
> touch samples/junk\*.xml
> 
> and run the script again, and you should see that it now matches one file.
> 
> Instead, what you should do is:
> 
> 
> (1) Use the glob module:
> 
> https://docs.python.org/2/library/glob.html
> https://docs.python.org/3/library/glob.html
> 
> https://pymotw.com/2/glob/
> https://pymotw.com/3/glob/
> 
> 
> (2) When calling the script, avoid the shell expanding wildcards by
> escaping them or quoting them:
> 
> python jqxml.py samples "*.xml"
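
For (1), a minimal sketch of what the glob approach might look like. The 
helper name find_files is made up for illustration; it assumes the script is 
still called with a directory and a quoted mask as in (2):

```python
import glob
import os


def find_files(directory, mask):
    """Return the sorted paths in `directory` whose names match `mask`.

    find_files is a hypothetical helper, not part of the original script.
    """
    # glob.glob() interprets wildcards such as * and ? itself, so the mask
    # has to reach the script unexpanded (i.e. quoted on the command line).
    pattern = os.path.join(directory, mask)
    return sorted(glob.glob(pattern))
```

In the script this would be called as find_files(sys.argv[1], sys.argv[2]) 
and would replace the filter(...endswith(mask)...) line.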

(3) *Use* the expansion mechanism provided by the shell instead of fighting 
it:

$ python jqxml.py samples/*.xml

This requires that you change your script along these lines:

from pyquery import PyQuery as pq
import pandas as pd
import sys

fileResult = sys.argv[1:]

if not fileResult:
    print("no files specified")
    sys.exit(1)

for file in fileResult:
    print(file)

for items in fileResult:
    try:
        d = pq(filename=items)
    except FileNotFoundError as e:
        print(e)
        continue
    res = d('nomination')
    # you could move the attrs definition before the loop
    attrs = ('id', 'horse')
    # probably a bug: you are overwriting data on every iteration
    data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
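
(An aside on the "probably a bug" comment: one way to keep the rows from 
every file is to create the list once, before the loop, and append to it. 
The sketch below swaps pyquery for xml.etree.ElementTree from the standard 
library so that it is self-contained; the name collect_rows is made up, and 
the 'nomination' element and its attributes are taken from the script above.)

```python
import xml.etree.ElementTree as ET

ATTRS = ("id", "horse")


def collect_rows(filenames):
    """Return one [id, horse] row per <nomination> element in all files.

    collect_rows is a hypothetical helper; it uses ElementTree, not pyquery.
    """
    rows = []  # created once, before the loop, so nothing is overwritten
    for filename in filenames:
        try:
            tree = ET.parse(filename)
        except (OSError, ET.ParseError) as e:
            # missing or malformed file: report it and move on
            print(e)
            continue
        for node in tree.iter("nomination"):
            rows.append([node.get(x) for x in ATTRS])
    return rows
```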

I think letting the shell expand the pattern is the most natural approach, if 
you are willing to accept one quirk: when the samples directory doesn't 
contain any files with the .xml suffix, the script tries to process the 
literal name 'samples/*.xml'. Common shell tools work that way:

$ ls samples/*.xml
samples/1.xml  samples/2.xml  samples/3.xml
$ ls samples/*.XML
ls: cannot access samples/*.XML: No such file or directory
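
If you would rather report such leftovers up front than rely on the 
exception handler, something like the following might do. This is only a 
sketch; split_existing is a name I made up:

```python
import os


def split_existing(paths):
    """Separate the paths that are existing files from those that are not.

    split_existing is a hypothetical helper for illustration only.
    """
    found, missing = [], []
    for path in paths:
        # An unexpanded pattern like 'samples/*.XML' ends up in `missing`
        # unless a file with that literal name exists.
        (found if os.path.isfile(path) else missing).append(path)
    return found, missing
```

You would call it as found, missing = split_existing(sys.argv[1:]) and print 
a message for every entry in missing before processing found.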

Unrelated: instead of working with sys.argv directly, you could use argparse, 
which is part of the standard library. The code to require at least one file 
is:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="+")
args = parser.parse_args()

print(args.files)

Note that this doesn't fix the shell expansion oddity.
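
If you want the script to cope with patterns the shell left alone (quoted, 
or without a match), you could combine argparse with the glob module. A rough 
sketch, with parse_files being a made-up name; it takes the argument list 
explicitly so it is easy to try out:

```python
import argparse
import glob


def parse_files(argv):
    """Parse `argv` and expand any wildcard patterns it still contains.

    parse_files is a hypothetical helper, not part of argparse.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="+")
    args = parser.parse_args(argv)

    files = []
    for name in args.files:
        # A plain name of an existing file matches itself; a pattern
        # without any match contributes nothing.
        files.extend(sorted(glob.glob(name)))
    return files
```

In the script you would call parse_files(sys.argv[1:]) and treat an empty 
result as "no files specified".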




