[Tutor] Creating one file out of all the files in a directory

Sun Nov 14 18:42:18 CET 2010

> Again, thanks a lot. Too bad you and Kushal don't live close. I would
> like to invite you to a beer or a coffee or something.

Thanks for the offer. Some time in the far, far future, perhaps ;-).

> <snip>
>> So close ;-).
>> What you're missing is the next write statement:
>> f.write(data)
>> 
>> (or
>> f.write(''.join(text))
>> which shows why read() is nicer in this case: readlines() returns a list, not just a single string).
>> 
>>>        f.close()
>> 
>> But actually, you can open and close the output file outside the entire loop; just name it differently (e.g., before the first loop,
>> outfile = open('outputfile', 'w')
> 
> OK, I ask you (or anybody reading this) the same question I asked
> Kushal: why is it better to open the output file outside the entire
> loop. I understand why it should be closed outside the loop but if you
> see the code I came up with after your comments, I open the output
> file inside the loop and the script still works perfectly well. Is
> there any good reason to do it differently from the way I did it?

Your code works fine. My reason (and there may be others) for doing it this way is simply to avoid a bit of overhead: opening and closing the output file once for every input file costs some extra time. That's why I would have moved the with statement for output_file outside both loops. As I said, though, I expect that overhead to be small in general.
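For comparison, here is roughly how the script could look with the output file opened once, outside both loops. This is only a sketch: the function name combine_files and its parameters are made up for illustration, and it deliberately differs from your version in two ways. It opens the output in 'w' mode, so rerunning the script overwrites output.txt instead of appending another copy of everything to it. And it builds each input path with os.path.join(subdir, filename): a bare filename is resolved relative to the current directory, which after os.chdir(path) only finds files at the top level of path, not in its subdirectories.

```python
import os


def combine_files(src_dir, out_path):
    """Concatenate every file under src_dir into out_path.

    Each file's contents are wrapped in <file name="..."> ... </file> markers.
    The output file is opened only once, before both loops.
    """
    # 'w' truncates any previous output; 'a' would append to it on every run.
    with open(out_path, 'w') as output_file:
        for subdir, dirs, files in os.walk(src_dir):
            for filename in files:
                if filename == '.DS_Store':  # skip the Mac OS X metadata file
                    continue
                # Join with subdir so files in subdirectories are found too,
                # without relying on os.chdir().
                with open(os.path.join(subdir, filename), 'r') as f:
                    data = f.read()
                output_file.write('\n\n<file name="' + filename + '">\n\n')
                output_file.write(data)
                output_file.write('\n\n</file>\n\n')
```

Call it with the output path in a different directory, e.g. combine_files('/Volumes/myPath', '/Volumes/myPath2/output.txt'), so that os.walk never picks up the output file itself.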

> Here's what I did:
> 
> -----------------
> import os
> path = '/Volumes/myPath'
> os.chdir(path)
> for subdir, dirs, files in os.walk(path):
>     for filename in files:
>         if filename != '.DS_Store':
>             with open(filename, 'r') as f: #see tomboy note 'with statement'
>                 data = f.read()
>                 with open('/Volumes/myPath2/output.txt', 'a') as output_file:
>                     output_file.write('\n\n<file name="' + filename + '">\n\n')
>                     output_file.write(data)
>                     output_file.write('\n\n</file>\n\n')
> -----------------
> 
> I came up with this way of doing because I was trying to follow your
> advice of using the 'with' statement and this was the first way that I
> could think of to implement it. Since in the little test that I ran it
> worked, I left it like that but I would like to know whether there is
> a more elegant way to implement this so that I learn good habits.

This is fine: 'with open(<filename>, <mode>) as <stream>:' is, afaik, the standard way nowadays to open a file in Python.
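What the with statement buys you is that the file is closed as soon as the block ends, even when an exception is raised inside it. A small sketch to illustrate; the helper names and the throwaway file example.txt are made up for the demo. read_try_finally spells out by hand roughly what the with form does for a file object.

```python
import os
import tempfile


def read_with(path):
    # The 'with' form: the file is closed automatically when the block ends.
    with open(path, 'r') as stream:
        return stream.read()


def read_try_finally(path):
    # Roughly the same thing written out by hand with try/finally.
    stream = open(path, 'r')
    try:
        return stream.read()
    finally:
        stream.close()


def closed_after_error(path):
    """Return whether the file is closed after an exception inside the with block."""
    try:
        with open(path, 'r') as stream:
            raise ValueError('simulated failure while reading')
    except ValueError:
        pass
    return stream.closed


# Throwaway file in a fresh temporary directory, just for this demo.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as stream:
    stream.write('hello')
print(read_with(path), read_try_finally(path), closed_after_error(path))
# hello hello True
```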

>> In this case, though, there's one thing to watch out for: glob or os.walk will pick up your newly created (empty) file, so you should either put the all-containing file in a different directory (best practice) or insert an if-statement to check whether file[name] != 'outputfile'.
> 
> You'll have seen that I opted for the best practice but I still used
> an if statement with file[name] != 'outputfile' in order to solve some
> problems I was having with a hidden file created by Mac OSX
> (.DS_Store). The output file contained some strange characters at the
> beginning and it took me a while to figure out that this was caused by
> the fact that the loop read the contents of the .DS_Store file.

Yes, there are few ways to exclude specific filenames other than an if statement.
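If the list of exceptions grows, one option is to keep the if statement but move it into a small helper. A sketch under some assumptions: wanted, input_files and OUTPUT_NAME are hypothetical names, and skipping every name that starts with '.' is broader than checking for '.DS_Store' alone (it also drops other hidden files and hidden directories, which may or may not be what you want).

```python
import os

# Hypothetical name of the combined file; adjust to match your own script.
OUTPUT_NAME = 'output.txt'


def wanted(filename, output_name=OUTPUT_NAME):
    """Return False for files that should not go into the combined file."""
    if filename.startswith('.'):   # hidden files such as .DS_Store on Mac OS X
        return False
    if filename == output_name:    # the combined file itself, if it lives in the same tree
        return False
    return True


def input_files(src_dir, output_name=OUTPUT_NAME):
    """Yield full paths of all wanted files under src_dir, including subdirectories."""
    for subdir, dirs, files in os.walk(src_dir):
        # Editing dirs in place stops os.walk from descending into hidden directories.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for filename in files:
            if wanted(filename, output_name):
                yield os.path.join(subdir, filename)


print(wanted('.DS_Store'), wanted('output.txt'), wanted('chapter1.txt'))
# False False True
```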

Cheers,

  Evert