[Tutor] Creating one file out of all the files in a directory

Evert Rol evert.rol at gmail.com
Thu Nov 11 13:41:51 CET 2010


> I'm trying to create a script to do the following. I have a directory
> containing hundreds of text files. I need to create a single file with
> the contents of all the files in the directory. Within that file,
> though, I need to create marks that indicate the division between the
> contents of each file that has wound up in that single file.
> 
> I got this far but I'm stumped to continue:
> 
> ----------------- code--------
> import os
> path = '/Volumes/DATA/MyPath'
> os.chdir(path)
> file_names = glob.glob('*.txt')

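Note that glob.glob needs an "import glob" next to your "import os"; as quoted, this line raises a NameError.
You also don't use file_names any further. Which one to use depends on whether you want files from subdirectories: os.walk (below) descends into them, while glob.glob('*.txt') only lists the current directory.
In the latter case, your loop just becomes:
for file in file_names:
    f = open(file, 'r')
    # etc.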

Though I would use filename instead of file, since file is a Python built-in:
>>> file
<type 'file'>
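
To make the two options concrete, here is a minimal sketch of both ways to collect the names. The function names top_level_text_files and all_text_files are made up for illustration, and I'm assuming you only want files ending in '.txt'. Note the os.path.join in the os.walk version: os.walk hands you bare file names, so they need to be joined with subdir before you can open them.

```python
import glob
import os


def top_level_text_files(directory):
    """Return the *.txt files directly inside directory (no subdirectories)."""
    return sorted(glob.glob(os.path.join(directory, '*.txt')))


def all_text_files(directory):
    """Return the *.txt files in directory and in every subdirectory below it."""
    found = []
    for subdir, dirs, files in os.walk(directory):
        for filename in files:
            if filename.endswith('.txt'):
                # os.walk yields bare names; join with subdir to get a usable path
                found.append(os.path.join(subdir, filename))
    return sorted(found)
```

Both return full paths, so the rest of the script no longer depends on os.chdir.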


> for subdir, dirs, files in os.walk(path):
>    for file in files:
>        f = open(file, 'r')
>        text = f.readlines()

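Note that os.walk gives you bare file names, so for anything inside a subdirectory open(file, 'r') will fail; use open(os.path.join(subdir, file), 'r') instead.
Also, since you don't care about the individual lines in your files, just the entire file contents, you could simply use
data = f.read()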


>        f.close()
>        f = open(file, 'a')

You're opening the same file you just read from, and appending to it. Doing that for every file modifies each input file instead of building one combined file, so it doesn't make much sense, imho.
But see further down.

>        f.write('\n\n' + '________________________________' + '\n')

So close ;-).
What you're missing is the next write statement:
f.write(data)

(or 
f.write(''.join(text))
which shows why read() is nicer in this case: readlines() returns a list, not just a single string).
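
A small sketch of the difference, using a throwaway temporary file (the file contents and the name sample_path are just for illustration):

```python
import os
import tempfile

# Write a small throwaway file so the comparison below has something to read.
handle, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(handle)
f = open(sample_path, 'w')
f.write('first line\nsecond line\n')
f.close()

f = open(sample_path, 'r')
data = f.read()          # one string holding the whole file
f.close()

f = open(sample_path, 'r')
text = f.readlines()     # a list with one string per line
f.close()

os.remove(sample_path)

# Joining the list gives back exactly what read() returned.
print(''.join(text) == data)   # True
```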

>        f.close()

But actually, you can open and close the output file outside the entire loop; just give it a different name. E.g., before the first loop:
outfile = open('outputfile', 'w')

and in the loop:
    outfile.write(data)

and after the loop, of course:
outfile.close()

In this case, though, there's one thing to watch out for: os.walk (and glob, if the output file's name matches the pattern) will pick up your newly created (empty) output file, so you should either put the all-containing file in a different directory (best practice) or insert an if-statement to check that the file name != 'outputfile'.
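
Putting those pieces together, one possible sketch looks like this. The function name combine_files and its parameters are my own invention, I'm assuming the glob variant (no subdirectories), and I've given the output file a '.txt' name on purpose so that the check for the output file actually matters:

```python
import glob
import os


def combine_files(directory, output_name='outputfile.txt',
                  divider='\n\n' + '________________________________' + '\n'):
    """Concatenate every *.txt file in directory into output_name."""
    output_path = os.path.join(directory, output_name)
    outfile = open(output_path, 'w')          # opened once, before the loop
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        if os.path.abspath(filename) == os.path.abspath(output_path):
            continue                          # don't copy the output into itself
        f = open(filename, 'r')
        data = f.read()
        f.close()
        outfile.write(data)
        outfile.write(divider)                # mark the end of this file's contents
    outfile.close()                           # closed once, after the loop
    return output_path
```

You would call it as combine_files('/Volumes/DATA/MyPath'). Writing the result to a different directory is still the cleaner option; then the if-statement can go.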


Finally, depending on the version of Python you're using, there are nice things you can do with the 'with' statement (available by default from Python 2.6, and in 2.5 via "from __future__ import with_statement"). Its big advantage for file I/O is that the file gets closed for you even if an error occurs partway through, which matters here since you're not checking for any read errors.
See eg http://effbot.org/zone/python-with-statement.htm (bottom part for example) or Google around.
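
As a rough sketch of what that could look like here (the name combine_with_statement and its arguments are hypothetical, and I'm assuming output_path lives outside source_dir so no check for the output file is needed):

```python
import glob
import os


def combine_with_statement(source_dir, output_path,
                           divider='\n\n' + '________________________________' + '\n'):
    """Concatenate the *.txt files in source_dir into output_path."""
    # Each 'with' block closes its file on exit, even if read() or write() raises.
    with open(output_path, 'w') as outfile:
        for filename in sorted(glob.glob(os.path.join(source_dir, '*.txt'))):
            with open(filename, 'r') as infile:
                outfile.write(infile.read())
            outfile.write(divider)
```

There are no explicit close() calls left to forget.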

Cheers,

  Evert


> ------------
> 
> What's missing here is obvious. This iterates over all the files and
> creates the mark for the division at the end of each file. There is
> nothing, however, to pipe the output of this loop into a new file.
> I've checked the different manuals I own plus some more on the
> internet but I can't figure out how to do what's left.
> 
> I could get by with a little help from my Tutor friends.
> 
> Josep M.
