Concatenating files in order

Cameron Simpson cs at zip.com.au
Tue May 23 19:37:21 EDT 2017


On 23May2017 21:14, Mahmood Naderan <nt_mahmood at yahoo.com> wrote:
>OK guys thank you very much. It is better to sort them first.
>
>Here is what I wrote
>
>files =  glob.glob('*chunk*')

I'd be inclined to go with either '*chunk_*' or just to read the strings from 
os.listdir, because what you want isn't easily written as a glob pattern (the 
syntax just isn't expressive enough). The '*chunk_*' glob is handy because it 
guarantees that there is an underscore in the name to split on, avoiding a 
tedious try/except around the split for names with no "_". Also note below that 
we're using rsplit, not split: you want the rightmost underscore. Consider the 
file "foo_chunk_9".
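A quick illustration of the split/rsplit difference on that name, using only the built-in str methods:

```python
name = "foo_chunk_9"

# split() breaks on every underscore, so there are three pieces
print(name.split("_"))          # ['foo', 'chunk', '9']

# rsplit() with maxsplit=1 splits only on the rightmost underscore,
# so the result always unpacks into exactly two parts
left, right = name.rsplit("_", 1)
print(left, right)              # foo_chunk 9
```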

So:

  import glob

  filenames = {}
  for name in glob.glob('*chunk_*'):
    left, right = name.rsplit('_', 1)
    if left.endswith('chunk') and right.isdigit():
      # key each name by the integer value of its suffix
      filenames[int(right)] = name
  sorted_filenames = [ filenames[k] for k in sorted(filenames.keys()) ]

There are a few things to observe here:

- using glob to select names containing 'chunk_', which (a) ensures there is an 
  underscore for the rsplit and (b) mostly picks only the files you want.

- using rsplit, to handle filenames with multiple underscores

- turning the suffix into an int, and storing the names keyed by the _numeric_ 
  value of the suffix
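The same idea can be wrapped in a function so it can be tried without touching the filesystem. This is a sketch, not the code above verbatim: the name chunk_files_in_order and its optional names argument are hypothetical, added so a list of names can be passed in directly; when it is omitted the function falls back to glob.glob('*chunk_*'). Because arbitrary names may lack an underscore, it skips those explicitly instead of relying on the glob.

```python
import glob

def chunk_files_in_order(names=None):
    """Return names ending in 'chunk_<digits>', ordered by that number.

    names: any iterable of file names (hypothetical parameter for testing);
    if omitted, the current directory is scanned with glob.glob('*chunk_*').
    """
    if names is None:
        names = glob.glob('*chunk_*')
    filenames = {}
    for name in names:
        if '_' not in name:
            continue  # nothing to rsplit on, so it cannot be a chunk file
        left, right = name.rsplit('_', 1)
        if left.endswith('chunk') and right.isdigit():
            filenames[int(right)] = name
    # sort on the integer keys, not on the name strings
    return [filenames[k] for k in sorted(filenames)]

print(chunk_files_in_order(['foo', 'foo_chunk_10', 'foo_chunk_2',
                            'foo_chunk_0', 'foo_chunk_1']))
# ['foo_chunk_0', 'foo_chunk_1', 'foo_chunk_2', 'foo_chunk_10']
```

One consequence of keying a dict by number: if two files share a suffix (say a_chunk_1 and b_chunk_1), the later one silently replaces the earlier one.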

>sorted=[[int(name.split("_")[-1]), name] for name in files]
>with open('final.txt', 'w') as outf:
>      for fname in sorted:
>            with open(fname[1]) as inf:
>                  for line in inf:
>                        outf.write(line)

A few remarks:

- try not to use "sorted" as a variable name: it shadows Python's built-in 
  sorted() function, so you can no longer call sorted() in that scope

- you're not doing any sorting! You have probably just been lucky with your 
  filenames and the order in which glob returned them
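For comparison, here is a sketch of the concatenation loop with an actual sort added. It is not the original poster's code: the helper names chunk_number and concatenate_chunks are hypothetical, and it uses list.sort with a key function instead of the dict approach above. It relies on the glob pattern plus an all-digit suffix check to pick the files, and does not apply the endswith('chunk') test.

```python
import glob

def chunk_number(name):
    """Return the numeric suffix of a name such as 'foo_chunk_10' (-> 10)."""
    return int(name.rsplit('_', 1)[1])

def concatenate_chunks(pattern, out_path):
    """Concatenate files matching pattern into out_path in numeric order.

    Returns the input names in the order they were written.
    """
    # keep only names with an underscore followed by an all-digit suffix
    names = [n for n in glob.glob(pattern)
             if '_' in n and n.rsplit('_', 1)[1].isdigit()]
    # sort on the integer suffix, so 'foo_chunk_2' precedes 'foo_chunk_10'
    names.sort(key=chunk_number)
    with open(out_path, 'w') as outf:
        for name in names:
            with open(name) as inf:
                for line in inf:
                    outf.write(line)
    return names
```

Called as concatenate_chunks('*chunk_*', 'final.txt') it would write final.txt in the current directory.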

Try making these files in your test directory:

  foo
  foo_chunk_0
  foo_chunk_1
  foo_chunk_2
  foo_chunk_10

and see what happens to your code. Temporarily drop the "with open..." and just 
print the filenames to see what order you would have processed the files.
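The same experiment can be done without any files. In this sketch the hard-coded list stands in for whatever glob returns (glob itself does not sort; its order depends on the file system), and it shows what a plain string sort does with the test names above compared with a numeric sort on the suffix:

```python
names = ['foo', 'foo_chunk_10', 'foo_chunk_2', 'foo_chunk_1', 'foo_chunk_0']

# A plain string sort compares character by character, so
# 'foo_chunk_10' lands before 'foo_chunk_2' ('1' < '2').
print(sorted(names))
# ['foo', 'foo_chunk_0', 'foo_chunk_1', 'foo_chunk_10', 'foo_chunk_2']

# Keep only names with an all-digit suffix, then sort on that suffix as an int.
chunks = [n for n in names if n.rsplit('_', 1)[-1].isdigit()]
numeric = sorted(chunks, key=lambda n: int(n.rsplit('_', 1)[-1]))
print(numeric)
# ['foo_chunk_0', 'foo_chunk_1', 'foo_chunk_2', 'foo_chunk_10']
```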

Cheers,
Cameron Simpson <cs at zip.com.au>


