Concatenating files in order

Tim Chase python.list at tim.thechases.com
Tue May 23 16:46:52 EDT 2017


On 2017-05-23 19:29, Mahmood Naderan via Python-list wrote:
> There are some text files ending with _chunk_i where 'i' is an
> integer. For example,
> 
> XXX_chunk_0
> XXX_chunk_1
> ...
> 
> I want to concatenate them in order. Thing is that the total number
> of files may be variable. Therefore, I can not specify the number
> in my python script. It has to be "for all files ending with
> _chunk_i".
> 
> Next, I can write
> 
> with open('final.txt', 'w') as outf:
>     for fname in filenames:
>         with open(fname) as inf:
>             for line in inf:
>                 outf.write(line)
>  
> 
> How can I specify the "filenames"?

Does the *file* or the *filename* end in _chunk_i?  If it's the
filename and the names already arrive in order, you can just skip the
ones that don't match:

  for fname in filenames:
    # the last two "_"-separated pieces should be "chunk" and a number
    parts = fname.split('_')
    if len(parts) >= 2 and parts[-2] == "chunk" and parts[-1].isdigit():
      with open(fname) as inf:
        for line in inf:
          outf.write(line)

If they're not sorted, you'd have to sort & filter them first.  I'd
recommend a sorting & filtering generator:

  import re
  interesting_re = re.compile(r'chunk_(\d+)$', re.I)
  def filter_and_sort(filenames):
    yield from sorted((
      fname
      for fname in filenames
      if interesting_re.search(fname)
      ),
      key=lambda v: int(v.rsplit('_', 1)[-1])
      )

  for fname in filter_and_sort(filenames):
    with open(fname) as inf:
      for line in inf:
        outf.write(line)
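
The snippets above still assume "filenames" comes from somewhere.  As
a rough, self-contained sketch (the helper names chunk_files and
concatenate, and the use of os.listdir on a single directory, are my
own assumptions rather than anything from the original question), the
list could be built and consumed like this.  The numeric sort key
matters because a plain string sort would put _chunk_10 before
_chunk_2:

```python
import os
import re

# Assumption: the interesting names end in "_chunk_<integer>".
CHUNK_RE = re.compile(r'_chunk_(\d+)$')

def chunk_files(directory='.'):
    """Return paths in `directory` ending in _chunk_<i>, ordered by i."""
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        m = CHUNK_RE.search(name)
        if m and os.path.isfile(path):
            # keep the integer so the sort is numeric, not lexical
            found.append((int(m.group(1)), path))
    return [path for _, path in sorted(found)]

def concatenate(paths, out_name='final.txt'):
    """Write the contents of each path, in the given order, to out_name."""
    with open(out_name, 'w') as outf:
        for path in paths:
            with open(path) as inf:
                for line in inf:
                    outf.write(line)
```

Calling concatenate(chunk_files('.')) would then write final.txt in
the current directory, however many chunk files happen to be there.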

If the "chunk_i" is *content* in the file, it's a good bit more work
to search through all the files for the data, note which file
contains which tag, then reopen/seek(0) each file and write them out
in order (you'd also have to consider the edge case where a file has
more than one "chunk_i" that straddles other files).
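
A minimal sketch of that content-based case, under loud assumptions:
the tag looks like "chunk_<integer>" somewhere in a line, only the
first tag found in each file counts, and files without a tag are
skipped.  The name order_by_content_tag is hypothetical, and the
straddling edge case is deliberately not handled:

```python
import re

# Assumption: the tag inside the file looks like "chunk_<integer>".
TAG_RE = re.compile(r'chunk_(\d+)')

def order_by_content_tag(filenames):
    """Return the filenames that contain a tag, ordered by tag number."""
    tagged = []
    for fname in filenames:
        with open(fname) as inf:
            for line in inf:
                m = TAG_RE.search(line)
                if m:
                    tagged.append((int(m.group(1)), fname))
                    break  # assumption: only the first tag per file counts
    return [fname for _, fname in sorted(tagged)]
```

The returned list can then be fed to the same open/write loop as
before, which reopens each file and copies it out in tag order.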

-tkc






