key and ..

justin walters walters.justin01 at gmail.com
Thu Nov 17 23:06:46 EST 2016


On Thu, Nov 17, 2016 at 7:05 PM, Val Krem via Python-list <
python-list at python.org> wrote:

>
>
> Hi all,
> Sorry for asking such a basic question, but I am trying to merge two
> files (file1 and file2) and do some stuff: merge the two files by the first
> column (key). Here is the description of the files and what I would like to do.
>
>
> file1
>
> key c1   c2
> 1  759   939
> 2 345 154571
> 3  251 350711
> 4 3749  22159
> 5  676  76953
> 6   46    756
>
>
> file2
> key  p1    p2
> 1   759    939
> 2   345 154571
> 3   251 350711
> 4  3915  23254
> 5  7676  77953
> 7   256   4562
>
> create file3
> a) merge the two files on key, keeping the keys that exist in both file1 and file2
> b) create two variables dcp1 = c1 - p1 and dcp2 = c2 - p2
> c) sort file3 by dcp2 (descending) and output
>
> create file4: rows whose key exists in file1 but not in file2
> create file5: rows whose key exists in file2 but not in file1
>
>
> Desired output files
>
> file3
> key   c1    c2     p1  p2     dcp1   dcp2
> 4   3749  22159  3915  23254  -166  -1095
> 5    676  76953  7676  77953 -7000  -1000
> 1    759    939   759    939     0      0
> 2    345 154571   345 154571     0      0
> 3    251 350711   251 350711     0      0
>
> file4
> key c1   c2
> 6   46   756
>
> file5
> key p1   p2
> 7  256  4562
>
>
>
> Thank you in advance

1. Open each file with open() and read its contents into a variable, e.g.
   text = f.read().
2. Use str.split('\n') (or str.splitlines()) to split the text into a list
   of lines.
3. Skip the header line, then build a list of dictionaries by splitting each
   remaining line at whitespace and calling int() on the value in each
   column, for each file.
4. Do the math (dcp1 = c1 - p1, dcp2 = c2 - p2) between the dicts that share
   a key, storing the values in a new dict. You can write this out directly
   to the file or append it to a new list.
5. Use open() in write mode ('w') to write the resulting lines to a new file.
6. Dicts can't be put into a set, so build a set of the key values from each
   list and use set.intersection() or set.difference() to find the keys you
   need. A set is unordered, so you may want to run the selected rows through
   sorted(rows, key=lambda row: row['key']).
7. Repeat step 5 to write out file4 and file5. There is no need to build the
   key sets again; just take the difference in each direction
   (keys1 - keys2 for file4, keys2 - keys1 for file5).
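The steps above can be sketched roughly like this. It's a minimal sketch, assuming whitespace-separated files with a header line and integer columns named as in the question; the function names read_table, write_table, process and process_text are made up for illustration. Each file's rows go into a dict keyed on the key column, so the set operations from step 6 work directly on the dict keys. Note that the sample file3 in the question is actually in ascending order of dcp2, so the sketch sorts ascending to reproduce it.

```python
import io


def read_table(f):
    """Parse a whitespace-separated table whose first line is a header.

    Returns (header, rows), where rows maps each integer key to a dict
    holding every column of that line as an int.
    """
    lines = f.read().split("\n")            # step 2: one string per line
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        if not line.strip():                # ignore blank/trailing lines
            continue
        values = [int(v) for v in line.split()]   # step 3
        record = dict(zip(header, values))
        rows[record["key"]] = record
    return header, rows


def write_table(f, columns, rows):
    """Write a header line plus one right-aligned line per row dict."""
    f.write(" ".join(f"{c:>7}" for c in columns) + "\n")
    for row in rows:
        f.write(" ".join(f"{row[c]:>7}" for c in columns) + "\n")


def process(file1, file2, file3, file4, file5):
    """Read file1/file2 (open for reading), write file3/4/5 (open for writing)."""
    header1, rows1 = read_table(file1)
    header2, rows2 = read_table(file2)

    # Step 6: set operations on the keys (dict key views behave like sets).
    common = rows1.keys() & rows2.keys()
    only1 = rows1.keys() - rows2.keys()
    only2 = rows2.keys() - rows1.keys()

    # Step 4: combine the matching rows and compute the differences.
    merged = []
    for key in common:
        a, b = rows1[key], rows2[key]
        row = {**a, **b}                    # key, c1, c2, p1, p2
        row["dcp1"] = a["c1"] - b["p1"]
        row["dcp2"] = a["c2"] - b["p2"]
        merged.append(row)

    # Ascending by dcp2 (ties broken by key) reproduces the sample file3;
    # add reverse=True to this sorted() call for a descending sort.
    merged = sorted(merged, key=lambda r: (r["dcp2"], r["key"]))

    # Steps 5 and 7: write the three output files.
    write_table(file3, ["key", "c1", "c2", "p1", "p2", "dcp1", "dcp2"], merged)
    write_table(file4, header1, [rows1[k] for k in sorted(only1)])
    write_table(file5, header2, [rows2[k] for k in sorted(only2)])
    return merged


def process_text(text1, text2):
    """Convenience wrapper for experimenting: run process() on strings."""
    outs = [io.StringIO() for _ in range(3)]
    process(io.StringIO(text1), io.StringIO(text2), *outs)
    return [o.getvalue() for o in outs]
```

To use it on real files, open file1 and file2 for reading and file3, file4 and file5 with mode 'w', and pass the five file objects to process().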

This isn't the fastest or most efficient way of doing it, but it is
probably the most straightforward. If these files are quite large, you may
want to take a different approach in the interest of performance and
memory. If you don't want to use dicts, you should have no problem
substituting tuples or nested lists.
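With tuples instead of dicts, the join might look like the sketch below. It assumes each data line has exactly three integer columns (the key plus two values) and that the header line has already been dropped; parse_tuples and merge_tuples are hypothetical names.

```python
def parse_tuples(lines):
    """Turn 'key a b' lines (header already removed) into tuples of ints."""
    return [tuple(int(v) for v in line.split()) for line in lines if line.strip()]


def merge_tuples(rows1, rows2):
    """Join on the first element; yield (key, c1, c2, p1, p2, dcp1, dcp2)."""
    by_key2 = {row[0]: row for row in rows2}   # index file2 rows by key
    merged = []
    for key, c1, c2 in rows1:                  # assumes exactly 3 columns
        if key in by_key2:
            _, p1, p2 = by_key2[key]
            merged.append((key, c1, c2, p1, p2, c1 - p1, c2 - p2))
    return merged
```

Sorting then uses a position instead of a name, e.g. sorted(merged, key=lambda t: t[6]) to sort on dcp2.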

The reading and parsing could also be written as a generator that yields
one row at a time, instead of reading each whole file into memory first.
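A generator version of the reading step might look like this sketch. It assumes the same header-plus-integer-columns layout; iter_records and records_not_in are illustrative names. Because it accepts any iterable of lines, an open file object can be passed directly and will be read one line at a time. Comparing two files still means keeping at least one file's keys in memory.

```python
def iter_records(lines):
    """Lazily yield one dict per data line; the first line is the header."""
    lines = iter(lines)                  # works for open files and plain lists
    header = next(lines, "").split()
    if not header:                       # empty input: nothing to yield
        return
    for line in lines:
        if line.strip():                 # skip blank lines
            yield dict(zip(header, map(int, line.split())))


def records_not_in(lines, other_keys):
    """Yield records whose key is absent from other_keys (file4/file5 style)."""
    for record in iter_records(lines):
        if record["key"] not in other_keys:
            yield record
```

For example, keys1 = {rec["key"] for rec in iter_records(f1)} collects file1's keys, and records_not_in(f2, keys1) then streams the rows that belong in file5.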

Basically, there are a lot of ways to approach this.

Hope that helped at least a little bit.


