[Tutor] Concatenating columns via python

Alan Gauld alan.gauld at btinternet.com
Wed Jul 29 02:01:07 CEST 2015


On 28/07/15 19:52, Hannah G. McDonald wrote:
> I extracted a table from a PDF so the data is quite messy
> and the data that should be in 1 row is in 3 columns, like so:
>     year       color                 location
> 1 1997       blue,                   MD
> 2                green,
> 3                and yellow
>
> So far my code is below, but I know I am missing data. I am just not sure what to put in it:
>

Please post in plain text. Your code has got mangled and
lost the indentation, so I'll need to guess...

Also tell us the Python (and OS) version; it all helps.

As for the sample data you provided, it doesn't seem to bear much
relation to the code below, apart from (maybe) the header line.

For example, what are you planning on doing with the 'and' in the 3rd
line? There seems to be no attempt to process that.
And how can you add the strings meaningfully?

In other words can you show both the input *and the output*
you are aiming for?

> # Simply read and split an example Table 4
> import sys
>
> # Assigning count number and getting rid of right space
> def main():
> count = 0
> pieces = []
> for line in open(infile, 'U'):
> if count < 130:
> data = line.replace('"', '').rstrip().split("\t")
> data = clean_data(data)

For which I guess:

def main():
    count = 0
    pieces = []
    for line in open(infile, 'U'):
        if count < 130:
            data = line.replace('"', '').rstrip().split("\t")
            data = clean_data(data)

> if data[1] == "year" and data[1] != "":

This doesn't make sense since if data[1] is 'year'
it can never be "" so the second test is redundant.
And it should only ever be true on the header line.

> write_pieces(pieces)
> pieces = data
> str.join(pieces)

When you make the write_pieces() call, isn't pieces
still an empty list?

Then you try to join it using str.join, but called on
the class like that it expects a string instance (the
separator) as its first argument. I suspect you should have used:

" ".join(pieces)

or

"\t".join(pieces)

But I'm not certain what you plan on doing here.
Especially since you don't assign the result to
any variable, so it is simply thrown away.
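
To illustrate, here is a minimal sketch. The list contents are
made up from your sample row, and the variable names are only
for illustration:

```python
pieces = ["1997", "blue,", "MD"]  # made-up values based on the sample row

# join() is called on the separator string and returns a new string.
# It does not modify pieces, so the result has to be stored or used.
tabbed = "\t".join(pieces)
spaced = " ".join(pieces)

# Calling it via the class works too, but the separator must come first.
same = str.join("\t", pieces)
```

Both spellings give the same string; the first is the usual idiom.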

> else:
> for i in range(len(data)):
> pieces[i] = pieces[i] + data[i]
> str.join(pieces)

Since pieces is potentially the empty list here,
you cannot safely assign anything to pieces[i]
(indexing into an empty list raises an IndexError).
And again I don't know what the last line is
supposed to be doing.
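
If the aim is to fold the continuation rows into the row
above, something like the following might be a starting point.
This is only a guess at what you want: I'm assuming each row is
already split into [year, color, location] fields, that an empty
year marks a continuation row, and that the pieces should be
joined with a space. merge_rows is a name I made up.

```python
def merge_rows(rows):
    """Fold rows with an empty first field into the previous record."""
    records = []
    for data in rows:
        if data[0] != "" or not records:
            # A non-empty first field (assumed: the year) starts a new record.
            records.append(list(data))
            continue
        pieces = records[-1]
        for i, value in enumerate(data):
            if value == "":
                continue
            if i < len(pieces):
                # Append to the existing field, separated by a space.
                pieces[i] = (pieces[i] + " " + value).strip()
            else:
                # The continuation row has more fields than the record so far.
                pieces.append(value)
    return records


rows = [
    ["1997", "blue,", "MD"],
    ["", "green,"],
    ["", "and yellow"],
]
merged = merge_rows(rows)
```

With the sample rows above, merged holds a single record whose
color field is "blue, green, and yellow".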

>
> # Executing command to remove right space
> def clean_data(s):
> return [x.rstrip() for x in s]
>
> def write_pieces(pieces):
> print

This makes no sense since it only prints a blank line...
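
I would have expected it to do something with its argument.
A minimal sketch, assuming Python 3's print() function and that
you want one tab-separated line per record (both are guesses on
my part):

```python
def write_pieces(pieces):
    """Print one record as a tab-separated line; print nothing if empty."""
    if pieces:
        print("\t".join(pieces))


write_pieces(["1997", "blue, green, and yellow", "MD"])
write_pieces([])  # an empty record produces no output
```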



-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



