Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

mensanator at aol.com mensanator at aol.com
Sat Aug 4 23:25:59 EDT 2007


On Aug 4, 9:21?pm, "Jim Langston" <tazmas... at rocketmail.com> wrote:
> <mensana... at aol.com> wrote in message
>
> news:1186278638.931477.39760 at z24g2000prh.googlegroups.com...
>
>
>
>
>
> > On Aug 4, 6:35?pm, SMERSH009 <SMERSH0... at gmail.com> wrote:
> >> Hi All.
> >> Let's say I have some badly formatted text called doc:
>
> >> doc=
> >> """
> >> friendid
> >> Female
>
> >>                             23 years old
>
> >>                             Los Gatos
>
> >>                             United States
> >> friendid
> >> Male
>
> >>                             24 years old
>
> >>                             San Francisco, California
>
> >>                             United States
> >> """
>
> >> How would I get these results to be displayed in a format similar to:
> >> friendid;Female;23 years old;Los Gatos;United States
> >> friendid;Male; 24 years old;San Francisco, California;United States
>
> >> The latter is a lot easier to organize and can be quickly imported
> >> into Excel's column format.
>
> >> Thanks Much,
> >> Sam
>
> > d = doc.split('\n')
>
> > f = [i.split() for i in d if i]
>
> > g = [' '.join(i) for i in f]
>
> > rec = []
> > temprec = []
> > for i in g:
> >    if i:
> >        if i == 'friendid':
> >            rec.append(temprec)
> >            temprec = [i]
> >        else:
> >            temprec.append(i)
> > rec.append(temprec)
>
> > output = [';'.join(i) for i in rec if i]
>
> > for i in output: print i
>
> > ##    friendid;Female;23 years old;Los Gatos;United States
> > ##    friendid;Male;24 years old;San Francisco, California;United States
>
> also, I would suggest you use CSV format.  

Well, the OP asked for a specific format. One is not
always at liberty to change it.

> CSV stands for "Comma Seperated
> Variable" and Excel can load such a sheet directly.

And Excel can load the shown format directly also,
just specify the delimiter.

>
> Instead of seperating using ; seperate using ,  Of course, this provides a
> problem when there is a , in a string.  

Which explains the popularity of using tabs as delimiters.
The data deliverable specification I use at work
uses the pipe character | which never appears as data
in this particular application.

> Resolution is to quote the string.

Which makes the file bigger and isn't necessary
when tabs and pipes are used as delimiters.

> Being such, you can just go ahead and quote all strings.  So you would want
> the output to be:
>
> "friendid","Female","23 years old","Los Gatos","United States"
> "friendid","Male","24 years old","San Francisco, California","United States"

Which I would do if I had a specification that
demanded it or was making files for others. For my
own use, I wouldn't bother as it's unnecessary work.

>
> Numbers should not be quoted if you wish to treat them as numeric and not
> text.

A good reason not to use quotes at all. Besides which,
Excel can handle that also.




More information about the Python-list mailing list