parsing tree from excel sheet

alb al.basili at gmail.com
Thu Jan 29 16:22:20 EST 2015


Hi Tim,

Tim Chase <python.list at tim.thechases.com> wrote:
[]
>> I know about the xlrd module to get data from excel
> 
> If I have to get my code to read Excel files, xlrd is usually my
> first and only stop.
> 

It provides quite a good interface for working with Excel files and I 
find it pretty easy to use, even at my entry level!
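
For the record, a minimal sketch of the kind of reading I have in mind. 
The helper names and the file name "specs.xls" are made up for 
illustration; only open_workbook, sheet_by_index, nrows and row_values 
come from xlrd itself.

```python
def read_rows(sheet):
    """Return every row of an xlrd-style sheet as a list of cell values."""
    return [sheet.row_values(i) for i in range(sheet.nrows)]


def load_first_sheet(path):
    """Open a workbook with xlrd and return its first sheet."""
    import xlrd  # third-party; imported here so read_rows works without it
    book = xlrd.open_workbook(path)
    return book.sheet_by_index(0)


# Usage, assuming a file called specs.xls exists (hypothetical name):
# rows = read_rows(load_first_sheet("specs.xls"))
```

Keeping read_rows separate from the file handling means the rest of the 
code only ever sees plain lists, whatever the source turns out to be.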

>> Does anyone recommend any other path other than scripting through
>> these two modules?
> 
> Well, if you export from Excel as CSV, you can use the "csv" module
> in the standard library.  This is actually my preferred route because
> it prevents people (coughclientscough) from messing up the CSV file
> with formatting, joined cells, and other weirdnesses that can choke
> my utilities.

In my case there's no such risk of someone manipulating the Excel file. 
I'm in charge of it! :-) Sure, it might be misused and messed up 
inadvertently at a later stage, but we're just trying to validate an 
idea, i.e. writing specs without using any word processor.

I'm trying to bypass the need to go through a mark up language (to a 
certain point), in order to facilitate the transition from an 
unstructured approach to document writing to a more structured one.

I would have proposed SGML or XML and style sheets, but unfortunately it 
is hard to move from M$Word to XML (OMG I need to write code?!?!!). So to 
facilitate the transition to a structured approach I've come up with the 
idea of generating documents automatically, using Excel as a UI.

At a later stage we could move on to a full-fledged database and have 
simpler web access, while using the same backend for generating 
documents (i.e. some parser and LaTeX).
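
To give an idea of the backend, here is a rough sketch of the LaTeX 
side. It assumes the parsed sheet ends up as a nested dict mapping each 
heading to its sub-headings; that layout, the name to_latex and the 
LEVELS list are my own assumptions, nothing settled.

```python
# Sectioning commands by nesting depth; anything deeper than the last
# entry reuses "paragraph".
LEVELS = ["section", "subsection", "subsubsection", "paragraph"]


def to_latex(tree, depth=0):
    """Turn a nested dict of headings into a list of LaTeX lines.

    Heading text is emitted as-is, so special characters such as
    % or & are NOT escaped in this sketch.
    """
    lines = []
    for name, children in tree.items():
        command = LEVELS[min(depth, len(LEVELS) - 1)]
        lines.append("\\%s{%s}" % (command, name))
        lines.extend(to_latex(children, depth + 1))
    return lines


# Usage: print("\n".join(to_latex({"Power": {"Battery": {}}})))
```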

>> Is there any more suitable module/example/project out there that
>> would achieve the same result?
> 
> I don't believe there's anything that will natively do the work for
> you.  Additionally, you'd have to clarify what should happen if two
> rows in the same section had different sub-trees but the same
> content/name.  Based on your use-case (LaTex export using these as
> headers) I suspect you'd want a warning so you can repair the input
> and re-run.  But it would be possible to default to either keeping or
> squashing the duplicates.

Sure, there are corner cases that might mess up the whole structure, 
which at the moment is not exactly foolproof, but I'm trying to test the 
idea and see what I can come up with. Once the flow is in place I could 
think about a more reliable approach, such as an interface to a database.
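
Something along these lines might cover the warning you suggest. It is 
only a sketch: I'm assuming each row lists the heading names from the 
top level down, and build_tree and its on_duplicate switch are names I 
made up. A repeated path is always merged into the existing node; the 
switch only decides whether I get told about it.

```python
import warnings


def build_tree(rows, on_duplicate="warn"):
    """Build a nested dict from rows of heading names.

    Each row is a sequence of names from the top level down, e.g.
    ["Requirements", "Power", "Battery"]. Empty cells are skipped.
    on_duplicate="warn" reports a row whose full path was already seen;
    on_duplicate="squash" merges it silently.
    """
    tree = {}
    seen = set()
    for number, row in enumerate(rows, start=1):
        path = tuple(str(cell).strip() for cell in row if str(cell).strip())
        if not path:
            continue  # blank row
        if path in seen and on_duplicate == "warn":
            warnings.warn("row %d repeats path %s" % (number, " / ".join(path)))
        seen.add(path)
        node = tree
        for name in path:
            # Reuse the existing node if there is one, else create it.
            node = node.setdefault(name, {})
    return tree
```

With the warning on, I can repair the sheet and re-run, which is 
probably what I want while the structure is still this fragile.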
 
>> p.s.: I'm not extremely proficient in python, actually I'm just
>> starting with it!
> 
> Well, you've come to the right place. Most of us are pretty fond of
> Python here. :-)

I've never understood people discarding newsgroups in favor of more 
'recent' technologies like social networks. Long live the USENET!


