table (ascii text) layout recognition

James Stroud jstroud at mbi.ucla.edu
Wed Sep 13 01:52:00 EDT 2006


vbfoobar at gmail.com wrote:
> Hello,
> 
> I am looking for Python code to process
> tables that are in ASCII text. The code must
> determine where the columns (fields) are.
> The tables in my application vary,
> but their columns are not very hard
> for a human to locate, because even
> when ignoring the semantics of the words,
> our eyes see the vertical alignments.
> 
> Here is a sample table (must be viewed
> with fixed-width font to see alignments):
> =================================
> 
> 44544      ipod          apple     black         102
> GFGFHHF-12 unknown thing bizar     brick mortar  tbc
> 45fjk      do not know   + is less               biac
>            disk          seagate   250GB         130
> 5G_gff                   tbd       tbd
> gjgh88hgg  media record  a and b                 12
> hjj        foo           bar       hop           zip
> hg uy oi   hj uuu ii a   qqq ccc v ZZZ Ughj
> qdsd       zert                    nope          nope
> 
> =================================
> 
> I want Python code that builds a representation
> of this table (for example a list of lists, where each list
> represents a table line, each element of the list
> being a field value).
> 
> Any hints?
> thanks
> 

I have to catch a bus, but quickly: the algorithm is to code each 
non-space character as one and each space as zero, then 'or' the rows 
together down each character position. Positions that stay zero are 
very likely gaps between columns. I'll post code tomorrow if no one 
else does.
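
A minimal sketch of that idea, not tested against the full sample above. 
The function names find_column_spans and split_table are made up for 
illustration, and the sketch assumes the table uses spaces rather than 
tabs and that every all-blank character position is a column gap:

```python
def find_column_spans(lines):
    """Return (start, end) slices for each run of occupied positions."""
    width = max((len(line) for line in lines), default=0)
    # 'or' down the rows: occupied[i] is True if any line has a
    # non-space character at position i.
    occupied = [False] * width
    for line in lines:
        for i, ch in enumerate(line):
            if not ch.isspace():
                occupied[i] = True
    # Each maximal run of True values is taken to be one column.
    spans = []
    start = None
    for i, flag in enumerate(occupied):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def split_table(lines):
    """Return the table as a list of rows, each a list of field strings."""
    # Skip blank lines; they carry no alignment information.
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    spans = find_column_spans(rows)
    # Slicing past the end of a short row just yields an empty field.
    return [[row[a:b].strip() for a, b in spans] for row in rows]
```

Note that a single blank position that happens to line up in every row 
inside a multi-word field would be taken for a column gap, so requiring 
a minimum gap width may be worth adding for real data.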

Must run...


-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


