Fixing a horribly formatted CSV file

F.R. anthra.norell at bluewin.ch
Wed Jul 2 11:51:05 EDT 2014


On 07/02/2014 11:13 AM, flebber wrote:
>>>>> TM = TX.Table_Maker (headings = ('Meeting','Date','Race','Number','Name','Trainer','Location'))
>>>>> TM (race_table (your_csv_text)).write ()
> Where do I find TX? I found this mention on the list. Was it available on pip under any name?
> https://mail.python.org/pipermail/python-list/2014-February/667464.html
>
> Sayth

I'd have to make it available. I proposed it some time ago and received 
a couple of suggestions in return. It is a modular transformation 
framework written entirely in Python (2.7). It consists essentially of a 
base class "Transformer" that handles input and output in such a way 
that Transformer objects can be chained. It saved me from drowning in a 
horrible and growing tangle of hacks. Finding something usable I had 
previously done took time, understanding how it worked took more time, 
and adapting it took still more time, so writing yet another hack from 
scratch was faster.
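TX was never published, so its internals are not shown here. The sketch below is only a guess at the idea described above, written in Python 3: a base class whose instances keep their input and output in two slots, so that calling one Transformer on the output of another chains them. `Slot`, `Strip` and `Upper` are hypothetical names; only `Transformer`, `Input`, `Output` and `take` echo names used elsewhere in this post.

```python
class Slot:
    """Hypothetical holder for the data a Transformer received or produced."""

    def __init__(self):
        self.data = None

    def take(self, data):
        self.data = data


class Transformer:
    """Hypothetical base class: subclasses override transform() and use the slots."""

    name = 'Transformer'

    def __init__(self):
        self.Input = Slot()
        self.Output = Slot()

    def __call__(self, data=None):
        # New input triggers a run; calling with no argument returns the kept output.
        if data is not None:
            self.Input.take(data)
            self.transform()
        return self.Output.data

    def transform(self):
        # Default behaviour: pass the input through unchanged.
        self.Output.take(self.Input.data)


class Strip(Transformer):
    name = 'Strip'

    def transform(self):
        self.Output.take(self.Input.data.strip())


class Upper(Transformer):
    name = 'Upper'

    def transform(self):
        self.Output.take(self.Input.data.upper())


# Every Transformer takes data in and hands data out the same way,
# so the output of one can be fed straight into the next.
strip, upper = Strip(), Upper()
result = upper(strip('  race day  '))
print(result)
```

Because both objects keep their data, `strip.Output.data` can still be inspected after the run.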
     I could quickly wrap a number of hacks into Transformer objects and 
so start building a library of standard Transformers. The Table_Maker is 
one of them. Its table-making code is quite bad: it suffers from feature 
overload. I would clean it up for distribution.
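A simplified, hypothetical stand-in for the Table_Maker, to show the kind of job one standard Transformer does: it turns a 2-dimensional list into a fixed-width text table, with optional headings as in the `headings =` call quoted above. The real Table_Maker's interface (for instance its `.write ()` method) is not modelled.

```python
class Table_Maker:
    """Hypothetical, simplified stand-in: render a 2-dimensional list as a text table."""

    def __init__(self, headings=None):
        self.headings = headings
        self.output = None  # kept after the run, like a Transformer's output

    def __call__(self, rows):
        rows = [[str(cell) for cell in row] for row in rows]
        if self.headings:
            rows.insert(0, [str(h) for h in self.headings])
        if not rows:
            self.output = ''
            return self.output
        # Pad short rows so every row has the same number of columns.
        n_columns = max(len(row) for row in rows)
        rows = [row + [''] * (n_columns - len(row)) for row in rows]
        # Each column is as wide as its widest cell.
        widths = [max(len(row[i]) for row in rows) for i in range(n_columns)]
        lines = [' | '.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                 for row in rows]
        if self.headings:
            # Separator line under the headings.
            lines.insert(1, '-+-'.join('-' * w for w in widths))
        self.output = '\n'.join(lines)
        return self.output


tm = Table_Maker(headings=('Race', 'Name'))
print(tm([[1, 'Alpha'], [2, 'Bo']]))
# Race | Name
# -----+------
# 1    | Alpha
# 2    | Bo
```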
     I'd be happy to distribute the base class and a few standard 
Transformers, such as I use every day (File Reader, File Writer, DB Run 
Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines To 
Text, Sort, Sort And Unique, etc.). Writing one's own Transformers is a 
breeze. So is testing them, because a Transformer keeps its input and 
output and, in line with the system's design philosophy, does only its 
own single thing.
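A minimal sketch of why testing is easy, assuming a hypothetical base class that simply records its last input and output: each small Transformer (here simplified versions of Text To Lines, Sort And Unique and Lines To Text) can be checked on its own after a run.

```python
class Transformer:
    """Hypothetical minimal base: keeps the last input and output for inspection."""

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, data):
        self.input = data
        self.output = self.transform(data)
        return self.output

    def transform(self, data):
        raise NotImplementedError


class Text_To_Lines(Transformer):
    def transform(self, text):
        return text.splitlines()


class Sort_And_Unique(Transformer):
    def transform(self, items):
        return sorted(set(items))


class Lines_To_Text(Transformer):
    def transform(self, lines):
        return '\n'.join(lines)


to_lines, unique, to_text = Text_To_Lines(), Sort_And_Unique(), Lines_To_Text()
text = to_text(unique(to_lines('Zeus\nAlpha\nZeus\nBravo')))

# Each Transformer does one thing, so a test only compares its input with its output.
assert to_lines.output == ['Zeus', 'Alpha', 'Zeus', 'Bravo']
assert unique.output == ['Alpha', 'Bravo', 'Zeus']
print(text)
```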
     A Chain is a list of Transformers that run in sequence. It is 
itself derived from Transformer and is functionally equivalent to one, 
so Chains nest. Fixing a Chain that produces no output is a 
straightforward matter too: it will still have run up to the failing 
element, and Chain.show () reveals the culprit as the first element 
with no output.
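A sketch of the Chain idea under similar assumptions: a Chain is a Transformer subclass, so it can be an element of another Chain, and it stops at the first element that yields nothing. The `show` method here is hypothetical; it returns one line per element so that the first "no output" entry points at the culprit.

```python
class Transformer:
    """Hypothetical minimal base that keeps its last input and output."""

    name = 'Transformer'

    def __init__(self):
        self.input = None
        self.output = None

    def __call__(self, data):
        self.input = data
        self.output = self.transform(data)
        return self.output

    def transform(self, data):
        return data  # pass-through by default


class Chain(Transformer):
    """Runs its elements in sequence; being a Transformer itself, it nests."""

    name = 'Chain'

    def __init__(self, *elements):
        super().__init__()
        self.elements = list(elements)

    def transform(self, data):
        for element in self.elements:
            element.output = None  # clear stale results from earlier runs
        for element in self.elements:
            data = element(data)
            if data is None:  # nothing came out: stop here
                break
        return data

    def show(self):
        return ['%d %s: %s' % (i, e.name, 'ok' if e.output is not None else 'no output')
                for i, e in enumerate(self.elements)]


class Split_Lines(Transformer):
    name = 'Split_Lines'

    def transform(self, text):
        return text.splitlines()


class Broken_Filter(Transformer):
    name = 'Broken_Filter'

    def transform(self, lines):
        return None  # simulates a bug: produces nothing


class Count(Transformer):
    name = 'Count'

    def transform(self, lines):
        return len(lines)


inner = Chain(Split_Lines(), Count())   # a Chain is a Transformer...
outer = Chain(Transformer(), inner)     # ...so it can sit inside another Chain
print(outer('a\nb\nc'))                 # 3

faulty = Chain(Split_Lines(), Broken_Filter(), Count())
faulty('a\nb')
for line in faulty.show():
    print(line)
# 0 Split_Lines: ok
# 1 Broken_Filter: no output
# 2 Count: no output
```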
     I am not up to date on packaging and distribution and would depend 
on qualified help with that.

Frederic


--------------------------------------------------------------------------------


A brief overview


The TX solution to your race table would be (TX is the name of the module):

     class Race_Table (TX.Transformer):
         '''
         In: CSV text
         Out: Tabular data (2-dimensional list)
         '''
         name = 'Race_Table'
         @TX.setup   # Checks timestamps to prevent needless reruns in the absence of new input
         def transform (self):
             output_table = []
             for line in self.Input.data:
                 pass   # Parse the line and append its fields to output_table (see my post)
             self.Output.take (output_table)

     Example file to file:
     >>> Race_Schedule_F2F = TX.Chain (TX.File_Reader (), Race_Table (),
     ...     TX.List_To_CSV (delimiter = ';'), TX.File_Writer (terminal = out_file_name))
     >>> Race_Schedule_F2F (input_file_name)   # Does it all!
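The file-to-file chain can be approximated without TX, as a sketch: each element is a plain callable and a minimal `Chain` feeds one element's result to the next. The class names mirror the ones above but are hypothetical re-implementations; `List_To_CSV` relies on the standard `csv.writer` with a `;` delimiter, and the parsing in `Race_Table` is a placeholder.

```python
import csv
import io
import os
import tempfile


class Chain:
    """Hypothetical minimal Chain: feeds each element's result to the next."""

    def __init__(self, *elements):
        self.elements = elements

    def __call__(self, data):
        for element in self.elements:
            data = element(data)
        return data


class File_Reader:
    def __call__(self, file_name):
        with open(file_name, encoding='utf-8') as f:
            return f.read()


class Race_Table:
    def __call__(self, text):
        # Placeholder parsing: keep non-empty CSV rows only.
        return [row for row in csv.reader(io.StringIO(text)) if row]


class List_To_CSV:
    def __init__(self, delimiter=','):
        self.delimiter = delimiter

    def __call__(self, table):
        buffer = io.StringIO()
        csv.writer(buffer, delimiter=self.delimiter, lineterminator='\n').writerows(table)
        return buffer.getvalue()


class File_Writer:
    def __init__(self, terminal):
        self.terminal = terminal  # name of the output file

    def __call__(self, text):
        with open(self.terminal, 'w', encoding='utf-8') as f:
            f.write(text)
        return self.terminal


# Demo with temporary files.
folder = tempfile.mkdtemp()
input_file_name = os.path.join(folder, 'races.csv')
out_file_name = os.path.join(folder, 'races_out.csv')
with open(input_file_name, 'w', encoding='utf-8') as f:
    f.write('Randwick,2014-07-02,1\n\nFlemington,2014-07-03,2\n')

Race_Schedule_F2F = Chain(File_Reader(), Race_Table(),
                          List_To_CSV(delimiter=';'), File_Writer(terminal=out_file_name))
Race_Schedule_F2F(input_file_name)
```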

     Example web to database:
     >>> Race_Schedule_WWW2DB = TX.Chain (TX.WWW_Reader (),
     ...     Race_Schedule_HTML_Reader (), Race_Table (),
     ...     TX.DB_Writer (table_name = 'horses'))
     >>> Race_Schedule_WWW2DB (url)   # Does it all! You'd have to write the Race_Schedule_HTML_Reader
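TX's DB_Writer is not published. As a hypothetical sketch, the last step of the web-to-database chain could store the table in SQLite with the standard `sqlite3` module. Passing in a `connection` and naming the columns `c0`, `c1`, ... are assumptions made for this example.

```python
import sqlite3


class DB_Writer:
    """Hypothetical sketch: store a 2-dimensional list in a SQLite table."""

    def __init__(self, table_name, connection):
        self.table_name = table_name
        self.connection = connection

    def __call__(self, table):
        if not table:
            return 0
        n_columns = len(table[0])
        columns = ', '.join('c%d TEXT' % i for i in range(n_columns))
        placeholders = ', '.join('?' * n_columns)
        quoted = '"%s"' % self.table_name.replace('"', '""')  # quote the identifier
        with self.connection:  # commits on success, rolls back on error
            self.connection.execute('CREATE TABLE IF NOT EXISTS %s (%s)' % (quoted, columns))
            self.connection.executemany(
                'INSERT INTO %s VALUES (%s)' % (quoted, placeholders), table)
        return len(table)


connection = sqlite3.connect(':memory:')
writer = DB_Writer(table_name='horses', connection=connection)
written = writer([['Randwick', '2014-07-02', '1'], ['Flemington', '2014-07-03', '2']])
rows = connection.execute('SELECT * FROM horses ORDER BY rowid').fetchall()
print(rows)
```

Values are bound with `?` placeholders; only the table name is interpolated, after quoting.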

     Verify your table:
     >>> Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
     >>> Race_Schedule_WWW2DB.show_tree ()   # Shows which element's output to display
     Chain
     Chain[0] - WWW Reader
     Chain[1] - Race_Schedule_HTML_Reader
     Chain[2] - Race_Table
     Chain[3] - DB Writer
     >>> print Table_Viewer (Race_Schedule_WWW2DB[2]())   # All Transformers keep their data
     (Display of table)
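Checking an intermediate result relies on two properties: a Chain can be indexed, and calling a Transformer without an argument returns the data it kept from its last run. A hypothetical sketch of both, with a `show_tree` that mimics the listing format shown above:

```python
class Transformer:
    """Hypothetical minimal base that keeps the output of its last run."""

    name = 'Transformer'

    def __init__(self):
        self.output = None

    def __call__(self, data=None):
        # With an argument: run. Without: return the data kept from the last run.
        if data is not None:
            self.output = self.transform(data)
        return self.output

    def transform(self, data):
        return data


class Chain(Transformer):
    name = 'Chain'

    def __init__(self, *elements):
        super().__init__()
        self.elements = list(elements)

    def __getitem__(self, index):
        return self.elements[index]

    def transform(self, data):
        for element in self.elements:
            data = element(data)
            if data is None:  # stop at the first element with no output
                break
        return data

    def show_tree(self):
        lines = [self.name]
        lines += ['%s[%d] - %s' % (self.name, i, e.name) for i, e in enumerate(self.elements)]
        return '\n'.join(lines)


class Split_Rows(Transformer):
    name = 'Split_Rows'

    def transform(self, text):
        return [line.split(',') for line in text.splitlines()]


class Count_Rows(Transformer):
    name = 'Count_Rows'

    def transform(self, table):
        return len(table)


pipeline = Chain(Split_Rows(), Count_Rows())
pipeline('Randwick,1\nFlemington,2')
print(pipeline.show_tree())
print(pipeline[0]())  # the intermediate table, still available after the run
```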

     Verify database:
     >>> print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
     (Display of database table)
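A matching hypothetical DB_Reader, again using `sqlite3` and an assumed `connection` argument, reads the whole table back as a 2-dimensional list that a table viewer could display. Calling it with no argument follows the `DB_Reader (table_name = 'horses')()` usage above.

```python
import sqlite3


class DB_Reader:
    """Hypothetical sketch: read a whole SQLite table back as a 2-dimensional list."""

    def __init__(self, table_name, connection):
        self.table_name = table_name
        self.connection = connection
        self.output = None

    def __call__(self):
        quoted = '"%s"' % self.table_name.replace('"', '""')  # quote the identifier
        cursor = self.connection.execute('SELECT * FROM %s ORDER BY rowid' % quoted)
        self.output = [list(row) for row in cursor]
        return self.output


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE horses (meeting TEXT, race INTEGER)')
connection.executemany('INSERT INTO horses VALUES (?, ?)',
                       [('Randwick', 1), ('Flemington', 2)])
table = DB_Reader(table_name='horses', connection=connection)()
print(table)
```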



