Filtering content of a text file

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Jul 27 08:49:35 EDT 2007


On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:

> I am working with 30K+ record datasets in flat file format (.txt) that
> look like this:
> 
> //-+alibaba sinage
> //-+amra damian//_9
> //-+anix anire//_
> //-+borom
> //-+bokima sun drane
> //-+ciren
> //-+cop calestieon eded
> //-+ciciban
> //-+drago kimano sole

The example seems to be sorted by first character.  Is this true for the
real data too?  And are there records that don't start with a-z or 0-9?

> a) By looping thru the file the program should isolate all records
> that have letter a following the //-+
> b) The isolated dataset will contain only records that start with
> //-+a
> c) Save the isolated dataset as flat text file named a.txt
> d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
> and numerical values (0 thru 9)

This might be a little bit inefficient because the file gets read 36
times, once per letter and digit.  If the data is already sorted you can
use `itertools.groupby()` to get the groups and write each one to its own
file in a single pass.  Otherwise, if the file fits into memory
completely, you can sort it in memory and then use `itertools.groupby()`.
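
A minimal sketch of the sort-then-group variant, untested against the real
data.  The names `PREFIX`, `ALLOWED`, `record_key()` and
`split_by_first_char()` are made up for illustration, and it assumes that
every record starts with `//-+`, that the whole file fits into memory, and
that records whose first character is not a-z or 0-9 (including upper case
letters) can simply be skipped:

```python
import itertools
import os
import string

PREFIX = "//-+"  # record marker as seen in the sample data
ALLOWED = set(string.ascii_lowercase + string.digits)  # a-z and 0-9


def record_key(line):
    """Return the single character that follows the marker ('' if none)."""
    return line[len(PREFIX):len(PREFIX) + 1]


def split_by_first_char(lines, out_dir="."):
    """Write each group of records to <key>.txt and return the keys written."""
    # Keep only lines that carry the marker and a wanted first character.
    records = [
        line for line in lines
        if line.startswith(PREFIX) and record_key(line) in ALLOWED
    ]
    # groupby() only merges *adjacent* items with equal keys, hence the sort.
    # The sort is stable, so records keep their original order within a group.
    records.sort(key=record_key)
    written = []
    for key, group in itertools.groupby(records, key=record_key):
        with open(os.path.join(out_dir, key + ".txt"), "w") as out:
            out.writelines(group)  # lines still carry their newlines
        written.append(key)
    return written
```

Called as `split_by_first_char(open("data.txt"))`, where `data.txt` is a
placeholder name, this reads the input once and produces `a.txt`, `b.txt`
and so on.  If the input is known to be sorted by that first character
already, the `sort()` call can be dropped.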

Ciao,
	Marc 'BlackJack' Rintsch


