Filtering content of a text file

Amit Khemka khemkaamit at gmail.com
Fri Jul 27 06:05:06 EDT 2007


On 7/27/07, Ira.Kovac at gmail.com <Ira.Kovac at gmail.com> wrote:
> Hello All,
>
> I'd greatly appreciate it if you could take a look at the task I need
> help with.
>
> It'd be outstanding if someone could provide some sample Python code.
>
> Thanks a lot,
>
> Ira
>
> -------------------------------------------------------------------------------
> Problem
> -------------------------------------------------------------------------------
>
> I am working with 30K+ record datasets in flat file format (.txt) that
> look like this:
>
> //-+alibaba sinage
> //-+amra damian//_9
> //-+anix anire//_
> //-+borom
> //-+bokima sun drane
> //-+ciren
> //-+cop calestieon eded
> //-+ciciban
> //-+drago kimano sole
>
>
> The records start with the same string (in the example //-+), which is
> followed by another string of characters that changes from record
> to record.
>
> I am working on one file at the time and for each file I need to be
> able to do the following:
>
> a) By looping through the file, the program should isolate all records
> that have the letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as a flat text file named a.txt
> d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
> and digits (0 through 9)
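In other words, steps a) through d) amount to grouping records by the character that follows the "//-+" prefix. Here is a minimal in-memory sketch of that grouping, assuming Python 3, a fixed "//-+" prefix, and that only lowercase a-z and 0-9 count as keys (anything else is skipped). The function name group_records is hypothetical; with 30K records the whole grouping fits comfortably in memory.

```python
import string
from collections import defaultdict

PREFIX = "//-+"  # record prefix taken from the example data
VALID_KEYS = set(string.ascii_lowercase + string.digits)  # 'a'..'z', '0'..'9'


def group_records(lines, prefix=PREFIX):
    """Group lines by the character that immediately follows the prefix.

    Lines that do not start with the prefix, or whose next character is
    not a lowercase letter or digit, are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(prefix) and len(line) > len(prefix):
            key = line[len(prefix)]  # e.g. 'a' for "//-+alibaba sinage"
            if key in VALID_KEYS:
                groups[key].append(line)
    return dict(groups)


sample = [
    "//-+alibaba sinage",
    "//-+amra damian//_9",
    "//-+anix anire//_",
    "//-+borom",
    "//-+bokima sun drane",
    "//-+ciren",
    "//-+cop calestieon eded",
    "//-+ciciban",
    "//-+drago kimano sole",
]

# prints "a 3", "b 2", "c 3", "d 1", one per line
for key, records in sorted(group_records(sample).items()):
    print(key, len(records))
```

Each list in the returned dict could then be written to key + ".txt" to produce a.txt, b.txt and so on.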

Well, that should be easy with the str.startswith() method and the
constants in the "string" module. A rough sketch would be:

import string  # provides the letter and digit constants

# the lowercase letters and digits: 'abc...xyz0123456789'
alnums = string.ascii_lowercase + string.digits

for alnum in alnums:
    with open(alnum + '.txt', 'w') as outfile:
        with open("myrecords.txt") as infile:
            for line in infile:   # iterate over the records
                if line.startswith("//-+" + alnum):  # check your condition
                    # write the matches to a file
                    outfile.write(line)

However, rather than re-reading the file once for every alnum, you can
iterate over it a single time and check the character after the prefix
(if len(line) > 4: ch = line[4]); if it is a letter or digit, write the
line to the matching file.
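A minimal sketch of that single-pass approach, assuming Python 3, the fixed "//-+" prefix, and that only lowercase a-z and 0-9 are valid keys. The function name split_by_first_char and its out_dir parameter are hypothetical. It keeps a dictionary of open output files keyed by the character after the prefix, so the input is read once instead of 36 times.

```python
import os
import string

PREFIX = "//-+"  # record prefix taken from the example data
VALID_KEYS = set(string.ascii_lowercase + string.digits)  # 'a'..'z', '0'..'9'


def split_by_first_char(in_path, out_dir=".", prefix=PREFIX):
    """Read in_path once and write each record to <out_dir>/<key>.txt.

    Output files are opened lazily, the first time a key is seen, and all
    of them are closed at the end. Returns the sorted list of keys written.
    """
    handles = {}  # key character -> open output file
    try:
        with open(in_path) as infile:
            for line in infile:
                if not line.startswith(prefix) or len(line) <= len(prefix):
                    continue
                ch = line[len(prefix)]  # same as line[4] for "//-+"
                if ch not in VALID_KEYS:
                    continue  # skip uppercase, punctuation, blank suffix
                if ch not in handles:
                    path = os.path.join(out_dir, ch + ".txt")
                    handles[ch] = open(path, "w")
                handles[ch].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

Usage would be something like split_by_first_char("myrecords.txt"). Unlike the loop above, which creates all 36 files, this only creates files for keys that actually occur in the input.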

Cheers,
-- 
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377

Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.



More information about the Python-list mailing list