Filtering content of a text file
Amit Khemka
khemkaamit at gmail.com
Fri Jul 27 06:05:06 EDT 2007
On 7/27/07, Ira.Kovac at gmail.com <Ira.Kovac at gmail.com> wrote:
> Hello All,
>
> I'd greatly appreciate if you can take a look at the task I need help
> with.
>
> It'd be outstanding if someone can provide some sample Python code.
>
> Thanks a lot,
>
> Ira
>
> -------------------------------------------------------------------------------
> Problem
> -------------------------------------------------------------------------------
>
> I am working with 30K+ record datasets in flat file format (.txt) that
> look like this:
>
> //-+alibaba sinage
> //-+amra damian//_9
> //-+anix anire//_
> //-+borom
> //-+bokima sun drane
> //-+ciren
> //-+cop calestieon eded
> //-+ciciban
> //-+drago kimano sole
>
>
> The records start with the same string (in the example //-+) which is
> followed by another string of characters that changes from record
> to record.
>
> I am working on one file at a time and for each file I need to be
> able to do the following:
>
> a) By looping thru the file the program should isolate all records
> that have the letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as a flat text file named a.txt
> d) Repeat a), b) and c) for all letters of the English alphabet (a thru z)
> and numerical values (0 thru 9)
Well, that should be easy if you take a look at the "string" module and
the string methods. A rough sketch would be:
    import string

    # Characters to split on: lowercase letters a-z, then digits 0-9.
    alnums = string.ascii_lowercase + string.digits

    for alnum in alnums:
        # One output file per character: a.txt, b.txt, ..., 9.txt
        with open(alnum + '.txt', 'w') as outfile:
            with open("myrecords.txt") as records:
                for line in records:                      # iterate over the records
                    if line.startswith("//-+" + alnum):   # check your condition
                        outfile.write(line)               # write the matches to a file
However, rather than looping over the whole file once for each alnum, you
may just iterate over the file a single time, check the character after the
prefix (if len(line) > 4: ch = line[4]), and, if it is one of the alnums,
write the line to the matching output file.
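The single-pass idea could be sketched roughly as below. This is only one
way to do it, not tested against the real data: the function names
split_records and write_groups are made up for illustration, and it assumes
the whole file fits comfortably in memory (30K records should). Unlike the
loop above, it only creates output files for characters that actually occur.

```python
import string

PREFIX = "//-+"
ALNUMS = set(string.ascii_lowercase + string.digits)


def split_records(lines, prefix=PREFIX):
    """Group lines by the character that follows the prefix (hypothetical helper)."""
    groups = {}
    for line in lines:
        # Skip lines without the prefix or with nothing after it.
        if line.startswith(prefix) and len(line) > len(prefix):
            ch = line[len(prefix)]
            if ch in ALNUMS:
                groups.setdefault(ch, []).append(line)
    return groups


def write_groups(groups):
    """Write each group to <character>.txt in the current directory."""
    for ch, records in groups.items():
        with open(ch + ".txt", "w") as outfile:
            outfile.writelines(records)
```

With the file name from the sketch above, something like
write_groups(split_records(open("myrecords.txt"))) would read the records
once and produce a.txt, b.txt and so on.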
Cheers,
--
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377
Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.