ignore specific data

pkilambi at gmail.com
Mon Nov 21 16:59:12 EST 2005


I tried the solutions you provided, but they are not as robust as I
thought they would be.
Maybe I should state the problem more clearly.

Here it goes:

I have a bunch of documents, and each document has a header that is
common to all files. I read each file, process it, and compute the
frequency of words in it. Now I want to ignore the header in each file.
That would be easy if the header were always at the top, but apparently
it's not; it could be at the bottom as well. So I want a function that
goes through the file content, ignores the common header, and returns
the remaining text so I can compute the frequencies. Also, the header
is not just one line; it includes licences and all other stuff, and may
be 50 to 60 lines long. This "remove_header" has to be much more
efficient, as the files may be huge. Since this is a very small part of
the whole problem, I don't want it to slow down my entire code.
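
To make the requirement concrete, here is a minimal sketch of what such a function could look like. It rests on assumptions not stated above: that the header text is known in advance, that it appears as one contiguous block, and that it is character-for-character identical in every file. The signature of remove_header and the helper word_frequencies are hypothetical, chosen only for illustration.

```python
from collections import Counter


def remove_header(text, header):
    """Return text with the first verbatim occurrence of header removed.

    Assumes the header is one contiguous block that matches exactly;
    it may sit at the top, the bottom, or anywhere in between.
    """
    start = text.find(header)  # substring search done in C, no Python loop
    if start == -1:
        return text  # header not found: leave the text untouched
    return text[:start] + text[start + len(header):]


def word_frequencies(text):
    """Count whitespace-separated words (hypothetical helper)."""
    return Counter(text.split())


if __name__ == "__main__":
    header = "Copyright notice\nLicence terms\n"
    body = "spam eggs spam\n"
    # The header is stripped whether it comes first or last.
    print(word_frequencies(remove_header(header + body, header)))
    print(word_frequencies(remove_header(body + header, header)))
```

Because the search is a single str.find call over the whole file content, the cost should stay small next to the word counting itself. If the header differs between files in whitespace or line endings, an exact match will miss it, and both strings would have to be normalised first.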



