clear the files using python

Peter Hansen peter at engcorp.com
Mon May 9 08:48:27 EDT 2005


Sez sez:
> Each file's structure as below:
> Comments: This is article 1965 obtained from the website
> Title: Banana Report #65, September 2003
> Author: dylab
> Date: 1st September 2003
> Section: pulse
> 
> In the past month:
> A mass hit North America, cutting electricity to 50 million people
> across the North east
> 
> 
> I'm expected to execute the Python script so that the file looks like
> this:
> 
> pulse, In, the, past, month, A, mass, hit, North, America, cutting,
> electricity, to, 50, million, people, across, the, North east, dylab

You'll need either more examples or a more detailed description.  The 
above could be interpreted as something like "put the pulse section 
first, then exactly 19 words from the following text, removing 
punctuation and line breaks, and taking the last two words together as 
one, then add the 'author' field, and write them all out together with a 
field separator of ', ' (comma plus space)".
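For what it's worth, here is a minimal Python 3 sketch of that literal reading. It assumes things the question doesn't actually say: that each header line has the form "Key: value", that a blank line separates the header from the body, and that a "word" is any run of letters and digits. The names parse_article and literal_interpretation are made up for illustration.

```python
import re

# Sample input, copied from the question above.
SAMPLE = """\
Comments: This is article 1965 obtained from the website
Title: Banana Report #65, September 2003
Author: dylab
Date: 1st September 2003
Section: pulse

In the past month:
A mass hit North America, cutting electricity to 50 million people
across the North east
"""


def parse_article(text):
    """Split an article into a dict of header fields and the body text.

    Assumption: header lines look like 'Key: value' and the first blank
    line separates the header from the body.
    """
    header_text, _, body = text.partition("\n\n")
    fields = {}
    for line in header_text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body


def literal_interpretation(text, word_count=19):
    """Section, then the first word_count body words, then the author."""
    fields, body = parse_article(text)
    # Runs of letters/digits; punctuation and line breaks are dropped.
    words = re.findall(r"[A-Za-z0-9]+", body)[:word_count]
    # Join the last two words into a single field ("North east").
    words[-2:] = [" ".join(words[-2:])]
    return ", ".join([fields["Section"]] + words + [fields["Author"]])


print(literal_interpretation(SAMPLE))
```

Running it reproduces the desired line from the question, but only because the count of 19 was fitted to this one sample; on a different article it would quietly produce something else, which is why more examples or a clearer rule are needed.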

On the other hand, it could be interpreted in a large number of other ways,
and since none of us have any idea what you are trying to do with the 
results, we can't use our own intuition or experience to help.

I also personally find it hard to respond to questions like this with 
real code when there are things about the task which I find very 
surprising.  For example, you're throwing away the date information 
entirely, along with the comments and title.  Is that really intended?

And are the author and section fields always exactly one word, with no 
punctuation?  (What would happen if an author's name was "Hansen, 
Peter"?  How would you format that in the output without getting the 
first name confused with the next field?)
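One common answer to the "Hansen, Peter" problem is to let the standard csv module handle quoting instead of joining fields by hand. A minimal sketch follows; the row is made up for illustration, and note that csv uses a bare comma as its default delimiter rather than the ", " separator in the requested output.

```python
import csv
import io

# Hypothetical record whose last field contains the separator itself.
row = ["pulse", "In", "the", "past", "month", "Hansen, Peter"]

# A plain join makes the author's name look like two separate fields.
naive = ", ".join(row)
print(len(naive.split(", ")))  # 7 fields come back instead of 6

# csv.writer (default QUOTE_MINIMAL) wraps any field containing the
# delimiter in double quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue().rstrip("\r\n")  # writer ends rows with "\r\n"
print(line)  # pulse,In,the,past,month,"Hansen, Peter"

# Reading the line back recovers exactly the original six fields.
print(next(csv.reader([line])) == row)  # True
```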

> Could you please point me in the right direction here, or provide some
> example code? In the meantime I'll be searching myself. I know you
> guys hate novice people like me, but I would appreciate it if you could
> provide a little help here.

We don't "hate" novice people by any means... I suspect you are either 
trying to be self-deprecating or maybe you just haven't read this 
newsgroup for long.  c.l.p actually *loves* novices; it just has a hard 
time with questions that aren't very clear.  Keep trying (and improving!) 
and you'll definitely get the help you need.

And your comment about Python being the best language for this is pretty 
close to the mark... but there are certainly a variety of ways to go 
about the task, and the best approach might depend on a lot of unanswered 
questions.

-Peter



More information about the Python-list mailing list