[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Tue Oct 4 17:27:04 CEST 2011

On 10/04/2011 10:26 AM, lina wrote:
> On Thu, Sep 29, 2011 at 11:28 PM, Dave Angel<d at davea.name>  wrote:
>
>> (Please don't top-post.  Put your remarks AFTER the part you're quoting
>> from the previous message)
>>
>>
>> On 09/29/2011 10:55 AM, lina wrote:
>>
>>> import os.path
>>>
>>> tokens=['E']
>>> result=[]
>>>
>>> for fileName in os.listdir("."):
>>>      if os.path.isfile(fileName) and os.path.splitext(fileName)=="**xpm":
>>>          filedata = open(fileName)
>>>          text=filedata.readlines()
>>>          for line in text:
>>>
>>>
>>> How can I read from line 24 and do further looking for "E".
>>>
>>> Thanks,
>>>
>>>
>>>
>> As I said in my earlier message, this was untested.  It gave you the
>> building blocks, but was not correct.
>>
>> In particular, that if-test will always fail, so you're not seeing any
>> files.
>>
>>
>> import os.path
>>
>> tokens=['E']
>> result=[]
>>
>> for fileName in os.listdir("."):
>>
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         for line in text:
>>             print line
>>
>>
>> Once you've tested that, then you're ready to just look at line 24.
>>
>> text is a list, so you can refer to line 24 as text[23]  (indexes start
>> at 0)
>>
>> Or you can get lines 24-28, with  text[23:28]   (look up slices in the
>> Python doc)
>>
>> ==
>> DaveA
>>
>>
> Thanks for former help,
> but I wonder how to output (write) the final result in each respectively
> fileName with just different extension, such as original a.xpm write to
> a.txt
>
> Thanks,
>
> #!/bin/python
>
> import os.path
>
> tokens=['E']
> result=[]
>
> for fileName in os.listdir("."):
>      if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>          filedata = open(fileName)
>          text=filedata.readlines()
>          for line in text[23:len(text)-1]:
>              result.append({t:line.count(t) for t in tokens})
>          for index,r in enumerate(result):
>              fileName.txt.write(index,"-----",r)
> ???
>

  for line in text[23:len(text)-1]:

probably doesn't do what you expect.  It'll start at the 24th line, but
it won't include the last line.  Slicing uses half-open intervals, same
as range.  So you don't want the -1 on that line.

Fortunately, all you have to do is use the default second parameter,

  for line in text[23:]:
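
To see the difference for yourself, here is a small sketch.  The list
below is made-up stand-in data, not your real file, and the print calls
use Python 3 syntax.

```python
# Stand-in for the lines of one file: "line 1" through "line 30".
text = ["line %d" % n for n in range(1, 31)]

tail_short = text[23:len(text) - 1]   # stops just before the last line
tail_full = text[23:]                 # runs through the last line

print(tail_short[0], tail_short[-1])  # first and last item of each slice
print(tail_full[0], tail_full[-1])
print(text[23:28])                    # lines 24 through 28 only
```

Both slices start at the 24th line; only the second one reaches the end.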

Now I have no idea why you want such a complex structure in result, but 
I'll ignore that for the moment.
You want to know how to write an output file. Just like an input file, 
you first have to open it (in 'w' mode).
       outfile = open(newfilename,  "w")
will give you a file object, just like filedata did for the input file.
So you would then do outfile.write(somedata)  as needed.  Notice that if 
you use write(), it does NOT put newlines in.  That's up to you.

Note also that opening a file with "w" truncates (empties) any existing
file of the same name.  So you want to thoroughly test your transformation
code before running the actual command.
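
As a rough sketch of the output side: output_name is a hypothetical
helper, the counts list is made-up data, and the temporary directory is
only there so that "w" mode can't clobber anything real while testing.
The output format is just one guess at what you meant.

```python
import os
import tempfile

def output_name(file_name, new_ext=".txt"):
    # Hypothetical helper: swap the extension, e.g. "a.xpm" -> "a.txt".
    return os.path.splitext(file_name)[0] + new_ext

# Throwaway directory, so nothing real gets overwritten.
work_dir = tempfile.mkdtemp()
new_file_name = os.path.join(work_dir, output_name("a.xpm"))

counts = [{"E": 2}, {"E": 0}]   # made-up per-line counts

with open(new_file_name, "w") as outfile:
    for index, r in enumerate(counts):
        # write() takes one string and adds no newline of its own.
        outfile.write("%d ----- %s\n" % (index, r))

with open(new_file_name) as check:
    written = check.read()
print(written)
```

The with statement closes the file for you, which also makes sure the
data actually gets flushed to disk.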

Now back to your result.append line.  As it stands now, result contains 
the results for all the files you've processed so far.  In other words, 
as you process multiple files, it'll get larger and larger.  If you're 
writing the data out to multiple files, that isn't likely what you want.
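
One way around that, sketched under assumptions: build a fresh result
for each file inside the loop.  count_per_line is a hypothetical helper,
and the two tiny files are invented so the example can run anywhere.

```python
import os
import tempfile

tokens = ["E"]

def count_per_line(lines, tokens):
    # One dict per line, mapping each token to its count in that line.
    return [{t: line.count(t) for t in tokens} for line in lines]

# Two made-up files standing in for the real .xpm data.
work_dir = tempfile.mkdtemp()
for name, body in [("a.xpm", "EEB\nBBE\n"), ("b.xpm", "BBB\n")]:
    with open(os.path.join(work_dir, name), "w") as f:
        f.write(body)

sizes = {}
for file_name in sorted(os.listdir(work_dir)):
    # listdir() returns bare names, so join them with the directory.
    path = os.path.join(work_dir, file_name)
    if os.path.isfile(path) and os.path.splitext(file_name)[1] == ".xpm":
        with open(path) as filedata:
            text = filedata.readlines()
        result = count_per_line(text, tokens)   # fresh list for this file
        sizes[file_name] = len(result)

print(sizes)
```

Each file's result now holds only that file's lines, so its length
matches the file rather than growing from one file to the next.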

But I also thought you wanted to count the occurrences of  each token by 
column, and you're counting them by rows.  That count method will return 
how many are in that particular line.

Perhaps this would be clearer if your data weren't square.  If you had 10
lines with 3 characters in each, it might be more obvious.  I assume you
would then want 3 result counts.
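
If counting by column is what you're after, here is a minimal sketch.
The ten lines are made-up data, and it assumes single-character tokens
and lines of equal length (zip stops at the shortest line).

```python
tokens = ["E"]

# Ten made-up lines of three characters each.
lines = ["EBB\n", "EEB\n", "BBB\n", "EBE\n", "BEB\n",
         "EBB\n", "BBB\n", "EEE\n", "BBE\n", "EBB\n"]

# Drop the newlines so they don't show up as a fourth column.
rows = [line.rstrip("\n") for line in lines]

# zip(*rows) regroups the characters by column instead of by row.
column_counts = [{t: column.count(t) for t in tokens}
                 for column in zip(*rows)]

print(len(column_counts))
print(column_counts)
```

Ten rows of three characters give three dicts, one per column.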

DaveA