[Tutor] a quick Q: how to use for loop to read a series of files with .doc end
Dave Angel
davea at ieee.org
Tue Oct 4 17:27:04 CEST 2011
On 10/04/2011 10:26 AM, lina wrote:
> On Thu, Sep 29, 2011 at 11:28 PM, Dave Angel<d at davea.name> wrote:
>
>> (Please don't top-post. Put your remarks AFTER the part you're quoting
>> from the previous message)
>>
>>
>> On 09/29/2011 10:55 AM, lina wrote:
>>
>>> import os.path
>>>
>>> tokens=['E']
>>> result=[]
>>>
>>> for fileName in os.listdir("."):
>>>     if os.path.isfile(fileName) and os.path.splitext(fileName)=="**xpm":
>>>         filedata = open(fileName)
>>>         text=filedata.readlines()
>>>         for line in text:
>>>
>>>
>>> How can I read from line 24 and do further looking for "E".
>>>
>>> Thanks,
>>>
>>>
>>>
>> As I said in my earlier message, this was untested. It gave you the
>> building blocks, but was not correct.
>>
>> In particular, that if-test will always fail, so you're not seeing any
>> files.
>>
>>
>> import os.path
>>
>> tokens=['E']
>> result=[]
>>
>> for fileName in os.listdir("."):
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         for line in text:
>>             print line
>>
>>
>> Once you've tested that, then you're ready to just look at line 24.
>>
>> text is a list, so you can refer to line 24 as text[23] (list indices
>> start at 0)
>>
>> Or you can get lines 24-28 with text[23:28] (look up slices in the
>> Python doc)
>>
>> ==
>> DaveA
>>
>>
> Thanks for the earlier help,
> but I wonder how to write the final result for each file to an output
> file with the same name but a different extension, e.g. the result for
> a.xpm written to a.txt
>
> Thanks,
>
> #!/bin/python
>
> import os.path
>
> tokens=['E']
> result=[]
>
> for fileName in os.listdir("."):
>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>         filedata = open(fileName)
>         text=filedata.readlines()
>         for line in text[23:len(text)-1]:
>             result.append({t:line.count(t) for t in tokens})
>         for index,r in enumerate(result):
>             fileName.txt.write(index,"-----",r)
>             ???
>
    for line in text[23:len(text)-1]:

probably doesn't do what you expect. It'll start at the 24th line, but
it won't include the last line. Slicing uses half-open intervals, same
as range. So you don't want the -1 on that line.

Fortunately, all you have to do is use the default second parameter,

    for line in text[23:]:

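
To make the difference concrete, here is a small sketch. The 30-line list is made up for illustration and simply stands in for what readlines() would return.

```python
# Hypothetical stand-in for the list returned by filedata.readlines():
# 30 numbered lines, so text[0] is "line 1\n" and text[23] is "line 24\n".
text = ["line %d\n" % n for n in range(1, 31)]

# The end index of a slice is excluded, so this stops before the last line.
with_minus_one = text[23:len(text) - 1]

# Leaving the end index out runs the slice through the last line.
to_the_end = text[23:]

print("with -1: %s to %s" % (with_minus_one[0].strip(), with_minus_one[-1].strip()))
print("without: %s to %s" % (to_the_end[0].strip(), to_the_end[-1].strip()))
```

The first slice ends one line short of the end of the list, while the second includes the final line.
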
Now I have no idea why you want such a complex structure in result, but
I'll ignore that for the moment.

You want to know how to write an output file. Just like an input file,
you first have to open it (in 'w' mode).

    outfile = open(newfilename, "w")

will give you a file object, just like filedata did for the input file.
So you would then do outfile.write(somedata) as needed. Notice that if
you use write(), it does NOT put newlines in. That's up to you.

Note also that opening a file with "w" wipes out the contents of any
existing file of the same name. So you want to thoroughly test your
transformation code before running the actual command.

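
One possible way to put those pieces together is sketched below. The helper name output_name, the temporary directory, and the sample counts are all made up for the example; only open(), write() and os.path.splitext() come from the discussion above.

```python
import os
import tempfile

def output_name(file_name, new_ext=".txt"):
    """Return file_name with its extension replaced, e.g. a.xpm -> a.txt.

    Hypothetical helper, not part of the original script.
    """
    root, _old_ext = os.path.splitext(file_name)
    return root + new_ext

# Work in a throwaway directory so that "w" mode cannot clobber real data
# while the code is still being tested.
work_dir = tempfile.mkdtemp()
new_file_name = os.path.join(work_dir, output_name("a.xpm"))

sample_counts = [3, 0, 5]   # made-up per-line counts

with open(new_file_name, "w") as outfile:
    for index, count in enumerate(sample_counts):
        # write() takes a single string and adds no newline of its own.
        outfile.write("%d ----- %d\n" % (index, count))

with open(new_file_name) as check:
    written = check.read()
print(written)
```

Note that write() takes one string, so the index, the separator and the count have to be formatted into a single string first.
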
Now back to your result.append line. As it stands now, result contains
the results for all the files you've processed so far. In other words,
as you process multiple files, it'll get larger and larger. If you're
writing the data out to multiple files, that isn't likely what you want.

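
If per-file results are what you want, one option (only a sketch, with a made-up function name and made-up data) is to build a fresh list for each file rather than appending to one shared list:

```python
tokens = ["E"]

def count_per_line(lines, tokens):
    """Return one {token: count} dict per line, in a brand-new list.

    Hypothetical helper; the list is created on every call, so nothing
    carries over from one file to the next.
    """
    result = []
    for line in lines:
        result.append(dict((t, line.count(t)) for t in tokens))
    return result

# Two made-up "files", given as lists of lines instead of real .xpm data.
file_a = ["EEA\n", "AEA\n"]
file_b = ["AAA\n", "EEE\n", "AAE\n"]

result_a = count_per_line(file_a, tokens)
result_b = count_per_line(file_b, tokens)

print("file_a has %d results, file_b has %d" % (len(result_a), len(result_b)))
```

Each call starts from an empty list, so the results for the second file do not include anything from the first.
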
But I also thought you wanted to count the occurrences of each token by
column, and you're counting them by rows. That count method will return
how many are in that particular line.

Perhaps this would be clearer if your data weren't square. If you had 10
lines with 3 characters in each, it might be more obvious. I assume you
would then want 3 result counts.

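
Here is a sketch of that 10-by-3 case. The data is invented, and the zip(*lines) trick is just one way to get at the columns. It assumes every line has the same length (zip stops at the shortest one) and that each token is a single character.

```python
tokens = ["E"]

# Made-up data: 10 lines, 3 characters each (newlines already stripped).
lines = ["EAA", "EEA", "AEA", "EAE", "AAA",
         "EEE", "AEA", "EAA", "AAE", "EEA"]

# zip(*lines) regroups the characters by position: the first tuple holds
# the first character of every line, the second tuple the second, etc.
columns = list(zip(*lines))

# One dict per column versus one dict per row.
column_counts = [dict((t, col.count(t)) for t in tokens) for col in columns]
row_counts = [dict((t, line.count(t)) for t in tokens) for line in lines]

print("%d column results, %d row results" % (len(column_counts), len(row_counts)))
```

Counting by column gives 3 results for this data, while counting by row gives 10, one per line.
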
DaveA