[Tutor] a quick Q: how to use for loop to read a series of files with .doc end
lina
lina.lastname at gmail.com
Tue Oct 4 17:48:01 CEST 2011
On Tue, Oct 4, 2011 at 11:27 PM, Dave Angel <davea at ieee.org> wrote:
> On 10/04/2011 10:26 AM, lina wrote:
>
>> On Thu, Sep 29, 2011 at 11:28 PM, Dave Angel<d at davea.name> wrote:
>>
>> (Please don't top-post. Put your remarks AFTER the part you're quoting
>>> from the previous message)
>>>
>>>
>>> On 09/29/2011 10:55 AM, lina wrote:
>>>
>>> import os.path
>>>>
>>>> tokens=['E']
>>>> result=[]
>>>>
>>>> for fileName in os.listdir("."):
>>>> if os.path.isfile(fileName) and os.path.splitext(fileName)=="***
>>>> *xpm":
>>>>
>>>> filedata = open(fileName)
>>>> text=filedata.readlines()
>>>> for line in text:
>>>>
>>>>
>>>> How can I start reading at line 24 and then look for "E"?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> As I said in my earlier message, this was untested. It gave you the
>>> building blocks, but was not correct.
>>>
>>> In particular, that if-test will always fail, so you're not seeing any
>>> files.
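A short sketch of why a test that compares the whole return value of os.path.splitext with a string can never succeed: the function returns a (root, extension) tuple. The file name "a.xpm" is only an illustration.

```python
import os.path

# os.path.splitext returns a (root, extension) tuple, not a string.
parts = os.path.splitext("a.xpm")
print(parts)               # ('a', '.xpm')

# Comparing the whole tuple with a string is always False.
print(parts == ".xpm")     # False

# Comparing only the extension (index 1) works.
print(parts[1] == ".xpm")  # True
```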
>>>
>>>
>>> import os.path
>>>
>>> tokens=['E']
>>> result=[]
>>>
>>> for fileName in os.listdir("."):
>>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>>         filedata = open(fileName)
>>>         text=filedata.readlines()
>>>         for line in text:
>>>             print line
>>>
>>>
>>> Once you've tested that, then you're ready to just look at line 24.
>>>
>>> text is a list, and lists are indexed from zero, so line 24 is text[23].
>>>
>>> Or you can get lines 24-28 with text[23:28] (look up slices in the
>>> Python doc)
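As a rough illustration of indexing and slicing a list of lines: Python lists are indexed from zero, so the 24th line sits at index 23, and a slice stops just before its second index. The list below is made-up data standing in for what readlines() would return.

```python
# Stand-in for the list returned by readlines(): 30 numbered lines.
text = ["line %d\n" % n for n in range(1, 31)]

line_24 = text[23]            # zero-based: index 23 is the 24th line
lines_24_to_28 = text[23:28]  # stop index 28 is excluded, so 5 lines

print(line_24.strip())
print([s.strip() for s in lines_24_to_28])
```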
>>>
>>> ==
>>> DaveA
>>>
>>>
>>> Thanks for the earlier help,
>>>
>> but I wonder how to write the final result for each file to an output
>> file with the same name and a different extension, e.g. the result for
>> a.xpm written to a.txt.
>>
>> Thanks,
>>
>> #!/bin/python
>>
>> import os.path
>>
>> tokens=['E']
>> result=[]
>>
>> for fileName in os.listdir("."):
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         for line in text[23:len(text)-1]:
>>             result.append({t:line.count(t) for t in tokens})
>>         for index,r in enumerate(result):
>>             fileName.txt.write(index,"-----",r)
>> ???
>>
>>
> for line in text[23:len(text)-1]:
>
> probably doesn't do what you expect. It'll start at the 24th line, but it
> won't include the last line. Slicing uses half-open intervals, same as
> range. So you don't want the -1 on that line.
>
Yes. It starts from the 24th line; the first 23 are irrelevant for the
analysis. I made a mistake: it should be text[23:len(text)]
>
> Fortunately, all you have to do is use the default second parameter,
>
> for line in text[23:]:
>
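A small sketch of the difference between the two slices, using a made-up list of 26 items in place of the real file contents:

```python
text = ["row %d" % n for n in range(1, 27)]  # 26 hypothetical lines

with_minus_one = text[23:len(text) - 1]  # stops before the final line
open_ended = text[23:]                   # runs through the final line

print(with_minus_one)  # ['row 24', 'row 25']
print(open_ended)      # ['row 24', 'row 25', 'row 26']
```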
> Now I have no idea why you want such a complex structure in result, but
> I'll ignore that for the moment.
> You want to know how to write an output file. Just like an input file, you
> first have to open it (in 'w' mode).
> outfile = open(newfilename, "w")
> will give you a file object, just like filedata did for the input file.
> So you would then do outfile.write(somedata) as needed. Notice that if
> you use write(), it does NOT put newlines in. That's up to you.
>
> Note also that opening a file with "w" deletes an existing file of the same
> name. So you want to thoroughly test your transformation code before
> running the actual command.
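A minimal sketch of opening a file in "w" mode and writing to it. To avoid overwriting anything real, it writes into a temporary directory; the name a.txt and the two lines of text are placeholders.

```python
import os
import tempfile

# Write into a temporary directory so no real file is overwritten.
out_dir = tempfile.mkdtemp()
newfilename = os.path.join(out_dir, "a.txt")  # hypothetical output name

# "w" creates the file, or empties it if it already exists.
with open(newfilename, "w") as outfile:
    outfile.write("first line\n")   # write() adds no newline by itself
    outfile.write("second line\n")

# Read the file back to check what was written.
with open(newfilename) as infile:
    written = infile.read()
print(written)
```

The with statement closes the file when the block ends, which also makes sure the data is flushed to disk.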
>
Yes,
for fileName in os.listdir("."):
    result=[]
    if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
        filedata = open(fileName)
        text=filedata.readlines()
        for line in text[0:]:
            result.append({t:line.strip().count(t) for t in tokens})
        for index,r in enumerate(result):
            outfiledata=open("fileName.txt","w").write(index,"-----",r)
I still have a problem using the value of fileName:
the output file here is literally fileName.txt, not the value of fileName
with a .txt extension, which should be 1.txt for the input 1.xpm.
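One possible way to get 1.txt from 1.xpm is to build the name from the variable rather than putting "fileName" inside the quotes. The helper output_name below is a made-up name for illustration, and the counts in result are invented. Note also that write() accepts a single string, so the pieces have to be formatted into one string first.

```python
import os.path

def output_name(fileName, new_ext=".txt"):
    """Return fileName with its extension replaced by new_ext."""
    root, _old_ext = os.path.splitext(fileName)
    return root + new_ext

print(output_name("1.xpm"))  # 1.txt

# write() takes one string, so format index and r into a single line.
result = [{"E": 1}, {"E": 2}]  # invented counts, for illustration only
lines = ["%d-----%s\n" % (index, r) for index, r in enumerate(result)]
print(lines[0])
```

With this, the output file would be opened once per input file, for example with open(output_name(fileName), "w"), and each formatted line written to it inside the loop.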
>
> Now back to your result.append line. As it stands now, result contains the
> results for all the files you've processed so far. In other words, as you
> process multiple files, it'll get larger and larger. If you're writing the
> data out to multiple files, that isn't likely what you want.
>
Thanks for the reminder; I now reset result=[] inside the loop over files.
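A sketch of the effect of resetting result for each file. To keep it runnable without any .xpm files, the file contents are made-up strings held in a dictionary (the files and per_file names are hypothetical).

```python
tokens = ["E"]

# Hypothetical file contents keyed by name, standing in for files on disk.
files = {"1.xpm": ["aaEbb", "aEEbb"], "2.xpm": ["EaEbE"]}

per_file = {}
for fileName, text in files.items():
    result = []  # fresh list for every file
    for line in text:
        result.append({t: line.strip().count(t) for t in tokens})
    per_file[fileName] = result

print(per_file["1.xpm"])  # [{'E': 1}, {'E': 2}]
print(per_file["2.xpm"])  # [{'E': 3}]
```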
>
> But I also thought you wanted to count the occurrences of each token by
> column, and you're counting them by rows. That count method will return how
> many are in that particular line.
>
Yes, I did count within each row, which was a big mistake.
Can you tell me how to count by column instead? With a transpose?
>
> Perhaps this would be clearer if your data wasn't square. If you had 10
> lines with 3 characters in each, it might be more obvious. I assume you
> would then want 3 result counts.
>
It's an n*n matrix. For now I'm using this one as an example:
aaEbb
aEEbb
EaEbb
EaEbE
Thanks in advance for any further suggestions,
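One way to count by column, as a sketch rather than the only approach: zip(*rows) transposes the rows so that each tuple holds one column. The rows are the example lines above. zip stops at the shortest row, so this assumes all rows have the same length.

```python
rows = ["aaEbb", "aEEbb", "EaEbb", "EaEbE"]
tokens = ["E"]

# zip(*rows) yields one tuple per column (a transpose of the rows).
columns = ["".join(col) for col in zip(*rows)]

# Count each token within every column instead of every row.
counts = [{t: col.count(t) for t in tokens} for col in columns]

print(columns)  # ['aaEE', 'aEaa', 'EEEE', 'bbbb', 'bbbE']
print(counts)   # [{'E': 2}, {'E': 1}, {'E': 4}, {'E': 0}, {'E': 1}]
```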
>
> DaveA
>
>
--
Best Regards,
lina