[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Tue Oct 4 17:48:01 CEST 2011

On Tue, Oct 4, 2011 at 11:27 PM, Dave Angel <davea at ieee.org> wrote:

> On 10/04/2011 10:26 AM, lina wrote:
>
>> On Thu, Sep 29, 2011 at 11:28 PM, Dave Angel<d at davea.name>  wrote:
>>
>>  (Please don't top-post.  Put your remarks AFTER the part you're quoting
>>> from the previous message)
>>>
>>>
>>> On 09/29/2011 10:55 AM, lina wrote:
>>>
>>>  import os.path
>>>>
>>>> tokens=['E']
>>>> result=[]
>>>>
>>>> for fileName in os.listdir("."):
>>>>     if os.path.isfile(fileName) and os.path.splitext(fileName)==".xpm":
>>>>
>>>>         filedata = open(fileName)
>>>>         text=filedata.readlines()
>>>>         for line in text:
>>>>
>>>>
>>>> How can I read from line 24 and do further looking for "E".
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>>  As I said in my earlier message, this was untested.  It gave you the
>>> building blocks, but was not correct.
>>>
>>> In particular, that if-test will always fail, so you're not seeing any
>>> files.
>>>
>>>
>>> import os.path
>>>
>>> tokens=['E']
>>> result=[]
>>>
>>> for fileName in os.listdir("."):
>>>
>>>    if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>>
>>>        filedata = open(fileName)
>>>        text=filedata.readlines()
>>>        for line in text:
>>>            print line
>>>
>>>
>>> Once you've tested that, then you're ready to just look at line 24.
>>>
>>> text is a list, so you can refer to line 24 as text[23]  (list indexes
>>> start at 0)
>>>
>>> Or you can get lines 24-28, with  text[23:28]   (look up slices in the
>>> Python doc)
>>>
>>> ==
>>> DaveA
>>>
>>>
>> Thanks for the earlier help,
>>
>> but I wonder how to write the final result for each file to a file with
>> the same name and a different extension, e.g. results for a.xpm written
>> to a.txt
>>
>> Thanks,
>>
>> #!/bin/python
>>
>> import os.path
>>
>> tokens=['E']
>> result=[]
>>
>> for fileName in os.listdir("."):
>>     if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>         filedata = open(fileName)
>>         text=filedata.readlines()
>>         for line in text[23:len(text)-1]:
>>             result.append({t:line.count(t) for t in tokens})
>>         for index,r in enumerate(result):
>>             fileName.txt.write(index,"-----",r)
>> ???
>>
>>
>  for line in text[23:len(text)-1]:
>
> probably doesn't do what you expect.  It'll start at the 24th line, but it
> won't include the last line.  Slicing uses half-open intervals, same as
> range.  So you don't want the -1 on that line.
>

Yes. It starts from the 24th line; the first 23 are irrelevant to the
analysis. I made a mistake: it should be text[23:len(text)]
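
A small sketch of the slicing behaviour under discussion, written for
Python 3. The 30-element list is made up and only stands in for the result of
filedata.readlines(); the variable names are illustrative.

```python
# Invented stand-in for filedata.readlines(): 30 numbered lines.
text = ["line %d\n" % n for n in range(1, 31)]

without_last = text[23:len(text) - 1]   # half-open: stops before the final line
explicit_end = text[23:len(text)]       # runs through the final line
default_end = text[23:]                 # same slice, end index left at its default

# Index 23 is the 24th line because list indexes start at 0.
print(without_last[0].strip(), "to", without_last[-1].strip())
print(default_end[0].strip(), "to", default_end[-1].strip())
print(explicit_end == default_end)
```

The first slice silently loses the last line, which is why the -1 has to go.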

>
> Fortunately, all you have to do is use the default second parm,
>
>  for line in text[23:]:
>
> Now I have no idea why you want such a complex structure in result, but
> I'll ignore that for the moment.
> You want to know how to write an output file. Just like an input file, you
> first have to open it (in 'w' mode).
>      outfile = open(newfilename,  "w")
> will give you a file object, just like filedata did for the input file.
> So you would then do outfile.write(somedata)  as needed.  Notice that if
> you use write(), it does NOT put newlines in.  That's up to you.
>
> Note also that opening a file with "w" deletes an existing file of the same
> name.  So you want to thoroughly test your transformation code before
> running the actual command.
>
Yes,

for fileName in os.listdir("."):
    result=[]
    if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
        filedata = open(fileName)
        text=filedata.readlines()
        for line in text[0:]:
            result.append({t:line.strip().count(t) for t in tokens})
        for index,r in enumerate(result):
           outfiledata=open("fileName.txt","w").write(index,"-----",r)
I still have a problem using the value of fileName:

here the output file is literally named fileName.txt, not 1.txt for the
input 1.xpm, which is what I intended.
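
The literal name appears because "fileName.txt" is a string constant, not the
variable. One possible way to build the name from the variable is sketched
below for Python 3. The helper names output_name and write_counts are
hypothetical, not something from the thread.

```python
import os.path

def output_name(file_name, new_ext=".txt"):
    """Return file_name with its extension replaced, e.g. '1.xpm' -> '1.txt'."""
    # splitext returns a (root, ext) tuple, e.g. ('1', '.xpm').
    root, _ext = os.path.splitext(file_name)
    return root + new_ext

def write_counts(file_name, result):
    """Write one 'index ----- counts' line per entry of result."""
    # Open once, before the loop: mode "w" truncates the file on every open.
    with open(output_name(file_name), "w") as outfile:
        for index, r in enumerate(result):
            # write() takes a single string and adds no newline of its own.
            outfile.write("%d ----- %s\n" % (index, r))
```

Inside the loop over files this would be called as write_counts(fileName,
result) once the counts for that file are complete. Opening the output file
inside the enumerate loop, as in the code above, would leave only the last
entry in the file.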

>
> Now back to your result.append line.  As it stands now, result contains the
> results for all the files you've processed so far.  In other words, as you
> process multiple files, it'll get larger and larger.  If you're writing the
> data out to multiple files, that isn't likely what you want.
>
Thanks for the reminder; I have now put result=[] inside the for loop over
files.
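
A minimal illustration of that change in Python 3, with two invented "files"
held in a dictionary so that nothing is read from disk. The names files and
per_file are assumptions made for the example.

```python
tokens = ["E"]
# Invented stand-ins for the lines of two input files.
files = {"1.xpm": ["aE", "EE"], "2.xpm": ["bb"]}

per_file = {}
for file_name, lines in sorted(files.items()):
    result = []   # reset per file, so counts from earlier files do not carry over
    for line in lines:
        result.append({t: line.count(t) for t in tokens})
    per_file[file_name] = result

print(per_file)
```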

>
> But I also thought you wanted to count the occurrences of  each token by
> column, and you're counting them by rows.  That count method will return how
> many are in that particular line.
>
Yes, I did count within each row, which was a big mistake.
Can you tell me how to count by column instead? By transposing?

>
> Perhaps this would be clearer if your data wasn't square.  if you had 10
> lines with 3 characters in each, it might be more obvious.  I assume you
> would then want 3 result counts.
>
It's an n*n matrix. For now I am using this one as an example:
aaEbb
aEEbb
EaEbb
EaEbE
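
One common way to count by column is to transpose the rows with zip(*rows).
The sketch below, for Python 3, applies it to the example matrix above. It
assumes each row has already had its trailing newline stripped and that all
rows have the same length, since zip stops at the shortest row.

```python
tokens = ["E"]
rows = ["aaEbb", "aEEbb", "EaEbb", "EaEbE"]

# zip(*rows) yields one tuple per column: ('a', 'a', 'E', 'E'), ...
columns = list(zip(*rows))

# Same dictionary-per-entry structure as before, but one entry per column.
column_counts = [{t: col.count(t) for t in tokens} for col in columns]

for index, counts in enumerate(column_counts):
    print(index, "-----", counts)
```

For this matrix the "E" counts per column come out as 2, 1, 4, 0 and 1.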

Thanks in advance for any further suggestions,

>
> DaveA
>
>

-- 
Best Regards,

lina