Opening multiple Files in Different Encoding

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Jul 11 16:55:31 EDT 2012


On 11 July 2012 19:15, <subhabangalore at gmail.com> wrote:

> On Tuesday, July 10, 2012 11:16:08 PM UTC+5:30, Subhabrata wrote:
> > Dear Group,
> >
> > I kept a good number of files in a folder. Now I want to read all of
> > them. They are in different formats and different encoding. Using
> > listdir/glob.glob I am able to find the list but how to open/read or
> > process them for different encodings?
> >
> > If any one can help me out.I am using Python3.2 on Windows.
> >
> > Regards,
> > Subhabrata Banerjee.
> Dear Group,
>
> No generally I know the glob.glob or the encodings as I work lot on
> non-ASCII stuff, but I recently found an interesting issue, suppose there
> are .doc,.docx,.txt,.xls,.pdf files with different encodings.


Some of the formats you have listed are not text-based. What do you mean by
the encoding of e.g. a .doc or .xls file?

My understanding is that these are binary files. You won't be able to read
their text without the help of a module that understands each format (I
don't know of one that can).
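
As a rough illustration of why these formats are not plain text, you can
look at the first few bytes of each file instead of trusting the extension.
The sketch below is only a guess at what might be useful here: the helper
names sniff_container and sniff_file are made up for this example, and it
checks just three well-known signatures (PDF, ZIP as used by .docx/.xlsx,
and the OLE2 container used by legacy .doc/.xls). Anything else is reported
as "unknown", which may well be a text file.

```python
# Hypothetical helpers; the names are illustrative, not from any library.
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"                           # .docx/.xlsx are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # legacy .doc/.xls container


def sniff_container(header):
    """Guess a container format from the leading bytes of a file."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"   # possibly plain text, possibly some other binary format


def sniff_file(path):
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

A signature match only tells you the container, not what is inside it: a
"zip" result could be a .docx, a .xlsx or an ordinary archive.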


> 1) First I have to determine on the fly the file type.
> 2) I can not assign encoding="..." whatever be the encoding I have to read
> it.
>

Perhaps you just want to open the file as binary? The following will read
the contents of any file, binary or text, regardless of encoding or anything
else:

# 'rb' opens the file in binary mode, so no decoding is attempted
with open('spreadsheet.xls', 'rb') as f:
    data = f.read()   # returns bytes rather than text
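
If the file does turn out to be text, the bytes still need decoding at some
point. One possible approach, sketched here under assumptions, is to try a
short list of candidate encodings in order and keep the first one that
decodes without error. The function name decode_with_fallback and the
candidate list ("utf-8", then "cp1252") are my own choices for the example,
not something from your setup. Note that a successful decode does not prove
the guess was right; it only means the bytes were valid in that encoding.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_name) for the first candidate that decodes.

    Hypothetical helper: the candidate list is an assumption and should be
    replaced with the encodings you actually expect to meet.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue   # try the next candidate
    # latin-1 maps every byte to a character, so this last step never fails,
    # but the result may be meaningless for files in other encodings.
    return data.decode("latin-1"), "latin-1"
```

For example, decode_with_fallback(data) can be called on the bytes read
above; for .doc, .xls or .pdf content the result will be garbage, since
those files are not encoded text in the first place.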


>
> Any idea. Thinking.
>
> Thanks in Advance,
> Regards,
> Subhabrata Banerjee.
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>