XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'

Wed Sep 29 08:11:49 EDT 2021

On 29/09/2021 13.10, hongy... at gmail.com wrote:
> On Wednesday, September 29, 2021 at 5:40:58 PM UTC+8, J.O. Aho wrote:
>> On 29/09/2021 10.22, hongy... at gmail.com wrote:
>>> I tried to convert an xls file into csv with the following command, but it failed:
>>>
>>> $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
>>> XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
>>>
>>> The test file above is located here [1].
>>>
>>> [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
>>>
>>> Any hints for fixing this problem?
>> You need to delete the first 13 lines in the file
> 
> Yes. After deleting the top 3 lines, the problem has been fixed.
> 
>> or make sure that your code trims the data first, before it starts parsing the XML.
> 
> Yes. I really want to do this trick programmatically, but how do I do it without manually editing the file?

You could do something like loading the XML into a string (myxmlstr) and 
then finding the first < in that string:

xmlstart = myxmlstr.find('<')  # -1 if there is no '<' at all

xmlstr = myxmlstr[xmlstart:] if xmlstart != -1 else myxmlstr

Then pass xmlstr to the XML parser. Sure, it's not as convenient as 
loading the file directly into the XML parser.
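A minimal, self-contained sketch of the same idea, assuming that everything from the first < onward is well-formed XML and that the file uses an ASCII-compatible encoding (UTF-8, Latin-1, etc.). The function name parse_after_leading_junk is just an illustrative, hypothetical name. It reads raw bytes instead of a str so that any encoding given in an <?xml ...?> declaration is still honoured by the parser:

```python
import xml.etree.ElementTree as ET


def parse_after_leading_junk(path):
    """Parse an XML file that has junk (e.g. blank CRLF lines) before the first '<'.

    Hypothetical helper; assumes everything from the first '<' onward is
    well-formed XML in an ASCII-compatible encoding.
    """
    # Read raw bytes so the parser can apply any encoding declared in <?xml ...?>.
    with open(path, "rb") as f:
        data = f.read()

    start = data.find(b"<")
    if start == -1:
        # Nothing that looks like markup at all; fail loudly instead of
        # silently slicing from index -1.
        raise ValueError(f"no '<' found in {path!r}; is this really XML?")

    # Drop everything before the first '<' and hand the rest to the parser.
    return ET.fromstring(data[start:])
```

Usage would be something like root = parse_after_leading_junk("2021-2022-1.xls"), then walking root as usual. If the trimmed content turns out to be HTML rather than well-formed XML, ElementTree will reject it and an HTML parser would be needed instead.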

I'm not saying this is the best way of doing it; I'm sure some Python wiz 
here has a smarter solution.

-- 

  //Aho