Need help reading damaged file

Anton Vredegoor anton at vredegoor.doge.nl
Mon Oct 14 12:29:10 EDT 2002


On Mon, 14 Oct 2002 11:21:49 -0400, PoulsenL at capanalysis.com wrote:

>I have about 100+ files that are a dump of old tape from a database.  Most
>of the data is good, but it is interspersed with damage that contains
>backspace characters and I _believe_ EOF characters.  When we try to import
>the data it only imports 1/3 or 1/10, etc of the data depending on the file.
>I can pull it up in WinEdt and see that it contains far more lines.  I
>created a script that reads through the file and counts the lines (not the
>most efficient script in the world, I'm sure).  The problem is that the
>script suffers from the same problem as the import utility.  It stops far
>short of the end of the file.  Any help would be appreciated.
>
>Here is the script:
>
>import glob, os
>
>def countlines(a,b,c):
>    for file in c:
>        if os.path.isfile(b + '\\' + file):
>            input = open(b + '\\' + file)
>            print b + '\\' + file
>            x = 0
>            for y in input:
>                x += 1 
>            print x
>        
>os.path.walk('\\\\Server\\Dir\\',countlines, None)

From this it seems the files are being opened in text mode. I am not
exactly sure what the situation is, but on Windows a text-mode read
stops at the first Ctrl-Z (EOF) character, so I would try:

>            input = open(b + '\\' + file,'rb')

And check if this makes any difference.
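
For what it is worth, here is a minimal sketch of the same counting job
with every file opened in binary mode, written for a current Python
rather than the version used in the script above. The names count_lines
and count_tree are made up for illustration, and I am assuming the
damage consists of stray control bytes such as backspace (0x08) and
Ctrl-Z (0x1A) and that lines still end in a newline byte.

```python
import os

def count_lines(path):
    """Count the lines in one file, reading it as raw bytes."""
    count = 0
    # 'rb' turns off text-mode handling, so control bytes such as
    # Ctrl-Z (0x1A) or backspace (0x08) are read as ordinary data.
    with open(path, 'rb') as handle:
        # Iterating a binary file yields chunks split on b'\n'; a final
        # chunk without a trailing newline is still counted as a line.
        for _ in handle:
            count += 1
    return count

def count_tree(top):
    """Return a dict mapping each file path under top to its line count."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            results[full] = count_lines(full)
    return results
```

Calling count_tree on the directory holding the dumps and printing the
resulting dict should give one count per file. If the binary-mode
counts agree with what WinEdt shows, the early stop came from the
text-mode read and not from the files being truncated.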

Anton.


