Problem with tarfile module to open *.tar.gz files - unreliable ?

m_ahlenius ahleniusm at gmail.com
Thu Aug 19 22:02:57 EDT 2010


Hi,

I am relatively new to doing serious work in Python.  I am using it to
process a large number of log files.  Some of the logs get corrupted,
and I need to detect that when processing them.  The code below works
for quite a few of the logs (all have the same structure).  It also
correctly identifies some corrupt logs, but it flags others as
corrupt when they are not.

Example error message from the code below:

Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
Exception: CRC check failed 0x8967e931 != 0x4e5f1036L

When I manually examine the supposedly corrupt log file by running
"tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
just fine.

Is there anything wrong with how I am using this module?  (extra code
removed for clarity)

 if tarfile.is_tarfile( file ):
        try:
            xf = tarfile.open( file, "r:gz" )
            for locFile in xf:
                logfile = xf.extractfile( locFile )
                validFileFlag = True
                # iterate through each log file, grab the first and
                # the last lines
                lines = iter( logfile )
                firstLine = lines.next()
                for nextLine in lines:
                    ....
                        continue

                logfile.close()
                 ...
            xf.close()
        except Exception, e:
            validFileFlag = False
            msg = "\nCould not open the log file: " + repr(file) + \
                  " Exception: " + str(e) + "\n"
 else:
        validFileFlag = False
        lTime = extractFileNameTime( file )
        msg = ">>>>>>> Warning " + file + \
              " is NOT a valid tar archive \n"
        print msg
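
One way to narrow this down would be to read only the gzip layer,
without tarfile, and see whether the CRC error still appears.  The
following is a minimal sketch of that cross-check, not part of the
code above.  It assumes Python 3 (the code above is Python 2), and the
function name gzip_stream_ok and its chunk_size parameter are
hypothetical, chosen only for illustration.

```python
import gzip
import zlib


def gzip_stream_ok(path, chunk_size=1024 * 1024):
    """Return (True, None) if the gzip stream at `path` decompresses
    cleanly to the end, otherwise (False, error message)."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the very end: the CRC in the gzip trailer is only
            # compared once the whole stream has been decompressed.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers a bad header or CRC mismatch, EOFError a
        # truncated file, zlib.error damaged compressed data.
        return False, str(exc)
    return True, None
```

If this reports a file as clean while the tarfile-based loop reports a
CRC failure for the same file, the problem is more likely in how the
archive members are being read than in the file itself.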


