Problem with tarfile module to open *.tar.gz files - unreliable ?

m_ahlenius ahleniusm at gmail.com
Fri Aug 20 09:23:57 EDT 2010


On Aug 20, 6:57 am, m_ahlenius <ahleni... at gmail.com> wrote:
> On Aug 20, 5:34 am, Dave Angel <da... at ieee.org> wrote:
>
> > m_ahlenius wrote:
> > > Hi,
>
> > > I am relatively new to doing serious work in Python.  I am using it to
> > > access a large number of log files.  Some of the logs get corrupted
> > > and I need to detect that when processing them.  This code seems to
> > > work for quite a few of the logs (all with the same structure).  It also
> > > correctly identifies some corrupt logs, but then it identifies others
> > > as being corrupt when they are not.
>
> > > example error msg from below code:
>
> > > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> > > Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
>
> > > When I manually examine the supposedly corrupt log file and run
> > > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
> > > just fine.
>
> > > Is there anything wrong with how I am using this module?  (extra code
> > > removed for clarity)
>
> > >  if tarfile.is_tarfile( file ):
> > >         try:
> > >             xf = tarfile.open( file, "r:gz" )
> > >             for locFile in xf:
> > >                 logfile = xf.extractfile( locFile )
> > >                 validFileFlag = True
> > >                 # iterate through each log file, grab the first and
> > >                 # the last lines
> > >                 lines = iter( logfile )
> > >                 firstLine = lines.next()
> > >                 for nextLine in lines:
> > >                     ....
> > >                         continue
>
> > >                 logfile.close()
> > >                  ...
> > >             xf.close()
> > >         except Exception, e:
> > >             validFileFlag = False
> > >             msg = ("\nCould not open the log file: " + repr(file) +
> > >                    " Exception: " + str(e) + "\n")
> > >  else:
> > >         validFileFlag = False
> > >         lTime = extractFileNameTime( file )
> > >         msg = (">>>>>>> Warning " + file +
> > >                " is NOT a valid tar archive\n")
> > >         print msg
>
> > I haven't used tarfile, but this feels like a problem with the Win/Unix
> > line endings.  I'm going to assume you're running on Windows, which
> > could trigger the problem I'm going to describe.
>
> > You use 'file' to hold something, but don't show us what.  In fact, it's
> > a lousy name, since it's already a Python builtin.  But if it's holding
> > a file object that you've opened separately, then you need to change
> > that open to use mode 'rb'.
>
> > The problem, if I've guessed right, is that occasionally you'll
> > accidentally encounter a 0d0a sequence in the middle of the (binary)
> > compressed data.  If you're on Windows, and use the default 'r' mode,
> > it'll be changed into a 0a byte.  Thus corrupting the checksum, and
> > eventually the contents.
>
> > DaveA
>
> Hi,
>
> thanks for the comments - I'll change the variable name.
>
> I am running this on Linux, so I don't think it's a Windows issue.  If
> that's the case, is the 0d0a still an issue?
>
> 'mark

Oh, and what's currently stored in the file variable is just the
unopened pathname of the target file I want to open.
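
For reference, here is a minimal sketch of a stricter check that takes a
pathname, as described above. It assumes a current Python 3 rather than the
Python 2 syntax used in the quoted code, and check_tar_gz is a hypothetical
helper name, not something from the original script. It reads every regular
member to the end and catches only archive and decompression errors, so an
unrelated bug in the log-parsing loop would not be reported as a corrupt
archive.

```python
import tarfile
import zlib


def check_tar_gz(path):
    """Return (ok, detail) after reading every regular member in full.

    Hypothetical helper for illustration only.
    """
    try:
        # tarfile opens the path itself, in binary mode.
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read to the end of the member in chunks so that errors in
                # the compressed data surface here rather than later.
                while stream.read(64 * 1024):
                    pass
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        # Report the exception type along with its message.
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, "ok"
```

Calling check_tar_gz on a pathname returns a pair such as (True, "ok") for a
readable archive, or False plus the exception type and message otherwise.
Because tarfile is handed a pathname and opens the file itself in binary
mode, the newline translation concern raised earlier in the thread does not
come into play with this approach.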



