Scanning a file

Steven D'Aprano steve at REMOVEMEcyber.com.au
Tue Nov 1 20:37:56 EST 2005


David Rasmussen wrote:

> Lasse Vågsæther Karlsen wrote:
> 
>> David Rasmussen wrote:
>> <snip>
>>
>>> If you must know, the above one-liner actually counts the number of 
>>> frames in an MPEG2 file. I want to know this number for a number of 
>>> files for various reasons. I don't want it to take forever.
>>
>>
>> Don't you risk getting more "frames" than the file actually has? What 
>> if the encoded data happens to have the magic byte values for 
>> something else?
>>
> 
> I am not too sure about the details, but I've been told by a reliable 
> source that 0x00000100 only occurs as a "begin frame" marker, and not 
> anywhere else. So far, it has been true on the files I have tried it on.

Not too reliable then.

0x00000100 is one of a number of unique start codes in 
the MPEG2 standard. It is guaranteed to be unique within 
the video stream; however, when searching for codes 
within the video stream, make sure you're actually in 
the video stream!
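For concreteness, here is a minimal Python sketch of the blind count under discussion. The one-liner itself isn't quoted here, so this assumes it simply counts the four bytes 00 00 01 00 wherever they appear in the file, which means it searches the whole file and not just the video stream. Reading in chunks keeps memory flat on big files, and the last three bytes of each buffer are carried over so a start code split across a chunk boundary is still counted exactly once. The names read_chunks and count_pattern are made up for illustration.

```python
from typing import Iterable, Iterator

PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG2 picture start code


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's contents in binary chunks of up to chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def count_pattern(chunks: Iterable[bytes], pattern: bytes = PICTURE_START_CODE) -> int:
    """Count every occurrence of pattern, including overlapping ones.

    This is the naive approach: it matches the bytes anywhere, so audio,
    system and padding data all count too.
    """
    keep = len(pattern) - 1  # bytes to carry across a chunk boundary
    count, tail = 0, b""
    for chunk in chunks:
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            count += 1
            pos = buf.find(pattern, pos + 1)  # step one byte: overlaps count too
        # The carried tail is shorter than the pattern, so no match is counted twice.
        tail = buf[-keep:] if keep else b""
    return count
```

Usage would be something like count_pattern(read_chunks("movie.mpg")), where "movie.mpg" is just a placeholder. Speed isn't the problem with this approach; accuracy is.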

See, for example, 
http://forum.doom9.org/archive/index.php/t-29262.html

"Actually, one easy way (DVD specific) is to look for 
00 00 01 e0 at byte offset 00e of the pack. Then look 
at byte 016, it contains the size of the extension. 
Resume your scan at 017 + contents of 016."
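Taking that recipe literally, a DVD-only sketch might walk the file pack by pack, keep only the video payloads, and count picture start codes in the reassembled video stream. The assumptions, all flagged in the code: packs are 2048 bytes; the pack header is 14 bytes with no stuffing (which is what puts the PES start code at 0x0E); only video stream 0xE0 is wanted; and the payload ends where the PES_packet_length field at 0x12-0x13 says it does (the quoted recipe doesn't say where to stop). The function names are hypothetical, and nothing here copes with broken files.

```python
from typing import Iterable, Iterator

# Assumptions (DVD-specific, following the recipe quoted above):
PACK_SIZE = 2048                        # each pack is one 2048-byte sector
VIDEO_PES_PREFIX = b"\x00\x00\x01\xe0"  # PES start code for video stream 0xE0
PICTURE_START_CODE = b"\x00\x00\x01\x00"


def read_packs(path: str, pack_size: int = PACK_SIZE) -> Iterator[bytes]:
    """Yield whole packs from the file; a trailing partial pack is ignored."""
    with open(path, "rb") as f:
        while True:
            pack = f.read(pack_size)
            if len(pack) < pack_size:
                return
            yield pack


def video_payloads(packs: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the video elementary-stream bytes carried by each video pack."""
    for pack in packs:
        # Assumes a 14-byte pack header with no stuffing, so the PES packet
        # starts at 0x0E; non-video packs (audio, navigation, ...) are skipped.
        if len(pack) < 0x17 or pack[0x0E:0x12] != VIDEO_PES_PREFIX:
            continue
        pes_length = int.from_bytes(pack[0x12:0x14], "big")  # bytes after 0x13
        begin = 0x17 + pack[0x16]  # skip the PES header extension
        end = min(0x14 + pes_length, len(pack))
        yield pack[begin:end]


def count_pictures(packs: Iterable[bytes]) -> int:
    """Count picture start codes in the video payloads only."""
    count, tail = 0, b""
    for payload in video_payloads(packs):
        # Carry 3 bytes so a start code split across two packs is seen once.
        buf = tail + payload
        pos = buf.find(PICTURE_START_CODE)
        while pos != -1:
            count += 1
            pos = buf.find(PICTURE_START_CODE, pos + 1)
        tail = buf[-3:]
    return count
```

Something like count_pictures(read_packs("VTS_01_1.VOB")) would drive it (the file name is a placeholder). Start-code-like bytes in audio packs or in the PES header extension are skipped rather than counted, which is the point of the exercise.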

Right. Glad that's the easy way.

I really suspect that you need a proper MPEG2 parser, 
and not just blindly counting bytes -- at least if you 
want reliable, accurate counts and not just "number of 
frames, plus some file-specific random number". And 
heaven help you if you want to support MPEGs that are 
slightly broken...
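Short of writing that parser, one cheap step up from blind counting is to look at the picture header itself. In MPEG2 video the picture start code is followed by a 10-bit temporal_reference and then a 3-bit picture_coding_type, whose only legal values are 1 (I), 2 (P) and 3 (B); a match carrying any other value can't be a real picture header. The sketch below (the function name is hypothetical) tallies matches by coding type. It is only a heuristic: it weeds out some false positives, proves nothing about the rest, and assumes its input is already the video elementary stream.

```python
from collections import Counter

PICTURE_START_CODE = b"\x00\x00\x01\x00"
CODING_TYPES = {1: "I", 2: "P", 3: "B"}  # legal MPEG2 picture_coding_type values


def tally_picture_types(stream: bytes) -> Counter:
    """Tally picture headers in an MPEG2 video elementary stream by type.

    After the 32-bit start code come temporal_reference (10 bits) and
    picture_coding_type (3 bits), so the coding type sits in bits 5..3 of
    the second byte after the start code.
    """
    tally = Counter()
    pos = stream.find(PICTURE_START_CODE)
    while pos != -1:
        header = stream[pos + 4:pos + 6]
        if len(header) == 2:  # a header truncated at the end is ignored
            coding_type = (header[1] >> 3) & 0x07
            # Any value other than 1, 2 or 3 can't be a genuine picture header.
            tally[CODING_TYPES.get(coding_type, "rejected")] += 1
        pos = stream.find(PICTURE_START_CODE, pos + 1)
    return tally
```

A non-zero "rejected" count is a hint that the byte pattern is turning up in places it shouldn't, which is exactly the uncertainty a real parser would remove.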

(It has to be said, depending on your ultimate needs, 
"close enough" may very well be, um, close enough.)

Good luck!



-- 
Steven.




More information about the Python-list mailing list