[Python-Dev] Identifying magic prefix on Python files?

Eric S. Raymond esr@thyrsus.com
Sun, 4 Feb 2001 19:34:41 -0500


Tim Peters <tim.one@home.com>:
> > The first eight bytes of a PNG file always contain the following
> > values:
> >
> >    (decimal)              137  80  78  71  13  10  26  10
> >    (hexadecimal)           89  50  4e  47  0d  0a  1a  0a
> >    (ASCII C notation)    \211   P   N   G  \r  \n \032 \n
> 
> Cool!  I vote we take it exactly.  I don't even know what PNG is, so it's
> doubtful my Windows box will be confused by decorating Python files the same
> way <wink>.
> 
> > The first two bytes distinguish PNG files on systems that expect
> > the first two bytes to identify the file type uniquely.
> > The first byte is chosen as a non-ASCII value to reduce the
> > probability that a text file may be misrecognized as a PNG file; also,
> > it catches bad file transfers that clear bit 7.
> 
> OK, I suggest (decimal) 143 for Python's first byte.  That's a "control
> code" in Latin-1, and (unlike PNG's 137) not even Windows assigns it to a
> character in their Latin-1 superset (yet).
> 
>     (decimal)              143  80  89  84  13  10  26  10
>     (hexadecimal)           8f  50  59  54  0d  0a  1a  0a
>     (ASCII C notation)    \217   P   Y   T  \r  \n \032 \n

\217 is good.  It doesn't occur in /usr/share/magic at all, which
is a good sign.  Why just PYT, though?  Why not spell out "Python"?
That would let us detect case-smashing, too.
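
To make the trade-off concrete, here is a minimal Python sketch of how a
checker might classify the first bytes of a file against the proposed
prefix.  The byte values are the ones quoted above; the names PYT_MAGIC,
PYTHON_MAGIC, and diagnose() are invented for illustration, and the
spelled-out variant simply assumes the same framing bytes around the word
"Python".  Nothing here is an agreed format.

```python
# Proposed prefix from the quoted message: \217 P Y T \r \n \032 \n
PYT_MAGIC = bytes([143, 80, 89, 84, 13, 10, 26, 10])

# Hypothetical spelled-out variant (an assumption, not a settled proposal).
PYTHON_MAGIC = b"\x8fPython\r\n\x1a\n"


def diagnose(data, magic):
    """Classify the leading bytes of `data` against `magic`.

    Returns one of "ok", "bit 7 cleared", "case smashed", "no match".
    """
    head = bytes(data[:len(magic)])
    if head == magic:
        return "ok"
    # A transfer that clears bit 7 turns the non-ASCII first byte into a
    # control character and leaves the ASCII bytes unchanged.
    if head == bytes(b & 0x7F for b in magic):
        return "bit 7 cleared"
    # Case-smashing only shows up if the letters change when folded.
    if head in (magic.upper(), magic.lower()):
        return "case smashed"
    return "no match"


if __name__ == "__main__":
    print(diagnose(PYT_MAGIC + b"rest of file", PYT_MAGIC))
    print(diagnose(b"\x0fPYT\r\n\x1a\n", PYT_MAGIC))
    print(diagnose(PYT_MAGIC.upper(), PYT_MAGIC))
    print(diagnose(PYTHON_MAGIC.upper(), PYTHON_MAGIC))
```

With the all-caps PYT form, upper-casing leaves the prefix unchanged, so
only lower-casing is detectable; the mixed-case spelling changes under
either kind of smashing.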
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>

False is the idea of utility that sacrifices a thousand real advantages for
one imaginary or trifling inconvenience; that would take fire from men because
it burns, and water because one may drown in it; that has no remedy for evils
except destruction.  The laws that forbid the carrying of arms are laws of
such a nature.  They disarm only those who are neither inclined nor determined
to commit crimes.
        -- Cesare Beccaria, as quoted in Thomas Jefferson's Commonplace Book