A comp.lang.python code snippet archive?

Mikael Olofsson mikael at isy.liu.se
Wed Mar 1 03:23:05 EST 2000


On 29-Feb-00 Hans Nowak wrote:
 >  Maybe some kind of "Python code detector" could be helpful, if only for a 
 >  first rough scan of all the messages. I have no idea how to do this though.
 >  (In Perl it's easy, you simply check if the number of $'s is above 
 >  average... <wink>)  Maybe it should look for reserved words like def or 
 >  import? But a snippet is not guaranteed to include those words, and a 
 >  message which does use them is not guaranteed to have Python code. =/

This may be hard. It all depends on what kind of snippets should qualify
for the archive. Many snippets in the existing archive use a lot of
reserved words. However, some use very few. Below is a snippet from the
existing archive.

------------Begin snippet---------------
# 82.py
# Author: Joe Strout
# Subject: 2-dimensional array
# Packages: maths.numeric;basic_applications.arrays
# Requires: Numeric, obviously.

"""
> I like to define a 2-dimensional array of integer values.
> Something like array('i','i').

Use Numeric:
"""

import Numeric
foo = Numeric.zeros( (6,4), Numeric.Int16 )

"""
This creates a 6x4 matrix (array) of zeros of 16-bit integers.
"""
-------------End snippet----------------

As you see, there is only one reserved word here: import. So, if the
Python Snippet Detector is supposed to catch this snippet based only
on reserved words, its threshold would have to be so low that it would
flag almost every posting to the newsgroup. Perhaps, though, it could be
used not to filter out postings, but to group postings by the number of
reserved words. The more reserved words there are in a posting, the
higher the probability that it contains actual code.
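
A rough sketch of that grouping idea, assuming postings arrive as plain
strings. The function names are made up for illustration, and the keyword
list comes from the standard keyword module of whichever Python runs the
code, so the counts depend on the interpreter version.

```python
import keyword
import re


def count_reserved_words(text):
    """Count the words in text that are Python reserved words."""
    words = re.findall(r"[A-Za-z_]+", text)
    return sum(1 for word in words if keyword.iskeyword(word))


def group_by_reserved_words(postings):
    """Map each reserved-word count to the postings that have that count."""
    groups = {}
    for posting in postings:
        groups.setdefault(count_reserved_words(posting), []).append(posting)
    return groups
```

On the two code lines of the snippet above this counts a single reserved
word, while an ordinary sentence such as "This is not code and it is in
English" already scores higher, so the count can only rank postings, not
decide anything on its own.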

Another problem with reserved word detection is postings containing
code in other programming languages that use the same reserved words
(did someone already mention this?). Yet another problem is that many of
the reserved words are common in ordinary English. That is the very
reason these words were chosen: they are easy to remember and to
understand.

The bottom line here is that reserved word detection is far from enough
to create a useful Python snippet detector. We do need something more.
Perhaps it could check for indentation as well. After all, that's one of
the things that are special about Python.
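
One way an indentation check might look, as a heuristic only: count the
places where a line ending in a colon is followed by a more deeply
indented line. The function name is hypothetical, and indented prose or
code in other languages can still fool it.

```python
def count_indented_blocks(text):
    """Count lines ending in ':' that are followed by a deeper-indented line."""
    lines = [line for line in text.splitlines() if line.strip()]
    hits = 0
    for prev, cur in zip(lines, lines[1:]):
        prev_indent = len(prev) - len(prev.lstrip())
        cur_indent = len(cur) - len(cur.lstrip())
        if prev.rstrip().endswith(":") and cur_indent > prev_indent:
            hits += 1
    return hits
```

A score like this could be combined with the reserved word count to rank
postings. It says nothing about snippets that have no block structure at
all, such as the Numeric example above.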

Could it be possible to use parts of the interpreter?
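
A minimal sketch of what that could mean, assuming the built-in compile()
is an acceptable stand-in for "parts of the interpreter": split a posting
at blank lines and keep the chunks that parse as Python. The function
name is made up, and the splitting rule is an assumption. It breaks code
that contains blank lines, and a one-word paragraph parses as a valid
expression, so this is a first rough scan at best.

```python
import textwrap


def compilable_chunks(text):
    """Return the blank-line-separated chunks of text that parse as Python."""
    good = []
    for chunk in text.split("\n\n"):
        if not chunk.strip():
            continue
        try:
            # Dedent first, since code quoted in a posting is often indented.
            compile(textwrap.dedent(chunk), "<posting>", "exec")
        except (SyntaxError, ValueError):
            continue  # not valid Python source
        good.append(chunk)
    return good
```

Nothing is executed here. compile() only parses the text, so it is safe
to run over untrusted postings, and the result reflects the syntax of the
Python version doing the checking.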

Problems-are-supposed-to-be-solved-ly y'rs

/Mikael

-----------------------------------------------------------------------
E-Mail:  Mikael Olofsson <mikael at isy.liu.se>
WWW:     http://www.dtr.isy.liu.se/dtr/staff/mikael
Phone:   +46 - (0)13 - 28 1343
Telefax: +46 - (0)13 - 28 1339
Date:    01-Mar-00
Time:    08:42:24

This message was sent by XF-Mail.
-----------------------------------------------------------------------



