Python pattern repository

Chris Liechti cliechti at gmx.net
Tue Oct 15 21:19:26 EDT 2002


[posted and mailed]

bokr at oz.net (Bengt Richter) wrote in news:aoi76i$b1m$0 at 216.39.172.122:
> On Tue, 15 Oct 2002 18:33:13 GMT, Robin Munn <rmunn at pobox.com> wrote:
> The fact that there is so much that can be found via Google
...
> Introducing PyPAN: Python Pervasive Archive Network ;-)
> 
> Here's the concept: If you want to include your code snippet
> in the PyPAN, post it embedded in a document that Google will see.

while i think your idea is nice, this could be the first problem...
google only includes pages that are linked from somewhere. it took many 
months until it indexed my page with a gcc port...
 
> You embed it for easy extraction by putting a PyPAN expression
> in the first and last (+/- 1, discussed later[1]) lines of your
> snippet, e.g., 
> 
> # ++PyPAN++ mySnippet.py /clp/forcomment/ -- minimal PyPan snippet
> def mySnippet():
>     print 'Hello PyPAN!'
> # --PyPAN--
> 
> I think Google would find '++PyPan++' and show an interesting list.

why do you use "+-" etc in the marker? those are special characters to 
google and it ignores punctuation/special chars in other cases.
i'd stay with letters only.

> The "/clp/forcomment/" part of the expression is optional, but the
> intent is to express the location of mySnippet.py in a classification
> hierarchy, to aid in searching, to limit hits to particular topics
> etc. mySnippet.py is a recommended file name, and comes first after
> the '++PyPAN++' tag. 
> 
> The classification path is also for optional use as an actual
> directory path which can be rooted anywhere convenient for the user
> (e.g., ~/PyPAN or C:\pywk\PyPan etc.) and thereby support automatic
> extraction/downloading/placement from e.g., newsgroup archives, disk
> files, etc. 
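
A rough sketch of how a classification path could be mapped onto a local directory tree, as described above. The function name `placement` and its arguments are made up for illustration and are not part of any existing PyPAN code. PurePosixPath is used only so the result looks the same on every platform.

```python
from pathlib import PurePosixPath


def placement(root, filename, classification=""):
    """Return the path where a snippet would be stored under `root`.

    `classification` is the optional "/clp/forcomment/" style path from
    the tag line. Empty, "." and ".." components are dropped so that a
    snippet cannot be placed outside `root`.
    """
    parts = [p for p in classification.split("/") if p not in ("", ".", "..")]
    # Keep only the final component of the recommended file name.
    name = PurePosixPath(filename).name
    return PurePosixPath(root, *parts, name)
```

For example, `placement("~/PyPAN", "mySnippet.py", "/clp/forcomment/")` gives `~/PyPAN/clp/forcomment/mySnippet.py`, and leaving out the classification puts the file directly under the root.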
> 
> One common usage would be to see a PyPAN snippet in a post -- like the
> above in this post. To make it available to the extraction tool as a
> file, I wrote a little program [2] called getclip.exe which simply
> gets the text from the windows clipboard and writes it to stdout. This
> makes the clipboard visible as a file object using os.popen('getclip')
> -- which you can pass to anything that wants to read a file. (getclip
> is also handy outside of Python, since you can easily pipe the output
> or redirect it to a file, without having to go to an editor and pasting
> and saving-as. Instead you just type getclip>theFile.txt). 
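
The os.popen('getclip') idea can also be written with the subprocess module in current Python. This is only a sketch: `read_command_output` is a hypothetical helper name, and it assumes a program such as getclip.exe is on the PATH and writes the clipboard text to stdout.

```python
import subprocess


def read_command_output(command):
    """Run `command` (a list of arguments) and return what it wrote to stdout.

    check=True raises CalledProcessError if the program exits with a
    non-zero status.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout
```

On a Windows machine with getclip.exe available, `read_command_output(["getclip"])` would return the clipboard text as a string, which can then be handed to whatever extracts the snippets.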
> 
> The intent is not to require you to select the exact lines, but just
> do a select-all, copy or whatever is easy. Then getclip will make all
> that available to the actual snippet extractor, which can put it in
> particular directories, etc. 
> 
> I am putting together a PyPAN.py module to provide convenient methods
> for retrieving PyPAN snippets from clipboard, files, or urls, etc. by
> regex pattern matches, but it's not finished. It will be runnable from
> the command line or importable for programmatic use. There will be 
> options for file placement similar to winzip extraction. I.e., you can
> ignore paths and put everything in a specified directory, or you can
> root the paths where you like etc. I guess if there is no interest in
> PyPAN, I may only be able to retrieve my own snippets ;-) 

oh, i think if it's that simple to use, many people will use the marker 
in their news posts.
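
To make the extraction step concrete, here is a minimal sketch in Python 3. It is not the PyPAN.py module mentioned above (that one is not finished and its interface is unknown). The names `START_RE` and `extract_snippets` are invented, only the plain '++PyPAN++' and '--PyPAN--' tags are handled, and the tag line layout is assumed from the mySnippet.py example.

```python
import re

# Assumed tag line layout, taken from the example in the post:
#   <comment prefix> ++PyPAN++ <filename> [/classification/path/] [-- description]
START_RE = re.compile(
    r"\+\+PyPAN\+\+\s+(?P<name>\S+)"
    r"(?:\s+(?P<path>/\S*))?"
    r"(?:\s+--\s*(?P<description>.*))?"
)


def extract_snippets(text):
    """Return a list of dicts with name, path, description and source."""
    snippets = []
    current = None
    for line in text.splitlines():
        tokens = line.split()  # tags only count as space-delimited words
        if current is None:
            if "++PyPAN++" in tokens:
                match = START_RE.search(line)
                if match:
                    current = match.groupdict()
                    current["lines"] = [line]
        else:
            current["lines"].append(line)
            if "--PyPAN--" in tokens:
                current["source"] = "\n".join(current.pop("lines")) + "\n"
                snippets.append(current)
                current = None
    return snippets
```

Because tags are matched as whole space-delimited words, a quoted mention such as '++PyPAN++' in running text does not start a snippet. Lines that carry a news quoting prefix ("> ") are returned as they are; stripping that prefix is left out of this sketch.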

> In any case, I would be interested in hearing of any standard
> hierarchy for classifying software. Is there a real librarian in the
> house? 

well i'm no expert in that area... but i found Java's reverse-URL style of 
package hierarchy a good idea. that way you avoid name conflicts. on the 
other hand many people will prefer an order by topic rather than by 
organization/author. 

> ---------------
> [1] Variations on the PyPAN tags:
> (Note that PyPAN will search based on space-delimited tags, therefore
> quoting them as in the following makes them safe against inadvertently
> interfering with searching for an actual snippet like the (not quite)
> minimal one above). 
> 
> '++PyPAN++'   => start with current line
> '++PyPAN++-'  => start with previous line
> '++PyPAN+++'  => start with next line
> '++PyPAN--'   => reserved for future expressions within a snippet
> '--PyPAN--'   => end with current line
> '--PyPAN---'  => end with previous line
> '--PyPAN--+'  => end with next line
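
The tag table above could be turned into a lookup like the following. This is a sketch under the assumption that "previous" and "next" simply mean an offset of -1 or +1 from the line holding the tag. The names `TAGS`, `find_tag` and `snippet_bounds` are hypothetical.

```python
# tag -> (kind, line offset relative to the line containing the tag)
TAGS = {
    "++PyPAN++": ("start", 0),
    "++PyPAN++-": ("start", -1),
    "++PyPAN+++": ("start", +1),
    "--PyPAN--": ("end", 0),
    "--PyPAN---": ("end", -1),
    "--PyPAN--+": ("end", +1),
}
# '++PyPAN--' is reserved in the table above and deliberately not listed.


def find_tag(line):
    """Return (kind, offset) for the first tag found as a whole word."""
    for token in line.split():
        if token in TAGS:
            return TAGS[token]
    return None


def snippet_bounds(lines):
    """Return (first, last) inclusive line indices of the first snippet."""
    first = None
    for index, line in enumerate(lines):
        tag = find_tag(line)
        if tag is None:
            continue
        kind, offset = tag
        if kind == "start" and first is None:
            first = max(index + offset, 0)
        elif kind == "end" and first is not None:
            return first, min(index + offset, len(lines) - 1)
    return None
```

With the "+++" start tag and the "---" end tag the tag lines themselves are left out of the snippet, so only the lines between them are kept.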

as mentioned above, i'm not sure how well the special characters will work. 
e.g. google searches for entire words and ignores special chars, but i 
think it understands "+" and "-" as include/exclude word so that --PyPAN--
would possibly mean no results containing that string...

why do you want to complicate things with that many magic tags anyway?
why not '# PyPANsnippet filename.py /hier/archy version'
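
A letters-only tag line like this is easy to take apart by splitting on whitespace. The sketch below assumes the field order filename, hierarchy, version from the line above, treats hierarchy and version as optional, and uses a made-up function name `parse_tagline`. How the end of a snippet would be marked is not specified, so that part is left out.

```python
def parse_tagline(line):
    """Parse '<comment> PyPANsnippet filename [/hier/archy] [version]'.

    Returns a dict, or None if the line carries no usable tag.
    """
    tokens = line.split()
    if "PyPANsnippet" not in tokens:
        return None
    fields = tokens[tokens.index("PyPANsnippet") + 1:]
    if not fields:
        return None  # the tag is there but the file name is missing
    result = {"filename": fields[0], "hierarchy": None, "version": None}
    for field in fields[1:3]:
        if field.startswith("/") and result["hierarchy"] is None:
            result["hierarchy"] = field
        elif result["version"] is None:
            result["version"] = field
    return result
```

For instance, `parse_tagline("# PyPANsnippet getclip.c /win32/clipboard 1.0")` yields the file name `getclip.c`, the hierarchy `/win32/clipboard` and the version `1.0` (the hierarchy and version values here are invented examples).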

> ---------------
> 
> [2]
> /* ++PyPAN++ getclip.c -- get and write win32 clipboard text to stdout */
> /*
> ** To compile with msvc++60 at command line use
> **      cl getclip.c /link /defaultlib:user32
> */
> 
> #include <io.h>
> #include <string.h>
> #include <windows.h>
> 
> int main (){
>     HANDLE hClipData;                   /* handle to clip data  */ 
>     LPSTR lpClipData;                   /* pointer to clip data */ 
>     if (!OpenClipboard(NULL)) return 0; /* NULL <=> current task */
>     /* get text from the clipboard */ 
>     if( (hClipData = GetClipboardData(CF_TEXT)) &&
>         (lpClipData = GlobalLock(hClipData))
>     ){ 
>         write(1, lpClipData, strlen(lpClipData)); /* text string to stdout */
>         GlobalUnlock(hClipData); 
>         CloseClipboard(); return 0;
>     } else {
>         CloseClipboard(); return 1; 
>     }
> }
> /* --PyPAN-- */
> 
> 
> Note: The cl command assumes environment settings, which
> you can set by invoking D:\VC98\Bin\VCVARS32.BAT (or
> whatever your path to it is).
> 
> BTW, getclip.exe is not big (freshly recompiled):
> 
>     02-10-15  15:58                 24,576 getclip.exe

hehe. 3'584 Bytes GCC/stripped.

--makefile--
CFLAGS = -mno-cygwin
getclip.exe: getclip.o
	$(CC) -mno-cygwin -o $@ $^
	strip $@
------------

> Further ideas?

as mentioned above, google is a bit picky if a site is not referenced 
anywhere. so it would maybe make sense to have a page (one that many 
links point to) where anyone could enter their URL, so that the chances 
increase that it is found by google and other search engines.
a simple wiki page would do, or a mailing list with an HTML accessible 
archive.
a bunch of people should place a link to that site on their pages to 
increase the google pagerank and to increase the probability that the 
linked pages are found.

i think this is a nice idea. with your PyPAN module there could be an easy 
access to the information and we get a lot of infrastructure "sponsored" 
(storage space for the snippets (with redundancy :-) search engines, ...)

it may be a bit slow 'cause most pages are not refreshed that often. so it 
may take a week or two until a snippet is indexed.

and another problem might come up. i'll just mention "versions"...
it will become difficult to find the best snippet if there are so many 
similar entries etc.

oh, and maybe there should be some convention for keywords, like that they 
should follow right after the tagline as python comment or so. or do you 
think that the source code + message around it is enough to find it?

there will be room for many tools, like pygoogle combined with xx etc, 
indexing bots that collect all snippets and place them on their own page, 
specialized search engines, ...

chris

-- 
Chris <cliechti at gmx.net>



