looking for speed-up ideas

William Park opengeometry at yahoo.ca
Mon Feb 3 22:32:08 EST 2003


Andrew Dalke <adalke at mindspring.com> wrote:
> William Park wrote:
>> Behold:
>>     egrep '^F' dumpfile | sort -t '/' -n -k 2,2 | tail -200
>> 
>> How fast does it run?
> 
> That was my first thought too.  The problem is that it doesn't
> keep track of the directory names, which is needed to display
> the full path name, which I believe he dumps in
> 
>     for t in all_file_list:
>         print t[2], t[1], get_dir_name(t[3])
> 
> It's too bad he didn't include example output.

In that case, generate another file with full pathnames.

	T /remote 0
	S/name/0/1
	S/joe/1/2
	S/bob/1/3
	F/3150900/big_file.tar.gz
	S/testing/3/4
	F/414/.envrc
	F/276/BUILD_FLAGS
	F/36505/make.incl
	F/3861/build_envrc

Let's see, using '@' as the pathname separator...
    
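    awk 'BEGIN { dir[0] = "remote"; FS = OFS = "/" }
        # S/name/parent/id: record the full path of directory "id"
        $1 == "S" { dir[$4] = pwd = dir[$3] "@" $2 }
        # F/size/file: prefix the file name with the current directory
        $1 == "F" { $3 = pwd "@" $3; print }
    ' dumpfile | sort -t '/' -n -k 2,2 | tail -200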

Python translation is left as homework for the OP.
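
For reference, here is one possible Python sketch of the same idea. It
is not the OP's code. It assumes the record layout implied by the awk
script (S/name/parent-id/own-id for directories, F/size/filename for
files), and that each F line belongs to the most recent S line. The
names full_path_records, largest_files and SAMPLE are made up for
illustration, and unlike the awk version the current directory starts
out as the root rather than empty.

```python
import heapq

# Sample records copied from the dump shown above.
SAMPLE = [
    "T /remote 0",
    "S/name/0/1",
    "S/joe/1/2",
    "S/bob/1/3",
    "F/3150900/big_file.tar.gz",
    "S/testing/3/4",
    "F/414/.envrc",
    "F/276/BUILD_FLAGS",
    "F/36505/make.incl",
    "F/3861/build_envrc",
]


def full_path_records(lines, root="remote", sep="@"):
    """Yield (size, full_path) for every F record in the dump."""
    dirs = {"0": root}  # directory id -> full path of that directory
    pwd = root          # assumption: files seen before any S line sit in root
    for line in lines:
        fields = line.strip().split("/")
        if fields[0] == "S" and len(fields) >= 4:
            # S/name/parent-id/own-id: build this directory's full path
            name, parent, ident = fields[1], fields[2], fields[3]
            pwd = dirs[ident] = dirs[parent] + sep + name
        elif fields[0] == "F" and len(fields) >= 3:
            # F/size/filename: attach the current directory to the file name
            yield int(fields[1]), pwd + sep + fields[2]


def largest_files(lines, n=200):
    """Return the n largest (size, full_path) pairs, biggest first."""
    return heapq.nlargest(n, full_path_records(lines))


if __name__ == "__main__":
    for size, path in largest_files(SAMPLE, 3):
        print(size, path)
```

heapq.nlargest keeps only n items in memory and returns them largest
first, whereas the sort | tail pipeline prints them smallest first. To
read a real dump, pass an open file object instead of SAMPLE.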

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data management and processing. 



