Python isn't necessarily slow

Mon Mar 4 04:23:37 EST 2002

Anyone who digs through the dejanews archive will unavoidably get the
impression that Python is a nice toy but hardly usable for real
numerical computations; the silly language-shootout page even
reinforces that impression.

After my nightmare with Clean I decided to use Common Lisp for my
scientific work. (If you are not a language junkie and believe -- as I
do -- that programming should only be a side effect in your life, then
stay away from Clean: it is a useless big pile of shit! There was even
a time when I derided Python programmers. Okay, Clean is very, very
fast!) Most of my colleagues, though, use IDL (and Fortran 90/95 -- not
Fortran 77! -- for their numerical code). Common Lisp has many
similarities with Clean, except that one actually gets results in one's
lifetime (Clean's array type system is an impudent scandal in its own
right). But there is one weak point: on this earth there are only 2
serious implementations one would want to work with, Allegro Common
Lisp and Lisp Works. Both are commercial, and one of them is very, very
expensive (the prices are a secret, but rumor has it that an Allegro
Common Lisp license costs more than USD 4000, and one would have to pay
an extra fee to distribute one's programs commercially). Okay, there is
the free CMUCL out there for Unix, but its developers believe one wants
to use Emacs for editing...

Another problem with Lisp: there are almost no libraries available for
plotting (okay, there is gnuplot, but gnuplot is not a plotting library
one wants to make plots with; just show your gnuplot graphics to your
IDL-using colleague).

But thanks go to F. Dubois (his column in "Computing in Science and
Engineering" did a good job of promoting Python; without it a great
many scientists would hardly ever have considered taking a look at
Python) and especially to Michels for contributing DISLIN to Python. I
installed Python a few days ago because of DISLIN and NumPy. I actually
think that NumPy in combination with DISLIN is a serious competitor to
IDL or Matlab. DISLIN even lets you draw some simple maps.

Anyway, comp.lang.python is not the stage for life stories.

I often have to deal with files of the following format:

122.233,22.23,344.545,566.67,...
1.22,34.445,566.677,8.889,...
...

I wrote some functions in Common Lisp to extract the values and read
them into an array. Normally Common Lisp is fast, and I had no need to
speed it up. But the often-cited "80% of C speed" is pure nonsense.
Yes, you can reach it, but then you would waste the rest of your life
declaring types (and declaring types in Lisp is very different from
declaring them in, say, C or Fortran). Hence: declaring types in Lisp
is hardly ever a simple matter; even specialists (e.g. Guy Steele) will
agree on that.

After reading the Python tutorial I wrote a function to extract the
floating-point values from a file. It is certainly not the same
function as in Lisp, because in Lisp one can read anything into an
array:

1.223,NaN,2.333,233,444
NaN,2.334,3.445,1.223

My Python function stores a fill value (default -1.0) when it
encounters the missing-value marker (the NaN argument, 'NIL' by
default). It uses the NumPy library for the arrays, and nothing else
from NumPy. I have also read that some people complain that NumPy in
combination with native Python loops is way too slow, but I cannot
confirm this (see below).

Sometimes I have to deal with large files: 8000 lines and 34 columns
(about 2.5 MB). Reading such a file into a Lisp array takes the
following on my 1000 MHz Celeron notebook with 256 MB RAM under
Windows XP:

Allegro Common Lisp (demo version; it is free): 30sec
Lisp Works (personal edition; it is free): 30sec

[Compiler settings as fast as possible: (proclaim '(optimize (speed 3)
(safety 1) (space 0) (debug 0)))]

And to my surprise the Python function takes about 25 sec.

A Clean version takes 13sec.

[I am not sure how to use the profiler in Python (the documentation is
sometimes really weak in this respect); the profiler reports 70 sec.
This is clearly wrong.]
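
One plausible explanation for the 70 sec figure is that a deterministic
profiler instruments every function call, so its totals include the
profiler's own overhead; a plain wall-clock measurement is the fairer
comparison with the Lisp timings. The following is only a sketch in
present-day Python: `cProfile` and `time.perf_counter` postdate this
post, and `parse_lines`, `wall_clock` and `profile_report` are
hypothetical names standing in for the real reader and its timing.

```python
import cProfile
import io
import pstats
import time


def parse_lines(lines, delimiter=","):
    """Hypothetical stand-in for the reader: split each line into floats."""
    return [[float(tok) for tok in line.split(delimiter)] for line in lines]


def wall_clock(func, *args):
    """Return (result, elapsed seconds), measured with no profiler attached."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def profile_report(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


# Synthetic data shaped like the large file: 8000 lines, 34 columns.
sample = [",".join(["1.5"] * 34)] * 8000
rows, seconds = wall_clock(parse_lines, sample)
_, report = profile_report(parse_lines, sample)
print("wall clock: %.3f s" % seconds)
print(report)
```

Comparing the wall-clock figure with the total in the profiler report
gives a rough idea of how much the instrumentation itself costs.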

I am not claiming here that Python is faster than Lisp (I am not sure
how it would have performed without the NumPy library). But as I wrote
above: declaring types in Lisp would be all but impossible, at least
for me (okay, my Lisp experience is only about 4 months).

But this has been a real-life (micro) application, with the nice bonus
that in Python I can extract some values from the array and pass them
on to the DISLIN plotting library, whereas in Common Lisp I would have
to save them to a file and consult e.g. Yorick.

The Python code: (C) Siegfried Gonzi 2002

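# Assumes the Numeric module (the NumPy of 2002) for zeros and Float.
from Numeric import zeros, Float

def readInFloats(file, nHeader=0, whatDel=',', NaN='NIL', NaN_NumPy=-1.0):
    f = open(file, 'r')
    s = f.readlines()
    f.close()
    # Number of columns = delimiters in the first data line + 1.
    cols = 1 + s[nHeader].count(whatDel)
    rows = len(s)
    ergArray = zeros((rows, cols), Float)
    count_rows = nHeader
    for x in range(nHeader, rows):
        start = 0
        floatString = s[x]
        # Skip lines that contain only whitespace.
        if not floatString.isspace():
            count_rows = count_rows + 1
            for y in range(cols):
                if y < (cols - 1):
                    # The field runs up to the next delimiter.
                    indx = floatString[start:].find(whatDel)
                    dummy = floatString[start:(start + indx)]
                    if dummy == NaN:
                        ergArray[x, y] = NaN_NumPy
                    else:
                        ergArray[x, y] = float(dummy)
                    start = start + indx + 1
                else:
                    # The last field runs to the end of the line; strip the
                    # newline so the comparison with NaN can succeed.
                    dummy = floatString[start:].strip()
                    if dummy == NaN:
                        ergArray[x, y] = NaN_NumPy
                    else:
                        ergArray[x, y] = float(dummy)
    print 'done'
    return ergArray[nHeader:count_rows,]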

[I use a separate count_rows because files often contain an extra line
without any data (e.g. an accidental blank line).]
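
For comparison, the same parsing rule (skip the header lines, ignore
blank lines, replace the missing-value marker by a fill value) can be
sketched with `str.split` instead of manual index arithmetic. This is
an illustrative variant in present-day Python, not the code that was
timed above; the name `read_floats` and its parameters are assumptions,
and it returns plain lists so that it runs without NumPy.

```python
def read_floats(lines, n_header=0, delimiter=",", missing="NIL", fill=-1.0):
    """Parse delimiter-separated floats from an iterable of text lines.

    Hypothetical helper: header lines are skipped, blank lines are
    ignored, and any field equal to `missing` is replaced by `fill`.
    """
    table = []
    for line_number, line in enumerate(lines):
        if line_number < n_header or not line.strip():
            continue
        row = []
        for field in line.strip().split(delimiter):
            field = field.strip()
            row.append(fill if field == missing else float(field))
        table.append(row)
    return table


sample = [
    "header line\n",
    "1.223,NIL,2.333,233,444\n",
    "\n",
    "NIL,2.334,3.445,1.223,5\n",
]
data = read_floats(sample, n_header=1)
print(data)
```

With NumPy installed, `numpy.array(data)` turns the result into an
array, provided every row has the same number of fields.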

Enclosed is also the Lisp and Clean code.

Surely, neither my Lisp nor my Python code is perfect. But do not
forget that my Python experience amounts to a few days, whereas my Lisp
experience amounts to a few months. One can argue that the second time,
even in a different language, it is always easier to implement the same
thing. But this also holds true for Lisp, because I implemented the
same thing in Clean last summer, before I had even encountered Common
Lisp.

I wrote all this because I have sometimes seen people in comp.lang.lisp
who say that they come from Python and want to use Lisp because Python
is very slow. Surely, a matrix-matrix multiplication of dimension
1024x1024 in Common Lisp (simple code without any declarations) is
about 5 to 10 times faster than native Python code with native Python
arrays (a Clean version is even about 100 times faster). This is a
fact. But NumPy is out there, and most of the time scientific data
evaluation does not involve array handling a la matrix-matrix
multiplication. Often you step through your data sequentially.
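
To make concrete what "native Python code with native Python arrays"
means here, this is a minimal sketch of a matrix-matrix multiplication
over nested lists. It is not the code behind the timings quoted above,
and the function name `matmul` is an assumption. For 1024x1024 matrices
the innermost statement runs 1024^3 (about 1.07 billion) times, each
time through the interpreter, which is where such a loop loses against
compiled code.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of row lists (naive triple loop)."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_a = a[i]
        row_r = result[i]
        for k in range(m):
            a_ik = row_a[k]
            row_b = b[k]
            # Innermost loop: executed n * m * p times in total.
            for j in range(p):
                row_r[j] += a_ik * row_b[j]
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
product = matmul(a, b)
print(product)
```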

And when one declares array types in Lisp (for speed), one can no
longer use a mixed array (say, strings and numbers), which makes it the
same kind of array as a NumPy array. This is important to know, because
it is sometimes said that needing a library like NumPy is a weakness,
since Lisp has all of this natively and is way faster. But that is only
superficially the case.
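
The trade-off between a mixed container and a typed one can be
illustrated with Python's standard library alone. This is only an
analogy: the `array` module stands in here for a type-declared Lisp
array or a NumPy array, and the variable names are made up for the
example.

```python
from array import array

mixed = [1.5, "NIL", 3]          # a plain list accepts any mix of types
typed = array("d", [1.5, 2.5])   # 'd' = C double; every element must be a number

try:
    typed.append("NIL")          # a string cannot be stored in a typed array
    accepted = True
except TypeError:
    accepted = False

print(mixed, typed, accepted)
```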

Personally, I do not know the Python community, but from what I have
seen so far they are far more open-minded toward the scientific
programmer (try to find any numerical libraries for Lisp, except for
statistics, or any plotting libraries). Most of the time the Lispers
are busy falling out with God and the rest of the world in order to
show that a Lisp programmer is God's gift to mankind.

Clean also uses significant whitespace (except that more than one level
of indentation at once is allowed); I never had any problems with it,
and hence I think it will not impede my Python programming.

There is one weak point with Lisp: you can only read or write binary
data 1 byte at a time. I am not sure why nobody in the Lisp community
seems to read binary files.

The native Python IDE should be improved (at least on Windows);
compared to a commercial Lisp environment it looks more or less like an
amateur programmer's project. It is stable (though not as stable as
what I know from Allegro Common Lisp), but the response time and
behavior of the windows are often unacceptable. Okay, the general
feeling is like developing with a Lisp editor, which is a fine thing
(especially developing code at the command line).

S. Gonzi
Enclosed are the Lisp and Clean versions for the sake of completeness:

==
Common Lisp version:
==
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Count the lines in a file
;;
;; Function: (count-lines dateiFile :nHeader 0 :break-point "999")
;; Parameters: dateiFile...file containing the data
;;             :nHeader...number of header lines
;;             :break-point...should reading stop at a particular line?
;;
;; Output: number of lines
;;
;; (C) Siegfried Gonzi 2002
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun count-lines (datei &key (nHeader 0) (break-point "999"))
  (let ((n 0)
        (next-line t))
    (with-open-file (infile datei)
      (read-header-lines infile nHeader)
      (loop while(not (eq next-line nil)) do
            (setf next-line (read-line infile nil))
            (if (equal next-line break-point)
                (setf  next-line nil))
            (setf n (+ n 1))))
n))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Skip N header lines in a file
;;
;; Function: (read-header-lines dateiFile (nHeader 0))
;; Parameters: dateiFile...file
;;             nHeader (optional)...number of header lines
;;
;; Output: file pointer at the end of the header lines
;;
;; (C) Siegfried Gonzi 2001
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun read-header-lines (infile &optional (nHeader 0))
               (loop :for i :from 0 :below nHeader :do
                     (read-line infile nil)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Count a given character in a string and return its positions:
;;
;;   e.g.:  1.23,2.233,23.34,45.56
;;
;;     returns a list with: (4 10 16)
;;
;; Function: (get-del-list numberString del)
;; Parameters: numberString...string to be evaluated
;;             del...delimiter (e.g. ",")
;;
;; Output: list of the positions
;;
;; (C) Siegfried Gonzi 2002
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun get-del-list (numberString del)
  (let* ((n (length numberString))
            (el 0)
            (el-list nil))
               (loop :for i :from 0 :below n :do
                     (setf el (aref numberString i))
                     (if (string= el del)
                         (setf el-list (append el-list (list i)))))
el-list))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Read a file with the following format:
;;
;;   Header line 1
;;   Header line 2
;;   Header line n
;;   12.23,23.34,1.234,34.456,2.345,56.67,...
;;   1.23,233.44,3.45,2.34,45.5677,6.778,...
;;   ...
;;
;;   Function: (read-in-array-del dateiFile :nHeader 0 :break-point "999" :del ",")
;;   Parameters:  dateiFile...file with the values; Allegro CL convention:
;;                         e.g.: C:/Wissenschaft/R/D/....
;;                :nHeader...number of header lines to skip
;;                :break-point...where to stop reading, apart from EOF
;;                :del...delimiter (',' in the example above)
;;   External functions:   count-lines, read-header-lines, get-del-list
;;
;;   Output: array containing the values in columns and rows
;;
;; (C) Siegfried Gonzi 2002
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun read-in-array-del (datei &key (nHeader 0) (break-point "999") (del ","))
  (let* ((rows (- (count-lines datei :nHeader nHeader :break-point break-point) 1))
         (cols nil)
         (val nil)
         (next-line nil)
         (indx-list ())
         (indx-start nil)
         (indx-end nil)
         (array nil))
    (with-open-file (infile datei)
                 (read-header-lines infile nHeader)
                 (loop :for i :from 0 :below rows :do
                       (setf next-line (read-line infile nil))
                       (setf indx-list (get-del-list next-line del))
                       (setf indx-list (append indx-list (list  (length next-line))))
                       (if (< i 1)
                           (progn
                             (setf cols  (length indx-list))
                             (setf array (append (make-array (list rows cols))))))
                       (setf indx-start 0)
                       (setf indx-end -1)
                       (loop :for j :from 0 :below cols :do
                             (setf indx-start (+ indx-end 1))
                             (setf indx-end (- (nth j indx-list) 0))
                             (with-input-from-string (s next-line :index k :start indx-start :end indx-end)
                               (setf val (read s)))
                             (setf (aref array i j) val))))
array))
==
End Common Lisp version
==

======
Clean version: (C) Siegfried Gonzi 2001
======

module convertSeasonD
import StdEnv

In11 :== "nij.txt"
Nh :== 0
T :== ','
Sn :== "999\n"

//Start:: !*World -> (!Real,!*World)
Start world = accFiles( ACopyDatumTimeString_Season Nh T Sn In11 ) world

ACopyDatumTimeString_Season:: !Int !Char !String !String  !*Files -> (!Real,!*Files)
ACopyDatumTimeString_Season nh  char_d  string_n  inputfile  file
        #! ( readok, infile,file ) = sfopen inputfile FReadText file
        | not readok = abort ( "Cannot read inputfile: " +++ inputfile )
        //
        #! nLines = ( CountLines string_n infile )  
        #! ni = ( nLines - nh )

        //
        #! infiler = ReadNLines nh infile
        #! ( string, infile_dummy ) = sfreadline infiler
        #! nj = ( length (CharPosInString 0 char_d string) ) + 1
        #! array = MakeArray ni (nj+1)
        #! arrayread = ( AStringRead char_d array infiler )
        //#! arrayread =  {{arrayread.[x,y]\\x <- [0..(nj-1)]}\\y <- [0..(ni-1)]}
        //
        #! erg = arrayread.[7994,23]
        = (erg,file)

AStringRead:: !Char  !{#*{#Real}} !File -> {#*{#Real}}
AStringRead char  marray file = fill_up 0 marray file
where
        fill_up:: !Int !{#*{#Real}} !File -> {#*{#Real}}
        fill_up n marray file
                | n == (size marray) = marray
                #! (string, file) = sfreadline file
                #! elem =  (RealfromString (DStringtoString_List char string))
                #! elemo = {x\\x<- elem}
                = fill_up (n+1) { marray & [n] = elemo } file

RealfromString:: ![!String] -> ![!Real]
RealfromString [] = []
RealfromString [x:r] 
        #! el = toRealNaN( x )
        = [el : RealfromString r ]
where
        toRealNaN:: !String -> !Real
        toRealNaN str
                | str == "NIL" = -1.0
                | str == "NaN" = 1.0
                = toReal( str )

CharPosInString:: !Int !Char !String -> ![!Int]
CharPosInString n  char string=:{ [n] = nk }
        | n == (size string - 1) = []
        = filter ((<>) 0) [ (if (nk == char)  n 0) : CharPosInString (n + 1) char string ]

ConvertStringtoString_List:: !Int !String ![!Int] -> ![!String]
ConvertStringtoString_List n string list
        | n == (length list - 1) = []
        #! start = list !! n
        #! stop = list !! (n+1)
        = [ ( toString(char_list (start+1) stop string) ) : ConvertStringtoString_List (n+1) string list ]
where
        char_list:: !Int  !Int !String -> ![!Char]
        char_list start stop string=:{ [start] = nk }
                | start == (stop ) = []
                = [ nk : char_list (start+1) stop string ]      

DStringtoString_List:: !Char !String -> ![!String]
DStringtoString_List char string = ConvertStringtoString_List 0 string pos_list
where
        pos_list = [-1] ++ (CharPosInString 0 char string) ++ [size string - 1]

CountLines:: !String !File -> !Int
CountLines  string_end file = ReadLines 0 file
where
      ReadLines:: !Int !File -> !Int
      ReadLines nLines file
            | sfend file = nLines
            #! (line,filerest) = sfreadline file
            | line == string_end = nLines
            = ReadLines (nLines + 1) filerest

ReadNLines:: !Int !File -> !File
ReadNLines n file
         | n == 0 = file
         #! (line, file) = sfreadline file
         = ReadNLines (n - 1) file

MakeArray:: !Int !Int -> {#*{#Real}}
MakeArray ni nj
             | ni <=0 || nj <=0 = abort("Negative array index")
             =  {{0.0\\x <- [0..(nj-1)]}\\y <- [0..(ni-1)]}

==
End Clean version
==