[Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop

Christopher Barker Chris.Barker at noaa.gov
Thu Jan 27 16:19:43 EST 2011


On 1/27/11 1:03 PM, Dewald Pieterse wrote:
> I am processing two csv files against another, my first implementation
> used python list of lists and list.append to generate a new list while
> looping all the data including the non-relevant data (can't determine
> location of specific data element in a list of list). So I re-implemented
> the exact same code but using numpy.array's (2d arrays) using
> numpy.where to prevent looping over an entire dataset needlessly but the
> numpy.array based code is about 7.6 times slower?

Didn't look at your code in any detail, but:

numpy arrays are not designed to be re-sizable, so numpy.append actually 
creates a new array, and copies the old to the new, along with the new 
stuff. It's a convenience function, but it means you are re-allocating 
and copying all your data with each call.
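
To make the copying concrete, here is a minimal sketch (the array names are just for illustration) showing that numpy.append hands back a brand new array and leaves the original alone:

```python
import numpy as np

a = np.arange(3)
b = np.append(a, 99)

# np.append returns a new array; the original is left untouched
# and the two arrays do not share any memory.
shares = np.shares_memory(a, b)
print(a.shape, b.shape, shares)
```

Calling it inside a loop therefore copies everything accumulated so far on every iteration.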

python lists, on the other hand, are designed to be re-sizable, so they 
pre-allocate extra room, so that appending can be fast.
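
A rough way to see the extra room, assuming CPython (the over-allocation is an implementation detail and the exact sizes vary by version): watch how seldom the reported size of the list object changes while appending.

```python
import sys

items = []
sizes = set()
for i in range(1000):
    items.append(i)
    # The reported size only changes when the list actually re-allocates.
    sizes.add(sys.getsizeof(items))

resize_count = len(sizes)
print(resize_count, "distinct sizes over", len(items), "appends")
```

The number of distinct sizes should be far smaller than the number of appends, which is why each append is cheap on average.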

In general, the recommended solution in this sort of situation is to 
build up your data in a python list, then convert it to an array.
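
A minimal sketch of that pattern, with made-up rows standing in for whatever the loop actually produces:

```python
import numpy as np

# Hypothetical stand-in for the rows generated inside the loop.
source_rows = [["0", "", "1000"], ["1", "", "1000"], ["4", "6", "1164"]]

new_rows = []
for row in source_rows:
    new_rows.append(row)        # cheap list append, no array copy

result = np.array(new_rows)     # one allocation and one copy at the end
print(result.shape)
```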

If I'm right about what you're doing you could keep the "rows" as numpy 
arrays, but put them in a list while building it up.
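
Something along these lines, assuming every row has the same length (the column names and values here are only illustrative):

```python
import numpy as np

header = np.array(["TreeDepth", "Elevation", "BuildingLocater"])

row_arrays = [header]
for depth in range(3):
    # Each row is a small numpy array; the container is a plain list.
    row_arrays.append(np.array([str(depth), "", "1000"]))

# Stack the rows into one 2-d array in a single step.
stacked = np.vstack(row_arrays)
print(stacked.shape)
```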

Also, a numpy array of strings isn't necessarily a great data structure 
for this kind of data. You might want to look at structured arrays.
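
As a rough idea of what that could look like, here is a sketch using a few of the columns from your output. The field widths and the choice of an integer type for TreeDepth are my assumptions, not something taken from your files:

```python
import numpy as np

# Assumed layout: integer depth plus fixed-width unicode fields.
row_dtype = np.dtype([
    ("TreeDepth", "i4"),
    ("BuildingLocater", "U4"),
    ("Room", "U5"),
    ("Item", "U40"),
])

rows = [
    (0, "1000", "", ""),
    (1, "1000", "", "docname Rev 0"),
    (4, "1164", "", "anotherdoc Rev A"),
]
table = np.array(rows, dtype=row_dtype)

# Fields are addressed by name, and numeric columns stay numeric.
deep = table[table["TreeDepth"] >= 1]
print(deep["Item"])
```

With this layout there is no need for int(treelevel) conversions, and a comparison only touches the one field you care about.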

I wrote an appendable numpy array class a while back, to address this. 
It has some advantages, though, as it is written, not as many as you'd 
think. It does have some benefits for structured arrays, though.
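
For readers without the attachment, the general idea can be sketched as below. This is NOT the enclosed code; the class name, the 1-d restriction and the doubling growth factor are all assumptions made for illustration:

```python
import numpy as np

class GrowableArray:
    """Hypothetical 1-d appendable buffer (illustration only)."""

    def __init__(self, dtype=float, capacity=16):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            # Out of room: double the buffer so copies happen only occasionally.
            bigger = np.empty(2 * len(self._data), dtype=self._data.dtype)
            bigger[:self._size] = self._data[:self._size]
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def view(self):
        # Only the filled part of the buffer.
        return self._data[:self._size]

acc = GrowableArray(dtype=np.int64, capacity=2)
for i in range(10):
    acc.append(i)
print(acc.view())
```

Because the buffer grows geometrically, most appends are a single element assignment, much like list.append.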


Code enclosed

-Chris



> relevant list of list code:
>
>     starttime = time.clock()
>     #NI_data_list room_eqp_list
>     NI_data_list_new = []
>     for NI_row in NI_data_list:
>          treelevel = NI_row[0]
>          elevation = NI_row[1]
>          locater = NI_row[2]
>          area = NI_row[3]
>          NIroom = NI_row[4]
>          #Write appropriate equipment models and drawing into new list
>          if NIroom != '':
>              #Write appropriate equipment models and drawing into new list
>              for row in room_eqp_list:
>                  eqp_room = row[0]
>                  if len(eqp_room) == 5:
>                      eqp_drawing = row[1]
>                      if NIroom == eqp_room:
>                          newrow = [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]
>                          NI_data_list_new.append(newrow)
>              #Write appropriate piping info into the new list
>              for prow in unique_piping_list:
>                  pipe_room = prow[0]
>                  if len(pipe_room) == 5:
>                      pipe_drawing = prow[1]
>                      if pipe_room == NIroom:
>                          piperow = [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]
>                          NI_data_list_new.append(piperow)
>          #Write appropriate equipment models and drawing into new list
>          if (locater != '' and NIroom == ''):
>              #Write appropriate equipment models and drawing into new list
>              for row in room_eqp_list:
>                  eqp_locater = row[0]
>                  if len(eqp_locater) == 4:
>                      eqp_drawing = row[1]
>                      if locater == eqp_locater:
>                          newrow = [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]
>                          NI_data_list_new.append(newrow)
>              #Write appropriate piping info into the new list
>              for prow in unique_piping_list:
>                  pipe_locater = prow[0]
>                  if len(pipe_locater) == 4:
>                      pipe_drawing = prow[1]
>                      if pipe_locater == locater:
>                          piperow = [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]
>                          NI_data_list_new.append(piperow)
>          #Rewrite NI_data to new list
>          if NIroom == '':
>              NI_data_list_new.append(NI_row)
>
>     print (time.clock()-starttime)
>
>
> relevant numpy.array code:
>
>     NI_data_write_url = reports_dir + 'NI_data_room2.csv'
>     NI_data_list_file = open(NI_data_write_url, 'wb')
>     NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', quotechar='"')
>     starttime = time.clock()
>     #NI_data_list room_eqp_list
>     NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', 'BuildingLocater', 'Area', 'Room', 'Item']])
>     for NI_row in NI_data_list:
>          treelevel = NI_row[0]
>          elevation = NI_row[1]
>          locater = NI_row[2]
>          area = NI_row[3]
>          NIroom = NI_row[4]
>          #Write appropriate equipment models and drawing into new array
>          if NIroom != '':
>              #Write appropriate equipment models and drawing into new array
>              (rowtest, columntest) = numpy.where(room_eqp_list==NIroom)
>              for row_iter in rowtest:
>                  eqp_room = room_eqp_list[row_iter,0]
>                  if len(eqp_room) == 5:
>                      eqp_drawing = room_eqp_list[row_iter,1]
>                      if NIroom == eqp_room:
>                          newrow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]])
>                          NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
>
>              #Write appropriate piping info into the new array
>              (rowtest, columntest) = numpy.where(unique_room_piping_list==NIroom)
>              for row_iter in rowtest: #unique_room_piping_list
>                  pipe_room = unique_room_piping_list[row_iter,0]
>                  if len(pipe_room) == 5:
>                      pipe_drawing = unique_room_piping_list[row_iter,1]
>                      if pipe_room == NIroom:
>                          piperow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]])
>                          NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
>          #Write appropriate equipment models and drawing into new array
>          if (locater != '' and NIroom == ''):
>              #Write appropriate equipment models and drawing into new array
>              (rowtest, columntest) = numpy.where(room_eqp_list==locater)
>              for row_iter in rowtest:
>                  eqp_locater = room_eqp_list[row_iter,0]
>                  if len(eqp_locater) == 4:
>                      eqp_drawing = room_eqp_list[row_iter,1]
>                      if locater == eqp_locater:
>                          newrow = numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]])
>                          NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
>              #Write appropriate piping info into the new array
>              (rowtest, columntest) = numpy.where(unique_room_piping_list==locater)
>              for row_iter in rowtest:
>                  pipe_locater = unique_room_piping_list[row_iter,0]
>                  if len(pipe_locater) == 4:
>                      pipe_drawing = unique_room_piping_list[row_iter,1]
>                      if pipe_locater == locater:
>                          piperow = numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]])
>                          NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
>          #Rewrite NI_data to new list
>          if NIroom == '':
>              NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0)
>
>     print (time.clock()-starttime)
>
>
> some relevant output
>
>      >>> print NI_data_list_new
>     [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item']
>       ['0' '' '1000' '' '' '']
>       ['1' '' '1000' '' '' 'docname Rev 0']
>       ...,
>       ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08']
>       ['4' '6' '1164' '4' '' 'anotherdoc Rev A']
>       ['0' '' '' '' '' '']]
>
>
> Is numpy.append so slow? or is the culprit numpy.where?
>
> Dewald Pieterse
>
> "A democracy is nothing more than mob rule, where fifty-one percent of
> the people take away the rights of the other forty-nine." ~ Thomas Jefferson
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Accumulator.zip
Type: application/zip
Size: 4703 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110127/325a474b/attachment.zip>

