[Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop

Dewald Pieterse dewald.pieterse at gmail.com
Thu Jan 27 16:33:43 EST 2011


On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:

> On 1/27/11 1:03 PM, Dewald Pieterse wrote:
>
>> I am processing two csv files against each other. My first
>> implementation used a Python list of lists and list.append to generate a
>> new list while looping over all the data, including the non-relevant data
>> (I can't determine the location of a specific data element in a list of
>> lists). So I re-implemented the exact same code using numpy arrays (2d
>> arrays), with numpy.where to avoid looping over an entire dataset
>> needlessly, but the numpy.array based code is about 7.6 times slower?
>>
>
> Didn't look at your code in any detail, but:
>
> numpy arrays are not designed to be re-sizable, so numpy.append actually
> creates a new array, and copies the old to the new, along with the new
> stuff. It's a convenience function, but it means you are re-allocating and
> copying all your data with each call.
>
> python lists, on the other hand, are designed to be re-sizable, so they
> pre-allocate extra room, so that appending can be fast.
>
> In general, the recommended solution in this sort of situation is to build
> up your data in a python list, then convert it to an array.
>
> If I'm right about what you're doing you could keep the "rows" as numpy
> arrays, but put them in a list while building it up.
>

Thanks Chris, I believe this is the problem then. I can continue to use the
arrays as reference data but build a list instead; the only reason I used the
arrays was to be able to use numpy.where. I will just use both data types,
best of both worlds. As I already have row arrays, I will build a list of
arrays.
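
A minimal sketch of that pattern, assuming rows shaped like the six-column
NI data (the sample values below are made up for illustration, and numpy is
imported as np). It shows that numpy.append returns a new array rather than
growing the old one, and then collects row arrays in a list and converts
once at the end:

```python
import numpy as np

# Hypothetical sample rows; the real data comes from the csv files.
header = np.array(['TreeDepth', 'Elevation', 'BuildingLocater',
                   'Area', 'Room', 'Item'])
source_rows = [
    ['0', '', '1000', '', '', ''],
    ['1', '', '1000', '', '', 'docname Rev 0'],
    ['4', '6', '1164', '4', '', 'anotherdoc Rev A'],
]

# numpy.append never modifies its input: it allocates a new array and
# copies everything into it, so calling it in a loop copies repeatedly.
grown = np.append(header[np.newaxis, :], [source_rows[0]], axis=0)
assert grown.shape == (2, 6) and header.shape == (6,)

# Cheaper pattern: collect row arrays in a list, convert once at the end.
rows = [header]
for raw in source_rows:
    rows.append(np.array(raw))
result = np.vstack(rows)   # single allocation and copy

print(result.shape)   # (4, 6)
```

The reference arrays can stay as numpy arrays for numpy.where lookups; only
the output being built up needs to be a list.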


> Also, a numpy array of strings isn't necessarily a great data structure for
> this kind of data. You might want to look at structured arrays.
>

At the moment I use:
comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"')
comit_eqp_lt = numpy.array([[col for col in row] for row in
comit_eqp_reader])
to set up the arrays. I will look at using structured arrays.
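
As a rough sketch of what a structured array could look like here: the
field names below are borrowed from the NI data header, while the field
types and widths, the sample csv text, and the use of io.StringIO as a
stand-in for the open file are all assumptions for illustration.

```python
import csv
import io
import numpy as np

# Stand-in for an open csv file; the column layout is an assumption
# borrowed from the NI data header.
csv_text = (
    '0,,1000,,,\n'
    '1,,1000,,,docname Rev 0\n'
    '4,6,1164,4,,"eqp11 RB, R. surname, 24-NOV-08"\n'
)
reader = csv.reader(io.StringIO(csv_text), delimiter=',', quotechar='"')

# One named, typed field per column instead of a 2d array of strings.
# The field widths (U4, U64, ...) are guesses; longer text is truncated.
ni_dtype = np.dtype([
    ('TreeDepth', 'i4'),
    ('Elevation', 'U4'),
    ('BuildingLocater', 'U4'),
    ('Area', 'U4'),
    ('Room', 'U5'),
    ('Item', 'U64'),
])

# Structured arrays are built from a list of tuples, one per record.
records = [(int(r[0]), r[1], r[2], r[3], r[4], r[5]) for r in reader]
ni_data = np.array(records, dtype=ni_dtype)

# Fields are addressed by name, and comparisons give boolean masks.
in_1164 = ni_data[ni_data['BuildingLocater'] == '1164']
print(in_1164['Item'])
```

The tree depth is then a real integer column, so the int(treelevel)+1 step
no longer round-trips through a string.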

>
> I wrote an appendable numpy array class a while back to address this. It
> has some advantages, though, as it is written, not as many as you'd think.
> It does have some benefits for structured arrays, though.
>
>
> Code enclosed
>
> -Chris
>
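
The enclosed class is not reproduced here. Purely as an illustration of the
general idea (over-allocate and double the buffer when it fills, as Python
lists do), a hypothetical sketch might look like the following. The name
GrowableRows and its methods are invented for this example and are not
Chris's class; a numeric dtype is used because a fixed-width string dtype
would silently truncate longer values.

```python
import numpy as np

class GrowableRows(object):
    """Hypothetical 2d buffer that over-allocates, like a Python list."""

    def __init__(self, ncols, dtype, capacity=16):
        self._buf = np.empty((capacity, ncols), dtype=dtype)
        self._n = 0   # number of rows actually in use

    def append(self, row):
        if self._n == self._buf.shape[0]:
            # Buffer full: double it, so copies happen only occasionally.
            bigger = np.empty((2 * self._buf.shape[0], self._buf.shape[1]),
                              dtype=self._buf.dtype)
            bigger[:self._n] = self._buf[:self._n]
            self._buf = bigger
        self._buf[self._n] = row
        self._n += 1

    def view(self):
        # A view of the used rows; no copy is made.
        return self._buf[:self._n]

rows = GrowableRows(ncols=3, dtype=np.float64, capacity=2)
for i in range(5):
    rows.append([i, i * 2.0, i * 3.0])
print(rows.view().shape)   # (5, 3)
```
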
>
>
>> relevant list of lists code:
>>
>>    starttime = time.clock()
>>    #NI_data_list room_eqp_list
>>    NI_data_list_new = []
>>    for NI_row in NI_data_list:
>>         treelevel = NI_row[0]
>>         elevation = NI_row[1]
>>         locater = NI_row[2]
>>         area = NI_row[3]
>>         NIroom = NI_row[4]
>>         #Write appropriate equipment models and drawing into new list
>>         if NIroom != '':
>>             #Write appropriate equipment models and drawing into new list
>>             for row in room_eqp_list:
>>                 eqp_room = row[0]
>>                 if len(eqp_room) == 5:
>>                     eqp_drawing = row[1]
>>                     if NIroom == eqp_room:
>>                         newrow = [int(treelevel)+1, elevation, locater,
>>                                   area, NIroom, eqp_drawing]
>>                         NI_data_list_new.append(newrow)
>>             #Write appropriate piping info into the new list
>>             for prow in unique_piping_list:
>>                 pipe_room = prow[0]
>>                 if len(pipe_room) == 5:
>>                     pipe_drawing = prow[1]
>>                     if pipe_room == NIroom:
>>                         piperow = [int(treelevel)+1, elevation, locater,
>>                                    area, NIroom, pipe_drawing]
>>                         NI_data_list_new.append(piperow)
>>         #Write appropriate equipment models and drawing into new list
>>         if (locater != '' and NIroom == ''):
>>             #Write appropriate equipment models and drawing into new list
>>             for row in room_eqp_list:
>>                 eqp_locater = row[0]
>>                 if len(eqp_locater) == 4:
>>                     eqp_drawing = row[1]
>>                     if locater == eqp_locater:
>>                         newrow = [int(treelevel)+1, elevation, eqp_locater,
>>                                   area, '', eqp_drawing]
>>                         NI_data_list_new.append(newrow)
>>             #Write appropriate piping info into the new list
>>             for prow in unique_piping_list:
>>                 pipe_locater = prow[0]
>>                 if len(pipe_locater) == 4:
>>                     pipe_drawing = prow[1]
>>                     if pipe_locater == locater:
>>                         piperow = [int(treelevel)+1, elevation, pipe_locater,
>>                                    area, '', pipe_drawing]
>>                         NI_data_list_new.append(piperow)
>>         #Rewrite NI_data to new list
>>         if NIroom == '':
>>             NI_data_list_new.append(NI_row)
>>
>>    print (time.clock()-starttime)
>>
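
One way around the "can't determine location" problem without numpy at all
is to index the reference list once in a dictionary, so each NI row becomes
a lookup instead of a full scan. A minimal sketch, assuming the first column
of room_eqp_list is the room or locater key; the sample rows are made up,
and only the equipment lookups are shown (piping would use a second index
built the same way):

```python
from collections import defaultdict

# Hypothetical sample data in the same shape as the lists above.
room_eqp_list = [
    ['R1001', 'eqp drawing A'],
    ['1164', 'eqp drawing B'],
    ['R1001', 'eqp drawing C'],
]
NI_data_list = [
    ['1', '6', '1164', '4', 'R1001'],
    ['2', '6', '1164', '4', ''],
]

# Index the reference list once: key -> list of drawings.
eqp_by_key = defaultdict(list)
for row in room_eqp_list:
    eqp_by_key[row[0]].append(row[1])

NI_data_list_new = []
for NI_row in NI_data_list:
    treelevel, elevation, locater, area, NIroom = NI_row[:5]
    if NIroom != '':
        # Rooms are the 5-character keys.
        if len(NIroom) == 5:
            for eqp_drawing in eqp_by_key.get(NIroom, []):
                NI_data_list_new.append([int(treelevel) + 1, elevation,
                                         locater, area, NIroom, eqp_drawing])
    else:
        # Locaters are the 4-character keys.
        if locater != '' and len(locater) == 4:
            for eqp_drawing in eqp_by_key.get(locater, []):
                NI_data_list_new.append([int(treelevel) + 1, elevation,
                                         locater, area, '', eqp_drawing])
        # Rows without a room are also carried over unchanged.
        NI_data_list_new.append(NI_row)

print(len(NI_data_list_new))   # 4
```
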
>>
>> relevant numpy.array code:
>>
>>    NI_data_write_url = reports_dir + 'NI_data_room2.csv'
>>    NI_data_list_file = open(NI_data_write_url, 'wb')
>>    NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',',
>>    quotechar='"')
>>    starttime = time.clock()
>>    #NI_data_list room_eqp_list
>>    NI_data_list_new = numpy.array([['TreeDepth', 'Elevation',
>>    'BuildingLocater', 'Area', 'Room', 'Item']])
>>    for NI_row in NI_data_list:
>>         treelevel = NI_row[0]
>>         elevation = NI_row[1]
>>         locater = NI_row[2]
>>         area = NI_row[3]
>>         NIroom = NI_row[4]
>>         #Write appropriate equipment models and drawing into new array
>>         if NIroom != '':
>>             #Write appropriate equipment models and drawing into new array
>>             (rowtest, columntest) = numpy.where(room_eqp_list==NIroom)
>>             for row_iter in rowtest:
>>                 eqp_room = room_eqp_list[row_iter,0]
>>                 if len(eqp_room) == 5:
>>                     eqp_drawing = room_eqp_list[row_iter,1]
>>                     if NIroom == eqp_room:
>>                         newrow = numpy.array([[int(treelevel)+1, elevation,
>>                             locater, area, NIroom, eqp_drawing]])
>>                         NI_data_list_new = numpy.append(
>>                             NI_data_list_new, newrow, 0)
>>
>>             #Write appropriate piping info into the new array
>>             (rowtest, columntest) = numpy.where(
>>                 unique_room_piping_list == NIroom)
>>             for row_iter in rowtest: #unique_room_piping_list
>>                 pipe_room = unique_room_piping_list[row_iter,0]
>>                 if len(pipe_room) == 5:
>>                     pipe_drawing = unique_room_piping_list[row_iter,1]
>>                     if pipe_room == NIroom:
>>                         piperow = numpy.array([[int(treelevel)+1, elevation,
>>                             locater, area, NIroom, pipe_drawing]])
>>                         NI_data_list_new = numpy.append(
>>                             NI_data_list_new, piperow, 0)
>>         #Write appropriate equipment models and drawing into new array
>>         if (locater != '' and NIroom == ''):
>>             #Write appropriate equipment models and drawing into new array
>>             (rowtest, columntest) = numpy.where(room_eqp_list==locater)
>>             for row_iter in rowtest:
>>                 eqp_locater = room_eqp_list[row_iter,0]
>>                 if len(eqp_locater) == 4:
>>                     eqp_drawing = room_eqp_list[row_iter,1]
>>                     if locater == eqp_locater:
>>                         newrow = numpy.array([[int(treelevel)+1, elevation,
>>                             eqp_locater, area, '', eqp_drawing]])
>>                         NI_data_list_new = numpy.append(
>>                             NI_data_list_new, newrow, 0)
>>             #Write appropriate piping info into the new array
>>             # Search the same array that the loop below indexes.
>>             (rowtest, columntest) = numpy.where(
>>                 unique_room_piping_list == locater)
>>             for row_iter in rowtest:
>>                 pipe_locater = unique_room_piping_list[row_iter,0]
>>                 if len(pipe_locater) == 4:
>>                     pipe_drawing = unique_room_piping_list[row_iter,1]
>>                     if pipe_locater == locater:
>>                         piperow = numpy.array([[int(treelevel)+1, elevation,
>>                             pipe_locater, area, '', pipe_drawing]])
>>                         NI_data_list_new = numpy.append(
>>                             NI_data_list_new, piperow, 0)
>>         #Rewrite NI_data to new list
>>         if NIroom == '':
>>             NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0)
>>
>>    print (time.clock()-starttime)
>>
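
On the numpy.where calls: comparing the whole 2d array tests every cell, so
a key that happens to appear in another column also matches, which is why
the loop has to re-check column 0. A small sketch of comparing only the key
column instead, using made-up sample data and assuming column 0 holds the
room or locater and column 1 the drawing:

```python
import numpy as np

# Hypothetical reference array: column 0 is the room/locater key,
# column 1 the drawing.
room_eqp_list = np.array([
    ['R1001', 'eqp drawing A'],
    ['1164', 'R1001'],          # key also appears in column 1 here
    ['R1001', 'eqp drawing C'],
])
NIroom = 'R1001'

# Comparing the whole 2d array tests every cell, so matches in any
# column come back (rows 0, 1, 2 here, with row 1 matching on column 1).
rows_any, cols_any = np.where(room_eqp_list == NIroom)

# Comparing only column 0 tests one cell per row and needs no
# re-check inside a loop.
mask = room_eqp_list[:, 0] == NIroom
drawings = room_eqp_list[mask, 1]
print(drawings)
```
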
>>
>> some relevant output
>>
>>     >>> print NI_data_list_new
>>    [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item']
>>      ['0' '' '1000' '' '' '']
>>      ['1' '' '1000' '' '' 'docname Rev 0']
>>      ...,
>>      ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08']
>>      ['4' '6' '1164' '4' '' 'anotherdoc Rev A']
>>      ['0' '' '' '' '' '']]
>>
>>
>> Is numpy.append so slow? or is the culprit numpy.where?
>>
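
One way to check which of the two is the culprit is to time them in
isolation. A rough sketch with the standard timeit module, isolating only
the append step; the row contents and the row count N are arbitrary choices
for illustration, and the actual timings depend on the machine and data
size:

```python
import timeit
import numpy as np

N = 1000   # arbitrary row count, chosen only to make the gap visible
row = ['5', '6', '1164', '4', '', 'some item']

def with_numpy_append():
    out = np.array([row])
    for _ in range(N):
        out = np.append(out, [row], 0)   # reallocates and copies each time
    return out

def with_list_append():
    out = [row]
    for _ in range(N):
        out.append(row)                  # amortised constant time
    return np.array(out)                 # one conversion at the end

t_np = timeit.timeit(with_numpy_append, number=1)
t_list = timeit.timeit(with_list_append, number=1)
print('numpy.append: %.4f s, list.append: %.4f s' % (t_np, t_list))
```

Both functions produce the same array, so any difference in the printed
times comes from how the rows are accumulated.
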
>> Dewald Pieterse
>>
>> "A democracy is nothing more than mob rule, where fifty-one percent of
>> the people take away the rights of the other forty-nine." ~ Thomas
>> Jefferson
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>


-- 
Dewald Pieterse
