[Tutor] counting a list of elements

Sat Apr 2 08:30:20 CEST 2011

On 04/02/2011 07:00 AM, Knacktus wrote:
> Am 01.04.2011 21:31, schrieb Karim:
>> On 04/01/2011 08:41 PM, Knacktus wrote:
>>> Am 01.04.2011 19:16, schrieb Karim:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I would like to get advice on the best practice for counting the
>>>> elements of a list (built from scratch).
>>>> The number of elements is between 1e3 and 1e6.
>>>>
>>>> 1) I could create a generator and increment a counter (i += 1) in the loop.
>>>>
>>>> 2) or simply call len(mylist).
>>>>
>>>> I don't need the content of the list; indeed, I don't want to waste
>>>> memory on it. But I suppose len() is optimized too (C implementation).
>>>>
>>>> If you have any thoughts to share, don't hesitate.
>>>
>>> Just a general suggestion: provide code examples. I know that most of
>>> the time you don't have code examples yet, as you're still thinking
>>> about how to solve your problem. But if you post one of the possible
>>> solutions, the experienced guys here will very likely point you in the
>>> right direction. Without code it's hard to understand what you're after.
>>>
>>> Cheers,
>>>
>>> Jan
>>>
>>
>> Thank you all for your answers. To clarify: I built a collection of
>> dictionaries which represents the result of a database query on a bug
>> tracking system:
>>
>> backlog_tables, csv_backlog_table = _backlog_database(table=table,
>>                                                       periods=intervals_list)
>>
>> backlog_tables is a dictionary of lists of bug info dictionaries. The
>> keys of backlog_tables are time intervals (YEAR-MONTH), as shown below:
>>
>> backlog_tables = {
>>     '2011-01-01': [
>>         {'Assigned Date': datetime.date(2010, 10, 25),
>>          'Category': 'Customer_claim',
>>          'Date': datetime.date(2010, 10, 22),
>>          'Duplicate Date': None,
>>          'Fixed Reference': None,
>>          'Headline': 'Impovement for all test',
>>          'Identifier': '23269',
>>          'Last Modified': datetime.date(2010, 10, 25),
>>          'Priority': 'Low',
>>          'Project': 'MY_PROJECT',
>>          'Reference': 'MY_PROJECT at 1.7beta2@20101006.0',
>>          'Resolved Date': None,
>>          'Severity': 'improvement',
>>          'State': 'A',
>>          'Submitter': 'Somebody'},
>>         .....
>> }
>>
>> _backlog_database() computes the tuple (backlog_tables, csv_backlog_table).
>> In fact csv_backlog_table has the same keys as backlog_tables, but instead
>> of the query dictionaries it holds only the number of query results, which
>> I use to create a CSV file and a graph over the time range.
>>
>> _backlog_database() is as follows:
>>
>> def _backlog_database(table=None, periods=None):
>>     """Internal function. Re-arrange database table
>>     according to a time period. Only monthly management
>>     is computed in this version.
>>
>>     @param table the database of the list of defects. Each defect is a
>>     dictionary with fixed keys.
>>     @param periods the list of monthly intervals; the first element is
>>     the starting date and the last element is the ending date, in
>>     string format.
>>     @return (periods_tables, csv_table), a tuple of the per-period
>>     dictionary table and a dictionary with the same keys whose values
>>     are the defect counts.
>>     """
>>     if periods is None:
>>         raise ValueError('Time interval could not be empty!')
>>
>>     periods_tables = {}
>>     csv_table = {}
>>
>>     interval_table = []
>>
>>     for interval in periods:
>>         split_date = interval.split('-')
>>         for row in table:
>>             if not len(split_date) == 3:
>>                 limit_date = _first_next_month_day(year=int(split_date[0]),
>>                                                    month=int(split_date[1]),
>>                                                    day=1)
>>                 if row['Date'] < limit_date:
>>                     if not row['Resolved Date']:
>>                         if row['State'] == 'K':
>>                             if row['Last Modified'] >= limit_date:
>>                                 interval_table.append(row)
>>                         elif row['State'] == 'D':
>>                             if row['Duplicate Date'] >= limit_date:
>>                                 interval_table.append(row)
>>                         # New, Assigned, Opened, Postponed, Forwarded, cases.
>>                         else:
>>                             interval_table.append(row)
>>                     else:
>>                         if row['Resolved Date'] >= limit_date:
>>                             interval_table.append(row)
>>
>>         periods_tables[interval] = interval_table
>>         csv_table[interval] = str(len(interval_table))
>>
>>         interval_table = []
>>
>>     return periods_tables, csv_table
>>
>>
>> This is not the whole function; I reduced it to the normal case, but it
>> shows what I am doing.
>> In fact I chose to have both dictionaries to debug my function and
>> analyse what's going on. When everything is fine I will only need the
>> csv table (with the number per period) to create the graphs.
>> That's why I was asking about computing the length. Honestly, the actual
>> number of queries is 500 (bug ids), but it could be more in other
>> projects. I was ambitious when I said 1000 to 100000 dictionary
>> elements, but for the whole list of products we have internally it
>> could be 50000.
>
> I see some similarity with my coding style (doing things "by the
> way"), which might not be so good ;-).
>
> With this background information I would keep the responsibilities
> separated. Your _backlog_database() function is supposed to do one
> thing: return a dictionary which maps each interval to a list of
> result dicts. You could call this dict interval_to_result_tables (to
> indicate that the values are lists). That's all your function should do.
>
> Then you want to print a report. This piece of functionality needs to
> know how long the list for each dictionary entry is. The print_report
> function should then be responsible for getting the information it
> needs, either by creating it itself or by calling another function
> whose purpose is to create that information. The latter would be a bit
> too much here, as the length is simply:
>
> number_of_tables = len(interval_to_result_tables[interval])
>
> I hope I understood your goals correctly and could help a bit,
>
> Jan
>
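
For reference, the separation Jan describes could look roughly like the sketch below. It is only an illustration: count_per_interval(), format_report() and the sample data are hypothetical names and values, not part of the real code, and the report format (one "interval,count" line per key) is an assumption.

```python
def count_per_interval(interval_to_result_tables):
    """Map each interval to the number of rows selected for it."""
    return dict((interval, len(rows))
                for interval, rows in interval_to_result_tables.items())


def format_report(interval_to_result_tables):
    """Return 'interval,count' lines, sorted by interval (assumed format)."""
    counts = count_per_interval(interval_to_result_tables)
    return ["%s,%d" % (interval, counts[interval])
            for interval in sorted(counts)]


# Hypothetical sample data, shaped like backlog_tables in this thread.
interval_to_result_tables = {
    "2011-01": [{"Identifier": "23269"}, {"Identifier": "23270"}],
    "2011-02": [{"Identifier": "23269"}],
}
report_lines = format_report(interval_to_result_tables)
```

With this split, the function that builds the tables never has to know about the report, and the counts are derived only where they are needed.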

Once again, thank you Jan!
It's true I coded it very fast and forgot about design...
Your answer opened my mind to keeping tasks clearly separated.
My concern when doing this was whether to compute the length while
creating the lists and dicts or afterwards, to optimize memory and time.
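
A minimal sketch of the two options, assuming the selection rule can be written as a predicate function. The names count_matching(), collect_matching(), is_assigned() and the sample rows are hypothetical. Counting with a generator expression keeps no list in memory, while len() on an already built list is cheap in CPython because the list stores its own size.

```python
def count_matching(rows, keep):
    """Count the rows accepted by keep() without storing them."""
    return sum(1 for row in rows if keep(row))


def collect_matching(rows, keep):
    """Build the list of accepted rows; len() of it gives the same count."""
    return [row for row in rows if keep(row)]


def is_assigned(row):
    """Hypothetical predicate used only for this illustration."""
    return row["State"] == "A"


# Hypothetical rows; real rows would carry all the bug fields.
rows = [{"State": "A"}, {"State": "K"}, {"State": "A"}]

streamed_count = count_matching(rows, is_assigned)
listed_count = len(collect_matching(rows, is_assigned))
```

If only the numbers are needed for the graphs, the first form is enough; if the rows are also wanted for debugging, building the list and calling len() afterwards costs nothing extra.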

But in the end, due to time constraints (a demo to give), I coded it
roughly, by brute force ;o)

Regards
Karim

>
>
>
>>
>> Regards
>> Karim
>>
>>>
>>>>
>>>> Karim
>>>> _______________________________________________
>>>> Tutor maillist - Tutor at python.org
>>>> To unsubscribe or change subscription options:
>>>> http://mail.python.org/mailman/listinfo/tutor