functools documentation - help with funny words

Sun Nov 9 12:05:03 EST 2014

On Sun, Nov 9, 2014 at 2:06 AM, Veek M <vek.m1234 at gmail.com> wrote:
> https://docs.python.org/3.4/library/functools.html
>
> 1. "A key function is a callable that accepts one argument and returns
> another value indicating the position in the desired collation sequence."
>
> x = ['x','z','q']; sort(key=str.upper)

This call isn't correct. sort is a list method, not a builtin.  So the
above should be:

>>> x = ['x', 'z', 'q']
>>> x.sort(key=str.upper)

> My understanding is that, x, y, .. are passed to the key function which
> converts it to upper case and returns the value, which is then sorted.
> So what does he mean by 'position in the desired..."? Position is 0, 1, 2..
>
> upper is 'normalizing' the input and then sort gathers the values and sorts
> them.. so where is the 'key' function returning a position in the sorted-
> list.. or whatever.. what does a key function do?

When you pass a key function to the list.sort method, the key function
is called on each element of the list, and the return values are
compared to sort the list in place of the elements.  For example:

>>> sorted(['23', '4', '162'])
['162', '23', '4']

Because '162' < '23' < '4'.  But:

>>> sorted(['23', '4', '162'], key=int)
['4', '23', '162']

Because 4 < 23 < 162.

https://docs.python.org/3/glossary.html?highlight=%22key+function%22

> 2. lru_cache
>
> "Since a dictionary is used to cache results, the positional and keyword
> arguments to the function must be hashable."
>
> basically says that the args must be immutable and not subject to change
> because he's using the args as part of a key to the result?

https://docs.python.org/3/glossary.html?highlight=hashable

> "To help measure the effectiveness of the cache and tune the maxsize
> parameter, the wrapped function is instrumented with a cache_info() function
> that returns a named tuple showing hits, misses, maxsize and currsize"
>
> What does he mean by 'instrument'? Just a 'helper' function or is it nested
> in lru_cache or structured in some special way.

It means code added in some way to enable debugging or performance analysis.

http://en.wikipedia.org/wiki/Instrumentation_(computer_programming)

In this case it takes the form of a method added to the function object:

>>> @functools.lru_cache()
... def codepoint(s): return ord(s)
...
>>> list(map(codepoint, 'abracadabra'))
[97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]
>>> codepoint.cache_info()
CacheInfo(hits=6, misses=5, maxsize=128, currsize=5)

> "An LRU (least recently used) cache works best when the most recent calls
> are the best predictors of upcoming calls (for example, the most popular
> articles on a news server tend to change each day)."
>
> What? So if article1 changes.. how does that predict anything..? If article1
> is most popular, you want it in the cache, but it'll become most popular
> only after some time.. or is he saying that popular articles must be kicked
> out off the cache at the end of 24hrs?

By "predicts" it means whether the fact that an item was recently
requested is positively correlated with the probability that it will
soon be requested again. Popular articles are an example of this; when
an article is requested, there is a good chance that it was requested
because it is featured or trending and will likely be requested again
soon.  On the other hand if the incoming requests tend to be random
and not correlated in time, then an LRU cache won't work as well. If
each request is negatively correlated with its probability of being
requested again soon (e.g. a graph traversal where some property is
being computed once for each node; once it's been computed for a
particular node, it won't be called with the same node again during
the same traversal), then an LRU cache may not perform well at all.