Jeremy Hylton : weblog : 2003-12-04

Working Out Kinks in Cache Simulation

Thursday, December 04, 2003

I've spent a lot of time this week working out bugs in the cache simulations. Many of the results reported in earlier blog entries -- both on cache access performance and on 2Q simulation results were wrong. That seems to be par for the course: The cache simulations are sometimes complicated. Understanding simulation results means checking the results for consistency ten different ways. Eventually you understand what the numbers are telling you and how they relate to each other.

The new 2Q implementation is probably correct, but doesn't produce the dramatically good results reported earlier. It's still better than LRU by a non-trivial margin (5% difference in hit rate) for large caches.

The ARC algorithm has been very difficult to implementation for a cache with variable-sized objects. Much of the specific logic from their pseudocode is different when you are calculating with object sizes instead of pages. For example, if you add a new object, the object you choose to evict may be smaller, so you have to evict several objects.

The other interesting problem with the ARC cache is that it seems to adjust its p factor too slowly. In the paper, they say that 1 results in changes that are fast enough, but they only have tiny caches in the simulation. For a very large cache (like a 500MB ZEO cache), it seems to adapt far too slowly. The paper boasts that there are no tuning parameters in ARC, but I needed to tune the adjustment rate to get decent results.

We're still stuck trying to find a decent implementation of an LRU list when the objects in the list are stored on disk and not in memory. The Iyengar paper doesn't cover replacements at all, even though it seems to say that a cache is typically in an equilibrium where it must always do replacement.

For the record, here is apparently correct simulation results for LRU and 2Q on the zope.org cache trace.

LRUCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE EVICTS
Nov 25 00:36   2:19:55    51340    15399     47    116   30.0%  28391
Nov 25 02:56   1:55:01    54952    14318     26    442   26.1%  31652
Nov 25 04:51   1:30:01    72530    23387     45     97   32.2%  41593
Nov 25 06:21   1:00:13    46689    15161     16    111   32.5%  25083
Nov 25 07:21   1:45:09    53628    17812     46    143   33.2%  28617
Nov 25 09:06   1:15:02    58238    19730     23    163   33.9%  30879
Nov 25 10:21   2:05:01   126466    36876    527   4751   29.2%  84047
Nov 25 12:26   1:15:03    51919    15501     22     68   29.9%  25989
Nov 25 13:41   2:45:03    69965    17362    107    296   24.8%  45034
Nov 25 16:26   4:19:34    37131    11960      3     38   32.2%  21452
Nov 25 20:46   6:05:06   110716    26365    140   2605   23.8%  78027
Nov 26 02:51   2:04:57    79765    24191    438   1157   30.3%  48735
Nov 26 04:56     40:01    53910    16997     63    187   31.5%  28491
Nov 26 05:36   1:15:11    51677    14870      8    456   28.8%  30886
Nov 26 06:51   1:00:08    51355    13124     16     92   25.6%  30407
Nov 26 07:51   1:04:56    51923    16151      5     68   31.1%  28086
Nov 26 08:56   1:05:13    66551    21848     12     90   32.8%  37771
Nov 26 10:01     50:04    48137    17088      6     80   35.5%  22723
Nov 26 10:51     28:00    40506    15784      3     70   39.0%  16731
Nov 25 00:36  34:43:38  1177398   353924   1553  11030   30.1% 684594 OVERALL

TwoQSimluation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE EVICTS HOTHIT
Nov 25 00:36   2:19:55    51340    14223     71    116   27.7%  29929     90
Nov 25 02:56   1:55:01    54952    17129     28    442   31.2%  36735   5932
Nov 25 04:51   1:30:01    72530    29424     52     97   40.6%  43540  13274
Nov 25 06:21   1:00:13    46689    19995     17    111   42.8%  27212   7724
Nov 25 07:21   1:45:09    53628    19438     46    143   36.2%  34643   7832
Nov 25 09:06   1:15:02    58238    22877     15    163   39.3%  34734   9308
Nov 25 10:21   2:05:01   126466    45287    432   4751   35.8%  81699  17469
Nov 25 12:26   1:15:03    51919    18809     19     68   36.2%  28539   7551
Nov 25 13:41   2:45:03    69965    14727    107    296   21.0%  59106  11076
Nov 25 16:26   4:19:34    37131    15132      2     38   40.8%  23250   4213
Nov 25 20:46   6:05:06   110716    27964    121   2605   25.3%  82201  16686
Nov 26 02:51   2:04:57    79765    28798    292   1157   36.1%  50332  12103
Nov 26 04:56     40:01    53910    20701     60    187   38.4%  32999   7766
Nov 26 05:36   1:15:11    51677    19818      9    456   38.3%  31505   6998
Nov 26 06:51   1:00:08    51355    18963     11     92   36.9%  30628   8204
Nov 26 07:51   1:04:56    51923    21585      6     68   41.6%  32241   7465
Nov 26 08:56   1:05:13    66551    27667     13     90   41.6%  38881   9298
Nov 26 10:01     50:04    48137    20918      7     80   43.5%  26887   6758
Nov 26 10:51     28:00    40506    19589      4     70   48.4%  20435   5447
Nov 25 00:36  34:43:38  1177398   423044   1312  11030   35.9% 745496 165194 OVERALL

The ARC simulator is barely functional. There are a few corner cases where I don't understand what it's doing, so I doubt it's right. I'm also ignoring invalidations and writes, but it's producing some basic results that make some sense. It looks like it's favoring the LRU (L2) component over the FIFO component (L1) at the expense of performance. When I compare to 2Q, which is using more space for the FIFO component, I see that ARC has more LRU hits but many fewer FIFO hits.

It's difficult to say if these results are representative of ARC or are caused by bugs in my implementation. (Likely the latter given my recent track record and the hour.)

The first chart is with a "walk factor" of 500. If it needs to adjust p, it does so using the basic algorithm multiplied by the walk factor.

ARCCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE LRUTHITS  EVICTS       P
Nov 25 00:36   2:19:55    51340    13001      0    116   25.3%     170   32170 2332000
Nov 25 02:56   1:55:01    54952    17790      0    442   32.4%    8280   33913 1145500
Nov 25 04:51   1:30:01    72530    18813      0     97   25.9%    8983   55903 5223500
Nov 25 06:21   1:00:13    46689    15301      0    111   32.8%    3566   31520 5353500
Nov 25 07:21   1:45:09    53628    15866      0    143   29.6%    4885   35772 6715500
Nov 25 09:06   1:15:02    58238    16603      0    163   28.5%    4693   42441 6187500
Nov 25 10:21   2:05:01   126466    38843      0   4751   30.7%   18441   90298  508500
Nov 25 12:26   1:15:03    51919    11498      0     68   22.1%    2424   38796 1910000
Nov 25 13:41   2:45:03    69965    10883      0    296   15.6%    8175   57130 7010000
Nov 25 16:26   4:19:34    37131     9392      0     38   25.3%    2146   23714 8427500
Nov 25 20:46   6:05:06   110716    32635      0   2605   29.5%   31363   83417 4433000
Nov 26 02:51   2:04:57    79765    22837      0   1157   28.6%    8656   58531 5609000
Nov 26 04:56     40:01    53910    14455      0    187   26.8%    3025   38794 6279000
Nov 26 05:36   1:15:11    51677    14862      0    456   28.8%    4006   36383 7866000
Nov 26 06:51   1:00:08    51355    13421      0     92   26.1%    2902   37051 10099500
Nov 26 07:51   1:04:56    51923    15462      0     68   29.8%    2922   38083 9251000
Nov 26 08:56   1:05:13    66551    21951      0     90   33.0%    4321   44048 9926000
Nov 26 10:01     50:04    48137    16766      0     80   34.8%    3382   31469 8320000
Nov 26 10:51     28:00    40506    15277      0     70   37.7%    3165   23202 7099500
Nov 25 00:36  34:43:38  1177398   335656      0  11030   28.5%  125505  832635 7099500 OVERALL

The couple of places where ARC was competitive with 2Q, it was were the LRU hit rate was very high. Everywhere else 2Q appeared to win by devoting more space to the LRU component. (This over simplifies, because 2Q is also more hesitant to promote to the LRU component.) So I tried changing the walk factor to 2000, since ARC is never using much of the cache for FIFO. It showed a small improvement, but not much.

ARCCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE LRUTHITS  EVICTS       P
Nov 25 00:36   2:19:55    51340    13160      0    116   25.6%     163   32042 9064000
Nov 25 02:56   1:55:01    54952    16400      0    442   29.8%    5725   35860 10772000
Nov 25 04:51   1:30:01    72530    23346      0     97   32.2%    5752   50476 12750000
Nov 25 06:21   1:00:13    46689    15478      0    111   33.2%    2868   31651 10224000
Nov 25 07:21   1:45:09    53628    16411      0    143   30.6%    4215   35884 16850000
Nov 25 09:06   1:15:02    58238    19977      0    163   34.3%    4066   38113 16094000
Nov 25 10:21   2:05:01   126466    38994      0   4751   30.8%   16923   92417   14000
Nov 25 12:26   1:15:03    51919    12217      0     68   23.5%    3050   37720 5578000
Nov 25 13:41   2:45:03    69965    11217      0    296   16.0%    5612   58317 20374000
Nov 25 16:26   4:19:34    37131    13122      0     38   35.3%    1238   26363 20258000
Nov 25 20:46   6:05:06   110716    24968      0   2605   22.6%   22161   84526 11990000
Nov 26 02:51   2:04:57    79765    22640      0   1157   28.4%    7114   57781 9536000
Nov 26 04:56     40:01    53910    14579      0    187   27.0%    2378   38769 13806000
Nov 26 05:36   1:15:11    51677    16014      0    456   31.0%    3039   35352 18124000
Nov 26 06:51   1:00:08    51355    14413      0     92   28.1%    1852   35752 28536000
Nov 26 07:51   1:04:56    51923    16757      0     68   32.3%    1785   35976 26386000
Nov 26 08:56   1:05:13    66551    23303      0     90   35.0%    2963   40329 25508000
Nov 26 10:01     50:04    48137    17391      0     80   36.1%    1817   32920 20414000
Nov 26 10:51     28:00    40506    16211      0     70   40.0%    2987   24224 17756000
Nov 25 00:36  34:43:38  1177398   346598      0  11030   29.4%   95708  824472 17756000 OVERALL

Boosting the walk factor to 10,000 gets the overall hit rate up to 31.4%, a little better than LRU and still worse than 2Q.