Back from FAST, this week I’m continuing my OSDI’20 series. I couldn’t update at my usual time because I have been swamped with meetings the whole week; on Friday I only had a 30-minute break all day, so I’m posting our weekly paper reading on Saturday instead. I do understand why the morning paper discontinued, and I actually fear this one won’t last long either… Anyway, today we will look at a less technical paper that I think is still valuable to researchers. Let’s look at the analysis of a large-scale in-memory cache at Twitter.

How are in-memory caches deployed at Twitter?

  • One service, one cache cluster (100s of clusters in total)
  • A cache cluster is composed of containers
  • In total they serve billions of QPS and consume 100s of TB of DRAM
  • Method
    • Week-long traces from one instance of each cluster (open sourced)
      • 700 billion requests, 80 TB in size
    • Focus on 54 clusters in this paper

How are cache clusters used?

  • Use cases
    • Caching for storage (KVStore)
    • Caching for computation
    • Intermediate data with no persistence
  • Workloads
    • 35% of clusters are write-heavy with more than 30% writes (challenges: scalability, tail latency)
    • Object sizes are small
      • 24% of clusters have mean object size < 100 bytes
      • Median: 230 bytes
      • Keys are relatively large, so the value-to-key size ratio is small: 15% of objects have value size <= key size; 50% have value size <= 5x key size (key compression helps)
      • Sizes change over time, and the change can be very sudden with no pattern
    • Metadata size is large: 56 bytes per object (see the back-of-the-envelope sketch after this list)
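
With objects this small, per-object metadata becomes a significant fraction of cache DRAM. Below is a minimal back-of-the-envelope sketch in Python; the 56-byte metadata figure is from the paper, while the object sizes are illustrative values picked around the numbers above.

```python
# Estimate what fraction of cache DRAM goes to metadata when objects are small.
# 56 bytes/object is the figure reported in the paper; object sizes are illustrative.
METADATA_BYTES = 56

for object_bytes in (100, 230, 1000):  # small / roughly-median / larger objects
    overhead = METADATA_BYTES / (object_bytes + METADATA_BYTES)
    print(f"object={object_bytes:5d} B  metadata overhead={overhead:.0%}")

# Prints roughly 36%, 20%, and 5% respectively.
```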

How long before objects expire? Time-to-live (TTL)

  • Objects are set to expire to bound inconsistency, to force result updates (refresh), etc.
  • TTLs are usually short: < 20 mins; only 25% of TTLs are > 2 days
  • Short TTLs bound the working set size (3-6 GB), so there is no need for a huge cache
  • The paper mentions multiple TTL expiration methods, and all of them are inefficient (a sketch of lazy expiration follows this list)
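
To make the expiration problem concrete, here is a minimal sketch of lazy TTL expiration, one common approach: an expired object is only reclaimed when it is accessed again, so with short TTLs a lot of dead objects can linger in memory. This is an illustration of the general idea under my own assumptions, not Twemcache’s actual implementation.

```python
import time

class LazyTTLCache:
    """Toy cache with lazy TTL expiration: expired entries are only
    removed when they are touched again, so dead objects can occupy
    memory long after their TTL has passed."""

    def __init__(self):
        self._store = {}  # key -> (value, expire_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expire_at = entry
        if time.time() >= expire_at:
            # Reclaimed only here, on access; otherwise the entry lingers.
            del self._store[key]
            return None
        return value

cache = LazyTTLCache()
cache.set("user:42:timeline", ["tweet1", "tweet2"], ttl_seconds=600)  # 10-minute TTL
print(cache.get("user:42:timeline"))
```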

Other statistics

  • Cache miss ratios are small, with small variation
  • Request spikes are not always caused by hot keys
  • Most workloads are Zipfian with large alpha, with only small deviations (see the sampling sketch after this list)
  • The best cache eviction policy is workload dependent, and FIFO performs similarly to LRU
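
For intuition on what “Zipfian with large alpha” means for caching, here is a small sampling sketch in Python/NumPy. The alpha value, key count, and request count are assumed illustrative numbers, not values from the traces; the point is that a tiny fraction of hot keys absorbs most of the traffic, which is part of why small caches already achieve small miss ratios.

```python
import numpy as np

# Sample key popularity from a Zipf-like distribution and measure how much
# traffic the hottest 1% of keys receives. alpha, key count, and request
# count are illustrative assumptions, not numbers from the Twitter traces.
rng = np.random.default_rng(seed=0)
num_keys, num_requests, alpha = 100_000, 1_000_000, 1.2

ranks = np.arange(1, num_keys + 1, dtype=float)
probs = ranks ** -alpha
probs /= probs.sum()

requests = rng.choice(num_keys, size=num_requests, p=probs)
counts = np.bincount(requests, minlength=num_keys)
hot_share = np.sort(counts)[::-1][: num_keys // 100].sum() / num_requests
print(f"hottest 1% of keys serve {hot_share:.0%} of requests")
```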

Comment

This is another paper with little technical innovation but high value. I would definitely read it again carefully and come back with more notes as I find more interesting things to log. The trace may be useful for my own future papers as well.